Too Perfect a Mirror

Carewolf writes "Jeff Mitchell writes on his blog about what almost became 'The Great KDE Disaster Of 2013.' It all started as a simple update of the root git server and ended with a corrupted git repository automatically propagated to every mirror, deleting every copy of most KDE repositories. The post closes by discussing the problem with git --mirror and how to avoid similar disasters in the future."

  • by gweihir ( 88907 ) on Sunday March 24, 2013 @09:24AM (#43262511)

    Preferably, before using them? This sounds very much like plain old incompetence, possibly coupled with plain old arrogance. Thinking that using a version control system absolves one from making backups is just plain old stupid. And given what I have seen from the KDE project, that would be consistent.

  • Not git related (Score:5, Insightful)

    by Rob Kaper ( 5960 ) on Sunday March 24, 2013 @09:24AM (#43262513) Homepage

    This is not a problem with git --mirror: rsync or any other mirroring tool would end up in the same situation.

    It's up to the master to deliver the goods, and upgrading a master should include a test run as well as a backup prior to the real upgrade. This was a procedural failure, not a software failure. But it's good to hear disaster was averted.

  • by maxwell demon ( 590494 ) on Sunday March 24, 2013 @09:36AM (#43262559) Journal

    Also, mirrors are not backups. Mirrors are intended to be identical to the original, so mirroring worked as expected. How should the software know that the removal of most repositories was not intentional?

  • Re:Not git related (Score:4, Insightful)

    by Carewolf ( 581105 ) on Sunday March 24, 2013 @09:44AM (#43262595) Homepage

    True, but git does have a mechanism for checking integrity, and the discussion here is about where the fast git --mirror, which performs no checks, is appropriate, and where the slower mechanism that does check fits in.
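
    For illustration, a minimal sketch of such a check on the master before propagating anywhere. The paths and the "mirror" remote are hypothetical, not KDE's actual setup:

      # Refuse to mirror any repository that fails integrity verification.
      # /srv/git and the "mirror" remote are illustrative examples.
      for repo in /srv/git/*.git; do
          if git --git-dir="$repo" fsck --full --strict >/dev/null 2>&1; then
              git --git-dir="$repo" push --mirror mirror
          else
              echo "fsck failed for $repo, skipping mirror push" >&2
          fi
      done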

  • No backups?! (Score:5, Insightful)

    by Blymie ( 231220 ) on Sunday March 24, 2013 @09:45AM (#43262597)

    Good grief!

    After all of that, not a single proposed solution is a proper rotational backup.

    This is what rotational backups are FOR. They let you go back months in time, and even do post-corruption, or post-cracking examination of the machine that went down!

    Backups do *not* need to be done to tape, but a mirror or a raid card is NOT a backup. This is actually simple, simple stuff, and it seems like the admins at KDE are a bit wet behind the ears, in terms of backups.

    They probably think that because backups used to mean tape, backups are old tech that nobody bothers with anymore.

    Not so! Many organizations I admin, and many others I know of, simply do off-site rotational backups using rsync + rotation scripts (see the sketch at the end of this comment). That is the key part: copies of the data as it changes over time. You *never* overwrite your backups, EVER.

    And with proper rotational backups, only the changed data is backed up, so the daily backup size is not as large as you might think. I doubt the entire KDE git tree changes by even 0.1% every day.

    Rotational backups -- they work like a charm, they would completely prevent a problem like this, and THEY ARE WHAT YOU NEED TO BE DOING, ALWAYS!
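
    A minimal sketch of that rsync + rotation approach, assuming a dated-directory layout; the hostname and paths are illustrative:

      #!/bin/sh
      # Each day gets its own directory. Unchanged files are hardlinked
      # against yesterday's copy, so only changed data consumes space.
      SRC="git-server:/srv/git/"
      DST="/backup/kde"
      TODAY=$(date +%F)
      YESTERDAY=$(date -d yesterday +%F)   # GNU date
      rsync -a --delete --link-dest="$DST/$YESTERDAY" "$SRC" "$DST/$TODAY"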

  • by gweihir ( 88907 ) on Sunday March 24, 2013 @10:04AM (#43262701)

    No. Backup is out of scope for version control. Anybody with actual common sense would not expect it to make backups "magically" by itself, and would check to make sure. Then they would implement backups. But that does require said common sense.

  • by gweihir ( 88907 ) on Sunday March 24, 2013 @10:13AM (#43262733)

    Well, so this was _not_ a git failure, as there was an explicit warning that it does not cover this case. Not the fault of git, but of those who did not bother to find out. That a "mirror" operation does not check the repository is also no surprise at all.

    Incidentally, even if git had failed, that is why you have independent and verified backups. A competently designed and managed system can survive the failure of any one component.

  • Re:No backups?! (Score:5, Insightful)

    by Blymie ( 231220 ) on Sunday March 24, 2013 @10:17AM (#43262755)

    A 24-hour-old sync isn't a backup. It's a slightly delayed mirror.

    "Rotational backups" isn't just a single thing. It's a whole ball of wax. Part of that ball of wax, are test restores. Another part of that are backups that only sync changes, something exceptionally easy with rotational backups, but not as was with a filesystem snapshot.

    In 10 seconds, I can run 'find' on a set of rotational backups I have that go back FIVE YEARS, and locate every instance of a single file as it changed on a daily basis (sketched below). How does someone do that with ZFS snapshots? This is key when debugging corruption, or when looking for a point to restore from (say, after someone hacks in).

    Not to mention that ZFS could be producing corrupt snapshots -- what an annoyance to have to constantly restore those, then run tests on the entire snapshot to verify the data.

    What I see here is a reluctance to do the right thing, and a tendency to dismiss the way people do traditional backups as silly.
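
    For what it's worth, that kind of search over dated rotation directories is a few lines of shell; the layout and file name here are hypothetical:

      # One hash line per daily snapshot of a single file; uniq collapses
      # runs of identical content, so each remaining line marks a change.
      for day in /backup/kde/20??-??-??; do
          md5sum "$day/repos/kdelibs.git/packed-refs" 2>/dev/null
      done | uniq -w32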

  • by Anonymous Coward on Sunday March 24, 2013 @10:24AM (#43262791)

    "Not the fault of git but those that did not bother to find out"

    No: Git has the integrity check, and the integrity check didn't work. If the integrity check had worked as claimed, then their backups were solid.

    I know people are saying "keep backups", but they're really missing the point. A backup is a copy of something, the more up to date the better, and better still if it keeps a historic set of backups. Perhaps with some sort of software to minimize the size, perhaps only keeping changes... you can see where I'm going with this.

    Git sync to a lot of drives IS A BACKUP. It is exactly what an ideal backup should be: historic, up to date, and minimal in storage. What is that system if it isn't an automatic backup!

    Except for this bug, which needs to be fixed. And a little less faith in git would also be a good thing.

    It's really no different than if you had used backup software that made careful backups and kept historic copies, and then one day your disk got corrupted and you promptly went to your backups, only to find the backup software had been chomping them, because it didn't notice the integrity failure and had happily been corrupting the backups it was keeping.

    So I see comments saying "they didn't have backups, OMG!" But no: their problem was that they only used ONE TYPE of backup software, git sync (one possible guard is sketched below). I bet all of you use only ONE type of backup software and are equally vulnerable to this failure.
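
    One possible guard, sketched under the assumption that each mirror can afford a staging copy; the paths are illustrative, and this is not what KDE actually ran:

      # Mirror-side guard: fetch into a staging clone, verify it, and only
      # then swap it in, so the previous copy survives a corrupt upstream.
      rm -rf /srv/staging/repo.git /srv/prev/repo.git
      git clone --mirror git://master.example.org/repo.git /srv/staging/repo.git \
          && git --git-dir=/srv/staging/repo.git fsck --full \
          && mv /srv/live/repo.git /srv/prev/repo.git \
          && mv /srv/staging/repo.git /srv/live/repo.git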

  • Re:No backups?! (Score:5, Insightful)

    by gweihir ( 88907 ) on Sunday March 24, 2013 @10:36AM (#43262849)

    What really surprises me is that people still do not understand backup, after it has been a solved problem for decades. Backup _must_ be independent. It _must_not_ be on the same hardware. It _must_not_ even be on the same site, if the data is critical. It must protect against anything happening to the original system. Version control, mirrors, RAID: none of them qualify as backup. They are not independent of the system being backed up.

    However, the amount of incompetence displayed in the original story and the comments here explains a lot. Seems that in this time of "virtual everything" people do not even bother to learn the basics anymore and are then surprised when they make very, very basic mistakes.

  • by Antique Geekmeister ( 740220 ) on Sunday March 24, 2013 @10:55AM (#43262939)

    May I respectfully disagree? I've often seen such focus on what is "out of scope" used to limit cost and to limit the "turf" on which an employer or contractor needs access. But backup is _certainly_ a critical part of source control, just as security is. The ability to replicate a working source control system to other hardware or environments after failure or corruption of the primary server is vital to any critical source tree. Calling it "out of scope" is like calling security "out of scope". By ignoring the consequences at the design stage of a source control system, very real risks are often taken without even thinking of the possible consequences, and the resources necessary to provide such critical features later can, and often do, multiply the cost of a project in unexpected ways.

    A nightly mirror on low-cost hardware with snapshot capability, for example, can provide very useful fallback capability. Even hardlink-based software snapshots can work well (sketched after this comment). It requires thought to configure correctly, to schedule the mirrors so they don't conflict with other high-bandwidth operations such as tape backup, and to handle "churn" disk space requirements. And I've had some very good success with partners and clients who took such modest backup tools and saved enormous cost on high-speed tape backup systems and high-bandwidth connections for remote mirroring facilities, or who had difficulties meeting very short backup windows, by using the mirror or the snapshots to do the tape backups for archival. It does inject a phase delay into the tape backups, and recovery from tape has to be tested, but it's been extremely effective.

    Several times, I've found that the problem is a political one. The backup system is often a very expensive, high-performance capital cost, or the proprietary "turf" of a manager who is very comfortable with and enamored of it, and they're concerned that adding this layer will make them look foolish for spending the money, or cost them their job as the proprietary owner of critical infrastructure. They already fought the political battle of purchasing the hardware in the first place and don't care to rehash their previous work. But it's often amazing what staging the backups this way can do for performance and for user access to their backed-up data. Most restoration cases are due to accidental file deletion or editing, and the users then no longer need access to the tape backup system or off-site archival, only to the snapshots, which have read-only access with the same privileges as the original source material.
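
    A minimal sketch of the hardlink snapshot scheme mentioned above, in the classic cp -al + rsync style; the paths are illustrative:

      # Hardlink the whole current tree (fast and nearly free), then let
      # rsync replace only changed files, leaving old snapshots intact.
      cp -al /backup/current "/backup/$(date +%F)"
      rsync -a --delete git-server:/srv/git/ /backup/current/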

  • by gweihir ( 88907 ) on Sunday March 24, 2013 @11:52AM (#43263251)

    Yes, it is too much. How would the mirror operation ever know without full checks on everything? Quit asking for nanny-software that treats its users as incompetent and illiterate. Is it too much to ask for the admins to actually have a brief look at the description of the operation they are using as their primary redundancy mechanism? I don't think so. If they had done this very basic step, they would have known to run a repository check before mirroring. If they had any real IT knowledge, they would have known that mirrors are not backups and that you need backups in addition.

    Also, what I gather from their grossly incomplete "analysis" is that they had a file that read back differently on multiple reads (not sure; they seem not to have checked that), which is not filesystem corruption (the OS checks for that on access, to some degree) but a hardware fault. Filesystems and application software routinely do not check for that. It is one of the reasons to always do a full data compare when making a backup.
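
    Such a full data compare can be a one-liner; a sketch, assuming the backup lives under an illustrative /backup path:

      # Dry-run, checksum-based comparison: lists every file whose contents
      # differ between the original and the backup, without copying anything.
      rsync -ancv --delete /srv/git/ /backup/kde/current/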

  • by BitZtream ( 692029 ) on Sunday March 24, 2013 @05:31PM (#43265285)

    It is UNIX-style design where the user is expected to actually understand what they are doing.

    No, it is not, and never was. It is in fact the opposite of that. Man pages, as one obvious example, are there so people who don't know what they are doing can figure it out. It is designed to be intuitive and to provide you with the information needed to get the job done. It was built as small, simple tools that are easy to understand; they can perform simple tasks on their own or, working together, perform complex ones ... hence the powerful UNIX command line. The original UNIX design considered both new, inexperienced users and how to bring them up to speed, as well as how to empower users with more knowledge of the system.

    What you are referring to is a Linux/OSS attribute, not a UNIX attribute. Linux/OSS developers typically expect the user of the software to be a developer as well. This is the result of everyone scratching only their own itch, with most code being written by people for themselves without any consideration of others. No one WANTS to write the things that make it intuitive or easy for someone else who doesn't understand all the quirks. Obviously this isn't true of some of the paid developers, but the majority of them aren't paid.
