Too Perfect a Mirror
Carewolf writes "Jeff Mitchell writes on his blog about what almost became 'The Great KDE Disaster Of 2013.' It all started as a simple update of the root git server and ended up with a corrupt git repository automatically mirrored to every mirror and deleting every copy of most KDE repositories. It ends by discussing what the problem is with git --mirror and how you can avoid similar problems in the future."
Re:Not git related (Score:4, Interesting)
You can --mirror any time. If you actually have backups, not just mirrors and hope.
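To make "backups, not just mirrors" concrete, here is a minimal sketch of a dated, self-contained backup using `git bundle create` and `git bundle verify`. The function names, the repository path, and the backup directory are made up for illustration; this is not what KDE runs. It assumes `backup_dir` is an absolute path on storage independent of the git server.

```python
import subprocess
from datetime import date


def bundle_name(repo_name, day):
    """Build a dated file name so older backups are never overwritten."""
    return f"{repo_name}-{day.isoformat()}.bundle"


def backup_commands(repo_path, backup_dir, repo_name, day):
    """Return the git commands that create and then verify one bundle.

    backup_dir is assumed to be absolute, because `git -C` changes
    the working directory before the bundle path is resolved.
    """
    target = f"{backup_dir}/{bundle_name(repo_name, day)}"
    return [
        ["git", "-C", repo_path, "bundle", "create", target, "--all"],
        ["git", "-C", repo_path, "bundle", "verify", target],
    ]


def make_backup(repo_path, backup_dir, repo_name, day=None, run=subprocess.run):
    """Run the backup commands in order; stop at the first failure."""
    day = day or date.today()
    for cmd in backup_commands(repo_path, backup_dir, repo_name, day):
        if run(cmd).returncode != 0:
            return False
    return True
```

Unlike a mirror, yesterday's bundle does not change when today's repository gets corrupted, which is the whole point.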
Re:But it is SUPPOSED to (Score:2, Interesting)
Git does not have the magic "integrity check" on making mirrors.
Why on earth not?
If they had bothered to look at the documentation they would have known.
There's no mention of this in any of the git-clone, git-push, git-pull or git-fetch man pages on my system, at least not near any instance of the word "mirror".
If they had thought about it for a second, they would have realized that expensive integrity checks might be switched off on a fast mirror operation.
Why? The point of the mirror option (at least as far as the documentation mentions) is to propagate all branch additions/deletions/forced updates automatically, not to make it fast. Git is advertised as having strong integrity checking as a feature, so why would you assume that would ever be turned off, except maybe with an explicit --no-check-hashes option?
If they had even been a bit careful, they would have checked the documentation and known. [...] This is correct and documented behavior.
Not documented in any of the obvious places to look, at least. Maybe if they'd bothered to read literally the entire Git documentation they might have found a mention of this somewhere, but reading the entire documentation every time you start using a new option just in case there might be some special non-obvious caveat goes way beyond "even a bit careful".
And no, nothing done within the system being backed up is a backup. A backup needs to be stored independent of the system being backed up.
The whole point of the mirrors is that they're not the same system as the original.
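For anyone scripting their own mirrors, here is a rough sketch of doing the integrity check explicitly rather than assuming the mirror operation does it. It runs `git fsck --full` on the source and only then calls `git push --mirror`. The helper names and the push-style setup are my assumptions for illustration; a pull-style mirror would run the same check on the mirror side after fetching.

```python
import subprocess


def repo_is_healthy(repo_path, run=subprocess.run):
    """Return True if `git fsck --full` exits cleanly for repo_path."""
    result = run(
        ["git", "-C", str(repo_path), "fsck", "--full"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def sync_mirror(source_repo, mirror_url, run=subprocess.run):
    """Push to the mirror only when the source passes fsck.

    Hypothetical helper: returns False without pushing if the check
    fails, so a corrupt source is never propagated.
    """
    if not repo_is_healthy(source_repo, run=run):
        return False
    push = run(
        ["git", "-C", str(source_repo), "push", "--mirror", mirror_url],
        capture_output=True,
        text=True,
    )
    return push.returncode == 0
```

The `run` parameter exists only so the logic can be exercised without a real repository. The check costs time on large repositories, which is presumably why nobody runs it on every sync by default.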
Re:Learn how your tool works? (Score:5, Interesting)
Re:No backups?! (Score:5, Interesting)
I like your summary of three important reasons for tape archive. I'll restate in different terms.
1. Mid-term to indefinite data retention.
2. Large quantities of data, where "large" is a value greater than a single hard drive can reasonably store.
3. Disaster recovery planning.
But there are more.
4. "Oops".
That's the category of this KDE git issue: recovering from an "oops". People screw up. How do you recover? I'm a big fan of having multiple layers in that onion: online snapshots, near-line replicas, and off-line tape backups are a basic three-tiered framework for figuring out how to protect the data. I'm amazed that, as big as KDE is, they don't have storage/backup expertise helping them keep their data secure. Makes me think I may have found my next open-source niche to fill.
5. Reliability. Contrary to the "fragile, expensive" opinion above, tape failure rates are demonstrably lower than hard drive failure rates despite regular handling. Research left to the reader; hard drives fail at a rate about fifteen times higher than their rated MTBF implies, and that rated failure rate was already considerably higher than tape's. Data on tape is far more resilient than data on a hard drive.
6. Cost. If you have to store data long-term, consider tape. Administrative, power, cooling, and storage costs are all lower.
That's what I can think of off the top of my head; I'm sure there are more reasons for tape to be a good choice. The reality for many people who want to store their data "in the cloud" is also this:
I back up your "cloud" storage onto tape drives. Your cloud storage is only as reliable as my ability to recover it from a disaster.
Re:programming != IT (Score:4, Interesting)