Azure Failure Was a Leap Year Glitch
judgecorp writes "Microsoft's Windows Azure cloud service was down much of yesterday, and the cause was a leap year bug: the service failed to handle the 29th day of February. Faults propagated, making this a severe outage for many customers, including the UK Government's recently launched G-cloud service."
What is it with Microsoft and Leap Year? (Score:3, Informative)
Prepared for future (Score:4, Informative)
Arthur David Olson is my hero (Score:5, Informative)
Re:What is it with Microsoft and Leap Year? (Score:5, Informative)
Now, I'm not necessarily a Microsoft apologist, but I have to point out that it wasn't so long ago that other things near and dear to us geeks were experiencing similar problems.
I was trying to run some ant scripts yesterday that interact with an FTP server to delete some files. Those damned files wouldn't get deleted. They weren't even returned from a listing command. As it turns out, I was using a particularly old version of the Apache Commons-Net library (this jar file was from 2005) which had a leap-year bug. It simply would not show me files with modification dates of 2/29. I was looking at the FTP server configuration, logging in with other clients, moving and renaming files, and was just about ready to break out Wireshark... and then it occurred to me that it was leap day. Hoo-fucking-ray. I "touch"ed the file, and sure enough, it was suddenly available. Those are a few hours of my life I'll never get back.
Some of the most common leap-year bugs (Score:5, Informative)
Some of the common leap year bugs that I've seen over the years:
1. An array with the number of days per month:
e.g. smallint dayspermonth[12]={31,28,31,30,31,30,31,31,30,31,30,31};
Indexing into the array for February (index 1) ignores leap years.
2. An array with 365 elements to represent a year's worth of something:
e.g. smallint hightemps[365];
This usually doesn't fail until Dec 31, when hightemps[mydate.dayofyear()-1] points to a non-existent element.
Of course, if dayofyear is calculated using the array in the prior bug, it will fail invisibly since that will be incorrect
as well.
3. Quick-n-dirty subtract-one-year math:
e.g. Convert date to char in YYYYMMDD format, convert char to int, subtract 10000, convert back to a char and then date.
Why people do this when dateadd(year,-1,mydate) is that easy, I have no clue. But it breaks horridly when
you use it to determine "one year ago today" from Feb 29.
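For anyone who wants to poke at these, here is a rough Python sketch of the three bugs above next to one possible fix for each. The function names (days_in_month, new_year_table, naive_year_ago, year_ago) are made up for illustration, the YYYYMMDD layout is an assumption, and clamping Feb 29 to Feb 28 is just one policy; some applications would rather land on Mar 1.

```python
import calendar
from datetime import date

# Bug 1: a fixed days-per-month table has no idea about leap years.
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

def days_in_month(year, month):
    """Ask the standard library how long the month is (month is 1-12)."""
    return calendar.monthrange(year, month)[1]

# Bug 2: a 365-slot array has no room for day 366.
def new_year_table(year, fill=0):
    """Size a per-day table for the actual length of the given year."""
    return [fill] * (366 if calendar.isleap(year) else 365)

# Bug 3: "subtract 10000 from YYYYMMDD" can name a date that does not exist.
def naive_year_ago(d):
    n = int(d.strftime("%Y%m%d")) - 10000
    # For Feb 29 this asks for Feb 29 of a non-leap year and raises ValueError.
    return date(n // 10000, n // 100 % 100, n % 100)

def year_ago(d):
    """Same day one year earlier; Feb 29 is clamped to Feb 28 (a policy choice)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)
```

With these, days_in_month(2012, 2) gives 29 where DAYS_PER_MONTH[1] gives 28, and naive_year_ago(date(2012, 2, 29)) raises ValueError where year_ago returns Feb 28, 2011.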
Re:Prepared for future (Score:5, Informative)
Re:Same Story / Different Day (Score:4, Informative)
Yeah, it was a really stupid bug, especially when you consider that the OS provides a very useful set of APIs for dealing with it: basically, convert a SYSTEMTIME (day/month/year/hh/mm/ss) into a FILETIME (a 64-bit unsigned int similar to time_t), do your math (the compiler will handle the 64-bit computations for you), and convert it back. Two OS calls.
If you're having to do leap year calculations, or any sort of date calculations at all, stop. The OS or library will probably already have a set of functions for doing date calculations without you having to do them manually. Given how easy they are to screw up, it's far better to leave it to someone else.
Hell, given Windows worked fine, I don't even want to know what Azure is doing - the fundamental OS and runtimes all handle leap year date calculations with aplomb. Heck, that might be some of the oldest code in the kernel these days because it was written a long time ago, works well and has been thoroughly debugged through the decades.
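The "convert, do your math, convert back" idea isn't Windows-specific. As a loose analogue only (this is not the SYSTEMTIME/FILETIME API, and it works at day rather than 100 ns resolution), Python's date.toordinal() and date.fromordinal() give the same shape. The helper names add_days and days_between are invented for this sketch.

```python
from datetime import date

def add_days(d, days):
    """Convert to a linear day count, do plain integer math, convert back."""
    return date.fromordinal(d.toordinal() + days)

def days_between(start, end):
    """Number of days from start to end, leap days included automatically."""
    return end.toordinal() - start.toordinal()
```

Here add_days(date(2012, 2, 28), 1) lands on Feb 29, while the same call for 2011 lands on Mar 1, with no leap year logic written by hand.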
Re:Prepared for future (Score:2, Informative)
And then you get hosed every 100... except when it's every 400.
No, that's a feature that the Gregorian calendar added later.
The simple Julian calendar assumed that years were exactly 365.25 days long and had a leap day every four years starting from 46 BC.
gmuslera was right --- leap days have been around for over 2000 years.
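To put the two rules side by side, a small sketch (function names are mine, and it only looks at the arithmetic rule for AD years, ignoring how leap years were actually observed in the early Julian period):

```python
def is_leap_julian(year):
    """Julian rule: every fourth year is a leap year."""
    return year % 4 == 0

def is_leap_gregorian(year):
    """Gregorian rule: every fourth year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
```

The two agree on 2012 and 2000, but 1900 and 2100 are leap years only under the Julian rule.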