Azure Failure Was a Leap Year Glitch
judgecorp writes "Microsoft's Windows Azure cloud service was down much of yesterday, and the cause was a leap year bug: the service failed to handle the 29th day of February. Faults propagated, making this a severe outage for many customers, including the UK Government's recently launched G-cloud service."
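The summary doesn't say what the actual defect was, so the following is only a sketch of one common class of leap-day bug, not a claim about Azure's code: computing "one year from now" by bumping the year field. The function names are hypothetical.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Bumps only the year field. For Feb 29 this raises ValueError,
    # because the following year has no Feb 29.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # Same idea, but falls back to Feb 28 when the target day does not exist.
    # Clamping to Feb 28 is one possible policy; Mar 1 is another.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    leap_day = date(2012, 2, 29)
    print(safe_one_year_later(leap_day))
    try:
        naive_one_year_later(leap_day)
    except ValueError as exc:
        print("naive version failed:", exc)
```

Code like the naive version works on 1460 days out of every 1461, which is probably why this kind of bug survives testing.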
Re:Same Story / Different Day (Score:5, Interesting)
What is with MS and their apparent inability to cope with leap years?
It wasn't just Microsoft... (Score:5, Interesting)
...they just had the most publicly catastrophic failure. I just noticed that all of the Google Chat messages I received yesterday were sent to me at various times on December 31, 1969.
And it also seems that I didn't even receive any of them until today, March 1, implying that they were incapable of even sending them yesterday.
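A guess at what happened here, not something confirmed anywhere: December 31, 1969 is what a Unix timestamp of zero looks like when displayed in a time zone west of Greenwich, so a timestamp that was zeroed or never filled in would produce exactly this. A minimal sketch, assuming a fixed UTC-5 offset for the viewer (the `render` helper is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Assumed viewer time zone for illustration: a fixed UTC-5 offset.
EST = timezone(timedelta(hours=-5), "EST")


def render(ts: int, tz: timezone) -> str:
    # Interpret ts as seconds since the Unix epoch and format it in tz.
    return datetime.fromtimestamp(ts, tz).strftime("%B %d, %Y %H:%M")


if __name__ == "__main__":
    print(render(0, timezone.utc))  # January 01, 1970 00:00
    print(render(0, EST))           # December 31, 1969 19:00
```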
Microsoft Never Has Been Good At Time (Score:5, Interesting)
Funnily enough, I used to work at IBM doing OS/2 tech support. OS/2 and Windows NT share a common heritage, so a lot of the behind-the-scenes problems I witnessed in OS/2 were (and sometimes still are) problems with Windows. I'm not sure if this is one of them, but I once got a call from a guy who was trying to use his OS/2 system to track satellites. The problem was that the OS/2 timer API specified that you could set milliseconds, but it didn't seem to work. I tracked it down to a timing driver that handled two separate interrupts. The first fired every few milliseconds and updated the clock's milliseconds. However, if the system was busy, that interrupt could be missed. There was also a periodic system interrupt every second. When that occurred, the system hard-reset the milliseconds and incremented the seconds. So you could set the millis, but the clock would become inaccurate one second later. Just one example of how time has been a thorn in my side for my entire career. I wrote up an APAR on it, which was promptly closed "Working as Designed." Dunno if he ever got it fixed...
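A toy model of the behaviour described above, in Python rather than anything resembling the real driver. The class and method names are hypothetical, and clamping the milliseconds at 999 is an assumption made only to keep the sketch simple.

```python
class SketchClock:
    """Toy clock driven by two interrupts, as in the description above."""

    def __init__(self) -> None:
        self.seconds = 0
        self.millis = 0

    def set_millis(self, ms: int) -> None:
        # What the timer API lets the caller do.
        self.millis = ms

    def on_fast_interrupt(self, elapsed_ms: int) -> None:
        # Fires every few milliseconds (when not missed) and advances millis.
        # Clamping at 999 is an assumption of this sketch.
        self.millis = min(999, self.millis + elapsed_ms)

    def on_second_interrupt(self) -> None:
        # Fires once per second: bump seconds and hard-reset millis,
        # discarding whatever sub-second value the caller had set.
        self.seconds += 1
        self.millis = 0


clock = SketchClock()
clock.set_millis(500)
clock.on_fast_interrupt(10)
before = (clock.seconds, clock.millis)  # (0, 510)
clock.on_second_interrupt()
after = (clock.seconds, clock.millis)   # (1, 0)
```

The value set by the caller survives only until the next one-second interrupt, which matches the "inaccurate one second later" symptom.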
Re:Same Story / Different Day (Score:5, Interesting)
What is with MS and their apparent inability to cope with leap years?
I would like to know the same thing. This seems to be systemic.
Yeah; it's systemic. Or at least it used to be a few years back, and I wouldn't be surprised if they haven't fixed the basic problem yet. The problem is fairly simple: Windows' internal clock is in local time.
I've found that this is all you need to tell a programmer with experience writing date/time code. Any software whose internal clock is in local time will be buggy, and it will never be completely fixed. Attempts to fix bugs will merely introduce bugs elsewhere in the chain of date/time handling. The sensible solution is to adopt a "universal time" internally, and convert only at the last stage, when you present the date/time to a human user. Yes, you theoretically can work with local time internally, but (teams of) humans can't actually make this work in practice. The best they can do is make it work in the "normal" cases. Bug fixes then tend to just move the time bugs around to different places in the code. But it can be very difficult to get management to accept this and agree to UT-only internally.
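To make the point concrete, here is a minimal sketch, assuming US Eastern time modelled as two fixed offsets. On 2011-11-06 the clocks fell back from 02:00 EDT to 01:00 EST, so the wall-clock time 01:30 happened twice. The helper names `to_internal` and `for_display` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# US Eastern modelled as two fixed offsets (an assumption for this sketch;
# a real program would use a time zone database).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# The same wall-clock reading, before and after the fall-back.
first = datetime(2011, 11, 6, 1, 30, tzinfo=EDT)
second = datetime(2011, 11, 6, 1, 30, tzinfo=EST)


def to_internal(dt: datetime) -> datetime:
    # Store and compare instants in UTC only.
    return dt.astimezone(timezone.utc)


def for_display(dt_utc: datetime, tz: timezone) -> str:
    # Convert to local time at the last step, for human eyes.
    return dt_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


# Stripped of their offsets, the two local readings are indistinguishable...
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)
# ...but as UTC instants they are an hour apart.
elapsed = to_internal(second) - to_internal(first)

if __name__ == "__main__":
    print(same_wall_clock, elapsed)
    print(for_display(to_internal(first), EDT))
    print(for_display(to_internal(second), EST))
```

Anything that stores only the local reading cannot tell these two instants apart, so sorting, durations, and scheduling all go wrong for that hour.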
Java also used to specify local time internally (and may still do so, but I haven't used it in years). I worked on a number of projects where, after repeated date/time disasters at every switch to/from DST and every Feb 29, Java was abandoned and everything was rewritten in a language (usually C++) whose libraries supported a UT timestamp and didn't have all those time bugs.
Does anyone know if MS Windows has introduced a UT internal time yet? If not, then we can reliably predict that such bugs will continue to plague their users.
Re:Who could have foreseen a leap year coming? (Score:4, Interesting)
In all fairness, Microsoft never figured anyone would still be using this service by the time a leap year rolled around.
Ah, that explains why Zunes went dark on New Year's 2009... [pcworld.com]
Think about this. You're a software dev, and you use a MS C++ compiler. They wrote their standard libs, including the "time.h" / <ctime> code... you use their time libraries.
Now, one of two things:
0. MS employs some real nut-jobs that can't even use the standard time functions and instead write their own for each project...
or
1. MS doesn't even trust their own compiler / libraries to do the right thing?
It scares me to think that MS makes operating systems... IMHO, they should get back to BASICs.