Azure Failure Was a Leap Year Glitch
judgecorp writes "Microsoft's Windows Azure cloud service was down much of yesterday, and the cause was a leap year bug, as the service failed to handle the 29th day of February. Faults propagated, making this a severe outage for many customers, including the UK Government's recently launched G-cloud service."
Re:Same Story / Different Day (Score:5, Insightful)
Everything MS does as "me too" sucks. (Score:5, Insightful)
It seems that all of MS's copied products (Hotmail, Azure, Zune) are done with a "me too" attitude of just having something so that they don't get left behind. They don't really try to make these "me too" products industry leaders. But here's the catch. I know plenty of IT people who will always choose MS's offering because, as I was told, "you don't get in trouble for choosing MS". And that knowledge seems to be built into MS's offerings.
Single Point of Failure (Score:5, Insightful)
A leap year issue? Are you SERIOUS? (Score:5, Insightful)
Given how many DECADES leap year calculations have had to be done and how many years it's been since we fixed the Y2K issues (at great expense, I might add), it is absolutely UNACCEPTABLE for someone to blame a leap year calculation for downtime.
The DIRECTOR of the service division at Microsoft should be FIRED for this failure.
Expect lawsuits from customers, Microsoft. Because this was a problem you KNEW about and should have written code to deal with.
What a pathetic excuse for planning and testing on Microsoft's part.
Re:Dumb people never learn (Score:1, Insightful)
Never trust Microsoft for anything.
Never trust any vendor for anything.
FTFY, blah blah blah....
You whippersnappers don't remember what it was like doing business with IBM back when they ruled the world, and some of the things I see about Oracle here on Slashdot make me cringe.
Re:Single Point of Failure (Score:5, Insightful)
That's a flaw in the idea of a monoculture. True redundancy has different software implementing the same basic standards...
Like how the Internet is built from routers made by different vendors: Cisco, Juniper, software-based Linux/BSD devices, etc. When new DoS vulnerabilities are found in one vendor's kit, it doesn't take down the whole Internet, because other vendors are immune.
Re:Who could have foreseen a leap year coming? (Score:5, Insightful)
Actually, every hundredth year is when a leap year doesn't come along (unless it's divisible by 400, then it does).
Right; and I wonder how many computer failures will happen on the first of March, 2100, due to part of the software thinking it's the 29th of February, causing random problems while talking to other software that knows the correct date.
We all know it's gonna happen ...
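For reference, the Gregorian rule described above fits in a few lines. This is only a minimal sketch, not anyone's production code: the names is_leap_year and naive_is_leap_year are made up for illustration, and the naive version is the hypothetical divisible-by-4 shortcut that would expect a 29 February in 2100.

```python
import calendar


def is_leap_year(year: int) -> bool:
    """Gregorian rule: every 4th year, except century years not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def naive_is_leap_year(year: int) -> bool:
    """Hypothetical buggy shortcut that ignores the century exception."""
    return year % 4 == 0


if __name__ == "__main__":
    # Columns: year, full rule, naive shortcut, Python's standard library answer.
    for year in (2000, 2012, 2100):
        print(year, is_leap_year(year), naive_is_leap_year(year), calendar.isleap(year))
```

The two functions agree for every year from 1901 through 2099, which is why a shortcut like this could sit unnoticed until 2100. In practice the standard library's calendar.isleap already implements the full rule.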
Re:Same Story / Different Day (Score:5, Insightful)
I had a similar thought about code reuse, but an entirely different conclusion: I thought that they weren't re-using good code since the same problem has cropped up at least two times. That sounds more like a case of re-rolling things that definitely shouldn't be re-rolled (date/time handling) to me.
In either event, they're not using particularly good practices. Either they are constantly reinventing the wheel and apparently in error-prone ways, or they are re-using code but paying no attention to keeping that external code up to date.
The only other thing I can think of is that Azure is somehow so drastically different than anything else they have ever done that they had to do the code again from scratch -- which is probably a problem all by itself.
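To illustrate the point about not re-rolling date handling, here is a hedged sketch of leaning on a standard date type instead of doing arithmetic on the year number by hand. The thread does not say what the faulty code actually looked like, so this is purely an example; add_one_year is a made-up helper, and falling back to 28 February is an assumed policy (some applications would prefer 1 March).

```python
from datetime import date


def add_one_year(d: date) -> date:
    """Return the same calendar day one year later (hypothetical helper)."""
    try:
        # date.replace validates the result, so an impossible date is rejected.
        return d.replace(year=d.year + 1)
    except ValueError:
        # 29 February has no counterpart in a non-leap year.
        # Assumed policy: fall back to 28 February.
        return d.replace(year=d.year + 1, day=28)


if __name__ == "__main__":
    print(add_one_year(date(2012, 2, 29)))
    print(add_one_year(date(2012, 3, 1)))
```

The useful property is that the library refuses to build an invalid date, so the leap-day case surfaces as an explicit exception to handle rather than as a bad value passed along to other software.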