
Leap Second Bug Causes Crashes 230

Posted by samzenpus
from the slip-it-in-there dept.
An anonymous reader writes in with a Wired story about the problems caused by the leap second last night. "Reddit, Mozilla, and possibly many other web outfits experienced brief technical problems on Saturday evening, when software underpinning their online operations choked on the “leap second” that was added to the world’s atomic clocks. On Saturday, at midnight Greenwich Mean Time, as June turned into July, the Earth’s official time keepers held their clocks back by a single second in order to keep them in sync with the planet’s daily rotation, and according to reports from across the web, some of the net’s fundamental software platforms — including the Linux operating system and the Java application platform — were unable to cope with the extra second."
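One plausible reason application code stumbles on a leap second is that many time libraries have no slot for a 61st second. As a small illustration (not a diagnosis of the outages described above), Python's `datetime` rejects 23:59:60 outright, while the lower-level `time.strptime` accepts it. The helper name `can_represent` is made up for this sketch.

```python
import time
from datetime import datetime

LEAP_STAMP = "2012-06-30 23:59:60"   # the inserted second, written as UTC wall-clock time
FMT = "%Y-%m-%d %H:%M:%S"

def can_represent(year, month, day, hour, minute, second):
    """Return True if datetime can hold this wall-clock time (hypothetical helper)."""
    try:
        datetime(year, month, day, hour, minute, second)
        return True
    except ValueError:
        return False

# struct_time allows a seconds field of 60, so the parse succeeds.
parsed = time.strptime(LEAP_STAMP, FMT)

# datetime only accepts seconds 0..59, so the same instant cannot be constructed.
leap_ok = can_represent(2012, 6, 30, 23, 59, 60)
```

Code that converts between the two representations has to decide what to do with the extra second, and that decision is where bugs of this kind tend to hide.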

  • by kthreadd (1558445) on Sunday July 01, 2012 @04:13PM (#40512121)
    So far all I've heard about is affected Linux systems; did Windows and OS X do just fine?
  • Re:Why now? (Score:3, Interesting)

    by Znork (31774) on Sunday July 01, 2012 @04:41PM (#40512265)

    We will keep having these kinds of issues for as long as some people fail to understand that time of day is an arbitrary number whose main utility lies in it being composed of predictable periods and divided into homogeneous units. It should have no relation whatsoever to whatever time the sun happens to rise or set at any particular location and above all it should not be changed to accommodate fluctuations in the orbit of a rock circling an arbitrary star. Abominations like leap seconds or daylight savings make the whole system less useful by merely existing.

    But personally I wouldn't be surprised if people off the equator were to get summer minutes composed of 120 seconds during daytime (or even better, a scale!) to ensure the sun rises and sets at the same time year round. Or, hey, why not simply make the seconds longer? Or a combination of both, plus we can define pi to be 3 to make things simpler.

  • Why not bundle them and apply them every 10 or 20 years?

    And apparently I'm not alone:

    http://en.wikipedia.org/wiki/Leap_second#Proposal_to_abolish_leap_seconds [wikipedia.org]

    Hogwash. Astronomers can find coping mechanisms; it's either that or these ridiculous levels of stress for systems admins.

  • by Anonymous Coward on Sunday July 01, 2012 @05:09PM (#40512383)

    I run Arch Linux with kernel 3.4.4 and it went haywire. My machine was very heavily loaded at the time and when the leap second happened mysqld, firefox, and ksoftirqd processes started consuming 100% CPU. The load average was well over 10 and the machine was grinding along. It didn't actually fail but it was loaded down.

    Even restarting the processes didn't fix it. The high load would go away once I stopped the processes but as soon as I started them again the load would come right back. I had Firefox open on a blank page not doing anything and it was slammed at 100% CPU, with a couple of ksoftirqd tasks slammed at 100% CPU each too.

    I had to reboot the machine to get it back to normal.

    I have Ubuntu and Debian servers that for whatever reason did not add the leap second so they were fine. Their time was a second off today though (at least until ntp slowly corrected it or I manually intervened).
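For a sense of what "slowly corrected" can mean: if the clock is slewed rather than stepped, the correction rate is capped. The figure of 500 ppm below is the commonly cited maximum slew rate for ntpd and the kernel clock discipline; whether a given daemon slews or steps a one-second offset depends on its configuration, so treat this as back-of-the-envelope arithmetic rather than a description of the poster's servers. `slew_duration_seconds` is a name made up for this sketch.

```python
def slew_duration_seconds(offset_us: int, max_slew_ppm: int = 500) -> int:
    """Seconds needed to absorb `offset_us` microseconds at `max_slew_ppm`.

    At N ppm the clock can be adjusted by N microseconds per elapsed second.
    The result is rounded up to a whole second.
    """
    return -(-offset_us // max_slew_ppm)

one_second = slew_duration_seconds(1_000_000)   # a missed leap second
```

Under that assumption a one-second error takes 2000 seconds, a little over half an hour, to disappear.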

  • Only Linux affected? (Score:5, Interesting)

    by cpghost (719344) on Sunday July 01, 2012 @05:15PM (#40512399) Homepage
    I'm managing a cluster of 2,400 nodes running FreeBSD, and AFAICS, none was tripped up by leap second NTP adjustments. On the other hand, 4 out of 180 Linux nodes crashed simultaneously at that very moment. All this is exceedingly weird, but may indeed point to a subtle bug in the Linux kernel (only?). I've never witnessed this behavior in the past.
  • by Barryke (772876) on Sunday July 01, 2012 @05:43PM (#40512517) Homepage

    Google official blog: "Time, technology and leaping seconds" (Sept. 2011)
    http://googleblog.blogspot.in/2011/09/time-technology-and-leaping-seconds.html [blogspot.in]

    I wonder if the leap second has anything to do with the labs Chubby paper / site currently being offline.

  • Our problem was with a third-party monitoring solution: its daemon process brought every single one of our servers to a near halt by consuming all available CPU cycles at the stroke of midnight GMT.

    The OS itself was fine.
    This monitoring software is common enough that it likely was behind a lot of the issues seen around the 'net.

  • by at10u8 (179705) on Sunday July 01, 2012 @07:01PM (#40512837)
    except that BIPM, the providers of TAI, have published this http://www.bipm.org/cc/CCTF/Allowed/18/CCTF_09-27_note_on_UTC-ITU-R.pdf [bipm.org] wherein the CCTF "stresses that TAI is the uniform time scale underlying UTC, and that it should not be considered as an alternative time reference." This appears to indicate that the CCTF and BIPM are not comfortable with the notion that operational systems might be employing TAI as their time scale. At the end of that paper they also discuss the possibility that TAI could cease to exist.
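For readers unfamiliar with the relationship: UTC is TAI minus a whole number of seconds, and each inserted leap second increases that number by one. A minimal sketch covering only the two offsets either side of this event (34 s from 2009-01-01, 35 s from 2012-07-01); the table and function names are made up here, and a real system would read a maintained leap second list rather than hard-coding values.

```python
from datetime import datetime, timedelta

# (UTC instant from which the offset applies, TAI - UTC in seconds).
# Deliberately partial: only the entries around the June 2012 leap second.
TAI_MINUS_UTC = [
    (datetime(2009, 1, 1), 34),
    (datetime(2012, 7, 1), 35),
]

def tai_minus_utc(utc: datetime) -> int:
    """Return TAI - UTC in seconds for a naive UTC datetime covered by the table."""
    offset = None
    for effective_from, seconds in TAI_MINUS_UTC:
        if utc >= effective_from:
            offset = seconds
    if offset is None:
        raise ValueError("instant predates the partial table in this sketch")
    return offset

def utc_to_tai(utc: datetime) -> datetime:
    """Convert a naive UTC datetime to the corresponding TAI reading."""
    return utc + timedelta(seconds=tai_minus_utc(utc))

before = utc_to_tai(datetime(2012, 6, 30, 23, 59, 59))
after = utc_to_tai(datetime(2012, 7, 1, 0, 0, 0))
```

The two UTC readings are one second apart on paper, but the TAI readings differ by two seconds; the missing one is 23:59:60.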
  • by Guy Harris (3803) <guy@alum.mit.edu> on Sunday July 01, 2012 @07:37PM (#40513029)

    As far as I can tell, all current operating systems handled it fine. It's applications that have problems, mainly server-type apps that actually use the clock for important things.

    Linux being heavily affected is just a side-effect of most servers running Linux (although apparently some older versions don't handle leap seconds so cleanly - maybe that has something to do with it?).

    Yes, at least one of the problems appears to be a Linux kernel problem [lkml.org]. However, as that thread indicates, the consequence of this isn't a kernel crash; it causes futexes [kernel.org] to repeatedly time out (or, at least, causes futexes with timeouts to repeatedly time out). I'm guessing, perhaps incorrectly, that this might mean that code waiting for a futex gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, gets a kernel wakeup due to a timeout, checks whether the condition being waited for has happened, discovers that it hasn't, sleeps in the futex again, lathers, rinses, repeats, so it makes no progress and chews up tons of CPU.

    If so, then:

    • this particular problem is specific to systems running Linux kernels with the problem (and hence specific to Linux);
    • applications that don't themselves have issues with leap seconds might be affected by this;

    so Linux being heavily affected might also be a side-effect of, well, some versions of the Linux kernel having a bug that's triggered by leap seconds.

    However, unless an application happens to use futexes in a fashion that trips over the bug, it won't be affected. It might be server applications that are most likely to do so, meaning that you might not see it on, say, a desktop or handheld Linux machine, or even on some servers.
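The loop guessed at above can be modelled in a few lines. This is a toy simulation, not kernel code: `ToyClock`, `timer_skew`, and `wakeups_in` are invented for the sketch, and the one-second skew between the timer subsystem and wall time is an assumption standing in for whatever the real bug does.

```python
from dataclasses import dataclass

@dataclass
class ToyClock:
    """Hypothetical clock; all values are in seconds."""
    wall: float = 0.0         # time as the application sees it
    timer_skew: float = 0.0   # how far ahead of wall time the timer code believes it is

    def timed_wait(self, timeout: float) -> float:
        """Wait for a condition that never arrives; return how long we really slept."""
        deadline = self.wall + timeout
        timer_now = self.wall + self.timer_skew
        slept = max(0.0, deadline - timer_now)   # an already-passed deadline fires at once
        self.wall += slept
        return slept

def wakeups_in(clock: ToyClock, timeout: float, span: float, cap: int = 1000) -> int:
    """Count timeouts in a wait/recheck loop over `span` seconds, giving up after `cap`."""
    start = clock.wall
    wakeups = 0
    while clock.wall - start < span and wakeups < cap:
        clock.timed_wait(timeout)   # wake, recheck, find nothing, wait again
        wakeups += 1
    return wakeups

healthy = wakeups_in(ToyClock(), timeout=0.25, span=1.0)
skewed = wakeups_in(ToyClock(timer_skew=1.0), timeout=0.25, span=1.0)
```

With no skew the loop wakes four times in a second. With the skew, every sub-second wait returns immediately, the clock never advances, and the loop burns through its cap of 1000 iterations, which is the "no progress, tons of CPU" pattern described above.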

  • Re: (Score:4, Interesting)

    by Guy Harris (3803) <guy@alum.mit.edu> on Sunday July 01, 2012 @07:51PM (#40513151)

    The hard system lock bug due to a leap second was patched in 2.6.29, so either you've got some weird related bug, or something is very wrong.

    Well, the weird related bug would arguably count as something being wrong. Apparently there is a bug in the handling of the insertion of positive leap seconds that could cause weird behavior [lkml.org] with futexes [kernel.org], and that bug appears not to have been fixed until at least July 1, 2012 (I'm guessing John Stultz has worked up a patch [lkml.org]).
