The Leap Second Is Here! Are Your Systems Ready?
Tmack writes "The last time we had a leap second, sysadmins were taken a bit by surprise when a random smattering of systems locked up (including Slashdot itself) due to a kernel bug causing a race condition specific to the way leap seconds are handled/notified by ntp. The vulnerable kernel versions (prior to 2.6.29) are still common amongst older versions of popular distributions (Debian Lenny, RHEL/CentOS 5) and embedded/black-box style appliances (Switches, load balancers, spam filters/email gateways, NAS devices, etc). Several vendors have released patches and bulletins about the possibility of a repeat of last time. Are you/your team/company ready? Are you upgraded, or are you going to bypass this by simply turning off NTP for the weekend?"
Update: 07/01 03:14 GMT by S : ZeroPaid reports that this issue took down the Pirate Bay for a few hours.
Irony (Score:5, Funny)
Leap years = no problem.
Leap seconds = kernel panic.
I fear for teh internets if we try a leap millisecond.
Re:Irony (Score:5, Interesting)
Re: (Score:3)
It was not the largest of timer overflows that killed them, but the tiniest leap second...
WTF is the issue? (Score:3)
Perhaps this is just affecting some kernel versions or specific applications which behave poorly.
One data point: both of my servers were running all night, with NTP updates, and did not appear to have any issues. Both are still running right now, several hours after the leap second. They're Synology boxes running their own version of Linux (DSM 3.1-1636 and DSM 4.0-2196). FWIW, the box running DSM 3.1 has never had a problem with leap seconds, and has endured several since we've had it running almost con
Re:WTF is the issue? (Score:4, Insightful)
Re: (Score:3)
TAI, based on whatever flavor of atomic clock is currently the going standard, is markedly more regular than the observed time based on the movement of the earth (UT1). UTC ticks at the same rate as TAI, but is supposed to correspond to UT1, which ticks at an unpredictable rate. So, whenever UT1 and UTC drift too far apart (DUT1 approaches
Re: (Score:2)
Ziggy must be on the sauce again.
Re: (Score:2)
Re:Irony (Score:5, Interesting)
I'm interested to hear what the excuse is - because it will probably sound a lot like the things you all flame Windows users for...
A lot of Linux systems are on private networks and have to be up 24/7. Dealing with a known bug is considered less problematic than installing a new OS version and invalidating all the testing which has proven that the system can run 24/7 over the last few years.
So I guess you're right, it is very similar to the reasons why there are many unpatched Windows systems out there.
Re: (Score:2)
If there's a kernel bug to be patched, you download and install a newer kernel, with the bug patch, then reboot (at some time when the outage will be the least inconvenient) into the new version. And, if you really can't afford the downtime, there are ways to get a new kernel running without even that. No need for a complete OS upgrade, just to get a more recent kernel.
Re:Irony (Score:5, Insightful)
And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?
An acquaintance supports a system which has been in operation for years and breaks every time there's a leap second (not because of the Linux kernel but other software and hardware issues). That means every few years he spends a couple of hours rebooting the servers and verifying that it's up and running again afterwards. Fixing the software would mean a substantial amount of development work followed by weeks of testing.
Re: (Score:3)
I'd really, really love to respond by saying that That Just Doesn't Happen. Alas, I know better. My laptop is currently running Fedora 16 with a PAE kernel from Fedora 14 because every 3.x kernel I've tried on it hangs during boot while trying to do something with my card reader. And, if I have a card in the reader, it ends up rebooting itself. About all I can say
Re: (Score:2)
Re:Irony (Score:4, Insightful)
And what do you do when the kernel change causes your system to start crashing, when it had previously operated for years with no failures?
Er, you restart with the older kernel, which is right there on the grub boot menu.
... Sorry, was this a trick question or something...?
Re:Irony (Score:5, Insightful)
The excuse? I have a file server and a router that run 24/7/365.24 (+1/86400, on occasion), and they just work. I have no interest in even logging into them, and they will remain "stock" systems until either a critical SSL vulnerability (in the case of the router) or I absolutely need a feature not possible with that old of a system. And when I say "old", I mean, talking "Slackware 4" here until about a year ago.
One of the nice things about Linux - It just works. You don't get random reboots every two weeks when Microsoft decides you must install this particular update, it doesn't get "crufty" the same way the Windows registry does, it doesn't suddenly fail to boot one morning (though in fairness, the fact that we never shut them down probably leads to a bias in that regard). It just works, day after day, year after year. If it worked yesterday and no hardware failed overnight, it will work today.
Now... If you want to call that something that we complain about in Windows... Hey, I'll admit it, I want my software to "just work". Whether that means a Linux server that never goes down, or an XP desktop environment that (for the 18-24 months between puking) everything supports, I just want my hammer to pound nails and my crowbar to pull them back out, and I don't care if my screwdriver believes in Buddha or Jesus or Xenu.
Re:Irony (Score:4, Interesting)
Nope. I have windows update set to "check but ask" and occasionally I find that it's restarted due to updates without even informing me.
More important than kernel issues (Score:4, Funny)
For those wondering whether they get one more second of sleep tonight or one less, the rule is 'spring forwards, fall back, summer stand there looking confused'.
what about the metric time system? (Score:3)
what about the metric time system?
Re: (Score:2)
what about the metric time system?
We are using the metric time system. Under Chapter 4 Section 6 [bipm.org], days, hours, and minutes are acceptable units using the customary definitions.
as of about a year ago, I started defensive coding (Score:5, Interesting)
If you're testing whether something that increments ever hits a number (like 10) and goes back to 0, instead of checking if it == 10, check if it is > 9.
There are a lot of defensive coding mechanisms you can use. The downside is that when you debug, something can sneak by and put you outside of a state you want, so it makes it ever so slightly harder to debug. But if you're making software for the public that is hard to update, defensive programming can save the day here and there.
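As a minimal sketch of the wrap-around check described above (the function names and the limit of 10 are purely illustrative, not from the comment): the strict version only wraps on an exact match, so a counter that skips past the limit escapes it, while the defensive version catches the overshoot.

```python
LIMIT = 10  # illustrative wrap-around point


def advance_strict(counter, step=1):
    """Wrap only on an exact match, so a skipped value escapes the check."""
    counter += step
    if counter == LIMIT:
        counter = 0
    return counter


def advance_defensive(counter, step=1):
    """Wrap on anything past the last valid value, so overshoot is caught."""
    counter += step
    if counter > LIMIT - 1:
        counter = 0
    return counter
```

With a step of 2 from 9, the strict version returns 11 and keeps counting upward, while the defensive version resets to 0. That is also the debugging downside mentioned above: the defensive version quietly hides the fact that the counter ever overshot.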
Re:as of about a year ago, I started defensive cod (Score:5, Funny)
Our servers run on octal, you insensitive clod.
out of a state you want (Score:3)
Oh I do hate when that happens at work. We end up so bewildered. "Are we dead? Or is this Ohio?"
( cookies for whoever gets the reference without Googling. )
Re: (Score:2)
Most of us are too old to still watch children's cartoons.
Re: (Score:2)
But just old enough to still remember them and collect that cookie, I see? :)
( Too many here are too young to even know the cartoons. )
Re: (Score:2)
I cannot confirm, nor deny, any recollection of the creatures in question. :)
Re: (Score:2)
But we're not too old for cookies.
obligatory xkcd [xkcd.com]
Re: (Score:2)
Re: (Score:2)
Leap second got Reddit? (Score:4, Interesting)
Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.
Re: (Score:2)
I noticed Reddit has had problems with my submissions and my liked feeds since last night. When was the leap second supposed to start? Last night?
Re: (Score:2)
No I think it was okay for half an hour or so after 0000. Slashdot can make a profit for the next couple of hours anyway.
Yes! (Score:5, Informative)
https://twitter.com/redditstatus/status/219244389044731904 [twitter.com] just said so -- "We are having some Java/Cassandra issues related to the leap second at 5pm PST. We're working as quickly as we can to restore service." :D
Re:Leap second got Reddit? (Score:4, Informative)
Here's the tweet on @redditstatus [twitter.com]. According to them "We are having some Java/Cassandra issues related to the leap second at 5pm PST."
Re:Leap second got Reddit? (Score:5, Funny)
Looks like Reddit's systems weren't ready for the leap second. It's been down since around midnight (UTC). You'd think a site as big as that would be ready for such an event.
Have you tried turning it off and turning it on again?
Terminology? (Score:2, Interesting)
If 2012 is a leap year, doesn't that make 2012-06-30 23:59 a leap minute?
ntpd doesn't keep accurate time anyway... (Score:2)
...at least not on any of my servers, so what's a leap second between friends?
looks like it hit reddit (Score:2)
One would think they'd have seen this coming, because I'm pretty sure there's a
GPS Time (Score:3)
Having issues with java/systems? try this (Score:5, Informative)
/etc/init.d/ntpd stop; date; date `date +"%m%d%H%M%C%y.%S"`; date;
Fixed the issues I was having. Credit goes to https://twitter.com/SilvioSantoZ/status/219250677522767872 [twitter.com]. I didn't have to restart anything after running it. YMMV
Re: (Score:2)
Alleged issue did not appear... (Score:3)
So the comments are confusing to me as to whether Debian "squeeze" is supposed to have a problem or not, but I have about fifty of these systems running, and as far as I can tell, they're all fine.
I got a whole bunch of these in the logs:
> Jun 30 19:59:59 kernel: [timestamp] Clock: inserting leap second 23:59:60 UTC
I have three of the machines configured as NTP peers to each other, and looking at a few tier-1 time servers. The rest of the machines all use the three local peers as time servers.
My Debian desktop systems at home also seem to be fine.
Exede satellite internet (Score:2)
The explanation, deadlock...do kill the messenger (Score:2)
Simple version:
"Don't kill the messenger", except when the messenger is going to kill you. It's the printk sending notice that the leap second happened that deadlocks against the timer doing the leap second (both vying for xtime_lock). Call it a "feature" of the NTP code. Hence the "turn off NTPD" workaround: if NTP doesn't get notified from somewhere upstream that it should implement the leap second, it won't notify the kernel about it, and the printk sho
Some folks are lucky... (Score:3)
I had 2 Debian Squeeze blade servers in Thailand kernel panic on me at 3am (AEST). What strikes me as odd is that out of the 6 blades that we have Debian running on (all running squeeze and kernel 2.6.32 with identical packages), only 2 of them had a panic, and so much for the advisory saying it only affects kernels prior to 2.6.29. There might be more to it than the kernel, but sheesh, I'm on holiday!
Re:How is this an issue? (Score:5, Informative)
Re:How is this an issue? (Score:5, Informative)
It actually goes 23h 59m 59s, 23h 59m 60s, 00h 00m 00s. See http://www.nist.gov/pml/div688/leapseconds.cfm
Re:How is this an issue? (Score:4, Informative)
It put out the sequence 23:59:59, 23:59:59, 00:00:00 repeating second 59 instead of using second 60.
NTP server vs. NTP client, Unix Kernel, Unix Apps (Score:2)
There are different layers where you can run into problems. One of them is the ASCII value a time server hands a time client - if it's 23h 59m 60s and the client chokes, that's a client problem. If the client tries to set an ASCII clock in the kernel to 23h 59m 60s, and the kernel chokes, that's a kernel bug. If a Unix application library can't cope with the interesting values, that's a library problem.
One obvious workaround is for the NTP server to never answer 23h 59m 60s, and for NTP clients to neve
What it sounds like on WWV (Score:3)
I listened to the leap second on WWV. It sounded like this:
tick (23:59:55)
tick (23:59:56)
tick (23:59:57)
tick (23:59:58)
(nothing) (23:59:59)
(nothing) (23:59:60)
BEEP (00:00:00)
It always sounds to me like WWV has gotten stuck or something.
...laura
Re:How is this an issue? (Score:5, Insightful)
Re:How is this an issue? (Score:4, Insightful)
Re: (Score:2)
$ man date
[...]
%S second (00..60)
[...]
Oh, maybe when you use that (or strftime).
Re: (Score:2)
I've seen systems that do that, but if it does that it's not UTC. UTC goes 23:59:58, 23:59:59, 23:59:60, 00:00:00 when there's a leap second.
Re:How is this an issue? (Score:4, Informative)
When NTP tries to say that it is 12:34:61 and the computer only expects 1-60.
That will never happen.
Leap seconds are always asserted at UTC midnight on the last day of a month. I think the convention is only to have leap second opportunities at the end of March, June, September and December. Typically, they try to assert it at midnight December 31. It's unusual to have a mid-year leap-second.
Since the normal progression is 23:59:58, 23:59:59, 00:00:00, the extra second makes the time 23:59:60. 61 would be TWO leap seconds which won't happen any time soon. The Earth's rate of rotation would have to change by nearly two seconds in 3 months.
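A rough sketch of what "accept :60, reject :61" could look like in application code. The function name is hypothetical, and the check is only an assumption about where a leap second may fall (the last minute of the last day of a UTC month). It does not know whether one was actually scheduled.

```python
import calendar


def is_valid_utc_time(year, month, day, hour, minute, second):
    """Accept second 60 only in a slot where a leap second may be inserted."""
    if not (1 <= month <= 12 and 0 <= hour <= 23 and 0 <= minute <= 59):
        return False
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if not 1 <= day <= last_day:
        return False
    if 0 <= second <= 59:
        return True
    # 23:59:60 is legal only at the very end of a UTC month; 61 never is.
    return second == 60 and day == last_day and hour == 23 and minute == 59
```

For example, 2012-06-30 23:59:60 passes, while 12:34:60 and 23:59:61 on the same day are rejected.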
Re: (Score:2)
Re:How is this an issue? (Score:4, Informative)
Re:How is this an issue? (Score:5, Funny)
A leap anti-second?
Re: (Score:2)
"A positive or negative leap-second should be the last second of a UTC month, but first preference should be given to the end of December and June, and second preference to the end of March and September." -
Re: (Score:2)
That's because the Earth's average rate of rotation is just a little slower than one solar revolution per day. To cause that to speed up would take a large change in the Earth's moment of inertia. For instance, due to compaction of the Earth's core, cooling of the oceans or formation of massive glaciers at the poles.
Nevertheless, our timekeeping systems are designed to have both positive and negative leap seconds.
Re: (Score:2)
You're confusing the need for leap seconds with sidereal time. But note, the Earth actually rotates faster than one solar revolution per day.
Leap seconds are needed because when we changed the definition of a second from 1/86400th of a day to one based on a characteristic of the Caesium atom, a poor value was chosen, and the Earth is slowing down over time, primarily due to tidal acceleration by
Re:How is this an issue? (Score:5, Informative)
Contrary to what the GP said, the solar day is not too fast. It is what it is, by definition. Rather, the second is a bit too short.
On average, since the leap second was introduced in 1972, one has been needed about every 18 months. Over the long term, that rate will increase as tidal acceleration slows the earth. 1 sec/18 months ~= 2e-8, so that's how much the second has been off on average since 1972. The atomic value for the second is 9,192,631,770 periods of the radiation corresponding to the transition between the two hyperfine levels of the ground state of the caesium 133 atom. So, a better value might have been 9,192,631,967, which would make us about even to date. (Although, since leap seconds aren't distributed evenly, they would still have occurred, both positive and negative, just not as many.) The original value was based on measurements made over less than 3 years, and has worked for some shorter periods (there were no leap seconds between 1999 and 2004, for example), but the value chosen has proven to be too short over the 40 years of leap seconds.
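The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope sketch that assumes a Julian year of 365.25 days and exactly one leap second per 18 months; the variable names are illustrative.

```python
CAESIUM_PERIODS = 9_192_631_770           # SI definition of the second
SECONDS_PER_YEAR = 365.25 * 86_400        # Julian year, an approximation
LEAP_INTERVAL_S = 1.5 * SECONDS_PER_YEAR  # "one leap second every 18 months"

# Fraction by which the atomic second is short relative to the mean solar second.
fractional_error = 1 / LEAP_INTERVAL_S

# Extra caesium periods needed per second to absorb that error.
extra_periods = CAESIUM_PERIODS * fractional_error
adjusted_periods = CAESIUM_PERIODS + round(extra_periods)
```

Under these assumptions the fractional error is about 2.1e-8 and the adjustment is roughly 194 periods, which lands within a few counts of the 9,192,631,967 quoted above. The exact figure depends on the leap-second interval you assume.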
Re:Haha (Score:5, Funny)
Enjoy your free operating system that was stopped by an extra second.
Yes, because we've NEVER seen Windows have problems dealing with things like Daylight Savings...
Re: (Score:2, Informative)
Windows Azure is DOWN AS WE SPEAK: http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx [msdn.com] ... congrats on paying for your non working OS without any indemnity either.
Re: (Score:3)
What OS are you running, which thinks it's February 29, AS WE SPEAK?
Re: (Score:2)
Congrats on your epic reading comprehension/what-day-is-today fail.
Re: (Score:2)
Yes, and Netflix was out for a lot of folks yesterday and today even though a lot of folks pay for it, because of massive power outages due to storms. Are you going to bitch about that too because, "OMG, an Internet service went down!"
What's that got to do with a bug in an OS? Your story comprehension is as good as your reading comprehension.
Re:Haha (Score:5, Insightful)
Yeah it had the wrong time but did not freeze up. What's your excuse?
You're really trying that hard to troll huh?
A free operating system has a bug in it so you want to exaggerate the existence of the bug to show that free operating systems are inferior in such a condescending and acerbic way.
I guess that can work. It's not like there is any paid OS out there that has decades long histories of serious instability, security flaws, and badly implemented ideas...... so yeah, you're completely safe making such an arrogant argument.
Re: (Score:2)
Re:Haha (Score:5, Funny)
Windows: 95. Scene: LAN party. Game: Descent. Hilarity: All the Windows users cursing loudly as their computers spontaneously reboot for DST. DOS users get to feel smug for a change.
Windows has been boning DST as long as Windows has handled your RTC.
Re: (Score:2)
When daylight savings got shifted to the current longer format - which was within this past decade, mind you - millions of Outlook users discovered that many of their already-existing appointments were shifted by one hour. And, unless you knew when the appointment was created, you had no way of knowing the reported time was correct, or if it was off by an hour.
People had to deal with this for three bloody weeks, until the calendar ticked over to the "old" daylight savings start date. That's a bit more hassl
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
Isn't it always invalid? If the NTP server does a correction between the first call and the second call, then you could still get time not moving forward.
Re: (Score:3)
It is perfectly valid for two back to back calls to gettimeofday to return the same value. It can happen at any time if the calls are closer together than the granularity of the time.
It is unusual, but perfectly possible for the second result to be less than the first if the clock has been reset. It is bad form for a program to panic over that. However, to try to avoid problems and to make logs a LOT less confusing, NTP prefers to slew the time rather than make hard adjustment. That is, it speeds up or slow
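A small sketch of how a program can avoid panicking over this. It uses Python's `time.monotonic()`, which is documented not to go backwards, for measuring intervals, and clamps differences taken from wall-clock readings. The helper names are hypothetical.

```python
import time


def elapsed_wall(start_wall, now_wall):
    """Interval from two wall-clock readings, clamped at zero because
    the wall clock may repeat a value or be stepped backwards."""
    return max(0.0, now_wall - start_wall)


def timed(func):
    """Run func and measure it with the monotonic clock, which is
    unaffected by steps applied to the system time."""
    start = time.monotonic()
    result = func()
    return result, time.monotonic() - start
```

The design choice is simply to keep wall-clock time for timestamps and logs, and a monotonic source for anything that computes a duration or a timeout.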
Re: (Score:2)
dunno whats ubuntu based off of (debian)
Re: (Score:2)
Read the summary. It appears this was fixed in 2.6.29; kernels from before that which are still in use will be old (and, ironically, considered "stable").
Modern desktop distros will be using 3.x, which has had this fixed for a long time. Ubuntu will be using 3.3, I would expect.
The only distros using old kernels will be some versions of Debian, CentOS and some based off it because this is a well tested stable kernel.
Re: (Score:2)
Re: (Score:2)
Debian stable (squeeze) is using 3.0, so you can take it off the list. I don't know about old-stable, but if even Debian has already upgraded, there probably isn't any important distro using it anymore.
Re: (Score:2)
I expect they patched the kernel at some stage and the only people running the kernel either have not updated in years or know how to apply the workarounds.
Quantum appeared to be concerned about whether the desktop Ubuntu he was using was at risk of crashing.
Re: (Score:2)
No, Debian stable (Squeeze) uses 2.6.32. Where are you getting your information? If you're getting it from some random VPS provider, be advised that you're probably not running a native Debian kernel. Also, this bug appears to affect that kernel and others beyond the indicated level, judging by continuous reports coming in throughout the day.
Re: (Score:2)
There shouldn't be any problems with operating systems at all. The only problems I know about are with critical real time systems which use GPS to set the time at both ends of an interface, but which slew the time by different rules.
Confirmed... (Score:2)
West by about 800'
Re: (Score:2)
Can't say I've noticed it. And I was using GPS navigation shortly after the leap second, so I think it would have been pretty obvious to me.
Re: (Score:2)
I also have an install of 12.04 (latest LTS) which I'm slowly transitioning to that runs 3.x so if that doesn't work I reckon 2/3 of the internet will fall over...
Re: (Score:2)
Re: (Score:2)
> ...it would not create any real problem for hundreds of years.
It need not ever create any problems. Put the leapseconds in zoneinfo. There is no more reason to jigger the system clock to deal with the fact that the planet's rate of rotation varies than there is to jigger it to deal with the fact that the sun rises at different times at different longitudes.
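One way to picture the proposal: keep the system clock ticking uniformly and look the offset up in a table, much as zoneinfo does for time zones. The sketch below is only an illustration, with a deliberately partial table and hypothetical names; a real table would go back to 1972 and would need updating whenever a new leap second is announced.

```python
from datetime import date

# Partial table: (first UTC date the offset applies, TAI - UTC in seconds).
# Only the most recent entries are listed here.
LEAP_TABLE = [
    (date(1999, 1, 1), 32),
    (date(2006, 1, 1), 33),
    (date(2009, 1, 1), 34),
    (date(2012, 7, 1), 35),
]


def tai_minus_utc(day):
    """Return TAI - UTC for a date covered by the table, else raise ValueError."""
    offset = None
    for effective, seconds in LEAP_TABLE:
        if day >= effective:
            offset = seconds
    if offset is None:
        raise ValueError("date precedes the first table entry")
    return offset
```

With this table, `tai_minus_utc(date(2012, 6, 30))` is 34 and `tai_minus_utc(date(2012, 7, 1))` is 35; the clock itself never has to repeat or skip a second.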
Re: (Score:2)
That would work just fine if leap seconds occurred at regular, predictable intervals the way leap days do, but they don't. AFAIK, there's no way to know in advance when it's going to be appropriate to have one, so there's nothing that can be put into zoneinfo to deal with them.
If we were a space-based civilization, living in ships, space colonies and bases on moons, asteroids and so forth, there'd be no reason for leap seconds but we're not. As long as we're stuck on th
Re: (Score:2)
Re: (Score:2)
For a start we all live in kludged up timezones anyway. I could be as much as an hour from "the correct solar time". People in China could be three hours from that time, because they have fewer timezones.
Difference is Astronomers going postal on you (Score:2)
There's a reason for leap seconds, and Astronomers will go postal on you if you try to make your clocks too dumb so they no longer track the stars. If you can't implement leap seconds out of respect, do it out of fear, okay?
Re: (Score:3, Funny)
All my Java processes peg the CPU since the leap second, even if I restart them. Maybe a reboot will help...
So just like before, then?
Re: (Score:3, Informative)
Normally java is just in "waste memory" mode. Now it's "waste memory AND CPU".
Re:Android (Score:4, Interesting)