Bug Programming

The BBC Looks At Rollover Bugs, Past and Approaching

New submitter Merovech points out an article at the BBC which makes a good followup to the recent news (mentioned within) about a bug in Boeing's new 787. The piece explores various ways that rollover bugs in software have led to failures -- some of them truly disastrous, others merely annoying. The 2038 bug is sure to bite some people; hopefully it will be even less of an issue than the Year 2000 rollover. From the article: "It was in 1999 that I first wrote about this," comments [programmer William] Porquet. "I acquired the domain name 2038.org and at first it was very tongue-in-cheek. It was almost a piece of satire, a kind of an in-joke with a lot of computer boffins who say, 'oh yes, we'll fix that in 2037.' But then I realised there are actually some issues with this."


  • Ask Mel (Score:5, Funny)

    by Sarten-X ( 1102295 ) on Tuesday May 05, 2015 @10:24AM (#49620725) Homepage

    It's not a rollover bug; It's a rollover feature [catb.org]!

  • You can't win... (Score:5, Insightful)

    by jellomizer ( 103300 ) on Tuesday May 05, 2015 @10:25AM (#49620727)

    If you reuse code, you get rollover bugs.
    If you start over from scratch, you get brand-new bugs.

    Reusing code, many of the issues from the past are already fixed, so you are not reintroducing bugs you had before.
    Writing new code, you can modernise the code base, avoid the particularly troubled code, and make it easier to follow.

    Programmers are human beings; they make mistakes, and they can't give 110% every day. Even the best of them will often have a stupid bug that they can't believe they let slip.

    • by Anonymous Coward

      I personally have encountered rollover bugs not just in operating systems... but in machine firmware. One ultra-expensive piece of equipment had a firmware issue that would cause big issues if the machine stayed up for more than 18-24 months, so every so often, after a major patch for stuff was released, part of the maintenance window was physically shutting off an entire rack of stuff, depowering it (as in physically pulling the plugs), waiting 10-15 minutes for the capacitors to drain, then plugging it b

    • If you're a programmer and you believe you can give 110% even for one day, you have other problems.

      • by AndreiK ( 908718 )
        Sure you can. If you normally work for an hour a day, working an extra 6 minutes today means you gave 110%.
      • by Mal-2 ( 675116 )

        Sure you can give 110%, just like an engine can deliver 110% of sustained power (which is how marine and aviation engines are rated, not instantaneous power). It just can't do that indefinitely, and it decreases reliability. Either service must be more frequent, or it's going to fail sooner, or both. Nonetheless, many marine engines are held in excess of 100% for hours or days on end.

        Similarly, if you take work output divided by calendar time, you can deliver 110%... for a while. If you try to sustain this,

  • by QuietLagoon ( 813062 ) on Tuesday May 05, 2015 @10:28AM (#49620749)
    Since the OpenBSD 5.5 release [openbsd.org] a year ago, the OS is fully ready for the onslaught of 2038.
    • So Is Mac OS X. (Score:5, Informative)

      by tlambert ( 566799 ) on Tuesday May 05, 2015 @12:08PM (#49621669)

      So Is Mac OS X.

      I converted time_t to 64 bits on 64 bit systems (which include the most recent iPhones) as part of the changes for 64 bit binary support on the G5 when I wrote the 64 bit binary loader support into exec/fork/spawn, and again as part of UNIX Conformance. It's basically been fixed since Tiger.
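      A quick sanity probe (illustrative, not tlambert's actual change) to confirm a given build behaves this way -- with a 64-bit time_t, dates far past 2038 survive a round trip through the C time API:

      ```c
      #include <stdio.h>
      #include <time.h>

      int main(void) {
          printf("sizeof(time_t) = %zu bytes\n", sizeof(time_t));
          if (sizeof(time_t) >= 8) {
              /* 4102444800 = 2100-01-01 00:00:00 UTC, far past 2038 */
              time_t far_future = (time_t)4102444800LL;
              struct tm *tm = gmtime(&far_future);
              if (tm)
                  printf("year %d is representable\n", tm->tm_year + 1900);
          }
          return 0;
      }
      ```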

  • Volunteers (Score:3, Insightful)

    by dfgfgfdgsgsdfsdf ( 4101383 ) on Tuesday May 05, 2015 @10:37AM (#49620853)
    Given that so much of the GNU/Linux code base is written by volunteers, I wonder who exactly is going to fix all of the code. Back when it was written, computer programming was much less of a gold rush; nowadays everyone is competing for jobs that pay $120,000. Who is willing to go through all of the old code and fix it for free?
    • by Greyfox ( 87712 )
      Oh, we went to a 64-bit time_t ages ago. You should just have to recompile, even if you used long instead of time_t. Assuming you ever upgraded your machine to a 64-bit platform, which won't be a problem for most people by 2038; even the US military and NASA should be on 64-bit systems by then. So essentially we've already fixed the problem for Linux. Specific installations that don't upgrade might have some problems, but most of those systems won't last another couple of decades and will require replacement
      • by Anonymous Coward

        Specific installations that don't upgrade might have some problems, but most of those systems won't last another couple of decades and will require replacement sooner. Specific in-house software that was compiled 32 bits and the the source lost might also have problems.

        And also binary protocols where a timestamp is sent using 32 bits on the wire -- NTP, for example. I'm sure there are dozens of others. Public ones should be fixed by 2038; in-house/proprietary ones could conceivably get missed if they're not well maintained or no one knows the details of the protocol well enough to realise it's affected.

        NTP is actually a little odd in that it uses an unsigned 32-bit seconds field (the top half of its 64-bit timestamp) counting from 1st January 1900, so it's not the Unix epoch. The result is that NTP is affected two years earlier, in 2036.
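        (Illustration of that offset: the NTP and Unix epochs are 2,208,988,800 seconds apart, so era-0 conversion is a single addition or subtraction. These helper names are invented for the sketch, not part of any NTP library.)

        ```c
        #include <stdio.h>
        #include <stdint.h>

        /* Seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch):
         * 70 years including 17 leap days = 25567 days * 86400 s. */
        #define NTP_UNIX_OFFSET 2208988800u

        uint32_t unix_to_ntp(int64_t unix_secs) {
            return (uint32_t)(unix_secs + NTP_UNIX_OFFSET);  /* wraps past Feb 2036 */
        }

        int64_t ntp_to_unix(uint32_t ntp_secs) {
            return (int64_t)ntp_secs - NTP_UNIX_OFFSET;      /* valid for era 0 only */
        }

        int main(void) {
            printf("Unix epoch in NTP seconds: %u\n", unix_to_ntp(0));  /* 2208988800 */
            /* The 32-bit NTP seconds field wraps at Unix time 2^32 - offset: */
            printf("NTP era 0 ends at Unix %lld\n",
                   (long long)(4294967296LL - NTP_UNIX_OFFSET));        /* 2085978496 */
            return 0;
        }
        ```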

    • Re:Volunteers (Score:4, Insightful)

      by belthize ( 990217 ) on Tuesday May 05, 2015 @11:07AM (#49621123)

      Given that so much of the non-GNU/Linux code is written by paid programmers, I wonder who exactly is going to fix all the code. Back when it was written, computer programming was much less of a gold rush; nowadays everyone is competing for jobs that pay $120,000. Who is willing to pay programmers to go through all of the old code to fix it?

      It's really not an issue. It's already fixed in OpenBSD. Certainly there's some user-space code that also counts seconds since 1970, but if folks would simply start now, no future fix would be necessary. The set of code written today that will still be in use in 2038 will be vanishingly small. The remaining folks will spend some gray hair knocking it into shape. Missed code will make itself apparent sometime that Tuesday morning.

    • by jythie ( 914043 )
      People who could not get those $120k/year jobs but still want to work on something interesting?

      The higher the pay, typically, the more engaging the work. But there are far more programmers working on dull stuff for $60k/year who could probably use a hobby before their brains turn to mush.
  • Years ago we had an equipment malfunction and our data logger's time-stamped data made no sense: one second we were recording values of x, and the next second normal values. It turned out the daylight-saving-time switch had occurred during the incident, and as a result all the time stamps and the resulting data were screwed up.
    • Why didn't/couldn't you use GMT?

      • Why didn't/couldn't you use GMT?

        Good question. We wondered that ourselves -- why did the idiots who programmed it use local time? Since they probably never operated a piece of equipment in their lives, they presumably assumed we'd want local time but never asked, which illustrates the classic user/developer disconnect. Years later, on a control-room design project, I had to tell the developers that the all-digital panel design they were so proud of was interesting, cool, futuristic, and totally useless for actually operating a plant. As a result
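        (A hypothetical sketch of the fix the grandparent wanted: stamp records with gmtime_r(), i.e. UTC, instead of localtime(), so a DST switch can never make consecutive samples jump backwards. log_sample_utc is an invented name, not the logger's real API.)

        ```c
        #include <stdio.h>
        #include <time.h>

        /* Format the current time in UTC so timestamps stay monotonic
         * across daylight-saving transitions. */
        void log_sample_utc(double value) {
            time_t now = time(NULL);
            struct tm tm_utc;
            char buf[32];
            gmtime_r(&now, &tm_utc);
            strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &tm_utc);
            printf("%s value=%f\n", buf, value);
        }

        int main(void) {
            log_sample_utc(42.0);
            return 0;
        }
        ```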

    • by Megane ( 129182 )
      I use MythTV as my DVR, and I use the OTA guide data from the broadcast signal. Twice a year I have to delete the entire contents of its guide database, because it doesn't handle the DST change properly. I don't know whether it's the guide data format itself not being able to handle it, or a bug in MythTV, but at least MythTV uses UTC time internally.
  • If you have any Cobol programs at risk of this, you had better act now, since the average age of Cobol programmers in 2038 will probably be over 80!

    There is no chance whatever of the code being replaced if it's working now, because no one will sign off on a replacement while it still works.

    • by Viol8 ( 599362 )

      I know this goes against the myth of untouchable Cobol code hidden away that no one dares even look at -- but bit by bit it is being replaced (by C++, Java, C#, whatever). Or at least it was in the companies I've worked in: one small subsection at a time, with plenty of testing and a proper rollback plan.

    • Yes, the average age of an American or Russian COBOL programmer will be 80 or over by 2038... However, you're discounting the Filipinos...

      Most of those kids are in their late 20s at most, so they'll be around in 2038. Assuming our mainframe isn't phased out due to budget cuts, I suspect the code written in 1980 will still be maintained and running well past 2038.

  • by jovius ( 974690 ) on Tuesday May 05, 2015 @11:04AM (#49621087)

    Isn't mouseover the modern term?

  • by cant_get_a_good_nick ( 172131 ) on Tuesday May 05, 2015 @11:05AM (#49621095)

    There was a counter in Windows that rolled over after 28 days, I think (like the 787 bug, but at 1,000 ticks/second, not 100).

    Even Microsoft knew that no Windows box could stay up that long.

    (And before you mod me as a troll, think about it: MS could have made a bigger counter, but didn't feel the need to.)
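    (For what it's worth, the counter in question was a 32-bit millisecond tick count, which wraps after 2^32 ms, about 49.7 days; the 28-day figure is from memory. Elapsed-time code survives the wrap if it subtracts ticks in unsigned arithmetic rather than comparing raw values -- a sketch, not actual Windows code:)

    ```c
    #include <stdio.h>
    #include <stdint.h>

    /* Unsigned subtraction is well defined modulo 2^32, so the result
     * is correct even when the counter has wrapped between samples. */
    uint32_t elapsed_ms(uint32_t start, uint32_t now) {
        return now - start;
    }

    int main(void) {
        uint32_t before_wrap = 4294967000u;  /* just shy of the 49.7-day mark */
        uint32_t after_wrap  = 500u;         /* counter has rolled over */
        printf("elapsed: %u ms\n", elapsed_ms(before_wrap, after_wrap)); /* 796 */
        return 0;
    }
    ```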

  • Several years ago I was really concerned about the 2038 rollover because so many protocols have 32-bit timestamp fields baked in. Even if systems were updated, the protocols might not be. But I've come to realize that once the systems are updated, the protocols tend to follow suit in the next revision, and in the next 23 years pretty much every protocol is going to go through at least one revision. There will still be a few holdouts that have trouble in 2038, but I'm expecting it to be about as much of an event as the year 2000: a few fringe things act weird or even stop working, but pretty much everything important is OK.
    • by omglolbah ( 731566 ) on Tuesday May 05, 2015 @11:10AM (#49621151)

      In the business I work in, "profibus" is considered a "new" technology. The standard was published in 1989.

      We still run a token-ring coax network for the most critical systems on a significant share of the oil rigs in the North Sea, and on the onshore installations supporting them.

      Some of the controllers are 20 years old and just milling along happily. We did a replacement of the NVRAM recently, and that is all the service the modules need.
      I fully expect this crud to still be in use in 20 years. Conservative bastards >.

      • by Viol8 ( 599362 ) on Tuesday May 05, 2015 @11:16AM (#49621189) Homepage

        If the hardware is still fully operational after 20 years in a hostile environment like an oil rig, I'd say it's anything but "crud". It was probably some of the best kit on the market.

        This might come as a shock, but a lot of businesses want kit that Just Works reliably 24/7, not the latest trendy junk that would impress a hipster cycling past on his fixie but lasts about 5 minutes in the real world.

        • by omglolbah ( 731566 ) on Tuesday May 05, 2015 @12:01PM (#49621589)

          Oh, it is good gear, but the list of 'bugs' and 'errata' on it grows longer for every month it stays in service. Spare parts are almost impossible to come by, and even the toolchain needed to update the programs is old enough to require special dedicated workstations.

          It is not a matter of 'working'; it is a matter of 'will it work in the future'. Right now all the gear has reached end of life, and spare parts are very close to being "eBay if you're lucky" in terms of procurement. Trying to get the customer to upgrade BEFORE we're already screwed and have to rush an upgrade is the game we're in now.

          Doing a 3-year project in 6 months (while in some cases doable) leads to badly rushed designs and future redesigns. We've seen this over and over in the past 10 years.

          An example: the new hardware has built-in Ex barriers on each channel, the termination boards are much better, and there are a variety of other improvements. This translates into four massive cabinets being reduced to one. Real estate offshore is hugely expensive, and this would save staggering amounts of money compared to expanding equipment rooms... but they want the stuff they're used to, not the stuff that is current.

          The hilarity of the whole thing is that the 'current' stuff is now installed all over the rig where the old hardware is not available, so now we have both systems running in parallel, with a ton of 'interfacing' and single points of failure introduced as a result.

          It can drive an engineer mad.

        • Oh, and the environment is not very hostile: everything is fully battery-backed and environmentally shielded, and there are virtually no vibrations reaching the room.

          Hell, after 20 years in operation the room hardly has dust anywhere. The controllers look brand new when inspected.

          I love working with the system as is, but trying to shoe-horn the new system requirements into the existing hardware is tricky at best. We're running all our data over a 2 Mbit token-ring network.

        • by tlhIngan ( 30335 )

          If the hardware is still fully operational after 20 years in a hostile enviroment like an oil rig I'd say its anything but "crud". It was probably some of the best kit on the market.

          Yeah, but it's now unsupported kit, and who knows if there are rollover issues? It has already run for 20 years, so it's conceivable it will run another 20+ and hit the 2038 bug -- then what? And this bug is a lot more subtle to catch than the Y2K bug.

          We've already run into rollover issues - on an old processor board that people are

    • This could be a problem when running simulation software: you set the date to 2040 and bam, your VM crashes.

    • Ah, but only signed 32-bit dates have the problem. A quick change to unsigned 32-bit fields extends the deadline from 2038 to 2106.
      Of course, I can hear the screaming even from here: everyone is mandated to use 64 bits! Except that this is not a practical thing in many contexts, when you're on a computer without enough speed or RAM. Or when the date doesn't really matter because it's used in a safer context. Or just admit up front that the system is not POSIX (which they very often are not on small embedded machines), and
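      (Checking the parent's two dates, assuming a 64-bit time_t on the machine doing the printing so neither value overflows:)

      ```c
      #include <stdio.h>
      #include <stdint.h>
      #include <time.h>

      static void print_limit(int64_t secs, const char *label) {
          time_t t = (time_t)secs;          /* needs a 64-bit time_t */
          char buf[32];
          strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S", gmtime(&t));
          printf("%s: %s UTC\n", label, buf);
      }

      int main(void) {
          print_limit(INT32_MAX,  "signed 32-bit limit  ");  /* 2038-01-19 03:14:07 */
          print_limit(UINT32_MAX, "unsigned 32-bit limit");  /* 2106-02-07 06:28:15 */
          return 0;
      }
      ```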

  • by Anonymous Coward

    ...because I'm going to be in a fucking jar on a shelf in 2038.

  • by mccalli ( 323026 ) on Tuesday May 05, 2015 @12:15PM (#49621739) Homepage
    The reason so little went wrong is that people spent ages testing and upgrading/fixing beforehand. Had we left it all to 1st Jan 2000, there would have been issues.

    It annoys me to see Y2K trotted out time and time again as a non-event. It was a very big event, and in large part it was very successfully handled.
    • by PPH ( 736903 )

      Had we left it all to 1st Jan 2000

      I burned a lot of midnight oil during the month of January, 19100.
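      (The "19100" is the classic Y2K printf bug: struct tm's tm_year field counts years since 1900, so code that hard-coded a "19" prefix printed 19100 once tm_year hit 100:)

      ```c
      #include <stdio.h>
      #include <time.h>

      int main(void) {
          struct tm y2k = {0};
          y2k.tm_year = 100;                            /* the year 2000 */
          printf("buggy:   19%d\n", y2k.tm_year);       /* prints 19100 */
          printf("correct: %d\n", 1900 + y2k.tm_year);  /* prints 2000 */
          return 0;
      }
      ```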

  • I love the way the entire world-o'-geeks gets upset about a triviality. What prevents future *NIX releases from changing the "base date" to, say, 2010, and changing all the dependent modules to compute properly? Are there to be no future *NIX releases between now and 2038?

    Are there programmers who, in their cleverness, have used primitive code that still relies on the older base date without reference to the underlying O.S.? Sure. But change the base date soon, and all their bugs will appear LONG
    • by Anonymous Coward

      You're the kind of guy who knows just enough to be incredibly dangerous.

      The epoch is baked into so many algorithms that only a complete idiot would consider changing it. It's also standardized by POSIX, which is why so many algorithms rely on it for calendar manipulation. In POSIX each day is precisely 86,400 "seconds" (regardless of leap seconds), which makes calendar computation easy without resorting to a complex library.

      The proper thing to do is simply change the type of time_t to 64-bits, even on 32-bit sy
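      (An illustration of why the fixed 86,400-second POSIX day makes the 1970 epoch so convenient: calendar math reduces to integer arithmetic. 1970-01-01 was a Thursday, index 4 with Sunday = 0:)

      ```c
      #include <stdio.h>
      #include <stdint.h>

      /* Day of week from a Unix timestamp, no library needed
       * (valid for non-negative timestamps). */
      int day_of_week(int64_t unix_secs) {
          int64_t days = unix_secs / 86400;   /* POSIX: every day is 86400 s */
          return (int)((days + 4) % 7);       /* 0 = Sunday ... 6 = Saturday */
      }

      int main(void) {
          printf("1970-01-01 -> %d (Thursday)\n", day_of_week(0));
          printf("2038-01-19 -> %d (Tuesday)\n", day_of_week(2147483647LL));
          return 0;
      }
      ```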

  • "I question a management culture that appeared to assume that the rocket’s inertial reference system that functioned perfectly when fitted into Ariane4 would achieve the same results when fitted into Ariane5." ref [lsbu.ac.uk]
  • Thanks to the math required for date conversion, the 2038 bug may actually show up a couple of years early. How do I know? I tried setting the clock forward in an embedded system I wrote the code for, and its calendar actually seems to fail in 2036. I haven't tried it in a while, but I think I can't even set the date past January 2036. I didn't try to figure out exactly why it fails earlier than it should, because the library code looks pretty messy.

    It's using the standard date library stuff from the IAR
