News for nerds, stuff that matters

The 25-Year-Old BSD Bug

Posted by Soulskill on Sun May 11, 2008 10:05 AM
from the better-late-than-never dept.
sproketboy writes with news that a developer named Marc Balmer has recently fixed a bug in a bit of BSD code which is roughly 25 years old. In addition to the OSnews summary, you can read Balmer's comments and a technical description of the bug. "This code will not work as expected when seeking to the second entry of a block where the first has been deleted: seekdir() calls readdir() which happily skips the first entry (it has its inode set to zero), and advances to the second entry. When the user now calls readdir() to read the directory entry to which he just seekdir()ed, he does not get the second entry but the third. Much to my surprise I not only found this problem in all other BSDs or BSD-derived systems like Mac OS X, but also in very old BSD versions. I first checked 4.4BSD Lite 2, and Otto confirmed it is also in 4.2BSD. The bug has been around for roughly 25 years or more."
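The mechanism in the quoted description can be reproduced with a toy model. This is a minimal Perl sketch under simplifying assumptions: a single directory block, positions counted in entries rather than the byte offsets real BSD libc keeps, and hypothetical names (ToyDir, read_entry, tell_pos, delete_slot, seek_buggy, seek_fixed) that are not part of any libc. As in the description, a deleted first entry stays in the block with its inode set to zero, and a seekdir() that rewinds to the block start and replays readdir() overshoots the saved position.

```perl
use strict;
use warnings;

# Toy model of ONE directory block. Simplifying assumption: a position
# ("loc") is a slot index, whereas real BSD libc keeps byte offsets
# into a buffer filled by getdirentries().
package ToyDir;

sub new {
    my ($class, @names) = @_;
    # Arbitrary nonzero inode numbers; inode 0 marks a deleted slot.
    my @entries = map { +{ ino => 100 + $_, name => $names[$_] } } 0 .. $#names;
    return bless { entries => \@entries, loc => 0 }, $class;
}

# Like readdir(): return the next live name, silently skipping slots
# whose inode is 0.
sub read_entry {
    my ($self) = @_;
    while ($self->{loc} < @{ $self->{entries} }) {
        my $e = $self->{entries}[ $self->{loc}++ ];
        return $e->{name} if $e->{ino} != 0;
    }
    return;
}

sub tell_pos { return $_[0]{loc} }

# Deleting the first entry of a block leaves its slot with inode 0.
sub delete_slot { $_[0]{entries}[ $_[1] ]{ino} = 0 }

# Buggy seek: rewind to the block start and replay read_entry() until
# loc reaches the target. One call can skip a deleted slot AND consume
# the live entry after it, overshooting the target.
sub seek_buggy {
    my ($self, $target) = @_;
    $self->{loc} = 0;
    while ($self->{loc} < $target) {
        last unless defined $self->read_entry;
    }
}

# One possible fix: restore the saved slot index directly, so deleted
# slots still count toward the position.
sub seek_fixed {
    my ($self, $target) = @_;
    my $n = @{ $self->{entries} };
    $self->{loc} = $target < $n ? $target : $n;
}

package main;

my $d = ToyDir->new(qw(one two three));
$d->read_entry;                # reads "one"
my $pos = $d->tell_pos;        # saved position of "two" (slot 1)
$d->delete_slot(0);            # "one" is removed: its inode becomes 0

$d->seek_buggy($pos);
my $buggy = $d->read_entry;    # overshoots past "two"

$d->seek_fixed($pos);
my $fixed = $d->read_entry;

print "buggy: $buggy, fixed: $fixed\n";
```

Running it prints `buggy: three, fixed: two`. The "fixed" variant is only one way to avoid the overshoot in this model; it is not presented as the actual OpenBSD patch.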

Related Stories

BSD: 33-Year-Old Unix Bug Fixed In OpenBSD (162 comments)
Ste sends along the cheery little story of Otto Moerbeek, one of the OpenBSD developers, who recently found and fixed a 33-year-old buffer overflow bug in Yacc. "But if the stack is at maximum size, this will overflow if an entry on the stack is larger than the 16 bytes of leeway my malloc allows. In the case of C++ it is 24 bytes, so a SEGV occurred. Funny thing is that I traced this back to Sixth Edition UNIX, released in 1975."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • more proof (Score:5, Funny)

    by Anonymous Coward on Sunday May 11 2008, @10:13AM (#23369210)
    ...of the superiority of Microsoft.
  • by Anonymous Coward on Sunday May 11 2008, @10:15AM (#23369214)
    but they had more important things to do. At least until Balmer started throwing chairs.
  • See? SEE? (Score:5, Funny)

    by Frosty Piss (770223) on Sunday May 11 2008, @10:21AM (#23369274)
    See? See?

    This is the power of Open Source!

    With all those eyes looking at the code, stuff like this gets ID'd and fixed LICKITY SPLIT!

    (runs and hides)

  • Trac (Score:5, Funny)

    by extirpater (132500) on Sunday May 11 2008, @10:31AM (#23369348)
    Bug tracking software missed this because it's bug #1. lol.
  • by quarrel (194077) on Sunday May 11 2008, @10:33AM (#23369362)
    The most telling thing in TFA for me was that the bug had been identified by the Samba team and a workaround implemented for Samba.

    Surely both the samba communities and the *BSD communities are active enough that this could have been passed on for further investigation by the *BSD crowd? (Sure, samba probably would still need a workaround, particularly given the long uptimes and widespread deployment of *BSDs)

    I know nothing of the devs at Samba and *BSD, but seems a bit strange. Perhaps they did try..

    Meanwhile, congrats to Marc on fixing a bug. One of the most touted benefits of open source (whatever your license) code.

    --Q
    • by id10ts (1147541) on Sunday May 11 2008, @12:16PM (#23369964)

      Yes, Samba did pass on what it found and it appears they were promptly shot down by someone on the *BSD side.

      The Samba e-mail archives contain a message from over 3 years ago [samba.org], but it doesn't give attribution to the *BSD source.

      The Samba Bugzilla also has a bug reported more recently [samba.org] involving the same issue. Reading through the bug history, you can see there was one FreeBSD dev involved in the bug discussion, and he referenced a prior conversation between Tridge (Samba) and PHK (FreeBSD) where PHK said there was no bug in FreeBSD.

  • by Doc Ruby (173196) on Sunday May 11 2008, @10:44AM (#23369420) Homepage Journal
    If you define BSD as a collection of bugs, this story proves that BSD is dying.
  • by Hawkeye477 (163893) on Sunday May 11 2008, @10:57AM (#23369486) Homepage
    After this long would this not be considered a feature? :)
  • by CaroKann (795685) on Sunday May 11 2008, @11:02AM (#23369520)
    Considering how old this bug is, and how much work-around code probably exists as a result, I wonder how many new bugs this bug fix will create.
    • by irc.goatse.cx troll (593289) on Sunday May 11 2008, @11:43AM (#23369774) Journal
      Most workarounds involve disabling caching. With the patch in place they could re-enable caching for a performance increase, but leaving it off should be no worse with the patch than without.
      • by Guy Harris (3803) <guy@alum.mit.edu> on Sunday May 11 2008, @03:51PM (#23371440)

        So it seems like POSIX says, "this function is not guaranteed to work".

        Yes.

        Sounds like people were aware of the problem for a long time.

        People were aware that the notion of a directory being a sequence of entries, with each entry having a position such that you can get the position of an entry and later seek to that position and have the next read return that entry even if changes were made to the directory in the interim, was wrong.

        That doesn't mean that they were just trying to avoid having the standard imply that particular bug was fixed - it means that they were trying to avoid making a promise that some reasonable implementations of directories can't keep even if those implementations have no bugs.

  • Long live the Code (Score:5, Interesting)

    by GuldKalle (1065310) on Sunday May 11 2008, @12:37PM (#23370092)
    Am I the only one who thinks it's quite impressive to have 25 year old code still being used and employed on new systems?
    • Here is why (Score:5, Insightful)

      by Solr_Flare (844465) on Monday May 12 2008, @12:12AM (#23374706)
      BSD and the *nixes were designed to be simple, effective, modular operating systems. As long as you have the drivers and know how, you can easily port them over and install them on a variety of hardware. Then, thanks to their modular nature, you can then plug in all the extra bells and whistles you need for your particular system and go to town.

      That is why they are still around and still popular. They are K.I.S.S., work as they are supposed to, and the modular code that is plugged into them can just be sloughed away when it becomes outdated, and newer, better code plugged in to modernize the OS as you go.

      That's also why Windows has had so many problems over the years. Windows was designed to be everything you need in a single package. That means everything is all tied up together. So, unlike BSD and the *nixes, when part of the OS becomes outdated, MS can't just unplug the old stuff and plug in new stuff. It's all interlinked from the ground up. That means a large portion of development time is spent fixing bugs caused by new additions, which then cause even more problems down the line when you go to update again. It also makes it bloated as legacy code ends up stuck in the mix because without it the patched together additions wouldn't function right.

      And, unfortunately for MS, their market dominance is based on the windows "feel" being familiar and backwards compatibility. If they could, I'm sure they'd re-write windows from the ground up, but now they are in a Catch-22 where doing so might significantly kill their market share.

      I'm guessing Bill and company sometimes look back and kick themselves for not having the guts to go for broke and re-do the OS from the ground up for Windows XP. Because, back then MS was still king, Apple was at its low point with a very small and stagnant market share, and the *nixes were still primarily a hard core enthusiast hobby. Today, if MS were to completely change Windows, they'd probably lose a significant amount of market share to a variety of alternatives.
  • by Solandri (704621) on Sunday May 11 2008, @01:30PM (#23370394)
    Is the MPEG Chroma bug [hometheaterhifi.com]. That was created by someone who wrote one of the original MPEG decoders that was eventually sold/distributed to most of the companies making the first DVD players (pre-1993). This one just won't go away either - initially most of the DVD manufacturers refused to acknowledge it even existed (probably because they didn't want to recall millions of DVD players with non-upgradeable firmware). I still see it every now and then on TV (indicating one of the upstream broadcasting companies is still using equipment afflicted with the bug). I notice it most often when diagonal red lines end up staircased like they're poorly interlaced (see pictures in the above link).
  • by m.dillon (147925) on Sunday May 11 2008, @04:16PM (#23371632) Homepage
    The problem is that the stdio directory scanning routines cache multiple directory entries with a single getdirentries() system call, but then may try to 'seek' into the middle of that buffer later on.

    Any filesystem based on a non-linear-file directory format, such as a B-Tree, will simply never produce consistent offsets or indices within such a buffer.

    The only way to *REALLY* fix this is to add a cookie field to the filesystem-independent dirent structure (and if your BSD isn't using a filesystem-independent dirent structure, it first needs to be fixed to do that). lseek()ing to a directory cookie works just fine, and always will (or at least will far more robustly than trying to scan a re-cached buffer from getdirentries()).

    When DragonFly went to a filesystem-independent dirent structure I very stupidly only added ~40 reserved bits to the dirent structure, instead of the 64 we need to properly implement per-entry directory cookies. I'm still pissed at myself for that gaffe.

    In any case, a per-entry directory cookie effectively solves the problem. The only other way to get such cookies, if it can't be embedded in the dirent structure, is to create a new system call similar to getdirentries() but which also populates an array of directory cookies. FreeBSD and DragonFly have kernel implementations of readdir which supply per-entry directory cookies so it is really just a matter of creating the new system call and then making libc use it.

    -Matt
  • The Perl program below demonstrates this bug. Tested only on OS X...

    #!/usr/bin/perl -w
    use strict;
    use File::Temp qw(tempdir);
    use File::Slurp qw(write_file read_dir);
    use Test::More tests => 9;
     
    # Create some temp files
    my $dir = tempdir(CLEANUP => 1);
    write_file("$dir/one", '1');
    write_file("$dir/two", '2');
    write_file("$dir/three", '3');
     
    # Confirm that the directory contains the files
    is_deeply([read_dir($dir)], ['one', 'three', 'two']);
     
    # Open a directory handle and read through all files
    opendir(my $dirh, $dir) or die "opendir $dir: $!";
    is(scalar readdir($dirh), '.');
    is(scalar readdir($dirh), '..');
    my $file1 = readdir($dirh);
    is($file1, 'one');
    my $file2 = readdir($dirh);
    is($file2, 'three');
    # Record the position just after the second file, i.e. where the third entry starts
    my $pos2 = telldir($dirh);
    my $file3 = readdir($dirh);
    is($file3, 'two');
     
    # Rewind to that position, and confirm that the next read is the third file
    seekdir($dirh, $pos2);
    is(scalar readdir($dirh), $file3);
     
    # Delete the first file and try the above test again. It *should* have the same results
    ok(unlink("$dir/$file1"));
    seekdir($dirh, $pos2);
    is(scalar readdir($dirh), $file3);
     
    closedir($dirh);
    The output of the program is:

    % perl bsdbug.pl
    1..9
    ok 1
    ok 2
    ok 3
    ok 4
    ok 5
    ok 6
    ok 7
    ok 8
    not ok 9
    # Failed test at bsdbug.pl line 30.
    # got: undef
    # expected: 'two'
    # Looks like you failed 1 test of 9.
  • by chrysalis (50680) on Monday May 12 2008, @04:30AM (#23375656) Homepage
    This is not the first time the OpenBSD team has done an excellent job of finding obscure bugs that were lying around for one or two decades in every BSD derivative. Congratulations!
    • by ivan256 (17499) on Sunday May 11 2008, @10:16AM (#23369230)
      Of course, now that I've R'd the FA, I understand that it's the first entry in the block (of which a directory with a sufficient number of files would have multiple), and not the first entry in the directory. Kindly ignore my previous response... Nothing to see here...
    • by TheRaven64 (641858) on Sunday May 11 2008, @10:19AM (#23369258) Homepage Journal
      The first entry in a disk block, not the first entry in a directory. Each disk block is 512 bytes, so you can store around 30 files in there with shortish file names (each stores the name, the inode and a cumulative free space counter). For large directories you have around a one in 30 chance that the deleted file will be the first one in a block. Delete a dozen files from a large directory and there's a pretty large chance you'll hit this bug.
    • by DannyO152 (544940) on Sunday May 11 2008, @10:18AM (#23369246)
      How many "eyes" were watching BSD systems use Samba for a DOS filesystem? Seems to me, someone saw behavior and exactly because it was open source, looked into it, found the coding error and filed a bug report. It will be fixed, because everyone now knows about this, and that too is a side effect of open source, even if it's related to the politics.
    • by garett_spencley (193892) on Sunday May 11 2008, @10:18AM (#23369252) Journal
      From the sounds of it, this was a bug that was not triggered very often. When it was finally triggered, investigated and fixed the person who found it released the info publicly, thanks to the beauty of Open Source, and everyone affected, commercial entities and FOSS users using the code alike, benefited. If this were a proprietary system that were licensed out to various companies stricken by NDAs etc. it's quite likely that if one company discovered the bug the others would never learn about it.
    • by TheRaven64 (641858) on Sunday May 11 2008, @10:23AM (#23369290) Homepage Journal

      This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir. This is quite uncommon behaviour, and was incredibly uncommon 25 years ago when filesystems were much smaller and directories almost never contained enough files to require more than one or two disk blocks to store the directory.

      When the Samba people found it, they decided to just code a work-around and not bother to report it to any of the BSD teams. If they had done, it would probably have been fixed in 22 years.

      Now that it has been fixed in OpenBSD, the change can easily be taken and incorporated into FreeBSD, NetBSD, DragonFlyBSD and Darwin.

      • by pclminion (145572) on Sunday May 11 2008, @10:28AM (#23369326)

        This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir.

        But that's exactly my point, isn't it? The bug was only "visible" through its behavior, not its manifestation in code. The shallow bugs argument basically says that if enough people stare at the code, they will find the bugs. Clearly that did not happen here.

        Whether the bug fix can propagate rapidly has nothing to do with what I'm talking about. I'm not trying to disparage the concept of open source, I'm arguing that the shallow-bugs argument should be rejected.

        • by Peganthyrus (713645) on Sunday May 11 2008, @10:45AM (#23369430) Homepage
          You see what you're looking for, most of the time. This sounds like a subtle bug that you're not going to find until you go looking for it; it's hard to invoke under normal usage patterns. Nobody stared at that code looking for this problem until now. But if it was closed source, the guy who fixed it wouldn't have been able to look at it and find the problem.

          A quick googling of "many eyes make all bugs shallow" brings me the more complete statement that adage is simplified from: "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone." (Linus via ESR). Clearly this 25-year-old bug is one of the exceptions that calls for the 'almost'.
        • by Derek Pomery (2028) on Sunday May 11 2008, @10:49AM (#23369450)
          Erm. That's not what "many eyes make bugs shallow" means.
          Well. Just reading the source is part of it, but not all.
          Fact is, if I run into odd behaviour when testing/using - if the source is available I can read it, I can breakpoint.
          I cannot do that with a binary.

          So yes. Things did occur as they were supposed to. Someone found something odd, they were able to look at code in question, and fix it.

          The shallowness is the fact that there is a direct connection between the thousands of testers/users and the code in question.
          Instant turnaround. No "user reports behaviour in detailed fashion, including testcase, to some corporate e-mail address, and maybe it eventually gets to a developer three layers down who may be able to figure it out and fix it if he has the time"
        • by LS (57954) on Sunday May 11 2008, @02:16PM (#23370684) Homepage
          Wuh? thousands of projects that prove the point, and one single bug that doesn't, so reject the whole argument?

          You sound like those people that don't like evolution. The concept of shallow bugs is an approximate description of how things work, not a methodology. Also, if enough people stare at the code and use it, they will find the bugs that matter.

      • by Jeremy Allison - Sam (8157) on Sunday May 11 2008, @05:05PM (#23372018) Homepage
        No, we did report it. The answer at the time was "this is allowed by POSIX, deal with it", as can be seen in the bug report here:

        https://bugzilla.samba.org/show_bug.cgi?id=4715 [samba.org]

        I did point out that no other POSIX system behaved like that, but that didn't seem to make much difference :-). Eventually I just added a parameter that allowed our open directory cache to be turned off on *BSD. Once it got into the hands of Marc Balmer he took us seriously and fixed the bug.

        Jeremy.
    • by InvisblePinkUnicorn (1126837) on Sunday May 11 2008, @10:40AM (#23369408)
      This is like saying global warming either does exist because today was the hottest on record, or does not exist because today was the coldest on record. Why are these analogous? Because in both situations, you're only considering one data point, which does not even begin to indicate a trend.
        • by AndGodSed (968378) on Sunday May 11 2008, @12:15PM (#23369960) Homepage Journal
          True, but you misquoted the statement. The correct statement is not absolute. It reads, and I quote a guy called Linus:

          "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone."
          I am sure you will agree that the correct statement, sans flamebait modifications, does not warrant a "clear contradiction", despite the many detractors of FOSS who are jumping at this opportunity to point at an example of a fixed bug (one that was not necessarily a security risk) and say "see, the OSS model is clearly flawed! BSD has a 25-year-old bug that was only fixed now!"

          Heck, how many other bugs have been fixed over the years?

          These detracting arguments smack of FUD mongering...
    • 3 thoughts on this:

      1. I think this bug would be classified "archeological".

      2. The question now is what happens to the Samba work-around patches. Now that the bug is fixed, do the patches cause a side-effect (i.e. "a new bug")?

      3. This gives rise to a new meme of nerd insults. "You call yourself a programmer? Why I've fixed bugs older than you!" Of course, only one man is entitled to use that line.
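m.dillon's comment argues that a position should be a per-entry directory cookie rather than an offset into a cached getdirentries() buffer. The idea can be illustrated with another hypothetical Perl sketch. Here each entry carries a stable integer cookie (assumed to be assigned when the entry is created, standing in for a B-tree key), and a saved position is a cookie value. CookieDir and its methods are invented for illustration and are not the FreeBSD or DragonFly kernel interface.

```perl
use strict;
use warnings;

# HYPOTHETICAL directory whose entries are addressed by a stable integer
# cookie (standing in for a B-tree key), not by an offset into a buffer.
package CookieDir;

sub new {
    my ($class, %cookie_of) = @_;    # name => cookie
    return bless { by_cookie => { reverse %cookie_of }, pos => 0 }, $class;
}

# Cookies in ascending order (re-sorted on every call for simplicity).
sub _cookies { return sort { $a <=> $b } keys %{ $_[0]{by_cookie} } }

# Return the first remaining entry whose cookie is >= the position,
# then move the position just past that cookie.
sub read_entry {
    my ($self) = @_;
    for my $c ($self->_cookies) {
        next if $c < $self->{pos};
        $self->{pos} = $c + 1;
        return $self->{by_cookie}{$c};
    }
    return;
}

# telldir()/seekdir() analogues: the position is just a cookie value.
sub tell_pos { return $_[0]{pos} }
sub seek_pos { $_[0]{pos} = $_[1]; return }

sub remove {
    my ($self, $name) = @_;
    my %by_name = reverse %{ $self->{by_cookie} };
    return unless exists $by_name{$name};
    delete $self->{by_cookie}{ $by_name{$name} };
    return;
}

package main;

my $dir = CookieDir->new(one => 10, two => 20, three => 30);
$dir->read_entry;               # reads "one"
my $pos = $dir->tell_pos;       # 11: resume after cookie 10
$dir->remove('one');            # other entries keep their cookies
$dir->seek_pos($pos);
my $next = $dir->read_entry;    # "two", not "three"
print "next: $next\n";
```

This prints `next: two`. Because a position means "the first entry whose cookie is at or after this value", deleting "one" cannot make the next read skip "two". A real implementation would keep entries in an ordered structure instead of re-sorting the keys on every read.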
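TheRaven64's estimate (512-byte blocks, around 30 short-named entries per block, and a sizeable chance of hitting the bug after a dozen deletions) can be checked with a little arithmetic. This sketch assumes the 4.4BSD FFS layout: 512-byte directory blocks, and an entry size made of an 8-byte header (inode, record length, type, name length) plus the NUL-terminated name rounded up to a multiple of 4 bytes. Like the comment, it also assumes that each deletion independently has a 1-in-k chance of hitting the first entry of its block, so the chance of at least one hit in n deletions is 1 - (1 - 1/k)^n. The helper names are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: 4.4BSD FFS layout, 512-byte directory blocks.
use constant DIRBLKSIZ => 512;

# ASSUMPTION: entry size as in the 4.4BSD DIRSIZ rule: an 8-byte header
# (inode, record length, type, name length) plus the name and its
# terminating NUL, rounded up to a multiple of 4 bytes.
sub entry_size {
    my ($namlen) = @_;
    return (8 + $namlen + 1 + 3) & ~3;
}

sub entries_per_block { return int(DIRBLKSIZ / entry_size($_[0])) }

# Probability that at least one of $n deletions removes a block's first
# entry, if each deletion independently does so with probability 1/$k.
sub p_hit_first {
    my ($k, $n) = @_;
    return 1 - (1 - 1 / $k) ** $n;
}

for my $len (4, 7, 12) {
    printf "name length %2d: %2d bytes/entry, %2d entries/block\n",
        $len, entry_size($len), entries_per_block($len);
}
printf "P(at least one first-entry deletion), k=30, n=12: %.2f\n",
    p_hit_first(30, 12);
```

With names of up to 7 characters an entry takes 16 bytes, giving 32 entries per block, which matches the "around 30" figure; with k = 30 and n = 12 the probability comes out at about 0.33, roughly one chance in three.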
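Jeremy Allison's account of the Samba side (eventually adding an option to turn the open directory cache off on *BSD) suggests a general pattern: when telldir()/seekdir() positions cannot be trusted across deletions, don't save positions at all. Below is a minimal Perl sketch of that idea in the same spirit, assuming a caller that lists a directory in batches: it remembers only the last name returned and re-reads the directory on each call. next_batch is a hypothetical helper, not Samba's code or its actual parameter; sorting the names is a design choice so that resuming does not depend on the filesystem's entry order.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# HYPOTHETICAL helper: return up to $count names that sort after
# $last_name (or from the start if it is undef). The directory is
# re-read on every call instead of seekdir()ing to a saved position.
sub next_batch {
    my ($dir, $last_name, $count) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = sort { $a cmp $b } grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    my @rest = grep { !defined $last_name || $_ gt $last_name } @names;
    my $n = $count < @rest ? $count : @rest;
    return @rest[0 .. $n - 1];
}

my $dir = tempdir(CLEANUP => 1);
for my $name (qw(one two three)) {
    open(my $fh, '>', "$dir/$name") or die "create $name: $!";
    close($fh);
}

my @first = next_batch($dir, undef, 2);          # first two names
unlink("$dir/one") or die "unlink: $!";          # delete between batches
my @second = next_batch($dir, $first[-1], 2);    # resume after "three"
print "first: @first; second: @second\n";
```

This prints `first: one three; second: two`: deleting "one" between batches does not cause "two" to be skipped. The cost is a full directory read per batch, the same kind of performance-for-correctness trade the cache-disabling workarounds make.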