sproketboy writes with news that a developer named Marc Balmer has recently fixed a bug in a bit of BSD code which is roughly 25 years old. In addition to the OSnews summary, you can read Balmer's comments and a technical description of the bug.
"This code will not work as expected when seeking to the second entry of a block where the first has been deleted: seekdir() calls readdir() which happily skips the first entry (it has inode set to zero), and advance to the second entry. When the user now calls readdir() to read the directory entry to which he just seekdir()ed, he does not get the second entry but the third. Much to my surprise I not only found this problem in all other BSDs or BSD derived systems like Mac OS X, but also in very old BSD versions. I first checked 4.4BSD Lite 2, and Otto confirmed it is also in 4.2BSD. The bug has been around for roughly 25 years or more."
If you're going to work in Open Source projects related to Operating Systems, stay away. The dreaded "trade secrets" accusation could ruin your whole career.
The most telling thing in TFA for me was that the bug had been identified by the Samba team and a workaround implemented for Samba.
Surely both the samba communities and the *BSD communities are active enough that this could have been passed on for further investigation by the *BSD crowd? (Sure, samba probably would still need a workaround, particularly given the long uptimes and widespread deployment of *BSDs)
I know nothing of the devs at Samba and *BSD, but seems a bit strange. Perhaps they did try..
Meanwhile, congrats to Marc on fixing a bug. One of the most touted benefits of open source (whatever your license) code.
Yes, Samba did pass on what it found and it appears they were promptly shot down by someone on the *BSD side.
The Samba e-mail archives contain a message from over 3 years ago [samba.org], but it doesn't give attribution to the *BSD source.
The Samba Bugzilla also has a bug reported more recently [samba.org] involving the same issue. Reading through the bug history, you can see there was one FreeBSD dev involved in the bug discussion, and he referenced a prior conversation between Tridge (Samba) and PHK (FreeBSD) where PHK said there was no bug in FreeBSD.
Most workarounds involve disabling caching. With the fix in place they could re-enable caching for a performance increase, but leaving it off should have no worse effects with the patch than without.
So it seems like POSIX says, "this function is not guaranteed to work".
Yes.
Sounds like people were aware of the problem for a long time.
People were aware that the notion of a directory being a sequence of entries, with each entry having a position such that you can get the position of an entry and later seek to that position and have the next read return that entry even if changes were made to the directory in the interim, was wrong.
That doesn't mean that they were just trying to avoid having the standard imply that particular bug was fixed - it means that they were trying to avoid making a promise that some reasonable implementations of directories can't keep even if those implementations have no bugs.
BSD and the *nixes were designed to be simple, effective, modular operating systems. As long as you have the drivers and know how, you can easily port them over and install them on a variety of hardware. Then, thanks to their modular nature, you can then plug in all the extra bells and whistles you need for your particular system and go to town.
That is why they are still around and still popular. They are K.I.S.S., work as they are supposed to, and the modular code that is plugged into them can just be sloughed away when it becomes outdated, and newer, better code plugged in to modernize the OS as you go.
That's also why Windows has had so many problems over the years. Windows was designed to be everything you need in a single package. That means everything is all tied up together. So, unlike BSD and the *nixes, when part of the OS becomes outdated, MS can't just unplug the old stuff and plug in new stuff. It's all interlinked from the ground up. That means a large portion of development time is spent fixing bugs caused by new additions, which then cause even more problems down the line when you go to update again. It also makes it bloated, as legacy code ends up stuck in the mix because without it the patched-together additions wouldn't function right.
And, unfortunately for MS, their market dominance is based on the Windows "feel" being familiar and on backwards compatibility. If they could, I'm sure they'd rewrite Windows from the ground up, but now they are in a catch-22 where doing so might significantly hurt their market share.
I'm guessing Bill and company sometimes look back and kick themselves for not having the guts to go for broke and re-do the OS from the ground up for Windows XP. Because, back then MS was still king, Apple was at its low point with a very small and stagnant market share, and the *nixes were still primarily a hard core enthusiast hobby. Today, if MS were to completely change Windows, they'd probably lose a significant amount of market share to a variety of alternatives.
Another common old bug is the MPEG Chroma bug [hometheaterhifi.com]. That was created by someone who wrote one of the original MPEG decoders that was eventually sold/distributed to most of the companies making the first DVD players (pre-1993). This one just won't go away either - initially most of the DVD manufacturers refused to acknowledge it even existed (probably because they didn't want to recall millions of DVD players with non-upgradeable firmware). I still see it every now and then on TV (indicating one of the upstream broadcasting companies is still using equipment afflicted with the bug). I notice it most often when diagonal red lines end up staircased like they're poorly interlaced (see pictures in the above link).
The problem is that the stdio directory scanning routines cache multiple directory entries with a single getdirentries() system call, but then may try to 'seek' into the middle of that buffer later on.
Any filesystem based on a non-linear-file directory format, such as a B-Tree, will simply never produce consistent offsets or indices within such a buffer.
The only way to *REALLY* fix this is to add a cookie field to the filesystem-independent dirent structure (and if your BSD isn't using a filesystem-independent dirent structure, it first needs to be fixed to do that). lseek()ing to a directory cookie works just fine, and always will (or at least will far more robustly than trying to scan a re-cached buffer from getdirentries()).
When DragonFly went to a filesystem-independent dirent structure I very stupidly only added ~40 reserved bits to the dirent structure, instead of the 64 we need to properly implement per-entry directory cookies. I'm still pissed at myself for that gaffe.
In any case, a per-entry directory cookie effectively solves the problem. The only other way to get such cookies, if they can't be embedded in the dirent structure, is to create a new system call similar to getdirentries() but which also populates an array of directory cookies. FreeBSD and DragonFly have kernel implementations of readdir which supply per-entry directory cookies, so it is really just a matter of creating the new system call and then making libc use it.
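To make the cookie idea a bit more concrete, here is a minimal Python sketch. It is only a toy model under stated assumptions: `CookieDir`, `add`, `remove` and `read_from` are hypothetical names, cookies are plain increasing integers, and none of this is the FreeBSD or DragonFly kernel interface. The point it tries to show is that a position tied to the entry itself, rather than to an offset inside a cached buffer, still resolves sensibly after other entries are deleted.

```python
import bisect


class CookieDir:
    """Toy directory in which every entry carries its own stable cookie."""

    def __init__(self):
        self._cookies = []  # cookies in increasing order
        self._names = {}    # cookie -> file name
        self._next = 1      # next cookie to hand out (assumption: never reused)

    def add(self, name):
        cookie = self._next
        self._next += 1
        self._cookies.append(cookie)  # increasing values keep the list sorted
        self._names[cookie] = name
        return cookie

    def remove(self, cookie):
        self._cookies.remove(cookie)
        del self._names[cookie]

    def read_from(self, cookie):
        """Yield (cookie, name) for every live entry at or after `cookie`."""
        start = bisect.bisect_left(self._cookies, cookie)
        for c in self._cookies[start:]:
            yield c, self._names[c]
```

Seeking to the cookie of an entry that still exists returns that entry no matter what was deleted before it, and seeking to the cookie of a deleted entry simply resumes at the next live one.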
The Perl program below demonstrates this bug. Tested only on OS X...
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);
use File::Slurp qw(write_file read_dir);
use Test::More tests => 9;

# Create some temp files
my $dir = tempdir(CLEANUP => 1);
write_file("$dir/one", '1');
write_file("$dir/two", '2');
write_file("$dir/three", '3');

# Confirm that the directory contains the files
is_deeply([read_dir($dir)], ['one', 'three', 'two']);

# Open a directory handle and read through all files
opendir(my $dirh, $dir);
is(scalar readdir($dirh), '.');
is(scalar readdir($dirh), '..');
my $file1 = readdir($dirh);
is($file1, 'one');
my $file2 = readdir($dirh);
is($file2, 'three');
# Record the position of the second file
my $pos2 = telldir($dirh);
my $file3 = readdir($dirh);
is($file3, 'two');

# Rewind to the second file's pos, and confirm that the next read is the third file
seekdir($dirh, $pos2);
is(scalar readdir($dirh), $file3);

# Delete the first file and try the above test again. It *should* have the same results
ok(unlink("$dir/$file1"));
seekdir($dirh, $pos2);
is(scalar readdir($dirh), $file3);

closedir($dirh);
The output of the program is:
% perl bsdbug.pl
1..9
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6
ok 7
ok 8
not ok 9
# Failed test at bsdbug.pl line 30.
# got: undef
# expected: 'two'
# Looks like you failed 1 test of 9.
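For anyone without a BSD or OS X box handy, the mechanism from the story summary can also be modelled in a few lines of Python. This is a toy sketch, not the BSD libc code: `Entry`, `DirStream`, `seekdir_buggy`, `seekdir_fixed` and `demo` are hypothetical names, a block is just a list of slots, and the "fixed" variant is only one possible repair within this model, not necessarily the change that was actually committed.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    inode: int  # 0 marks a deleted entry that still occupies its slot
    name: str


class DirStream:
    """Toy directory stream over a single block of entries."""

    def __init__(self, block):
        self.block = block
        self.loc = 0  # index of the next slot to examine

    def readdir(self):
        # Skip slots whose inode is zero and return the next live name.
        while self.loc < len(self.block):
            entry = self.block[self.loc]
            self.loc += 1
            if entry.inode != 0:
                return entry.name
        return None

    def telldir(self):
        return self.loc

    def seekdir_buggy(self, pos):
        # Restart at the beginning of the block and advance with readdir(),
        # which may consume more than one slot per call.
        self.loc = 0
        while self.loc < pos:
            if self.readdir() is None:
                break

    def seekdir_fixed(self, pos):
        # Assumed repair for this model: step one raw slot at a time.
        self.loc = 0
        while self.loc < pos:
            self.loc += 1


def demo(seek_method):
    block = [Entry(11, "a"), Entry(12, "b"), Entry(13, "c")]
    d = DirStream(block)
    d.readdir()              # consume "a"
    pos = d.telldir()        # remember the position of "b"
    block[0].inode = 0       # delete the first entry in the block
    getattr(d, seek_method)(pos)
    return d.readdir()
```

In this model `demo("seekdir_buggy")` comes back with the third entry, while `demo("seekdir_fixed")` returns the second entry whose position was recorded.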
This is not the first time the OpenBSD team has done an excellent job of finding obscure bugs that had been lying around for one or two decades in every BSD derivative. Congratulations!
Of course, now that I've R'd the FA, I understand that it's the first entry in the block (of which a directory with a sufficient number of files would have multiple), and not the first entry in the directory. Kindly ignore my previous response... Nothing to see here...
The first entry in a disk block, not the first entry in a directory. Each disk block is 512 bytes, so you can store around 30 files in there with shortish file names (each stores the name, the inode and a cumulative free space counter). For large directories you have around a one in 30 chance that the deleted file will be the first one in a block. Delete a dozen files from a large directory and there's a pretty large chance you'll hit this bug.
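A quick back-of-the-envelope check of those odds, as a small Python sketch. It assumes each deletion independently lands on the first slot of its block with probability 1 in 30 (the rough figure above); `chance_first_slot_hit` is a hypothetical helper name, and hitting the first slot is only a precondition for the bug, since a seekdir() to the following entry still has to happen.

```python
def chance_first_slot_hit(deletions, entries_per_block=30):
    """Probability that at least one of `deletions` removes a block's first entry."""
    miss = 1 - 1 / entries_per_block  # one deletion misses the first slot
    return 1 - miss ** deletions      # complement of every deletion missing
```

Under these assumptions a dozen deletions give roughly a one in three chance of leaving at least one block with a deleted first entry.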
How many "eyes" were watching BSD systems use Samba for a DOS filesystem? Seems to me, someone saw behavior and exactly because it was open source, looked into it, found the coding error and filed a bug report. It will be fixed, because everyone now knows about this, and that too is a side effect of open source, even if it's related to the politics.
From the sounds of it, this was a bug that was not triggered very often. When it was finally triggered, investigated and fixed, the person who found it released the info publicly, thanks to the beauty of Open Source, and everyone affected, commercial entities and FOSS users using the code alike, benefited. If this were a proprietary system that was licensed out to various companies stricken by NDAs etc., it's quite likely that if one company discovered the bug the others would never learn about it.
You think this only happens in the open source world?
Let me show you what the "defect priority analysis" would look like at my work were we to receive a report about this bug:
Reproducible: Yes
Frequency of occurrence: Extremely low, only manifests in a very rare corner case.
Systems known to be impacted: None, systems that have noticed defect previously have already implemented a workaround.
Current known impact upon the functionality of the system: None
Systems currently using code where defect is present with no impact: All systems accessing a directory
Potential negative impact of an incorrect fix: Extremely high, potentially crippling filesystem traversal.
Proposed solution: Wait till people stop using DOS filesystems.
This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir. This is quite uncommon behaviour, and was incredibly uncommon 25 years ago when filesystems were much smaller and directories almost never contained enough files to require more than one or two disk blocks to store the directory.
When the Samba people found it, they decided to just code a work-around and not bother to report it to any of the BSD teams. If they had done, it would probably have been fixed in 22 years.
Now that it has been fixed in OpenBSD, the change can easily be taken and incorporated into FreeBSD, NetBSD, DragonFlyBSD and Darwin.
This bug has been around for a long time, but is only visible if you have large directories and delete files from them in between calls to readdir and seekdir.
But that's exactly my point, isn't it? The bug was only "visible" through its behavior, not its manifestation in code. The shallow bugs argument basically says that if enough people stare at the code, they will find the bugs. Clearly that did not happen here.
Whether the bug fix can propagate rapidly has nothing to do with what I'm talking about. I'm not trying to disparage the concept of open source, I'm arguing that the shallow-bugs argument should be rejected.
You see what you're looking for, most of the time. This sounds like a subtle bug that you're not going to find until you go looking for it; it's hard to invoke under normal usage patterns. Nobody stared at that code looking for this problem until now. But if it was closed source, the guy who fixed it wouldn't have been able to look at it and find the problem.
A quick googling of "many eyes make all bugs shallow" brings me the more complete statement that adage is simplified from: "Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone." (Linus via ESR). Clearly this 25-year-old bug is one of the exceptions that calls for the 'almost'.
Erm. That's not what "many eyes make bugs shallow" means. Well. Just reading the source is part of it, but not all. Fact is, if I run into odd behaviour when testing/using - if the source is available I can read it, I can breakpoint. I cannot do that with a binary.
So yes. Things did occur as they were supposed to. Someone found something odd, they were able to look at code in question, and fix it.
The shallowness is the fact that there is a direct connection between the thousands of testers/users and the code in question. Instant turnaround. No "user reports behaviour in detailed fashion, including testcase, to some corporate e-mail address, and maybe it eventually gets to a developer three layers down who may be able to figure it out and fix it if he has the time".
Wuh? thousands of projects that prove the point, and one single bug that doesn't, so reject the whole argument?
You sound like those people that don't like evolution. The concept of shallow bugs is an approximate description of how things work, not a methodology. Also, if enough people stare at the code and use it, they will find the bugs that matter.
I did point out that no other POSIX system behaved like that, but that didn't seem to make much difference:-). Eventually I just added a parameter that allowed our open directory cache to be turned off on *BSD. Once it got into the hands of Marc Balmer he took us seriously and fixed the bug.
This is like saying global warming either does exist because today was the hottest on record, or does not exist because today was the coldest on record. Why are these analogous? Because in both situations, you're only considering one data point, which does not even begin to indicate a trend.
True, but you misquoted the statement. The correct statement is not absolute. It reads, and I quote a guy called Linus:
"Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone."
I am sure you will agree that the correct statement, sans flamebait modifications, does not warrant a "clear contradiction" of the kind claimed by the many detractors of FOSS who are jumping at this opportunity to point out an example of a fixed bug that was not necessarily a security risk and say "see, the OSS model is clearly flawed! BSD has a 25-year-old bug that was only fixed now!"
Heck, how many other bugs have been fixed over the years?
These detracting arguments smack of FUD mongering...
This is touted as a slip up (or flaw) in the Open Source model of doing things, yet a proprietary software developer, and one of the largest mind you, failed to spot this completely.
It doesn't let you hide an executable. Something like ls, which iterates over the directory entries, will see the executable. It's only things which maintain an internal cache of directory entries which will have the problem. The 'fix' for Samba was to turn off directory caching and this made all of the files visible again.
1. I think this bug would be classified "archeological".
2. The question now is what happens to the Samba work-around patches. Now that the bug is fixed, do the patches cause a side-effect (i.e. "a new bug")?
3. This gives rise to a new meme of nerd insults. "You call yourself a programmer? Why I've fixed bugs older than you!" Of course, only one man is entitled to use that line.
more proof (Score:5, Funny)
Re:more proof (Score:4, Funny)
Re:more proof (Score:5, Interesting)
Nothing shady about it either; that's the beauty of BSD code.
the developers probably knew about it (Score:5, Funny)
Re:the developers probably knew about it (Score:5, Funny)
Such as dodging chairs.
See? SEE? (Score:5, Funny)
This is the power of Open Source!
With all those eyes looking at the code, stuff like this gets ID'd and fixed LICKITY SPLIT!
(runs and hides)
Re:See? SEE? (Score:5, Insightful)
In comparison, Microsoft has been around for what... 20 years? And who knows what bugs in Windows are there, lurking, just waiting to bite us?
Re:Old Code (Score:5, Informative)
Only 4 years ago.
Re:Old Code (Score:4, Informative)
analysis of the win2000 source code
Don't click that link!!! (Score:5, Insightful)
Trac (Score:5, Funny)
Re:Trac (Score:4, Funny)
Samba knew, but didn't pass it on? (Score:5, Insightful)
--Q
Re:Samba knew, but didn't pass it on? (Score:5, Interesting)
BSD is Dying! (Score:5, Funny)
Re:BSD is Dying! (Score:4, Funny)
After this long (Score:4, Funny)
Should it be fixed? (Score:5, Insightful)
Re:Should it be fixed? (Score:5, Informative)
Re:Should it be fixed? (Score:4, Interesting)
Long live the Code (Score:5, Interesting)
Here is why (Score:5, Insightful)
Another common old bug (Score:5, Interesting)
Need a new system call to really fix this. (Score:5, Interesting)
-Matt
Demonstration of the bug (Score:5, Insightful)
OpenBSD is good at fixing this kind of bug (Score:4, Insightful)
Re:Wait... Would you ever hit this? (Score:5, Funny)
Re:Wait... Would you ever hit this? (Score:5, Informative)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Interesting)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Informative)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Informative)
Re:Many eyes make bugs shallow... (Score:4, Interesting)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:5, Informative)
https://bugzilla.samba.org/show_bug.cgi?id=4715 [samba.org]
Jeremy.
Re:Many eyes make bugs shallow... (Score:5, Insightful)
Re:Many eyes make bugs shallow... (Score:4, Insightful)
Re:yeah, sure (Score:4, Insightful)
Re:Many eyes make bugs shallow... (Score:4, Insightful)
BSD has been checked over by 'quality' eyes--when it was used as the basis of NeXT/OSX, for example. They missed it too.
If the code wasn't open (i.e. if there weren't many eyes), this bug would have remained forever, or at least until the code was dumped.
Re:Many eyes make bugs shallow... (Score:4, Funny)
Re:Many eyes make bugs shallow... (Score:5, Funny)
Re:Many eyes make bugs shallow... (Score:4, Interesting)
I mean, a burglar can sue the owner of the property they're burgling for leaving it in a dangerous condition, so why not this too?
That's if it were true, of course.
Re:They actually do... (Score:5, Insightful)
Re:Now it's time for a little housekeeping (Score:5, Informative)
bug classification, side effects and Insults! (Score:5, Funny)