
GCC 4.3.0 Exposes a Kernel Bug

ohxten sends news from earlier this month that GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels, such as Linux and BSD, that do not clear the direction flag before calling a signal handler, even though the ABI specification requires it.

  • Re:GCC is wrong (Score:5, Informative)

    by Anonymous Coward on Wednesday March 19, 2008 @12:42AM (#22792034)
    "Rule #1: Don't break existing stuff"

    GCC is in the business of creating new and better optimizations. It is pretty much impossible to make optimizations without assuming things about the ABI. As more and more of the ABI is assumed by the optimizations, people get away with fewer violations of it, but without assuming more, better optimizations wouldn't happen.

    Because the newest versions of GCC are necessary to improve the state of the art in C compiler optimizations in the open source world, the appropriate reaction is for the compiler people to follow the spec and assume the spec; if assuming the spec breaks something, the people affected by the breakage don't upgrade their compilers.

    This is why there are still people using GCC versions from the stone age.
  • Re:GCC is wrong (Score:5, Informative)

    by Anonymous Coward on Wednesday March 19, 2008 @12:57AM (#22792122)
    Check the BSD mailing lists for yourself; they are affected. I'll give you one example below:

    http://leaf.dragonflybsd.org/mailarchive/commits/2008-03/msg00072.html [dragonflybsd.org]

    Before flaming people next time, at least try and learn about what you're talking about.
  • by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @01:00AM (#22792138) Homepage
    When scanning strings for, say, a null terminator, the direction flag determines whether the pointer register (EDI for SCAS) gets incremented or decremented after each byte check. It could mean strlen returns 0 if your strings are packed together in memory, or it could just plain return the wrong result. Also, memory copy routines could copy the wrong part of memory to the wrong place and overwrite executable code (or just cause a page/segment fault).
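    To make the scan direction concrete, here is a minimal C model of a REPNE SCASB-style strlen. It is a sketch under stated assumptions: `sim_strlen` is a hypothetical name, the "CPU" is reduced to a `df` boolean, and a plain loop stands in for the real instruction, stepping the index by +1 or -1 the way EDI would move.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a REPNE SCASB-based strlen. Counts bytes until
 * a '\0' is found, moving the index forward when df is false and
 * backward when df is true, the way the direction flag moves EDI.
 * Like the real instruction, it has no bounds check: a '\0' must exist
 * in the direction being scanned. */
static size_t sim_strlen(const char *mem, size_t start, bool df)
{
    ptrdiff_t i = (ptrdiff_t)start;
    size_t count = 0;
    while (mem[i] != '\0') {
        count++;
        i += df ? -1 : 1;
    }
    return count;
}
```

    With the buffer `"x\0abc"` and a start at index 2, a forward scan counts the three letters, while a backward scan stops after one byte at the earlier null: a plausible-looking but wrong answer.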
  • by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @01:32AM (#22792290) Homepage
    In x86 (assumed from here on) assembly, there are some 'quick' string instructions to read, write, and test memory (LODS*, STOS*, and SCAS* respectively; MOVS* and CMPS* copy and compare). The CPU has registers, i.e. variables, that act as counters (ECX) or hold the memory addresses in question, in these cases a source position (ESI) and a destination position (EDI). When you perform these instructions, the address registers either increment or decrement (move forward or backward) depending on how the direction flag is set. GCC assumes the flag is clear, so the pointers increment, going forward after each step. If the direction flag is set when these string or memory functions are called, the pointers go backward, and the code copies (or scans) the wrong chunk of memory to the wrong destination.
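    The pointer update these instructions perform after each element can be written as a one-line rule. `string_op_step` is a hypothetical helper, not a real API; the widths of 1, 2, 4 and 8 bytes correspond to the B/W/D/Q instruction suffixes, with Q available only in 64-bit mode.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: how far ESI/EDI move after one LODS, STOS, SCAS,
 * MOVS or CMPS element. width is 1 (B), 2 (W), 4 (D) or 8 (Q, 64-bit
 * mode only); DF clear means forward, DF set means backward. */
static ptrdiff_t string_op_step(bool df, size_t width)
{
    return df ? -(ptrdiff_t)width : (ptrdiff_t)width;
}
```

    The same sign flip applies to every width, which is why the wider variants fail the same way as the byte version, just in bigger chunks.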

    Say our source memory contains:

    Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
    Contents: XXXXXXXXA car is heavy.-XXXXXXXX


    Let's pretend the hyphen is a null (the string terminator, or "stop", in C and many OS interfaces). If I perform a strlen on the string at position '8', it should return 15 characters, because it finds the null at 'N'. If the direction flag is wrong, it will not scan 8, 9, A, ... but 8, 7, 6, ... until it finally finds a null or crashes with an access violation.

    And with memory: I want to copy 5 bytes from '8' to position 'P'. If that works correctly, we get this in memory:

    Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
    Contents: XXX-!@#$A car is heavy.-XA carXX


    However, if the direction is wrong, we will get:

    Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
    Contents: XXX-!@#$A car is heav!@#$AXXXXXX


    See how '8' was copied to 'P' as expected, but then, decrementing, '7' goes to 'O', and so on.

    We now have corrupt memory. If we do a strlen, strcat or other null-expecting function on the string located at '8', we will see garbage where the memory copy wrote the wrong data to the wrong position. For the nitpickers: this example is per-byte; there are 16-, 32-, and 64-bit variants of these instructions that would cause similar problems, but in 2-, 4-, or 8-byte chunks.
  • by EkriirkE ( 1075937 ) on Wednesday March 19, 2008 @01:34AM (#22792306) Homepage
    Oops, the source memory was supposed to be (better aligned, too):

    Address: 0123456789ABCDEFGHIJKLMNOPQRSTUV
    Content: XXX-!@#$A car is heavy.-XXXXXXXX
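    The copy walk-through above can be reproduced with a small C model of REP MOVSB, using the corrected source layout. The names `sim_rep_movsb` and `label_to_offset` are hypothetical; the model copies one byte at a time from the given starting offsets and moves both indices in the direction chosen by `df`, which is how the example describes it (a memmove that deliberately sets DF would instead point ESI/EDI at the last bytes of the buffers first).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte-at-a-time model of REP MOVSB: copy n bytes from
 * offset src to offset dst inside mem. Both indices step by +1 when df
 * is false and by -1 when df is true. No bounds checking. */
static void sim_rep_movsb(char *mem, size_t src, size_t dst, size_t n, bool df)
{
    ptrdiff_t s = (ptrdiff_t)src;
    ptrdiff_t d = (ptrdiff_t)dst;
    const ptrdiff_t step = df ? -1 : 1;
    while (n-- > 0) {
        mem[d] = mem[s];
        s += step;
        d += step;
    }
}

/* Convert the example's address labels ('0'-'9', then 'A'-'V') to offsets. */
static size_t label_to_offset(char label)
{
    return label <= '9' ? (size_t)(label - '0') : (size_t)(label - 'A' + 10);
}
```

    Copying 5 bytes from '8' to 'P' with `df` false yields `XXX-!@#$A car is heavy.-XA carXX`; with `df` true it yields `XXX-!@#$A car is heav!@#$AXXXXXX`, matching the two results shown in the walk-through.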
  • Re:so what (Score:5, Informative)

    by RML ( 135014 ) on Wednesday March 19, 2008 @01:41AM (#22792346)
    You have read incorrectly. The bug occurs when applications compiled with the brand new GCC 4.3 are run on old kernels, regardless of what compiler was used to compile the kernel.
  • History repeating (Score:3, Informative)

    by Brett Johnson ( 649584 ) on Wednesday March 19, 2008 @02:50AM (#22792658)
    I seem to recall that MS-DOS 2.x suffered from this same problem with either the Int 21 or Int 13 interfaces. (Hey, it was 20 years ago; I don't remember the details.) If you made certain BDOS calls with the direction flag set, the message "A evird rorre etirw daeR" ("Read write error drive A" backwards) would be displayed on the console. It wasn't fixed for years. I remember we rigorously enforced the "clear the direction flag before calling into MS-DOS" rule.

  • by Alex Belits ( 437 ) * on Wednesday March 19, 2008 @03:27AM (#22792826) Homepage

    It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed.
    Please don't ever again offer your great insight into the software development process. If everything were stuffed into the kernel (or other software projects) as soon as it compiles and runs, we would drown in unstable, crashing, insecure, impossible-to-debug code. Without any doubt, there are plenty of geniuses (some of them in the Northwestern US) who develop in this manner, but I can assure you that neither the Linux kernel, nor GCC, glibc, or other major open source projects use this procedure. If you want to discuss this method further, I recommend sending your opinion to a friendly individual at djb@cr.yp.to.

    Before anything is released, people have to LOOK AT THE CODE and make sure that the source gives them a reason to think it will run correctly when used with the interfaces it is supposed to use or provide. There are plenty of things in the kernel that would require a massive amount of testing to verify with any certainty, so people write usable code not because they test it until their hardware breaks, but because they know what they are doing.

    Now it's entirely possible that the kernel developers never heard of this obscure nuance of the Intel processor. Then one day, the compiler changed, and with it, the assumptions changed. Mature code that has been declared good years ago seemingly breaks. Now it's easy to blame the code, but really this is a deletion of a feature from the compiler. Nevertheless, it exposes the fact that ultimately, no matter what tools we use and no matter how well we think our code through, you can only consider the code good once it runs and appears to do what it's supposed to.
    What the hell are you talking about?

    Code generated by a C compiler remains consistent regardless of the version, unless you mix binaries built with different versions of GCC. When the code the kernel uses to pass control to an application's signal handler does not leave the direction flag clear, as the ABI requires, then userspace code (ANY CODE THAT CONTAINS SIGNAL HANDLERS) compiled by the new compiler will not work correctly. In other words, the kernel provides an interface that is incompatible with binaries made by the new GCC, and since the standard is on the side of the new GCC behavior, it's the kernel that has to be changed. That's all. Nothing else is involved: some code compiled with the new compiler will not work on an old kernel. Code compiled with an old compiler remains usable with a new kernel, and no sources except for five lines in the kernel [lwn.net] have to be changed. It's not even something a C programmer has any control over unless he writes pieces of his program in assembly, and then he should know. I don't even believe it's possible for a C programmer who knows how to write a signal handler to have "never heard of this obscure nuance of the Intel processor" (both are very rarely used directly), but this is completely irrelevant: the only sources that have to be changed are five lines in the kernel, not signal handlers.
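    The kernel-side change described here boils down to clearing one bit in the flags the signal handler starts with. The sketch below is not the actual Linux patch: `struct saved_regs` and `prepare_handler_entry` are hypothetical stand-ins, and only the bit positions are architectural (DF is bit 10 of EFLAGS; bit 1 is reserved and always reads as 1).

```c
#include <stdint.h>

/* Architectural EFLAGS bits (Intel/AMD manuals). */
#define EFLAGS_RESERVED1 (UINT32_C(1) << 1)   /* always reads as 1 */
#define EFLAGS_DF        (UINT32_C(1) << 10)  /* direction flag */

/* Hypothetical stand-in for the register state a kernel prepares before
 * jumping to a user-space signal handler. */
struct saved_regs {
    uint32_t eflags;
    uint32_t eip;
};

/* Sketch of the fix: start the handler with DF clear, as the ABI
 * guarantees at function entry. The interrupted code's own EFLAGS
 * (DF included) would already be saved in the signal frame and put
 * back by sigreturn, so only the handler sees the cleared flag. */
static void prepare_handler_entry(struct saved_regs *regs, uint32_t handler_addr)
{
    regs->eflags &= ~EFLAGS_DF;
    regs->eip = handler_addr;
}
```

    Because the interrupted code gets its original flags back on return, clearing DF on handler entry does not change the behavior of code that had set DF on purpose.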

    The only real problem this "exposes" is that, for some reason, everyone who used the x86 SysV ABI for anything that matters (Linux and BSD) decided to change the interface to drop the requirement to clear the direction flag, even though the "official" standard said otherwise. However, this was known from the very beginning, which is why older C compilers took it into account in the first place. It's not a bug or someone's lack of knowledge; it's a violation of a standard, and the GCC developers decided to go back to the letter of the standard because the compiler's optimizations benefit from it.
  • Re:Kernel bug (Score:4, Informative)

    by clickety6 ( 141178 ) on Wednesday March 19, 2008 @04:04AM (#22792946)

    But not as good as a major screw-up or even a private error.
  • Re:so what (Score:3, Informative)

    by RupW ( 515653 ) * on Wednesday March 19, 2008 @06:11AM (#22793378)

    now that GCC isn't turning out broken binaries, old kernels will be unable to run them
    GCC never turned out broken binaries. It turned out overly conservative binaries that cleared the direction flag even when the ABI spec said it could assume the flag was already clear.
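    A toy C model of the difference described above, assuming a `struct cpu` that holds nothing but the direction flag. `fill_old_gcc` and `fill_new_gcc` are hypothetical names for "emit CLD first" versus "trust the ABI"; neither is actual GCC output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy CPU state: only the direction flag matters here. */
struct cpu {
    bool df;
};

/* Model of REP STOSB: store n copies of c starting at mem[start],
 * stepping forward or backward according to cpu->df. */
static void sim_rep_stosb(const struct cpu *cpu, char *mem, size_t start, size_t n, char c)
{
    ptrdiff_t i = (ptrdiff_t)start;
    while (n-- > 0) {
        mem[i] = c;
        i += cpu->df ? -1 : 1;
    }
}

/* Hypothetical "old GCC" pattern: CLD first, then the string op. */
static void fill_old_gcc(struct cpu *cpu, char *mem, size_t start, size_t n, char c)
{
    cpu->df = false; /* the equivalent of CLD */
    sim_rep_stosb(cpu, mem, start, n, c);
}

/* Hypothetical "GCC 4.3" pattern: trust the ABI and skip the CLD. */
static void fill_new_gcc(struct cpu *cpu, char *mem, size_t start, size_t n, char c)
{
    sim_rep_stosb(cpu, mem, start, n, c);
}
```

    If the interrupted code left DF set and the kernel did not clear it, filling 3 bytes at index 2 of `"......"` gives `..###.` with the old pattern but `###...` with the new one: the handler writes below its intended target.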
  • by faragon ( 789704 ) on Wednesday March 19, 2008 @06:26AM (#22793444) Homepage
    Some examples, from actual benchmarks (2 years old, but pretty much the same on K8 and Core 2 Duo):

    REPNE SCASD (look up an element in a sequential dword vector):

    Pentium II @300MHz: 133 MB/s (100MHz FSB, 100MHz SDRAM)
    Pentium IV @3GHz: 2.3 GB/s (800MHz FSB, 400MHz DDR SDRAM)

    256-bit unrolling (process 8 elements in a row):

    Pentium II @300MHz: 233 MB/s (100MHz FSB, 100MHz SDRAM)
    Pentium IV @3GHz: 3.3 GB/s (800MHz FSB, 400MHz DDR SDRAM)

    256-bit unrolling with SSE2 prefetch to increase data cache hits (process 8 elements in a row):

    Pentium II @300MHz: no SSE2 (100MHz FSB, 100MHz SDRAM)
    Pentium IV @3GHz: 4.0 GB/s (800MHz FSB, 400MHz DDR SDRAM)

    P.S. Both REP MOVSB and REP MOVSD are slow: the performance per clock is between 1/8 and 1/16 for the first and between 1/2 and 1/4 for the second. There is no reason to use these microcoded instructions other than backwards compatibility, and skipping unrolled and/or prefetched memcpy/memmove/scan variants just to save 16KB seems like nonsense to me.
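    For readers wondering what "256-bit unrolling" looks like, here is a minimal C sketch, assuming it means testing eight 32-bit elements per loop iteration. The function names are hypothetical, the prefetch variant is not shown, and nothing here reproduces the numbers above; it only illustrates the shape of the loop next to the simple one-element-per-step search that REPNE SCASD performs.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain loop: one dword per step, like REPNE SCASD.
 * Returns the index of the first element equal to key, or n if none. */
static size_t find_dword_simple(const uint32_t *v, size_t n, uint32_t key)
{
    size_t i = 0;
    while (i < n && v[i] != key)
        i++;
    return i;
}

/* "256-bit unrolling", read here as: test eight 32-bit elements per
 * iteration, then pin down the exact index (or finish the remaining
 * 0-7 elements) one at a time. Same return convention as above. */
static size_t find_dword_unrolled8(const uint32_t *v, size_t n, uint32_t key)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        if ((v[i] == key) | (v[i + 1] == key) | (v[i + 2] == key) | (v[i + 3] == key) |
            (v[i + 4] == key) | (v[i + 5] == key) | (v[i + 6] == key) | (v[i + 7] == key))
            break; /* the match is somewhere in v[i..i+7] */
    }
    for (; i < n; i++)
        if (v[i] == key)
            return i;
    return n;
}
```

    Using `|` instead of `||` between the comparisons lets the compiler evaluate all eight tests without a branch per element; the scalar tail loop then finds the exact position.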
  • by RupW ( 515653 ) * on Wednesday March 19, 2008 @06:31AM (#22793474)

    The rules of the road say that you should check that the car is in drive before setting out on your trip. The older version of GCC used to put the car into drive for you. But the new version lets you leave it in reverse if you don't check, making you exit through the rear wall of your garage.
    That's not quite right. In this case:
    • the rules of the road say that you can assume you'll find your car in drive
    • the old version of GCC used to always check anyway and put the car in drive for you; the new version just assumes the car is already in drive, because that's what the rules say.
    The problem comes when an affected kernel temporarily hands your car over to a signal handler; let's say a "parking valet". The valet doesn't bother checking that the car is in drive when he gets in, because the rules of the road say the kernel should have handed him the car in drive. In the past, GCC looked over his shoulder to make sure the kernel really had left the car in drive for him. But now no one checks for him, and he might accidentally crash your car.

  • Re:so what (Score:3, Informative)

    by petermgreen ( 876956 ) <plugwash@nOSpam.p10link.net> on Wednesday March 19, 2008 @06:33AM (#22793486) Homepage
    Well, AFAICT the Debian developers plan to modify gcc 4.3 so it behaves the old way, to reduce the risk of crashes when upgrading from one version of Debian to the next. Dunno if gcc upstream will agree with that reasoning, though. This isn't perfect either: even before gcc's behaviour changed, there was still a risk that a signal handler would break the code it interrupted.

    AFAICT this bug only affects a relatively small number of apps, because little code messes with the direction flag in the first place.
  • Re:so what (Score:4, Informative)

    by Eunuchswear ( 210685 ) on Wednesday March 19, 2008 @07:22AM (#22793702) Journal
    You never use memmove(3)?
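    memmove is the classic case where copying backward is the right thing to do: when the destination overlaps the end of the source, a forward copy would overwrite bytes before reading them. A simplified sketch, assuming a hypothetical `sketch_memmove` written with plain loops; an implementation built on string instructions could express the backward branch by setting DF for the copy and clearing it again afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified memmove. If dst starts inside [src, src + n), a forward
 * copy would clobber source bytes before reading them, so copy from the
 * last byte down; otherwise copy forward. Addresses are compared as
 * uintptr_t, which assumes a flat address space. */
static void *sketch_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    uintptr_t di = (uintptr_t)d;
    uintptr_t si = (uintptr_t)s;

    if (di > si && di < si + n) {
        while (n-- > 0)                  /* backward: the DF-set direction */
            d[n] = s[n];
    } else {
        for (size_t i = 0; i < n; i++)   /* forward: the DF-clear direction */
            d[i] = s[i];
    }
    return dst;
}
```

    Moving `"abc"` two bytes to the right inside `"abcdef"` produces `ababcf`, which a naive forward copy would get wrong (`ababaf`).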
  • Re:GCC is wrong (Score:3, Informative)

    by Eponymous Bastard ( 1143615 ) on Wednesday March 19, 2008 @09:37AM (#22794600)
    Windows does not have signal handlers natively (or actually, only a few, now that I google it: SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM). There are the SEH C-language exceptions, which take over some of the uses, but no other signals natively. So you won't write a signal handler that gets called on a timer.

    Full signals for GCC-compiled programs would be implemented by Cygwin, which should give you timer signals and so on. Since the standard way to upgrade GCC under Cygwin is to use the Cygwin upgrade/package manager, they can just make the new GCC package depend on an updated Cygwin DLL, which could clear the direction flag for you in a thunk before passing on the signal.

    Don't bother trying to compile GCC yourself under Cygwin; it's quite painful, or at least time-consuming. The slower process spawning made configure take an hour or more the last time I tried it, a few years ago. And then you have to wait for make bootstrap to finish.

    Then again, MS isn't notorious for following standards. If this does show up under Windows (say, when starting an SEH handler), they'll just say that's the Windows ABI and ignore it.

    Hell, it might even be different under Win98/XP/Vista, as they are different kernels.
  • by Corbet ( 5379 ) on Wednesday March 19, 2008 @09:45AM (#22794684) Homepage
    FWIW, I originally posted the subscriber link in question to reddit yesterday. I'm surprised to see it show up here, but I also don't mind that it has happened. I'd just as soon not see all LWN content on Slashdot as subscriber links (Slashdot readers probably agree), but this one has brought some attention and, I think, some subscribers. And that's where LWN content comes from in the first place.
