GCC 4.3.0 Exposes a Kernel Bug
ohxten sends news from earlier this month that GCC 4.3.0's new behavior of not clearing the direction flag before a string operation on x86 systems poses problems with kernels — such as Linux and BSD — that do not clear the direction flag before a signal handler is called, despite the ABI specification.
Yep, (Score:5, Funny)
Re:Yep, (Score:4, Funny)
And the answer is to.... use condoms?
And I thought we were here discussing bugs between GCC and LK.
Re: (Score:2, Funny)
Re: (Score:2, Funny)
Re: (Score:2)
Re: (Score:2)
so what (Score:5, Insightful)
Re:so what (Score:5, Insightful)
This problem has existed for 15 years; GCC has always emitted code that worked correctly on kernels that did not follow the ABI, until now.
Part of the problem is that there are an enormous number of installed kernels that are vulnerable to this problem, but only if GCC 4.3 is installed.
That's quite literally a fuckton of systems. So simply patching new kernels isn't going to make the problem go away.
Re:so what (Score:5, Insightful)
This bugfix is easily regressed, and has already been done.
If somebody wants to stick with a buggy kernel, they can use an older version of GCC. It's not like older stable ones put out horrible binary or anything (we need to exempt RH using 2.96, cause that was ages ago).
Re:so what (Score:5, Insightful)
Re: (Score:2)
Re: (Score:3, Informative)
Re:so what (Score:5, Informative)
mod parent up (Score:2)
Re: (Score:2)
I believe the problem happens if the kernel flips the direction flag: it will stay flipped when calling back to your application.
Re:so what (Score:4, Informative)
Re: (Score:2)
If I've read correctly, the bug only occurs when old kernels (without this newest little patch) are compiled with the brand new GCC 4.3.
You read wrongly, or more likely did not read at all. And nor did the moderators. The bug exists no matter what compiler was used to compile the kernel.
Re: (Score:3, Insightful)
Re: (Score:2, Funny)
Re:so what (Score:4, Funny)
Re:so what (Score:5, Funny)
NB The use of 'assload' without the 'metric' qualifier is discouraged, the customary US assload being a much greater mass.
Re:so what (Score:5, Interesting)
Seems to me the easy and correct thing to do would be to use deprecation. i.e. keep the old functionality for a bit longer and also patch or make the new kernels properly set the flag right now. This way, we move in the right direction and when it's no longer an issue then we drop the functionality in the compiler and rely on the kernel setting the flag like it's supposed to do.
Now, I see why the kernels have not been setting the flag. Why should they when the compiler was doing it? Time to set things right though... in the interests of portability with other environments and compilers. Having the kernels set the flag starting now would satisfy ABI compatibility with the other compilers, AND having gcc continue to cover the flag by default for a time would prevent breakage of a lot of existing code.
Seems like a no brainer to me. After all, isn't that what deprecation is for?
That's my take on it...
Re:so what (Score:5, Interesting)
Interesting was:
Re:so what (Score:5, Funny)
Re: (Score:2)
That's quite literally a fuckton of systems. So simply patching new kernels isn't going to make the problem go away.
Other compilers, like ICC from Intel, do not set the flag. That's quite literally a fuckton of binaries already out in the wild. So simply patching GCC isn't going to make the problem go away either.
The problem is in the kernel, and GCC cannot solve that. This problem will exist whether GCC adds an ugly hack or not. Even if GCC had never changed their behavior, this would still be a problem for other compilers.
Re: (Score:3, Interesting)
Just sayin'.
Read this -- http://cm.bell-labs.com/who/ken/trust.html [bell-labs.com]
Re: (Score:3, Informative)
Afaict this bug only affects a relatively small number of apps because little code messes with the direction flag in the first place
Re: (Score:2)
Re:so what (Score:5, Insightful)
It's related to how GCC assumes the kernel sets the state of a flag before calling a function (a signal handler), and it affects userland applications compiled with the newer GCC (4.3.0).
I don't recall the gory details, on Sid with the latest (of today) version of libc6, SBCL exposes the bug (crashes). There aren't big differences between libc 2.7-8 and 2.7-9, but the second was compiled with the newer GCC. Kudos to Aurelien Jarno, a Debian developer, who isolated the bug and pushed a patch upstream. http://lkml.org/lkml/2008/3/5/207 [lkml.org]
Re: (Score:2)
Re: (Score:3, Insightful)
Re: (Score:2, Interesting)
Yes, probably a single line of code might fix it. (And I won't even call it a bug.)
But before getting over this, I want to say kudos to gcc developers who have taken care to warn about this.
Kernel bug (Score:5, Funny)
Re:Kernel bug (Score:4, Informative)
but not as good as a major screw-up or even a private error
EVERYBODY PANIC!!! (Score:5, Funny)
Oh my GOD! If this is true, that means- that means-- it... the-
Uh, what does it mean exactly?
Re:EVERYBODY PANIC!!! (Score:5, Informative)
Re:EVERYBODY PANIC!!! (Score:5, Funny)
Re:EVERYBODY PANIC!!! (Score:5, Informative)
Say our source memory contains:
Let's pretend the hyphen is a null (the string terminator or "stop" in most languages and OSes). If I want to perform a strlen on that string at position '8', it should return 15 characters because it found the null at 'N'. If the direction flag is wrong, it will not scan 8, 9, A,
And with memory, I want to copy 5 bytes from '8' to position 'P'. If that works correctly, we get this in memory:
However, if the direction is wrong, we will get:
See how '8' copied to 'P' as expected, but decrementing we then get '7' to 'O', etc.
We now have corrupt memory. If we do a strlen, strcat or other null-expecting function on that string located at '8', we will see garbage where the memory copy wrote the wrong data to the wrong position. For the nitpickers: this example used per-byte operations; there are 16, 32, and 64 bit variants of the functions that would cause similar problems, but in 2, 4, and 8 byte chunks.
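A rough way to see the effect described above, sketched in portable C rather than assembly. The function name `rep_movsb_model` and its parameters are made up for illustration; it only imitates how a byte-wise string move steps its source and destination indices up or down depending on the direction flag, and is not how any compiler or libc actually implements a copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a byte-wise string move inside one buffer.
 * df == 0 steps both indices upward (the state after CLD),
 * df != 0 steps both indices downward (the state after STD). */
void rep_movsb_model(char *mem, size_t mem_len,
                     size_t dst_idx, size_t src_idx,
                     size_t count, int df)
{
    while (count-- > 0) {
        /* Stay inside the modeled memory; real hardware has no such check. */
        assert(src_idx < mem_len && dst_idx < mem_len);
        mem[dst_idx] = mem[src_idx];
        if (df) {
            src_idx--;
            dst_idx--;
        } else {
            src_idx++;
            dst_idx++;
        }
    }
}
```

Starting from `0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ`, copying 5 bytes from index 8 ('8') to index 25 ('P') with `df = 0` overwrites `PQRST` with `89ABC`. With `df = 1` the same call overwrites `LMNOP` with `45678` instead, which is the kind of corruption the example above describes.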
Re: (Score:2, Informative)
Re: (Score:2)
I wonder if anyone still actually uses the old LODS/STOS/MOVS/CMPS instructions, and these are the only instructions affected by the direction flag. As far as I can tell, on modern x86 systems they are significantly slower than the equivalent multi-instruction versions that read/write/compare via register indirection, i.e. RISC-style code, and they are even slower yet than using MMX or SSE instructions to copy data, if they are available. I don't think that compilers are smart enough to use, say, a MOVSD i
Re: (Score:2)
Re: (Score:2)
Re:EVERYBODY PANIC!!! (Score:4, Informative)
REPNE SCASD: (search for an element in a sequential dword vector)
Pentium II @300MHz: 133 MB/s (100MHz FSB, 100MHz SDRAM)
Pentium IV @3GHz: 2.3 GB/s (800MHz FSB, 400MHz DDR SDRAM)
256-bit unrolling: (process 8 elements in a row)
Pentium II @300MHz: 233MB/s (100MHz FSB, 100MHz SDRAM)
Pentium IV @3GHz: 3.3 GB/s (800MHz FSB, 400MHz DDR SDRAM)
256-bit unrolling w/ SSE2 prefetch to increase data cache hit: (process 8 elements in a row)
Pentium II @300MHz: -no SSE2- (100MHz FSB, 100MHz SDRAM)
Pentium IV @3GHz: 4.0 GB/s (800MHz FSB, 400MHz DDR SDRAM)
P.S. Both REP MOVSB and REP MOVSD are slow: the performance per clock is between 1/8 and 1/16 in the first and between 1/2 and 1/4 in the second. There is no reason for using the microcoded instructions other than backwards compatibility, and it seems nonsense to me to save 16KB instead of writing unrolled and/or prefetched memcpy/memmove/scan variants.
Random DF value (Score:2)
Re:EVERYBODY PANIC!!! (Score:5, Insightful)
Re:EVERYBODY PANIC!!! (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
BTW I believe the intention of Debian is to attack this problem from all sides. Afaict SBCL is being changed to keep the direction flag set for as short a time as possible, gcc is being changed to return to the older, less likely to fail behaviour, and Linux is being changed to do what it should have done in the first place.
What this really exposes... (Score:2, Insightful)
What this really exposes is not a bug in any kernel. Indeed, the story states that the "bug" exists in both the BSD and Linux kernels. It really exposes something fascinating about the development process: Code is written based on certain assumptions and a working theory of how the code will function once put into use, but the only way to really know how well it works is to hand it over to the ultimate judge of code correctness--the computer--by running the code. If it works, case closed. Now it's entirely
Re:What this really exposes... (Score:5, Informative)
Before anything is released, people have to LOOK AT THE CODE and make sure that the source gives them a reason to think it will run correctly when used with the interfaces that it is supposed to utilize or provide. There are plenty of things in the kernel that would require a massive amount of testing to be verified with any certainty, so people write usable code not because they test it until their hardware breaks but because they know what they are doing.
Code generated by a C compiler remains consistent regardless of the version, unless you mix binaries built with different versions of GCC. When the code that the kernel uses to pass control to applications' signal handlers does not leave the direction flag in the state the ABI requires, then userspace code -- ANY CODE THAT CONTAINS SIGNAL HANDLERS -- compiled by a new compiler will not work correctly. In other words, the kernel provides an interface that is incompatible with binaries made by a new GCC, and since the standard is on the side of the new GCC behavior, it's the kernel that has to be changed. That's all. Nothing else is involved -- some code compiled with a new compiler will not work on an old kernel. Code compiled with an old compiler remains usable with a new kernel; no sources except for five lines in the kernel [lwn.net] have to be changed. It's not even something that a C programmer has any control over unless he writes pieces of his program in assembly -- and then he should know. I don't even believe that a C programmer who knows how to write a signal handler could have "never heard of this obscure nuance of the Intel processor" -- both are very rarely used directly -- however this is completely irrelevant: the only sources that have to be changed are five lines in the kernel, not in signal handlers.
The only real problem this "exposes" is that for some reason everyone who used the x86 SysV ABI for anything that matters (Linux and BSD) decided to change the interface to exclude the requirement to clear the direction flag, even though the "official" standard said otherwise -- however, this was known from the very beginning, and this is why older C compilers took it into account in the first place. It's not a bug or someone's lack of knowledge, it's a violation of a standard, and GCC developers decided to get things back to the letter of the standard because the compiler's optimization benefits from it.
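The failure mode this comment describes (a signal handler inheriting whatever direction flag the interrupted code had set) can be modeled in plain C. Everything below is a hypothetical sketch: `cpu_model`, `handler_copy`, and `deliver_signal` are invented names, the copy offsets are arbitrary, and the save/restore of the flag around delivery is a simplifying assumption, not a description of any real kernel's signal path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the one piece of CPU state that matters here. */
typedef struct {
    int df; /* modeled direction flag: 0 = up, 1 = down */
} cpu_model;

/* Stand-in for a handler's string copy as emitted by a compiler that
 * trusts the ABI: no CLD first, it simply uses whatever DF holds. */
static void handler_copy(const cpu_model *cpu, char *mem, size_t mem_len,
                         size_t dst, size_t src, size_t n)
{
    while (n-- > 0) {
        assert(dst < mem_len && src < mem_len);
        mem[dst] = mem[src];
        if (cpu->df) {
            dst--;
            src--;
        } else {
            dst++;
            src++;
        }
    }
}

/* Stand-in for signal delivery. kernel_clears_df selects between the
 * ABI-conforming behavior and the behavior discussed in this thread. */
void deliver_signal(cpu_model *cpu, int kernel_clears_df,
                    char *mem, size_t mem_len)
{
    int saved_df = cpu->df;   /* assumed: interrupted state is saved */
    if (kernel_clears_df)
        cpu->df = 0;          /* what the ABI expects on handler entry */
    handler_copy(cpu, mem, mem_len, 12, 8, 3); /* handler copies 3 bytes */
    cpu->df = saved_df;       /* assumed: restored on handler return */
}
```

If the interrupted code had set the flag (`df = 1`) and the modeled kernel does not clear it, the handler's copy walks downward and writes over indices 10 to 12. With `kernel_clears_df` nonzero the same handler writes indices 12 to 14 as intended, and the handler's own code is identical in both cases.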
Re: (Score:2)
Re: (Score:2)
Far from being an obscure nuance, CLD and STD are just ordinary instructions which tell the processor which direction the next SCAS, LODS or STOS instruction must go. They are explained very early on in most assembly tutorials that I've come across.
A kernel developer who's never heard of the processor's direction flag has no business writing kernel code.
Re: (Score:2, Funny)
Re: (Score:2)
In summary, it's a bug in the ABI documentation; apparently the direction flag must be considered undefined in this case. Fixing the documentation won't break any current code.
That is wrong. There are by definition no bugs in the ABI documentation: The ABI is what the ABI documentation says. The ABI says the direction flag is always cleared on entry of any function. Since it is the OS that calls signal handlers, it is the duty of the OS to make sure that everything is set up according to the ABI when a signal handler is called. The OS doesn't do that, so it is a bug in the OS.
A bit off topic, that is one of the major problems with OOXML. Some people think that OOXML creates a
[LWN subscriber-only content] (Score:4, Insightful)
Re: (Score:3, Insightful)
Alternatively it's a good way to get additional exposure for LWN, as clearly this article is of some value. Maybe 0.0001% of slashdot readers will subscribe because of this.
Besides, we're all friends here, aren't we?
Re:[LWN subscriber-only content] (Score:4, Informative)
Re:[LWN subscriber-only content] (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:3, Funny)
History repeating (Score:3, Informative)
What about other compilers? (Score:2)
Re: (Score:2)
(There is a list of 'other compilers' in there somewhere)
Yes. And? (Score:2)
Of course, that's not very helpful if you depend on closed-source software and the vendor won't tell you what compiler they use. Neither is it particularly helpful if you run Gentoo (which sooner or later will expect you to upgrade compiler) or if you're
Re: (Score:2)
Re: (Score:2)
That's no GNU'd! (Score:2, Insightful)
Re: (Score:2)
I fixed this bug in 1989 too (Score:3, Interesting)
Re: (Score:3, Funny)
I fixed this bug in 1989 in an Intel C compiler. That was some years before the GCC project was started. Some people never learn...
From http://en.wikipedia.org/wiki/GNU_Compiler_Collection [wikipedia.org]:
Perhaps the error in your assertion is a side effect of an uncleared direction flag.
Assembler code (Score:2, Interesting)
Re: (Score:2)
Re: (Score:2)
TGIOS (thank God it's Open Source) (Score:2)
- regs->flags &= ~(X86_EFLAGS_TF);
+ regs->flags &= ~(X86_EFLAGS_TF | X86_EFLAGS_DF);
make
done.
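For anyone wondering what that one-line change does to the saved flags word, here is a small standalone sketch. The two constants are the architectural EFLAGS bit positions (TF is bit 8, DF is bit 10); the function names `handler_flags_old` and `handler_flags_new` are invented for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_TF 0x00000100u /* trap flag, bit 8 */
#define X86_EFLAGS_DF 0x00000400u /* direction flag, bit 10 */

/* Before the patch: only the trap flag is cleared for the handler. */
uint32_t handler_flags_old(uint32_t flags)
{
    return flags & ~X86_EFLAGS_TF;
}

/* After the patch: trap flag and direction flag are both cleared. */
uint32_t handler_flags_new(uint32_t flags)
{
    uint32_t out = flags & ~(X86_EFLAGS_TF | X86_EFLAGS_DF);
    assert((out & X86_EFLAGS_DF) == 0); /* handler always starts with DF = 0 */
    return out;
}
```

With an interrupted flags value of `0x502` (DF and TF set, plus the always-set bit 1), the old mask leaves `0x402`, so the handler starts with DF still set, while the new mask leaves `0x002`.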
Oh No (Score:3, Funny)
Re:GCC is wrong (Score:5, Insightful)
The ABI wasn't being followed correctly, hence GCC, Linux and the BSD kernels were already broken.
"GCC breaks this cardinal rule. It should be reverted."
It is not a wise idea to revert corrections to long standing issues.
Re: (Score:2, Insightful)
Telling your age (Score:2, Funny)
1991 was a long time ago. Linux is old.
Re:GCC is wrong (Score:5, Informative)
http://leaf.dragonflybsd.org/mailarchive/commits/2008-03/msg00072.html [dragonflybsd.org]
Before flaming people next time, at least try and learn about what you're talking about.
Re:GCC is wrong (Score:4, Interesting)
Silly question time...
If this managed to affect both Linux and BSD despite no relevant common code, is Windows affected? I'm guessing OSX is, thanks to its BSD heritage. Has anyone tested either of them, though? How about other OSes?
Re: (Score:3, Informative)
Full signals for GCC-compiled programs would be implemented by Cygwin which should give you timer signals and so on. Since the standard way to upgrade GCC under cygwin is to use the cygwin upgr
Re: (Score:2)
Re:GCC is wrong (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2)
"So, we ARE going to get on GCC's case, right? For breaking compatibility with millions of systems, just like Microsoft intentionally broke Firefox, Opera, and Safari?"
Standards compliance is generally a good drum to bang, but what's REALLY important is what you're breaking. It seems to me GCC has a fix in search of a problem. If they really want to meet the standard here, I think it would be reasonable to request the fix from the broken kernels and
Re: (Score:2)
Re: (Score:3, Interesting)
2) Perhaps you haven't gotten the news, but IE8 is doing the right thing too, by using their "less broken" mode by default. This is a switch from what they announced earlier, where you would have to opt-in to better standards compliance.
3) The difference between IE and gcc is that IE is broken and gcc is not. Clearing the DF does not break standards in an
Re: (Score:3, Funny)
Re: (Score:3, Interesting)
I think Windows Vista is a good example of what happens when you try to maintain backwards compatibility to the assorted bugs and mis-designs of decades. See the various Vista articles on
Re:GCC is wrong (Score:5, Informative)
GCC is in the business of creating new and better optimizations. It is pretty much impossible to make optimizations without assuming things in the ABI. As more and more stuff from the ABI is assumed in the optimizations, people get away with fewer violations of the ABI, but without assuming more stuff, faster optimizations wouldn't happen.
Because the newest versions of GCC are necessary to improve the state of the art in C compiler optimizations in the open source world, the appropriate reaction to this is to have the compiler people follow the spec, and assume the spec, and if assuming the spec breaks something, the people affected by the breakage don't upgrade their compilers.
This is why there are still people using GCC versions from the stone age.
Re: (Score:2)
Re:GCC is wrong (Score:4, Funny)
I suppose the only barrier to this optimization would be the political effort needed to get everyone to agree to change the ABI.
Re:GCC is wrong (Score:4, Insightful)
Using that logic Microsoft shouldn't try to improve security in Windows since it breaks many third party applications that depend on exploits and other silly behavior to function.
Re: (Score:3, Interesting)
No privilege escalation, only DOS.
Re: (Score:3, Interesting)
Actually, no. Two threads will work just fine, because the state of the CPU in its entirety (all flags) is saved and restored when switching between them - indeed, if it wasn't, simply clearing the flag before using it wouldn't help any, because a task switch can occur between any two instructions, including the one clearing the flag and the one immediately following, which makes
Re: (Score:2)
Brings linux down - I don't think so (Score:2)
Re: (Score:2)
Re: (Score:2)
It affects (breaks) some applications running under Linux or the BSDs but it kills Hurd directly.
Re: (Score:2)
If it's in a regular program, all flags remain consistent regardless of what kernel does, because flags are initialized when the process starts, saved when the process sleeps, and restored when the process is awakened from sleep, so continuity is preserved. The whole problem with signal handlers is that they may be called while the process had the flag set, so they inherit the flag,