Hope For Fixing Longstanding Linux I/O Wait Bug
DaGoodBoy writes "There has been a long-standing performance bug in Linux since 2.6.18 that has been responsible for lagging interactivity and poor system performance across all architectures. It has been notoriously difficult to qualify and isolate, but in the last few days someone has finally gotten a repeatable test case! Turns out the problem may not even be disk related, since the test case triggers the bug merely by transferring data between two processes or threads. The test results are very revealing. The developer ran regression tests all the way back to version 2.6.15, and they show that this bug has more than doubled the time to run the test in 2.6.28. Many, many people working at improving the desktop performance of Linux will be very happy to see this bug die. I know that I, personally, will find a way to send the guy that found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!"
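For readers wondering what "transferring data between two processes or threads" looks like as a timing test, here is a minimal sketch. It is not the test case attached to the bug report. The function name `transfer_between_threads`, the 8 MiB payload and the 4 KiB chunk size are all made up for illustration. It pushes bytes through a pipe from a writer thread to a reader and reports the wall-clock time, which is the kind of number one could compare across kernel versions.

```python
import os
import threading
import time


def transfer_between_threads(total_bytes=8 * 1024 * 1024, chunk_size=4096):
    """Send total_bytes through a pipe from a writer thread to the caller.

    Returns (bytes_received, elapsed_seconds). Illustrative only: the sizes
    are arbitrary assumptions, not values from the kernel bug report.
    """
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                # os.write may write fewer bytes than requested, so count
                # what was actually written.
                written = os.write(write_fd, chunk[:min(chunk_size, remaining)])
                remaining -= written
        finally:
            # Closing the write end lets the reader see end-of-file.
            os.close(write_fd)

    thread = threading.Thread(target=writer)
    start = time.perf_counter()
    thread.start()

    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:  # empty read means the writer closed its end
            break
        received += len(data)

    thread.join()
    elapsed = time.perf_counter() - start
    os.close(read_fd)
    return received, elapsed


if __name__ == "__main__":
    count, seconds = transfer_between_threads()
    print(f"moved {count} bytes in {seconds:.4f} s")
```

No disk is involved, so any slowdown between kernels in a test of this shape would point away from the I/O layer, which is the point the summary is making.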
Re:Desktop??? (Score:4, Informative)
I believe the 1.5TB Seagate Linux hang has been fixed. We're using a lot of them (hundreds) where I work, on Ubuntu Hardy servers, and haven't had hangs.
this is bad even for /. (Score:5, Informative)
Wow, not just a bad summary, an utterly worthless summary. Here's the relevant discussion from LKML. Yes, this is all of it.
Peter Zijlstra
Andrew Morton
In http://bugzilla.kernel.org/show_bug.cgi?id=12309 [kernel.org] the reporters have
identified what appears to be a sched-related performance regression.
A fairly long-term one - post-2.6.18, perhaps.
Testcase code has been added today. Could someone please take a look
sometime?
There appear to be two different bug reports in there. One about iowait,
and one I'm not quite sure what it is about.
The second thing shows some numbers and a test case, but I fail to see
what the problem is with it.
This somewhat deflates the excitement evident in the OP. I mean, I know what he's talking about, these apparently random 1-2 second FREEZES while working, but if the guys on LKML aren't talking about it, it's probably not really being worked on.
Re:KTorrent (Score:2, Informative)
There was a bug in KTorrent that caused an infinite loop when UDP trackers were present in a torrent file; maybe check whether you have the latest version.
Re:Desktop??? (Score:5, Informative)
How about the Seagate 1500GB drive hang error? To my understanding Windows has been fixed, but the problem still persists in Linux.
The ST31500341AS requires a firmware update from Seagate to something newer than revision SD19 (more info [kernel.org]). In the meantime, if you're using a drive which hasn't been updated to fixed firmware, there's a blacklist in the current development kernel [ozlabs.org] to disable NCQ on affected models as a workaround.
Problem is Real (Score:5, Informative)
For what it is worth, the problem is real.
We have experienced massive negative effects with our MySQL server; downgrading to an earlier Linux kernel solves the problem. This has been very difficult to debug, as we never guessed that the OS would be a factor... we figured it had to be something we were doing. Only by chance did we try another distro / kernel, only to find that everything starts working fine when you downgrade.
whereis bugzilla.kernel.org .. (Score:3, Informative)
Re:Just upgrade (Score:1, Informative)
Yeah, and open source software is of the highest quality, never leaking any memory or having buffer overruns.
Cut the false dichotomy crap, kids.
Re:this is bad even for /. (Score:4, Informative)
"Many Ubuntu users, including me, have noticed that the latency of desktop operations got significantly larger around the time Gutsy was released, which coincides with the Completely Fair Scheduler and kernel upgrade from 2.6.18."
Uhh.. I didn't see anything in there about Completely Fair Queuing - you just mentioned the Completely Fair Scheduler, then kernel 2.6.18.
"Feisty had the 2.6.18 kernel and was quite responsive, so CFQ is in the clear. Gutsy featured 2.6.23 with CFS and was much slower which means it is a possible suspect."
This performance bug has been reported since 2.6.18.