Hope For Fixing Longstanding Linux I/O Wait Bug
DaGoodBoy writes "There has been a long-standing performance bug in Linux since 2.6.18 that has been responsible for lagging interactivity and poor system performance across all architectures. It has been notoriously difficult to qualify and isolate, but in the last few days someone has finally come up with a repeatable test case! It turns out the problem may not even be disk related, since the test case triggers the bug just by transferring data between two processes or two threads. The test results are very revealing. The developer ran regressions all the way back to version 2.6.15, showing that by 2.6.28 this bug had more than doubled the time it takes to run the test. Many, many people working on improving the desktop performance of Linux will be very happy to see this bug die. I know that I, personally, will find a way to send the guy who found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!"
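The summary doesn't include the test case itself. As a rough illustration of what "transferring data between two threads" looks like as a benchmark, here is a hypothetical Python sketch (not the reporter's code; the function name and sizes are assumptions) that pushes a fixed amount of data through a pipe from one thread to another and times it. Running the same script on different kernels is one way to compare them.

```python
import os
import threading
import time


def time_pipe_transfer(total_bytes=64 * 1024 * 1024, chunk_size=64 * 1024):
    """Hypothetical sketch: move total_bytes through an OS pipe from a writer
    thread to the calling (reader) thread and return (bytes_received, seconds).

    This is NOT the test case attached to the kernel bug report."""
    read_fd, write_fd = os.pipe()
    chunk = b"x" * chunk_size

    def writer():
        remaining = total_bytes
        try:
            while remaining > 0:
                n = min(chunk_size, remaining)
                # os.write may write fewer bytes than requested; loop until done
                remaining -= os.write(write_fd, chunk[:n])
        finally:
            os.close(write_fd)  # closing the write end signals EOF to the reader

    start = time.perf_counter()
    t = threading.Thread(target=writer)
    t.start()
    received = 0
    try:
        while True:
            data = os.read(read_fd, chunk_size)
            if not data:  # EOF: the writer has finished and closed its end
                break
            received += len(data)
    finally:
        os.close(read_fd)
        t.join()
    return received, time.perf_counter() - start
```

For example, `received, secs = time_pipe_transfer()` followed by `print(received / secs / 1e6, "MB/s")` gives a throughput figure; it measures pipe and scheduler overhead only, with no disk involved.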
Dang!! (Score:5, Funny)
Re: (Score:3, Funny)
Re: (Score:2, Funny)
Oh, god, I can't read Slashdot commentary and drink fluids at the same time. I never know when something is going to be really funny, and I just found out what happens when I stumble across something hilarious while chugging a bottle of water.
Re: (Score:2)
Re: (Score:2)
I never learn better, I just coughed on my soda in front of a room full of strangers. God, I'm such a loser.
Re: (Score:3, Insightful)
It was funny to me
Is this bug currently affecting .... (Score:5, Funny)
Re:Is this bug currently affecting .... (Score:4, Funny)
Re: (Score:3, Insightful)
Yes, by spreading the word and asking people to go look into fixes we crashed the bug tracker so nobody doing kernel development can file new bugs or new bug fixes for anything else today.
Awesome plan. Really awesome.
Re:Is this bug currently affecting .... (Score:4, Funny)
Given enough eyeballs, all bug tracking software is fragile
Free testing! (Score:2)
-B
KTorrent (Score:2, Interesting)
I'm not sure if this is related, but has anyone else noticed KTorrent can really bog your system down without showing any excessive resource usage in KSysGuard? For all I know, it may be passing information between one thread and another, and it's disk I/O intensive.
Re: (Score:2, Informative)
There was a bug in KTorrent that caused an infinite loop when UDP trackers were present in a torrent file; maybe check that you have the latest version.
Re: (Score:2)
Re: (Score:2)
KTorrent isn't great on resources, but I find nice and ionice can stop it from slowing down desktop performance. I often wonder why the currently active program isn't given a nice boost, so I don't need to remember to nice the background programs (torrent, email, IRC, etc.).
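For a program you're about to launch, both tools can simply be chained in front of the command. A minimal Python sketch, assuming the coreutils `nice` and util-linux `ionice` binaries are on the PATH (the helper name `background_command` is made up for illustration):

```python
def background_command(cmd, niceness=19, io_class=3):
    """Hypothetical helper: prefix cmd with `nice` and `ionice` so it runs at
    low CPU priority and, by default, in the idle I/O class (ionice -c 3).

    The idle I/O class is only honoured by I/O schedulers that implement
    I/O priorities, such as CFQ."""
    if not 0 <= niceness <= 19:
        raise ValueError("niceness for a background job should be 0..19")
    if io_class not in (0, 1, 2, 3):
        raise ValueError("ionice class: 0 none, 1 realtime, 2 best-effort, 3 idle")
    return ["nice", "-n", str(niceness), "ionice", "-c", str(io_class), *cmd]
```

Usage would be something like `subprocess.run(background_command(["ktorrent"]))`. An already-running process can instead be adjusted with `renice` and `ionice -p <pid>`.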
Re: (Score:2)
You use pulseaudio?
Re: (Score:2)
Longstanding...Since 2.6.18 (Score:4, Interesting)
Been waiting all of 2 years and change for your precious bug fix, 'ave you? You almost had my eyes tearing up there I tell ya: 25 Year Old BSD Bug [slashdot.org].
Re: (Score:2)
25 Year Old BSD Bug
I guess this is the part where I say I don't believe you.
25 years... That bug was older than Linux (or me).
Re: (Score:2)
BSD != Linux
Re: (Score:2)
Desktop??? (Score:5, Insightful)
I'm not sure about anybody else here, but I was surprised to see that they mentioned that this will benefit 'Desktop' users.
I think that when it comes to the performance spectrum, servers would be where this fix is most needed. Admittedly, if you are running a solid server, you should know to use older-generation hardware and software that has been proven to be stable. However, some of this 'shiny new' tech coming out is appealing.
How about the Seagate 1500GB drive hang error? To my understanding it has been fixed on Windows, but the problem still persists in Linux. Could this potentially make a difference? I've been looking to build myself a nice NAS, and those 1500GB drives are _cheap_. I can pick one up for about $160. I remember not too long ago that could only get me 80GB.
Re: (Score:1, Interesting)
I think that when it comes to the performance spectrum, Servers would be where this fix is the most needed.
Nope, read the bug. Throughput remains ok, it's the interactivity that suffers.
This is one of those bugs that no Linux developer will admit to until they reckon they have a fix for it. Then we're supposed to be happy, even though people have been complaining about it for years. Oh well, beggars can't be choosers.
I've been ionice'ing my backups and a few other tasks because of this issue, so it'll be n
Re: (Score:3, Interesting)
Re: (Score:2)
I've been ionice'ing my backups
Ionizing? That can't be good for magnetic storage!
Re: (Score:2)
This is one of those bugs that no Linux developer will admit to until they reckon they have a fix for it.
I suppose we see a lot of this in the OSS world; it reminds me of the Firefox "not a memory leak" bug that only became a bug once it had been fixed.
It's just developer pride showing its lesser side.
Re:Desktop??? (Score:4, Informative)
I believe the 1.5TB Seagate Linux hang has been fixed. We're using a lot of them (hundreds) on Ubuntu Hardy servers where I work and haven't had hangs.
Re: (Score:2)
I remember when that would buy you 60 megabytes! (Hell, I remember when ONE meg drives cost eight times that.)
If you're running a solid server, you know that mechanical devices are (a) slow, and (b) most under strain when doing anything useful, so you tend to avoid using them when at all possible. Servers should do as much as possible via a RAM-based cache -or- use a RAM disk for data that copies to the hard drive only when necessary.
(So long as RAM is battery-backed, even if the machine crashes or the powe
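Keeping writes in RAM and copying them to the drive "only when necessary" is essentially a write-back cache. A toy Python sketch of the idea, assuming a plain dict stands in for the disk (the class is hypothetical and is not how the Linux page cache is implemented):

```python
class WriteBackCache:
    """Toy write-back cache: writes land in RAM and are marked dirty; they
    reach the backing store only when flush() is called. A real cache would
    also bound its size and write back dirty entries on eviction."""

    def __init__(self, backing):
        self.backing = backing  # dict-like stand-in for the slow disk
        self.cache = {}
        self.dirty = set()

    def read(self, key):
        # Serve from RAM if possible; otherwise load from the backing store once.
        if key not in self.cache:
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def write(self, key, value):
        # Only RAM is touched here; the disk is updated later by flush().
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        # Copy every dirty entry to the backing store; return how many were written.
        for key in self.dirty:
            self.backing[key] = self.cache[key]
        written = len(self.dirty)
        self.dirty.clear()
        return written
```

The trade-off the comment alludes to is visible here: anything not yet flushed is lost if the RAM loses power, hence the battery backing.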
Re: (Score:3, Interesting)
Disk-to-disk operations would then bypass the kernel and asynchronous I/O would consume no primary resources. This was fashionable on some systems (most notably drives that used the IEEE 488 bus) in the 70s and was done to some degree with SCSI, but there's really no excuse for not providing such a capability on any modern drive.
I bought that line, hook, line and sinker, in the late '90s with a bunch of IBM 9ES ultra-wide SCSI disks and a good controller.
It never was clear to me that, at any time, Linux was
Re: (Score:2)
IIRC it was proposed on LKML; however, it would still need to use the SCSI bus, which is where the majority of the time is spent anyway. Also, nothing else had tried to do that, so everyone was worried that it would be turned on and have weird failure cases (which would be very bad).
Re: (Score:2)
Wait. So. You mean: Nobody has ever done direct disk-to-disk SCSI transfers in a commodity OS[1]? I can't say I'm surprised, but I am a little offended[2].
[1]: I'm sure that, somewhere, there has been at least one embedded or special-built system which accomplished this. This, obviously, doesn't count.
[2]: I bought the big, fast SCSI disks because I needed big, and fast. But it would've been tres cool if copying would've been more efficient. Not that it ever much mattered, as you imply, but the con
Re: (Score:2)
Servers should do as much as possible via a RAM-based cache
Right. RAM is C-H-E-A-P
use a RAM disk for data that copies to the hard drive only when necessary.
Wrong. It means you know more about a dynamic system than the kernel.
Re: (Score:3, Insightful)
The cost of RAM is not that great compared to the cost of a high-end motherboard on a good server, and is absolutely insignificant compared to even a single hour of downtime in any kind of datacentre. If you want genuine five-nines (5N) reliability or better (and you can go a lot better than that), you want as little strain on mechanical components as you can get. There's little point in, say, using Carrier-Grade Linux if the practical lifetime of the hard drive due to usage means your hardware cannot maintain a comp
Re:Desktop??? (Score:4, Funny)
HAL-9K intelligence doesn't pose any problems to the data - it's the *operators* that need to be concerned, especially when giving the system instructions that could potentially conflict with each other.
Re: (Score:2)
Ah yes ... I hear the HAL-9K unit is especially prone to operator overload errors.
It's also prone to occasional operator overboard errors.
Re: (Score:2)
It's not quite the same, but there are caching SCSI (and now probably SATA) controllers that take RAM modules. I used to have a Vesa Local Bus (VLB) SCSI 2 caching controller. The system had 32 megabytes of RAM and would take up to 64, while I had 16 MB in the disk controller and it'd take up to 32 MB. I gave (well, lent, but it failed about 6 years later and I hadn't asked for it back yet) that controller to a friend for a household file server he built out of old parts. He had the full 32MB of cache RAM i
Re:Desktop??? (Score:5, Informative)
How about the Seagate 1500GB drive hang error? To my understanding Windows has been fixed, but the problem still persists in Linux.
The ST31500341AS requires a firmware update from Seagate to something newer than revision SD19 (more info [kernel.org]). In the meantime, if you're using a drive which hasn't been updated to fixed firmware, there's a blacklist in the current development kernel [ozlabs.org] to disable NCQ on affected models as a workaround.
Re: (Score:3, Interesting)
I'm not sure about anybody else here, but I was surprised to see that they mentioned that this will benefit 'Desktop' users.
They mentioned it because it does hit the desktop: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094 [launchpad.net]
Re: (Score:2)
There is this strange delusion that somehow Linux will become an excellent desktop platform for everyone.
Meanwhile, most of the development doesn't go in that direction, so as not to offend the people who want it for a server.
It makes a good server.
A great development workstation.
Great for appliance applications.
Good for high-end embedded stuff.
But as a desktop/laptop system for the average Joe, I doubt it will ever make it. It is just too geeky and quirky (in terms of UI, not stability) for most people to use.
It
Re: (Score:2)
I don't think any of those things are difficult to do with the exception of syncing devices. This is changing though. As far as "messing" with photo
Re: (Score:2)
The kernel folks aren't concerned so much with the desktop, because from the kernel space the desktop usually needs about the same things as a server. High speed, low latency kernel calls are good. The desktop is mostly about applications and user interfaces.
Re: (Score:2)
That Seagate 1.5TB drive problem has, according to a friend of mine who foolishly jumped at buying a large handful of said disks for his data storage needs, thankfully been fixed via a firmware update, if I recall correctly.
That said, I've run into similar problems in "solid, last generation" hardware from vendors. And of course, there are the 10+ year old bugs in Windows which have been largely worked around/with to the point where we forget they're pretty irritating/serious bugs and not just the way thing
Re: (Score:2)
I can pick one up for about $160. I remember not too long ago that could only get me 80GB.
Pfft, I remember not too long ago that could only get me 16% of a 20M drive.
Re: (Score:2)
As am I (kernel: 2.6.24-gentoo-r8). Although I run XFCE. USB Wireless keyboard & mouse.
Maybe I am just more patient?
I dunno...
The bug for viewing this (Score:1)
Re: (Score:1)
Heh, well, I just hit the little link and then hit the link at the top to go back to the main topic... then sent an e-mail to /.
Yeah, messed up there... meant to put "little comments link"
Killing kernel.org server isn't very nice... (Score:3, Funny)
I'm sure kernel.org appreciates these links. Now instead of fixing the bug they're putting out fires in the data center...great job slashdot.
Re:Killing kernel.org server isn't very nice... (Score:5, Funny)
Well, maybe the kernel developers or bugzilla developers could use the practice in making a reliable scalable system out of the systems that they design.
--jeffk++
Re: (Score:2)
OSS typically doesn't mean lots of $$$ to spend on hardware : /
Re: (Score:2)
In their defence, the system would scale fine if all the processes weren't stuck in iowait.
Re: (Score:2)
Windows Port? (Score:5, Funny)
If this gets resolved, is there any chance the fix could get ported to Windows? I just had my Dad's XP laptop completely freeze after I plugged in a bog-basic USB thumb drive. The desktop sprang to life only after I unplugged it. I wish some of the AC Windows fanboys who were hassling me here last week were around to see it. "Ready for the desktop" my ass.
Re: (Score:2, Insightful)
And I'm going to hassle you again.
(Oops, forgot to check the AC option!)
Never mind, carry on ...
(I also have problems with U3 flash drives. I had to use basic flash drives - thus missing out on all the app portability features.)
So THAT's why we don't have Year of the Linux Desktop! It has performance problems ... just like Vista has performance problems!
Re: (Score:2)
Re: (Score:2)
Reboot for drivers to install? Who does that anymore? Why, just the other day, wait, no, two years ago, I installed Vista and it applied nearly every driver update without a reboot.
Re: (Score:2)
"Ready for the desktop" my ass:
Hope the rest of your gets ready too :-).
this is bad even for /. (Score:5, Informative)
Wow, not just a bad summary, an utterly worthless summary. Here's the relevant discussion from LKML. Yes, this is all of it.
Andrew Morton wrote:
In http://bugzilla.kernel.org/show_bug.cgi?id=12309 [kernel.org] the reporters have identified what appears to be a sched-related performance regression. A fairly long-term one - post-2.6.18, perhaps. Testcase code has been added today. Could someone please take a look sometime?
Peter Zijlstra replied:
There appear to be two different bug reports in there. One about iowait, and one I'm not quite sure what it is about. The second thing shows some numbers and a test case, but I fail to see what the problem is with it.
This somewhat deflates the excitement evident in the OP. I mean, I know what he's talking about, these apparently random 1-2 second FREEZES while working, but if the guys on LKML aren't talking about it, it's probably not really being worked on.
Re: (Score:2, Interesting)
The fscking freezes are in HAL. They have been driving me nuts for more than a year. In my case, the solution is to unplug the CDROM drive.
Re: (Score:2, Funny)
This somewhat deflates the excitement evident in the OP. I mean, I know what he's talking about, these apparently random 1-2 second FREEZES while working, but if the guys on LKML aren't talking about it, it's probably not really being worked on.
I know, it looks like someone's pet bug made the cover of /. today. For the record, here is my pet bug:
https://launchpad.net/ubuntu/+bug/1 [launchpad.net]
Re:this is bad even for /. (Score:5, Interesting)
If you haven't used Linux regularly over the last two years, you probably have not noticed that the system has gotten significantly slower with more recent releases. The probable symptom was discussed here [slashdot.org]. Many Ubuntu users, including me, have noticed that the latency of desktop operations got significantly larger around the time Gutsy was released, which coincides with the Completely Fair Scheduler and the kernel upgrade from 2.6.18.
Since it is most likely a latency issue, the problem is extremely hard to diagnose. Alt-tabbing between programs seems a little slower; keyboard input might lag somewhat. You can't measure desktop latency easily.
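One rough way to put a number on it anyway is to request a short sleep over and over and record how late the process actually wakes up. The Python sketch below (function name and defaults are illustrative assumptions) is only a crude proxy for what users perceive as lag, not a substitute for real kernel tracing tools.

```python
import time


def sleep_overshoot_ms(interval_s=0.01, samples=100):
    """Request `samples` sleeps of `interval_s` seconds and return, for each,
    how many milliseconds later than requested the thread actually resumed.
    Large or spiky values while the disk is busy hint at scheduling latency."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        overshoots.append((elapsed - interval_s) * 1000.0)
    return overshoots
```

Comparing `max(sleep_overshoot_ms())` on an idle machine with the same figure taken while a large archive is being unpacked is the interesting part; a single number on its own says little.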
Re: (Score:2)
CFS was introduced in 2.6.23, not 2.6.18. CFQ was introduced in 2.6.18.
Re: (Score:2)
Re:this is bad even for /. (Score:4, Informative)
"Many Ubuntu users, including me, have noticed that the latency of desktop operations got significantly larger around the time Gutsy was released, which coincides with the Completely Fair Scheduler and kernel upgrade from 2.6.18."
Uhh... I didn't see anything in there about Completely Fair Queuing; you just mentioned the Completely Fair Scheduler, then kernel 2.6.18.
"Feisty had the 2.6.18 kernel and was quite responsive, so CFQ is in the clear. Gutsy featured 2.6.23 with CFS and was much slower which means it is a possible suspect."
This performance bug has been reported since 2.6.18.
Re: (Score:2, Interesting)
It's very easy to trigger: just unrar an ISO from a torrent. Regardless of CPU cores, copious amounts of RAM, and no other real system activity, your desktop experience will grind to a miserable halt until the archive process has completed. Renicing makes very little difference. Linux has had this problem for years, certainly more than two. Memory suggests it came along with SATA.
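When the desktop grinds to a halt like this, one way to check that the CPUs are waiting on I/O rather than doing work is to sample the iowait counter in /proc/stat twice and compare. A minimal Python sketch, assuming the usual field order of the aggregate `cpu` line (user nice system idle iowait irq softirq steal ...); the function names are made up for illustration:

```python
def parse_cpu_line(text):
    """Return the aggregate 'cpu' counters (in USER_HZ ticks) from the
    contents of /proc/stat as a list of ints."""
    for line in text.splitlines():
        if line.startswith("cpu "):  # per-CPU lines start with 'cpu0', 'cpu1', ...
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples.
    Only the first eight fields (user..steal) are summed, because the later
    guest fields are already included in user/nice."""
    b, a = before[:8], after[:8]
    total = sum(a) - sum(b)
    if total <= 0:
        return 0.0
    return 100.0 * (a[4] - b[4]) / total  # index 4 is iowait


def sample_proc_stat(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.read())
```

Taking `s1 = sample_proc_stat()`, waiting a second, taking `s2`, and printing `iowait_percent(s1, s2)` during the unrar shows whether the stall coincides with high iowait.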
Re:this is bad even for /. (Score:5, Interesting)
Yep, this is a pretty big problem (an easily reproducible one), and it's been around for a really long time. I don't remember exactly when it came about, but I moved from Debian Sid to Ubuntu 7.x about 8 months ago. I didn't have any problem under Debian, and I'm uncertain whether the 7.x Ubuntu releases had the problem, but I certainly noticed it in the 8.x releases.
I do recall a somewhat gradual progression of desktop performance decreases, though, going all the way back to the later 2.0 kernels. Back then, the schedulers would all let a machine that was relatively slow for its time run a fairly bloated window manager (like E16) responsively while untarring an archive and running a kernel build at the same time, provided there was 100+MB or so of RAM for the process, of course. Even if you dipped into swap, the UI would remain pretty responsive. Not anymore.
The way things sit now, the Linux I/O scheduler results in desktop performance similar to Windows XP during I/O ops. That is completely unacceptable.
Part of me thinks this is due to a server-centric focus in development (since the people doing kernel development largely work for corporations that want server kernels), but I'm not really in the know. If that's the case, we really need to pull one of the old desktop schedulers out of retirement and use it instead of what we've got now, at least for the desktop, and maintain two schedulers with different focuses within the kernel instead of just a couple of general-purpose ones.
I second this (Score:5, Interesting)
Problem is Real (Score:5, Informative)
For what it is worth, the problem is real.
We have experienced massive negative effects on our MySQL server; downgrading to an earlier Linux kernel solves the problem. This has been very difficult to debug, as we never guessed that the OS would be a factor... we figured it had to be something we were doing. Only by chance did we try another distro/kernel, only to find that everything works fine when you downgrade.
Re: (Score:3, Interesting)
What version do you need to downgrade to? And does downgrading open you up to any security flaws or incompatibility?
Re:Problem is Real (Score:5, Insightful)
If you can reproduce it, do a git-bisect. You'll find the change that caused it pretty quickly.
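git bisect is a binary search over the history between a known-good and a known-bad revision, so it needs only about log2(N) test runs. The core loop, assuming a linear history and a made-up `is_bad` predicate standing in for "build, boot and run the test case", looks like this in Python:

```python
def first_bad(revisions, is_bad):
    """Return (first_bad_revision, number_of_tests_run).

    Assumes revisions are ordered oldest -> newest, revisions[0] is known
    good, revisions[-1] is known bad, and the regression persists once it
    appears (git bisect itself also copes with merges; this sketch does not)."""
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi], tests
```

With git itself, the same loop is driven by `git bisect start`, `git bisect bad <rev>` and `git bisect good <rev>`, marking each kernel it checks out; `git bisect run <script>` can automate it if the script exits with status 0 for a good revision and 1 for a bad one.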
This is what happens... (Score:3)
...when you insist on doing development in the 'stable' kernel tree and expect vendors to stablise it.
Genius!
Re: (Score:2)
Exactly!
The current crop of problems observable in the Linux kernel started roughly around the time the development policy changed. We went from "kernel is stable and works very well for a known subset of things, and builds consistently" to "kernel is stable for some things and works decently for most things, with pretty much everything working to some extent, and barely ever builds consistently (at least from one subversion to another)."
whereis bugzilla.kernel.org .. (Score:3, Informative)
I am NOT experiencing this bug (Score:2, Interesting)
Re: (Score:2, Interesting)
I am getting it. This is on Ubuntu running the 2.6.20-generic kernel that came from the distro. My backups (~19GB) are responsive but I am currently running Ben Gamari's suggested method to reproduce it and it appears to be showing up. I get 'small' freezes of ~1-3 seconds when entering text as well as larger freezes of ~5-15 seconds upon maximizing a minimized program.
It only seems to cause a problem for maximizing minimized programs when it happens at the same time as you maximize the window. It doesn't s
Re: (Score:2)
Complete Fair Queuing (Score:2)
This I/O scheduler became the default in 2.6.18 and has been available since 2.6.13. I wonder if that has something to do with it. I'm going to test it out on my home machines later today and have a look-see.
Supposedly it can be swapped for the AS (anticipatory) scheduler at runtime by writing to /sys/block/hda/queue/scheduler, or at boot with the "elevator=as" option.
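That sysfs file lists the available elevators and shows the active one in square brackets. A minimal Python sketch for reading and switching it, assuming root privileges and a device name such as `hda` or `sda` (the helper names are hypothetical):

```python
from pathlib import Path


def parse_scheduler(text):
    """Parse /sys/block/<dev>/queue/scheduler contents such as
    'noop anticipatory deadline [cfq]' into (active, available)."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # brackets mark the elevator currently in use
            active = token
        available.append(token)
    return active, available


def set_scheduler(device, name):
    """Switch the I/O scheduler for one block device at runtime (needs root)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    _, available = parse_scheduler(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not available for {device}: {available}")
    path.write_text(name)
```

On kernels of that era the AS scheduler is listed as `anticipatory` in this file, so the runtime switch would be `set_scheduler("hda", "anticipatory")`. The change lasts only until reboot, which is what the boot option is for.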
Re: (Score:2)
That's because you're not transferring data between yourself and another thread.
Looks like also affects servers, not just desktops (Score:3, Funny)
It must also affect servers, because none of the links is transferring data either.
Re: (Score:3, Funny)
That's because you're not transferring data between yourself and another thread.
But he is transferring data between himself and another sockpuppet.
Re:funny (Score:4, Funny)
Re: (Score:2)
Yeah, but yours is just a duplex issue, been known about for years, and easily fixed.
Re: (Score:3, Interesting)
Re:funny (Score:5, Insightful)
Re:Just upgrade (Score:4, Funny)
OS not fast enough? Just upgrade your hardware components, preferably to a new, top-of-the-line system.
Oh wait... that's the Windows way of doing things.
Yeah, exactly, that's why volunteers have been hard at work to find and fix the (published, admitted) bug. Just like Win... Oh, wait.
Re:Just upgrade (Score:5, Insightful)
Re: (Score:2)
Well, that would explain a lot.
Re: (Score:3, Interesting)
Re: (Score:2)
It's happily not an either/or situation. If you didn't buy newer hardware in the meantime, this bug being fixed will speed the kernel up on your old hardware. If you did buy new hardware, you get that extra speed plus the speed boost when the bug is fixed.
Re: (Score:2)
From your tone, I assume that you will be sending this guy something of great value?
Why do you not have the courage to say your name when you post ?
Re: (Score:3, Funny)
Re: (Score:2)
I've been using the "Preemptible Kernel (Low-Latency Desktop)" option for years on my Gentoo systems, and haven't seen any problems with it. The little bit of overhead seems negligible compared to the voluntary option.
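That menu entry corresponds to the CONFIG_PREEMPT symbol, with CONFIG_PREEMPT_VOLUNTARY ("Voluntary Kernel Preemption (Desktop)") and CONFIG_PREEMPT_NONE ("No Forced Preemption (Server)") as the alternatives. A small Python sketch that reports which model a kernel was built with, assuming the config is exposed at /proc/config.gz (only when CONFIG_IKCONFIG_PROC is enabled); otherwise the text of a /boot/config-* file can be passed in instead. The function names are illustrative.

```python
import gzip

PREEMPT_MODELS = {
    "CONFIG_PREEMPT_NONE": "No Forced Preemption (Server)",
    "CONFIG_PREEMPT_VOLUNTARY": "Voluntary Kernel Preemption (Desktop)",
    "CONFIG_PREEMPT": "Preemptible Kernel (Low-Latency Desktop)",
}


def preemption_model(config_text):
    """Return the preemption Kconfig symbol set to 'y' in config_text, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments such as '# CONFIG_PREEMPT is not set'
        key, value = line.split("=", 1)
        if value == "y" and key in PREEMPT_MODELS:
            return key
    return None


def running_kernel_config(path="/proc/config.gz"):
    with gzip.open(path, "rt") as f:
        return f.read()
```

For example, `PREEMPT_MODELS[preemption_model(running_kernel_config())]` prints the menu label, assuming one of the three symbols is set.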
Re: (Score:2)
Re: (Score:2)
Please someone fix the damn economy for crissakes.
Ah, okay. I'll start coding that right away.
Re: (Score:2)
Or those people who didn't have a sufficiently high-paying job to manage basic necessities in the pre-crash era. Or people who, despite the desire and intelligence to do excellent work, failed to find a job before the beginning of the recession.
Any simplified model of the economy is bound to fail, because the economy is not simple. Not everyone who doesn't have massive savings spent themselves retarded. Not everyone who spent themselves retarded is going to suffer now. Sadly, it doesn't work like that.
Re: (Score:2)
In all fairness, the 20+% annual inflation we're going to see starting in 2010 will bring housing values back to their peaks by 2013.
Re: (Score:2)
Long-term US bonds are going to be unsellable internationally a year or two from now
Yeaaaaaaaahhhh...'cause you know, all the other countries in the world that sell long-term bonds have perfect economies. The interest rate paid by the U.S. Gov't might go up a little to increase demand, but people will still be buying U.S. treasuries for the foreseeable future. As in the rest of your life.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
You've obviously not kept up with events. For a year now, the US has been under attack over its AAA credit rating. This was BEFORE the market meltdown, etc.
From January 10th, 2008: http://www.reuters.com/article/bondsNew [reuters.com]
Re: (Score:2)
With the latest wrong-headed bailouts (Merrill Lynch) diverting even more capital to propping up bad investments and bad actors, inflating its value away is inevitable.
The government should have allowed the failures, kept its powder dry, then moved in after the market correction to help pick up the pieces. It would have been cheaper and more effective.