Hope For Fixing Longstanding Linux I/O Wait Bug
DaGoodBoy writes "There has been a longstanding performance bug in Linux since 2.6.18 that has been responsible for lagging interactivity and poor system performance across all architectures. It has been notoriously difficult to qualify and isolate, but in the last few days someone has finally gotten a repeatable test case! Turns out the problem may not even be disk related, since the test case triggers the bug just by transferring data between two processes or threads. The test results are very revealing. The developer ran regressions all the way back to version 2.6.15 that demonstrate this bug has more than doubled the time to run the test in 2.6.28. Many, many people working at improving the desktop performance of Linux will be very happy to see this bug die. I know that I, personally, will find a way to send the guy that found this test case his beverage of choice in thanks. Please spread the word and bring some attention to this issue so we can get it fixed!"
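For anyone who wants a feel for what "transferring data between two processes" looks like as a timed test, here is a minimal Python sketch. It is not the test case from the bug report; the function name, the 8 MB payload, and the 64 KB chunk size are all assumptions made for illustration. It forks a writer, pushes bytes through a pipe, and times how long the reader takes to drain them.

```python
import os
import time


def time_pipe_transfer(total_bytes=8 * 1024 * 1024, chunk_size=64 * 1024):
    """Send total_bytes from a child process to the parent through a pipe.

    Returns (bytes_received, elapsed_seconds). Sizes are illustrative only.
    """
    read_fd, write_fd = os.pipe()
    start = time.perf_counter()
    pid = os.fork()
    if pid == 0:
        # Child process: the writer side of the pipe.
        try:
            os.close(read_fd)
            chunk = b"x" * chunk_size
            remaining = total_bytes
            while remaining > 0:
                # os.write may write fewer bytes than requested.
                written = os.write(write_fd, chunk[:min(chunk_size, remaining)])
                remaining -= written
            os.close(write_fd)
        finally:
            # Never let the child fall through into the parent's code path.
            os._exit(0)
    # Parent process: the reader side of the pipe.
    os.close(write_fd)
    received = 0
    while True:
        data = os.read(read_fd, chunk_size)
        if not data:  # empty read means the writer closed its end
            break
        received += len(data)
    os.close(read_fd)
    os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return received, elapsed


if __name__ == "__main__":
    received, elapsed = time_pipe_transfer()
    print(f"moved {received} bytes in {elapsed:.4f} s")
```

Running the same script on different kernels would give a rough comparison at best; a single timing like this says nothing about where the time goes.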
Re:funny (Score:3, Interesting)
KTorrent (Score:2, Interesting)
I'm not sure if this is related, but has anyone else noticed KTorrent can really bog your system down without showing any excessive resource usage in KSysGuard? For all I know, it may be passing information between one thread and another, and it's disk I/O intensive.
Longstanding...Since 2.6.18 (Score:4, Interesting)
Been waiting all of 2 years and change for your precious bug fix, 'ave you? You almost had my eyes tearing up there I tell ya: 25 Year Old BSD Bug [slashdot.org].
Re:Desktop??? (Score:1, Interesting)
I think that when it comes to the performance spectrum, Servers would be where this fix is the most needed.
Nope, read the bug. Throughput remains ok, it's the interactivity that suffers.
This is one of those bugs that no Linux developer will admit to until they reckon they have a fix for it. Then we're supposed to be happy, even though people have been complaining about it for years. Oh well, beggars can't be choosers.
I've been ionice'ing my backups and a few other tasks because of this issue, so it'll be nice to get it fixed.
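For readers who have not used it: ionice (from util-linux) runs a command in a chosen I/O scheduling class, and `-c 3` selects the idle class. A small Python sketch of wrapping a backup command that way follows; the helper name and the tar command are made up for illustration, and whether the priority has any effect depends on the I/O scheduler in use.

```python
import shlex


def with_idle_io(command):
    """Prefix a command so it runs in the idle I/O scheduling class.

    `command` is a list of arguments, e.g. ["tar", "czf", "out.tgz", "dir"].
    """
    # "-c 3" asks util-linux ionice for the idle class.
    return ["ionice", "-c", "3", *command]


if __name__ == "__main__":
    # Hypothetical backup command; paths are placeholders.
    backup = ["tar", "czf", "backup.tar.gz", "/home"]
    wrapped = with_idle_io(backup)
    # Print the command instead of running it; pass `wrapped` to
    # subprocess.run(wrapped, check=True) to actually execute it.
    print(shlex.join(wrapped))
```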
Re:Desktop??? (Score:3, Interesting)
Disk-to-disk operations would then bypass the kernel and asynchronous I/O would consume no primary resources. This was fashionable on some systems (most notably drives that used the IEEE 488 bus) in the 70s and was done to some degree with SCSI, but there's really no excuse for not providing such a capability on any modern drive.
I bought that line, hook, line, and sinker, in the late 90's with a bunch of IBM 9ES ultra-wide SCSI disks and a good controller.
It never was clear to me that, at any time, Linux was actually telling the drives to copy data directly from one disk to any other without the kernel in the middle.
And now that we live in a world of point-to-point serial buses (SATA, SAS) linking disks to seemingly independent controllers: Is it even theoretically possible anymore?
Re:Desktop??? (Score:3, Interesting)
Re:this is bad even for /. (Score:2, Interesting)
The fscking freezes are in HAL. They have been driving me nuts for more than a year. In my case, the solution is to unplug the CDROM drive.
Re:Just upgrade (Score:3, Interesting)
I second this (Score:5, Interesting)
Re:Problem is Real (Score:3, Interesting)
What version do you need to downgrade to? And does downgrading open you up to any security flaws or incompatibility?
Re:Desktop??? (Score:3, Interesting)
I'm not sure about anybody else here, but I was surprised to see that they mentioned that this will benefit 'Desktop' users.
They mentioned it because it does hit the desktop: https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.22/+bug/131094 [launchpad.net]
Re:this is bad even for /. (Score:5, Interesting)
If you haven't used Linux regularly within the last two years, you probably have not noticed that the system has gotten significantly slower with more recent releases. The probable symptom was discussed here [slashdot.org]. Many Ubuntu users, including me, have noticed that the latency of desktop operations got significantly larger around the time Gutsy was released, which coincides with the Completely Fair Scheduler and kernel upgrade from 2.6.18.
Since it is most likely a latency issue, the problem is extremely hard to diagnose. Alt-tabbing between programs seems a little slower, keyboard input might lag somewhat. You can't measure desktop latency easily.
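One crude proxy, offered only as a sketch: ask for a short sleep many times and record how late each wake-up is. This does not measure desktop latency itself (no X, no input path), and the function names, the 10 ms interval, and the sample count are assumptions. It only shows whether a process that wants to run is being delayed while something else hammers the system.

```python
import time


def wakeup_overshoot(interval=0.01, samples=200):
    """Sleep `interval` seconds `samples` times.

    Returns a list with how much longer than `interval` each sleep took,
    in seconds.
    """
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval)
        overshoots.append(time.perf_counter() - start - interval)
    return overshoots


def summarize(overshoots):
    """Return the median and the worst overshoot from a list of samples."""
    ordered = sorted(overshoots)
    return {"median": ordered[len(ordered) // 2], "max": ordered[-1]}


if __name__ == "__main__":
    stats = summarize(wakeup_overshoot())
    print(f"median overshoot: {stats['median'] * 1000:.3f} ms")
    print(f"worst overshoot:  {stats['max'] * 1000:.3f} ms")
```

Comparing the worst overshoot on an idle system against the same run during a large copy gives a number to put next to the "feels slower" reports, nothing more.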
Re:this is bad even for /. (Score:2, Interesting)
It's very easy to trigger, just unrar an iso from a torrent. Regardless of CPU cores, copious amounts of RAM, and no other real system activity, your desktop experience will grind to a miserable halt until the archive process has completed. Renicing makes very little difference. Linux has had this problem for years, certainly more than two. Memory suggests it came along with SATA.
Re:Problem is Real (Score:1, Interesting)
Different AC here.
I've already started doing a git bisect on some of the versions. My findings are at home (I'm at work), but there was a difference whereby a newer kernel version had worse performance than an older version.
What tends to happen is that intra-disk and inter-disk transfers are very fast (DMA is on), at the expense of X11 responsiveness. GUI operations (minimise windows, mouse clicks) start to lag. Not sure if this is the same problem as in the article.
I really could do with spending a good few days investigating and bisecting. I hope this is fixed soon.
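The idea behind bisecting, for anyone following along, can be sketched in a few lines of Python. This is a simplification: git bisect walks a commit graph, while the sketch assumes a plain ordered list of versions where the first is known good, the last is known bad, and everything after the first bad one is also bad. The function name and the `is_bad` callback are hypothetical; in practice the callback would be "build, boot, run the test".

```python
def first_bad(versions, is_bad):
    """Return the earliest version for which is_bad(version) is true.

    Assumes versions[0] is good, versions[-1] is bad, and that badness
    never reverts once it appears.
    """
    good, bad = 0, len(versions) - 1  # indexes of a known good / known bad version
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression is after mid
    return versions[bad]


if __name__ == "__main__":
    # Kernel versions in the range mentioned in the story summary.
    versions = [f"2.6.{minor}" for minor in range(15, 29)]

    # Stand-in test with a made-up threshold, for demonstration only.
    def pretend_test(version):
        return int(version.split(".")[2]) >= 22

    print(first_bad(versions, pretend_test))
```

Each round halves the range, so 14 versions need at most 4 test runs instead of 14.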
I am NOT experiencing this bug (Score:2, Interesting)
Re:funny (Score:1, Interesting)
Yes, they do. It's out there in a lot of naming permutations, with a lot of different causes - video, browser, disk, X, general high I/O, etc. I have personally run into this bastard a number of times.
Re:This is what happens... (Score:1, Interesting)
"Genius" is using the 'stable' kernel tree in production systems, when Linus himself recommends vendor kernels for this.
Re:this is bad even for /. (Score:5, Interesting)
Yep, this is a pretty big problem - an easily reproducible one - and it's been around for a really long time. I don't remember when exactly it came about, but I moved from Debian Sid to Ubuntu 7.x about 8 months ago. I didn't have any problem under Debian, and I'm uncertain whether the 7.x Ubuntus had the problem, but I certainly noticed it in 8.x releases.
I do recall a bit of a somewhat gradual progression of desktop performance decreases, though, going all the way back to the later 2.0 kernels. Back then, the schedulers would all allow an at-the-time relatively slow machine to run a fairly bloaty window manager (like E16) responsively while untarring an archive and running a kernel build at the same time - provided there was 100+ MB or so of RAM for the process, of course. Even still, if you were to dip into swap, the UI would remain pretty responsive. Not anymore.
The way things sit now, the Linux I/O scheduler results in desktop performance similar to Windows XP during I/O ops. That is completely unacceptable.
Part of me thinks this is due to a server-centric focus in development (being as the people doing kernel dev largely work for corporations who want server kernels), but I'm not really in the know. If that's the case, we really need to pull one of the old desktop schedulers out of retirement and use that instead of what we've got now, at least for the desktop, and maintain two different-focus schedulers within the kernel instead of just having a couple generally-suited schedulers.
Re:I am NOT experiencing this bug (Score:2, Interesting)
I am getting it. This is on Ubuntu running the 2.6.20-generic kernel that came from the distro. My backups (~19GB) are responsive but I am currently running Ben Gamari's suggested method to reproduce it and it appears to be showing up. I get 'small' freezes of ~1-3 seconds when entering text as well as larger freezes of ~5-15 seconds upon maximizing a minimized program.
It only seems to cause a problem for maximizing minimized programs when it happens at the same time as you maximize the window. It doesn't seem to happen very much but when it does it's pretty noticeable.
I never really noticed this before. I suppose I just expected it after hearing about how bad IDE drives are for anything involving heavy multitasking.
Yep, I've left it running and it just did it again.