'Kernel Memory Leaking' Intel Processor Design Flaw Forces Linux, Windows Redesign (theregister.co.uk) 205
According to The Register, "A fundamental design flaw in Intel's processor chips has forced a significant redesign of the Linux and Windows kernels to defang the chip-level security bug." From the report: Programmers are scrambling to overhaul the open-source Linux kernel's virtual memory system. Meanwhile, Microsoft is expected to publicly introduce the necessary changes to its Windows operating system in this month's Patch Tuesday: these changes were seeded to beta testers running fast-ring Windows Insider builds in November and December. Crucially, these updates to both Linux and Windows will incur a performance hit on Intel products. The effects are still being benchmarked, however we're looking at a ballpark figure of five to 30 per cent slow down, depending on the task and the processor model. More recent Intel chips have features -- specifically, PCID -- to reduce the performance hit. Similar operating systems, such as Apple's 64-bit macOS, will also need to be updated -- the flaw is in the Intel x86 hardware, and it appears a microcode update can't address it. It has to be fixed in software at the OS level, or buy a new processor without the design blunder. Details of the vulnerability within Intel's silicon are under wraps: an embargo on the specifics is due to lift early this month, perhaps in time for Microsoft's Patch Tuesday next week. Indeed, patches for the Linux kernel are available for all to see but comments in the source code have been redacted to obfuscate the issue. The report goes on to share some details of the flaw that have surfaced. "It is understood the bug is present in modern Intel processors produced in the past decade," reports The Register. "It allows normal user programs -- from database applications to JavaScript in web browsers -- to discern to some extent the contents of protected kernel memory. The fix is to separate the kernel's memory completely from user processes using what's called Kernel Page Table Isolation, or KPTI."
FOOF (Score:5, Insightful)
About par for Intel's course. Make it fast at the expense of horrible bugs.
FUCKWIT (Score:2)
Not you. That's what the Linux team wanted to call this bug.
I read the El Reg article but I still don't understand what it is saying. At all levels. I don't understand if this means all intel processors or just the new ones. I don't understand if the 20% slowdown is for a tiny fraction of operations in the OS or if it means that things like e-mail, firefox or general python programming will be slowed down 20% overall. The latter would be a disaster. (could I ask intel to refund 20% of my computer cos
Re: (Score:2)
A question I have is: is the patch optional? I'm not in the habit of running random binaries downloaded from the Internet, so I suspect the chances of someone exploiting this on the computers I own are extremely low.
A 30% hit for virtually no security benefit in practice seems really bad to me.
Re: (Score:2)
My understanding is that there's a boot-time kernel parameter you can set that will disable the fix.
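For anyone who wants to check whether a box has been booted with the fix turned off, here is a minimal sketch. It assumes the mainline parameter names `nopti` and `pti=off`; distribution backports may use different names, so treat the token list as an assumption. The helper names `pti_disabled` and `read_cmdline` are made up for illustration.

```python
def pti_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line asks for page table isolation to be off.

    Assumes the mainline spellings `nopti` and `pti=off`; backported
    kernels may use other names.
    """
    tokens = cmdline.split()
    return "nopti" in tokens or "pti=off" in tokens


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the boot command line of the running Linux kernel."""
    with open(path) as f:
        return f.read()
```

On a Linux machine, `pti_disabled(read_cmdline())` reports only what was requested at boot, not whether the running kernel actually contains the patch.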
Re: (Score:2)
Seriously, hardware design is so much more complicated than software it's not even funny.
Re: (Score:3)
Actually, it's a reference to a hex value that could trigger a nasty Pentium bug.
Re: (Score:3, Informative)
F00F [wikipedia.org]
How could this be abused? (Score:4, Interesting)
Re:How could this be abused? (Score:5, Insightful)
You're running in EC2 on shared hardware. Your instance can read the memory of other instances running on the same physical hardware.
Re: (Score:2, Interesting)
Yep, "the cloud" bites again.
When are you supposedly "technically sophisticated" people going to learn that security inherently means running on your own hardware?
When you cheap out, you lose. Performance, security, integrity, reliability - everything.
VM specifically: For most local-system applications, VM is a long in the tooth, unneeded hangover that drops performance. I hav
Re: (Score:3)
You're assuming that the cloud works out cheaper. Sometimes it does, but for many cases it does not (and it's not even close).
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
He confused me at first with the mention of VMs as, like you, I initially thought he was talking about Virtual Machines (especially given the context of cloud computing and hardware sharing). But he was actually talking about Virtual Memory. Not sure how he got from one topic to the other, but it's pretty clear:
"Memory is crazy cheap these days compared to your time and security and energy and wallet thickness; you should have lots and lots. If you don't, then there's your error."
Re:How could this be abused? (Score:5, Interesting)
Cryptographic keys, information on other processes (making other attacks feasible), perhaps random number generator seeds and status, for example
...
And the principle in general that there could be information the process is not supposed to reach.
Re:How could this be abused? (Score:5, Interesting)
Private keys for system-level crypto and user credentials are stored in kernel space. You want everyone on the system to be reading those? If you can read a private key or a Kerberos token, you can become that daemon/system/user.
This bug essentially destroys local security and severely compromises network security, subject to any limitations on where/when data can be read.
I'm not a microarchitecture guru who can dig through the details and figure out the limitations of potential attacks. Perhaps only a small portion of kernel memory can be exposed via this bug. I don't really know. The naive, simple scenario where all kernel memory is exposed, though---that is pretty damned bad. Infosec doomsday bad.
five to 30 per cent slow down (Score:3)
I find it hard to believe that a virtual memory change will result in a 5-30% slowdown for Intel processors. Maybe for a few extremely specific (likely edge-case) tasks, but if there was a legitimate 5-30% performance decrease, you can bet there would be a far different solution in the works that would suitably fix the problem.
Re: five to 30 per cent slow down (Score:2)
So an almost unusable computer becomes completely unusable. Unless you're on solid state, then you get the performance of a mechanical hdd.
Re: (Score:2)
Page File is only one area that can be mapped into a Virtual Address Space. System RAM is another. Oftentimes, I/O is as well.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
Virtual memory is just the mapping of a virtual address space to a physical address space. Paging is the swapping out of memory to disk so you can allocate more than you physically have. Virtual memory is what's commonly used to implement paging.
This bug is a flaw with the virtual memory protection mechanism that stops user code from reading kernel data.
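To make the distinction concrete, here is a toy sketch of address translation with a user/supervisor permission check. It is a deliberately simplified single-level table, not how x86 page tables are actually laid out; `translate`, `PageFault` and the example `table` are hypothetical names used only for illustration.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Raised when a translation is missing or not permitted."""


def translate(page_table, vaddr, user_mode):
    """Map a virtual address to a physical one, enforcing the user bit.

    page_table maps virtual page number -> (physical frame, user_accessible).
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    entry = page_table.get(vpn)
    if entry is None:
        raise PageFault("page not mapped")
    frame, user_accessible = entry
    if user_mode and not user_accessible:
        # This is the check that is supposed to keep user code out of kernel data.
        raise PageFault("supervisor-only page")
    return frame * PAGE_SIZE + offset


# Virtual page 0 is a user page in frame 7; virtual page 5 is a kernel page in frame 2.
table = {0: (7, True), 5: (2, False)}
```

With this table, a user-mode access to page 5 faults while a kernel-mode access succeeds. Paging to disk would be a separate layer on top: the fault handler could bring a missing page back in and retry.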
Re: (Score:3)
If the choice is a 30% slowdown or a massive highly dangerous security flaw, the developers will pick the 30% slowdown. Especially if that flaw is a big problem for people using VMs (there are suggestions in some places that the flaw would be a HUGE problem for cloud providers like Amazon). That said, if you are running Linux and don't care about the security flaw but do care about the slowdown, you can always compile your own kernel without the relevant bits in there
:)
Re: (Score:3)
Re: (Score:3)
This is a good point. If the machine lives in a protected environment where only approved software is used by authorized users, disabling the fix to avoid the slowdown might be the right thing.
But I'm pretty sure the slowdown in this case isn't FUD. Otherwise we'd hear Intel loudly denying it by now.
Re:five to 30 per cent slow down (Score:4, Informative)
Re: (Score:2)
One o
Re: (Score:3, Insightful)
If the choice is a 30% slowdown or a massive highly dangerous security flaw, the developers will pick the 30% slowdown.
As it is it seems a lot of developers choose "30% slowdown" over "spend some time to write not shit code".
"Premature optimization is the root of all evil" gets turned to "Do no optimization whatsoever, have no understanding of underlying hardware, and pick the latest trendiest framework that runs on top of 5 layers of framework to provide 6502 performance from an i7."
Re: (Score:2)
provide 6502 performance from an i7
Good, it's a step in the direction of improving latency. Next, replace the rest of the computer with an Apple //E and we'll have something responsive.
Article measuring latency of various computers that finds the Apple //E to have the least latency in displaying a character, https://danluu.com/input-lag/ [danluu.com]
Re: (Score:2)
to provide 6502 performance from an i7."
Well, if it uses the 6502 instruction set as well, all is not lost!
Re:five to 30 per cent slow down (Score:5, Informative)
I find it hard to believe that a virtual memory change will result in a 5-30% slowdown for Intel processors. Maybe for a few extremely specific (likely edge-case) tasks, but if there was a legitimate 5-30% performance decrease, you can bet there would be a far different solution in the works that would suitably fix the problem.
Virtual memory translation is used in every single memory access, cached or not. 5% would be lucky for trying to work around a broken system. I am guessing the flaw is probably in the TLB, which is meant to accelerate these things.
Either that or a lawsuit (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
Mailing everybody a new chip from the past decade would be the "different" solution!
Re: (Score:2)
The slowdown is on syscalls. So it depends on your workload. For example, `du -s` is reportedly slowed down by tens of percent.
Re:five to 30 per cent slow down (Score:4, Informative)
They don't have a choice. The cost is quite believable since the workaround involves mapping the kernel in and out of the process space for every system call. Keeping it mapped in and keeping the page tables hot in the cache helps performance a lot.
The real fix involves new silicon.
Re: (Score:2)
I don't think you understand how drastic this fix is. Every time a user mode to kernel mode transition happens and every time a hardware interrupt happens, the entire page table directory layout has to be switched. This means all the TLB caches are flushed as well and that's where the main performance hit comes from.
So if you're doing something like crypto currency mining you're not going to see much of a hit. But if you're doing a lot of I/O (file servers, database servers, web servers, etc.) you're goi
Re: (Score:2)
From what I can read it's actually every context switch, the kernel is forced to look into different address spaces. So for every interrupt (e.g. a network packet) your computer has to dump its CPU cache and re-read from RAM. This totally removes the benefits of L1-L3 caches on the CPU which these days can be ~30MB.
I'm much more surprised this can't be fixed in microcode, the entire CPU is run by its own "OS". Since Pentium 1 the Intel CISC is just a translation to RISC, so you should be able to patch it out.
Re: (Score:2)
I'm much more surprised this can't be fixed in microcode, the entire CPU is run by its own "OS"
...
Perhaps it actually can, but then Intel would assume the responsibility for that fix -- across all the many, many CPU families and versions -- and the responsibility for any bricked CPUs if/when something goes wrong. Better for them to say, "Fixed in the next release."
Re: (Score:2)
....but if there was a legitimate 5-30% performance decrease, you can bet there would be a far different solution in the works that would suitably fix the problem.
There is, and there isn't. Intel obviously does not want to recall hardware and they really don't give a shit. If they slip this under the radar then they're hoping CPUs will just get replaced as an answer to performance problems.
Re: (Score:2)
Like a lawsuit?
Re: (Score:2)
Every single system call will essentially incur a costly task switch that wasn't there before. The total slowdown will be based on the total number of syscalls used in any particular workload.
Though since this is the new reality, where accessing kernel memory directly is more expensive than it used to be, you can expect kernel and application developers to try and claw back some of these losses over time.
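A back-of-the-envelope version of that argument, as a sketch: if each syscall gets a fixed extra cost, the relative slowdown is just the syscall count times that cost, divided by the original runtime. The per-syscall figure used in the example is a made-up placeholder, not a measurement, and `estimated_slowdown` is a hypothetical helper.

```python
def estimated_slowdown(baseline_seconds, syscalls, extra_seconds_per_syscall):
    """Fraction of extra runtime caused by a fixed added cost per syscall.

    A crude linear model: it ignores secondary effects such as TLB misses
    after returning to user space.
    """
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return syscalls * extra_seconds_per_syscall / baseline_seconds


# Placeholder cost per syscall, chosen only to illustrate the arithmetic.
EXTRA = 2e-7

syscall_heavy = estimated_slowdown(1.0, 1_000_000, EXTRA)  # e.g. walking a big directory tree
compute_bound = estimated_slowdown(1.0, 100, EXTRA)        # e.g. a tight number-crunching loop
```

Under these assumed numbers the syscall-heavy job lands around 20% slower while the compute-bound one barely moves, which is the shape of the "depends on the workload" claims in this thread.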
Re: AMD stock? Intel Stock? (Score:2)
Re: (Score:2)
Why should the warranty period apply? This is a manufacturing defect that causes your machine to be insecure or slower than advertised.
Just like if you find out your car's air bags won't work, they are replaced free of charge.
Re: (Score:2)
Probably only needs to clear the TLB when leaving kernel mode. I don't think it is a problem if the kernel can read user pages using this bug.
This could be massive (Score:5, Interesting)
The developers behind the GRSecurity project measured up to 63% performance loss [twitter.com]. If most common tasks are equally affected, Intel is sure fucked. Home users might not need to bother, but large cloud providers might be seriously affected.
Meanwhile the Linux kernel has received the largest incremental minor patch [kernel.org] in its history (229KiB) - perhaps kernel 4.14.11 already contains all the required fixes.
I have a sneaking suspicion Intel shares will fall through the floor in the next few weeks because Intel CPUs might have suddenly become quite slower than their AMD Zen based counterparts.
Re:This could be massive (Score:5, Informative)
What about games? (Score:2)
Re: What about games? (Score:2)
Re: (Score:2)
A new GPU issue would have been evil. CPU use is already limited by console port code and art, and most games on suggested settings will not feel any different.
Re: (Score:2)
Re:This could be massive (Score:5, Informative)
Re: (Score:2)
Nice catch.
I guess that's what I get for reading articles about the issue, rather than LKML...
Re: (Score:2)
Nope. Page Table Isolation is the fix and not the fault. But isolating the userland and kernel page tables means you have to switch between them each time you go from user mode to kernel mode and back. This slows things down.
AMD CPUs don't have the bug where user mode can read kernel pages, so they do not require this isolation or the performance hit caused by enabling it. From the AMD email: "The AMD microarchitecture does not allow memory references, including speculative references, that access higher p
Re: (Score:2)
Based on this link from Hacker News: https://cyber.wtf/2017/07/28/n... [cyber.wtf] and the linked email/patch from AMD, it looks like what happens is that AMD checks memory permissions up front before allowing an instruction into the pipeline, while Intel made the memory permission check as a later part of the pipeline, apparently after the memory was accessed and inserted into the cache.
Obligatory LWN link (Also affects ARM64) (Score:5, Informative)
Linux Weekly News [lwn.net] has been covering this for quite a while.
5% slowdown on average, with up to 30% for some particularly bad network operations.
ARM64 is also affected [lwn.net], so it's not just Intel.
Older (non-paywalled) LWN Link (Score:2)
An older link, about the KAISER patch set [lwn.net]
And, time for AMD to shine again (Score:4, Informative)
Re: (Score:2)
Nice catch.
AMD is safe (Score:5, Informative)
The summary is not fully explicit: this is not a flaw in Intel x86 ISA, but specific to CPUs made by Intel. AMD processors don't have the problem, so they should not need the patch.
https://lkml.org/lkml/2017/12/... [lkml.org]
This could be a huge win for AMD, because the patch incurs a measurable slowdown. At the moment, though, the Linux fix doesn't seem to distinguish between manufacturers. I expect the distinction will appear later -- better safe than sorry.
Re: (Score:2)
Although AMD is safe, why do they mention a 50% slowdown for AMD processors?
> @grsecurity measured a simple case where Linux “du -s” suffered a 50% slowdown on a recent AMD CPU.
Re: (Score:2)
Presumably they enabled the software workaround and ran it on a bunch of CPUs. Then they picked the most alarming slowdown they could find, regardless of whether that CPU needed the workaround or not.
Or perhaps they were just unaware at the time that AMD CPUs are not at risk.
More info on the subject (Score:4, Informative)
http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-case-of-the-linux-page-table [tumblr.com]
Deja Vu (Score:2)
Re:Deja Vu (Score:4, Informative)
The future (Score:4)
I'm curious how much Cannon Lake and Ice Lake CPU architectures are going to be delayed. Since Cannon Lake is basically SkyLake on a 10nm node, Intel cannot release it with such a glaring hole which causes such a significant performance loss.
I've been running a Sandy Bridge CPU for seven years now, and now I'm really looking forward to the second gen Zen CPUs. Viva, competition. I'm really glad AMD is still around.
Re: (Score:3)
I'm curious how much Cannon Lake and Ice Lake CPU architectures are going to be delayed.
I'm going to go out on a limb and say "not at all"
CPU design pipelines are pretty long; generally requiring at least a year to go from "Tape Out [wikipedia.org]" to fabrication.
Releasing no chip (and staying with an even slower current generation) is just not an option.
Cannon Lake and Ice lake are still an improvement on Intel's current offering, and can still compete against AMD's offering.
Intel managed to move on after the (even more dire) disastrous NetBurst architecture [wikipedia.org]; there's no reason to believe they won't get pas
Intel CEO Sold a lot of stock... (Score:5, Interesting)
https://www.fool.com/investing... [fool.com]
Less than a month before we know the linux kernel was being patched for this bug.
Re:Intel CEO Sold a lot of stock... (Score:5, Informative)
This bug has been known and reported about since early November; the original paper was presented in July of 2017, and code has been in Github since February.
Motley Fool is just noting that the Intel CEO isn't holding any more stock than he needs to.
And there are good reasons:
* AMD is back from the dead.
* Intel's GPU hasn't been that successful -- they've even teamed up with AMD to put Radeon GPU's in the same die as an Intel CPU.
* PC sales are declining as consumers shift from Intel PC's to using ARM-powered tablets & phones instead.
* ARM is making inroads into the "desktop and laptop computer" marketplace.
* ARM is powering most consumer electronics as well (TV's, Blu-ray players, Smart Speakers, etc)
* Intel is absolutely nowhere in the mobile world. Mobile has one ARM to rule them all.
* Intel missed the boat for the current generation of XBOX and PlayStation consoles.
Intel is looking more and more like a one trick pony, and its competitors are beginning to do that one trick better too.
... and this is why ... (Score:5, Interesting)
This is why we run our mission critical workloads on SPARC and Power alongside Linux. Solaris and AIX. Diversity -- in operating system, in processor, in manufacturer -- is healthy. The SPARC T8's are blazing fast, secure, and don't have this nonsense. Neither do our POWER8's. Having all your eggs in the Intel+Linux basket could be a major shitshow here... meanwhile, we'll keep chugging along.
MacOS? (Score:2)
Re: (Score:2)
Re: (Score:2)
Yes. As I understand it, this Intel flaw concerns the kernel being mapped into all processes' address space at the same addresses. The kernel's memory pages are marked "Global" (present for all processes) and "Ring 0" (Kernel access only). This means that a system call does not require a context switch - only a flip of a bit inside the CPU.
Linux, MS Windows and macOS all do/did this before the recent patches.
While macOS uses the Mach microkernel which (because it is a microkernel) had been designed to be smal
Go ARM Go (Score:2, Interesting)
Make sure smugness doesn't bite you in the ass (Score:2)
Just because ARM processors don't have this security bug it doesn't mean that there aren't any Broadcom ARM processor hardware (or its kernel) security issues lurking out there that are as bad or worse.
Re: (Score:2)
Re:Go ARM Go (Score:5, Informative)
We'll just live with the slowdown, pretty much. (Score:2)
The notion that Intel even has the capability of producing new fixed CPUs to match other than the latest packaging/pin requirements seems fanciful. In which case we'll just have to live with any slowdown. As buying all new systems is just too expensive.
Re: (Score:2)
No, but this would be one of the few times I'd love to see a class action lawsuit. Intel has been selling us CPUs with a specified amount of processing speed and for all of these years because of this bug those CPUs will no longer be able to match those specs.
Rebates should be given out automatically based on the cost of the CPU on systems and an average slowdown for the market that they used the machine for (home use, corporate desktop, server, etc).. Large companies such as Apple, Dell, etc have records o
Is this always worth fixing? (Score:2)
Won't there be people who decide that fixing this is not worth the slowdown? After all, if it is run on an internal machine where users can't cause a buffer overflow or provide code, why should there be a risk?
Speculative Memory References and Page Faults (Score:4, Interesting)
From the AMD commit [lkml.org]:
this can probably be rewritten in the inverse like:
Intel processors
... allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode, [including]
when that access would result in a page fault.
So it seems like: set up a speculative memory reference to a kernel memory structure, cause a page fault, and then get a bit of kernel memory out (and back in?). That could get you root before long. Some people have been saying this can be leveraged to get a guest into its hypervisor too.
idiots (Score:2)
Hope the SEC is paying attention (Score:2)
Intel CEO, Brian Krzanich, apparently sold a bunch of shares on Nov. 29. While that's not unusual in and of itself, apparently Intel corporate bylaws require its CEO to maintain a minimum of 250,000 shares, and that's exactly how many shares Mr. Krzanich has left. Despite predicting future market growth, the guy dumped his stock for some reason.
https://www.fool.com/investing... [fool.com]
Re:In all fairness... (Score:4, Informative)
Re:In all fairness... (Score:5, Interesting)
AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault.
The (trivial) patch essentially disables the work-around on AMD CPUs. I'm not going to comment on how fair GP's criticism of Intel is in general, but that link definitely isn't evidence in Intel's favor.
Re: (Score:3)
Re:In all fairness... (Score:5, Insightful)
That bolsters AC's point. It looks like the Intel guys were going to cripple performance for everyone until the patch from AMD removed the unnecessary crippling from AMD processors.
Re:In all fairness... (Score:4, Funny)
Well, I'm guessing the approach was more along the lines of "an abundance of caution with the X86 ISA" as opposed to deliberate malice towards AMD.
Whilst no doubt there's some Intel guys with a very good working knowledge of AMD CPU internals, you'd really want to get direct confirmation from the actual AMD hardware guys that their hardware is immune to this.
Re:In all fairness... (Score:5, Insightful)
Well, I'm guessing the approach was more along the lines of "an abundance of caution with the X86 ISA" as opposed to deliberate malice towards AMD.
Hi. Have you met Intel?
Re: (Score:2)
Re: (Score:2)
If you read one of the original articles about the KAISER patch set [lwn.net]: a commenter asked about microkernels, and the reply is that since it's a hardware issue, both microkernels & monolithic kernels have to pay the same price.
Re: (Score:3)
And this comment. Someone could feel the storm coming.
KAISER: hiding the kernel from user space
Posted Nov 16, 2017 7:21 UTC (Thu) by alkbyby (subscriber, #61687) [Link]
Looks like something bad is coming. Such as mega-hole maybe in hardware that can be mitigated by hiding kernel addresses.
Otherwise I cannot see why simply hiding kernel addresses better, suddenly becomes important enough to spend massive amount of cpu on it.
- This isn't the first time. There was a problem a decade ago with Intel CPU's, when
Re: (Score:2)
Re: (Score:2)
Nah, GP didn't configure the kernel's settings properly.
There's more to running without swap than not enabling a swap file/partition. You have to configure swappiness, the oom_score_adj for what to kill when memory runs out, and so on.
Re: (Score:3)
https://www.mail-archive.com/l... [mail-archive.com]
2) Namespace
Several people including Linus requested to change the KAISER name.
We came up with a list of technically correct acronyms:
User Address Space Separation, prefix uass_
Forcefully Unmap Complete Kernel With Interrupt Trampolines, prefix fuckwit_
but we are politically correct people so we settled for
Kernel Page Table Isolation, prefix kpti_
Linus, your call
:)
LOL!
It's serious as it affects all Intel cpus (Score:2)
It could explain why Intel put the brakes on CPU production, and why some of the 2017 models are very hard to find.
Re: (Score:2)