Heap Protection Mechanism 365
An anonymous reader writes "There's an article by Jason Miller on innovation in Unix that talks about OpenBSD's new heap protection mechanism as a major boon for security. Sounds like OpenBSD is going to be the first to support this new security method."
Hope (Score:4, Interesting)
But why did it take so long to implement?
OpenBSD at the cutting edge on security (Score:4, Interesting)
Re:OpenBSD at the cutting edge on security (Score:1, Interesting)
"... in other words, it doesn't matter WHERE you shift the buggy code."
--Theo de Raadt
Won't this crutch actually tempt people to write sloppy memory management because "the heap manager will catch it?"
Is the performance hit really worth it?
Hm... old technique? (Score:5, Interesting)
Could this help Gnome? (Score:3, Interesting)
Is it really true that the standard GNU/Linux heap implementation holds onto pages like this when it becomes fragmented? That sounds really primitive to me.
In related news, GCC 4.1 stack protector (Score:4, Interesting)
Hopefully mainstream distros that have been wary of ProPolice will start using this new feature. And perhaps glibc malloc will borrow a few tricks from this new OpenBSD malloc too.
Already in Microsoft DEP (Score:2, Interesting)
My CPU doesn't support DEP in hardware, so I imagine the software-based method of doing this will create quite a speed hit. Anybody have any experience with turning on DEP for all programs?
Intron == heap protection (Score:3, Interesting)
OSes can put Intron between Exon (useful DNA -- useful stuff on the heap) to
detect badly behaving apps!
In other words, so-called "Junk DNA" may actually have a use...
Wrong solution for solving heap problems. (Score:4, Interesting)
From the kerneltrap.org post:
He explains that for over a decade efforts have been made to find and fix buffer overflows, and more recently bugs have been found in which software is reading before the start of a buffer, or beyond the end of the buffer.
The solutions the kerneltrap.org post describes against buffer overflows are, roughly: 1) surround heap allocations with unmapped guard pages, and 2) randomize where allocations land on the heap.
My opinion is that #1 will slow software down, although it will indeed make it more secure. #2 will make buffer overflows more difficult to exploit, since the space between two allocated heap blocks will be random (and thus the attacker may not know where to overwrite data).
Unless I have misunderstood, these solutions will not really solve the buffer overflow problem. For example, stack-based exploits can still be used for attacks. The solution shown does not mention the NX bit (which is x86-specific); it is a purely software solution that can be applied to all BSD-supported architectures.
Since all the buffer problems (overflow and underflow) that have cost the IT industry billions of dollars are the result of using C, doesn't anyone think it is time to stop using C? There are C-compatible languages that allow bit manipulation but don't allow buffer overflows; e.g. Cyclone [wikipedia.org].
Re:cool (Score:3, Interesting)
Yes, plenty, and maybe even most of their promises about being a generally secure system are complete and utter rubbish. However, I'm willing to bet that each of their OSes is more secure than the last. The problem is that they still leave plenty of holes open when they do things like (to point out the landmark example) weld the web browser to the kernel. I know that most people crack Windows because it's easy, but, though I may be wrong on this, I think people will continue to spend their efforts on Windows even if its security were (completely hypothetically) top-notch, simply because of the bad reputation that precedes it.
I'm not saying that Windows is secure or that it ever will be. However, their security has improved, regardless of how poorly. The last reason on the list to crack Windows (in my opinion) and possibly the strongest reason is that they have a history of poor security. I think script-kiddies will pour ANY amount of effort into destroying any version of Windows just to keep that idea alive.
As much as I love Linux, I really doubt that any one distro (like Red Hat, for example) would be able to keep its system as secure as it is now if it were the entire world's information-security scapegoat as MS is now. (PS: yes, I know that MS does mostly deserve the title it holds.)
(I'm not a security expert, nor am I claiming to be, so if you think I'm wrong all I ask is that you not 'correct' me with a torch ^_^)
Re:new method? (Score:3, Interesting)
Re:Already in Microsoft DEP (Score:3, Interesting)
Also, I have not been hacked anytime in the last ~5 minutes, whatever that's worth (but would I know?).
As an aside, I read the paper on the Microsoft DEP flaw a few months ago and wasn't that impressed. It looks very hard to exploit. And since DEP is an added protection mechanism, the existence of a small, hard-to-exploit flaw isn't that big of a deal. (In simple terms, with DEP on, a hacker would have to exploit both a DEP flaw and a normal overrun flaw to hack the system.)
OpenBSD Goals (Score:3, Interesting)
Comment removed (Score:2, Interesting)
This is how Electric Fence works. (Score:5, Interesting)
It may be a legitimate invention - it is cited as prior art in an AT&T patent. This is also the first known example of a prior Open Source publication causing a patent filer to cite it. AT&T also removed a claim from the patent that my work invalidated. Just search for "Perens" in the U.S. patent database to find the patent.
We don't run it on production programs because of its overhead. Doing this sort of protection exhaustively requires a minimum of two pages of address space per allocation: one dead page between allocations and one page of actual memory. That is a high overhead in page table entries and translation lookaside buffer slots, and it completely destroys locality of reference in your application. Thus, expect slower programs and more disk-rattling as applications page heavily. If you allocate and release memory through mmap, you get high system-call overhead too, and probably a TLB flush with every allocation and free.
Yes, it makes it more difficult to inject a virus. Removing page-execute permission does most of that at a much lower cost - it will prevent injection of executable code but not interpreter scripts.
I don't think the BSD allocator will reveal more software bugs unless the programmers have not tested with Electric Fence.
Bruce
Re:Slowdown? (Score:3, Interesting)
Performance? (Score:3, Interesting)
Application heap allocation has "traditionally" been fairly inexpensive unless the heap has to be grown: a normal allocation just updates a couple of free-block pointers and sizes, and the cost of growing the heap (which requires extending the virtual address space, and therefore fiddling with page tables, which on a typical CPU requires a mode change) is mitigated by allocating more virtual address space than is immediately needed.
If free space is always unmapped, then each block allocation will require an alteration to the page tables, as will each deallocation. Not to mention that this could cause the page-translation hardware to operate sub-optimally, since the range of addresses comprising the working set will constantly change.
If most allocations are significantly less than a page size, then the performance impact may be minimal since whole pages will rarely become free, but if allocations typically exceed a page size, that would no longer be true. If the result is that some applications simply implement their own heap management to avoid the overhead, then you've simply increased the number of places that bugs can occur.
Re:For Real Security (Score:3, Interesting)
Do you also advocate four times the memory usage and half the speed? Do you blame the language or the speaker when they can't formulate a proper sentence? How about we just teach the programmers better instead of bitching about the tool they use?
If half the C coders out there knew the differences between stack, heap, and namespaces, we would not even be debating this issue. Don't blame the coders, blame the universities.
Enjoy,
Re:Slowdown? (Score:4, Interesting)
I work with real-time systems, and 0.0001 seconds (100 microseconds) is plenty fast for most human-to-real-time-system applications. Granted, Java is not what you want for fine-tuning your engine's performance, but it's plenty fast for most applications. What makes Java so useful is that you get to avoid most of the really time-consuming bugs. Compare a fully functional Java-based multithreaded HTTP server with the C/C++ equivalent and it's going to be a third as much code, and it will operate at very close to the same speed. In other words, it's designed around applications where programmer time is worth more than machine time. We already have C, so Java was built around the 95% of applications that don't need inline ASM.
I have killed BSD UNIX with buggy C networking code, which is the only thing I have been unable to duplicate with good Java code. You can do bit twiddling in Java, but it's faster in C. You can have hundreds of threads doing their own thing in either, but it's much easier to do that in Java than in C/C++. The secret is to know enough about how Java works to avoid things that eat up a lot of time, like creating new threads. Once you understand how things work, you can use techniques like thread pooling that are extremely efficient. Instead of complaining that concatenating Strings takes so long, try learning about the other tools out there, like StringBuffer.
PS: A quick look at some fast Java code. (It is a bit dated but gives you some idea what I am talking about.) [protomatter.com]
Re:Wrong solution for solving heap problems. (Score:4, Interesting)
In the event of a screw-up on the part of the JIT or runtime programmer for any language, every program is instantly vulnerable, and all of this generic proactive security is disabled, because this "secure language" doesn't work in an "inherently secure" environment, only a much weakened one. C's runtime is rather basic (and it's still huge), as is its language; people still screw that up once in a while, but rarely.
While these "shiny new secure languages" may boast "immunity to buffer overflows," their runtimes are still designed around other concepts that may leave holes. Look at this memory allocator and consider a bug in the allocator that smashes its own memory before everything is set up; because the new protections aren't yet in place, it would be totally vulnerable at that point (no practical exploit, of course). A bug that forgets to add guard pages (generating 0 guard pages every time) might slip into an update too. Now add to that something like Java or Mono: interpreted or not, you're running on a whole platform instead of just a runtime. C++ instruments loads of back-end object orientation.
So in short, C is a very basic language with easily quantifiable attack vectors, and thus the system can be hardened around them; several such enhancements exist: see PaX, grsecurity, W^X, security in heap allocators, SELinux, Exec Shield, and ProPolice. Higher-level languages like C++ add back-end instrumentation that ramps up complexity and may open new, unprotected attack vectors that are harder to quantify. Very-high-level languages on their own platform, like Java and Mono, not only add massive complexity but rely on a back-end that may lose its security to bugs. Platform languages may also be interpreted or runtime-generated, in which case they may require certain protections, like PaX's strict NX policy, to vanish; in some cases these models (as an implementation flaw) also don't work well with strict mandatory access control policies under systems like SELinux.
Face it. C is the best language all around for speed, security, portability, and maintainability. Assembly only brings excessive speed at the cost of all else; and higher level languages sacrifice both speed and real security (despite their handwaving claims of built-in security) at varying degrees for portability, speed of coding, and maintainability. Even script languages working inside a real tightly secured system would more easily fall victim to cross-site scripting, the injection of script into the interpretation line; under such a system, any similar attack is impossible in a C program.
On a side note, I'd love to see a RAD for C. Think Visual Basic 6.0, but open source, using C/GTK+. Glade comes close. . . .
Re:Slowdown? (Score:4, Interesting)
And anyone who's run a JVM knows about the price of this task -- yes GC takes time.
However, as I understood the article, the author was making a point that the way most C programmers manage memory tends to make the task more time consuming than is necessary. Therefore relying on a known optimized implementation rather than reinventing the wheel every time may be preferred. After all, it is just the VM implementor that needs to understand how to optimize the memory management, not the application developers. So yes, where the time is spent is shifted but also the amount of total execution time spent on memory management can be reduced -- because the task is managed differently.
As for the specific details of this paper, it basically discusses how to determine which objects can be safely allocated on the stack instead of the heap, and therefore discarded without the usual bookkeeping required by a heap GC.
how many high performance memory intensive Java applications are there
Java is so widely used on the server side and in middleware that it cannot be difficult to come up with examples: Tomcat, J2EE app servers, etc. eBay, for instance, advertises very clearly on its front page that it is powered by Sun's Java technology. There are individual Java systems that manage millions of transactions daily, and there must be thousands of systems out there doing this every day with Java.
Re:Slowdown? (Score:4, Interesting)
Essentially, I'd create a large ring buffer of malloced temporary buffers of some standard length. Any time a temporary buffer was needed, I'd grab the next one in the ring.
Before the buffer was provided to the function asking for it, the length would be checked. If the requested length was longer than the current length, the buffer would be freed and one of at least the proper length would be allocated. (I normally allocated my buffers in byte multiples of some fixed constant, usually 32.)
The idea was that by the time it was reused, what was already in the buffer was no longer needed. To achieve that, I'd estimate how many buffers might be needed in the worst case and then multiply that number by 10 for safety's sake.
My primary use of this was when doing enormous numbers of allocations of memory for formatting purposes. The function doing the formatting would request a buffer large enough to hold whatever it would need, write the formatted data into the buffer, and then return a pointer to the buffer. The calling function would simply use the buffer and never have to worry about freeing it.
The performance results were superb except in the very simplest cases where you allocated the buffers without ever using them.
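The scheme above can be sketched in a few lines of C. This is my own reconstruction from the description, not the poster's code; the names, the slot count, and the rounding constant are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Ring of reusable malloc'd temporaries, handed out round-robin.
 * Callers use the returned buffer but never free it; by the time a slot
 * comes around again, its previous contents are assumed dead. */
#define RING_SLOTS 64      /* worst-case live buffers times a safety factor */
#define ROUND_TO   32      /* grow lengths in multiples of this constant */

static char  *ring[RING_SLOTS];
static size_t ring_len[RING_SLOTS];
static size_t ring_pos;

char *temp_buffer(size_t need)
{
    size_t i = ring_pos;
    ring_pos = (ring_pos + 1) % RING_SLOTS;

    if (ring_len[i] < need) {
        /* Current slot too small: replace it with a buffer rounded up
         * to the next multiple of ROUND_TO. */
        size_t len = (need + ROUND_TO - 1) / ROUND_TO * ROUND_TO;
        free(ring[i]);
        ring[i] = malloc(len);
        ring_len[i] = ring[i] ? len : 0;
    }
    return ring[i];
}

/* A formatting helper in the style described above: it returns a pointer
 * the caller may read and pass along, but must not free. */
char *format_int(int v)
{
    char *buf = temp_buffer(32);
    snprintf(buf, 32, "%d", v);
    return buf;
}
```

The appeal is that formatting helpers can return pointers without imposing any ownership on the caller; the risk, of course, is that a buffer held longer than one full trip around the ring gets silently overwritten, which is why the slot count is padded so generously.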
I've never known anyone else who used this kind of approach, although I've shown it to a large number of people.
Re:Slowdown? (Score:3, Interesting)
Re:This is how Electric Fence works. (Score:3, Interesting)
None of these are perfect of course, but each of the techniques has found bugs (hundreds in the case of the two mentioned above) in our source and ports trees. It's also great to see projects like CCured [berkeley.edu] being developed at Berkeley; although the overhead is just slightly too high to be used "out of the box" right now, it still works great with select applications such as Apache. The underlying tool, CIL [berkeley.edu] can compile most of the OpenBSD source tree (including the kernel) now, and the result even boots when using a null source-to-source transform.
Re:new method? (Score:3, Interesting)
Re:Heap Protection vs. Managed Code (Score:3, Interesting)
You could probably also use the MMU to reduce pauses in the GC: you determine which objects are unreachable in the background, and the dirty bit tells you which pages may have gained references into that set, so instead of starting over from scratch you just weed the now-reachable objects out of the set, which seems like a much more tractable problem.
If I wasn't working I would do this. Check out jnode or jxos for example.
Solaris has had this for YEARS (Score:2, Interesting)