Security Programming

English Shell Code Could Make Security Harder

An anonymous reader writes to tell us that finding malicious code might have just become a little harder. Last week at the ACM Conference on Computer and Communications Security, security researchers Joshua Mason, Sam Small, Fabian Monrose, and Greg MacManus presented a method they developed to generate English shell code [PDF]. Using content from Wikipedia and other public works to train their engine, they convert arbitrary x86 shell code into sentences that read like spam, but are natively executable. "In this paper we revisit the assumption that shell code need be fundamentally different in structure than non-executable data. Specifically, we elucidate how one can use natural language generation techniques to produce shell code that is superficially similar to English prose. We argue that this new development poses significant challenges for in-line payload-based inspection (and emulation) as a defensive measure, and also highlights the need for designing more efficient techniques for preventing shell code injection attacks altogether."

  • by jaymz2k4 ( 790806 ) <jaymz@jaymz.WELTYeu minus author> on Monday November 23, 2009 @09:55PM (#30209324) Homepage
    I just have to point out how well that PDF looked from a purely graphic point of view... That is all. Interesting content to boot.
  • Re:In other news... (Score:4, Interesting)

    by Knightman ( 142928 ) on Monday November 23, 2009 @10:10PM (#30209416)

    An assembler/compiler doesn't necessarily use a high-level language input.

    In this instance they (as you say) 'takes as input executable machine code and generates executable machine code with a very narrowly-defined statistical property' which tells me they have an assembler that reads executable code and assembles executable code that looks like English text, in other words an assembler.

  • Re:In other news... (Score:3, Interesting)

    by calmofthestorm ( 1344385 ) on Monday November 23, 2009 @10:19PM (#30209464)

    No, it translates assembly to different assembly that's also English. This is actually a rather interesting piece of work. They didn't just write a program that converts assembly to English assembly, they wrote one in English assembly.

  • Re:In other news... (Score:2, Interesting)

    by mysidia ( 191772 ) on Monday November 23, 2009 @10:28PM (#30209524)

    It is indeed a translator.

    It doesn't translate assembler code.. it translates x86 machine code.

    (Which also implies that it cannot be an assembler, since assemblers only accept Assembly code as input)

  • Re:OMG! (Score:4, Interesting)

    by wizardforce ( 1005805 ) on Monday November 23, 2009 @10:28PM (#30209526) Journal

    You joke but what is a meme (religions are "memes") really other than a self replicating piece of language? The *extreme* bits act in many ways like a virus does: self replication, performing specific tasks, adapting to their environment (like some of the more insidious malware) and neither viruses nor memes can replicate on their own; they need a "host."

  • So what? (Score:3, Interesting)

    by Fnord666 ( 889225 ) on Monday November 23, 2009 @11:17PM (#30209794) Journal
    I guess I don't see the big deal in this paper. Yes, they can encode the shell code into English sentences. It's still meaningless to the recipient and should raise suspicion. It would be far easier to use simple steganographic techniques to embed the shell code into any image transmitted between two systems. The recipient would not suspect any alteration and filters would not have the original image for comparison. Just a thought. Maybe I should write a response paper.
  • Re:This is (Score:2, Interesting)

    by mysidia ( 191772 ) on Monday November 23, 2009 @11:29PM (#30209852)

    No, it won't be the legacy x86 instruction set.

    But we can call it the "Secure x86 instruction set" or the "Enhanced x86 instruction set"

    Market it properly, and everyone will switch to it, because they think it's faster and safer.

  • Re:This is (Score:3, Interesting)

    by nneonneo ( 911150 ) <spam_hole.shaw@ca> on Monday November 23, 2009 @11:35PM (#30209886) Homepage

    Unfortunately, this does not fully solve the problem. Say, for instance, that you've managed to get a buffer overflow on a system, and you now have control over the stack (which is marked RW, but not X). Then, you overwrite the return address of the current function with the address of mprotect() and place arguments on the stack which change the stack protection to RX (there are good reasons for programs to do this in actual practice, e.g. executable compressors like UPX, or executable thunks on the stack); this type of attack is known as a "return-to-libc" attack. If you can successfully overwrite the next lower return address as well, then you can ensure that your shellcode is executed after mprotect returns.

    Even if we assume that the stack is permanently fixed at RW, this does not prevent heap spray attacks which place executable code on the heap and overwrite return addresses on the stack to point at the heap. If the heap is marked RW, then we can just repeat the same process as used above to call mprotect.

    Prohibiting execution on writable segments seems sensible, but in the face of functions which can change the protection bits, it is ineffective. Further, simply restricting the use of those functions is potentially too restrictive, as in the case of some runtime environments which rely on the ability to execute dynamically generated trampoline code to implement key features (for instance, GCC may generate trampoline code to call nested functions), as you mentioned with your second paragraph.

  • by Falconhell ( 1289630 ) on Tuesday November 24, 2009 @12:20AM (#30210088) Journal

    I hope none of you are thinking of subscribing coolforsale's email address zminring@gmail.com to a lot of spam lists.

    That would be very wrong.

    Very very wrong.

  • Re:In other news... (Score:3, Interesting)

    by rnturn ( 11092 ) on Tuesday November 24, 2009 @12:25AM (#30210108)

    "the compiled binary data that can be run raw on an x86 processor -- looks like English text."

    I had brought something like this up during an after-work, Friday night beer session back in the late '80s when a co-worker mentioned the odd snippets of text that one would see while examining programs using the debugger. (No... we weren't talking about strings of text defined in the source code.) I wondered whether it was possible to come up with a program whose machine code formed English text that actually performed a useful function; like some bizarre entry in an Obfuscated Assembly Language contest. Looks like it was possible, though I still am not sure that malware actually meets my definition of "useful". Eye of the beholder, I guess.

  • Re:In other news... (Score:3, Interesting)

    by TheLink ( 130905 ) on Tuesday November 24, 2009 @12:32AM (#30210124) Journal
    There's a difference: an assembly language representation of a machine code program doesn't normally execute on the target machine. It has to be "assembled" to the object code before it can be executed.

    What this bunch has done is create a program that "massages" (which could include expansion and alteration) source machine code into a new arrangement of _machine_code_ that can execute on the target as is. That new arrangement happens to resemble English text (in a computer format).

    It's only an assembler if you're thinking of machine code as the "assembly language" and the "english looking" machine code as the assembled object code.

    But that's stretching things a lot. Like saying you've actually been right all along, that is if wrong is right. ;)
  • by Terje Mathisen ( 128806 ) on Tuesday November 24, 2009 @03:43AM (#30210882)

    I for one am very impressed by what they've done, even if it is somewhat similar to what I did nearly 15 years ago:

    At that time I wrote what's probably the "best" executable text encoder for MsDos, it uses the absolute minimum possible amount of self-modification (a single 2-byte Jcc opcode) while staying entirely within the MIME text character set, and survives all the most usual forms of reformatting/reflowing of the text. (Replacing CRLF with a single CR (Mac) or LF (unix) or turning each paragraph into a single line.)

    The initial bootstrap looks like this:

    ZRYPQIQDYLRQRQRRAQX,2,NPPa,R0Gc,.0Gd,PPu.F2,QX=0+r+E=0=tG0-Ju E=
    EE(-(-GNEEEEEEEEEEEEEEEF 5BBEEYQEEEE=DU.COM=======(c)TMathisen95

    (The uppercase 'E's are my NOP fillers, they execute as INC BP, a register I don't use.)

    Terje

    PS. Unlike the current guys, I wrote the code above by hand, on paper, during the evenings of a ski vacation. I had brought with me a listing of the ascii encoding of all instructions that would use MIME characters only. :-)
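
To see why the uppercase 'E' works as a filler, here is a minimal Python sketch that decodes single-byte opcodes in the 0x40-0x5F range the way a 16-bit x86 CPU would (INC, DEC, PUSH, or POP on one register). The helper names are made up for illustration, and the bootstrap text is copied from the post as displayed, so it may not be byte-exact. Only the leading run of one-byte instructions is decoded; the first ',' (0x2C) starts a two-byte instruction, and from there on a real disassembler would be needed.

```python
# 16-bit register order used by the 0x40-0x5F opcode block.
REGS16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# Base opcode of each 8-entry group in that block.
OPS = {0x40: "INC", 0x48: "DEC", 0x50: "PUSH", 0x58: "POP"}

# Bootstrap text as shown in the post (assumed, not verified, to be byte-exact).
BOOTSTRAP = (
    "ZRYPQIQDYLRQRQRRAQX,2,NPPa,R0Gc,.0Gd,PPu.F2,QX=0+r+E=0=tG0-Ju E=",
    "EE(-(-GNEEEEEEEEEEEEEEEF 5BBEEYQEEEE=DU.COM=======(c)TMathisen95",
)


def decode_one_byte(ch):
    """Return the 16-bit mnemonic for a byte in 0x40-0x5F, else None."""
    b = ord(ch)
    if 0x40 <= b <= 0x5F:
        return f"{OPS[b & 0xF8]} {REGS16[b & 7]}"
    return None


def is_printable_ascii(line):
    """True if every character is in the printable ASCII range 0x20-0x7E."""
    return all(0x20 <= ord(c) <= 0x7E for c in line)


def leading_run(line):
    """Decode one-byte instructions until the first byte outside 0x40-0x5F."""
    out = []
    for ch in line:
        mnemonic = decode_one_byte(ch)
        if mnemonic is None:
            break
        out.append(mnemonic)
    return out


if __name__ == "__main__":
    print("all printable:", all(is_printable_ascii(l) for l in BOOTSTRAP))
    print("'E' executes as:", decode_one_byte("E"))
    for mnemonic in leading_run(BOOTSTRAP[0]):
        print(mnemonic)
```

Every uppercase letter A-Z falls inside this opcode block, which is why a run of capitals is also a run of valid one-byte instructions.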

  • by Hurricane78 ( 562437 ) <deleted @ s l a s h dot.org> on Tuesday November 24, 2009 @03:57AM (#30210920)

    Isn’t this why CAPTCHA was invented?

    I mean just add captchas at a place where it slows him down too much for spamming to still make sense.

    And freakin’ use reCAPTCHA, if you don’t want to get laughed at! ^^

  • HP had it in 1986 (Score:3, Interesting)

    by Anonymous Coward on Tuesday November 24, 2009 @07:07AM (#30211660)

    I think this is interesting, but hardly break-through.

    In the mid 80's, we did the same thing at a field Hewlett-Packard office, although not aimed at viruses. Our target was to enable users to key in x86 code in text form. In other words, sit down at a PC, open EDLIN (the DOS equivalent of Notepad), or some simple text editor, and key in human readable words (i.e. meaningful text that humans - HP Engineers - could easily transcribe from paper or a phone call). Then save the file as a .com file (which was a DOS executable), and then run it.

    Think back to the days of stand-alone PC's, no USB, etc. If the field engineer was at a customer site, and needed to run a small diagnostic program on the PC, but didn't have the tool, then they'd simply call the office, and have the secretary ("coordinator") read them the human-readable sentences to key in. The engineer keys it in, and launches a diagnostic program. Our version even had a check-sum built into the words, so as long as you got the first few sentences exactly right (which were the boot part), then the rest of the "code" (sentences) was examined for check-sums, and would generate a location-specific error (e.g. "Checksum error in the sentence 'Many frightened capsules trigger captain mole'").

    I remember this well, because I wrote the boot part, and the checksum algorithm. I made it fairly resilient to normal human typing habits (i.e. don't worry about capitalization, multiple spaces between words, apostrophes, etc). And I tried to choose some easy sentences (manually) for the boot part, since that had to be entered exactly right each time.

    The system was made up of a "compiler" which would take a simple .com file (that is, an executable file, not a dot-com website), and convert it to "sentences" (which made little sense). We used a spelling dictionary of English words, removing homonyms, as we wanted the words to be "read aloud". We tried to compile into short sentences of specific noun-phrase / verb-phrase formats, but they rarely made any sense. Some were outright silly, like: "The crazed orange melts to school."

    It worked great! But it was only practical for very short utilities. Still, it was FAR easier to key in sentences of nonsense, rather than hex code. Our experience was that a typical engineer could key in nonsense sentences about 5 times faster than hex code (even considering that the words had to have extra boot code to analyze the text), although the results varied depending on the length of the overall program.

    Then networks came along and rendered it fairly useless.
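
The scheme described above can be sketched in a few lines of Python. This is not the HP tool, whose word lists and checksum are not given in the post; everything here is a hypothetical stand-in. It assumes four invented 16-word lists (one word per 4-bit value), packs two bytes into each sentence, and uses the adverb as a simple sum-of-nibbles checksum. The decoder ignores capitalization, spacing, and punctuation, and names the sentence that fails.

```python
import re

# Hypothetical word lists: index 0-15 encodes one 4-bit value per sentence slot.
ADJS = ("crazed quiet frightened happy tiny purple angry lazy "
        "brave cold eager fancy gentle hollow icy jolly").split()
NOUNS = ("orange captain mole capsule school river engine garden "
         "window hammer pencil rocket spider turtle violin wizard").split()
VERBS = ("melts triggers paints carries follows guards hides invites "
         "juggles kicks lifts mends notices opens pushes quotes").split()
ADVS = ("slowly quickly badly calmly daily early fairly gladly "
        "hourly justly kindly loudly madly neatly oddly partly").split()


def encode(data):
    """Turn bytes into sentences, two bytes (four nibbles) per sentence."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of sentences
    sentences = []
    for i in range(0, len(data), 2):
        n = [data[i] >> 4, data[i] & 15, data[i + 1] >> 4, data[i + 1] & 15]
        check = sum(n) % 16  # the adverb carries the checksum
        sentences.append(
            f"The {ADJS[n[0]]} {NOUNS[n[1]]} {VERBS[n[2]]} "
            f"the {NOUNS[n[3]]} {ADVS[check]}."
        )
    return sentences


def decode(sentences):
    """Rebuild the bytes, reporting the first sentence that fails to verify."""
    out = bytearray()
    for num, sentence in enumerate(sentences, 1):
        # Tolerate case, extra spaces, and punctuation; drop the filler "the".
        words = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w != "the"]
        try:
            adj, noun1, verb, noun2, adv = words
            n = [ADJS.index(adj), NOUNS.index(noun1),
                 VERBS.index(verb), NOUNS.index(noun2)]
            check = ADVS.index(adv)
        except ValueError:
            raise ValueError(f"Unrecognized wording in sentence {num}: {sentence!r}") from None
        if sum(n) % 16 != check:
            raise ValueError(f"Checksum error in sentence {num}: {sentence!r}")
        out += bytes([n[0] << 4 | n[1], n[2] << 4 | n[3]])
    return bytes(out)


if __name__ == "__main__":
    program = bytes([0xB4, 0x4C, 0xCD, 0x21])  # a tiny DOS "exit" stub
    for line in encode(program):
        print(line)
    print(decode(encode(program)) == program)
```

A plain sum catches any single wrong word but not two nouns swapped within a sentence, so a real tool would presumably want a position-sensitive checksum.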

  • by rochberg ( 1444791 ) on Tuesday November 24, 2009 @09:22AM (#30212684)

    First, I would not say that they can convert arbitrary shell code to English-like prose. Rather, the only instructions that can be used are the ones that are identical to the ASCII encoding of the alphabet. For instance, the ASCII encoding of the letter "r" (0x72) is identical to the opcode of a conditional short jump (jb). Granted, the authors showed that you can do a lot with this limited set of instructions, but I still wouldn't call it arbitrary.

    According to the PDF it does convert arbitrary shell code. FTA: What follows is a brief description of the method we have developed for encoding arbitrary shellcode as English text... It looks like they can encode anything once they have built an English-like decoder (judging by their language and the 3rd figure).

    Ah, I forgot about that part. Yes, the first part of the shell code decodes the remaining message so that they are no longer limited to just typical ASCII characters. You are correct.

    The tight constraints on the instructions that can be encoded into ASCII make crafting decent English syntax nearly impossible. Spam filters based on natural language processing could probably detect and flag them.

    If they were sending SPAM... which they aren't.

    Here, you missed my point. I was not implying that they were actually sending spam. The sentences they crafted are essentially identical to the kinds of sentences you see in spam. My point was that NLP techniques could be applied to flag these sentences just as they are with spam.
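
The point about letters doubling as instructions can be checked against the x86 opcode map: bytes 0x70-0x7F are the short conditional jumps, so the lowercase letters 'p' through 'z' each start a two-byte jump whose second byte is the displacement. The Python sketch below is only an illustration with made-up helper names, not anything taken from the paper.

```python
# Short conditional jumps occupy opcodes 0x70-0x7F, in this order.
JCC = ["jo", "jno", "jb", "jae", "je", "jne", "jbe", "ja",
       "js", "jns", "jp", "jnp", "jl", "jge", "jle", "jg"]

# Lowercase letters that are jump opcodes: 'p' (0x70) through 'z' (0x7A).
LETTER_JUMPS = {chr(0x70 + i): JCC[i] for i in range(0x7A - 0x70 + 1)}


def decode_jcc(pair):
    """Decode a two-byte short conditional jump, or return None."""
    opcode, disp = pair[0], pair[1]
    if not 0x70 <= opcode <= 0x7F:
        return None
    # The displacement is a signed 8-bit offset from the next instruction.
    rel = disp - 256 if disp >= 0x80 else disp
    return f"{JCC[opcode - 0x70]} {rel:+d}"


if __name__ == "__main__":
    for letter, mnemonic in LETTER_JUMPS.items():
        print(letter, hex(ord(letter)), mnemonic)
    print(decode_jcc(b"r "))  # the letter r followed by a space
    print(decode_jcc(b"te"))  # the letters t and e
```

Because printable ASCII stops at 0x7E, the displacement byte is always positive when it is itself a printable character, so a jump written in text can only skip forward, by 32 to 126 bytes.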

  • by Terje Mathisen ( 128806 ) on Tuesday November 24, 2009 @10:32AM (#30213440)

    I know, and that's exactly what makes it so interesting:

    They have effectively defined a small subset of the entire instruction set while allowing all other instructions that don't produce a side effect which would crash their "real" code.

    Terje

  • Re:Why run the code? (Score:2, Interesting)

    by Gleapsite ( 713682 ) on Tuesday November 24, 2009 @10:39AM (#30213498) Homepage
    It's a steganographic method. It requires some executing code (malware, software vulnerability, w/e) to jump to the ASCII text and begin executing it. It's like camouflaging the battering ram to look like the countryside. You still need someone on the inside to lower the drawbridge.
  • by Megane ( 129182 ) on Tuesday November 24, 2009 @02:21PM (#30216566)

    This sort of shellcode is probably a bit harder to write for the 68000, with its 16-bit instructions that have an "operand mode" field that spans between the two bytes. While a lot of useful instructions are in the 2xxx-7xxx range, and branches are in the 6xxx range, the instructions that do any sort of math are outside it.

    It would be interesting to see what can be done with other CPUs as well. In particular, I recall that OS X PPC missed a chance to resist shellcode by ignoring two of the four bytes of the OS trap instruction, rather than forcing them to be nulls.
