English Shell Code Could Make Security Harder

Posted by ScuttleMonkey
from the little-bobby-tables-takes-up-writing dept.
An anonymous reader writes to tell us that finding malicious code might have just become a little harder. Last week at the ACM Conference on Computer and Communications Security, security researchers Joshua Mason, Sam Small, Fabian Monrose, and Greg MacManus presented a method they developed to generate English shell code [PDF]. Using content from Wikipedia and other public works to train their engine, they convert arbitrary x86 shell code into sentences that read like spam, but are natively executable. "In this paper we revisit the assumption that shell code need be fundamentally different in structure than non-executable data. Specifically, we elucidate how one can use natural language generation techniques to produce shell code that is superficially similar to English prose. We argue that this new development poses significant challenges for in-line payload-based inspection (and emulation) as a defensive measure, and also highlights the need for designing more efficient techniques for preventing shell code injection attacks altogether."

  • by benjamindees (441808) on Monday November 23, 2009 @09:45PM (#30209250) Homepage

    They don't mean shell commands. They mean code that exploits a vulnerability in order to start a shell.

  • Re:In other news... (Score:5, Informative)

    by blueg3 (192743) on Monday November 23, 2009 @09:50PM (#30209276)

    Good job not reading the article.

    It's not that shellcode can be written in text and then compiled to an executable form. It's not that shellcode can be compiled to an intermediary form, translated or compiled into machine instructions by a piece of code (this is common in malware now, to pass input restrictions -- as the article says). It's that the executed machine instructions themselves -- the compiled binary data that can be run raw on an x86 processor -- look like English text.

  • Re:This is (Score:5, Informative)

    by Wovel (964431) on Monday November 23, 2009 @10:00PM (#30209358) Homepage

    Guess you missed their "compromised" machine assumption. "...After successful exploitation of a software vulnerability, we assume that a pointer to the shellcode..." The sky is not really falling any faster today than it was yesterday.

  • Re:Confused (Score:4, Informative)

    by The MAZZTer (911996) <megazzt@gm[ ].com ['ail' in gap]> on Monday November 23, 2009 @10:04PM (#30209380) Homepage

    Nope, you're confusing assembly code and shell/machine code, which are two different things.

    Assembly is text-based, and is readable for people who know the language. Each operation is a keyword, and some take arguments. It's basically the lightest-weight possible programming language (although it's not really considered a programming language, it's so lightweight!). A computer cannot run assembly code directly.

    Machine code is what you get if you take the assembly and run it through an assembler to produce code that the computer can understand. The computer can then execute it. It is not human readable unless you've memorized which opcodes correspond to which assembly keywords. Far easier to pipe it through a disassembler to get the assembly code back and read that.

    To answer the GP's question, this sounds like they mean shell code. It wouldn't be very useful as assembly code anyway. ("To claim your free iPod, run this sentence through masm and run the resulting EXE file.") Most people don't have an assembler, and the ones who do aren't usually susceptible to malware anyway.
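
    To make the assembler/disassembler relationship concrete, here is a toy sketch in Python. It covers only a handful of single-byte 32-bit x86 instructions; the table and the function names are made up for illustration, and a real assembler handles operands, prefixes, and multi-byte encodings.

```python
# Toy illustration only: a few single-byte 32-bit x86 instructions.
# The names below are hypothetical, not taken from any real assembler.
MNEMONIC_TO_OPCODE = {
    "nop": 0x90,
    "ret": 0xC3,
    "int3": 0xCC,
    "push eax": 0x50,
    "pop eax": 0x58,
    "inc eax": 0x40,
}
OPCODE_TO_MNEMONIC = {op: m for m, op in MNEMONIC_TO_OPCODE.items()}


def assemble(lines):
    """Translate assembly text (one instruction per item) into machine-code bytes."""
    return bytes(MNEMONIC_TO_OPCODE[line.strip().lower()] for line in lines)


def disassemble(code):
    """Translate machine-code bytes back into assembly text."""
    return [OPCODE_TO_MNEMONIC[b] for b in code]


machine_code = assemble(["push eax", "inc eax", "pop eax", "ret"])
print(machine_code.hex())         # the raw bytes a CPU would execute
print(disassemble(machine_code))  # the human-readable form again
```

    Three of those four bytes (0x50, 0x40, 0x58) happen to be the printable ASCII characters P, @, and X, which hints at the overlap between text and machine code that the paper relies on.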

  • by HEbGb (6544) on Monday November 23, 2009 @10:05PM (#30209390)

    This is the sixth spam message this user has posted, will SLASHDOT please BAN this guy already? Come on.

  • Re:This is (Score:5, Informative)

    by blueg3 (192743) on Monday November 23, 2009 @10:11PM (#30209426)

    Pinning down terminology use by security researchers is tricky.

    In this case, what they mean is that the system has a vulnerability that enables code from a remote source to be executed, and that the input from the remote source is being run through a filter that attempts to identify executable code (in order to block it) versus English text.

    On an already-secure system, this makes no difference at all. Those don't exist, much. If you were relying on a "looks like executable code" filter to protect you, this is a tip that it's not that secure. The paranoid should already assume so (based on things that already are available in Metasploit, if nothing else).
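
    As a rough sketch of what a "looks like executable code" filter might do, the Python below scores input by the fraction of printable ASCII bytes. This is an assumed, deliberately naive heuristic (the function names and the 0.95 threshold are invented here), not the detection method from the paper or from any real product.

```python
import string

# Byte values that count as "text" for this naive heuristic.
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (empty input counts as text)."""
    if not data:
        return 1.0
    return sum(b in PRINTABLE for b in data) / len(data)


def looks_like_text(data: bytes, threshold: float = 0.95) -> bool:
    """Hypothetical filter: accept input if nearly all bytes are printable."""
    return printable_ratio(data) >= threshold


# A harmless compiled function body (push ebp; mov ebp, esp; xor eax, eax;
# pop ebp; ret), used only as a stand-in for ordinary machine code.
typical_binary = bytes([0x55, 0x89, 0xE5, 0x31, 0xC0, 0x5D, 0xC3])
english_sample = b"There is a major center of economic activity, such as Star Trek"

print(looks_like_text(typical_binary))
print(looks_like_text(english_sample))
```

    Ordinary machine code fails a check like this because many of its bytes fall outside the printable range. Machine code built entirely from printable bytes would pass, which is the weakness being pointed out.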

  • by Tynin (634655) on Monday November 23, 2009 @10:13PM (#30209438)

    This is the sixth spam message this user has posted, will SLASHDOT please BAN this guy already? Come on.

    He must be making new logins. I've seen him posting for a few weeks; he surely has more than 6 spams that I've seen alone. Going on that idea... let's see:
    http://slashdot.org/~coolforsale117 [slashdot.org]
    http://slashdot.org/~coolforsale116 [slashdot.org]
    http://slashdot.org/~coolforsale115 [slashdot.org]
    http://slashdot.org/~coolforsale114 [slashdot.org]
    http://slashdot.org/~coolforsale112 [slashdot.org]
    http://slashdot.org/~coolforsale110 [slashdot.org]

    No doubt there is a TON of them. So I'd guess they are banning him, he just keeps making new uids (and siphoning a ton of moderation points to keep him marked at troll / offtopic). I know I've used many mod points keeping this bastard down.

  • by sten ben (1652107) on Monday November 23, 2009 @10:26PM (#30209510)
    Looks like LaTeX [latex-project.org] with a CHI [rwth-aachen.de] template. But maybe that was what you were getting at? Pretty it is.
  • Re:In other news... (Score:3, Informative)

    by blueg3 (192743) on Monday November 23, 2009 @10:27PM (#30209518)

    Technically, machine code -- assembly is the pseudo-English text version of machine code.

    But otherwise, yes.

  • by tepples (727027) <tepples@gmaiBLUEl.com minus berry> on Monday November 23, 2009 @10:31PM (#30209542) Homepage Journal

    Isn't this what NX is supposed to stop, execution of arbitrary data as code?

    Then you compromise a binary that has opted out of strict NX, such as a Java virtual machine that needs to dynamically recompile JVM bytecode to x86 bytecode.

  • by gzipped_tar (1151931) on Monday November 23, 2009 @10:33PM (#30209556) Journal
    The PDF file itself was generated using Adobe Distiller for Mac. Not sure what is used to generate the original. Since they were using Adobe, it's not likely that they were using LaTeX.
  • by sten ben (1652107) on Monday November 23, 2009 @10:40PM (#30209592)

    Since they were using Adobe, it's not likely that they were using LaTeX.

    Except the .dvi file extension. And: Creator: dvips(k) 5.97 Copyright 2008 Radical Eye Software

    Acrobat was probably only used to convert the ps to pdf.

  • by x2A (858210) on Monday November 23, 2009 @10:47PM (#30209640)

    It's a research paper, not an exploit, not instructions on how to make an exploit, not recommendations on how to make an exploit. God what's with you people on this site, you can't just see something for what it is, you have to see it for how it serves no purpose to you or how you can do it so much better.

    If they could exploit a machine by sending a point across, they'd get it past you lot every time, you'd never detect that huh.

  • by dubaiguy (1684890) on Monday November 23, 2009 @10:49PM (#30209648)
    It's latex with an ACM template. I'm pretty sure their workflow was latex (.dvi) to dvips (.ps) to Acrobat Distiller (.pdf).
  • Re:In other news... (Score:4, Informative)

    by DoctorBit (891714) on Monday November 23, 2009 @10:58PM (#30209702)
    It's a translator that takes any arbitrary x86 machine code as input, and produces as output functionally equivalent self-modifying machine code that starts off looking like English text. The same approach also works with other non-x86 machine codes, and other languages, such as Russian, French, etc... Very interesting work. It goes to show that an OS allowing code to self-modify can produce results that are very difficult to predict. Self-modifying code has an almost biological nature.
  • by rochberg (1444791) on Monday November 23, 2009 @11:49PM (#30209948)

    This talk was probably my favorite at CCS this year. Unlike MANY researchers, the lead author of this paper was quite entertaining. Regarding the work itself, there are a few details that the current discussion has missed.

    First, I would not say that they can convert arbitrary shell code to English-like prose. Rather, the only instructions that can be used are the ones that are identical to the ASCII encoding of the alphabet. For instance, the ASCII encoding of the letter "r" is identical to the binary for the unconditional jmp instruction. Granted, the authors showed that you can do a lot with this limited set of instructions, but I still wouldn't call it arbitrary.
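
    For reference, a small Python sketch of how some printable ASCII bytes decode as one-byte opcodes in 32-bit x86 mode. It covers only the inc/dec/push/pop register forms (0x40 to 0x5F) and the short conditional jumps (0x70 to 0x7F); everything else is left out, and the function name is invented for illustration. Note that in the standard opcode map the byte for "r" (0x72) is the conditional short jump jb, while the unconditional short jmp is 0xEB, which is not printable ASCII.

```python
# Register order used by the one-byte inc/dec/push/pop encodings (32-bit mode).
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Short conditional jumps occupy opcodes 0x70 through 0x7F, in this order.
JCC = ["jo", "jno", "jb", "jae", "je", "jne", "jbe", "ja",
       "js", "jns", "jp", "jnp", "jl", "jge", "jle", "jg"]


def describe(ch: str) -> str:
    """Name the 32-bit x86 instruction whose opcode byte equals this character."""
    b = ord(ch)
    if 0x40 <= b <= 0x47:
        return f"inc {REGS[b - 0x40]}"
    if 0x48 <= b <= 0x4F:
        return f"dec {REGS[b - 0x48]}"
    if 0x50 <= b <= 0x57:
        return f"push {REGS[b - 0x50]}"
    if 0x58 <= b <= 0x5F:
        return f"pop {REGS[b - 0x58]}"
    if 0x70 <= b <= 0x7F:
        return f"{JCC[b - 0x70]} rel8"  # followed by a one-byte displacement
    return "(not covered by this sketch)"


for letter in "APXrt":
    print(letter, hex(ord(letter)), describe(letter))
```

    The lowercase letters p through z all land in the conditional-jump range, so English-looking text tends to be full of jumps whose targets depend on whatever character comes next.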

    Second, he showed several examples of the sentences created. They make about as much sense as "Lorem ipsum dolor sit amet..." The tight constraints on the instructions that can be encoded into ASCII make crafting decent English syntax nearly impossible. Spam filters based on natural language processing could probably detect and flag them.

    While disguising the binary as ASCII is cool, I don't see that it's all that different than other exploits. Once a sentence containing an exploit is detected, you'll have signatures just like any other type of virus/trojan. I highly doubt that contemporary anti-virus scanners stop working on data that looks like ASCII. Rather, they look for tell-tale signs of particular instructions that appear in particular orders, etc.
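
    A minimal sketch of that signature idea in Python, assuming a scanner that simply looks for known byte sequences. The signature names and patterns are made-up placeholders, and real scanners are considerably more elaborate.

```python
# Hypothetical signature table: a name mapped to a byte pattern.
# These patterns are placeholders, not real malware signatures.
SIGNATURES = {
    "demo-binary-marker": bytes.fromhex("deadbeef"),
    "demo-text-marker": b"EXAMPLE-SIGNATURE",
}


def scan(data: bytes) -> list:
    """Return the names of all signatures whose pattern occurs in the data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)


print(scan(b"hello EXAMPLE-SIGNATURE world"))
print(scan(b"\x00\xde\xad\xbe\xef\x00"))
print(scan(b"nothing to see here"))
```

    The matching is on raw bytes, so whether the pattern happens to be printable text makes no difference to a scanner of this kind.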

    And, as many others have pointed out, this code is only harmful if it is executed in the right context (i.e., you have a vulnerability to exploit). Disguising the code as ASCII doesn't really make it different than any other type of zero-day attack.

    This work was very sophisticated, and there's no way that script kiddies could build something like this. I don't know that more advanced attackers would bother, because I really don't see all that much of a payoff given the amount of work that this attack requires. It's a whole lot easier to take over a vulnerable web server and launch a XSS attack. The incentives simply do not seem to suggest that this technique will become widespread.

    So, no, I don't think the sky is falling because of this attack. Having said that, though, this was a very cool piece of work.

  • by dubaiguy (1684890) on Tuesday November 24, 2009 @12:07AM (#30210030)

    First, I would not say that they can convert arbitrary shell code to English-like prose. Rather, the only instructions that can be used are the ones that are identical to the ASCII encoding of the alphabet. For instance, the ASCII encoding of the letter "r" is identical to the binary for the unconditional jmp instruction. Granted, the authors showed that you can do a lot with this limited set of instructions, but I still wouldn't call it arbitrary.

    According to the PDF it does convert arbitrary shell code. FTA: What follows is a brief description of the method we have developed for encoding arbitrary shellcode as English text... It looks like they can encode anything once they have built an English-like decoder (judging by their language and the 3rd figure).

    The tight constraints on the instructions that can be encoded into ASCII make crafting decent English syntax nearly impossible. Spam filters based on natural language processing could probably detect and flag them.

    If they were sending SPAM... which they aren't.

  • Re:Confused (Score:5, Informative)

    by Ungrounded Lightning (62228) on Tuesday November 24, 2009 @12:14AM (#30210070) Journal

    TFA uses the security community's special term "(a) shellcode", which means something other than what it sounds like to ordinary programmers.

    "A shellcode" is the infection head of an exploit - the thing you try to get to run on the target to make the rest of the exploit work. It's in the machine language of the target, not a shell language.

    It's called "a shellcode" because it typically (but not necessarily) tries to sucker the system into launching a shell to run the rest of the exploit. The rest of the exploit may be in a shell language (depending on the shell to interpret it), a machine language executable, etc. Or "the shellcode" may do something else than launch a shell.

    This is one of the latter cases. It's a chunk of self-modifying code (due to the limits of what instructions you can get out of English-looking text) that bootstraps its own internals into something that can act as an interpreter (or other executor) for the rest of the English-looking exploit code, then runs through that code and "makes it happen".

    You can think of it as a binary executable program that depends on self-modification to get away with consisting only of combinations of bytes that look enough like English to fool spam filters which are trying to recognize executable code.

    So it's a very goofy binary and there are no shells or shell languages involved. Instead (if I read this right) the researchers built a very screwy assembler that takes as input an assembler source program and produces as output some VERY screwy machine code that looks like English and ends up doing the same job in a roundabout way, rather than being the direct translation of the assembler code input.

  • Re:In other news... (Score:1, Informative)

    by Anonymous Coward on Tuesday November 24, 2009 @12:34AM (#30210144)

    And who defines what the assembly is? The ones writing the assembler. Sheesh..

    Sheesh indeed.

    A compiler traditionally takes a high-level set of instructions and translates them into a lower-level set of instructions. What they have done is take a low-level set of instructions and found a way to make them high-level... looking... but still able to execute at a low level.
    So technically this is more of an obfuscator than a compiler. I'm not saying you're entirely wrong, but to try and sum it up as just another x86 assembler is glossing it over a good bit. As well as missing the underlying point that this type of thing could be used for some pretty nasty purposes in the right scenario.

  • Re:Antelope museum (Score:5, Informative)

    by slashqwerty (1099091) on Tuesday November 24, 2009 @01:12AM (#30210312)
    For those that are curious, here is some actual exploit code from the paper [jhu.edu]:

    There is a major center of economic activity, such as Star Trek, including The Ed Sullivan Show. The former Soviet Union. International organization participation Asian Development Bank, established in the United States Drug Enforcement Administration, and the Palestinian territories, the International Telecommunication Union, the first ma

    The bold characters are code. The rest have no net effect.

    Their strategy is to break the exploit into two pieces, a small executable decoder, and the payload. As you might imagine, the decoder decodes the payload. The payload is encoded in a benign-looking format which is simple enough. Their goal was to make the decoder also look like benign data. To achieve that, their tool takes an existing decoder and automatically converts it to English-looking prose like the paragraph above. The tool is able to convert a decoder in less than an hour on commodity hardware.
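
    The decoder/payload split can be sketched in Python with a stand-in encoding. The scheme below (two letters per byte from a 16-letter alphabet) is an assumption chosen for simplicity, not the encoding from the paper, and it only produces letters rather than English-like prose; the payload is just a sequence of arbitrary bytes.

```python
# Hypothetical alphabet: 16 letters, one for each 4-bit value.
ALPHABET = "abcdefghijklmnop"


def encode(payload: bytes) -> str:
    """Encode arbitrary bytes as letters only (high nibble first, then low)."""
    out = []
    for b in payload:
        out.append(ALPHABET[b >> 4])
        out.append(ALPHABET[b & 0x0F])
    return "".join(out)


def decode(text: str) -> bytes:
    """Reverse encode(): rebuild each byte from a pair of letters."""
    values = [ALPHABET.index(c) for c in text]
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))


payload = bytes(range(0, 256, 17))  # harmless stand-in: 0x00, 0x11, ..., 0xFF
encoded = encode(payload)

print(encoded)
print(decode(encoded) == payload)
```

    Encoding the payload this way is the easy half. The hard part described above is that the decoder itself has to be machine code made of text-like bytes, and that is what their tool automates.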

  • Thanks (Score:4, Informative)

    by turgid (580780) on Tuesday November 24, 2009 @07:09AM (#30211680) Journal

    What is "shell code" supposed to be? Bourne shell scripts?

    Someone had to ask it!

    From the wikipedia [wikipedia.org]: In computer security, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine. Shellcode is commonly written in machine code, but any piece of code that performs a similar task can be called shellcode. Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient.[1] However, attempts at replacing the term have not gained wide acceptance.

    So it's a poor piece of new terminology that has stuck, unfortunately.

  • Re:HP had it in 1986 (Score:1, Informative)

    by Anonymous Coward on Tuesday November 24, 2009 @08:40AM (#30212338)

    Maybe you didn't read the same article I did. The concept is identical: English words that are also machine executable. The only difference is that the "payload" for HP was a utility, and the payload for the article is a virus.

  • by coinreturn (617535) on Tuesday November 24, 2009 @10:24AM (#30213350)
    Yeah, but yours doesn't look like English; theirs does.
