DARPA Delving Into the Black Art of Super Secure Software Obfuscation

coondoggie writes: Given enough computer power, desire, brains, and luck, the security of most systems can be broken. But there are cryptographic and algorithmic security techniques, ideas, and concepts out there that add a level of algorithmic mystification which, built into programs, could make them close to unbreakable. That's what the Defense Advanced Research Projects Agency (DARPA) wants from a new program called "SafeWare." From DARPA: “The goal of the SafeWare research effort is to drive fundamental advances in the theory of program obfuscation and to develop highly efficient and widely applicable program obfuscation methods with mathematically proven security properties.”
  • by jcochran ( 309950 ) on Sunday October 05, 2014 @01:51PM (#48069125)

    The objective of "mathematically proven security properties" via program obfuscation is definitely not achievable. After all, it's a given security principle that "security through obfuscation" is unsupportable. If an adversary is capable of obtaining the executable of a program, they can also reverse engineer that same executable. It may take a lot of effort, but it is always achievable.

    • Re: (Score:2, Insightful)

      by roman_mir ( 125474 )

      OTOH all security is by obscurity: what is a password, if not a piece of data that is obscured from most people and supposedly known only by its owner?

      • by Fwipp ( 1473271 ) on Sunday October 05, 2014 @02:25PM (#48069251)

        Well, something that is obscure is just something that's hard to read. A password is supposed to be hidden, not seen at all. "Security through obscurity" is the idea that attackers will be able to see your algorithms but still won't figure them out.

        • by Anonymous Coward

          Fully homomorphic encryption. 'nuff said. Happens to be my current project: developing a processor that uses FHE. Unfortunately, they don't want to use specialized hardware. Does an FPGA count?

          CARRIER LOST. ;)

        • Well, I found the summary completely incomprehensible, so DARPA is apparently well on their way with this new technology to befuddle and obfuscate...
          • Well, I found the summary completely incomprehensible, so DARPA is apparently well on their way with this new technology to befuddle and obfuscate...

            I thought that was the purview of the legislature...

      • OTOH all security is by obscurity: what is a password, if not a piece of data that is obscured from most people and supposedly known only by its owner?

        Not impressed. "Security by obscurity" generally refers to restricting information about how a system works in order to make it harder for people to gain access. Passwords are not that - everybody knows that the system can be accessed with a password, and there are protocols in place for resetting or releasing passwords (if that is the case), etc. The rules of the game are well publicized.

      • No, not all security is obscurity. [stackexchange.com] If your list of things that need to be kept secret includes your security implementation, and especially your algorithm, then you have flawed security. Multi-level security increases the number of things you need to have and/or know in order to compromise the system. With e.g. ROT-13 or another shift cipher, once you know that they are using that cipher, there is no other knowledge that you need in order to break it. On the other hand, if you have an arbitrary number of ke
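
        To make that concrete, here's a minimal ROT-13 sketch (hypothetical Python; the plaintext is made up, and the standard-library "rot_13" codec does the work):

        import codecs

        # ROT-13: once you know the algorithm, there is no key left to find.
        ciphertext = codecs.encode("attack at dawn", "rot_13")
        print(ciphertext)                           # nggnpx ng qnja
        print(codecs.encode(ciphertext, "rot_13"))  # applying it again decrypts: attack at dawn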

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      Security through obscurity as a first line of defense is perfectly fine. Now if the obscurity is the entirety of your security then you have problems.

      • Security through obscurity as a first line of defense is perfectly fine. Now if the obscurity is the entirety of your security then you have problems.

        It tends to give you a false sense of being more protected than you actually are, and because that false sense works on management too, it gives them an incentive to budget less for real defense-in-depth work.

        There's a natural human psychological barrier against getting a good lock for one's front door, when one already has a lock for one's front door. Why buy another lock, when I have a perfectly good lock? It's the same mentality behind the anti-circumvention and reverse engineering l

    • How sure are you? If you can execute the program, that still doesn't mean you can predict exactly what it does and understand everything it could possibly do.
      A simple example of checking a password: you can see that the program hashes it and checks it against the hash it should be, doesn't mean you know what the right password is to get beyond it.
      Even if you can execute the program, triggering every possible machine state to analyze it is impossible for non-trivial programs. And I'm wondering what they coul
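
      A minimal sketch of that hash-check point (hypothetical Python; the digest below is just SHA-256 of "password"):

      import hashlib

      # Everything here is readable - the stored digest and the checking logic -
      # yet recovering the password still requires a brute-force search.
      STORED_DIGEST = "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"

      def check_password(candidate: str) -> bool:
          return hashlib.sha256(candidate.encode()).hexdigest() == STORED_DIGEST

      print(check_password("password"))  # True
      print(check_password("guess"))     # False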

      • I imagine that self-altering program code could become incredibly hard to analyze and unravel.
        • I imagine that self-altering program code could become incredibly hard to analyze and unravel.

          There are many tricks to make it hard. Some CISC instruction sets have variable-length instructions and allow instructions to start at any byte offset, so you can have the same string of bytes execute a completely different sequence of instructions depending on the offset of the entry point. This can make disassembly very challenging. I have heard from my biologist friends that DNA sometimes does the same thing, with the same DNA sequence encoding different proteins depending on the offset of the star
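
          A toy sketch of that overlapping-decode trick (hypothetical Python with a made-up variable-length instruction set, not any real CISC encoding):

          # Toy ISA: 0x01 <imm> = PUSH (2 bytes), 0x02 = ADD (1 byte),
          #          0x03 <dst> <src> = MOV (3 bytes).
          LENGTHS = {0x01: 2, 0x02: 1, 0x03: 3}
          NAMES = {0x01: "PUSH", 0x02: "ADD", 0x03: "MOV"}

          def disasm(code: bytes, offset: int) -> list:
              # Linearly decode the byte string from a given entry offset.
              out, i = [], offset
              while i < len(code):
                  op = code[i]
                  args = ",".join(str(b) for b in code[i + 1:i + LENGTHS[op]])
                  out.append((NAMES[op] + " " + args).strip())
                  i += LENGTHS[op]
              return out

          code = bytes([0x03, 0x01, 0x05, 0x02])
          print(disasm(code, 0))  # ['MOV 1,5', 'ADD']  - entry at offset 0
          print(disasm(code, 1))  # ['PUSH 5', 'ADD']   - same bytes, entry at offset 1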

    • by IamTheRealMike ( 537420 ) on Sunday October 05, 2014 @02:31PM (#48069285)

      The objective of "mathematically proven security properties" via program obfuscation is definitely not achievable. After all, it's a given security principle that "security through obfuscation" is unsupportable. If an adversary is capable of obtaining the executable of a program, they can also reverse engineer that same executable. It may take a lot of effort, but it is always achievable.

      That is the standard consensus view in the software industry, yes. I'm afraid to tell you, though, that it's wrong.

      Last year there was a mathematical breakthrough in the field of what is called "indistinguishability obfuscation" [iacr.org]. This is a mathematical approach to program obfuscation which has sound theoretical foundations. This line of work could in theory yield programs whose functioning cannot be understood no matter how skilled the reverse engineer is.

      It is important to note here a few important caveats. The first is that iO (to use the cryptographers' name for it) is presently a theoretical technique. A new paper came out literally 5 days ago that claims to discuss an implementation of the technique [iacr.org], but I haven't read it yet. Will do so after posting this comment. Indeed, it seems nobody is quite sure how to make it work with practical performance at this time.

      The second caveat is that the most well explored version of it only applies to circuits, which can be seen as a kind of pure functional program. Actually, a circuit is closer to a mathematical formula than a real program; e.g. you cannot write circuits in C or any other programming language we mortals are familiar with. Researchers are now starting to look at the question of obfuscating "RAM programs" [iacr.org], i.e. programs that look like normal imperative programs written in dialects of, say, C. But this work is still quite early.

      The third caveat is that because the techniques apply to pure functions only, they cannot do input or output. This makes them somewhat less than useful for obfuscation of the sort of programs that are processed with commercial obfuscators today like video games.

      Despite those caveats the technique is very exciting and promising for many reasons, none of which have to do with DRM. For example, iO could provide a unifying framework for all kinds of existing cryptographic techniques and enable cryptographic capabilities that were hitherto only conjectured. For example, timelock crypto can be implemented using an iO obfuscator and Bitcoin.

      • by Fwipp ( 1473271 )

        Thanks, this is very interesting. I'd imagine that DARPA is aiming to do further research along these lines.

      • by IamTheRealMike ( 537420 ) on Sunday October 05, 2014 @03:56PM (#48069589)

        OK, I read the paper.

        The money quote is at the end:

        The evaluation results from Section 4 show that work still needs to be done before program obfuscation is usable in practice. In fact, the most complex function we obfuscate with meaningful security is a 16-bit point function, which contains just 15 AND gates. Even such a simple function requires about 9 hours to obfuscate and results in an obfuscation of 31.1 GB. Perhaps more importantly (since the obfuscation time is a one-time cost), evaluating the 16-bit point obfuscation on a single input takes around 3.3 hours. However, it is important to note that the fact that we can produce any “useful” obfuscations at all is surprising. Also, both obfuscation and evaluation are embarrassingly parallel and thus would run significantly faster with more cores (the largest machine we experimented on had 32 cores).

        Translated into programmer English, a "16-bit point function" is basically a function that returns true for exactly one 16-bit input and false for all others. It would correspond to the following C++ function prototype:

        bool point_function(short input);

        In other words, you can hide a 16-bit "password" inside such a function and discover whether you got a match or not. Obviously, obfuscating such a function is just a toy to experiment with. "SHA256(x) == y" is also a point function, and one that can be implemented in any programming language with ease - short of brute forcing it, there is no way to break such an "obfuscated point function". Thus using this technique doesn't presently make a whole lot of sense. However, it's a great base to build on.
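
        To make the brute-force remark concrete, here is a hypothetical Python sketch of "SHA256(x) == y" as a 16-bit point function; the hidden value 42 is arbitrary, and 2^16 inputs is a trivially searchable space, which is why this size is only a toy:

        import hashlib

        TARGET = hashlib.sha256((42).to_bytes(2, "big")).hexdigest()  # hides the 16-bit "password"

        def point_function(x: int) -> bool:
            # SHA256(x) == y, phrased as a 16-bit point function.
            return hashlib.sha256(x.to_bytes(2, "big")).hexdigest() == TARGET

        print(next(x for x in range(2**16) if point_function(x)))  # 42, found in well under a second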

        I should note that the reference to AND gates above doesn't mean that the program is an arbitrary circuit - it means that the "program" being obfuscated is in fact a boolean formula. Now, you can translate boolean circuits into boolean formulas, but often only at great cost. And regular programs can only be translated into circuits, also at great cost. So you can see how far away from practicality we are. Nonetheless, just last year the entire idea that you could do this at all seemed absurd, so to call the progress so far astonishing would be an understatement. Right now the field of iO is developing so fast that the paper's authors note that whilst they were writing it, new optimisations were researched and published, so there are plenty of improvements left open for future work.
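
        For intuition on where a count like "15 AND gates" can come from (my reading, not spelled out in the paper): testing equality against a fixed 16-bit value is a conjunction of 16 per-bit matches, which takes 15 two-input ANDs. A hypothetical Python sketch:

        SECRET = 0xBEEF  # arbitrary example value

        def point_formula(x: int) -> bool:
            # One match-literal per bit of the 16-bit secret...
            literals = [((x >> i) & 1) == ((SECRET >> i) & 1) for i in range(16)]
            # ...then AND the 16 literals together: 15 AND operations.
            result = literals[0]
            for lit in literals[1:]:
                result = result and lit
            return result

        print(point_formula(0xBEEF), point_formula(0xBEEE))  # True False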

      • Actually, a circuit is closer to a mathematical formula than a real program; e.g. you cannot write circuits in C or any other programming language we mortals are familiar with.

        Kevin Horton (kevtris in #nesdev on EFnet) writes circuits in Verilog for a living.

      • So you can basically "mathematically" obfuscate a function that is "pure", not implementable in an imperative programming language, and has no I/O; in other words, you can obfuscate something that has no use and that probably nobody knows what it is doing :-)
      • by Anonymous Coward

        Assume for the moment it is true.

        That also means that an undetectable virus will exist as well.

      • Once you add all your caveats you have a pretty piece of hardware which gives you numbers whose purpose you cannot see; while good for hardware, most software in the real world does not work like that. Once you start unrolling the instructions and considering input and output, then sooner or later, depending on how much effort you want to spend, you won't be able to hide your program's inner workings. The simple fact is that, contrary to your hardware argument, the software has to be executed by the CPU. Frankly you
        • What they are trying to construct (and at least partially succeeding), though, is a cryptographic construct whereby you can feed input in one end and "iterate" the computation, but not know what computation you are actually doing. Imagine that every time you do any operation on two variables, you actually do all possible operations (i.e. multiply, add, shift, etc.) and only one output is stored. The trick is that which one is actually kept is hidden from you cryptographically. That is a very crude metaph
          • The trick is that which one is actually kept is hidden from you cryptographically.

            But what if you happen to be a CPU who just wants to execute a series of instructions? Is there some way to tell the CPU which opcode you want to execute without also telling a human who is merely pretending to be a CPU? If it is cryptographically hidden then how can the CPU read it?

            • Yes, the CPU executes all of them like I said. The "you" in this is the computer. It executes obliviously (think homomorphic encryption, but hiding even the circuit) so that someone with the key can recover the correct output but the entity doing the computation doesn't know what it actually computed.
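
              A structural toy of "compute every candidate, keep one" (hypothetical Python; the one-hot selector is in the clear here, whereas a real construction hides it under encryption, so this shows only the shape, not the security):

              OPS = [lambda a, b: a + b,   # add
                     lambda a, b: a * b,   # multiply
                     lambda a, b: a ^ b,   # xor
                     lambda a, b: a - b]   # subtract

              def oblivious_step(a, b, selector):
                  # selector is one-hot; the sum keeps exactly one op's output.
                  return sum(s * op(a, b) for s, op in zip(selector, OPS))

              print(oblivious_step(6, 7, [0, 1, 0, 0]))  # 42 - multiply was selected
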
    • by lgw ( 121541 )

      If an adversary is capable of obtaining the executable of a program, they can also reverse engineer that same executable. It may take a lot of effort, but it is always achievable.

      Well, you can also brute force a 256-bit key. It may take the lifetime of the universe, squared, but it is achievable.

      The whole point of this technology is that the computer executing the code doesn't have the source or data in the clear.

      There's some existing work designed for databases that works just this way. I send "the cloud" an encrypted query that causes the server to sum a column in some encrypted table and return me the encrypted result, all without the server having any keys. It's all manipulati
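
      A tiny sketch of that flavor of scheme (hypothetical Python implementing Paillier-style additively homomorphic encryption with toy primes; NOT secure, purely illustrative):

      import math, random

      p, q = 1789, 2003                 # toy primes - far too small for real use
      n, n2 = p * q, (p * q) ** 2
      lam = math.lcm(p - 1, q - 1)
      mu = pow(lam, -1, n)              # valid because we use g = n + 1

      def encrypt(m: int) -> int:
          r = random.randrange(2, n)
          while math.gcd(r, n) != 1:
              r = random.randrange(2, n)
          return pow(n + 1, m, n2) * pow(r, n, n2) % n2

      def decrypt(c: int) -> int:
          return (pow(c, lam, n2) - 1) // n * mu % n

      # The "server" sums an encrypted column by multiplying ciphertexts;
      # it never sees the plaintexts or the key.
      ciphertexts = [encrypt(v) for v in [17, 25, 8]]
      print(decrypt(math.prod(ciphertexts) % n2))  # 50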

  • by JustNiz ( 692889 ) on Sunday October 05, 2014 @02:10PM (#48069209)

    I'm amazed that someone who supposedly knows what they are doing would even suggest this.
    Program obfuscation is completely the wrong approach. It is just another mechanism that relies on security through obscurity, which has been proven time and again to be a short-term solution at best.
    When something is actually secure, its readability should be irrelevant.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Security through obscurity can work to a point. *IF* you make it hard enough.

      Take for example Raiden II. That game has only recently (in the past month) been 'cracked'. Even though only sorta. There is no encryption. It is all just bundled into a 'cop' chip.

      The point with their 'security', though, was not to never be cracked, but just to make it a big enough pain in the ass that the bootleggers didn't copy the game for a long time. You could argue it took nearly 20 years to crack. Not bad for security through obscurity.

      • You could argue it took nearly 20 years to crack. Not bad for security through obscurity.

        But not nearly as long as what the industry wants, which is 95 years after first publication.

    • When something is actually secure,

      That's like saying "when we have world peace."

      Programmers make coding mistakes. It is inevitable.
      Even the best coding techniques can only reduce errors, not stop them completely.

    • There are different aspects of "security through obscurity". In this case I don't think obfuscation equals obscurity, as the software is still available in its whole form. But I agree with you that this is a pretty weak effort.
    • Except they are not talking about "security through obscurity" they are talking about a very specific kind of cryptographic program obfuscation. It is not at all the same.
    • by gweihir ( 88907 )

      I suspect the intention behind this is to hide backdoors that are intentionally placed. Tell the clueless public that the "TCP/IP stack is now protected from hacking by code obfuscation", when in fact a magic packet gets the NSA access to everything and it is very hard to find that out looking at the source.

    • And I always thought that Perl had been invented just for this purpose. That language has obfuscation built in!

    • I'm not sure if you have a legitimate point or are arguing for an ideology here (i.e. everyone should have the same income, military, software, etc).

      I can reverse engineer the output of a dotfuscated solution, for example, but the code is so mangled during the obfuscation process (where classes are hacked and merged semi-arbitrarily and variable names turn into "A", "x34Kj", etc) that it becomes unmanageable to make any changes to.
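
      A hand-made illustration of that renaming effect (hypothetical Python; real tools also merge and restructure classes, which this doesn't show):

      # Readable original:
      def apply_discount(price, loyalty_years):
          rate = min(0.05 * loyalty_years, 0.25)
          return price * (1 - rate)

      # Same logic after rename-style mangling - it still runs identically,
      # but maintaining it is another matter:
      def A(x34Kj, A2):
          Qx = min(0.05 * A2, 0.25)
          return x34Kj * (1 - Qx)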

      Being able to reliably make changes to the source code is almost the entire cost of software development.
      • by JustNiz ( 692889 )

        >> Being able to reliably make changes to the source code is almost the entire cost of software development.

        It sounds like you're suggesting that making something too expensive to be commercially profitable is alone sufficient as a security mechanism.

        • I'm not sure it's "alone sufficient", but if I can make reversing too expensive for my competitors, foreign adversaries, etc., I wouldn't call it "the completely wrong approach".
  • Pro tip for DARPA: use perl, hand out the source. Same end result but probably a few reverser suicides along the way.
    • by Tablizer ( 95088 )

      First translate the algorithm into Perl, and then run it on a Perl interpreter written in BrainFuck, of which the BrainFuck interpreter is written in APL, which runs on a machine language written for a drum-based OS from the early 1960's.

      • First translate the algorithm into Perl, and then run it on a Perl interpreter written in BrainFuck, of which the BrainFuck interpreter is written in APL, which runs on a machine language written for a drum-based OS from the early 1960's.

        You shouldn't knock Visual BASIC like that...

  • by Anonymous Coward

    So now, on top of abstraction layers and lazy code, we can look forward to wasting cycles on advancing the cat-and-mouse game of security. I know I'm going to sound like an old codger, but my daily computing tasks have not really changed substantially since the mid 90s (emails, web browsing, shell access, word processing, etc). Streaming video and modern web technologies are awesome; it's not like there haven't been any worthy advances, but addressing excess power consumption, e-waste, and other associate

  • by Anonymous Coward

    This is a very different usage of "obfuscation" from what people typically use in everyday programming. It's coming out of some recent work in cryptography. See for example:

    Candidate Indistinguishability Obfuscation and Functional Encryption for all circuits
    http://eprint.iacr.org/2013/451.pdf

    What this line of work may allow you to do is have a cloud computer run some code on some data for you, without revealing anything to the computer about either the code or the data. Without breaking the crypto the clo

    • It seems to my uninformed mind like you would have to have already processed the data to create the sum total code path that the cloud computer is to run, at which point you no longer need them to run it.

    • by gweihir ( 88907 )

      Still, this does not make software "secure". In typical attacks you just want the software to misbehave in some way that gets you a shell. You do not need to understand what it does for that. In fact, most modern attacks involve fuzzing and then only looking at the specific things that break in order to subvert them. There is no need to understand what the code actually is supposed to be doing.

      • by gweihir ( 88907 )

        And before I forget: These techniques are excellent to hide backdoors and such, and thereby make software much, much less secure. That may be the real intent. After all, you do not want some vigilante to find the secret government backdoors in everything.

  • This feels like a blast from the past, specifically the Trusted Computer System Evaluation Criteria (TCSEC) [wikipedia.org] aka the "Orange Book."

    DoD 5200.28-STD - December 26, 1985 [nist.gov]

    4.1 CLASS (A1): VERIFIED DESIGN

    Systems in class (A1) are functionally equivalent to those in class (B3) in that no additional architectural features or policy requirements are added. The distinguishing feature of systems in this class is the analysis derived from formal design specification and verification techniques and the resulting high degree of assurance that the TCB is correctly implemented. This assurance is developmental in nature, starting with a formal model of the security policy and a formal top-level specification (FTLS) of the design. Independent of the particular specification language or verification system used, there are five important criteria for class (A1) design verification:

    4.2 BEYOND CLASS (A1)

    Most of the security enhancements envisioned for systems that will provide features and assurance in addition to that already provided by class (A1) systems are beyond current technology. The discussion below is intended to guide future work and is derived from research and development activities already underway in both the public and private sectors. As more and better analysis techniques are developed, the requirements for these systems will become more explicit. In the future, use of formal verification will be extended to the source level and covert timing channels will be more fully addressed. At this level the design environment will become important and testing will be aided by analysis of the formal top-level specification. Consideration will be given to the correctness of the tools used in TCB development (e.g., compilers, assemblers, loaders) and to the correct functioning of the hardware/firmware on which the TCB will run. Areas to be addressed by systems beyond class (A1) include:

    DEF CON 20 - Tom Perrine - Creating an A1 Security Kernel in the 1980s [youtube.com]

    DEF CON 20 Archive [defcon.org]

    • 4.2 BEYOND CLASS (A1)

      Most of the security enhancements envisioned for systems that will provide features and assurance in addition to that already provided by class (A1) systems are beyond current technology.

      Ah, lovely. Government language at its most... statuesque. That's an incredibly awkward way to say, "Dude! We can't do it!"

  • That's good (Score:2, Interesting)

    by Sla$hPot ( 1189603 )
    But how do you scan code for back doors, trojans, viruses, malware, bots etc.?
    • by Fwipp ( 1473271 )

      You don't. But we already don't scan the majority of proprietary (or even open source) code that we run on our machines, so effectively, the difference might not be that great.

      You can disable its ability to communicate with the outside world, or monitor communications it does make, to warn others that the code may be malicious. But that's about it.

  • Software obfuscation confronts exactly the same core problem as DRM: The goal is to both provide information, in usable form, and not provide the same information, to the same recipient, at the same time. That's impossible. So in both cases all you can do is to try to raise the bar, make it harder to extract the convenient form of the information, but "mathematically proven security properties" must be forever out of reach.

    Unless maybe they define "obfuscation" differently than I do.

    • by gweihir ( 88907 )

      It is not the same problem. DRM has to be secure against the machine it runs on. That is impossible. Secure software has to be secure at some perimeter (network socket, IPC interface, etc.), but anything inside this perimeter is assumed to be trustworthy. Secure software _is_ possible.

      • It is not the same problem. DRM has to be secure against the machine it runs on. That is impossible. Secure software has to be secure at some perimeter (network socket, IPC interface, etc.), but anything inside this perimeter is assumed to be trustworthy. Secure software _is_ possible.

        Obfuscation also has to be secure against the machine it runs on.

    • Unless maybe they define "obfuscation" differently than I do.

      It wouldn't be the first time this has happened in academic circles.

      A friend of mine did his PhD in Artificial Intelligence a couple decades ago and has been working in the field since then. Some years back we were having a discussion about the Turing Test or something related, but it seemed like we were arguing past each other. Well, after some time the source of the problem came out - at least according to him, what AI people think of when they discuss whether a system "understands" a language (to pic

      • by lennier ( 44736 )

        A person (or computer) possessing a (theoretical) book containing every possible response to every conceivable question and statement in, say, Chinese, would be considered to understand Chinese,

        I've always had a problem with this definition of the Chinese Room scenario. It's the following:

        To be successful in a conversation, that 'book' with responses to questions has to model not just the language, but also the subject domain and the personality of the simulated Chinese speaker. That means that not only does the book have to be huge - we're talking a giant library - but it also has to have a representation of a personality inside. And the ability to store knowledge and alter that personality, dependi

  • I think what they are asking for is a virtual processor that executes encrypted instructions on a physical processor through homomorphic encryption. Existing techniques for this are insanely slow. They want some new math to make it faster, so that it's practical.
    • If you need a new processor, you may as well just add encryption, local key storage, and local memory to an existing processor and let it decode and run the code internally, with all outputs re-encrypted.
    • by gweihir ( 88907 )

      Actually, no. Security functionality at some point must interact with its environment. Homomorphic encryption does _not_ allow the software to make decisions that are hidden from the processor yet get passed to the environment, like an access decision or the like.

  • According to Eric Holder, information security == material support for pedophiles. I expect the DOJ to promptly sue DARPA out of existence!
  • by carlhaagen ( 1021273 ) on Sunday October 05, 2014 @03:50PM (#48069555)
    ...interpret the obfuscated source code, then why wouldn't a human be able to?
    • As I understand it, it's the compiler output which is obfuscated. Not the source code itself. After all, the original source code must be understood by a human programmer in order to be written in the first place.

    • Another poster mentions that a 16-bit function consisting of "15 AND gates" takes 31 GB of space and around 3 hours to evaluate. That actually does seem beyond a human to unobfuscate, but I bet a well-written tool should be capable (if it can actually find the part that matters in that 31 GB).
  • ... a different language. Take Ancient Egyptian... without a Rosetta Stone there would be no means to translate it. The whole structure of the language was inscrutable without some sort of introduction to it.

    Your program and system ideally should run on custom hardware, not known computer hardware that must conform to known standards. The system will not be as fast or cheap. But it will be so different that it will be difficult to understand. And what a programmer cannot understand they will not be able t

  • This really is BS. Sure, you can obfuscate complex functions, e.g. mathematical functions, to the point that reading the code becomes pretty hard. (That includes most crypto, but note that non-standard crypto has a tendency to be insecure, and standard crypto can be recognized.) But even there, an attacker can simply try out the functionality and recognize what it does for the cases that matter. For simple functionality (and most functionality is simple, e.g. data access or access control), this does not work

  • is the lamest title I've ever heard...
  • Black Box Hacker Challenge. Or variations on the name. It's what I'm calling it.

    OK, before you start with "HONEYPOT!", no it isn't, and this isn't a new idea either. It's been done. Many times. By lots of companies. Including Google - and the NSA, and all to test security on various bits of software outside of lab conditions. In case you're new here, a BBHC is a standalone or more commonly an integrated part of, a hacker convention where you take a blackbox (literally, hence the name of the game) loaded wit

  • This seems like something only useful for malware. After all, the only reason you don't want people to know what code they run is to do something they don't want you to do. And that's essentially the definition of malware.
