Encryption Security

Student Uncovers US Military Secrets 484

karthik_r085 writes "According to The Register, an Irish graduate student has uncovered words blacked-out of declassified US military documents using nothing more than a dictionary and text analysis software. Claire Whelan, a computer science student at Dublin City University, was given the problems by her PhD supervisor as a diversion. David Naccache, a cryptographer with Gemplus, challenged her to discover the words missing from two documents: one was a memo to George Bush, and another concerned military modifications to civilian helicopters."
This discussion has been archived. No new comments can be posted.

  • WMD!! (Score:5, Funny)

    by AmigaAvenger ( 210519 ) on Sunday May 16, 2004 @12:00PM (#9167261) Journal
    Oh OH, i can see it coming already, text analysis and dictionary software declared as Weapons of Mass Destruction! That, and Ireland is going to become the next member of the "axis of evil"
    • Re:WMD!! (Score:5, Funny)

      by FyRE666 ( 263011 ) * on Sunday May 16, 2004 @01:19PM (#9167797) Homepage
      Next stop for her: Guantanamo Bay...
      • Re:WMD!! (Score:4, Insightful)

        by Speare ( 84249 ) on Sunday May 16, 2004 @03:35PM (#9168479) Homepage Journal
        How sad is it when you read:
        • Re: WMD!! (Score: 5, Funny)
          Next stop for her: Guantanamo Bay...

        The government has already proven it will detain people just for what they know, without criminal charge, without provocation, without family access, without legal representation, without regard for international criticism, without regard for international laws and norms, without safeguards for personal safety, without justification or oversight by the courts.

        I doubt the G goons will be sweeping up this particular researcher, but what small and subtle distinction really lies between this case and others? What shred of humanity protects her from the inhumanity of the Bush/Rumsfeld/Ashcroft three-ring circus? Oh, she has red hair and freckles? Alrighty then.

    • Re:WMD!! (Score:5, Funny)

      by SlowMovingTarget ( 550823 ) on Sunday May 16, 2004 @01:54PM (#9167977) Homepage

      We must liberate all of that innocent Guinness from the oppressive Irish regime.

  • Ingenious... (Score:5, Interesting)

    by Denyer ( 717613 ) on Sunday May 16, 2004 @12:00PM (#9167266)
    "The first task is to identify the font, and font size the missing word was written in. Once that is done, the dictionary search begins for words that fit the space, plus or minus three pixels"

    This is why I don't work for an intelligence agency. On the other hand, I'm still probably better qualified than people who think blacking out a few words in a document strips them of contextual information...
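The search described in that quote can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the per-character widths in `CHAR_WIDTHS`, the toy word list, and the 46-pixel gap are made-up assumptions. A real attempt would first measure the glyph widths of the font identified in the document.

```python
# Hypothetical advance widths in pixels for a proportional font.
# Real values would come from the font identified in the document.
CHAR_WIDTHS = {
    "i": 3, "l": 3, "j": 3, "t": 4, "f": 4, "r": 4,
    "m": 9, "w": 9,
}
DEFAULT_WIDTH = 6  # assumed width for every other letter


def word_width(word):
    """Sum the assumed pixel widths of the letters in a word."""
    return sum(CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in word.lower())


def candidates(words, gap_px, tolerance=3):
    """Return the words whose width is within `tolerance` pixels of the gap."""
    return [w for w in words if abs(word_width(w) - gap_px) <= tolerance]


# Toy word list standing in for a full dictionary.
WORDS = ["Egyptian", "Ugandan", "Ukrainian", "unofficial", "Irish", "will"]

if __name__ == "__main__":
    for word in candidates(WORDS, gap_px=46):
        print(word, word_width(word))
```

The survivors would still have to be ranked by context, which is the part the article says was done by hand.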

    • Re:Ingenious... (Score:5, Interesting)

      by torpor ( 458 ) <ibisum AT gmail DOT com> on Sunday May 16, 2004 @12:45PM (#9167568) Homepage Journal
      This is why I don't work for an intelligence agency

      how righteous of you. in fact, if you look and know a little about intelligence analysis techniques, i think you'll find that the NSA already know about this approach for 'interpreting' typewritten redacts, even as far back as the 50's.

      what this story really seems to point out is the naivete of a lot of people about computers, and the powerful simplicity to seemingly difficult problems that they offer ... the average consumer.

      it wasn't so long ago that the idea of having massive dictionaries in ram and font and calculations on this order to make a practical approach was considered relatively 'resource difficult'.

      but moore's law and fry's electronics have certainly changed that.

      for the price of a nice night out, i could buy an extra computer for brute-force hacks against any target, stick it in my closet and forget about it. used to be, not so long ago you had to have a halon system and power room to do things like that ...
      • Re:Ingenious... (Score:5, Interesting)

        by blair1q ( 305137 ) on Sunday May 16, 2004 @01:18PM (#9167789) Journal
        how righteous of you. in fact, if you look and know a little about intelligence analysis techniques, i think you'll find that the NSA already know about this approach for 'interpreting' typewritten redacts, even as far back as the 50's.

        I just wish the intelligence community and their unintelligent sycophants the press would stop using redact [onelook.com] to mean elide [onelook.com].

        Especially as a noun, because a "typewritten redact" is like a copy editor with ink hammered onto him, somewhere.
  • by Pranjal ( 624521 ) on Sunday May 16, 2004 @12:01PM (#9167273)
    ...said
    "Please please please let the army attack Iraq"

    Apparently the word that was blacked out was please.
  • good for her (Score:2, Insightful)

    by nomadic ( 141991 )
    Pretty funny, but luckily she's from Ireland. If an American did this they'd probably receive a visit from some intelligence goons in short order.
    • Damn, with our nation in the state it is today? She'd be goddamn lucky to get ONLY a visit. Sad but true :(
    • Re:good for her (Score:5, Interesting)

      by iabervon ( 1971 ) on Sunday May 16, 2004 @12:16PM (#9167385) Homepage Journal
      I suspect there would be a 50/50 chance that the visit from the intelligence goons would be a job offer. US intelligence sorely needs people who can read between the lines and actually come up with correct answers.
      • Re:good for her (Score:2, Insightful)

        by nomadic ( 141991 )
        I don't know about that. They do need people like that, but I think they might not know they need people like that.
  • by Anonymous Coward
    Can gov stop her research on National Security grounds?

    An example of the program in use.

    G.W Bush is the ____________ of the United States of America.

    After the program

    G.W Bush is the idiot of the United States of America.

    • But most likely the resulting outcome will be that fewer documents are released. Or the pages will be so blackened that they will be unreadable.

      And yes I can make fun of your typos.. nothing better then the kettle calling one black...
  • obvious solution (Score:5, Insightful)

    by Anonymous Coward on Sunday May 16, 2004 @12:03PM (#9167292)
    Obviously, the next step the government will take is to require all documents be written in fixed-width fonts. Either that or they will require that all documents be converted into fixed-width before they are released for FOIA inquiries.

    Don't see how this is a big threat.
    • by zhenlin ( 722930 ) on Sunday May 16, 2004 @12:09PM (#9167332)
      Variable width fonts make this easier. Or not.
      'iiii' probably has the same width as 'MM' in some variable width fonts.

      On the other hand, fixed width fonts allow calculation of the exact number of letters that fit in.

      In any case, the 'official' font of the US Government was Courier New 12 for quite some time.
    • require all documents be written in fixed-width fonts

      That's no solution at all. You can still determine the word based on the context and the character count. It's just that the pool of possible solutions will be a little bigger.
    • by Feanturi ( 99866 ) on Sunday May 16, 2004 @12:20PM (#9167405)
      Wouldn't fixed-width just make it easier to figure out how many letters were in the missing words?
      • by Lord Ender ( 156273 ) on Sunday May 16, 2004 @12:37PM (#9167517) Homepage
        Yes. But knowing 5 letters are in a word doesn't narrow it down nearly as much as knowing the word is 46 pixels long.
        • by Midnight Thunder ( 17205 ) on Sunday May 16, 2004 @01:17PM (#9167783) Homepage Journal
          Yes. But knowing 5 letters are in a word doesn't narrow it down nearly as much as knowing the word is 46 pixels long.

          Maybe it's just me, but the way I see it, if you know that a word is 5 letters long, then you know it's x pixels long (knowing the width of one character, you know them all with a monospaced font). With a variable width font you know the length, but you don't know the number of characters. This means you go from 26^5 permutations for the previous example (26^n generally) to however many different letter sequences fit in that space. For example, 'will' will take up as much space as 'iiill', so you have a combination of multiple powers, in this case (26^4 + 26^5). For longer words you have more possible variations.
          • Re:obvious solution (Score:3, Informative)

            by Anonymous Coward
            iiill would never get checked, because it isn't in the dictionary. What you're doing is dividing all the words in a dictionary into, for example, 147 categories (3-150 pixels) instead of, say, 40 (1-40 letters). Now you do the math from here.
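The "more categories" argument can be shown with a toy example. This is a rough sketch with invented numbers: the pixel widths (3 for narrow letters, 9 for wide ones, 6 for the rest) and the eight-word list are assumptions chosen only to make the effect visible.

```python
from collections import defaultdict

# Assumed pixel widths; a real font would supply its own metrics.
NARROW = set("ijlt")   # assumed 3 px each
WIDE = set("mw")       # assumed 9 px each


def pixel_width(word):
    """Width of a word under the assumed metrics above."""
    total = 0
    for ch in word.lower():
        if ch in NARROW:
            total += 3
        elif ch in WIDE:
            total += 9
        else:
            total += 6
    return total


def bucket(words, key):
    """Group words by the value of key(word)."""
    groups = defaultdict(list)
    for w in words:
        groups[key(w)].append(w)
    return dict(groups)


# Eight four-letter words standing in for a dictionary.
WORDS = ["will", "mill", "tilt", "wham", "memo", "lilt", "army", "iraq"]

by_letters = bucket(WORDS, len)          # what a fixed-width gap reveals
by_pixels = bucket(WORDS, pixel_width)   # what a variable-width gap reveals

if __name__ == "__main__":
    print(len(by_letters), "bucket(s) by letter count")
    print(len(by_pixels), "bucket(s) by pixel width")
```

With these made-up metrics all eight words share one letter-count bucket but spread over five pixel-width buckets, so a measured width leaves fewer candidates per bucket.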
          • by nacturation ( 646836 ) <nacturation AT gmail DOT com> on Sunday May 16, 2004 @03:59PM (#9168590) Journal
            If you're talking about encrypted text, then your point is very valid. However, for English words you can get a much better result by using a dictionary to limit the number of words that fit the pattern.

            How many 5 letter words are there in the English language? According to /usr/share/dict/words, there's 9987 words, from aalii to zymin. Compare that to how many combinations of letters add up to 60 pixels? If the letter "i" is 4 pixels -- 3 pixels for the letter, one pixel space after it -- then you *could* guess that the word is "iiiiiiiiiiiiiii". In fact, there's a hell of a lot more possibilities doing it the pixel way, but you can reduce this down by using a dictionary. "iiiiiiiiiiiiiii" isn't in the dictionary. You can also reject outright words that have impossible letter combinations. Three of any letter in a row can be rejected, Q followed by X can be rejected, etc. The rest you do a dictionary lookup to see if they exist.

            It'd be an interesting exercise to perform. Luckily for the researcher, the word preceding the blacked out word was "an", which implies that the next word starts with a vowel. So that narrowed it down to only 7 potential words based on pixel length and dictionary lookup, and the one that seemed to work best was Egyptian. However, if all you knew was that it was an 8 letter word beginning with a vowel... you'd be looking at 6089 possibilities (again, according to /usr/share/dict/words and grep).
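For anyone who wants to repeat the count, here is a small Python sketch of the same kind of lookup. The helper names are made up, and the exact totals will depend on which word list your system ships and on how it is filtered (this version skips entries containing apostrophes or other non-letters, which may not match the grep used above), so no numbers are claimed here.

```python
import os

VOWELS = set("aeiou")


def count_words(words, length, vowel_start=False):
    """Count purely alphabetic words of the given length.

    If vowel_start is True, only words beginning with a, e, i, o or u count.
    """
    count = 0
    for w in words:
        w = w.strip()
        if len(w) != length or not w.isalpha():
            continue
        if vowel_start and w[0].lower() not in VOWELS:
            continue
        count += 1
    return count


def load_words(path="/usr/share/dict/words"):
    """Read a word list, or return an empty list if the file is missing."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return fh.read().split()


if __name__ == "__main__":
    words = load_words()
    print("5-letter words:", count_words(words, 5))
    print("8-letter words starting with a vowel:",
          count_words(words, 8, vowel_start=True))
```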
      • That's the point. :-)
    • Parent is confused (Score:3, Informative)

      by moronga ( 323123 )
      A fixed-width font (like courier) uses the same width for all characters. A document printed in a fixed-width font would make the process easier, because you would know with certainty how many letters fit into a black box.

      If you read the article, the seven words that were found to be a possible fit range from seven to ten characters, implying that the document was printed in a variable width font.
    • You mean like Courier New 12 [slashdot.org] :)
      Guess that change wasn't such a bright idea after all!
    • Re:obvious solution (Score:3, Interesting)

      by grondin ( 241140 )
      Exactly wrong. What is needed is RANDOM width fonts.

      The folks at typografica suggest the "ransom note" [typographi.ca] type fonts
    • Re:obvious solution (Score:3, Informative)

      by LS ( 57954 )
      Don't you think that if they had the insight to convert a censored document to fixed width, that they would also make all the blacked-out spaces of the same length, and give NO information to potential cryptographers?

      LS
  • The first task is to identify the font, and font size the missing word was written in. Once that is done, the dictionary search begins for words that fit the space, plus or minus three pixels

    hmm, maybe this is wakeup call for govt. to maybe use a variable font size and spacing in classified documents. not sure how often they use 'blacking out' though.

    • Re:wake-up call (Score:5, Interesting)

      by LostCluster ( 625375 ) * on Sunday May 16, 2004 @12:09PM (#9167336)
      The other way to get around this problem would be to do the blackouts against a digital version of the document, so that the words are all replaced with blocks of equal size without revealing any information about how long the original words were.
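A minimal sketch of that idea in Python, assuming the document is available as plain text. The function name `redact`, the 8-cell block, and the sample sentence layout are illustrative choices, not anything an agency is known to use.

```python
import re

# Every redaction is rendered as the same 8-cell block, whatever it replaces.
BLOCK = "\u2588" * 8


def redact(text, secret_words):
    """Replace each whole-word match of a secret word with a fixed-size block."""
    if not secret_words:
        return text
    pattern = r"\b(" + "|".join(re.escape(w) for w in secret_words) + r")\b"
    return re.sub(pattern, BLOCK, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(redact("told an Egyptian service about the plan", ["Egyptian", "plan"]))
```

Note that two adjacent secret words still come out as two separate blocks, so the word count leaks even though the word lengths do not.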
    • by b!arg ( 622192 )
      Before they release it they should convert the blacked out parts to 1337 speak...
    • MEMORANDUM TO ALL EMPLOYEES OF THE DHS AND US INTELLIGENCE SERVICES:

      So as to counter the terrorists' latest methods for conducting espionage against our great nation, all official documents will now be composed in a combination of Wingdings 3 [identifont.com] and MS Comic Sans [help4web.net].

      Sincerely, The Management
  • old news (Score:4, Informative)

    by Swen Swen ( 778337 ) on Sunday May 16, 2004 @12:05PM (#9167306)
    The Monde (famous French newspaper) published an article [lemonde.fr] on the story a few days ago. An English translation can be found here [infosecwest.com].
    • Isn't calling it "The Monde" like calling USA Today "Etats-Unis Today"? It is inappropriate to translate the title of a newspaper like that. Call it by its real name, "Le Monde," which of course means "The World." But "The World" is not the title of the newspaper.
  • by LostCluster ( 625375 ) * on Sunday May 16, 2004 @12:07PM (#9167316)
    The student didn't actually solve for any real US secrets, because the documents she was using were already declassified. However, as an academic exercise this demonstrates that there's still information being conveyed in the typical black-out way of "redacting" certain words from documents.

    And, since the information was known, we're sure that she did come up with the correct solutions.
    • Did anyone honestly believe that blacking out certain words was a reliable method of withholding names or information? I think it is usually just done to discourage the discovery of such things, but not with the thought that it is 100% secure. Even the human eye can often figure out what was blacked out.
  • Perfect. (Score:4, Insightful)

    by NegativeK ( 547688 ) <tekarien@hotmail.cOPENBSDom minus bsd> on Sunday May 16, 2004 @12:10PM (#9167346) Homepage
    This is a classic example of security through obscurity.. And how it fails miserably.
  • by doria13 ( 779114 ) on Sunday May 16, 2004 @12:13PM (#9167370) Homepage Journal

    Perhaps the US government should start using text message lingo in their memos.

    "An Egyptian Islamic Jihad (EIJ) operative told an Egyptian srvic @ d sAm tym dat bn l@n wz plnin 2 exploit d operatives acces 2 d us 2 mount a terrorist strike"*

    Could make decoding sensitive documents much more difficult and at the same time provide jobs for teenage cryptologists.

    *lingo courtesy of transl8it.com [transl8it.com]

  • by RoTNCoRE ( 744518 ) on Sunday May 16, 2004 @12:15PM (#9167377) Homepage
    Change the length of the blacked out portion to some standard generic length to avoid disclosing the word length? Then you could only use context.

    Or if you wanted to be really sneaky, randomize the length of the blacked out box, to spur wild goose chases.
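The "sneaky" variant could look something like the Python sketch below. The function names and the 4 to 20 cell range are arbitrary assumptions. The point is only that the box length is drawn independently of the word it hides.

```python
import random


def redaction_box(rng, min_cells=4, max_cells=20):
    """Return a black box whose length is random and unrelated to the word."""
    return "\u2588" * rng.randint(min_cells, max_cells)


def redact_word(text, word, rng):
    """Swap every occurrence of one word for a single randomly sized box."""
    return text.replace(word, redaction_box(rng))


if __name__ == "__main__":
    rng = random.Random()  # random.SystemRandom() would be the safer source
    print(redact_word("told an Egyptian service", "Egyptian", rng))
```

Context clues such as the preceding "an" survive this, so it only removes the width measurement, not the guessing.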
  • by eddy ( 18759 )

    Anyone know if there's a paper on this? This news came up on another site a couple of days ago, but they didn't even mention the researcher's name, only implied it was presented at EuroCrypt'2004 [ibm.com] in Switzerland. I looked through the list of accepted papers, but nothing stood out.

    A search on IACR will give a single hit [iacr.org] on the author, but it isn't this report/paper/work.

  • by clonan ( 64380 ) on Sunday May 16, 2004 @12:22PM (#9167420)
    So...can they now tell us who REALLY killed Kennedy?
  • by Anonymous Coward on Sunday May 16, 2004 @12:25PM (#9167448)
    Nuculer.

    Dictionary-based approaches seem to miss this one for some reason.
  • More examples (Score:5, Interesting)

    by broothal ( 186066 ) <christian@fabel.dk> on Sunday May 16, 2004 @12:25PM (#9167449) Homepage Journal
    If people knew how easy it was to "break" simple means of protection, we'd see far less in the media.

    If you film a person in backlight, his face will be dark when you see him on tv. Cranking up the contrast (in most cases, just the contrast on the tv will do) shows the face clearly.

    If you pixelize the face of a person, he's not recognizable. But unless he stands completely still, his movements will give enough info to calculate the originating pixels after a couple of minutes.

    If you apply a standard mixer filter to a person's voice, it sounds dark and unrecognizable... Until you run the reverse algorithm.

    If you black out sentences with a marker, it's often just a question of holding the paper up against the light to read it.

    I never understood this behaviour anyway. Why show a person on TV who obviously does not want to be recognized (however carefully concealed by the production)?

    As for documents - I'm pretty sure most documents are available electronically. Why not just delete the stuff you don't want people to see?
    • Not hard to understand. TV is 100% entertainment (yes, even the evening news). Watching a fuzzed out image of someone with a messed up voice is very dramatic.. "Look what lengths they've gone to to protect the identity of this person.. what he's saying must be very juicy.."
    • Why show a person on TV who obviously does not want to be recognized (however carefully concealed by the production)?


      That one is easy. It is always better to display the subject of the story, like the victim in a rape case etc., than to objectify them. The story becomes more interesting and draws a larger crowd. It is also a part of the evidence chain, like interviewing a witness or other party. The viewer gets closer to the story, even though it is just a large blob with a strange voice.
    • Re:More examples (Score:3, Insightful)

      by Kjella ( 173770 )
      If you pixelize the face of a person, he's not recognizable. But unless he stands completely still, his movements will give enough info to calculate the originating pixels after a couple of minutes.

      You have an example of this? something tells me you'll have a very hard time identifying changes in pixelation, like if you took a photograph and moved behind a pixelation mask, and changes in the image itself like lips moving, eyes blinking, turning (X-axis)/lowering & rising (Y-axis)/rolling (Z-axis) his
    • Another great one that's been in the news lately is doing redaction by drawing black squares over the top of words in a PDF document. The words are still there beneath the black rectangles, sort of like redacting a paper document by using electrical tape. =P
  • secrets indeed (Score:3, Interesting)

    by trs9000 ( 73898 ) <trs9000@gmail . c om> on Sunday May 16, 2004 @12:30PM (#9167484)
    quote of memo to bush from the article:
    "An Egyptian Islamic Jihad (EIJ) operative told an XXXXXXXX service at the same time that Bin Ladin was planning to exploit the operative's access to the US to mount a terrorist strike."
    and from the article itself:
    "This eliminated all but seven words: Ugandan, Ukrainian, Egyptian, uninvited, incursive, indebted and unofficial. Naccache plumped for Egyptian, in this case."

    AH-HAH!
    so an egyptian operative told an *egyptian* service....
    man this is some tricky work! uncovering covert secrets for sure!

    seriously though the technique is pretty awesome
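A quick Python check on the seven candidates quoted from the article shows why the gap is unlikely to have been set in a fixed-width font: the words that fit the same space differ in letter count. The helper names are made up for illustration.

```python
# The seven candidate words quoted from the article.
CANDIDATES = [
    "Ugandan", "Ukrainian", "Egyptian", "uninvited",
    "incursive", "indebted", "unofficial",
]


def letter_counts(words):
    """Map each word to its number of letters."""
    return {w: len(w) for w in words}


def starts_with_vowel(word):
    """True if the first letter is a, e, i, o or u (spelling, not sound)."""
    return word[0].lower() in "aeiou"


if __name__ == "__main__":
    for word, n in letter_counts(CANDIDATES).items():
        print(n, word)
```

The counts run from 7 to 10 letters, and every candidate begins with a vowel letter, which matches the "an" in front of the gap.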
  • .. already out there now. Like I'd like to see a lot of the Black Vault's thousands of documents translated, just for one interesting example, one of many. woo hoo this is cool!
  • And what if.. (Score:4, Insightful)

    by EdMcMan ( 70171 ) <moo.slashdot2.z.edmcman@xoxy.net> on Sunday May 16, 2004 @12:38PM (#9167522) Homepage Journal
    What if the blacked out word is not in the dictionary? Most of these blacked out things are very likely names or places, things that could not be so easily brainstormed or listed.
  • by Black Rabbit ( 236299 ) on Sunday May 16, 2004 @12:43PM (#9167556)
    ...would have been if the censored bits were revealed by running the document through the spelling and grammar check in Word!

  • ----- Post! (Score:3, Funny)

    by jdkane ( 588293 ) on Sunday May 16, 2004 @12:51PM (#9167605)
    they won't know to mod this down
  • Source? (Score:3, Funny)

    by beforewisdom ( 729725 ) on Sunday May 16, 2004 @01:10PM (#9167733)
    She didn't get these US Military secrets off of a BDSM site with pictures of women dragging men around on a leash did she?

    Ooops, never mind

    Steve
  • One solution (Score:3, Interesting)

    by PsiPsiStar ( 95676 ) on Sunday May 16, 2004 @01:53PM (#9167973)
    One way to solve this problem, of course, is to develop a font that constantly varies the size and type so that your document ends up looking like a ransom that's been clipped and pasted from a newspaper.

    One nice thing about being paranoid, you're never bored.
  • Solutions . . . (Score:5, Insightful)

    by Dausha ( 546002 ) on Sunday May 16, 2004 @02:02PM (#9168023) Homepage
    Well, there are two solutions to this method of cracking. The first is never release classified documents. However, this does not work well in a free and open society.

    Nowadays, most, if not all, classified documents are created electronically. Perhaps the source document should be kept in an archive. When it is declassified, they just delete the text needed to lower the classification, or maybe replace the text with a few '#' to show where text was missing (but never a one-for-one character replacement). Then the released document is a little harder to crack.
  • by cerberusss ( 660701 ) on Sunday May 16, 2004 @02:10PM (#9168053) Journal
    Has anyone any photos of this geek girl? Yes, I tried Google images, but I don't think she looks like a puppy [google.com].

  • The Memo Went like this:

    URGENT: MSG from GEORGE W. BUSH
    TO: JOINT CHIEFS OF STAFF

    1. ATTACK IRAQ
    2. ____???____
    3. PROFIT!!!

    Claire has finally revealed the second step!

    Read the article to find out.
  • by Anonymous Coward on Sunday May 16, 2004 @02:27PM (#9168147)
    Sorry, but in at least one of the cited examples, the methodology used requires an assumption that is false.

    The proposed method depends on the calculated length of the missing word(s).

    I believe that the "memo to George Bush" is the now infamous PDB of 8/6/01 (it was released in a PDF format). In this, the actual letters in the missing words were changed to nonsense characters (including non-alphanumeric symbols) before the black box was drawn in. So the space taken up by the "redactions" has nothing to do (except by chance) with the length of the original words.

    Sorry. Try again.
  • by t_allardyce ( 48447 ) on Sunday May 16, 2004 @02:43PM (#9168220) Journal
    Its ok we can solve this by arresting the student and banning any software that does this. Just like we solved the Iraqi abuse problem by taking their cameras away, and how we solved the Berg murder by making sure no news outlet would publish or link to the video, and how we solved the terrorists hi-jacking planes and crashing them problem with iris and finger scanning, (so now they can still get on the plane, but when they've crashed it we will know who did it and not to let them on next time). Or maybe its more like how the CD copy-protection system being defeated by the shift-key problem was defeated by threatening the student under the DMCA! or could it be how the drug problem was totally solved by throwing half the population in jail? [insert something about DRM solving everything and letting governments send sensitive documents in full without having to worry about someone reading the bad words] great, so i guess we can bomb for peace and fuck for virginity after all :)
  • by bbagnall ( 608125 ) on Sunday May 16, 2004 @04:13PM (#9168647) Homepage
    The title of this article sounds impressive, but the results are wishy-washy. It can only narrow down one missing word to maybe half a dozen possibilities. Who is to say the word is not North Korea instead of South Korea? And since most blackouts are several words long, it is not useful at all.
  • Silly question... (Score:4, Interesting)

    by NeuroManson ( 214835 ) on Sunday May 16, 2004 @04:18PM (#9168672) Homepage
    Are those documents redacted in the final photocopy, or are they redacted by hand (very expensive, but they're spending our money, after all)?

    There are two simple solutions that go beyond and below high technology.

    Unless they crank down the brightness as far as possible, most photocopiers put down a varying amount of toner to paper. A cloth soaked in, say, spirit solvents, when wiped across the page, will expose part, if not all, of the text. Similarly, this can be done with most magic marker inks.
  • Example from Chile (Score:3, Interesting)

    by stanwirth ( 621074 ) on Sunday May 16, 2004 @06:05PM (#9169234)

    During the reign of Pinochet, writer Ariel Dorfman used to convey the extent of the official censorship of his articles by incorporating the censored sections as blacked-out text and photos, with the understanding that people could fill in the blanks for themselves based on the surrounding text, knowing where the blanks were.

    What's left out is as significant as what is included.

