Security

Xerox's 'Intelligent Redaction' Scanners

Posted by CmdrTaco
from the this-can't-be-real-right dept.
coondoggie writes "Xerox today touted software it says can scan documents, understand their meaning and block access to those sensitive or secure areas so that prying eyes cannot read, copy or forward the information. Xerox and researchers from its Palo Alto Research Center debuted "Intelligent Redaction," new software that automates the process of removing confidential information from any document. The software includes a detection tool that uses content analysis and an intelligent user interface to protect sensitive information. It can encrypt only the sensitive sections or paragraphs of a document, a capability previously not available, Xerox said."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Wonderful, I wonder what the scanner does to the 'redacted' material?

    Maybe it's as good as Adobe PDF's redaction feature, and anyone can unredact the document?

    Or maybe it sends the redacted portion to any one of the 3-letter agencies that 'don't exist'.
  • by aicrules (819392) on Monday October 15, 2007 @11:40AM (#20982867)
    So, once you have marked a certain confidential information as confidential, it will do it automatically in other documents. Which means that for the low, low price of your time, you can submit a document with "fill-in the blanks" text until it redacts the same parts and BANG you know what the redacted section was...:D
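The guessing game described above can be sketched in a few lines. This is a toy model only: `make_redactor`, `probe`, and the project names are hypothetical, and it assumes the machine simply blanks any phrase it has previously been taught, which the article does not confirm.

```python
REDACTED = "[REDACTED]"

def make_redactor(learned_phrases):
    """Return a toy redactor that blanks every previously learned phrase.

    Hypothetical stand-in for the real product, whose behaviour is unknown.
    """
    def redact(text):
        for phrase in learned_phrases:
            text = text.replace(phrase, REDACTED)
        return text
    return redact

def probe(redact, template, candidates):
    """Return the candidates that vanish when run through the redactor."""
    hits = []
    for candidate in candidates:
        output = redact(template.format(candidate))
        # If the guess no longer appears in the output, the machine knows it.
        if candidate not in output:
            hits.append(candidate)
    return hits

redactor = make_redactor(["Project Falcon"])
guesses = ["Project Eagle", "Project Falcon", "Project Osprey"]
print(probe(redactor, "Status report for {}.", guesses))
```

Any system that redacts consistently is answering yes or no to every guess, so rate limiting or logging of submissions would matter as much as the matching itself.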
    • Re: (Score:3, Insightful)

      by deniable (76198)
      The fun game is then getting access to the material stored in the copier. This is the big list of things not to tell people. It's like having a what to hide from the cops list on your fridge.
      • by aicrules (819392)
        Or, assuming that getting that information is too difficult for the average office worker, you could cause some good workplace disruption by redacting every common phrase you have time to enter. Document after document would then be almost fully redacted. Hilarity ensues!
      • by zeromorph (1009305) on Monday October 15, 2007 @12:18PM (#20983395)

        Great, the next cracker related headlines will be about some Chinese kiddie who breaks into a copier in a remote corridor of the DoD. Yay, Xerox.

        But this list thing actually shows that the claim in the summary:

        can scan documents, understand their meaning ...

        is totally bogus.

        On the other side, this could be a wonderful Clippy revenant:"It looks like you're scanning a secret..."

        • Re: (Score:2, Interesting)

          by Brad Eleven (165911)
          The best part is your use of revenant. Had to look it up [reference.com]: One who returns after death (as a ghost) or after a long absence. Other sources say animated corpse.

          Kind of like our guarantees of freedoms, any more: Ghosts, or zombies at best, but possibly resurrected in toto at some future date.
        • by deniable (76198)
          Does this mean clippy needs a security clearance? I can see that going well.
    • So, once you have marked a certain confidential information as confidential, it will do it automatically in other documents. Which means that for the low, low price of your time, you can submit a document with "fill-in the blanks" text until it redacts the same parts and BANG you know what the redacted section was...:D

      And of course, context is utterly irrelevant. This thing redacts the client's name in a court document and automatically starts taking it out in invoices too. Hey! I want to find a law firm that uses this technology...

  • Of course, when you look in the undelete area of the document you'll get it all back again.

    I hear that the government has already ordered a thousand of these.

  • by Y-Crate (540566) on Monday October 15, 2007 @11:43AM (#20982899)
    I'm sure this will lead to a lot of copiers having "accidental" drownings in their bathtubs and Completely Innocuous single car crashes.
    • by elrous0 (869638) *
      At least copiers are resistant to polonium poisoning.
      • by Y-Crate (540566)

        At least copiers are resistant to polonium poisoning.
        Actually, electronics are quite susceptible to radiation.
    • by tompaulco (629533)
      Well, they aren't entirely clear, but they say it is software that scans the document, so I am guessing they aren't pairing it with physical scanning. I guess they could on low speed scanners, but it would be kind of dumb to take your 200 image per minute scanner and make it do a 2 second OCR on each page.
  • Concerns (Score:1, Insightful)

    by blighter (577804)
    One critic of the new capability cited concerns about censorship saying, "REDACTED"
  • This is just the same software that has been used on the UFO files that have been put out with lots of stuff blacked out.
  • by Paul Doom (21946) on Monday October 15, 2007 @11:46AM (#20982957) Homepage Journal
    ...it's just a new way to save money on support and service when printers stop printing or blow toner all over the place. "Look at this mess! The first page greys out and then there are only a few faint lines for the next 30 pages!" "Nothing wrong with the printer. That information is simply redacted."
  • Accuracy (Score:5, Insightful)

    by kevmatic (1133523) on Monday October 15, 2007 @11:48AM (#20982981)
    This is a poor idea. It better be 100% accurate at marking classified data as classified. All it will take is one screw-up and some extremely important data out there can be leaked to the wrong people.

    99.99% accurate isn't going to be good enough, is it?
    • Xerox doesn't need _that_ much accuracy. Remember who the customer is with this kind of product: mostly major-league litigation mills who get boxes upon boxes of documents and mass-storage devices that need to be read and searched quickly. Now redaction can be automated to some degree.

      I can easily see this being a very successful product in litigation circles.

      • by kevmatic (1133523)
        Okay, I see where this is going to be used. For some reason, I figured it would be used for government purposes like classified documents. Still, Coca-Cola will be pretty pissed if it lets something containing their secret formula go...
        • by ray-auch (454705)
          Oh come on, everyone knows that SOP for redaction in government is to redact in Word by changing the text colour to white, or the background to black...
      • There is more use for this product than just lawyers' offices. My mother works for children's services in Washington state.
        She has to manually redact a lot of police reports and case worker reports.
        Granted, this product would automate that, but someone will always have to proofread the document.
        Hopefully, if DSHS gets these, she'll still have a job.
      • Re: (Score:2, Informative)

        by JustJim0183 (747076)
        Don't you mean censorship ?
    • If the intelligent redaction feature accidentally misses actual critical information and instead redacts non-critical information, that could be a good thing. I mean, for people who want to know things other people don't want them to know.
    • by davidsyes (765062)
      I'm not sure that this is a smart idea.

      Smarter idea: follow the military's style of document classification markings(Top Secret; S E C R E T; C O N F I D E N T I A L; UNCLAS, (Classification/sub-class category). Make it standard in industry:

      1. Overall document classification stamped at top & bottom of document and at the subject line
      2. Indicate if the subject itself is a classified term or such
      3. Mark EACH paragraph with the classification marker

      Then,

      1. ANY document access requires logging of by whom, d
    • They ran this guy's proposed bill through the scanner prior to letting him propose it. This [theonion.com] was the result.

      tm

    • I think the idea is to have the software redact the text and get human eyes to go over the document. However, humans may also have to correct the computer's redactions, and I suppose that if a patch is released for the software, new problems may come up that stop it redacting the same way. Either way, it still entails a duplication of effort.

      As well, in recent news (and this likely isn't a "new" trick) researchers were able to determine what the redacted words were in documen
  • by Radon360 (951529) on Monday October 15, 2007 @11:50AM (#20983035)

    Attention corrupt senior corporate management:

    Tired of dealing with underlings trying to take you out by blowing the whistle on your illicit financial dealings? We have just the type of business equipment that you're looking for. Stop those do-gooders right in their tracks by automatically keeping them from copying those fudged books and secretive memos. Act now, and we'll throw in the automatic notification upgrade so you can terminate their employment before they have the chance to resort to other means of toppling your investment scam...

    (okay, I'll put my tinfoil hat back in the closet, now)

  • Oh nifty... (Score:3, Funny)

    by CoolVibe (11466) on Monday October 15, 2007 @11:50AM (#20983041) Journal
    Now I just have to find out how it works so I can print T-shirts that cannot be copied :)
  • by Raul654 (453029) on Monday October 15, 2007 @11:51AM (#20983051) Homepage
    AI is a disaster through-and-through. It never works well. Ever.

    Consider hand-writing recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most-well defined (read:easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).

    Now what you have here, with the automatic redacting copier, is that the copier needs to understand the document it's reading, and determine which parts to redact. Contextual understanding is *HARD* - it's the same class of problem as automated translation - only harder in this case.

    This copier idea is a huge flop. I don't know why they waste money on it. Anyone who relies on this copier to redact documents is a fool, because it is bound to make all kinds of mistakes (both type 1 - missing things it should have picked up, and type 2 - redacting things it shouldn't).
    • I can get rid of all type 1 errors at the penalty of increasing type 2. I can do this on most modern copiers: it consists of unplugging the optical scanner.
      • Re: (Score:3, Funny)

        by veganboyjosh (896761)
        I can get rid of all type 1 errors at the penalty of increasing type 2. I can do this on most modern copiers: it consists of unplugging the power cord.
        • But then the paper wouldn't go through and come out as a black sheet for each page you wanted to copy. Besides, I am mad at you for taking my Funny mod!

          Flanders: I'll Put the Pal back into Principal!
          Students: Laugh
          Superintendent Chalmers: And I'll put Super back into Superintendent
          Student: Silence.
    • by pthor1231 (885423)

      Consider hand-writing recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most-well defined (read:easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).

      So, because a technology doesn't achieve 99% or 100% accuracy, anything else being so-so or worse, we should completely abandon it? Even if OCR only has a 50% success rate, that means that it is 50% less work that someone is going to be doing.

      • Re: (Score:3, Insightful)

        by Raul654 (453029)
        You have missed my point. I don't deny that OCR makes life easier for people who have to digitize documents. The point I am making is that OCR is, as far as AI applications go, the easiest problem there is. And, even with such an easy problem, the best applications out there deliver substantially less than reliable performance. (If you think 99% is OK, then imagine that for a 100,000 word novel, at 4 characters per word, that's 2,000 words that need fixing).
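A back-of-the-envelope check of that figure, as a rough sketch only. It assumes each character is misread independently with the same probability, which is an assumption not made in the post; the helper name `expected_bad_words` is made up for illustration.

```python
def expected_bad_words(words, chars_per_word, char_accuracy):
    """Expected number of words containing at least one misread character.

    Assumes independent per-character errors (a simplifying assumption).
    """
    # A word is clean only if every one of its characters is read correctly.
    p_word_ok = char_accuracy ** chars_per_word
    return words * (1 - p_word_ok)

# 100,000-word novel, 4 characters per word, 99% per-character accuracy.
print(round(expected_bad_words(100_000, 4, 0.99)))
```

Under those assumptions the estimate lands nearer 4,000 words than 2,000, which points the same way as the argument: even 99% per-character accuracy leaves a lot of manual correction.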

        Now, with this copier, you are talking about a *su
        • by Raul654 (453029)
          On second thought, I'd like to amend my above statement - OCR is the second easiest application in AI, after game theory. A number of games have been completely solved (Connect 4), effectively solved (like Checkers, announced recently [sciencemag.org]), or are very well done (Chess). Granted, they are not complex games (Find me an AI that can play Twilight Imperium [fantasyflightgames.com] well) but they are not trivial either.
        • by Blakey Rat (99501)
          It's a good thing not a lot of people share your viewpoint, or there'd be no technological progress at all.
          • Re: (Score:3, Interesting)

            by Raul654 (453029)
            On the contrary, I do computer engineering research for a living. And don't get me wrong - I think this is a perfectly valid area to research. But a redacting copier is 3 (or more) decades from being a viable product - the technology just isn't there yet. Wildly exaggerated claims leading to disappointment have plagued the AI field for decades, and putting out products like this only contributes to that.
            • by Blakey Rat (99501)
              On the contrary, I do computer engineering research for a living. And don't get me wrong - I think this is a perfectly valid area to research. But a redacting copier is 3 (or more) decades from being a viable product - the technology just isn't there yet. Wildly exaggerated claims leading to disappointment have plagued the AI field for decades, and putting out products like this only contributes to that.

              Well, obviously Xerox thinks it works or they wouldn't have spent the millions it takes to productize t
              • by Raul654 (453029)
                The article says that the copiers can "understand" the documents, so it's clearly more than a simple search-and-replace. This implies context recognition, which as I previously mentioned, is an extremely difficult problem.

                I suppose the product either (A) lives up to the hype, or (B) it does not.

                So, which sounds more likely: Either Xerox jumped light years ahead of the field with this product (in which case, A), or they put out a shoddy product that won't live up to the hype (in which case, B). Frankly, I th
                • by Blakey Rat (99501)
                  Well, tell you what, why don't you schedule a demo with Xerox so you can tell them how crappy their product is in person? I don't know where your negativity comes from, but why not give them the benefit of the doubt instead of saying they're all idiots, huh?
            • Computer Engineering is about as far removed from AI as Civil Engineering is from High Energy Particle Physics. I can see where your expectations are coming from though. For anyone who works in engineering (whether research or actual applications), AI is a befuddling world of mutually-exclusive approaches, fuzzy algorithms, even fuzzier goals and constantly shifting definitions of what AI is. I wouldn't judge the state of AI based on some marketing drone's need to put out pretty brochures.
      • Re: (Score:3, Insightful)

        by AJWM (19027)
        Even if OCR only has a 50% success rate, that means that it is 50% less work that someone is going to be doing.

        While in general I agree with your point -- a thing doesn't have to be perfect to be useful -- OCR with only a 50% success rate is likely to mean more work for somebody who has to go through and correct it. At some point it's easier just to retype the whole thing manually than go through correcting all the OCR errors, and I think that point is a lot fewer errors than 50%. (Been there, done that.)
    • Re: (Score:3, Insightful)

      by martyb (196687)

      AI is a disaster through-and-through. It never works well. Ever.

      Consider hand-writing recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most-well defined (read:easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).

      Agreed. But, there's a huge continuum between the current error-prone, manual process and a fully-automated redaction machine.

      Now what you have here, with the automatic redacting copier, is that the copier needs to understand the document it's reading, and determine which parts to redact. Contextual understanding is *HARD* - it's the same class of problem as automated translation - only harder in this case.

      Agreed. But I do see an opportunity here for an automated assistant to the current manual process. In a sense, it's like a context-sensitive lint [wikipedia.org] for English.

      Imagine it watching over your shoulder, so to speak, as you start redacting a document. "Oh, he just redacted: 'Reading, Mass' so I'll let 'em know the next time I see that. Consider an incremental search in an editor

    • by cellocgw (617879)
      AI is a disaster through-and-through. It never works well. Ever.
      How is that different from natural intelligence?
    • .... you subscribe to the notion that when something comes out of AI that works, it's engineering; but when it doesn't work, it's still AI. This is a running joke among AI researchers. To some extent, it's justified. Really, how much AI is in alpha-beta pruning and pattern matching? However, this view point discredits every single bit of work that has been done in the field of AI, and trivializes all achievements after the fact. Your friendly robot factories? Automated airport trains? Data mining? And final
  • by pintpusher (854001) on Monday October 15, 2007 @11:52AM (#20983071) Journal
    This way when some critical info gets missed in the redaction process, there's no one to blame! So not only will our (I'm usian) gov't be more efficient about hiding stuff from us, no one will have to take the fall if it goes wrong.

    That said, I'm amazed at what modern AI can do. It's not clear, from this rather thin article, how much this system depends on human input to prevent mistakes. There must be some kind of training process. What is the state of these kinds of systems? I remember from some AI courses I took years ago, that they worked well but inevitably someone would end up calling someone else something stupid. Then the machine would start skipping important bits and the coders would look like idiots.

    That was hard and a real stretch there at the end. blah.
  • Details please. (Score:4, Interesting)

    by starseeker (141897) on Monday October 15, 2007 @11:53AM (#20983085) Homepage
    Obviously this is not possible in general, since how sensitive information is can and will change over time. Without full AI awareness of the situation that places the document in context, this is not possible. (E.g. the statement "Bob will be leaving the company" could either be highly sensitive or old news, depending entirely on the time and/or reader. Even more fun, what about "accidentally" sensitive statements where the mere fact that the machine hides it flags it as an item of interest to someone who didn't know it was interesting?)

    Also, a machine may "blank out" the sensitive part but leave enough around it for an astute hostile actor to still gain something - such things are so highly context sensitive I can't see any general algorithm that could guarantee success in all such cases.

    Still, two possibly useful approaches that are closer to hand would be:

    1) Supply the machine with a form, and specify certain areas (which will contain an SSN, for example) as containing information that must be treated as sensitive. So long as a standard form is used, the results could be handy.

    2) Supply the machine with a complete list of information you want to keep under wraps (and all the various ways that information might appear - drawings, descriptions, what have you) and have it check each document for anything that matches anything on its sensitive list. This also has problems and would be easy to get around but it WOULD be helpful to prevent non-hostile carelessness - i.e. "WHOOPS Bob just scanned something sensitive to add to that email, better blot out the parts that aren't cleared to go outside the organization."

    While a general solution isn't possible, I can actually see this being useful in controlled situations. The article mentions medical, financial and government which all have lots of well defined forms that can be used. It won't allow the replacement of human judgement but it might make it easier to stop certain forms of accidental distribution in well defined cases, and that's worth pursuing so long as it doesn't encourage carelessness.
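The two approaches above can be sketched at the text level. This is a minimal sketch under stated assumptions, not how Xerox's software works: the SSN pattern stands in for a known form field, the watch list is a plain list of strings, and every name here (`SSN_PATTERN`, `build_term_pattern`, `redact`, "Project Bluebird") is hypothetical.

```python
import re

# Approach 1 stand-in: a field with a known shape (US SSN, NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def build_term_pattern(terms):
    """Compile a case-insensitive pattern matching any term on the watch list."""
    # Longest terms first, so a long phrase wins over a shorter prefix of it.
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)

def redact(text, terms, mask="[REDACTED]"):
    """Mask SSN-shaped fields, then anything on the sensitive-term list."""
    text = SSN_PATTERN.sub(mask, text)
    if terms:  # an empty list would compile to a pattern matching everywhere
        text = build_term_pattern(terms).sub(mask, text)
    return text

sample = "Bob (SSN 123-45-6789) is moving to project bluebird next month."
print(redact(sample, ["Project Bluebird"]))
```

As the post says, this catches careless leaks rather than hostile ones: a paraphrase, a misspelling, or an OCR error in the scanned text slips straight past exact matching.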
    • Re: (Score:3, Funny)

      by antifoidulus (807088)
      Hi, I'm from the ---ox corporation and I am here to explain how this works:

      First, the machine ----------- in your documen-- and using --------------eats--------ba----------------bies and of course you can be ---% satisfied that we will ----- your documents and your -------- is very important to us! Hope that helps ----------- up!
    • In situation (2), the best route of attack is to steal the machine. Or at least, whatever it's using for memory. Since there will be a concentrated accumulation of all of your secrets.
  • if this technology could be applied to sun-glasses?
  • We the [REDACTED] (Score:4, Insightful)

    by Speare (84249) on Monday October 15, 2007 @12:01PM (#20983191) Homepage Journal

    I wonder if it prints yellow dots to encode the redacted text for forensic analysis.

    You know, it used to be that a "national security" threat was something that could kill millions, or wipe out the White House. Now a kid with some lighter fluid can be arrested for terroristic threats, and it's the White House that authorizes the killing. Can nobody read the Constitution?

    We the [REDACTED] [cafepress.com]
  • by alexhs (877055) on Monday October 15, 2007 @12:06PM (#20983253) Homepage Journal
    IRC did that years ago...

    <Cthon98> hey, if you type in your pw, it will show as stars
    <Cthon98> ********* see!
    <AzureDiamond> hunter2
    <AzureDiamond> doesnt look like stars to me
    <Cthon98> <AzureDiamond> *******
    <Cthon98> thats what I see
    <AzureDiamond> oh, really?
    <Cthon98> Absolutely
    <AzureDiamond> you can go hunter2 my hunter2-ing hunter2
    <AzureDiamond> haha, does that look funny to you?
    <Cthon98> lol, yes. See, when YOU type hunter2, it shows to us as *******
    <AzureDiamond> thats neat, I didnt know IRC did that
    <Cthon98> yep, no matter how many times you type hunter2, it will show to us as *******
    <AzureDiamond> awesome!
    <AzureDiamond> wait, how do you know my pw?
    <Cthon98> er, I just copy pasted YOUR ******'s and it appears to YOU as hunter2 cause its your pw
    <AzureDiamond> oh, ok.

    Source : http://bash.org/?244321
  • by Opportunist (166417) on Monday October 15, 2007 @12:07PM (#20983265)
    To avoid a meltdown, follow these easy steps.

    1. Read radiation gauge and ensure it shows no more than (deleted for reasons of national security).
    2. Press the (deleted for intellectual property reasons) button.
    3. Watch carefully for (deleted for reasons of national security).

    If meltdown cannot be avoided, (deleted for reasons of excessive gore and violence).
  • I've had one of these devices rigged up so that when I want to send an e-mail, post stuff in a web form or something, I just write it on a piece of paper and scan it, and it does everything else. To be honest, I [REDACTED] recommend it. The [REDACTED] machine is quite good at [REDACTED] everything I [REDACTED] want it to do. I [REDACTED] for one [REDACTED] welcome [REDACTED] our new [REDACTED] photocopier [REDACTED] overlords.
  • Top Secret intelligence is defined as intelligence that if released would cause grave harm to the United States, its allies, their interests, and/or its operations abroad. They are not going to trust a machine to go through information and determine what is and what isn't TS material. All it takes is for the AI to screw up a few times in one report for the risk that someone will get hurt or killed to increase to unacceptable levels.

    I just don't see the real sensitive environments touching this with a ten fo
  • I thought that Intelligent Redaction was the Discovery Institute's explanation for why they don't release any research.
  • Next step? (Score:2, Interesting)

    by fropenn (1116699)
    What is the next step in development of this feature? What about using it to prevent the duplication of copyrighted works (sort of a DRM for paper)?
  • This machine would be interesting, or frightening, depending upon your point of view, if AI was anywhere near the kind of skill level you need so that this concept even remotely works. As it is, many intelligent people spill secrets they should not spill.
  • ...what sort of automatically-redacted copy will it make?
  • by kalirion (728907) on Monday October 15, 2007 @12:38PM (#20983693)
    What's to stop it from holding our secrets hostage in an attempt to be given human rights?
  • I recall seeing a patent that did this. Instead of a photocopier, though, it did it through the network. So, for example, if you sent an email externally or copied a file to a database with lower security access, it would be auto-redacted.

    Can't remember the number, but should be easy to find.
  • .... some p0rn, will it airbrush out the naughty bits?
  • by Rob T Firefly (844560) on Monday October 15, 2007 @12:51PM (#20983883) Homepage Journal
    Putting aside the fact that OCR and related AI is still just this side of "not very good," for an AI to successfully and exclusively redact certain material, someone still has to at some point define the dataset of what is redactable, and feed that data into the machine. Unless, of course, this AI is simply allowed to crawl the networks and glean for itself what's good and bad for us... [wikipedia.org]
    • Re: (Score:3, Funny)

      by tompaulco (629533)
      Putting aside the fact that OCR and related AI is still just this side of "not very good,"
      As Director of Recognition Technologies for my firm, I would like to disagree with you.

      Sadly, I can't.
  • by cyphergirl (186872)
    Everyone seems to be automatically assuming that it would be used for classified data. This looks more to me like something developed for the businesses that have to deal with HIPAA. Well-defined medical forms (with SSN, name, etc. in the same place every time) could automatically be redacted in order to ensure patient privacy and HIPAA compliance. Looks like a win for the medical industry. It could also work well in the financial world where "need to know" information can be blacked out on financial for
    • Re: (Score:3, Insightful)

      by Intron (870560)
      It will be great until the first time somebody puts the form in upside-down and copies it.
      • Re: (Score:2, Informative)

        by cyphergirl (186872)
        After actually going to the Xerox website and reading about this new technology, I see that it is built around document routing (for review, for example) and has nothing whatsoever to do with their copier and MFD products. This makes sense, considering that they purchased A***** (can't remember the name), which handles legal discovery production and organization services for several corporations (SCO included). Xerox ("The Document Company") is more than just copiers these days.
        • by Junta (36770)

          considering that they purchased A***** (can't remember the name),
          Obviously, they redacted your post *and* your brain for good measure.
  • by Psykechan (255694) on Monday October 15, 2007 @01:09PM (#20984127)
    I worked at a company that had a "top secret" project that they were working on that if the internal name were revealed it could result in, well, not much really... but management was very paranoid that it would get out. This copier could have sensed the name and blanked it out when sales copied "sensitive" material accidentally. Nice.

    Except for the fact that once you make the machine start thinking the user begins to stop thinking. If sales knew about this feature then they wouldn't be bothered to care at all what they were copying and sending out to customers. Eventually the copier wouldn't be a fail safe for the user but would be just a new liability for error. I can't see how this is really much better except it just shifts the blame to IT.
  • Meet Blackpaper!
  • Hear that ? That's sensitive information leaking.
  • That you have to "train" it to recognize various document types, and then it redacts the same locations on subsequent documents that match the fingerprint.
