Security

Xerox's 'Intelligent Redaction' Scanners

coondoggie writes "Xerox today touted software it says can scan documents, understand their meaning and block access to those sensitive or secure areas so that prying eyes cannot read, copy or forward the information. Xerox and researchers from its Palo Alto Research Center debuted "Intelligent Redaction," new software that automates the process of removing confidential information from any document. The software includes a detection tool that uses content analysis and an intelligent user interface to protect sensitive information. It can encrypt only the sensitive sections or paragraphs of a document, a capability previously not available, Xerox said."
This discussion has been archived. No new comments can be posted.

  • Concerns (Score:1, Insightful)

    by blighter ( 577804 ) on Monday October 15, 2007 @11:44AM (#20982931)
    One critic of the new capability cited concerns about censorship saying, "REDACTED"
  • Accuracy (Score:5, Insightful)

    by kevmatic ( 1133523 ) on Monday October 15, 2007 @11:48AM (#20982981)
    This is a poor idea. It better be 100% accurate at marking classified data as classified. All it will take is one screw-up and some extremely important data out there can be leaked to the wrong people.

    99.99% accurate isn't going to be good enough, is it?
  • by Radon360 ( 951529 ) on Monday October 15, 2007 @11:50AM (#20983035)

    Attention corrupt senior corporate management:

    Tired of dealing with underlings trying to take you out by blowing the whistle on your illicit financial dealings? We have just the type of business equipment that you're looking for. Stop those do-gooders right in their tracks by automatically keeping them from copying those fudged books and secretive memos. Act now, and we'll throw in the automatic notification upgrade so you can terminate their employment before they have the chance to resort to other means of toppling your investment scam...

    (okay, I'll put my tinfoil hat back in the closet, now)

  • by Raul654 ( 453029 ) on Monday October 15, 2007 @11:51AM (#20983051) Homepage
    AI is a disaster through-and-through. It never works well. Ever.

    Consider hand-writing recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most well-defined (read: easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).

    Now what you have here, with the automatic redacting copier, is that the copier needs to understand the document it's reading, and determine which parts to redact. Contextual understanding is *HARD* - it's the same class of problem as automated translation - only harder in this case.

    This copier idea is a huge flop. I don't know why they waste money on it. Anyone who relies on this copier to redact documents is a fool, because it is bound to make all kinds of mistakes (both false negatives - missing things it should have picked up, and false positives - redacting things it shouldn't).
  • by homey of my owney ( 975234 ) on Monday October 15, 2007 @11:54AM (#20983101)
    Yeah, and the practice continued unabated. It amazes me that they claim the very same government that is inept, unethical and incapable, can pull off keeping such a huge secret. For a 140 year old reference: "It appears we have appointed our worst generals to command forces, and our most gifted and brilliant to edit newspapers. In fact, I discovered by reading newspapers that these editor/geniuses plainly saw all my strategic defects from the start, yet failed to inform me until it was too late. Accordingly, I am readily willing to yield my command to these obviously superior intellects, and I will, in turn, do my best for the Cause by writing editorials - after the fact." ~ Robert E. Lee, 1863
  • by deniable ( 76198 ) on Monday October 15, 2007 @11:57AM (#20983141)
    The fun game is then getting access to the material stored in the copier. This is the big list of things not to tell people. It's like having a "what to hide from the cops" list on your fridge.
  • We the [REDACTED] (Score:4, Insightful)

    by Speare ( 84249 ) on Monday October 15, 2007 @12:01PM (#20983191) Homepage Journal

    I wonder if it prints yellow dots to encode the redacted text for forensic analysis.

    You know, it used to be that a "national security" threat was something that could kill millions, or wipe out the White House. Now a kid with some lighter fluid can be arrested for terroristic threats, and it's the White House that authorizes the killing. Can nobody read the Constitution?

    We the [REDACTED] [cafepress.com]
  • by julesh ( 229690 ) on Monday October 15, 2007 @12:04PM (#20983235)
    Maybe it's as good as Adobe PDF's redaction feature, and anyone can unredact the document?

    To be fair to Adobe, that *isn't* a redaction feature. It's a rectangle drawing feature that happens to get regularly misused.
  • by gt_mattex ( 1016103 ) on Monday October 15, 2007 @12:05PM (#20983249)
    Or maybe camera phones have already rendered this technology moot.
  • by Raul654 ( 453029 ) on Monday October 15, 2007 @12:25PM (#20983503) Homepage
    You have missed my point. I don't deny that OCR makes life easier for people who have to digitize documents. The point I am making is that OCR is, as far as AI applications go, the easiest problem there is. And, even with such an easy problem, the best applications out there deliver substantially less than reliable performance. (If you think 99% is OK, then imagine that for a 100,000 word novel, at 4 characters per word, that's close to 4,000 words that need fixing).

    Now, with this copier, you are talking about a *substantially* harder problem, which has far less tolerance for errors. (Meaning that you want absolutely no false negatives.) The chances that this copier works as advertised, or anywhere close to it, are basically nil. It was a waste of money for Xerox to develop it (because anyone even moderately knowledgeable about AI should have been able to tell them this) and it's a waste of money for anyone who buys it.
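A rough check of the OCR arithmetic in the comment above, as a sketch only. It assumes character errors are independent and that a word needs fixing if any one of its characters is wrong; neither assumption comes from the post, and the function names are made up for illustration. Under those assumptions, 99% character accuracy on a 100,000-word text at 4 characters per word leaves about 4,000 wrong characters spread over roughly 3,940 words.

```python
def expected_char_errors(num_words, chars_per_word, char_accuracy):
    """Expected number of misread characters in the whole text."""
    return num_words * chars_per_word * (1.0 - char_accuracy)


def expected_words_needing_fixes(num_words, chars_per_word, char_accuracy):
    """Expected number of words containing at least one misread character.

    Assumes each character is misread independently of the others.
    """
    p_word_clean = char_accuracy ** chars_per_word
    return num_words * (1.0 - p_word_clean)


if __name__ == "__main__":
    print(round(expected_char_errors(100_000, 4, 0.99)))
    print(round(expected_words_needing_fixes(100_000, 4, 0.99)))
```

Real OCR errors tend to cluster (smudges, bad scans, unusual fonts), so the true number of affected words could be lower than this independence model suggests.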
  • by martyb ( 196687 ) on Monday October 15, 2007 @12:26PM (#20983517)

    AI is a disaster through-and-through. It never works well. Ever.

    Consider hand-writing recognition, autonomous robotics, and game theory, just to name a few of the narrowest, most well-defined (read: easiest) AI applications. AI works well in none of these - at best, it's so-so (like the 95-98% success rates in OCR).

    Agreed. But, there's a huge continuum between the current error-prone, manual process and a fully-automated redaction machine.

    Now what you have here, with the automatic redacting copier, is that the copier needs to understand the document it's reading, and determine which parts to redact. Contextual understanding is *HARD* - it's the same class of problem as automated translation - only harder in this case.

    Agreed. But I do see an opportunity here for an automated assistant to the current manual process. In a sense, it's like a context-sensitive lint [wikipedia.org] for English.

    Imagine it watching over your shoulder, so to speak, as you start redacting a document. "Oh, he just redacted: 'Reading, Mass' so I'll let 'em know the next time I see that." Consider an incremental search in an editor where it highlights all instances of the string you are searching for. You still need to actually READ the text, but it helps to at least point out all "words/phrases of interest."

    Let's put it another way. Imagine YOU are sitting in front of a PC and manually redacting hundreds of pages of documents. How long before you'd wish there was a way for the system to highlight things you have already told it, TWENTY !!%$%%! TIMES, that should be redacted? You still need to accept the offering, and continue to locate and point out additional words/phrases of interest so it can build its "vocabulary".

    Then, for completeness, add a verification pass where you get to see, in context, all accepted and declined redaction suggestions. For additional security or confidence, have another person do the same thing from the same starting point, and then diff the resulting redactions.

    Summary: no silver bullet here, but I see it being a very useful and helpful adjunct to an all-manual process.
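The assisted workflow described in this comment (suggest every occurrence of a phrase the reviewer has already redacted, let the reviewer accept or decline, then compare two reviewers' results) could be sketched roughly as follows. This is an illustration under stated assumptions, not anything Xerox has described: the function names are hypothetical, matching is plain case-insensitive string search, and the "vocabulary" is just a set of phrases the reviewer has flagged so far.

```python
import re


def suggest_redactions(text, vocabulary):
    """Return sorted (start, end, matched_text) spans for every known phrase."""
    spans = []
    for phrase in vocabulary:
        for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), match.group(0)))
    return sorted(spans)


def apply_redactions(text, accepted_spans, mask_char="#"):
    """Mask each accepted span; same-length masking keeps offsets valid."""
    for start, end, _ in accepted_spans:
        text = text[:start] + mask_char * (end - start) + text[end:]
    return text


def diff_reviews(spans_a, spans_b):
    """Spans accepted by exactly one reviewer; these deserve a second look."""
    return sorted(set(spans_a) ^ set(spans_b))


if __name__ == "__main__":
    doc = "Meet in Reading, Mass on Friday. The Reading, Mass office is closed."
    suggestions = suggest_redactions(doc, {"Reading, Mass"})
    reviewer_a = suggestions        # accepts every suggestion
    reviewer_b = suggestions[:1]    # accepts only the first one
    print(apply_redactions(doc, reviewer_a))
    print(diff_reviews(reviewer_a, reviewer_b))
```

The human still reads the whole document; the sketch only saves re-marking phrases that were already flagged, and the diff step surfaces disagreements between two independent passes.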

  • by AJWM ( 19027 ) on Monday October 15, 2007 @12:48PM (#20983835) Homepage
    Even if OCR only has a 50% success rate, that means that it is 50% less work that someone is going to be doing.

    While in general I agree with your point -- a thing doesn't have to be perfect to be useful -- OCR with only a 50% success rate is likely to mean more work for somebody who has to go through and correct it. At some point it's easier just to retype the whole thing manually than go through correcting all the OCR errors, and I think that point is a lot fewer errors than 50%. (Been there, done that.)
  • by cyphergirl ( 186872 ) on Monday October 15, 2007 @12:57PM (#20983963) Homepage Journal
    Everyone seems to be automatically assuming that it would be used for classified data. This looks more to me like something developed for the businesses that have to deal with HIPAA. Well-defined medical forms (with SSN, name, etc. in the same place every time) could automatically be redacted in order to ensure patient privacy and HIPAA compliance. Looks like a win for the medical industry. It could also work well in the financial world where "need to know" information can be blacked out on financial forms and applications.
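For well-defined forms like the ones this comment has in mind, redaction can be much simpler than understanding free text. The sketch below is a hypothetical illustration, not Xerox's method: it assumes the form has already been converted to text or labelled fields, that SSNs appear in the dashed NNN-NN-NNNN format, and that the field labels "name" and "ssn" are the sensitive ones. All names in the code are invented for the example.

```python
import re

# Assumed format: dashed US Social Security numbers such as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical labels for fields that always hold sensitive values.
SENSITIVE_FIELDS = {"name", "ssn"}


def redact_ssns(text, mask="XXX-XX-XXXX"):
    """Replace anything that looks like a dashed SSN in free text."""
    return SSN_PATTERN.sub(mask, text)


def redact_form(fields):
    """Blank out sensitive fields in a form given as a label -> value dict."""
    return {
        label: "[REDACTED]" if label.lower() in SENSITIVE_FIELDS else value
        for label, value in fields.items()
    }


if __name__ == "__main__":
    print(redact_ssns("SSN: 123-45-6789, DOB 1970-01-01"))
    print(redact_form({"Name": "Jane Doe", "SSN": "123-45-6789", "Ward": "3B"}))
```

A pattern like this misses SSNs written without dashes and anything sensitive that lands outside the expected fields, which is exactly the failure mode other commenters in the thread worry about.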
  • by Psykechan ( 255694 ) on Monday October 15, 2007 @01:09PM (#20984127)
    I worked at a company that had a "top secret" project; if the internal name were revealed it could result in, well, not much really... but management was very paranoid that it would get out. This copier could have sensed the name and blanked it out when sales copied "sensitive" material accidentally. Nice.

    Except for the fact that once you make the machine start thinking, the user begins to stop thinking. If sales knew about this feature then they wouldn't be bothered to care at all what they were copying and sending out to customers. Eventually the copier wouldn't be a fail-safe for the user but would be just a new liability for error. I can't see how this is really much better, except that it just shifts the blame to IT.
  • by Intron ( 870560 ) on Monday October 15, 2007 @01:30PM (#20984453)
    It will be great until the first time somebody puts the form in upside-down and copies it.
