Xerox Photocopiers Randomly Alter Numbers, Says German Researcher
First time accepted submitter sal_park writes "According to a report from German computer scientist D. Kriesel, some Xerox WorkCentre copiers and scanners may alter numbers that appear in scanned documents. Having analyzed the output of two such devices, the Xerox WorkCentre 7535 and 7556, Kriesel found that "patches of the pixel data are randomly replaced in a very subtle and dangerous way": in particular, some numbers appearing in a document may be replaced by other numbers when it is scanned."
JBIG2 (Score:5, Insightful)
Caused by misconfigured JBIG2 compression. When the pixel difference between two similar-looking features is low enough, they are treated as the same symbol and printed with the same subimage.
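For anyone who wants to see the failure mode concretely, here is a toy sketch of lossy symbol matching in the spirit of what JBIG2 encoders can do. It is not Xerox's code or a real JBIG2 implementation: the 5x5 glyphs, the `max_diff` threshold, and the function names are all made up for illustration.

```python
# Toy model of lossy symbol substitution. Everything here (glyphs,
# threshold, function names) is hypothetical, not taken from any real encoder.

def pixel_diff(a, b):
    """Count differing pixels between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(glyphs, max_diff):
    """Store each glyph as an index into a dictionary of bitmaps.

    A glyph reuses the first stored symbol that differs from it by at
    most max_diff pixels; otherwise it is added as a new symbol.
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, symbol in enumerate(dictionary):
            if pixel_diff(glyph, symbol) <= max_diff:
                indices.append(i)  # close enough: reuse the stored subimage
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices

def decompress(dictionary, indices):
    """Rebuild the page by pasting the stored subimage for every index."""
    return [dictionary[i] for i in indices]

# Made-up low-resolution glyphs for "6" and "8".
SIX = (
    ".###.",
    "#....",
    "####.",
    "#...#",
    ".###.",
)
EIGHT = (
    ".###.",
    "#...#",
    ".###.",
    "#...#",
    ".###.",
)

if __name__ == "__main__":
    dictionary, indices = compress([SIX, EIGHT], max_diff=2)
    page = decompress(dictionary, indices)
    # With a threshold of 2 the "8" is replaced by the stored "6".
    print("8 survived:", page[1] == EIGHT)
    for row in page[1]:
        print(row)
```

With `max_diff=0` both glyphs get their own dictionary entry and the page round-trips exactly. With `max_diff=2` the output is still a perfectly crisp digit, just the wrong one, which is why this kind of error is so hard to spot.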
Re:Anti-counterfeiting (Score:5, Insightful)
Maybe you should read the article.
Re:Really? (Score:5, Insightful)
Problem with JBIG2, not OCR (Score:3, Insightful)
Before anyone spreads wrong information: The problem is with the JBIG2 image compression algorithm used when scanning to PDF format. OCR has nothing to do with this. Also, TIFF format images are not affected as they don't use JBIG2.
Re:oh man, what a mess (Score:5, Insightful)
Could also be a problem with an overly aggressive hole filling algorithm. http://www.mathworks.com/help/images/ref/imfill.html [mathworks.com]
I'd expect there's nothing nefarious going on. It's very likely an overly aggressive image processing algorithm.
Re:see the Xerox user manual (Score:2, Insightful)
thanks for mentioning where in the 328 page document you linked that is written :)
Re:oh man, what a mess (Score:2, Insightful)
combined with the fact that he's complaining about errors in scans of a 7-point font. At that size, it probably only takes two erroneous pixels to change a 6 to an 8.
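A quick way to sanity-check that claim is to count the differing pixels between two small glyphs. The 5x7 bitmaps below are invented for illustration, not taken from any real font or from the scans in the article, so the exact count will vary with font and resolution.

```python
# Hypothetical 5x7 bitmaps of "6" and "8"; '#' is ink, '.' is paper.
SIX = (
    ".###.",
    "#....",
    "#....",
    "####.",
    "#...#",
    "#...#",
    ".###.",
)
EIGHT = (
    ".###.",
    "#...#",
    "#...#",
    ".###.",
    "#...#",
    "#...#",
    ".###.",
)

def differing_pixels(a, b):
    """Return (row, col) positions where two equally sized bitmaps differ."""
    return [
        (r, c)
        for r, (row_a, row_b) in enumerate(zip(a, b))
        for c, (pa, pb) in enumerate(zip(row_a, row_b))
        if pa != pb
    ]

diff = differing_pixels(SIX, EIGHT)
total = len(SIX) * len(SIX[0])
print(f"{len(diff)} of {total} pixels differ: {diff}")
```

For these made-up glyphs the answer is three pixels out of 35, so a handful of noisy pixels at small sizes is in the right ballpark.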
Re:Known Xerox Issue..... in documentation (Score:5, Insightful)
Now the question becomes: what moron made this setting the default? Maybe a setting that can undetectably corrupt your data can be provided if appropriate warnings are given, but it sure as hell should never be the default. I would've thought that was obvious.
Re:see the Xerox user manual (Score:5, Insightful)
That's "Normal" quality? That could be *very* misleading. If you have an option that has negative side-effects such as this then the option should be titled something to indicate the risk - "Super-compressed", "dangerously small" or the like.
Though I'm surprised Xerox would even allow such a compression if such an obvious issue occurs. People would expect image quality to suffer - but full character substitution?
Re:Known Xerox Issue..... in documentation (Score:4, Insightful)
Substitution errors shouldn't happen in corporate level scanning hardware, even if you bury a warning about it 107 pages into the 350 page manual. You can't have something that fundamentally makes your product not fit for purpose and claim that it's ok just because it's a known issue.
Re:Slashdot affected as well (Score:4, Insightful)
Especially with such an international audience.
You must have missed the memo. Slashdot is a US site that tolerates international visitors. These are not, however, encouraged to return.
Re:see the Xerox user manual (Score:5, Insightful)
Seems a little dangerous for that algorithm to be the default, doesn't it? Plus, burying the warning deep in the documentation.
And an insufficient warning, at that.
Something more like:
Normal/Small Mode may not be suitable for documents where faithful reproduction of the original text, numbers or illustrations is critical. Examples would include legal documents (contracts, wills, articles of incorporation, etc.), medical documents (patient charts, orders, medication lists, etc.), financial documents (bills, invoices, statements, reconciliations, etc.), business documents (HR records, meeting minutes, memoranda, etc.), engineering documents (drawings, plans, change orders, instructions, bills of material, etc.) or any other document where incorrect data could result in financial loss, injury, death, property damage or destruction, legal liability, loss of reputation or other harm. These examples should not be considered an exhaustive list of documents not suited for scanning, copying or faxing using Normal/Small mode.
would be more appropriate.
Re:These numbers are not the true numbers (Score:5, Insightful)
Too much XKCD?
There is no such thing as "too much XKCD".
Probably an image quality enhancement fix. (Score:4, Insightful)
I expect the bug is there because it is trying to clean up the scanned image, accounting for what it thinks is missing data.
14.13m^2, 21.11m^2, 17.42m^2
It sees three blocks of information that probably look roughly the same to software that is accounting for errors. The number of pixels used in each is fairly close. I expect the scanner sees the three blocks, thinks they are the same, picks the block that seems sharpest, and reproduces it over the other spots.
Scanning isn't pixel-perfect, so each scan produces a different match, and the image-cleaning processor will probably clean up the numbers differently each time.