Xerox Photocopiers Randomly Alter Numbers, Says German Researcher - Slashdot


Xerox Photocopiers Randomly Alter Numbers, Says German Researcher

Posted by timothy on Tuesday August 06, 2013 @08:47AM from the we-think-you-meant-9 dept.

First-time accepted submitter sal_park writes "According to a report from German computer scientist D. Kriesel, some Xerox WorkCentre copiers and scanners may alter numbers that appear in scanned documents. Having analyzed the output of two such devices, the Xerox WorkCentre 7535 and 7556, Kriesel found that "patches of the pixel data are randomly replaced in a very subtle and dangerous way": in particular, some numbers appearing in a document may be replaced by other numbers when it is scanned."


oh man, what a mess (Score:5, Informative)

by Trepidity ( 597 ) writes: <[gro.hsikcah] [ta] [todhsals-muiriled]> on Tuesday August 06, 2013 @08:54AM (#44485213)

Some of these machines have been used to digitize documents whose originals were later shredded, so some people now have subtly wrong "original" digital copies. It's particularly problematic because of the nature of the degradation: ordinary lossy compression degrades images in a non-semantic way, producing blurring, blocking, or other artifacts, not OCR-style mistakes.
The issue here seems to be the lossy mode of JBIG2 [wikipedia.org], which tries to find patches of the image that approximately match and consolidates them. The idea seems to be that if the letter "e" appears 5000 times in a document in the same typeface, you store one version of it once and then reference it everywhere it appears. But now you get OCR-style errors if you end up matching some patches to incorrect partners: your lightly printed "8" gets replaced by the "0" patch now and then, that kind of thing. And unlike people doing OCR, who know they need to account for this, the operators of these machines likely had no idea this was even a possible failure mode to watch for, so who knows how many numbers are wrong in miscellaneous documents (letters are a little less problematic, because most random letter mutations don't destroy meaning).
Blargh.
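The consolidation idea described in this comment can be sketched in a few lines of Python. This is a toy model under stated assumptions, not JBIG2 itself: the glyphs are hypothetical 3x5 bitmaps, and "close enough" is a simple count of differing pixels compared against a made-up `threshold` parameter (real encoders use their own matching criteria). It shows how a lossy encoder that keeps only one copy of each matched symbol can hand back a clean-looking "0" where an "8" used to be.

```python
# Toy model of lossy symbol matching (hypothetical; not the JBIG2 codec).
# Each glyph is a 3x5 bitmap given as rows of '#' (black) and '.' (white).
GLYPHS = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "8": ("###", "#.#", "###", "#.#", "###"),
}


def hamming(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(page, threshold):
    """Return (dictionary, refs). Each glyph on the page is replaced by a
    reference to the first stored symbol within `threshold` differing pixels;
    if none is close enough, the glyph becomes a new dictionary entry."""
    dictionary, refs = [], []
    for bitmap in page:
        for i, entry in enumerate(dictionary):
            if hamming(bitmap, entry) <= threshold:
                refs.append(i)  # lossy step: this glyph's own pixels are dropped
                break
        else:
            dictionary.append(bitmap)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decode(dictionary, refs):
    """Rebuild the page by pasting the referenced symbol at each position."""
    return [dictionary[i] for i in refs]


page = [GLYPHS["0"], GLYPHS["8"]]
print(hamming(GLYPHS["0"], GLYPHS["8"]))              # 1 pixel apart
print(decode(*encode(page, 0)) == page)               # True: exact matches only
print(decode(*encode(page, 1)) == [GLYPHS["0"]] * 2)  # True: the 8 became a 0
```

In this sketch a threshold of 0 keeps every distinct glyph and loses nothing, while a looser threshold saves space by letting nearby shapes collapse into one, which loosely mirrors the size/quality trade-off discussed in the thread.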

Re:Slashdot affected as well (Score:5, Informative)

by J'raxis ( 248192 ) writes: on Tuesday August 06, 2013 @09:01AM (#44485245) Homepage

That bug is caused by Slashdot still refusing to implement this 20-year-old technology [wikipedia.org]. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

Re:Some image smoothing algorithm... (Score:5, Informative)

by Sponge Bath ( 413667 ) writes: on Tuesday August 06, 2013 @09:02AM (#44485257)

This is not smoothing, distortion, or individual bad pixels. Entire image patches are copied incorrectly, essentially repeating a scanned section containing one number over another part of the image containing a different number.

Re:oh man, what a mess (Score:5, Informative)

by Trepidity ( 597 ) writes: <[gro.hsikcah] [ta] [todhsals-muiriled]> on Tuesday August 06, 2013 @09:07AM (#44485297)

Yeah, it's not OCR per se, but it operates on a principle somewhat similar to OCR, identifying which numbers are which and consolidating things it thinks are the same glyph. I agree it's much worse, because it alters the actual image, and it does so in a way that still looks plausible and "clean". Really bad lossy compression that just produced a lot of artifacts, leaving certain numbers unreadable, would at least telegraph that you shouldn't trust the result. The numbers here look clean and artifact-free; they just happen to be wrong.

Re:Really? (Score:5, Informative)

by fuzzyfuzzyfungus ( 1223518 ) writes: on Tuesday August 06, 2013 @09:07AM (#44485305) Journal

Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?
These 'errors' are substantially worse than ordinary scanner suckitude or lossy-compression legovision: JBIG2's pixel-block matching creates the potential for a block containing one character to be mis-identified and replaced with a block containing a different character.
The replaced character will be exactly as legible as text elsewhere on the page, just entirely incorrect.
If it were just the scan quality being lousy, or somebody turning, say, JPEG compression up to the point of pain, mangled characters would be obviously mangled. That's not as good as legible text, but at least the problem is obvious. In this case, the errors will look as good as the rest of the document.

see the Xerox user manual (Score:5, Informative)

by mejustme ( 900516 ) writes: on Tuesday August 06, 2013 @09:08AM (#44485311)

Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"
Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf [unc.edu]

Re:Really? (Score:5, Informative)

by xaxa ( 988988 ) writes: on Tuesday August 06, 2013 @09:12AM (#44485345)

Scanning 7pt text at 200dpi with consumer level scanner technology and you're complaining about scan errors. Really?
Consumer level? This isn't a home, or even home-office, machine. It's sold on the website [xerox.co.uk] under the office section.

Known Xerox Issue..... in documentation (Score:5, Informative)

by Anonymous Coward writes: on Tuesday August 06, 2013 @09:13AM (#44485351)

If you read the documentation from Xerox, it says that on scanning it is a known problem that "Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals." (page 107 of http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf)
Page 129 also has the following: "Quality / File Size
The Quality / File Size settings allow you to choose between scan image quality and file size. These settings allow you to deliver the highest quality or make smaller files. A small file size delivers slightly reduced image quality but is better when sharing the file over a network. A larger file size delivers improved image quality but requires more time when transmitting over the network. The options are:
Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals."

Re:oh man, what a mess (Score:5, Informative)

by Trepidity ( 597 ) writes: <[gro.hsikcah] [ta] [todhsals-muiriled]> on Tuesday August 06, 2013 @09:23AM (#44485449)

Ran some numbers to check, and with some assumptions your estimate seems pretty close.
The modern standard "PostScript point" is 1/72 in, so a 7-point font has a height of 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of that height, so call it ~0.025 inches. If the print/scan cycle round-trips at somewhere in the range of 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.
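The back-of-the-envelope estimate above is easy to re-run for other resolutions, including the 200 dpi mentioned earlier in the thread. This sketch uses the same assumptions as the comment (1 pt = 1/72 in, 7-point type, and a distinguishing stroke about a quarter of the type height); the helper name `stroke_pixels` is just for illustration.

```python
POINT_IN = 1 / 72          # one PostScript point, in inches
TYPE_SIZE_PT = 7           # 7-point type
STROKE_FRACTION = 1 / 4    # assumed: the 6-vs-8 stroke is ~1/4 of the height

stroke_in = TYPE_SIZE_PT * POINT_IN * STROKE_FRACTION  # about 0.024 in


def stroke_pixels(dpi):
    """Approximate number of scan pixels spanning the distinguishing stroke."""
    return stroke_in * dpi


for dpi in (75, 150, 200, 300):
    print(f"{dpi:>3} dpi: {stroke_pixels(dpi):.1f} px")
# Prints:
#  75 dpi: 1.8 px
# 150 dpi: 3.6 px
# 200 dpi: 4.9 px
# 300 dpi: 7.3 px
```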

Re:see the Xerox user manual (Score:3, Informative)

by Anonymous Coward writes: on Tuesday August 06, 2013 @09:26AM (#44485481)

Interesting, since as far as I remember from reading about this yesterday, Xerox had not yet responded to the issue. Strange, since it's in the documentation.
But then, reading the manual in context, the quote appears on pages 107, 129, and 179, which are the chapters "Fax", "Workflow Scanning", and "Save and Reprint Jobs" respectively.
It's not in the chapter "Copying" (pages 39-63), so there is no excuse for this issue occurring in simple copy mode.

Re:Slashdot affected as well (Score:4, Informative)

by Mr Z ( 6791 ) writes: on Tuesday August 06, 2013 @09:38AM (#44485579) Homepage Journal

No, just significantly harder to filter effectively. Also, there was a rash of troll accounts with names that looked like those of the various Slashdot editors, only using accented variants of letters, such as 'tÍmothy'. All those shenanigans added up to where we are today.
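One simple countermeasure against lookalike usernames such as 'tÍmothy' is to compare names after stripping accents. The sketch below is an assumption about how one might do that, not what Slashcode actually does: it applies Unicode NFKD normalization, drops combining marks, and case-folds. It will not catch cross-script lookalikes (for example a Cyrillic letter standing in for a Latin one); the confusables data in Unicode Technical Standard #39 exists for that harder case. The names `fold_name` and `looks_like` are hypothetical.

```python
import unicodedata


def fold_name(name: str) -> str:
    """Decompose accented letters (NFKD), drop the combining marks, and
    case-fold, so 'tÍmothy' and 'timothy' fold to the same string."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def looks_like(candidate: str, protected: list[str]) -> bool:
    """True if `candidate` folds to the same string as any protected name."""
    folded = fold_name(candidate)
    return any(folded == fold_name(p) for p in protected)


print(looks_like("tÍmothy", ["timothy"]))  # True: the accent is folded away
print(looks_like("tibit", ["timothy"]))    # False
```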

Re:Really? (Score:5, Informative)

by UnknowingFool ( 672806 ) writes: on Tuesday August 06, 2013 @09:46AM (#44485649)

If you read the article, you would see it's not a simple case of scan error where a "13" appears blurry and looks like "B". Whole numbers are changed: 21.11 --> 17.43. This is a major issue if it were on a construction drawing, for example. A beam 4m too short would be a problem. Even if caught, the engineer signing off might have to go through a whole audit process.

Re:see the Xerox user manual (Score:5, Informative)

by Rob the Bold ( 788862 ) writes: on Tuesday August 06, 2013 @09:55AM (#44485713)

Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.
And I'd be wrong: it's in all three sections. Ctrl-F'ing in Okular only finds "character substitution" when the words are side by side, not split by a line break as they appear in the copying and scanning sections.
That's way worse. Xerox knows about this, and just puts in a little note, rather than a big old "WARNING: Normal/Small mode may produce undetectable text errors."
And that type of warning should be defined at the beginning of the manual as one of the "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".

Re:These numbers are not the true numbers (Score:2, Informative)

by Joce640k ( 829181 ) writes: on Tuesday August 06, 2013 @10:01AM (#44485779) Homepage

Too much XKCD?
https://xkcd.com/1022/ [xkcd.com]

Surprised nobody asked this... (Score:4, Informative)

by ZorinLynx ( 31751 ) writes: on Tuesday August 06, 2013 @10:22AM (#44485959) Homepage

Why do we need such aggressive compression algorithms, algorithms that can make the data WRONG, in this day and age when storage and memory are so incredibly cheap?
This is not 1987, when every byte was precious and 1MB of RAM cost a hundred bucks. There is NO EXCUSE for this these days; just use PNG or JPEG compression. At least those don't freaking SWAP IN DIFFERENT CHARACTERS!!
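The lossless side of this argument is easy to check for DEFLATE, the compression PNG is built on: whatever goes in comes back bit for bit. (JPEG, by contrast, is lossy, but its losses show up as visible artifacts rather than substituted glyphs.) The sketch below uses Python's standard `zlib` module on a made-up 1-bit "scan" buffer; the data is purely illustrative.

```python
import zlib

# Hypothetical 1-bit scan data: 8 pixels per byte, a repetitive pattern
# standing in for a mostly white page with some text on it.
scan = bytes([0b00000000] * 900 + [0b00111100] * 100) * 10

compressed = zlib.compress(scan, 9)  # DEFLATE at maximum compression level
restored = zlib.decompress(compressed)

print(len(scan), "->", len(compressed), "bytes")
print(restored == scan)  # True: lossless, every pixel comes back unchanged
```

The trade-off is that a lossless coder cannot shrink the file by pretending two similar glyphs are identical, so real scans compress less than they would under a lossy symbol-matching mode.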

Re:Slashdot affected as well (Score:4, Informative)

by dolmen.fr ( 583400 ) writes: on Tuesday August 06, 2013 @10:30AM (#44486073) Homepage

Slashdot uses Perl, which is the programming language that has the best support for Unicode [98.245.80.27] (while PHP's support for this is almost nonexistent by comparison).
But that doesn't make Unicode work magically. The slashcode [slashcode.com] has to take it into account.

Re:Slashdot affected as well (Score:5, Informative)

by tibit ( 1762298 ) writes: on Tuesday August 06, 2013 @10:37AM (#44486149)

Just in case people miss the obvious: the differing opening and closing quotes are the correct punctuation marks. Collapsing them into a single straight quote only began with typewriters and teletypes. The MS Office quotes are not "smart"; they are merely correct.
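The point about distinct opening and closing quotes is visible in Unicode itself, which assigns them separate code points alongside the ASCII straight quotes inherited from typewriter-era conventions. A quick way to list them with Python's standard `unicodedata` module (the helper `describe` is just for illustration):

```python
import unicodedata


def describe(chars: str) -> list[str]:
    """Return 'U+XXXX NAME' for each character in the string."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in chars]


for line in describe("\u201c\u201d\u2018\u2019\"'"):
    print(line)
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
# U+2018 LEFT SINGLE QUOTATION MARK
# U+2019 RIGHT SINGLE QUOTATION MARK
# U+0022 QUOTATION MARK
# U+0027 APOSTROPHE
```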

Re:Anti-counterfeiting (Score:5, Informative)

by Anubis IV ( 1279820 ) writes: on Tuesday August 06, 2013 @10:53AM (#44486341)

That's all I did, and I learned what they were talking about pretty quickly.
It's actually pretty insane. They had architectural diagrams where the scanner had copied and pasted the square-meter figures for some rooms into other rooms. For instance, here are the room sizes for the three rooms on the diagram as given on the original and on various scans of it (I've marked incorrect values with *):
Original Diagram: 14.13m^2, 21.11m^2, 17.42m^2
Xerox WorkCentre 7535 scan: 14.13m^2, *14.13m^2, *14.13m^2
Xerox WorkCentre 7556 scan 1: 14.13m^2, *14.13m^2, *14.13m^2
Xerox WorkCentre 7556 scan 2: *17.42m^2, 21.11m^2, 17.42m^2
Xerox WorkCentre 7556 scan 3: 14.13m^2, *14.13m^2, 17.42m^2
They have images of this happening. It's outright substituting blocks of text from one part of a scanned image into an entirely separate part: not just mangling pixels or uniformly displacing each by a few mm, but moving them into a different part of the image that was similar, yet slightly different. Maybe it's some sort of optimization or compression gone wrong, i.e., they detected a block that appeared to be the same as a previous one, assumed they were the same, and only kept one copy of that data?
It's bizarre.

Re:Slashdot affected as well (Score:5, Informative)

by J'raxis ( 248192 ) writes: on Tuesday August 06, 2013 @11:18AM (#44486591) Homepage

The typo in the article shows that they were using UTF-8. If a quotation mark is turned into three separate characters, that's the tell-tale sign that it was UTF-8 (a multibyte encoding) and not a single-byte Windows code page such as Windows-1252.
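The three-character tell-tale is easy to reproduce: encode a curly quote as UTF-8, then decode those bytes as if they were Windows-1252. In Windows-1252 the same quote is a single byte, which is why a single-byte mix-up could not produce three characters. A minimal Python sketch (`mojibake` is a hypothetical helper name):

```python
def mojibake(text: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


print(len("\u201c".encode("utf-8")))   # 3 bytes in UTF-8
print(len("\u201c".encode("cp1252")))  # 1 byte in Windows-1252
print(mojibake("\u201c"))              # â€œ  (from a left double quote)
print(mojibake("\u2019"))              # â€™  (from a right single quote)
# Note: mojibake("\u201d") raises UnicodeDecodeError, because its last UTF-8
# byte (0x9D) is unassigned in Python's cp1252 codec.
```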

Xerox's Official Response (Score:2, Informative)

by Anonymous Coward writes: on Tuesday August 06, 2013 @12:17PM (#44487359)

http://realbusinessatxerox.blogs.xerox.com/2013/08/06/always-listening-to-our-customers-clarification-on-scanning-issue/?CMP=SMO-EXT#.UgEhdRgk98F
By Francis Tse, principal engineer, Xerox
Recently there have been articles about Xerox devices randomly altering numbers in scanned documents. We take this issue very seriously.
The problem stems from a combination of compression level and resolution setting. The devices mentioned are shipped from the factory with a compression level and resolution that produce scanned files which are optimized for viewing or printing while maintaining a reasonable file size. We do not normally see a character substitution issue with the factory default settings; however, the defect may be seen at lower quality and resolution settings.
The Xerox design utilizes the recognized industry standard JBIG2 compressor which creates extremely small file sizes with good image quality, but with inherent tradeoffs under low resolution and quality settings.
For data integrity purposes, we recommend the use of the factory defaults with a quality level set to “higher.” In cases where lower quality/higher compression is desired for smaller file sizes, we provide the following message to our customers next to the quality settings within the device web user interface: “The normal quality option produces small file sizes by using advanced compression techniques. Image quality is generally acceptable, however, text quality degradation and character substitution errors may occur with some originals.”
Xerox is totally committed to customer satisfaction and with this feedback we will look for ways to help our customers better manage their scanning application needs.
For more information, contact Xerox Support at http://www.xerox.com/perl-bin/world_contact.pl#0.
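Xerox's explanation comes down to JBIG2's lossy, low-quality settings; the standard can also be used losslessly. One way a pattern-matching coder can stay exact is to store, next to the pointer to the matched symbol, the pixel difference between that symbol and the actual glyph. The sketch below shows that principle with a per-row XOR residual; it is a simplified illustration under that assumption, not JBIG2's actual refinement coding, and the glyph bitmaps and function names are hypothetical.

```python
# Hypothetical 3-pixel-wide glyphs, one bitmask per row (bit set = black).
ZERO = (0b111, 0b101, 0b101, 0b101, 0b111)
EIGHT = (0b111, 0b101, 0b111, 0b101, 0b111)


def encode_exact(bitmap, matched_symbol):
    """Keep the matched symbol plus a per-row XOR residual that records
    exactly which pixels differ from the real glyph."""
    residual = tuple(a ^ b for a, b in zip(bitmap, matched_symbol))
    return matched_symbol, residual


def decode_exact(matched_symbol, residual):
    """XOR the residual back in to recover the original glyph bit for bit."""
    return tuple(s ^ r for s, r in zip(matched_symbol, residual))


symbol, residual = encode_exact(EIGHT, ZERO)    # the 8 was matched to the 0
print(residual)                                 # (0, 0, 2, 0, 0): one pixel differs
print(decode_exact(symbol, residual) == EIGHT)  # True: no substitution
```

The residual is extra data per glyph, which is presumably the file-size cost that a "smaller file" setting gives up in exchange for the risk of substitution.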

Re:I call BS (Score:4, Informative)

by Guy Harris ( 3803 ) writes: <guy@alum.mit.edu> on Tuesday August 06, 2013 @01:35PM (#44488361)

I work for Xerox. I specifically support these machines in a tier 3 capacity. I have not seen or heard a single case of this.
So does Francis Tse [xerox.com], and he's apparently heard of it.
My group handles calls from all of North America, and some South.
You might want to talk to somebody who handles calls from Western Europe - Germany [dkriesel.com], in particular.

