Bug

Xerox Photocopiers Randomly Alter Numbers, Says German Researcher

Posted by timothy
from the we-think-you-meant-9 dept.
First time accepted submitter sal_park writes "According to a report from German computer scientist D. Kriesel, some Xerox WorkCentre copiers and scanners may alter numbers that appear in scanned documents. Having analyzed the output of two such devices, the Xerox WorkCentre 7535 and 7556, Kriesel found that "patches of the pixel data are randomly replaced in a very subtle and dangerous way": in particular, some numbers appearing in a document may be replaced by other numbers when it is scanned."

  • by hawkinspeter (831501) on Tuesday August 06, 2013 @08:50AM (#44485185)
    So, it has come to this.
  • by Anonymous Coward on Tuesday August 06, 2013 @08:54AM (#44485211)

    Kriesel found that âoepatches of the pixel data are randomly replaced in a very subtle and dangerous wayâ

    Slashdot users are advised not to use Xerox copiers for submissions.

    • by J'raxis (248192) on Tuesday August 06, 2013 @09:01AM (#44485245) Homepage

      That bug is caused by Slashdot still refusing to implement this 20-year-old technology [wikipedia.org]. I mean, this being some sort of cutting-edge tech blog and all, who'd expect them to properly support a character-encoding technology that came out two decades ago?

      • by intermodal (534361) on Tuesday August 06, 2013 @09:06AM (#44485295) Homepage Journal

        Especially with such an international audience.

      • by Minwee (522556)

        If that technology is too arcane, perhaps this helpful tool [fourmilab.ch] might be useful.

        On the other hand, it might backfire and wipe out half of the site's users, so maybe that's not such a good idea.

    • by nospam007 (722110) *

      "Slashdot users are advised not to use Xerox copiers for submissions."

      Imagine what Excel, rounding problems and this copy-machine could do to the economy of your enemies.

    • by omnichad (1198475)

      And how will they keep up with making all the dupe posts? The summary is supposed to be grossly wrong here anyway.

  • oh man, what a mess (Score:5, Informative)

    by Trepidity (597) <delirium-slashdo ... g ['ish' in gap]> on Tuesday August 06, 2013 @08:54AM (#44485213)

    Some of these machines have been used for digitizing documents whose originals were later shredded, so some people now have subtly wrong "original" digitals. It's particularly problematic because of the nature of the degradation: the usual lossy degradation of images is non-semantic and just produces blurring, blocking, or other kinds of artifacts, not OCR-error-style mistakes.

    The issue here seems to be the lossy mode of JBIG2 [wikipedia.org], which tries to find patches of the image that approximately match, and consolidates them. The idea seems to be that if the letter "e" appears 5000 times in a document in the same typeface, you just store some version of it once, and then reference it everywhere it appears. But now you get OCR-style errors, if you end up matching some patches to incorrect partners. You have your lightly printed "8" replaced by the "0" patch now and then, that kind of thing. And unlike people doing OCR, who know they need to take this into account, the operators of these machines likely had no idea this was even a possible failure mode to watch for, so who knows how many numbers are wrong in miscellaneous documents (letters are a little less problematic, because most random letter mutations don't destroy meaning).

    Blargh.
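
The patch-matching failure described above can be sketched in a few lines of Python. This is a toy model, not the JBIG2 codec and not whatever Xerox ships: the 3x5 glyphs, the `hamming`, `compress`, and `decompress` helpers, and the `max_diff` threshold are all made up for illustration. The only point is that once "close enough" patches share one dictionary entry, a 6 can come back out as an 8.

```python
# Toy 3x5 bitmaps; '#' is ink, '.' is paper. Hypothetical glyphs for illustration.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
ONE = ("..#", "..#", "..#", "..#", "..#")


def hamming(a, b):
    """Count the pixels that differ between two same-sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs, max_diff):
    """Store each distinct patch once; reuse an entry if it is within max_diff pixels."""
    dictionary, refs = [], []
    for glyph in glyphs:
        for index, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_diff:
                refs.append(index)  # "close enough": reference the stored patch
                break
        else:
            dictionary.append(glyph)  # no match: add a new dictionary entry
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decompress(dictionary, refs):
    """Rebuild the page by pasting the referenced dictionary patches."""
    return [dictionary[i] for i in refs]


page = [EIGHT, SIX, ONE, EIGHT]
exact = decompress(*compress(page, max_diff=0))  # exact matching only
lossy = decompress(*compress(page, max_diff=1))  # tolerate one wrong pixel

print(exact == page)      # exact matching reproduces the page
print(lossy[1] == EIGHT)  # the 6 was silently replaced by the stored 8
```

With `max_diff=0` only identical patches are merged and the page round-trips; with a tolerance of a single pixel the 6 is within range of the stored 8 and gets replaced by a clean-looking copy of it, which is why the result looks like a sharp, plausible digit rather than a blur.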

    • by iguana (8083) * <davepNO@SPAMextendsys.com> on Tuesday August 06, 2013 @09:08AM (#44485317) Homepage Journal

      Could also be a problem with an overly aggressive hole filling algorithm. http://www.mathworks.com/help/images/ref/imfill.html [mathworks.com]

      I'd expect there's nothing nefarious going on. It's very likely an overly aggressive image processing algorithm.

      • by Anonymous Coward on Tuesday August 06, 2013 @09:32AM (#44485529)

        While it isn't nefarious so far as a deliberate plot to destroy documents and their integrity, it is a bug that is of concern for those who want to preserve documents for long-term storage in an archival situation.... such as was the case with the architectural documents being scanned.

        Keep in mind that in some archival situations the original paper documents are destroyed, so the scanned versions in these files are all that remains of those documents. Ultimately, having the numbers change like this, regardless of why it is happening, throws serious doubt on the validity of any of the numbers in that document. This can have an enormous set of consequences if you are using this scanned document as a receipt, for banking purposes (e.g. the check amount might have a different number than was originally used), or in other similar situations. Engineering offices, banks, and a great many other businesses are shredding mountains of paper and archiving those documents electronically, so it is a big deal.

        I guess it really boils down to understanding the limitations of compression algorithms, and not buying into a vendor's hype that you can save all kinds of storage space with this incredible algorithm.... only to find out that all of your documents are worthless when you try to submit them to a judge & jury in a lawsuit as evidence. Perhaps an engineer needs to find the dimensions and tolerance limits of a bolt in an obscure subsystem... and the numbers change? Do you really want to fly in an airplane where the parts specifications have changed because of an error like this? Do you mind if a few hundred or even thousand dollars are taken out of your bank account that you didn't authorize?

      • by Hatta (162192) on Tuesday August 06, 2013 @09:33AM (#44485543) Journal

        That's what she said.

      • by Agent0013 (828350)
        Can't be a hole filling algorithm. The 8 that replaces the 6 still has the little dent on the left between the two round parts. It isn't just filling in the 6 to make an 8, it is actually replacing the 6 with a copy of the 8 from elsewhere on the page.
    • Re: (Score:2, Insightful)

      by sh00z (206503)

      The issue here seems to be the lossy mode of JBIG2

      combined with the fact that he's complaining about errors in scans of a 7-point font. At that size, it probably only takes two erroneous pixels to change a 6 to an 8.

      • by Trepidity (597) <delirium-slashdo ... g ['ish' in gap]> on Tuesday August 06, 2013 @09:23AM (#44485449)

        Ran some numbers to check, and with some assumptions your estimate seems pretty close.

        The modern standard "postscript point" is 1/72 in, so a 7-point font has a height 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of the height, so let's say ~0.025 inches. If the print/scan cycle roundtrips at somewhere in the range 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.
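
The back-of-the-envelope numbers above can be reproduced with a short Python sketch. The 1/4 stroke fraction is the comment's own guess, and `stroke_pixels` is just a hypothetical helper name; nothing here is measured from a real scan.

```python
POINTS_PER_INCH = 72  # the "postscript point" is 1/72 inch


def stroke_pixels(font_points, stroke_fraction, dpi):
    """Pixels spanned by a stroke that is stroke_fraction of the font height."""
    stroke_inches = font_points / POINTS_PER_INCH * stroke_fraction
    return stroke_inches * dpi


# 7-point font, distinguishing stroke assumed to be 1/4 of the height.
for dpi in (75, 150, 300, 600):
    print(dpi, round(stroke_pixels(7, 0.25, dpi), 1))
```

Under those assumptions the stroke is about 0.024 inches, which works out to roughly 1.8 pixels at 75 dpi, 3.6 at 150 dpi, and 7.3 at 300 dpi, in line with the 2-4 and 7-8 pixel estimates.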

        • by dj245 (732906) on Tuesday August 06, 2013 @09:35AM (#44485563) Homepage

          Ran some numbers to check, and with some assumptions your estimate seems pretty close.

          The modern standard "postscript point" is 1/72 in, so a 7-point font has a height 7/72 inches. The stroke distinguishing the 6 from the 8 is maybe 1/4 of the height, so let's say ~0.025 inches. If the print/scan cycle roundtrips at somewhere in the range 75-150 dpi, that's 2-4 pixels. If you can manage a professional-standard 300 dpi, you get more like 7-8 pixels, but that's a fairly optimistic case.

          Why wouldn't you use at least 300dpi?

          Most "serious" office printers print at 600dpi or better, so the information is there. Even my $100 Brother laser printer defaults to 600dpi. Every recent office multifunction I have seen can scan at 200, 300, or 600dpi, but every single one defaults to 200dpi. 200dpi scans are hard on the eyes. I always scan at 600dpi; the file size isn't bad in the age of 300GB laptop hard drives, and if I need to send it to someone external to the company, I can always reduce the size.

          • by N1AK (864906)

            I always scan at 600dpi, the file size isn't bad in the age of 300GB laptop hard drives, and if I need to send it to someone external to the company, I can always reduce the size.

            In which case it raises the question: why bother using an algorithm that substitutes content to save space if space isn't an issue regardless of what DPI you use? Clearly space saving was a consideration for someone ;)

      • by v1 (525388)

        I think the problem isn't so much the problem recognition, but the reproduction. It may be looking at two numbers that both look about the same, and using the same compressed data to draw both of them back out. Making them look identical. So if you started with two numbers, say one that was 70% like a 6 and 30% like a 8, and another that was 40% like a 6 and 60% like an 8, it's deciding they're "close enough" and is drawing the 70/30 image in both places. A human could figure out the second one was supp

    • by N1AK (864906)
      I have to admit I'm actually really surprised by this. The idea and technology are good but I would think it fundamentally breaks a key feature of digitising a document: removing the need to keep the hard copy. The moment the digitised copy is more than an electronic representation of the physical document then the authenticity of anything in the digitised document is in doubt. Can it really be used to prove what someone read and signed for example, even if the chance of an error in any case is 1/10,000?
    • Thanks for the quick explanation. This is kind of hilariously unfortunate, since it has the potential to undermine the reliability of lots of documents.
  • JBIG2 (Score:5, Insightful)

    by Anonymous Coward on Tuesday August 06, 2013 @08:54AM (#44485217)

    Caused by misconfigured JBIG2 compression. When pixel error rate is low enough, similar looking features get printed with the same subimage.

  • by Anonymous Coward on Tuesday August 06, 2013 @09:05AM (#44485283)

    Before anyone spreads wrong information: The problem is with the JBIG2 image compression algorithm used when scanning to PDF format. OCR has nothing to do with this. Also, TIFF format images are not affected as they don't use JBIG2.

  • by mejustme (900516) on Tuesday August 06, 2013 @09:08AM (#44485311)

    Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

    Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf [unc.edu]

    • Re: (Score:2, Insightful)

      by Racemaniac (1099281)

      thanks for mentioning where in the 328 page document you linked that is written :)

    • Re: (Score:3, Informative)

      by Anonymous Coward

      Interesting, since as far as I remember from reading about this issue yesterday, Xerox had not yet responded to this issue. Strange, since it's in the documentation.

      But then, reading the manual in context, the quote appears on pages 107, 129, and 179, which are in the chapters "Fax", "Workflow Scanning", and "Save and Reprint Jobs" respectively.

      It's not in the chapter "Copying" (pages 39..63), so there is no excuse that this issue occurs in simple copy mode.

    • by timeOday (582209)
      Seriously, how did you happen to know about that?
    • Quote: "Normal/Small produces small files by using advanced compression techniques. Image quality is acceptable but some quality degradation and character substitution errors may occur with some originals"

      Source: http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf [unc.edu]

      Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.

      • by Rob the Bold (788862) on Tuesday August 06, 2013 @09:55AM (#44485713)

        Very interesting find, although that warning only appears in the "Fax" section of the manual, and not in the "Copy" or "Workflow Scanning" sections.

        AND I'd be wrong, it's in all three sections. Ctrl-F'ing in Ocular only finds "character substitution" when the words are side-by-side, not split by a line break as they appear in the copying and scanning sections.

        That's way worse. Xerox knows about this, and just puts in a little note, rather than a big old: "WARNING: Normal/Small mode may produce undetectable text errors."

        And that type of warning should be defined in the beginning of the manual as "operations that may cause data transcription errors resulting in financial harm, damage to property, injury or death".
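
The Ctrl-F miss described above (a phrase split across a line break) is easy to reproduce and to work around if you have the extracted text. A minimal Python sketch, assuming the manual text has already been pulled out of the PDF; `find_phrase` is a hypothetical helper, and the excerpt is the warning as it is line-wrapped elsewhere in this thread.

```python
import re


def find_phrase(text, phrase):
    """Case-insensitive search that treats any whitespace run, including line breaks, as one space."""
    def normalize(s):
        return re.sub(r"\s+", " ", s).strip().lower()
    return normalize(phrase) in normalize(text)


manual_excerpt = (
    "Image quality is acceptable but some quality degradation and character\n"
    "substitution errors may occur with some originals."
)

naive_hit = "character substitution" in manual_excerpt  # misses: newline between the words
robust_hit = find_phrase(manual_excerpt, "character substitution")
print(naive_hit, robust_hit)
```

This only handles whitespace; a word hyphenated across a line break would still slip through.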

    • by Atzanteol (99067) on Tuesday August 06, 2013 @09:41AM (#44485593) Homepage

      That's "Normal" quality? That could be *very* misleading. If you have an option that has negative side-effects such as this then the option should be titled something to indicate the risk - "Super-compressed", "dangerously small" or the like.

      Though I'm surprised Xerox would even allow such a compression if such an obvious issue occurs. People would expect image quality to suffer - but full character substitution?

    • The problem is that most people only read the manual when they discover something is wrong and there is no immediately obvious problem with the results of these scans. The problem only gets noticed much later when someone tries to work with the scanned information and discovers that it is readable but doesn't make sense.

      I also notice that the manual says that the other options give larger files with better image quality but does not state clearly whether compression algorithms that can cause character subst

  • How could Xerox make copiers for this length of time and not have a proofreading algorithm that works with a super-resolution scan & no interpolation to "machine check" the final commercial copier as a way of quickly finding errors?

    Internatlly, Xerox engineering had to know they were "correcting" pixels, rather than just "copying" them, so how did they verify their software?

    • by Fnord666 (889225)

      How could Xerox make copiers for this length of time and not have a proofreading algorithm that works with a super-resolution scan & no interpolation to "machine check" the final commercial copier as a way of quickly finding errors?

      Internatlly(sic), Xerox engineering had to know they were "correcting" pixels, rather than just "copying" them, so how did they verify their software?

      They do know [slashdot.org] about it.

  • Free Speech (Score:5, Funny)

    by BradyB (52090) on Tuesday August 06, 2013 @09:11AM (#44485341) Homepage

    Hey, even photo copiers and faxes need freedom of speech.

  • by Anonymous Coward on Tuesday August 06, 2013 @09:13AM (#44485351)

    If you read the documentation from XEROX... it claims that on scanning it is a known problem that "Image quality is
    acceptable but some quality degradation and character substitution errors may occur with some
    originals." page 107 from http://www.cs.unc.edu/cms/help/help-articles/files/xerox-copier-user-guide.pdf

    also on page 129 we have the following: "Quality / File Size
    The Quality / File Size settings allow you to choose
    between scan image quality and file size. These settings
    allow you to deliver the highest quality or make smaller
    files. A small file size delivers slightly reduced image quality
    but is better when sharing the file over a network. A larger
    file size delivers improved image quality but requires more
    time when transmitting over the network. The options are:
      Normal/Small produces small files by using advanced
    compression techniques. Image quality is acceptable but some quality degradation and character
    substitution errors may occur with some originals."

  • They probably have some parts made of wub fur [wikipedia.org]. Those machines are more advanced than I thought!
  • I just spent ten minutes describing exactly how JBIG works here before noticing someone already realised what is happening and put it up on the page.

  • ImageRunner (Score:4, Funny)

    by poofmeisterp (650750) on Tuesday August 06, 2013 @09:26AM (#44485483) Journal

    OMG, my Canon ImageRunners are doing the same thing! It must be a virus!

    I'd better write up a research document on this and request some grant money.

  • Interesting (Score:4, Interesting)

    by jones_supa (887896) on Tuesday August 06, 2013 @09:31AM (#44485527)
    The things you learn. I never knew before about JBIG2 and how scanners use it to repeat pieces of image. Seems to me that the JBIG2 parameters are tuned incorrectly in these scanners.
  • by Dunbal (464142) * on Tuesday August 06, 2013 @09:33AM (#44485537)
    This was a decision by Xerox to get around ever being sued for copyright violations...
  • NSA BUG (Score:2, Funny)

    by Sentrion (964745)

    It's just a bug in the NSA eavesdropping algorithm.

  • by joh (27088) on Tuesday August 06, 2013 @09:46AM (#44485639)

    how a compression that may lead to documents altered in such a way (numbers replaced by other numbers) can be considered fit for use in a photocopier. This can lead to very real, expensive and even dangerous problems down the line.

  • by ZorinLynx (31751) on Tuesday August 06, 2013 @10:22AM (#44485959) Homepage

    Why do we need such aggressive compression algorithms, algorithms that can make the data WRONG, in this day and age when storage and memory are so incredibly cheap?

    This is not 1987 when every byte was precious and 1MB of RAM cost a hundred bucks. There is NO EXCUSE for this these days; just use PNG or JPG compression; at least those don't freaking CHANGE THE DATA!!

  • by JeanCroix (99825) on Tuesday August 06, 2013 @10:23AM (#44485983) Journal

    I printed out the article in order to hang it on the wall above my office's Workcentre as a warning to coworkers. But apparently printing it fixed the problem, because the article headline became:

    "Xerox scanners/photocopiers Scan Documents Flawlessly and are the Best in the Industry"

  • This is HUGE! (Score:5, Interesting)

    by tekrat (242117) on Tuesday August 06, 2013 @11:15AM (#44486563) Homepage Journal

    This is how people get shot, because the police are given the wrong address to raid a house. This is how people get foreclosed on because a few account numbers are switched.

    Holy crap. That makes me never want to go near a copier again.

  • by nanospook (521118) on Tuesday August 06, 2013 @03:18PM (#44489905)
    Something like this shouldn't have passed QA.. did we outsource or what?
