Security

Academic Journal Claims It Fingerprints PDFs For 'Ransomware,' Not Surveillance (vice.com) 70

An anonymous reader quotes a report from Motherboard: One of the world's largest publishers of academic papers said it adds a unique fingerprint to every PDF users download in an attempt to prevent ransomware, not to prevent piracy. Elsevier defended the practice after an independent researcher discovered the existence of the unique fingerprints and shared their findings on Twitter last week. "The identifier in the PDF helps to prevent cybersecurity risks to our systems and to those of our customers -- there is no metadata, PII [Personal Identifying Information] or personal data captured by these," an Elsevier spokesperson said in an email to Motherboard. "Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers for them to act upon. This approach is commonly used across the academic publishing industry."

When asked what risks he was referring to, the spokesperson sent a list of links to news articles about ransomware. However, Elsevier has a long history of pursuing people who pirate or share its paywalled academic articles. [...] It's unclear exactly how fingerprinting every PDF downloaded could actually prevent ransomware. Jonny Saunders, a neuroscience PhD candidate at University of Oregon, who discovered the practice, said he believes Elsevier is trying to surveil its users and prevent people from sharing research without paying the company.
"The subtext there is pretty loud to me," Saunders told Motherboard in an online chat. "Those breaches/ransoms are really a pretext for saying 'universities need to lock down accounts so people can't skim PDFs.' When you have stuff that you don't want other people to give away for free, you want some way of finding out who is giving it away, right?"

"Saying that the unique identifiers *themselves* don't contain PII is a semantic dodge: the way identifiers like these work is to be able to match them later with other identifying information stored at the time of download like browser fingerprint, institutional credentials, etc," Saunders added. "Justifying them as a tool to protect against ransomware is a straightforward admission that these codes are intended to identify the downloader: how would they help if not by identifying the compromised account or system?"

  • But yeah, this representative of the University is lying. Poorly I might add.
    • by Anonymous Coward on Monday January 31, 2022 @11:13PM (#62225921)
      Elsevier is not a university, they're a scumbag publisher trying to lock up taxpayer-funded knowledge and studies as if it were their own intellectual property.
      • You may not like the way they have a pay-for-access model, but that's their business, not yours. It's very often the case that open access is a matter of who pays, not freedom from payment. So for example if you publish in Chem-comm then either you can pay the page charges yourself to make it open access or it can be closed to those that subscribe to chem-comm. So given their entire cost structure depends on getting paid, they should be able to take action against people who replicate copyrighted works. Fi

      • by Anonymous Coward

        Elsevier is not a university, they're a scumbag publisher trying to lock up taxpayer-funded knowledge and studies as if it were their own intellectual property.

        Well, thank goodness you clarified the difference...'cause you know universities...I mean they certainly don't represent part of a multi-trillion dollar business of extorting the populace for almost-worthless information to create an addiction to "higher" education, right?

        Somebody pass that student a study guide. And the bong.

    • "Elsevier spokesperson"

      It was found by a person at a university, and they agreed with you... That the explanation was a flimsy pretext to prevent sharing of downloaded copies.

      I don't understand why they haven't just DRM'd it somehow. Force you to log in each time you view it. Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.

      • by Anonymous Coward on Monday January 31, 2022 @11:46PM (#62225979)

        I don't understand why they haven't just DRM'd it somehow. Force you to log in each time you view it. Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.

        Where is the profit from pro-actively preventing people from viewing the material?

        Take it from Microsoft and the BSA: it's far better to let people pirate your warez at will. Then you can sue them for uppity "damages," greatly exceeding what people would have decided to pay if they had been forced to pay before they could use it.

        • by Joce640k ( 829181 ) on Tuesday February 01, 2022 @04:19AM (#62226261) Homepage

          Where is the profit from pro-actively preventing people from viewing the material?

          They can trick some people into paying to view it.

          Take it from Microsoft and the BSA: it's far better to let people pirate your warez at will. Then you can sue them for uppity "damages," greatly exceeding what people would have decided to pay if they had been forced to pay before they could use it.

          That's exactly what they're planning. This "waterprinting" will just make their case easier by being able to show who shared it. People are more likely to pay up to avoid the embarrassment of going to court.

          • by bagofbeans ( 567926 ) on Tuesday February 01, 2022 @10:34AM (#62226919)

            ...some customer in California can demand a dump of their data from Elsevier under the CCPA (California Consumer Privacy Act) and see what the dump has on PDF records. If Elsevier don't show anything, then they become liable under CCPA should they attempt to link the embedded codes.

        • by necro81 ( 917438 )

          Take it from Microsoft and the BSA: it's far better to let people pirate your warez at will. Then you can sue them for uppity "damages,"

          I didn't realize the Boy Scouts of America was that keen on making sure every copy of a merit badge pamphlet was genuine. Given the battered copies I used back in the day, those booklets couldn't have been much of a revenue stream!

      • Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.

        Then you view it with Apple Preview instead. As a bonus, Preview doesn't have to be "updated" every time you use it.

    • by hey! ( 33014 ) on Monday January 31, 2022 @11:50PM (#62225983) Homepage Journal

      Yes Elsevier is lying. Personally, I think if you don't like Elsevier's use policies you shouldn't do business with them, but they should simply tell their customers that they're fingerprinting documents to catch them if they're the source of pirated documents that end up on the Internet.

      Lying shows disrespect to its customers, which I suppose is another reason not to do business with them if you can avoid it.

    • Seems obvious that they want to go after people uploading to scihub.

      • This. Elsevier is in the business of selling access to documents. They don't want them shared by others. OTOH, there are some actual "open access" documents published by Elsevier, such as papers produced by federal government employees on work time which are legally in the public domain; if those are given the same "fingerprint" and tracking treatment, I'd have a problem with it. Finally, PDFs can be read by a variety of software, not just Adobe; might be interesting to know how, for instance, the Linux doc

  • Elsevier ransoms (Score:5, Insightful)

    by awwshit ( 6214476 ) on Monday January 31, 2022 @10:38PM (#62225849)

    Right, exactly, so Elsevier know who to deploy their ransoms against. Elsevier has admitted that its business product is really a form of ransomware. Pay Elsevier the ransom for their wares or meet their lawyers.

    • Wow, that's an amazing spin on "selling stuff"
      • Re:Elsevier ransoms (Score:4, Interesting)

        by ls671 ( 1122017 ) on Tuesday February 01, 2022 @03:42AM (#62226233) Homepage

        1) Download PDF
        2) Print PDF using "print as PDF"
        3) Distribute new PDF
        4) Profit!

        Their scheme seems silly for anybody wanting to circumvent it. Maybe that's why they don't want people to know about it but I guess now the cat is out of the bag.

        • First you have to know that the thing is watermarked and all that stuff is necessary.

          Even then it might not get rid of it. Postscript is a wonderfully convoluted thing.

          • Spy agencies know that people, e.g. journalists, will go to great lengths to remove any digital traces from documents. They also included uniquely identifiable errors/differences in distributed documents so that they can be identified even when they've been 'laundered.' I wonder how long it'll take Elsevier to start doing that too.
            • Magic pixels, yes. Otherwise, you could print it then scan it on somebody else's machine, or take a picture with your phone, and use essentially the analog hole to wipe out any hidden trackers. But magic pixels exist that can make a watermark survive that process.

          • When the MPAA distributes screening copies to its members during awards season (or at least when they used to, they seem to be moving away from sending physical copies), the copies each had a "For Your Consideration..." subtitle overlaid at random intervals. As I understood it, the exact frame at which each overlay started varied from disk to disk, which is how they did their watermarking. Ripping a copy and sending a low-res copy to your friends as an .mp4 or something didn't change the watermarking one
          • by ceoyoyo ( 59147 )

            True, but Elsevier is also wonderfully incompetent.

      • How else does marking documents for individuals help against ransomware?

  • Order .pdf copies of their standards and you will find your name and email address imprinted at the bottom of each page of the standards.
    • This sounds like something less obvious than that.

      I gather that it's a unique and invisible watermark that you don't see when viewing the file but that can be read by someone who knows where to look for it.

      Digital movies in commercial movie theatres have something similar. Each projector embeds a watermark into the video and audio stream that the audience can't see or hear but if someone records the movie off of the screen the movie company can tell what theatre it came from and what date and time it was r

      • by CaptQuark ( 2706165 ) on Tuesday February 01, 2022 @02:07AM (#62226141)

        They wouldn't need to do anything so surreptitious. There are plenty of advanced metadata fields that are not obvious unless you search the properties of the document, including one named Custom that lets you add any data you wish to the metadata.

        I'm assuming the tracking number was easy to spot when comparing the same article downloaded by two different people. For example, here is the metadata from a PDF on CAD/CAM history that I recently downloaded:


        [x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:08:04 "]
              [rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"]
                    [rdf:Description rdf:about=""
                                xmlns:dc="http://purl.org/dc/elements/1.1/"]
                          [dc:format]application/pdf[/dc:format]
                          [dc:creator]
                                [rdf:Seq]
                                      [rdf:li]Springer-SBM[/rdf:li]
                                [/rdf:Seq]
                          [/dc:creator]
                          [dc:title]
                                [rdf:Alt]
                                      [rdf:li xml:lang="x-default"]sv-lncs[/rdf:li]
                                [/rdf:Alt]
                          [/dc:title]
                    [/rdf:Description]
                    [rdf:Description rdf:about=""
                                xmlns:xmp="http://ns.adobe.com/xap/1.0/"]
                          [xmp:CreateDate]2012-07-21T17:49:59+10:00[/xmp:CreateDate]
                          [xmp:CreatorTool]Microsoft Word 2010[/xmp:CreatorTool]
                          [xmp:ModifyDate]2012-07-22T13:57:45+10:00[/xmp:ModifyDate]
                          [xmp:MetadataDate]2012-07-22T13:57:45+10:00[/xmp:MetadataDate]
                    [/rdf:Description]
                    [rdf:Description rdf:about=""
                                xmlns:pdf="http://ns.adobe.com/pdf/1.3/"]
                          [pdf:Producer]Microsoft Word 2010[/pdf:Producer]
                    [/rdf:Description]
                    [rdf:Description rdf:about=""
                                xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"]
                          [xmpMM:DocumentID]uuid:bcec36f1-b156-44b5-9a90-d4de40f60b45[/xmpMM:DocumentID]
                          [xmpMM:InstanceID]uuid:81e81de0-09cd-42de-807f-a34af2833126[/xmpMM:InstanceID]
                    [/rdf:Description]
              [/rdf:RDF]
        [/x:xmpmeta]

        Those last two fields, DocumentID and InstanceID would be very easy to generate during the download and tie to a specific user account. Unless two different people download the same PDF and compare the files closely, it isn't something the average person is going to associate with a tracking system.

        --
        (Bent brackets replaced with square brackets)
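The comparison described above can be sketched in a few lines of Python. This is only an illustration, not a description of how Elsevier's identifier is actually stored: it assumes the XMP packet sits uncompressed in the file and uses the element form shown in the dump (with real angle brackets), and the function names `extract_xmp_ids` and `differing_ids` are made up for this example. XMP can also carry these values as attributes or inside compressed streams, which this sketch would miss.

```python
import re

# Matches <xmpMM:DocumentID>...</xmpMM:DocumentID> and the InstanceID equivalent.
# Assumption: the XMP packet is stored as plain, uncompressed text in the PDF.
_ID_PATTERN = re.compile(
    rb"<xmpMM:(DocumentID|InstanceID)>\s*([^<]+?)\s*</xmpMM:\1>"
)


def extract_xmp_ids(pdf_bytes: bytes) -> dict:
    """Return the xmpMM DocumentID/InstanceID values found in the raw bytes."""
    return {
        match.group(1).decode("ascii"): match.group(2).decode("utf-8", "replace")
        for match in _ID_PATTERN.finditer(pdf_bytes)
    }


def differing_ids(copy_a: bytes, copy_b: bytes) -> dict:
    """Map each ID field whose value differs between two copies to both values."""
    ids_a, ids_b = extract_xmp_ids(copy_a), extract_xmp_ids(copy_b)
    return {
        key: (ids_a.get(key), ids_b.get(key))
        for key in sorted(set(ids_a) | set(ids_b))
        if ids_a.get(key) != ids_b.get(key)
    }
```

To try it, read two downloads of the same article in binary mode (for example `open(path, "rb").read()`) and pass both to `differing_ids`. A field that changes from one download to the next is a candidate tracking ID, although a differing InstanceID on its own can also be the result of ordinary re-saving.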

    • I've had the same with NXP but the difference is that the NXP documents contained their intellectual property and we signed an NDA. That makes total sense. Elsevier doesn't own the IP and in fact much of the research was taxpayer funded.
      • by ceoyoyo ( 59147 )

        Elsevier owns the copyright for the specific layout. Everything else is owned by the authors, at least with all the Elsevier journals I've ever submitted to. Authors are welcome to do whatever they want with their non-Elsevier formatted documents, and Elsevier is actually reasonably open about that.

        Actually, Elsevier's "Green" open access option, which is the free one that all their journals offer, says you can "post a version of your article in a repository after an embargo so others can access it freely"

  • Wrong (Score:4, Insightful)

    by xalqor ( 6762950 ) on Monday January 31, 2022 @10:54PM (#62225891)

    The identifier in the PDF helps to prevent cybersecurity risks to our systems and to those of our customers

    Don't believe it. An identifier on a document is completely ineffective at preventing anything, except the problem of not having an identifier on a document...

  • It's for the children
    2 weeks to flatten the curve
    Check is in the mail
    The dog ate my homework
    I have read the terms and conditions. ...

  • Identify without PII (Score:5, Interesting)

    by GigaplexNZ ( 1233886 ) on Monday January 31, 2022 @11:28PM (#62225947)

    "there is no metadata, PII [Personal Identifying Information] or personal data captured by these,"

    Cool.

    "Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers"

    ...I thought there was no PII to be able to identify the customers that need notifying.

    • by tlhIngan ( 30335 )

      "there is no metadata, PII [Personal Identifying Information] or personal data captured by these,"

      Cool.

      "Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers"

      ...I thought there was no PII to be able to identify the customers that need notifying.

      You can do both.

      You can fingerprint something by assigning it a serial number. The serial number is just an arbitrary number - it could be the total number of downloaded files, so having your document stamped #56828

      • I don't think the parent was questioning how this could technically be done, he was highlighting that the two statements are contradictory.

        The way I read between the lines here is that the evil vendor is merely reminding its customers that when they say fingerprinting "allows us to identify potential sources of threats", the threat they are referring to is the threat to their business of customers making their own copies, and when they say their purpose is to "inform our customers" they mean they are absolute

  • by Cochonou ( 576531 ) on Tuesday February 01, 2022 @12:39AM (#62226051) Homepage
    On IEEE articles there has been for a decade a footnote on the downloaded PDFs: Authorized licensed use limited to: XXX. Downloaded on YYY at ZZZ UTC from IEEE Xplore. Restrictions apply.
    I think the message is pretty clear and nobody says this is to prevent ransomware. At any rate, it's probably better than a stealthy fingerprint.
  • Hypothetically, would 'print to pdf' scrub this fingerprint?
    • by rkinch ( 608630 ) <kinch@truetex.com> on Tuesday February 01, 2022 @01:10AM (#62226097) Homepage

      Hypothetically, would 'print to pdf' scrub this fingerprint?

      No, not necessarily, depending on how it's implemented.

      For example, a unique ID could be coded by simply scrambling the order of appearance of a series of elements necessary to the document. There are limitless ways to hide information inside PDFs, and no filter could possibly detect them all.
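A minimal sketch of the ordering trick mentioned above, assuming the "elements" are any set of distinct, sortable items whose order does not affect how the document renders. The encoding used here (a factorial-base, or Lehmer-code, permutation index) is one possible choice picked for illustration; nothing in the thread says which scheme, if any, a publisher uses, and the function names are hypothetical.

```python
from math import factorial


def encode_id_as_order(elements, ident):
    """Reorder distinct elements so that their order encodes ident (0 <= ident < n!)."""
    pool = sorted(elements)  # canonical order that encoder and decoder agree on
    if not 0 <= ident < factorial(len(pool)):
        raise ValueError("ident does not fit in the available permutations")
    ordered = []
    for remaining in range(len(pool), 0, -1):
        # Pick the next element using one factorial-base digit of ident.
        index, ident = divmod(ident, factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def decode_id_from_order(ordered):
    """Recover the integer hidden in the order of the elements."""
    pool = sorted(ordered)
    ident = 0
    for position, item in enumerate(ordered):
        index = pool.index(item)
        ident += index * factorial(len(ordered) - 1 - position)
        pool.pop(index)
    return ident
```

With 20 reorderable elements there are 20! (about 2.4 × 10^18) distinct orderings, enough to give every download its own ID without adding a single byte of visible metadata, which is why stripping metadata fields alone would not remove such a mark.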

      • There are limitless ways to hide information inside PDFs, and no filter could possibly detect them all.

        Or you could just compare the same journal from three different sources. That could easily tell you if there is something fishy going on with that document.
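As a rough sketch of that comparison, the snippet below hashes each copy and then lists the chunks that differ between two of them. It assumes the copies are already in memory as bytes and splits them on line breaks, which only localizes a difference when it sits in an uncompressed part of the file; the helper names are invented for this example.

```python
import hashlib
from difflib import SequenceMatcher


def digests(copies: dict) -> dict:
    """Map each source name to the SHA-256 digest of its copy."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}


def differing_chunks(copy_a: bytes, copy_b: bytes) -> list:
    """Return (chunk_from_a, chunk_from_b) pairs for every region that differs."""
    lines_a = copy_a.splitlines(keepends=True)
    lines_b = copy_b.splitlines(keepends=True)
    matcher = SequenceMatcher(None, lines_a, lines_b, autojunk=False)
    return [
        (b"".join(lines_a[i1:i2]), b"".join(lines_b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

If all three digests differ for what should be the same article, something varies per download. Not every difference is a tracker, though: a legitimate download timestamp would show up the same way, so the differing chunks still need to be read by a person.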

      • You can run a script that converts the PDF to images, then runs OCR to turn them back into a PDF. I process some PDFs with aggregated medical information (a list of lab results for N people), and automate the process of sending each person a black-marker censored PDF with only their info. I do it with a bash script calling convert (imagemagick) for the conversion and black-marker modifications, then tesseract-ocr, then pdfunite (poppler) to build back the multipage document.

        Conversion to png is full guarantee I did not miss metadata, h

    • This particular mark? Yes. Actually, running minuimus.pl with --discard-meta would probably do it, though I'd want to check.

      But now this mark has been caught, I expect a new and stealthier one is in the works. There are so very many places you can hide information within a PDF file quite easily.

  • And lying badly. Also not a surprise. The time for scientific "publishers" is over. They offer nothing of value. Printing journals is past and the real value is added by the reviewers. These do not get paid though, so they can review on a public-access server just as well. Publishing via scientific publishers means you get less exposure, less reads and you often have to pay to get published in addition.

  • by buck-yar ( 164658 )

    If they want to surveil the studies I read, I'm ok with that. I encourage it, they'll learn a whole bunch.

  • Does running a PDF through a PDF printer strip this kind of data from the document?

    • Depends on how the "protection" is implemented and how the virtual printer is implemented. But there would most likely be visual differences. Ohh, and how well the PDF complies with the crazy Adobe specification. I have worked with PDFs for a while and only a few times have I seen a proper implementation of PDF structures. Such artifacts can be used as identifiers to tie the downloader's account to the "stolen" knowledge item, the PDF.
  • they could have waited and made the announcement on Aaron Swartz day.
  • The fact that it is often used as a Pry, Lever, Scraper, Probe, Hole Puncher, Awl... Never mind that reinforced handle that can take hammer blows like a Chisel, is not relevant to the fact that its primary job is to tighten and loosen screws.

  • Elsevier is a parasitic leech that has attached itself to science, and needs to be destroyed. I actively try to avoid publishing my work in Elsevier journals (or Springer, Nature etc.). Unfortunately, this cancer has infested many formerly great journals.

    • True, so true. Science should be free. Elsevier's "protection" is like they sneak into the Gutenberg press and throw coffee dregs into the machine. Not gonna work, just keep your tail between your legs and run, your time has ended.
  • Here's the problem. In days of yore, scientific papers were written out by hand. Someone had to type them in, including special formulas and notations that only specifically trained and expensive technical typists were able to do accurately. Draftsmen had to draw any required illustrations. Copies had to be printed on expensive machines, distributed for review, amended, corrected, and finally printed. This was a very expensive process, with very small press runs, and journal publishers charged a lot,

  • PDF to Word and print a new PDF?

    • Probably, but not necessarily. Hidden info could still be in image objects, which might get just copied as is into Word. This needs some testing. Not made easier by MS Word cutting some corners against Adobe specification, which is understandable, as the spec is crazy. And Word method would not be scalable. And elseleeches might put the "protection" only on suspect downloaders, not every copy.
