Academic Journal Claims It Fingerprints PDFs For 'Ransomware,' Not Surveillance (vice.com) 70
An anonymous reader quotes a report from Motherboard: One of the world's largest publishers of academic papers said it adds a unique fingerprint to every PDF users download in an attempt to prevent ransomware, not to prevent piracy. Elsevier defended the practice after an independent researcher discovered the existence of the unique fingerprints and shared their findings on Twitter last week. "The identifier in the PDF helps to prevent cybersecurity risks to our systems and to those of our customers -- there is no metadata, PII [Personal Identifying Information] or personal data captured by these," an Elsevier spokesperson said in an email to Motherboard. "Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers for them to act upon. This approach is commonly used across the academic publishing industry."
When asked what risks he was referring to, the spokesperson sent a list of links to news articles about ransomware. However, Elsevier has a long history of pursuing people who pirate or share its paywalled academic articles. [...] It's unclear exactly how fingerprinting every PDF downloaded could actually prevent ransomware. Jonny Saunders, a neuroscience PhD candidate at University of Oregon, who discovered the practice, said he believes Elsevier is trying to surveil its users and prevent people from sharing research without paying the company. "The subtext there is pretty loud to me," Saunders told Motherboard in an online chat. "Those breaches/ransoms are really a pretext for saying 'universities need to lock down accounts so people can't skim PDFs. When you have stuff that you don't want other people to give away for free, you want some way of finding out who is giving it away, right?"
"Saying that the unique identifiers *themselves* don't contain PII is a semantic dodge: the way identifiers like these work is to be able to match them later with other identifying information stored at the time of download like browser fingerprint, institutional credentials, etc," Saunders added. "Justifying them as a tool to protect against ransomware is a straightforward admission that these codes are intended to identify the downloader: how would they help if not by identifying the compromised account or system?"
I try not to assume the worst of people (Score:2)
Re:I try not to assume the worst of people (Score:5, Informative)
Yeah they lied, but fingerprinting is fine (Score:2)
You may not like the way they have a pay for access model but that's their business, not yours. It's very often the case that open access is a matter of who pays, not freedom from payment. So for example if you publish in Chem-comm then either you can pay the page charges yourself to make it open access or it can be closed to those that subscribe to chem-comm. So given that their entire cost structure depends on getting paid, they should be able to take action against people who replicate copyrighted works. Fi
Re: (Score:1)
Elsevier is not a university, they're a scumbag publisher trying to lock up taxpayer-funded knowledge and studies as if it were their own intellectual property.
Well, thank goodness you clarified the difference...'cause you know universities...I mean they certainly don't represent part of a multi-trillion dollar business of extorting the populace for almost-worthless information to create an addiction to "higher" education, right?
Somebody pass that student a study guide. And the bong.
Re: (Score:2)
"Elsevier spokesperson"
It was found by a person at a university, and they agreed with you... That the explanation was a flimsy pretext to prevent sharing of downloaded copies.
I don't understand why they haven't just DRM'd it somehow. Force you to log in each time you view it. Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.
Re:I try not to assume the worst of people (Score:4, Interesting)
I don't understand why they haven't just DRM'd it somehow. Force you to log in each time you view it. Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.
Where is the profit from pro-actively preventing people from viewing the material?
Take it from Microsoft and the BSA: it's far better to let people pirate your warez at will. Then you can sue them for uppity "damages," greatly exceeding what people would have decided to pay if they had been forced to pay before they could use it.
Re:I try not to assume the worst of people (Score:5, Insightful)
Where is the profit from pro-actively preventing people from viewing the material?
They can trick some people into paying to view it.
Take it from Microsoft and the BSA: it's far better to let people pirate your warez at will. Then you can sue them for uppity "damages," greatly exceeding what people would have decided to pay if they had been forced to pay before they could use it.
That's exactly what they're planning. This "waterprinting" will just make their case easier by being able to show who shared it. People are more likely to pay up to avoid the embarrassment of going to court.
One way to find out... (Score:4, Insightful)
...some customer in California can demand a dump of their data from Elsevier under the CCPA (California Consumer Privacy Act) and see what the dump has on PDF records. If Elsevier don't show anything, then they become liable under CCPA should they attempt to link the embedded codes.
Re: (Score:2)
I didn't realize the Boy Scouts of America was that keen on making sure every copy of a merit badge pamphlet was genuine. Given the battered copies I used back in the day, those booklets couldn't have been much of a revenue stream!
Re: (Score:3)
Then if too many simultaneous people do, you blacklist that copy. I assume Acrobat has this ability.
Then you view it with Apple Preview instead. As a bonus, Preview doesn't have to be "updated" every time you use it.
Re:I try not to assume the worst of people (Score:5, Insightful)
Yes Elsevier is lying. Personally, I think if you don't like Elsevier's use policies you shouldn't do business with them, but they should simply tell their customers that they're fingerprinting documents to catch them if they're the source of pirated documents that end up on the Internet.
Lying shows disrespect to its customers, which I suppose is another reason not to do business with them if you can avoid it.
Re: I try not to assume the worst of people (Score:3)
Seems obvious that they want to go after people uploading to scihub.
Re: (Score:2)
This. Elsevier is in the business of selling access to documents. They don't want them shared by others. OTOH, there are some actual "open access" documents published by Elsevier, such as papers produced by federal government employees on work time which are legally in the public domain; if those are given the same "fingerprint" and tracking treatment, I'd have a problem with it. Finally, PDFs can be read by a variety of software, not just Adobe; might be interesting to know how, for instance, the Linux doc
Elsevier ransoms (Score:5, Insightful)
Right, exactly, so Elsevier know who to deploy their ransoms against. Elsevier has admitted that its business product is really a form of ransomware. Pay Elsevier the ransom for their wares or meet their lawyers.
Re: (Score:2)
Re:Elsevier ransoms (Score:4, Interesting)
1) Download PDF
2) Print PDF using "print as PDF"
3) Distribute new PDF
4) Profit!
Their scheme seems silly for anybody wanting to circumvent it. Maybe that's why they don't want people to know about it but I guess now the cat is out of the bag.
Re: (Score:2)
First you have to know that the thing is watermarked and all that stuff is necessary.
Even then it might not get rid of it. Postscript is a wonderfully convoluted thing.
Re: Elsevier ransoms (Score:3)
Re: (Score:2)
Magic pixels, yes. Otherwise, you could print it then scan it on somebody else's machine, or take a picture with your phone, and use essentially the analog hole to wipe out any hidden trackers. But magic pixels exist that can make a watermark survive that process.
Re: (Score:2)
Re: (Score:2)
True, but Elsevier is also wonderfully incompetent.
Re: (Score:2)
How else does marking documents for individuals help against ransomware?
ASME has done something similar for years (Score:2)
Re: (Score:2)
This sounds like something less obvious than that.
I gather that it's a unique and invisible watermark that you don't see when viewing the file but that can be read by someone who knows where to look for it.
Digital movies in commercial movie theatres have something similar. Each projector embeds a watermark into the video and audio stream that the audience can't see or hear but if someone records the movie off of the screen the movie company can tell what theatre it came from and what date and time it was r
Re:ASME has done something similar for years (Score:4, Insightful)
They wouldn't need to do anything so surreptitious. There are plenty of advanced metadata fields that are not obvious unless you search the properties of the document, including one named Custom that lets you add any data you wish to the metadata.
I'm assuming the tracking number was easy to spot when comparing the same article downloaded by two different people. For example, here is the metadata from a PDF on CAD/CAM history that I recently downloaded:
[x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.2.1-c043 52.372728, 2009/01/18-15:08:04 "]
[rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"]
[rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"]
[dc:format]application/pdf[/dc:format]
[dc:creator]
[rdf:Seq]
[rdf:li]Springer-SBM[/rdf:li]
[/rdf:Seq]
[/dc:creator]
[dc:title]
[rdf:Alt]
[rdf:li xml:lang="x-default"]sv-lncs[/rdf:li]
[/rdf:Alt]
[/dc:title]
[/rdf:Description]
[rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/"]
[xmp:CreateDate]2012-07-21T17:49:59+10:00[/xmp:CreateDate]
[xmp:CreatorTool]Microsoft Word 2010[/xmp:CreatorTool]
[xmp:ModifyDate]2012-07-22T13:57:45+10:00[/xmp:ModifyDate]
[xmp:MetadataDate]2012-07-22T13:57:45+10:00[/xmp:MetadataDate]
[/rdf:Description]
[rdf:Description rdf:about=""
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"]
[pdf:Producer]Microsoft Word 2010[/pdf:Producer]
[/rdf:Description]
[rdf:Description rdf:about=""
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"]
[xmpMM:DocumentID]uuid:bcec36f1-b156-44b5-9a90-d4de40f60b45[/xmpMM:DocumentID]
[xmpMM:InstanceID]uuid:81e81de0-09cd-42de-807f-a34af2833126[/xmpMM:InstanceID]
[/rdf:Description]
[/rdf:RDF]
[/x:xmpmeta]
Those last two fields, DocumentID and InstanceID, would be very easy to generate during the download and tie to a specific user account. Unless two different people download the same PDF and compare the files closely, it isn't something the average person is going to associate with a tracking system.
--
(Bent brackets replaced with square brackets)
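For anyone who wants to try the comparison described above, here is a minimal sketch in Python. It assumes you have already pulled the XMP packet out of two copies of the same article as text, and that the IDs are stored as element content as in the sample above (some producers write them as attributes instead, which this would miss). The function names are made up for illustration, and nothing here is confirmed to be where Elsevier actually puts its mark.

```python
import re

# Matches real XMP (<xmpMM:DocumentID>...) as well as the square-bracket
# form quoted in the comment above.
ID_PATTERN = re.compile(
    r"[<\[]xmpMM:(DocumentID|InstanceID)[>\]]\s*([^<\[\s]+)\s*[<\[]/xmpMM:\1[>\]]"
)


def extract_ids(xmp_text: str) -> dict:
    """Return the xmpMM DocumentID/InstanceID values found in an XMP packet."""
    return {name: value for name, value in ID_PATTERN.findall(xmp_text)}


def differing_ids(copy_a: str, copy_b: str) -> list:
    """List the ID fields whose values differ between two copies of one article."""
    ids_a, ids_b = extract_ids(copy_a), extract_ids(copy_b)
    return sorted(k for k in ids_a.keys() | ids_b.keys() if ids_a.get(k) != ids_b.get(k))
```

A field that differs between two downloads of the same article is only a candidate for a per-download mark; ordinary tools also regenerate InstanceID when a file is re-saved.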
Re: (Score:2)
It's pretty fucking sad when we can't post proper code here because it "looks like ASCII art".
Re:ASME has done something similar for years (Score:4, Funny)
Re: (Score:2)
Re: ASME has done something similar for years (Score:2)
Re: (Score:3)
Elsevier owns the copyright for the specific layout. Everything else is owned by the authors, at least with all the Elsevier journals I've ever submitted to. Authors are welcome to do whatever they want with their non-Elsevier formatted documents, and Elsevier is actually reasonably open about that.
Actually, Elsevier's "Green" open access option, which is the free one that all their journals offer, says you can "post a version of your article in a repository after an embargo so others can access it freely"
Wrong (Score:4, Insightful)
Don't believe it. An identifier on a document is completely ineffective at preventing anything, except the problem of not having an identifier on a document...
Re: (Score:2)
Add it to the list (Score:2, Funny)
It's for the children ...
2 weeks to flatten the curve
Check is in the mail
The dog ate my homework
I have read the terms and conditions.
Re: (Score:2)
2 weeks to flatten the curve
Huh?
It's a 2020 thing.
Re: (Score:2)
Your privacy is important to us
Identify without PII (Score:5, Interesting)
"there is no metadata, PII [Personal Identifying Information] or personal data captured by these,"
Cool.
"Fingerprinting in PDFs allows us to identify potential sources of threats so we can inform our customers"
...I thought there was no PII to be able to identify the customers that need notifying.
Re: (Score:2)
You can do both.
You can fingerprint something by assigning it a serial number. The serial number is just an arbitrary number - it could be the total number of downloaded files, so having your document stamped #56828
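A rough sketch of the serial-number idea, in Python. The stamped number itself carries no personal data, but whoever holds the ledger can resolve it back to an account, which is the point made in the summary about matching identifiers with information stored at download time. The class, its fields, and the sample account are all hypothetical; this is not a description of any publisher's real system.

```python
import itertools
from typing import Optional


class DownloadLedger:
    """Server-side table mapping opaque serial numbers to download details."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)  # running total of downloads
        self._records = {}

    def issue(self, account: str, ip: str) -> int:
        """Allocate the next serial and remember who it was issued to."""
        serial = next(self._counter)
        self._records[serial] = {"account": account, "ip": ip}
        return serial  # only this number would be stamped into the file

    def lookup(self, serial: int) -> Optional[dict]:
        """Resolve a serial found in a leaked copy back to its download record."""
        return self._records.get(serial)
```

Anyone holding only the file sees a meaningless integer; the identifying step happens entirely in `lookup`, on the server.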
Re: (Score:3)
I don't think the parent was questioning how this could technically be done, he was highlighting that the two statements are contradictory.
The way I read between the lines here is that the evil vendor is merely reminding its customers that when they say fingerprinting "allows us to identify potential sources of threats", the threat they are referring to is the threat to their business of customers making their own copies, and when they say their purpose is to "inform our customers" they mean they are absolute
IEEE parallel (Score:3)
I think the message is pretty clear and nobody says this is to prevent ransomware. At any rate, it's probably better than a stealthy fingerprint.
Asking for a friend... (Score:2)
Re:Asking for a friend... (Score:4, Interesting)
No, not necessarily, depending on how it's implemented.
For example, a unique ID could be coded by simply scrambling the order of appearance of a series of elements necessary to the document. There are limitless ways to hide information inside PDFs, and no filter could possibly detect them all.
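To make the ordering trick concrete, here is a small Python sketch that hides an integer in the order of a list of distinct items using the factorial number system. The item names are placeholders, and which PDF elements a real scheme would reorder is an assumption left open here.

```python
from math import factorial


def encode_id(items: list, ident: int) -> list:
    """Reorder distinct items so that their permutation encodes ident."""
    pool = sorted(items)
    if not 0 <= ident < factorial(len(pool)):
        raise ValueError("identifier does not fit in this many items")
    ordered = []
    for remaining in range(len(pool), 0, -1):
        # Each digit of the factorial number system picks one remaining item.
        index, ident = divmod(ident, factorial(remaining - 1))
        ordered.append(pool.pop(index))
    return ordered


def decode_id(ordered: list) -> int:
    """Recover the integer from the order in which the items appear."""
    pool = sorted(ordered)
    ident = 0
    for remaining, item in zip(range(len(pool), 0, -1), ordered):
        index = pool.index(item)
        ident += index * factorial(remaining - 1)
        pool.pop(index)
    return ident
```

The document renders identically whatever the order, and 20 reorderable items already give 20! (about 2.4 quintillion) distinct IDs, so a metadata stripper would never notice.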
Re: (Score:2)
There are limitless ways to hide information inside PDFs, and no filter could possibly detect them all.
Or you could just compare the same journal from three different sources. That could easily tell you if there is something fishy going on with that document.
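A quick Python sketch of that comparison, assuming you have the raw bytes of the same article obtained from different sources. It reports the byte ranges where two copies disagree; the function name is made up, and it assumes a mark that overwrites bytes in place rather than shifting the rest of the file.

```python
def differing_ranges(a: bytes, b: bytes) -> list:
    """Return half-open (start, end) byte ranges where two files differ."""
    shared = min(len(a), len(b))
    ranges = []
    start = None
    for i in range(shared):
        if a[i] != b[i]:
            if start is None:
                start = i  # a run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, shared))
    if len(a) != len(b):
        # Whatever extends past the shorter file counts as one trailing difference.
        ranges.append((shared, max(len(a), len(b))))
    return ranges
```

Run it pairwise over the three copies. A difference is not proof of tracking, since timestamps and the like can also vary per download, but it shows where to start looking.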
Re: (Score:2)
You can run a script that converts the PDF to an image, then runs OCR back into a PDF. I process some PDFs with aggregated medical information (list of lab results for N people), and automate the process of sending each person a black-marker censored PDF with only their info. I do it with a bash script calling convert (imagemagick) for the conversion and black-marker modifications, then tesseract-ocr, then pdfunite (poppler) to build back the multipage.
Conversion to png is full guarantee I did not miss metadata, h
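The poster's script isn't shown, so what follows is only a guess at the shape of that pipeline, written in Python rather than bash. It assumes `convert` (ImageMagick, with Ghostscript available for PDF input), `tesseract`, and `pdfunite` are on the PATH; the function names, the page-%03d naming, and the 300 dpi default are my own choices.

```python
import subprocess
from pathlib import Path


def rasterize_cmd(src: Path, workdir: Path, dpi: int = 300) -> list:
    # ImageMagick writes one PNG per page: page-000.png, page-001.png, ...
    return ["convert", "-density", str(dpi), str(src), str(workdir / "page-%03d.png")]


def ocr_cmd(png: Path) -> list:
    # With the "pdf" config, tesseract writes <output base>.pdf with a text layer.
    return ["tesseract", str(png), str(png.with_suffix("")), "pdf"]


def unite_cmd(pages: list, dest: Path) -> list:
    # pdfunite takes the source PDFs in order, then the destination file.
    return ["pdfunite", *[str(p) for p in pages], str(dest)]


def rebuild(src: Path, dest: Path, workdir: Path) -> None:
    """Rasterize every page, OCR each image, and join the results into dest."""
    subprocess.run(rasterize_cmd(src, workdir), check=True)
    pngs = sorted(workdir.glob("page-*.png"))
    for png in pngs:
        subprocess.run(ocr_cmd(png), check=True)
    subprocess.run(unite_cmd([p.with_suffix(".pdf") for p in pngs], dest), check=True)
```

Going through pixels drops metadata and object-level tricks, but as noted elsewhere in this thread, a watermark hidden in the visible image itself can survive that step.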
Re: (Score:2)
This particular mark? Yes. Actually, running minuimus.pl with --discard-meta would probably do it, though I'd want to check.
But now this mark has been caught, I expect a new and stealthier one is in the works. There are so very many places you can hide information within a PDF file quite easily.
Re: (Score:2)
That works with Microsoft docx files, but not PDFs. The file structure of PDF files is completely different. You can see for yourself very easily by opening a PDF in a text editor and looking at the document. There are plenty of human readable sections, xml file structures, and headings inside that are completely different from a ZIP file.
Re: (Score:2)
I have studied the PDF specification in detail. It's a horror. Seven iterations of patching and awkwardly squeezed-in extensions that try to maintain backwards compatibility. It's up to Lovecraftian levels - all who read it risk madness.
Re: (Score:2)
You left out "steal underpants".
Re: (Score:2)
Re: (Score:2)
PDFs do not work that way.
Re: (Score:2)
Or just download it off sci-hub
Re: (Score:2)
Elsevier is lying? What a surprise.... (Score:2)
And lying badly. Also not a surprise. The time for scientific "publishers" is over. They offer nothing of value. Printing journals is past and the real value is added by the reviewers. These do not get paid though, so they can review on a public-access server just as well. Publishing via scientific publishers means you get less exposure, fewer reads and you often have to pay to get published in addition.
OK (Score:2)
If they want to surveil the studies I read, I'm ok with that. I encourage it, they'll learn a whole bunch.
Re: (Score:2)
They can only know your IP and timestamp when you download. And it is fine (even required by law) to log IP and timestamp of the server-client interactions. Also, it's Elsevier, which has been located in Amsterdam for over a century, so there is no doubt they made it in a way that is compatible with EU laws.
PDF Printer (Score:2)
Does running a PDF through a PDF printer strip this kind of data from the document?
Re: (Score:2)
how insensitive (Score:2)
Screw driver is meant to tighten and loosen screws (Score:2)
The fact that it is often used as a Pry, Lever, Scraper, Probe, Hole Puncher, Awl... Never mind that reinforced handle that can take hammer blows like a Chisel, is not relevant to the fact that its primary job is to tighten and loosen screws.
Strip out the fingerprint (Score:2)
It's Elsevier, what did you expect? (Score:2)
Elsevier is a parasitic leech that has attached itself to science, and needs to be destroyed. I actively try to avoid publishing my work in Elsevier journals (or Springer, Nature etc.). Unfortunately, this cancer has infested many formerly great journals.
Re: (Score:2)
Scientific publishers don't do anything (Score:2)
Here's the problem. In days of yore, scientific papers were written out by hand. Someone had to type them in, including special formulas and notations that only specifically trained and expensive technical typists were able to do accurately. Draftsmen had to draw any required illustrations. Copies had to be printed on expensive machines, distributed for review, amended, corrected, and finally printed. This was a very expensive process, with very small press runs, and journal publishers charged a lot,
Do 2 conversions remove it? (Score:2)
PDF to Word and print a new PDF?
Re: (Score:2)