Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?
First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious JavaScript embedded. Blocking PDFs outright is not an option, as they are required for business. We do disable JavaScript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out-of-the-box solution or a 'how to' guide for automatically neutralizing PDFs by stripping out the JavaScript. The closest thing I could find is using pdf2ps and then reversing the process with ps2pdf. Does anyone know of a solution for this that is not too complex, preferably works at the SMTP relay, and can handle zipped PDFs as well? Or does anyone have some common-sense advice for dealing with this so that, once it's in place, no further action is required by myself or by users?"
Foxit Reader? (Score:5, Informative)
As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox should also do the same.
Re:Foxit Reader? (Score:4, Informative)
Dang... I was about to say the same.
But yeah, the best way to sanitize is to not use Adobe Acrobat (or Acrobat Reader).
OS X and many Linux distros have their own built-in viewers ("Preview" on OS X, and "Display" at least on Ubuntu).
Also, you can probably use Google Apps to do the same.
Print to PDF (Score:5, Informative)
The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.
Be careful modifying documents (Score:5, Informative)
Modifying a document can, for example, change its legal standing.
A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do Javascript in the first place.
Re:Print to PDF (Score:5, Informative)
Like
lpr -P Cups-PDF file.pdf
Use sandboxie (Score:1, Informative)
Re:Rasterize and reencapsulate (Score:4, Informative)
If you rasterize and re-encapsulate your users' PDF attachments, your users will hate you and work around your "stupid filter that breaks PDF attachments". You are better off blocking all PDF attachments by email. You'll save yourself a ton of work, and your users can skip the frustration of mangled attachments and go directly to working around your filter.
Re:Why are you doing this? (Score:5, Informative)
Signed PDFs can be read in any reader, but the signature will still be validated (if the reader is not defective). Encrypted PDFs will not even be readable if they are not encrypted to you. Password-protected PDFs may require the password to be readable, let alone printable or changeable.
In other words, PDFs are not designed for wanton modification. Some of them can be modified, but others cannot. This means that you cannot build a reliable method for converting suspect PDFs into safe PDFs.
Re:Be careful modifying documents (Score:5, Informative)
I believe that for a PDF document to be a legal document, it needs to be in PDF/A format. This format prohibits the use of executable code, such as JavaScript.
Re:Be careful modifying documents (Score:5, Informative)
I believe that for a PDF document to be a legal document, it needs to be in PDF/A format.
Where does this belief come from?
Many states have legislation regarding the font, margins and paper sizes used for some legal documents.
US courts, archivists and many case management / COPS systems only accept documents in PDF/A.
acrobat reader sanitized 100% (Score:5, Informative)
In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.
Summary (Score:4, Informative)
I did not know it was possible to detect JavaScript in a PDF, and I think this is possibly a better approach than a full rewrite (btw, I found this Python script: http://blog.didierstevens.com/programs/pdf-tools/ [didierstevens.com]). So instead of rewriting every PDF, you just delete any PDF attachments in which JavaScript is detected. I assume this will not break any legitimate PDFs that have comments, forms, etc.? It will need testing, I guess.
The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
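The "detect and delete" idea above can be sketched in a few lines of Python. This is not the script linked above, just a naive illustration of the same kind of keyword scan: it looks for the PDF names `/JS` and `/JavaScript` in the raw bytes, undoing the `#xx` hex escapes that PDF allows inside names. The function names are invented for this example. It cannot see names hidden inside compressed object streams, and random binary data could in principle produce a false match, so treat a clean result as "nothing obvious found" rather than "safe".

```python
import re

# PDF names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# A name is "/" followed by characters that are neither PDF whitespace
# nor PDF delimiters.
_NAME = re.compile(rb"/([^\x00\t\n\x0c\r ()<>\[\]{}/%]+)")

SUSPICIOUS = {b"JS", b"JavaScript"}


def javascript_names(pdf_bytes: bytes) -> dict:
    """Count occurrences of /JS and /JavaScript names in raw PDF bytes."""
    counts = {}
    for match in _NAME.finditer(pdf_bytes):
        # Decode #xx escapes so obfuscated spellings compare equal.
        name = _HEX_ESCAPE.sub(
            lambda m: bytes([int(m.group(1), 16)]), match.group(1)
        )
        if name in SUSPICIOUS:
            key = name.decode("ascii")
            counts[key] = counts.get(key, 0) + 1
    return counts


def should_quarantine(pdf_bytes: bytes) -> bool:
    # Policy from the post above: drop anything with JavaScript markers,
    # let everything else (including encrypted files) through.
    return bool(javascript_names(pdf_bytes))
```

As the post says, this is one facet only: exploits that do not rely on JavaScript pass straight through a check like this.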
Ghostscript (Score:5, Informative)
I use Ghostscript when attempting to compress a "bloated" PDF (such as generated by Xsane). The input is a PDF, output is a PDF:
# Use ghostscript to re-write the PDF
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf
Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:
# Combine PDFs
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf
# Copy pages 3 & 4 from an existing PDF
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf
Re:Print to PDF (Score:4, Informative)
Re:Foxit Reader? (Score:5, Informative)
I run a ghostscript shell script to print a PDF as a new PDF:
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=NEW_FILE.pdf -dBATCH OLD_FILE_1.pdf OLD_FILE_2.pdf
In this case OLD_FILE_1.pdf and OLD_FILE_2.pdf will be combined into NEW_FILE.pdf. AFAIK this strips javascript.
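For anyone wanting to run the Ghostscript rewrite unattended, here is a hedged Python sketch that wraps the same `pdfwrite` invocation quoted in this thread. The helper names and the 60-second timeout are arbitrary choices for illustration, and `-dSAFER` is an addition not present in the commands above. Whether the rewritten file is really free of JavaScript is the posters' "AFAIK", not something this code verifies, and encrypted or signed PDFs will not survive the round trip intact.

```python
import shutil
import subprocess
from pathlib import Path


def gs_rewrite_command(src, dst, gs="gs"):
    """Build the Ghostscript argv for re-writing src into dst."""
    return [
        gs,
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-dSAFER",  # added here; not in the commands quoted in the thread
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        str(src),
    ]


def rewrite_pdf(src, dst, timeout=60):
    """Run Ghostscript; raises on a non-zero exit status or a timeout."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript (gs) not found on PATH")
    # Use temp names chosen by the relay, never the sender's filename:
    # a name starting with "-" would be parsed as an option.
    subprocess.run(gs_rewrite_command(src, dst), check=True, timeout=timeout)
    return Path(dst)
```

Passing the arguments as a list (no shell) avoids quoting problems with odd filenames, and the timeout keeps a pathological PDF from stalling the mail queue.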