
Ask Slashdot: How Do You Automatically Sanitize PDF Email Attachments?

First time accepted submitter supachupa writes "It seems that over the past couple of years spear phishing has become very convincing, and it is becoming more and more likely that someone (including myself) will accidentally click on a PDF attachment with malicious javascript embedded. It would be impossible to block PDFs, as they are required for business. We do disable javascript in Adobe Reader, but I would sleep a lot better knowing the code is removed completely. I have looked high and low but could not find a cheap out-of-the-box solution or a 'how to' guide for automatically neutralizing PDFs by stripping out the javascript. The closest thing I could find is using PDF2PS and then reversing the process with PS2PDF. Does anyone know of a solution for this that is not too complex, preferably works at the SMTP relay, and can handle zipped PDFs as well? Or does anyone have some common-sense advice for dealing with this, so that once it's in place, no further action is required by myself or by users?"
  • Foxit Reader? (Score:5, Informative)

    by Anonymous Coward on Wednesday July 17, 2013 @10:06PM (#44314153)

    As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.

    • Re:Foxit Reader? (Score:4, Informative)

      by MoFoQ ( 584566 ) on Wednesday July 17, 2013 @10:09PM (#44314167)

      dang...I was about to say the same...

      but one way to sanitize is to not use Adobe Acrobat (or Acrobat Reader).

      OS X and many Linux distros have their own built-in viewers ("Preview" in OS X, and "Display" at least on Ubuntu).

      Also, you can probably use Google Apps to do the same as well.

      • Re:Foxit Reader? (Score:5, Insightful)

        by fuzzyfuzzyfungus ( 1223518 ) on Wednesday July 17, 2013 @11:14PM (#44314533) Journal

        That isn't really 'sanitizing', though. It's certainly good that you practice safe text on your computer, but if you are the mail server guy, and may or may not have as much control as you'd like over the users and their filthy, weatherbug-encrusted systems, you want to modify the file so that it no longer contains a potential payload, not merely use a reader that doesn't execute payloads.

        • Perhaps you could sanitize their systems by preventing them from running Adobe products.
          • Corporate policy could enforce an alternative PDF reader, and everyone would be happy, as PDF viewing would be a much nicer, faster experience.

        • by bwcbwc ( 601780 )

          Yeah, but if your sanitizing is defective in some way and you load the file into Adobe Reader, the remaining JS will still execute. With a reader that is incapable of running javascript, it doesn't matter.

          On the flip side, if you don't sanitize the JS and pass the file along to an unsuspecting 3rd party, they may get infected. So the best option seems to be to do both: try to strip JS from the files and use a reader that doesn't parse JS.

        • Re:Foxit Reader? (Score:5, Informative)

          by Mashdar ( 876825 ) on Thursday July 18, 2013 @09:25AM (#44316733)

          I run a ghostscript shell script to print a PDF as a new PDF:

          gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=NEW_FILE.pdf OLD_FILE_1.pdf OLD_FILE_2.pdf

          In this case OLD_FILE_1.pdf and OLD_FILE_2.pdf will be combined into NEW_FILE.pdf. AFAIK this strips javascript.
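          For a mail relay, the same re-distill step could be driven from a script. Below is a minimal Python sketch, assuming Ghostscript is installed as `gs` on the PATH; the function names are hypothetical, and `-dSAFER` plus a timeout are added as precautions. As with the command above, it is not guaranteed to remove every kind of embedded script.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path
from typing import Sequence


def build_gs_rewrite_command(inputs: Sequence[str], output: str, gs: str = "gs") -> list[str]:
    """Build the argv that re-renders one or more PDFs through pdfwrite."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return [
        gs,
        "-dBATCH",            # exit after processing instead of waiting for input
        "-dNOPAUSE",          # do not pause between pages
        "-dSAFER",            # restrict the interpreter's file access
        "-q",                 # suppress banner and progress messages
        "-sDEVICE=pdfwrite",  # write a freshly generated PDF
        f"-sOutputFile={output}",
        *inputs,              # several inputs are concatenated into one output
    ]


def rewrite_pdf(inputs: Sequence[str], output: str, timeout: float = 60.0) -> Path:
    """Run Ghostscript; raises if gs is missing, fails, or exceeds the timeout."""
    gs = shutil.which("gs")
    if gs is None:
        raise FileNotFoundError("Ghostscript ('gs') was not found on PATH")
    subprocess.run(build_gs_rewrite_command(inputs, output, gs), check=True, timeout=timeout)
    return Path(output)
```

          Usage would look like `rewrite_pdf(["OLD_FILE_1.pdf", "OLD_FILE_2.pdf"], "NEW_FILE.pdf")`, mirroring the combined command above.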

    • Nitro and Foxit now support JavaScript. You can disable it in the options. Unless you want to use old versions, your best bet is SumatraPDF.

    • As far as I know, Foxit Reader strips out any JavaScript. The PDF readers in Chrome and Firefox also should do the same.

      That only prevents it from running on the machine used to view it; it's still in the PDF. The best way is to either insist on PDF/X or convert to it. PDF/X does not allow active content such as scripting.

  • Print to PDF (Score:5, Informative)

    by digitalhermit ( 113459 ) on Wednesday July 17, 2013 @10:09PM (#44314169) Homepage

    The way I'd do it is to create a dummy printer driver that just writes to a file. Print the PDF to the dummy printer, which in turn creates a new PDF without all the junk.

    • Re:Print to PDF (Score:5, Informative)

      by Kludge ( 13653 ) on Wednesday July 17, 2013 @10:23PM (#44314245)

      lpr -P Cups-PDF file.pdf

    • Re:Print to PDF (Score:5, Interesting)

      by DJ Jones ( 997846 ) on Wednesday July 17, 2013 @10:57PM (#44314453) Homepage
      Sadly a lot of PDF printers will retain javascript code even if you print it and re-assemble it back into a PDF. The problem lies in the fact that Adobe allows javascript to be embedded inside image objects and compressed blocks of PDF binary. It's not as simple as opening the file and stripping out anything that starts with <script>. Code can be fired on almost any user event and it can be attached to almost any high-level object. It's not impossible to create a scrubber but it's a lot more complicated than you might think.

      I spent the better part of a week attempting to create a PDF scrubber at my office for this same reason. We had become victim to highly targeted attacks from PDF sources. I wrote a scrubber in PHP using an open-source PDF parser and a series of regular expressions to strip out any javascript. In the end, I came very close to a working solution, but I ran into issues with encrypted PDFs.

      The project was shelved in favor of making users open all external PDFs on a virtual server that was hardened and re-imaged every evening to prevent any malicious code from running rampant. That's the simplest solution.
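      To illustrate why plain text matching falls short, here is a minimal Python sketch (not a PDF parser) that also inflates Flate-compressed streams and undoes `#xx` escapes in names before looking for script-related keys. The key list is an assumption loosely modelled on what PDF triage tools count; encrypted files, streams using other filters, and many other obfuscation tricks are not handled.

```python
from __future__ import annotations

import re
import zlib

# Keys that usually indicate scripts or automatic actions (an assumption;
# real triage tools check a longer list).
SUSPICIOUS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")

# Raw bytes between the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# "#xx" hex escapes, which let /JavaScript be written as /J#61vaScript.
_HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_names(data: bytes) -> bytes:
    """Decode every #xx escape so obfuscated names compare equal."""
    return _HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def find_markers(data: bytes) -> set[bytes]:
    """Naive substring search for suspicious keys."""
    data = normalize_names(data)
    return {key for key in SUSPICIOUS if key in data}


def deep_scan(pdf: bytes) -> set[bytes]:
    """Search the raw file plus the inflated contents of every Flate stream."""
    found = find_markers(pdf)
    for match in _STREAM_RE.finditer(pdf):
        try:
            # decompressobj returns what it can from truncated or padded
            # data instead of raising, which suits a best-effort scan.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate-encoded, or encrypted
        found |= find_markers(inflated)
    return found
```

      A raw scan of a file whose only script lives in a compressed stream comes back empty, while `deep_scan` reports it; even so, a determined attacker has more hiding places than this covers.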
  • by Anonymous Coward on Wednesday July 17, 2013 @10:14PM (#44314191)

    Modifying a document can, for example, change its legal validity.

    A solution that modifies the PDF viewer is much better than one that alters the document. That means not using Adobe. Pity the company refuses to build a version that doesn't do Javascript in the first place.

  • Sumatra PDF (Score:5, Insightful)

    by shellster_dude ( 1261444 ) on Wednesday July 17, 2013 @10:17PM (#44314201)
    Check out SumatraPDF. It's super fast and does not support javascript or actionscript in PDFs. I use it exclusively now.
    • Check out SumatraPDF. It's super fast and does not support javascript or actionscript in PDFs. I use it exclusively now.

      Is it vulnerable to font description overloading and the other PDF exploits out there? A large portion of the malicious PDFs I've seen lately didn't use forms or javascript containers as the main attack vector (usually shellcode via some markup bug).

    • by mjwx ( 966435 )

      Check out SumatraPDF. It's super fast and does not support javascript or actionscript in PDFs. I use it exclusively now.

      Sumatra PDF used to be light, a mere 800 KB; I just installed the latest version, a whopping 3.6 MB. It's suffering the same form of featuritis as the other PDF readers I dumped because they became slow and unwieldy (Adobe and Foxit).

  • The best way to protect your computer from malicious Javascript embedded within a PDF is to not install Adobe Reader. If you cannot open the file, your computer cannot be infected.

    • by plover ( 150551 )

      That's almost 100% correct. The problem could still be an infection through a document previewer or even the search indexer, though. There have been successful attacks on Windows that took advantage of the JPEG previewer, as well as WMF, TTF, and others.

      I don't know of any such successful attacks on Windows 7 or higher. That doesn't prove they're impossible, just that they haven't been encountered yet.

    • by sjwt ( 161428 )

      Nuke them from orbit--it’s the only way to be sure.

  • javascript? (Score:5, Insightful)

    by sjames ( 1099 ) on Wednesday July 17, 2013 @10:18PM (#44314209) Homepage Journal

    Why in the world is javascript included in PDF documents? PDF is already a Forth-like programming language and environment.

    • by jbolden ( 176878 )

      I think you are thinking of PostScript. PDF requires that all computations resolve to a well-defined value based on information contained within the document (i.e. it is not Turing complete). So then of course Adobe had to add a Turing-complete language back in.

      • by sjames ( 1099 )

        I am more familiar with PostScript (by way of Forth; Warnock can claim they're unrelated all he wants). Looking further, I see that PDF strips out flow control. Perhaps they should just put it back?

        • by jbolden ( 176878 )

          Well, the idea of PDF was to avoid indefinite resolution and inconsistent RIP times... I think it makes more sense to keep it out, but I'd say the real question is, "why not bring back PostScript for more complex documents?"

      • Re:javascript? (Score:4, Interesting)

        by fuzzyfuzzyfungus ( 1223518 ) on Wednesday July 17, 2013 @11:47PM (#44314677) Journal

        I think you are thinking of PostScript. PDF requires that all computations resolve to a well-defined value based on information contained within the document (i.e. it is not Turing complete). So then of course Adobe had to add a Turing-complete language back in.

        I don't know if any implementations are stupid enough to implement this (at least without some very careful sanitizing), but (in addition to ramming in javascript and the ability to embed basically anything at all; thanks for nothing, 'rich media annotations'), they even added Launch Actions!

        " Launch Actions
        A launch action launches an application or opens or prints a document. Table 203 shows the action dictionary
        entries specific to this type of action.
        The optional Win, Mac, and Unix entries allow the action dictionary to include platform-specific parameters for
        launching the designated application. If no such entry is present for the given platform, the F entry shall be
        used instead. Table 203 shows the platform-specific launch parameters for the Windows platform. Parameters
        for the Mac OS and UNIX platforms are not yet defined at the time of publication."

        Your Standards-Compliant Solution for executing arbitrary binaries with arbitrary parameters. No need for messy, version-sensitive exploit code! Combine with javascript and web-interaction support to build documents that search the target's hard drive for interesting things upon being opened... Or (miracle of miracles!) build a PDF that runs the Adobe update utility when you open it; you're sure to find something new every time!
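        A gateway could at least flag documents that declare such an action. Here is a rough Python sketch, assuming the action dictionary is available uncompressed and its file specification is a literal string; the function name is hypothetical, and escaped parentheses, hex strings, and indirect references are not handled.

```python
from __future__ import annotations

import re

# "/S /Launch" marks a launch action; PDF allows optional whitespace between tokens.
_LAUNCH_RE = re.compile(rb"/S\s*/Launch\b")
# A file specification written as a literal string, e.g. /F (cmd.exe).
_FILE_RE = re.compile(rb"/F\s*\(([^)]*)\)")


def launch_targets(obj: bytes) -> list[str]:
    """Return the literal /F targets of a launch action, or [] if there is none."""
    if not _LAUNCH_RE.search(obj):
        return []
    return [m.group(1).decode("latin-1") for m in _FILE_RE.finditer(obj)]
```

        For a dictionary such as `<< /S /Launch /Win << /F (cmd.exe) /P (/c calc.exe) >> >>` this reports `cmd.exe`, which a filter could log or use as grounds to quarantine the file.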

        • by mysidia ( 191772 )

          Will the next version of reader come complete with a PWN ME sign hung automatically around the back of your computer, and included in the Acrobat plugin version string announced by the web browser in the User Agent headers towards every site you visit?

        • by jbolden ( 176878 )

          Wow! .pdfs become binaries. Talk about a glaring security hole.

    • Because in the post-Microsoft world, there is no separation between code and data. Where have you been since 1991?

    • by jimicus ( 737525 )

      Because Adobe PDF Reader is not - and hasn't been for some years, if ever - a plain PDF viewer. It's not meant to be.

      It's a fat-client application for a whole lot of other products that need to produce interactive forms that require both the presentation to the user and the end result to look predictable.

      99 times out of 100, nobody needs this functionality. But for that 1 time out of 100 where it is needed, Adobe's salesmen have a nice easy job: "No obscure third-party software required by your clients in o

  • by 140Mandak262Jamuna ( 970587 ) on Wednesday July 17, 2013 @10:24PM (#44314249) Journal
    Why don't you sanitize the reader? Use a reader that ignores javascript. Or build one from whatever open source PDF reader you can find, if there isn't one already. Or run the PDF reader inside a sandbox without internet access or permanent disk writes. If that breaks portability and the documents don't render correctly when javascript is disabled, tell the sender, and blacklist the sender too for good measure. If enough companies lock javascript out of PDF documents, eventually the authoring tools will stop using it.
    • The submitter is looking for a code-based solution to a sociological/psychological problem, and it's just not going to be effective.

      The real solution is to educate and train your users so they don't fall prey to these sorts of attacks. I know a lot of IT people aren't comfortable dealing with people, and I know it takes quite a bit of time and doesn't look as snazzy on your résumé - but, really, it's the best long-term approach.

      • by dbIII ( 701233 )
        That works for a while; then some guy who doesn't know any better gets employed in the middle of the stack and tries to change the "culture" and throw off the restrictive yoke of the IT people. All those senseless rules like one software licence per install and not using warez have to go!
      • The real solution is to educate and train your users

        I am intrigued by your solution, and would like to subscribe to your magazine.

  • by Kardos ( 1348077 ) on Wednesday July 17, 2013 @10:25PM (#44314261)

    You don't need a solution that rewrites the PDF. At best it will work correctly "most of the time", and break PDFs the rest of the time. For example, pdf->ps->pdf, or the "print to pdf" solution mentioned earlier in the comments may work fine for scanned PDFs, but if there are annotations/comments then they'll get stripped. This will lead to massive user frustration ("but the comments are there, I sent it in the last email") and people having to find ways to work around your filter. Modifying people's attachments is a bad move. A more reasonable solution is to detect if the PDF contains any javascript code, and if it does, block the PDF entirely.

  • The problem is that PDF, and the PostScript used in it, is an executable language. This falls under "executable code in non-executable containers". If you need to be sure, convert the PDF to a series of JPG or GIF pictures and recreate a PDF from them. With any less harsh approach, you may retain malicious PostScript (and other) code.

    And, yes, what you are trying to do is non-trivial. Expect anything "simple" to be insecure.
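    One possible shape for that pipeline, sketched in Python under stated assumptions: Ghostscript's `png16m` device rasterizes each page (PNG rather than the JPG/GIF mentioned above, to avoid lossy artifacts), and the separate `img2pdf` tool, assumed to be installed, wraps the images back into a PDF. The resolution default and function names are illustrative. The result has no selectable text, which is the usability cost raised in the replies.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def rasterize_command(src: str, workdir: str, dpi: int = 150) -> list[str]:
    """Ghostscript argv that renders every page of src to numbered PNG files."""
    pattern = str(Path(workdir) / "page-%03d.png")
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER", "-q",
        "-sDEVICE=png16m",      # 24-bit RGB PNG output
        f"-r{dpi}",             # rendering resolution in dots per inch
        f"-sOutputFile={pattern}",
        src,
    ]


def rebuild_command(pages: list[str], out: str) -> list[str]:
    """img2pdf argv that wraps the page images, in order, into a new PDF."""
    return ["img2pdf", *pages, "-o", out]


def sanitize_by_rasterizing(src: str, out: str, dpi: int = 150) -> None:
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(rasterize_command(src, workdir, dpi), check=True, timeout=120)
        # Zero-padded names sort correctly for up to 999 pages.
        pages = sorted(str(p) for p in Path(workdir).glob("page-*.png"))
        if not pages:
            raise RuntimeError("Ghostscript produced no pages")
        subprocess.run(rebuild_command(pages, out), check=True, timeout=120)
```

    Raising `dpi` improves legibility (and any later OCR) at the cost of much larger files, so the value is a trade-off to tune against real attachments.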

    • by Kardos ( 1348077 ) on Wednesday July 17, 2013 @10:37PM (#44314355)

      If you rasterize and re-encapsulate your user's PDF attachments, your users will hate you, and work around your "stupid filter that breaks pdf attachments". You are better off blocking all PDF attachments by email. It'll save yourself a ton of work, and your users can skip the frustration of mangled attachments and go directly to working around your filter.

      • by gweihir ( 88907 )

        Your problem only applies if the PDFs have to be editable or if you rasterize with too low or too high a resolution. You can also run the images through OCR to get back some level of editability.

        Otherwise you have to work with possibly infected PDFs. There are a few settings where that is not acceptable and users will not work around it (e.g. "if you infect this system and it turns out you were not following procedure, you go to prison for a few years" environments).

        While I agree that security should not hinde

        • by Kardos ( 1348077 )

          If users will be fired/jailed for working around a PDF mangling filter, the solution is to ban all PDFs, not mangle them and expect the users to keep doing their jobs. Permit raster image attachments, not PDFs.

          • by gweihir ( 88907 )

            And what if these users have to work with things sent in from the outside world? Fail!

  • You don't (Score:5, Interesting)

    by PNutts ( 199112 ) on Wednesday July 17, 2013 @10:33PM (#44314319)

    At some point you trust technology and also reinforce proper user behavior. I hate catch-phrases but your e-mail hygiene should have layers of protection (defense in depth). Assuming that the message got through IP reputation filters, SPAM analysis, malware scans, and was delivered to your user, you rely on desktop protection and cross your fingers that nobody opens it.

    We have SMTP appliances from Axway, and we used to stop all executable attachments and deliver a notification telling the user to call the help desk and request a release. Times changed and we don't do that any more. However, you could annotate the message to remind the user not to open it if they don't know who it's from, don't know what it is, or weren't expecting it. Some will open it anyway. We also used to hold certain attachments for four hours, until the virus definitions (and the other defenses) had received a couple of updates, and then reprocess the message.
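    The hold-and-rescan idea can be sketched as a small release queue. This is a hypothetical Python illustration, not how Axway appliances implement it: message IDs are placeholders, and the rescan itself is left to whatever scanner the gateway already runs.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta

HOLD_PERIOD = timedelta(hours=4)  # the delay described above


@dataclass(order=True)
class HeldMessage:
    release_at: datetime
    message_id: str = field(compare=False)  # ordering uses release_at only


class HoldQueue:
    """Holds attachments until enough time has passed for signature updates."""

    def __init__(self, hold_period: timedelta = HOLD_PERIOD) -> None:
        self._heap: list[HeldMessage] = []
        self._hold_period = hold_period

    def hold(self, message_id: str, received_at: datetime) -> None:
        heapq.heappush(self._heap, HeldMessage(received_at + self._hold_period, message_id))

    def due(self, now: datetime) -> list[str]:
        """Pop and return every message whose hold period has expired."""
        ready = []
        while self._heap and self._heap[0].release_at <= now:
            ready.append(heapq.heappop(self._heap).message_id)
        return ready
```

    A periodic job would call `due(datetime.now())` and pass each returned message back through the scanners before delivery.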

    If you do try to roll your own, be aware that everyone and their dog creates PDF files with varying degrees of success and we had certain PDF files that caused services to fail on our gateway while they tried to scan and process them. You didn't mention the volume but make sure your solution scales well.
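    One way to keep a malformed PDF from taking the gateway down is to run each scan in a separate process with a timeout and quarantine anything that hangs or errors. A minimal Python sketch; the exit-code mapping (0 clean, 1 infected, anything else an error) follows the convention documented for ClamAV's `clamscan` and is an assumption for any other scanner.

```python
from __future__ import annotations

import subprocess


def run_scanner(argv: list[str], timeout: float = 30.0) -> str:
    """Scan one attachment in a child process and map the result to a verdict.

    Returns "clean", "infected", or "quarantine". Anything unexpected
    (hang, missing binary, unknown exit code) fails closed.
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "quarantine"  # the child is killed; hold the message for review
    except OSError:
        return "quarantine"  # scanner binary missing or not executable
    if result.returncode == 0:
        return "clean"
    if result.returncode == 1:
        return "infected"
    return "quarantine"
```

    For example, `run_scanner(["clamscan", "--no-summary", "attachment.pdf"])`. Because each file gets its own process, a parser crash costs one message rather than the whole service, and the pool of workers can be scaled with mail volume.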

    • by King_TJ ( 85913 )

      I'm thinking along the same lines here.... I can't say that I've really seen Javascript embedded PDFs as much of an attack vector where I work. By and large, your Mac OS X users wouldn't encounter this anyway, since they generally use the "Preview" app that's part of the OS to view and print PDFs. Adobe Reader is usually rather pointless to install in OS X, since Preview renders pages far faster anyway AND gives the ability to do things like add signatures to a document, re-order the pages and annotate,

    • by dbIII ( 701233 )
      It's the exceptions that stuff everything up. If you have staff members that require that email is the way that they are going to send executable files around, and not any of the million other options from FTP onwards, about the best you can do is force them to zip it up. Then if you combine that with a requirement from some staff that they MUST use MS Outlook (and in one memorable case, even now in 2013 on Win7 the utter fucking virus magnet MS Outlook Express), you've then got another weakness with the
  • Evince is the PDF/document viewer for GNOME. It also gets compiled for Windows.

    In the Linux world, it's a heavyweight GNOME app (compared to e/x pdf), but it's far, far, far lighter than Adobe Acrobat Reader, and it doesn't do javascript at all. I've yet to come across issues with PDFs not working, as most legitimate PDFs don't use javascript.

    It also comes from a long-standing, respected open source project, GNOME (read: comparable in quality to commercial software), not a fly-by-night freeware operation of d
  • by king neckbeard ( 1801738 ) on Wednesday July 17, 2013 @11:05PM (#44314497)
    If you use anything but Adobe, it probably won't support javascript because it's fucking stupid to have javascript in a PDF. Just avoid Adobe, because they are allergic to security.
  • Test the Attachments (Score:4, Interesting)

    by Flere Imsaho ( 786612 ) on Wednesday July 17, 2013 @11:21PM (#44314567)

    There are a couple of vendors (and many more playing catch-up) selling appliances that detonate attachments on sandboxed VMs running in fast virtual memory.
    They execute/open attachments and watch to see what happens: registry changes, file drops, network activity, attempts to contact known C&C servers, etc.
    Anything that exhibits non-legit behavior gets quarantined. FireEye has a box that does this and also crawls network shares, testing files.

    Aside from whitelisting, I think it's the best defense against zero day malware. It's a little too pricy for the company I work at right now, but as more vendors add this functionality, the price will come down.

  • You Don't (Score:5, Interesting)

    by SuperCharlie ( 1068072 ) on Wednesday July 17, 2013 @11:44PM (#44314665)
    For a long time, I thought like you, that it was my duty to ward off and protect the "children". After a while, you realize 2 things.

    First, it is most likely your duty to inform and educate. Do that. Do it well, do it loud, and do it as often as you can. When someone eventually opens one of those attachments, word will get around, and peer pressure will make everyone else gun-shy. After a user or two of mine got bitten by an attachment (and I had repeatedly warned my users about these things), I ended up with people at my desk occasionally asking, "Can you come look at this? It just looks funny." It was all about the peer pressure and not wanting to be That Guy who clicked the stupid link.

    Second, and I hate to say it, this is what we do, and this is job security. You can't save 'em all, Hasselhoff; if ya did, there would be nothing left to do.
  • Learn the file format and write a program to strip out any executable script elements.

  • by Skapare ( 16644 ) on Wednesday July 17, 2013 @11:58PM (#44314719) Homepage

    Javascript should not be given the capability of doing damaging things. It should be confined to a narrow execution context that is limited to doing only the things that enhance the experience of that ONE information resource. Dynamic layout is certainly a useful thing. Dynamically changing your system is not. It should not have access. I blame the developers. It doesn't matter if it is mail or web. It might do cute things inside a PDF, like give you a calculator for a certain algorithm the PDF is written about. But it should not be able to access even /etc/hosts on your computer.

    • You are not the only one who thinks this, the problem is things designed this way are rarely as secure as the designer wishes. See for example, Java Applet exploits.
  • by fermat1313 ( 927331 ) on Thursday July 18, 2013 @12:28AM (#44314817)

    Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business (CPA firm) we use lots of applications, and most of them are highly vertical with often just one realistic competitor that can function adequately for a firm our size. Many of our apps integrate directly with Acrobat (and Office) so not using Acrobat simply isn't a choice we can make.

    So how do we deal with Adobe Acrobat? As some pointed out earlier, defense in depth. Spam filters, multiple virus scans, and our two most important measures: End users don't have admin on their computers and Adobe is one of our "High Priority" upgrade applications. Updates must be pushed out within one day of being released.

    BTW, the other High Priority apps are Java and Flash, again, both required by our software. With Acrobat, they make up my "Axis of Evil" of insecure software.

    • by jgrahn ( 181062 )

      Lots of people here saying "Don't use Adobe" and suggesting alternatives. Reality is, for many of us, we deal with complex PDF forms and applications that integrate directly with Adobe Acrobat. In my business [---] "Axis of Evil" of insecure software.

      That seems like an accurate, believable description of a common situation. I just wish that some time I'd see someone *try to get out* of a lock-in situation like that, or try to avoid creating more such situations. It has been well known for decades that you can end up there, and yet organizations still plunge in, head first, all the time.

      (Note that the lock-in isn't just about paying $$$ to the vendor indefinitely. It also means your data is cut off from the rest of the ecosystem; you can't benefit from in

  • Seriously, why do people still run acrobat? PDF is a standard format, there are countless programs which support it and the only reason such files are a target is because adobe reader is basically a monoculture and represents a very large and attractive target. We need diversity among PDF readers, just like diversity among web browsers. It was diversity among web browsers more than anything else that reduced browser attacks and caused hackers to concentrate on proprietary monoculture plugins instead.

  • Print then delete the file.

  • by jjohn_h ( 674302 ) on Thursday July 18, 2013 @02:43AM (#44315235)

    In the install tree find the file JSByteCodeWin.bin and rename it. Works for me.
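    Across many machines, the rename could be scripted. A hypothetical Python sketch: the install directory is supplied by the caller (its location varies by version and platform), and the `.disabled` suffix is an arbitrary choice. A Reader update may put the file back, so the script would need to be rerun after patching.

```python
from __future__ import annotations

from pathlib import Path


def disable_js_engine(install_root: str, filename: str = "JSByteCodeWin.bin") -> list[Path]:
    """Rename every copy of `filename` under install_root; return the new paths."""
    renamed = []
    # Materialize the search results first so renaming doesn't disturb the walk.
    for path in list(Path(install_root).rglob(filename)):
        target = path.with_name(path.name + ".disabled")
        path.rename(target)
        renamed.append(target)
    return renamed
```

    Running it a second time is harmless: once renamed, the file no longer matches, and the function returns an empty list.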

    • by Kythe ( 4779 )
      Glad to see this got modded up. The general conclusions of the comments here are that it's impossible to sanitize 100% of Adobe PDF files while retaining needed functionality. If you have to use Acrobat Reader, you could do a lot worse than removing the capability to execute JS at the code level (rather than settings). I suppose you'll need to prevent uncontrolled updating of the software, as well.
  • I think all the people committing suicide at their Seattle office might be getting to them.

    Their Seattle office is right under the Aurora bridge, popular with jumpers...

  • Summary (Score:4, Informative)

    by supachupa ( 823309 ) on Thursday July 18, 2013 @03:10AM (#44315321)
    So the vast majority of people are recommending to ditch Adobe Acrobat, which is not where I was wanting to focus the discussion, but I appreciate your advice. I do agree that using something like Sumatra would be a good part of a defense-in-depth approach, but that approach does not protect your organisation from inadvertently sending out an infected PDF to another organisation.

    I did not know it was possible to detect javascript in a PDF, and I think this is possibly a better approach than a full rewrite (btw: I found a python script for this). So instead of rewriting every PDF, you just delete any PDF attachments that are detected with JavaScript. I assume this will then not break any legitimate PDFs that have comments or forms, etc.? It will need testing, I guess.

    The mail relay can then be configured to detect and delete any javascript-containing PDFs and allow everything else through (including encrypted, which is more likely to be legit than not). Once again, this is not the only protection against this malicious code, but just one facet. I found some recent exploits that don't need javascript at all, so it seems the safest, yet most likely to make you hated, approach is to rewrite the PDF completely or not allow PDFs at all.
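    A minimal sketch of that relay-side check, assuming the relay can pass each message to a Python hook as an `email.message.EmailMessage` (the hook mechanism, function names, and marker list are hypothetical). It looks at top-level attachments only, including PDFs inside ZIP files, uses a naive substring test that will miss javascript hidden in compressed streams, and treats unreadable or password-protected ZIPs as dangerous; that last choice is a policy decision you may want to reverse.

```python
from __future__ import annotations

import io
import zipfile
from email.message import EmailMessage

# Naive markers: catches plain-text JavaScript keys only, not compressed ones.
MARKERS = (b"/JavaScript", b"/JS")


def pdf_has_js(data: bytes) -> bool:
    return any(marker in data for marker in MARKERS)


def attachment_is_dangerous(filename: str, data: bytes) -> bool:
    name = filename.lower()
    if name.endswith(".pdf"):
        return pdf_has_js(data)
    if name.endswith(".zip"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                return any(
                    info.filename.lower().endswith(".pdf") and pdf_has_js(archive.read(info))
                    for info in archive.infolist()
                )
        except (zipfile.BadZipFile, RuntimeError, NotImplementedError):
            return True  # corrupt, encrypted, or unsupported archive: fail closed
    return False


def strip_dangerous_attachments(msg: EmailMessage) -> list[str]:
    """Drop risky top-level attachments in place and return their filenames."""
    if not msg.is_multipart():
        return []
    kept, removed = [], []
    for part in msg.get_payload():
        filename = part.get_filename()
        data = part.get_payload(decode=True) if filename else None
        if filename and data is not None and attachment_is_dangerous(filename, data):
            removed.append(filename)
        else:
            kept.append(part)
    if removed:
        notice = EmailMessage()
        notice.set_content("Removed attachment(s) containing JavaScript: " + ", ".join(removed) + "\n")
        kept.append(notice)
        msg.set_payload(kept)
    return removed
```

    Replacing the dropped files with a short text part tells the recipient that something was removed, which fits the goal of needing no further action beyond an occasional request to resend.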

  • Ghostscript (Score:5, Informative)

    by nullchar ( 446050 ) on Thursday July 18, 2013 @03:29AM (#44315383)

    I use Ghostscript when attempting to compress a "bloated" PDF (such as generated by Xsane). The input is a PDF, output is a PDF:

    # Use ghostscript to re-write the PDF
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf

    Also handy to combine multiple PDFs into a single document, or copy out certain pages from a PDF:

    # Combine PDFs
    gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=combined.pdf 01.pdf 02.pdf 03.pdf

    # Copy pages 3 & 4 from an existing PDF
    gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=4 -sOutputFile=new.pdf current.pdf

  • CCC (Score:4, Interesting)

    by Warbothong ( 905464 ) on Thursday July 18, 2013 @11:38AM (#44318183) Homepage

    There's an interesting talk from Chaos Communication Camp 2011 about making a verified PDF scanner in the Coq proof assistant.
