Pdf.js Reaches First Milestone
theweatherelectric writes "The pdf.js project aims to implement a PDF viewer using standards-compliant Web technologies. The project has reached its first milestone: it renders the sample PDF (a paper on Mozilla's Tracemonkey JavaScript engine) perfectly. However, that perfection currently comes with some caveats: 'pdf.js produces different results on pretty much every element in the browser×OS matrix. We said above that pdf.js renders the Tracemonkey paper "perfectly" if you're running a Firefox nightly. On a Windows 7 machine where Firefox can use Direct2D and DirectWrite. If you ignore what appears to be a bug in DirectWrite's font hinting. The paper is rendered less well on other platforms and in older Firefoxen, and even worse in other browsers. But such is life on the bleeding edge of the web platform.'"
Why all the complaining? (Score:1)
goal to make things suck? (Score:3, Insightful)
I can understand the use of this to find and fix browser bugs.
But it seems amazingly inferior to a platform native PDF reader, on any platform imaginable. It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.
What's with this trend recently to build everything on fundamentally sucky technologies?
Re: (Score:3)
What's with this trend recently to build everything on fundamentally sucky technologies?
I think it's becoming increasingly obvious that browsers need something that allows native client functionality without the burden of shoehorning everything through JavaScript's loosely typed, garbage-collected, non-addressable world. LLVM is gaining a lot of steam, so perhaps it should be that, with each app seeing a limited API that maps onto the DOM. Perhaps that can even be created from JS, e.g. a vmEval(url, canvas) function that loads bitcode from some url, turns it into an invokable object wh
Re: (Score:2)
I'm suggesting that there should be provision for LLVM bitcode to be compiled and executed natively in the browser. Its only interaction with the outside world would be via exposed DOM APIs, which are already security hardened. A canvas would be its "display", its sockets would map to WebSockets, its file I/O to web storage, and so on. The display could
Re: (Score:2)
Does your PDF reader integrate well with the browser environment?
One of the major benefits of rendering PDFs in the browser, aside from the fact that users don't have to download, trust, and run a separate PDF viewer, is that you reduce the security vulnerability surface area. PDFs (well, Adobe Reader) are a major vector for attacks, but that goes away when you sandbox it in the browser.
I think you might
Re: (Score:3)
If it's a vulnerability thing, then what you really need to do is go over to Adobe and bitch smack the moron there that decided that it was a good idea to include scripting and linking abilities into a document format. And if you choose the Seattle branch you're just a short ways from MS so you can bitch slap the hell out of them for doing the same sort of bullshit with .DOC.
Documents are for reading, if you want people to be able to fill in a form, then they should have to use a separate program. It's just
Re: (Score:2)
Oddly enough, PDF (a descendant of PostScript) has always been more program than data. Like PostScript, a pdf document is a program in a Forth like language that draws the document on a canvas. Adobe's mistake is in letting it out of the sandbox with bolted on extras.
A number of others have managed to implement PDF sandboxes (often without the bolt-ons) without all the holes.
Re:goal to make things suck? (Score:4, Interesting)
But it seems amazingly inferior to a platform native PDF reader, on any platform imaginable. It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.
What's with this trend recently to build everything on fundamentally sucky technologies?
You're absolutely right. A platform native PDF reader is technically superior. But opening up a new window for each PDF you display really sucks as a user experience. To eliminate this sucky UI experience, browsers would have to support PDF natively (I'm not sure why this hasn't happened) rather than rely on Adobe Reader or some other helper application. Even if all the major browsers supported that TODAY, it would be literally years before a broad enough spectrum of people upgraded to use inline PDFs in a design.
What implementing a PDF reader in javascript accomplishes is across the board inline PDFs today. No upgrades required. I think that's worth some sucky technology and inefficient code.
Re: (Score:2)
So are you asserting that a browser window with a PDF document is somehow worse than a PDF viewer's window? Or that a tab is better than a window?
Re: (Score:2)
I'm saying none of that. I'm saying that sometimes it would be very useful to display a PDF inline in the same page, and not have it displayed in another window, or tab. Another poster pointed out this is already possible. I'm not familiar with how well this works, and the limitations of this method. I will say that being able to treat a PDF like any other object and have it be manipulated programmatically would be a huge advantage for some people.
Re: (Score:2)
It would be nice if text-based PDFs interacted with my browser the same way text-based HTML does. So find, save as and other functions would be browser native rather than the kooky half-breed we have today (with a PDF reader stuffed into a browser tab for some docs and, mysteriously, a new pop-up window with a native PDF reader for others).
Re: (Score:2)
But opening up a new window for each PDF you display really sucks as a user experience.
Having "defected" from Win XP to Mac OS X back when Vista was released, it's been many years since I used Windows or Linux for long periods of time, rather than temporarily in VMs for work purposes. Now and then, stories like this, or even entire pieces of technology like the renderer in question, remind me just how awful things still are on other platforms.
Another poster asked if the PDF renderer was integrated into the browser, rather than the OS. What a bizarre question. My PDF renderer is integrated
Re: (Score:2)
It's not a kludge, it's not a bodged add-on, it's an extensible, intelligent, well integrated piece of technology that's part of a wider architecture that makes more sense than any other OS architecture I've seen above kernel level.
Indeed. I came from a long line of AmigaOS based systems, after which I finally "gave up" and got an XP based laptop after being forced through Win2k at work. That drove me mad for a while and after significant playing with Desktop Linux (just never feels "right" to me... very happy with it on my servers, but not my desktop), now I've got mostly Macs in the house.
The system you're referring to here is indeed very simple and elegant. It reminds me a lot of what AmigaOS did with the "datatype" system - Appl
Re: (Score:2)
> But opening up a new window for each PDF you display really sucks as a user experience.
My browser can show PDF in tabs.
But I almost always want PDF-s to open in a new window, full-screen.
Why would I want to read a 200 page report inside a browser?
The same goes for video, I usually want to view it full-screen not embedded in some page in a browser.
Re: (Score:2)
I actually didn't know you could already embed a PDF in a page. I'd guess you can't create your own controls to move to a different page, or manipulate the PDF in other ways, and are reliant upon the helper application.
The point being, viewing a PDF with programmatic controls would allow for a much richer environment than relying on helper applications.
Re: (Score:2)
But it seems amazingly inferior to a platform native PDF reader, on any platform imaginable. It will be slower than native x86/ARM code by far, and won't integrate well with the desktop environment.
Regarding speed, two things: First, this will spend most of its time in calls to the browser's Canvas API, which all browsers implement in C++. So it isn't clear that it should be significantly slower than a native implementation. Second, even if this were in 100% JavaScript, that is just around 5X slower than C++ these days. Rendering PDFs might be plenty fast enough at that speed, since you typically render once then show it for a long time. In other words, this isn't something like a game engine that nee
Re: (Score:2)
Have you tried it? It's fast.
One interesting fact that wasn't called out in the blog post is that since Gecko's canvas is GPU-accelerated on Windows 7, on that platform pdf.js is GPU-accelerated, unlike every other PDF viewer I know of.
Browser developers are doing tons of work to make graphics and JS incredibly fast. pdf.js leverages that.
Re: (Score:2)
It is the PDF language that matters, which is basically a successor to Postscript, not the bloated document reader.
Re: (Score:2)
It's actually rather fast here, as fast as the native reader.
Somehow, it's not as smooth in Chrome as it is in Firefox, however.
Can it be closed? (Score:3)
I currently have PDFs set to be downloaded and opened in an external application, because PDF rendering in a browser tab (using Adobe's PDF plugin) fucks up important shortcuts: Cmd-W no longer closes the tab but throws up an annoying dialog. That alone would be reason enough to switch.
Re: (Score:2)
Small note to webmasters everywhere (if you think about what the parent said): what I hate is websites that force PDF files to be downloads instead of letting my browser handle them. On Mac OS X, viewing a PDF is basically the same as viewing a JPEG. No Adobe reader required, it just works.
Re: (Score:2)
OS X itself handles PDF just fine (Quick Look, Preview, Safari, print directly to PDF from any application that can print, etc).
Re: (Score:2)
That's because OS X's underlying display API is... display PDF! Similar to ye olde Solaris Display PostScript. As a side effect, display and generation of PDFs is trivial - you're outputting to a file rather than to the rasterizer.
It's also the reason why PDFs are trivially displayed in iOS as well - again, being based on OS X means it also inherits display PDF.
Re: (Score:2)
The problem is that the web site incorrectly specifies the file's MIME type as e.g. "Content-Type: text/html" instead of "Content-Type: application/pdf". While in theory the ".pdf" extension or content inspection could be used to guess it, Firefox (for example) does not use MIME type guessing since it is a security issue: What should Firefox do with this file? [mozillazine.org].
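To make the header fix concrete, here is a minimal sketch of how a server might choose the Content-Type from the file extension. The names `MIME_TYPES` and `contentTypeFor` are hypothetical and the table is deliberately tiny; this is an illustration, not any particular web server's API.

```javascript
// Hypothetical lookup table: extension -> MIME type (illustrative, not exhaustive).
const MIME_TYPES = {
  ".pdf": "application/pdf",
  ".html": "text/html",
  ".png": "image/png",
};

// Return the Content-Type header value a server should send for a path.
// Unknown extensions fall back to a generic binary type rather than text/html,
// so the browser is never told that a PDF is an HTML page.
function contentTypeFor(path) {
  const dot = path.lastIndexOf(".");
  const ext = dot === -1 ? "" : path.slice(dot).toLowerCase();
  return MIME_TYPES[ext] || "application/octet-stream";
}
```

With the response labelled "Content-Type: application/pdf", the browser can hand the file to whatever PDF handler the user has configured instead of forcing a download.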
Re:Can it be closed? (Score:4, Informative)
I have it disabled since it's buggy and it's a huge security risk.
Re: (Score:1)
a) Google Quick View (my favourite),
b) Chrome's built-in PDF viewer (which is fast, and doesn't crash often, and doesn't hang everything while the PDF is being downloaded.), or
c) Foxit's plugin (very rarely),
depending on the browser and OS being used. But I tried it out, and though the rendering was horrible (Chromium daily on Natty), it didn't seem to hang or ask anything on being closed. The slide-out sidebar was neat, but the open file button didn't do anything.
Re: (Score:2)
I was hoping somebody around here might explain the point of opening PDFs embedded in the browser. Instead, your post just confirms my own prejudices. The PDF plugins that I've seen trade off screen space for another toolbar, restrict the functionality over standalone PDF viewer, and break the browser's UI. Chrome's handling of PDF was the single reason I ditched it after a few weeks last year when I tried to switch to it from Firefox (even set to open the PDF viewer was broken as it didn't seem to pass
"older Firefoxen"?? (Score:1)
So, "Firefoxen" is now the plural of Firefox?
Re: (Score:1)
Emacs --> emacsen; ergo, firefox --> firefoxen. Now, let's go back to comparing Officen and OSen.
Re:"older Firefoxen"?? (Score:4, Insightful)
No. The plural of "fox" is "foxes".
If someone can't use the English language correctly, how seriously do you expect me to take anything they write?
Re: (Score:2, Insightful)
Despite your relatively low uid, you must be new to hacker slang.
from the Jargon File [catb.org]:
On a similarly Anglo-Saxon note, almost anything ending in ‘x’ may form plurals in ‘-xen’ (see VAXen and boxen in the main text). Even words ending in phonetic /k/ alone are sometimes treated this way; e.g., ‘soxen’ for a bunch of socks. Other funny plurals are the Hebrew-style ‘frobbotzim’ for the plural of ‘frobbozz’ (see frobnitz) and ‘Unices’ and ‘Twenices’ (rather than ‘Unixes’ and ‘Twenexes’; see Unix, TWENEX in main text). But note that ‘Twenexen’ was never used, and ‘Unixen’ was seldom sighted in the wild until the year 2000, thirty years after it might logically have come into use; it has been suggested that this is because ‘-ix’ and ‘-ex’ are Latin singular endings that attract a Latinate plural.
Now get off my lawn.
Re: (Score:2)
That's as may be, but *I* am not new to hacker slang, and frankly, it sounds stupid.
Yes, the plural of ox is oxen. No, the plural of box is not boxen, nor is foxen the plural of fox.
Now you get off *my* lawn.
Re: (Score:2)
Actually proper English dictates that with Emacs and Firefox you'd need a partitive, making it versions of Emacs or versions of Firefox.
The one true markup (Score:3, Interesting)
Re: (Score:2)
Are you asking if there's really a C-to-JavaScript translator? If so, then yes:
https://github.com/kripken/emscripten/wiki [github.com]
Re: (Score:2)
Are you sure you want to do that? I can understand typesetting math in the browser, but typesetting entire TeX documents?
There's already an AMS-endorsed way of typesetting TeX math (JavaScript-based) called MathJax (http://www.mathjax.org/), and it works pretty well (well enough for sites like http://mathoverflow.net/ [mathoverflow.net]).
Re: (Score:2)
But TeX/LaTeX works by having a fixed page size. While one can make the vertical height large enough to accommodate the page, how do you adjust the width if a user resizes the browser?
What's the point? (Score:1)
I'm the one who finds this "We do all things now in the browser" highly suspect. I already have a perfectly fine Pdf viewer, called Okular. Why not just give me a link to the Pdf file, so that I can download it, use my favorite Pdf-Viewer and print it out if I like?
I would really appreciate their efforts, but I just know this is not done for my convenience, but for corporate interests. The only reason this is developed is so that they can put some Ads inside the Pdf file, prevent me from downloading it or pr
Re: (Score:2)
I already have a perfectly fine Pdf viewer, called Okular.
From Okular's web site [kde.org]: "For Windows have a look at the KDE on Windows Initiative webpage for information on how to install KDE on Windows." The download page on KDE Windows Initiative [kde.org] links to detailed installation instructions [kde.org]. I'm not in a position to try it myself because the PC on which I'm typing this has integrated graphics, which isn't enough to run KDE according to a forum post [google.com] linked from a Google search for kde system requirements.
Re: (Score:2)
I'm not in a position to try it myself because the PC on which I'm typing this has integrated graphics, which isn't enough to run KDE according to some idiot who doesn't know what he's talking about.
Fixed that for you. KDE 4 works perfectly with integrated graphics, you just have to turn desktop effects off. It's perfectly usable without desktop effects enabled, all applications detect it and degrade gracefully, and all the controls etc. work pretty much the same. I have a laptop with integrated graphics that doesn't support desktop effects, and I don't notice the difference apart from once a week or so when I suddenly wonder why my terminal emulator doesn't have a transparent background.
This graceful
Re: (Score:2)
Oh, and I also use Okular on Windows. It works quite nicely.
Re: (Score:2)
Is this a troll, or did you just spectacularly miss the point? That forum post is fairly obviously about the system requirements for the KDE equivalent of Aero Glass or whatever it's called these days...
Re: (Score:1)
The end game is that shifting focus from desktop applications to cloud applications makes the desktop operating system much less important.
Envisage a day when you don't need to run a particular operating system just so you can run that one specific app.
this might sound over the top - but i am sure that given time we will be able to play the new "Crysis" (whatever that might be) in the browser on any operating system. (of course there will still likely be some beefy hardware requirements and a juicy broadband). Although im fairly con
tinySVG support preferred. (Score:2)
pdf is old vector graphics news. If they want to help a parky [google.com] out they can get TinySVG support built in to Firefox so I can finish rebuilding all of my XUL UIs in SVG. ...that don't work now unless the user knows how to re-enable support then ends up getting owned instead of a warning like getting a self signed cert... Cough. Sorry. Oh while I'm dreaming, getURL, putURL, and parseXML functions so I don't have to "if (typeof parseXML == 'undefined')" override them every time would be nice too :) Oh and t
Carts before horses (Score:2)
I find it quite hilarious that people speak seriously about coding artificial intelligence as if it will happen in this decade, when at the same time we can't even achieve a consistent rendering of the same elements in different browsers.
Re: (Score:2)
In Soviet Russia, Windows runs away from AI!
Fun fact: (Score:5, Informative)
Re: (Score:2)
And no malware has ever been transmitted through the browser, so another problem brilliantly solved!
Now if only we could remove the OS and only have a browser that does exactly what the OS did but under a different name, the world would be a far better place...
Over reach much? (Score:3)
This is just silly. While I can appreciate it from a point of curiosity and it is probably a fun project, this is really overloading the browser.
I would submit that things like this are actively breaking the browser paradigm. Every PDF viewer allows you to save a local copy of the PDF after they have read it from the temp directory or the download directory. To implement this thing correctly would require that JS have direct access to the file system, which as I understand it, ain't fucking supposed to happen, since that would create untold numbers of security problems in a system already plagued by security problems.
While there may be arguments that this would be ok, they would all be moronic.
The entire notion of the browser needs to be forked out to an application shell with hard as nails security and a presentation shell and never the twain shall meet.
Re: (Score:1)
To implement this thing correctly would require that JS have direct access to the file system, which as I understand it, ain't fucking supposed to happen
Too late. [w3.org]
The entire notion of the browser needs to be forked out to an application shell with hard as nails security and a presentation shell and never the twain shall meet.
What a novel [chromium.org] idea [webkit.org]!
Re: (Score:2)
Next step: implementing gecko or webkit in javascript, so that developers have the same experience everywhere.
In fact, now that I'm thinking of it, that would not be such a bad idea.
Nice way to open even more holes in the browser (Score:2, Funny)
Direct2D and DirectWrite? Sorry but browser graphical acceleration must end.
WebGL should be implemented with no hardware acceleration, using graphic card emulation.
Re: (Score:2)
Neither are related to WebGL specifically. They're used for much more mundane things such as Canvas rendering.
Re: (Score:2)
> Sorry but browser graphical acceleration must end.
You can have two of the following three:
1) Scrolling that doesn't feel like molasses.
2) Box blur and shadow effects of various sorts.
3) Software rendering backends.
You're presumably voting for #1 and #3, but web designers are voting with their fingers for #2, so a browser's options at that point are limited to either not supporting those and reaping the web-developer hate consequences (cf. IE6) or dropping either #1 or #3. Which one do you think they're
It's not doing the rendering (Score:2)
The Javascript code isn't doing the rendering of text. It uses dynamically loaded fonts and lets the platform's own font renderer render the glyphs. The Javascript code isn't pushing pixels.
There's less need for PDF than there used to be, now that you can download fonts in the browser. It might be worthwhile to take this PDF viewer and turn it into a server-side PDF to HTML translator.
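The division of labour described here can be sketched in a few lines. This is an illustration only, not pdf.js's actual code: `drawTextRun` and the shape of `run` are hypothetical, and the sketch assumes the font has already been loaded into the page (for example through an @font-face rule). The only real APIs used are the canvas 2D context's `font` property and `fillText` method.

```javascript
// Draw one run of text by delegating to the canvas 2D context.
// `run` is a hypothetical shape: { text, x, y, fontName, fontSize }.
// The script only chooses the font and position; the browser's own
// font machinery rasterizes the glyphs inside fillText().
function drawTextRun(ctx, run) {
  ctx.font = `${run.fontSize}px "${run.fontName}"`;
  ctx.fillText(run.text, run.x, run.y);
}
```

In a browser this would be called with `canvas.getContext("2d")` as `ctx`. The script never touches individual pixels, which is why rendering quality depends so heavily on the platform's font renderer.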
Re: (Score:2)
Thank you for saying that. It seems to me that displaying an HTML file isn't much different than displaying a PDF file. At a high level the program reads the description (e.g. this font size, put the text here, etc.) and hands it off to some renderer.
Re: (Score:2)
The Javascript code isn't doing the rendering of text. It uses dynamically loaded fonts and lets the platform's own font renderer render the glyphs. The Javascript code isn't pushing pixels.
There's less need for PDF than there used to be, now that you can download fonts in the browser. It might be worthwhile to take this PDF viewer and turn it into a server-side PDF to HTML translator.
OS X is Display PDF and 10.7 is OpenGL 3.2 system-wide. PDF is extremely lightweight on OS X.
But why? (Score:2)
But WHY? Why spend precious cycles that eat battery life and heat up your PC innards doing the same thing through twenty layers of twisted human logic, that a piece of native runtime plugin code can do as well? 'Plugin' is just a word, it doesn't need to be insecure, alien, or buggy. And even if they are that, the problem lies at another level.
If anything, Pdf.js will be suitable where and when energy and resource conservation isn't a factor.
As for me, I prefer to avoid all the extra layers of abstractions
Re: (Score:2)
Please enlighten me, a software developer of many years, what is this gold that is Pdf.js? I mean, apart from proof-of-concept being gold in itself.
Gold is just a bit of an overstatement. More like a valuable, but not precious metal like copper.
The value is that you can display a PDF inline with the website, rather than bringing up a clunky external application like adobe reader. "But you could do that with a plugin!" you say? Correct, but what you can't do with a plugin is actually get most people to in
Re: (Score:2)
> that a piece of native runtime plugin code can do
> as well
Because in some environments you can't drop in native runtime plugin code.
Or put another way, where's that 64-bit Flash for Linux? Or heck, for that new architecture that people will come up with tomorrow?
Whereas this way porting a web browser (a must for a new consumer-facing architecture anyway) will get you a PDF renderer for free.
> 'Plugin' is just a word, it doesn't need to be
> insecure, alien, buggy
A PDF-renderer plug-in can be s
Re: (Score:2)
This is a great point. With the current scarcity of precious computer cycles left in the world why would we waste them on this? The latest government report on general availability of cycles estimates that we will be running out in the next 50 years. Just think of that - a world with no computer cycles remaining! In fact we'll feel the crunch far before that as the scarcity drives up computer cycle prices. Every cycle needs to be preserved and only used for productive purposes!
This post made using as f
Firefoxen? (Score:2)
Re: (Score:1, Informative)
Goatse
Re: (Score:1)
under non-free Licence
looks very open to me... (congrats, 'twas a while ago I was goatsed the last time)
Re: (Score:2)
(congrats, 'twas a while ago I was goatsed the last time)
I hope you mean saw that picture!
Re: (Score:2)
Umm, how is parent a troll for posting a link to the actual thing the top parent spoke of? He's not.
Re:Wow (Score:4, Interesting)
If you read the article (I know, I know)...
pdf.js has now reached the point where a significant portion of its issues are actually browser-rendering-engine bugs, or missing features. Finding these gaps and filling some of them has been one of the biggest returns on our investment in pdf.js so far.
The problem isn't what they've written so much as the browsers not being able to support the latest and greatest HTML5/JS functionality.
Re: (Score:2)
ha thats a ghost argument
"see my infinity energy machine works perfectly fine, its just the laws of physics have not caught up yet"
does not sound like any less bullshit
Re: (Score:2)
Except that you can expect the missing browser functionality to be added, or in the case of open source browsers potentially add it yourself.
Changing the laws of physics is a rather different matter.
Re: (Score:2)
Changing the laws of physics is a rather different matter.
Black holes do it all the time! :p
Re:Wow (Score:5, Insightful)
They've actually failed to grasp the point of PDF. You might as well go back to HTML if your PDF reader can't render the same everywhere considering that was the whole point of PDF to begin with.
Re: (Score:3)
I'm guessing you're modded Funny because - sadly - this hasn't been true of PDF in a long time now.
Even between desktop PDF readers there's now too much of a difference to even remotely be able to 'rely' on it. Even bitmaps are getting less and less reliable with applications choosing to either respect or ignore gamma tags, let alone color profile information.
As it is, I used to use FoxIt, but that started to get bloaty and including oddball toolbars. So I switched to PDF-Xchange. I'm about to switch aga
VM / Sandbox (Score:1)
> Either way, it looks like Adobe Reader will have to remain installed for when these alternatives don't quite get things right.
You might consider installing a VM like VirtualBox or some kind of sandboxing solution so that you can convert / print / export and subsequently erase any "side effects".
Memory and surplus CPU power is getting cheaper and cheaper, I don't understand why more people don't talk about going this route.
Re: (Score:2)
Back when I still used Windows, Acrobat Reader was an absurdly large app and by far the slowest PDF reader I knew over all platforms. It always struck me as absurd that Apple and Linux users had built-in, capable, lightweight PDF viewers while most Windows users used that bloated POS. Maybe acroread is better these days, but I kind of doubt it.
Re: (Score:2)
Reader 10 is still a large download but it loads much faster than, say, Reader 8 or 9.
I believe it is mainly because they don't preload plugins anymore.
Re:Wow (Score:5, Informative)
Or perhaps you've failed to grasp the point of a v0.2 pre-release on github? In fact TFA specifically states that pixel perfect rendering _is_ their goal.
The blog post describes the current progress; it now has good rendering on one platform, progress from last week.
Re: (Score:2)
the problem though is that they will never be able to test it on more platforms than those that have native pdf rendering.
Wrong. The test suite compares the canvas rendering against reference images (potentially generated on a different device).
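A reference-image comparison of this kind might look roughly like the following. This is a sketch under assumptions, not the project's real test harness: `countDifferingPixels` and its `tolerance` parameter are hypothetical, and the two buffers are assumed to be flat RGBA arrays of the kind returned by a canvas context's `getImageData(...).data`.

```javascript
// Count pixels that differ between two RGBA buffers of equal size.
// `tolerance` is the largest per-channel difference still treated as equal.
function countDifferingPixels(actual, reference, tolerance = 0) {
  if (actual.length !== reference.length) {
    throw new Error("images must have the same dimensions");
  }
  let differing = 0;
  for (let i = 0; i < actual.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(actual[i + c] - reference[i + c]) > tolerance) {
        differing++;
        break; // count each pixel at most once
      }
    }
  }
  return differing;
}
```

A test passes when the count is zero. Because the reference image is just stored pixel data, it can come from any machine, which is the point being made above.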
Re: (Score:2)
I *think* it's supposed to be:
We said above that pdf.js renders the Tracemonkey paper "perfectly" if you're running a Firefox nightly on a Windows 7 machine where Firefox can use Direct2D and DirectWrite, if you ignore what appears to be a bug in DirectWrite's font hinting.
Re: (Score:1)
They cut the sentences into separate clauses to emphasise each of them ...
Re: (Score:2)
If it renders it all as images, then why go to the trouble of making this client-side?
Render as images on the server once and your problems are over for all platforms and browsers. Unless, of course, you do something stupid like using the BMP image format instead of PNG.
Re: (Score:1)
Why not use a vector format which can properly scale the text and even include the text in a format readable to search engines?
Like, for example, PDF?
Re: (Score:2)
Render as images on the server once
"The document could not be displayed because you are not connected to the Internet."
Re: (Score:2)
You can't do those things with most PDF files anyway.
Re: (Score:2)
The big downside is that it's all images and you can't do all those fancy things you can do with text. Like select, copy & search.
I'm working on it. To get text out of pdf.js as is, you just implement a TextGraphics object (like their existing CanvasGraphics one) and just implement the text and coordinate transform commands. There are lots of ways of getting that into a copy/pasteable form afterwards, but it's early days and I'm just coding up the OCR-ish algorithms needed to infer reading order from non-tagged PDF (the most common case).
I'm not associated with the project, but this is on their todo list too, and someone else might get i
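As a rough sketch of the idea in this comment, a text-only backend could record each string together with its transformed position and sort afterwards. Everything below is hypothetical: the method names (`transform`, `showText`, `textInReadingOrder`) are illustrative and are not claimed to match pdf.js's CanvasGraphics interface, and the reading-order rule is a naive stand-in for the "OCR-ish" heuristics mentioned above.

```javascript
// Hypothetical text-only backend: records text and where it lands on the page
// instead of painting it. Method names are illustrative, not pdf.js's API.
class TextGraphics {
  constructor() {
    this.ctm = [1, 0, 0, 1, 0, 0]; // current transform [a, b, c, d, e, f]
    this.items = [];
  }

  // Concatenate a new matrix in front of the current transform,
  // so the new matrix is applied to points first (PDF "cm" style).
  transform(a, b, c, d, e, f) {
    const [a2, b2, c2, d2, e2, f2] = this.ctm;
    this.ctm = [
      a * a2 + b * c2,
      a * b2 + b * d2,
      c * a2 + d * c2,
      c * b2 + d * d2,
      e * a2 + f * c2 + e2,
      e * b2 + f * d2 + f2,
    ];
  }

  // Record a string at user-space (x, y), mapped into page space.
  showText(str, x, y) {
    const [a, b, c, d, e, f] = this.ctm;
    this.items.push({ str, x: a * x + c * y + e, y: b * x + d * y + f });
  }

  // Naive reading order: top to bottom, then left to right.
  // Assumes PDF-style coordinates where y grows upward.
  textInReadingOrder() {
    return [...this.items]
      .sort((p, q) => q.y - p.y || p.x - q.x)
      .map((item) => item.str)
      .join(" ");
  }
}
```

Sorting by position only works for simple single-column layouts; multi-column pages and rotated text would need the more careful inference the comment describes.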
Re: (Score:2)
https://github.com/andreasgal/pdf.js [github.com] (It's BSD licensed, minus the credits clause).
Who's the asshole now?