WEB Advent 2012 / PhantomJS

My, that’s a pretty web browser you’re using, concurrently making requests to various servers, interpreting and rendering the nearly indecipherable HTML and CSS. It’s a fantastic piece of technology, yet for developers it has a fatal flaw. You. Browsers need users, and when it comes to testing, this is a major drawback when working on today’s complex web apps.

PhantomJS is a rather new kid on the block. It’s a headless WebKit-based browser. It’s statically built against Qt, so it works across platforms (Windows, Linux, and Mac OS X downloads are available), and it has minimal dependencies. The great part about it being headless is that you don’t need X installed to use it, so it’s suitable to install on your servers. In fact, I’ve already installed it on 80 of our servers powering Where’s It Fast.

Once you’ve got everything installed, it’s time to give the examples a whirl. Load speed is easy and interesting: ./bin/phantomjs ./examples/loadspeed.js http://webadvent.org/ yields:

SyntaxError: Parse error

Page title is Web Advent 2012
Loading time 610 msec

It takes 610 msec for PhantomJS to load the page inside its headless browser. Note that this isn’t a basic request with cURL to simply grab the index document; it’s “rendering” the HTML, grabbing the style sheets, images, &c. The “Parse error” we’re seeing is the output of the JavaScript engine erroring out as it grabs a misconfigured resource. Whether you prefer the JavaScript or CoffeeScript source, the code involved is almost disappointingly trivial: record the current time and attempt to open the page; on success, subtract the start time from the current time and print the result; on failure, print the appropriate message. That so few lines of code were required to return such a useful monitoring metric really pushed me to look harder into PhantomJS.
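The steps just described can be sketched roughly as follows. This is a minimal illustration rather than the bundled loadspeed.js itself: the helper name timeLoad and its report callback are hypothetical, while require('webpage').create(), page.open(), page.evaluate(), and phantom.exit() are PhantomJS calls. The PhantomJS-specific part is guarded so that the helper can also be exercised with a stand-in page object.

```javascript
// Hypothetical helper: times how long `page` takes to open `address`.
// `page` is anything with PhantomJS-style open(url, callback) and evaluate(fn).
function timeLoad(page, address, report) {
    var start = Date.now();                      // record the current time
    page.open(address, function (status) {       // attempt to open the page
        if (status !== 'success') {
            report('FAIL to load the address', null);
            return;
        }
        var elapsed = Date.now() - start;        // current time minus start time
        var title = page.evaluate(function () {  // runs inside the loaded page
            return document.title;
        });
        report(null, { title: title, msec: elapsed });
    });
}

// Only runs under PhantomJS, where the global `phantom` object exists.
// Assumed usage: phantomjs <this script> <URL>
if (typeof phantom !== 'undefined') {
    var address = require('system').args[1];
    timeLoad(require('webpage').create(), address, function (err, result) {
        if (err) {
            console.log(err);
        } else {
            console.log('Page title is ' + result.title);
            console.log('Loading time ' + result.msec + ' msec');
        }
        phantom.exit();
    });
}
```

Passing the page object in, instead of creating it inside the helper, is purely a design choice for this sketch; it keeps the timing logic separate from the PhantomJS globals.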

For a more in-depth look at the resources it’s grabbing, the netsniff.js example is perfect. It similarly retrieves the requested page, but it also returns detailed information on each resource that it requests. Take a look at its log from requesting webadvent.org.
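To give a feel for how that kind of per-resource logging can be collected, here is a small sketch built on the onResourceRequested and onResourceReceived callbacks that PhantomJS page objects expose. It is not the netsniff.js source; the collectResources helper and the shape of the records it builds are assumptions made for illustration.

```javascript
// Hypothetical helper: records every resource a PhantomJS-style page requests.
// Returns an object keyed by request id; records are filled in as events arrive.
function collectResources(page) {
    var resources = {};
    page.onResourceRequested = function (request) {
        resources[request.id] = {
            url: request.url,
            method: request.method,
            requestedAt: request.time,   // a Date supplied by PhantomJS
            status: null,
            finishedAt: null
        };
    };
    page.onResourceReceived = function (response) {
        // PhantomJS reports a 'start' and an 'end' stage; only 'end' marks completion.
        if (response.stage !== 'end' || !resources[response.id]) {
            return;
        }
        resources[response.id].status = response.status;
        resources[response.id].finishedAt = response.time;
    };
    return resources;
}

// Only runs under PhantomJS. Assumed usage: phantomjs <this script> <URL>
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    var seen = collectResources(page);
    page.open(require('system').args[1], function () {
        Object.keys(seen).forEach(function (id) {
            var r = seen[id];
            var msec = r.finishedAt ? (r.finishedAt - r.requestedAt) : 'n/a';
            console.log(r.status + ' ' + msec + ' msec ' + r.url);
        });
        phantom.exit();
    });
}
```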

Here, we get some basic app info, followed by the details on the page, including the pageTimings variable — it took only 393ms for the browser to hit onLoad (the Wi-Fi must have improved). What follows is an entry for each resource requested, including both request and response, as well as various timing details. This far more granular view of the page allows us to not only ensure that it loads quickly, but also determine which resources took the longest. Using startedDateTime as well as the timing information, it’s possible to replicate the detailed page load waterfall you’d receive from Firebug or similar tools.
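As a rough sketch of that waterfall idea, the functions below take HAR-style entries (each carrying startedDateTime and a total time in milliseconds, which is the shape netsniff.js emits) and work out when each resource started relative to the first one, and which took the longest. The function names and the sample data are made up for illustration; the numbers are not real measurements of any site.

```javascript
// Hypothetical helper: turns HAR-style entries into waterfall rows.
function buildWaterfall(entries) {
    var starts = entries.map(function (e) {
        return Date.parse(e.startedDateTime);
    });
    var first = Math.min.apply(null, starts);
    return entries.map(function (e, i) {
        return {
            url: e.request.url,
            offset: starts[i] - first,   // msec after the first request began
            duration: e.time             // msec the resource took in total
        };
    }).sort(function (a, b) {
        return a.offset - b.offset;      // earliest request first
    });
}

// Hypothetical helper: the row with the largest duration (needs at least one row).
function slowest(rows) {
    return rows.reduce(function (worst, row) {
        return row.duration > worst.duration ? row : worst;
    });
}

// Made-up sample entries, not taken from a real log.
var sample = [
    { startedDateTime: '2012-12-01T10:00:00.120Z', time: 200, request: { url: 'http://example.invalid/style.css' } },
    { startedDateTime: '2012-12-01T10:00:00.000Z', time: 90, request: { url: 'http://example.invalid/' } },
    { startedDateTime: '2012-12-01T10:00:00.130Z', time: 45, request: { url: 'http://example.invalid/logo.png' } }
];

var rows = buildWaterfall(sample);
rows.forEach(function (row) {
    console.log('+' + row.offset + ' msec, took ' + row.duration + ' msec: ' + row.url);
});
console.log('Slowest: ' + slowest(rows).url);
```

Each row’s offset and duration are the two numbers needed to draw one bar of a waterfall chart: where the bar starts and how long it is.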

I find these figures tremendously interesting and exciting. A large body of research has shown that the speed at which your page loads and becomes usable is tremendously important. Faster pages equal more sales. Being able to automate monitoring of these full-page statistics provides an additional layer of assurance that pages are working the way their provider expects. These values are often easily hidden when it’s your static JavaScript server slowing things down, rather than your main dynamic server or your database.

But wait, there’s more!

PhantomJS is also capable of rasterizing your page and spitting out a PDF for your viewing pleasure. This site uses a second (and appropriate) CSS file for printing, so I’ll use my own site for comparison here. The PDF that PhantomJS generates (with ./bin/phantomjs ./examples/rasterize.js http://wondernetwork.com/ wondernetwork.pdf letter) can be compared directly with wondernetwork.com. Apart from needing to do some work to better support narrow browsers (my bad), I’m incredibly impressed with how well that worked. Generating thumbnails of pages was cutting-edge stuff very recently (and let’s be honest, generating PDFs programmatically is almost always a pain).
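The heart of that rasterizing step is small. The sketch below is a simplified illustration rather than the bundled rasterize.js: renderPage is a hypothetical helper, and the viewport size and margin are arbitrary choices, while viewportSize, paperSize, and render() are properties and methods of PhantomJS page objects.

```javascript
// Hypothetical helper: opens `address` and renders it to the file `output`.
// When `paperFormat` is given (for example 'letter'), the page is laid out for paper.
function renderPage(page, address, output, paperFormat, done) {
    page.viewportSize = { width: 1024, height: 768 };   // arbitrary window size
    if (paperFormat) {
        page.paperSize = { format: paperFormat, orientation: 'portrait', margin: '1cm' };
    }
    page.open(address, function (status) {
        if (status !== 'success') {
            done(new Error('Unable to load ' + address));
            return;
        }
        page.render(output);   // the file extension selects the output format
        done(null);
    });
}

// Only runs under PhantomJS.
// Assumed usage: phantomjs <this script> <URL> <output file> [paper format]
if (typeof phantom !== 'undefined') {
    var args = require('system').args;
    renderPage(require('webpage').create(), args[1], args[2], args[3], function (err) {
        if (err) {
            console.log(err.message);
        }
        phantom.exit(err ? 1 : 0);
    });
}
```

Giving the output file a .png extension instead of .pdf would produce an image, which is one way to get the page thumbnails mentioned above.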

Further reading

Beyond these examples (and the clear, though unspoken, implication that you can edit the rather clear code to do more), there have also been a few things built on top of PhantomJS to further utilize this headless web browser. In particular, I’d like to call your attention to the friendly CasperJS, which eases scripting and testing with PhantomJS. Its site documents a simple way to iterate over some Google search results, possibly to ensure your continued ranking, but I’m more interested in the testing aspect. I’ve explored using Node.js for testing my JavaScript in the past, but was never really happy with how it required me to change how I was handling things (since I wasn’t deploying with Node.js). With CasperJS, I’m able to test my pages the way I’ve written them, without any modifications or other chicanery.

All in all, I think PhantomJS is a fantastic tool to add to your toolkit. Monitoring, imaging, and testing, all in one package.
