Everyone wants a fast web site. Making one is a bit more difficult. Profiling code during development is easy, thanks to the excellent Xdebug extension (which also provides handy debugging tools), but this still leaves us blind in production, where we care the most. It’s also frequently more than a little different than the development machine.
Facebook faced a similar problem in their pre-HipHop days, and developed XHProf, a profiler lightweight enough to run on your production servers. It still adds load to the server when it runs, but the tradeoff is worth it, because you’ll know why your pages are slow. XHProf can record different levels of detail about your app for different levels of performance sacrifice. I generally run it full out, because I prefer the highest level of detail, and I accept the performance hit.
XHProf will need to be compiled and configured on your
server. This is currently trivial on most Linux-like servers. A Windows version
is in the works (it already works apart from
returning zero for the number of CPU ticks). Once the extension is installed
and confirmed through phpinfo()
, you’re off to the races. It’s
also capable of showing you a call graph if you have Graphviz
installed. While you’re playing on the server, you may want to look for this.
The included GUI works, but it only provides information on a run-by-run basis, and it uses serialized data on disk for storage. I’d heartily recommend XHGui for your UI needs.
XHGui includes two components, a data recorder (in the
/external
directory) and a UI for display (
/xhprof_lib
and /xhprof_html
).
The recorder is used to actually profile your page loads. The easiest way
to do this is to set PHP to automatically prepend
header.php
and append the footer.php
.
This can be trivially accomplished with two Apache
directives for a given virtual host:
php_value auto_prepend_file "/var/www/xh.example.com/external/header.php"
php_value auto_append_file "/var/www/xh.example.com/external/footer.php"
The UI directories contain a config file (in /xhprof_lib
).
You’ll need to copy config.sample.php
to config.php
and make the appropriate changes. There are a bunch of settings to be aware of:
exceptionURLs
- These URLs will never have the link to the result page displayed at the bottom, even for administrators. This is useful for URLs that return non-HTML output, such as images, videos, XML, &c.
exceptionPostURLs
- The app records GET, POST, and cookie information when profiling a page. This is useful in determining exactly why a given code path was used. For example, it may reveal that a user was logged in, or submitted a form comment. POST data, however, will occasionally include privileged information such as login credentials. The URLs listed here will prevent saving their POST data.
controlIPs
- The ability to turn profiling on or off and to display the link to profile results is restricted to users accessing the page from within a control IP. Add IPs here to give yourself control. When you’re in the controlIP list, you’ll be able to turn profiling on for all your requests by appending
?_profile=1
to a URL (you’ll be given a cookie and redirected back to the original page); a link to the results for that run will also be appended to the page. weight
- One of every
$weight
requests will be profiled.
There are two functions within the configuration file that are designed to be edited on each install:
_urlSimilator
- This function accepts the URL of a requested resource. You should collapse URLs in such a way that functionally-identical endpoints return the same value. For example, if
/blog/?post=25
and/blog/?post=26
are just two blog posts, you can probably assume that these URLs follow an identical path within PHP. The only difference is a parameter passed to the database (or, hopefully, memcached). Supply the_urlSimilator
with as much intelligence as you can to help it collapse these URLs. _aggregateCalls
- This function controls the pie chart within a specific run. The goal is to collapse pie slices together to give the developer a better idea of how certain sections of your app are doing. For example, you may choose to collapse
mysql_*
function calls toDatabase
, this will then be shown as a single slice on the pie chart (full details will still be available within the data table). The sample function collapses everything starting withmysql_
tomysql
, andload::
toLoading
. Adding your own basic rules is pretty easy.
The last step is to follow a rather ridiculous process and rename the
file associated with your chosen database system in
/xhprof_lib/utils/
to be simply xhprof_runs.php
.
Patches accepted. :-)
The UI
The main page of the UI will show you some basic stats on the 25 most recently profiled runs. This page is interesting if you’re looking for a recent event. Beyond that, it’s just a jumping-off point. It’s worth noting that the URLs are purposefully hackable; you can view the last 273 requests by modifying the URL. If you’re running XHProf against multiple sites, or on multiple servers, you can also tweak those parameters here.
Low-hanging fruit
The easiest place to start if you just want to make things faster is likely the “hardest hit” tab. It will share details on how many requests have been made for which URLs, how long these requests took (on average) to load, and how much time in total was spent loading them. A frequently-loaded page — even if it doesn’t take that long — is a good candidate for improvement, lowering the load on your server. Clicking on the URL will display runs against that URL. Selecting any one of them will bring you to the profile results for that request.
The pie chart that appears on the run request page is a great
place to start; it lists the most expensive functions (by exclusive wall
time). Hopefully this gets you pointed in the right direction. The sortable
table at the bottom of the page should give you even more detailed
statistics on exactly what your app is up to. I’ve found several
situations where simple functions were placed inside a loop, repeatedly
rechecking the length of a string or for the presence of a character.
strlen()
may not take long once, but on the 3,000th
execution, you might start to care. Clicking on a function call should
reveal which functions call it and what it calls (parent and child,
respectively).
Most-expensive and longest-running
These tabs are incredibly attractive, and you may want to tie yourself to the mast to avoid their siren song. The main failing of this display is that it’s incredibly easy for it to fill with aberrations rather than the norm. Over 24 hours, it’s easy for random (expensive) admin actions, back-catalog search results, and expensive actions like new signups or revocations to add up, especially on a busy site. All of those fail to account for pages that load when other things could be taxing the system (like a backup script or a large file migration). Where the “Hardest Hit” tab aggregates by URL and displays totals, this just looks at those single runs. So, while this tab does display useful information on expensive pages, make sure you look at the stats for specific URLs before spending too much time here. Remember that the page for a specific run will give you the min, max, average, and 95th percentile data as well. That last statistic is incredibly helpful when trying to determine if a page is regular a poor performer, or if those bad runs are just anomalies.
Scripts that I find within these sections generally make poor decisions when it comes to data access, such as queries on columns that haven’t been indexed (or have been indexed poorly), API access running on page load (rather than batched, or handed off to something like Gearman), or on-demand image manipulation (like thumbnail generation).
Most RAM
While this table fails to aggregate results just like “Most Expensive” and “Longest Running”, it does have the advantage of being less prone to background server tasks (those wont affect RAM consumed).
Frequently, I’ll find scripts in this tab that are unnecessarily doubling their RAM requirements, or legacy code doing something silly with triplicates of database results in different variables.
Last trick
Each request has a unique identifier; note one, then load up another. By pasting the first request’s ID in the delta box, you can get a comparison of the two runs. This is particularly helpful if you’re trying to determine how some changes have affected performance.
Effect of opcode caches on XHProf
The standard PHP page load consists of Zend Compile turning your PHP into opcodes, and then Zend Execute running them. Opcode caches allow the first step to be skipped. Since XHProf is started with a PHP function call, the Zend Compile step has passed, whether it was omitted or not. That said, efficiencies added by optimizing the opcodes may appear in your results.