WEB Advent 2009 / You Don’t Need All That

I’ve been writing PHP for a long time. I am not one of its dinosaurs, but I’ve been making it do my bidding since PHP 3. I have also seen a lot of trends come and go. As an example, PHP 4 was all the rage for optimizing with references. PHP 3 still had some parts that were so poorly written, it was possible to get 20% performance increases just by changing the way you did something. PHP 5 is obviously trending with object-oriented programming.

Speaking of object-oriented programming, we know the standard drill. A cornucopia of MVC frameworks has rained down on us (some of which actually follow an MVC pattern) and has inundated us thoroughly with the standard request workflow:

  1. The web server initiates the bootstrap.
  2. The bootstrap initiates the front controller.
  3. The front controller initiates the controller method.
  4. The controller uses the model.
  5. The controller prepares view data.
  6. The controller instantiates the view and provides the data.
  7. The request completes.

Et cetera, et cetera, ad infinitum. Mind you, this isn’t a bad workflow at all. We have a nice, clear separation of concerns. We always know where to look. It’s great, right?

Well, the idea is great. Allow me to share something I did the other day. I decided that I would benchmark the difference between traditional, hard require statements, and a clean autoloader. Bah! Micro-optimization, right? Well, I beg you to humor me.

I set up a PHP benchmarking environment with an old server, and I wrote a small set of scripts to generate the following components:

  • A directory with 100 class files. These classes are simple blanks in the form of class MyClass {}. The actual content of the class wasn’t relevant to what I wanted to benchmark.
  • A script that performs a hard require statement for each of the 100 classes and then instantiates them: require './classes/MyClass.php'; $myclass = new MyClass();
  • A script that implements a very simple autoloader, registers it, and then instantiates one of each of the 100 classes.
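As a rough illustration of the three components above, here is a minimal sketch. The original scripts are not shown in this post, so the directory, the class name prefixes, and every function name below are assumptions made for the example. Both loading styles use their own set of generated classes so the sketch can run in a single process.

```php
<?php
// Sketch only: directory, class prefixes, and function names are assumptions.

// Write $count empty class files named {$prefix}0.php, {$prefix}1.php, ...
function bench_generate_classes($dir, $prefix, $count)
{
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true);
    }
    for ($i = 0; $i < $count; $i++) {
        $name = $prefix . $i;
        file_put_contents($dir . '/' . $name . '.php', "<?php\nclass " . $name . " {}\n");
    }
}

// Variant 1: a hard require for each class, then instantiate it.
function bench_load_with_require($dir, $prefix, $count)
{
    $objects = array();
    for ($i = 0; $i < $count; $i++) {
        $name = $prefix . $i;
        require $dir . '/' . $name . '.php';
        $objects[] = new $name();
    }
    return $objects;
}

// Variant 2: a very simple autoloader that maps a class name to a file.
function bench_autoload($class)
{
    $file = BENCH_CLASS_DIR . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
}

function bench_load_with_autoloader($prefix, $count)
{
    $objects = array();
    for ($i = 0; $i < $count; $i++) {
        $name = $prefix . $i;
        $objects[] = new $name(); // first use triggers bench_autoload()
    }
    return $objects;
}

// Generate both sets in a temporary directory and load them.
define('BENCH_CLASS_DIR', sys_get_temp_dir() . '/bench_classes_' . getmypid());
bench_generate_classes(BENCH_CLASS_DIR, 'ReqClass', 100);
bench_generate_classes(BENCH_CLASS_DIR, 'AutoClass', 100);

$required = bench_load_with_require(BENCH_CLASS_DIR, 'ReqClass', 100);

spl_autoload_register('bench_autoload');
$autoloaded = bench_load_with_autoloader('AutoClass', 100);
```

The benchmark described here used one literal require line per class rather than a loop. The loop only keeps the sketch short.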

I tried to think of how to introduce as little bias into my benchmark as possible, and I came to the conclusion that I would run the benchmarks against a stock Ubuntu LAMP installation with a concurrency of 1. This seemed fairest, since it’s a close approximation to what is provided by many hosting providers. The PHP version was 5.2.9. The test is slightly unrealistic, because an app using hard require statements and multiple entry points would not load 100 different files to perform any given task. You would tend to bundle things a bit more neatly. However, these files would usually contain some wasted code not used by every request. I figured the balance was fair, and I was satisfied with that. The point is to test how far apart require and an autoloader are, not to test the efficiency of your code organization.

I benchmarked the script with hard require statements and the script with the autoloader by using apachebench. Each run was 1,000 requests with a concurrency of 1. For each script, I performed 4 apachebench runs. The first run was to warm APC, the file system, the moon phases aligning with Venus, et cetera. The requests per second of the next three runs were recorded, and the middle value was chosen. I expected to find a difference in performance. There is no question that autoloading takes more time, but the results astounded me. Specifically, the runs for the script using exclusively hard require statements yielded 46.3% more requests per second than the script using the autoloader. That’s a real performance difference.
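The arithmetic behind that figure can be sketched as follows. The raw requests-per-second values are not given in this post, so the numbers below are placeholders chosen only to show the calculation, and the function names and the URL in the comment are hypothetical.

```php
<?php
// Each run would be something like (URL is an assumption):
//   ab -n 1000 -c 1 http://localhost/require.php

// Middle value of the recorded runs (the warm-up run is not passed in).
// Assumes an odd number of values, as with the three runs described.
function bench_median(array $values)
{
    sort($values);
    return $values[(int) floor(count($values) / 2)];
}

// How many percent more requests per second $a delivers than $b.
function bench_percent_more($a, $b)
{
    return ($a - $b) / $b * 100;
}

// Placeholder numbers, NOT the measurements from the benchmark.
$requireRuns  = array(150.1, 146.3, 140.2);
$autoloadRuns = array(98.7, 100.0, 103.4);

$requireRps  = bench_median($requireRuns);
$autoloadRps = bench_median($autoloadRuns);

// Percentage gain of the require script over the autoloader script.
$gain = round(bench_percent_more($requireRps, $autoloadRps), 1);

// The same relationship seen from the other side: the autoloader
// script's throughput as a percentage of the require script's.
$share = round($autoloadRps / $requireRps * 100, 1);
```

Read the other way around, a 46.3% advantage for require means the autoloader script delivered roughly 68% of the require script's throughput.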

The above establishes a significant, quantifiable performance difference for the simple operation of including code. There are, however, observable differences in behavior. I will try to enumerate some of these in as unbiased a fashion as possible:

  • If your loading mechanism is based on the idea of loading discrete objects one at a time, such as with classes, remember that some objects are very small in size. A number of your class files may not have more than one or two static functions in them. Since the autoload operation is expensive, loading this code becomes an inefficient operation.
  • If your loading mechanism is based on the idea of loading families of code, such as with include files, you are inclined to group small classes together in a single file. This of course makes the loading operation much cheaper, but it also means you use slightly more memory and compilation time. If you use an opcode cache, the addition of a 20-line class appended to the contents of a file that contains a 200-line class is a relatively minute performance concern. Additionally, this is a subjective problem, since it depends on how efficiently (or inefficiently) the programmer organizes code.
  • If your loading is based on classes, you will always load classes. For example, the class that’s actually being used as a pseudo namespace, containing only two static functions, forces an extra symbol lookup on every call, because it’s a class method and not a regular function. This makes minute code slower. This depends upon the programmer’s organization of the code, so it’s not always a concern.
  • If your loading is based on files, there is nothing wrong with using a function or a class method instead.
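To make the last two points concrete, here is a small sketch of the same helper written both ways. The names TextHelper and text_slug are hypothetical, and the sketch only contrasts the two shapes of code. It does not measure the lookup cost mentioned above.

```php
<?php
// A class used purely as a pseudo namespace (hypothetical name).
// With class-based loading, this needs its own file and an autoload.
class TextHelper
{
    public static function slug($title)
    {
        return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
    }
}

// The same logic as a plain function, which could sit in an include
// file next to other small helpers.
function text_slug($title)
{
    return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');
}

$fromClass    = TextHelper::slug('Hello, World!');
$fromFunction = text_slug('Hello, World!');
```

Both calls produce the same result. The only difference is how the code has to be organized and loaded.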

Considering these observations, I would like to introduce a slightly less formal comparison I made. I wrote a simple, identical blog app in two different styles. In the first style, I wrote it with only a smattering of object-oriented code (specifically, for the model), and otherwise used a traditional structure where my app had multiple entry points (one entry point per primary task) and simple template files (PHP includes).

In the other style, I used Zend Framework to write an identical blog app, with slight differences to accommodate the style of the framework. I will spare you the details of the benchmarks, but in short, my simple app with multiple entry points was able to deliver about 280% as many requests per second as the Zend Framework app when testing with Siege. Additionally, the Zend Framework app was (as is obvious from the benchmark) significantly slower in its response times. This leads me to more observations:

  • Even when you are only autoloading the things you need, the deeply nested, tree-like structure of a modern MVC framework will often make you load a lot of stuff (and spend a lot of time loading it) you don’t need. This can negate the benefits of easier organization.
  • There were only a couple of moments where I thought to myself, “Gosh, I wish I had some code already written to do this for me by convention.” To me, this indicates that there is a relationship between the complexity of an app and the desired abstraction of your code. Many things simply don’t need deeply nested object trees to get their work done cleanly, effectively, and with understandable source code.

I understand why a well-organized MVC framework can be a wonderful thing. If your separation of concerns is dictated, and everything is organized by simple rules and types, then starting to write code to solve complex problems is much easier. However, not everything is a complex problem. Additionally, frameworks can often turn simple problems into very complicated ones, to the point where writing it yourself from the start may just have been easier. Sometimes, simpler is just better. Considering all of the above, I would like to make these simple assertions for you to evaluate in your own time:

  • You do not necessarily need a dictated structure to organize your app logically, but it can help a lot.
  • There is a definite performance difference between autoloading chains of classes and simple require statements.
  • The response time of your MVC app can shock you.
  • For the best of both worlds, require things you know you will always need, and fall back on an autoloader. Or, put some effort into writing more general, commonly-used classes rather than lots of little ones. Maybe you want to replace the really busy calls to your app with simple PHP scripts.
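The "best of both worlds" suggestion could look something like the sketch below. The class names, the temporary directory, and the call log are all assumptions made so the example can run on its own. In a real app, the files would already exist on disk.

```php
<?php
// Hypothetical library directory, created here so the sketch is runnable.
define('HYBRID_LIB_DIR', sys_get_temp_dir() . '/hybrid_lib_' . getmypid());
if (!is_dir(HYBRID_LIB_DIR)) {
    mkdir(HYBRID_LIB_DIR, 0777, true);
}
file_put_contents(HYBRID_LIB_DIR . '/HybridDb.php', "<?php\nclass HybridDb {}\n");
file_put_contents(HYBRID_LIB_DIR . '/HybridFeed.php', "<?php\nclass HybridFeed {}\n");

// 1. Hard require the things every request needs.
require_once HYBRID_LIB_DIR . '/HybridDb.php';

// 2. Fall back on an autoloader for everything else.
//    The call log exists only to show when the fallback is used.
$GLOBALS['hybrid_autoload_calls'] = array();

function hybrid_autoload($class)
{
    $GLOBALS['hybrid_autoload_calls'][] = $class;
    $file = HYBRID_LIB_DIR . '/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
}

spl_autoload_register('hybrid_autoload');

$db   = new HybridDb();   // already loaded by the require above
$feed = new HybridFeed(); // loaded on demand by hybrid_autoload()
```

Only the rarely used class goes through the autoloader. The class every request needs is loaded by the plain require.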

The point I am trying to make is we’re getting a little crazy with our architectures. These days, we write PHP just like Java, but it’s not Java. It works completely differently, and we shouldn’t try to force patterns that aren’t necessarily the best fit. You don’t need all that. Just write good code.
