WEB Advent 2008 / PHP Without PHP

Take a simple PHP trick and follow it on a huge tangent to the philosophy of good web architecture.

It's an honor to be asked to share my ideas with the PHP community. When Chris and Sean asked me to write an article for PHP Advent, I had to accept. Like last year, this article will be quite long. If you need something short and sweet like the other advent articles, you can just read the first section. But, if you read it all, there might be a worthwhile concept buried in this logorrhea.

Funky caching

Funky caching is an obscure trick often attributed to Rasmus but actually invented by Stig. It is also known as the ErrorDocument trick, Smarter Caching, and Rasmus's Trick. It was first presented by PHP creator Rasmus Lerdorf in his fun Tips and Tricks talk.

It entails the following:

First you create an ErrorDocument line in your httpd.conf:

ErrorDocument 404 /error.php

This tells the web server to hand every request for a missing file to error.php in your document root. The following listing provides an example error.php:

<?php

$filepath = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH); // or $_SERVER['REDIRECT_URL']
$basepath = dirname(__FILE__) . DIRECTORY_SEPARATOR;

// Test to see if you can work with it.
if (FALSE) {
    // EDIT
    include '404.html'; // See http://alistapart.com/articles/perfect404 for tips.
    return;
}

// Generate the file.
// EDIT
$data = 'something';

// Send a 200 instead of a 404.
header(sprintf('%s 200', $_SERVER['SERVER_PROTOCOL']));
echo $data;

// Store the page to avoid ErrorDocument on the next request. Use a
// temp file with a link trick to avoid race conditions.
$tmpfile = tempnam($basepath . 'tmp', 'fc');
$fp = fopen($tmpfile, 'w');
fputs($fp, $data);
fclose($fp);
// link($target, $link): the new name is the second argument.
@link($tmpfile, $basepath . $filepath); // Suppress errors caused by losing race.
unlink($tmpfile);

?>

Other than the two lines commented with EDIT, the code above is pretty canonical.

What does this trick do? Basically, when a file doesn't exist, PHP can create the data and return it instead of a 404.

This is when the magic happens. The script places the generated file in its proper location under the document root. Thus, the next request for the same resource doesn't invoke the ErrorDocument handler; the resource is served directly by the web server henceforth.


This is truly PHP without PHP.
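
The storage step described above can be sketched as a small reusable helper. This is only an illustration under a few assumptions: the function names (funky_cache_path, funky_cache_store, funky_cache_purge) are hypothetical, and it swaps the link() trick for a temp file followed by rename(), which also puts the finished file in place in one step on POSIX filesystems. It additionally refuses paths containing ".." so a request cannot write outside the document root.

```php
<?php

/**
 * Map a URL path to a file under the document root, or return null
 * if the path is empty, names a directory, or contains a ".." segment.
 * Hypothetical helper; the name and signature are assumptions.
 */
function funky_cache_path(string $docroot, string $urlPath): ?string
{
    if ($urlPath === '' || substr($urlPath, -1) === '/'
        || in_array('..', explode('/', $urlPath), true)) {
        return null;
    }
    return rtrim($docroot, '/') . '/' . ltrim($urlPath, '/');
}

/** Store generated data where the web server will find it next time. */
function funky_cache_store(string $docroot, string $urlPath, string $data): bool
{
    $target = funky_cache_path($docroot, $urlPath);
    if ($target === null) {
        return false;
    }
    $dir = dirname($target);
    if (!is_dir($dir) && !@mkdir($dir, 0755, true) && !is_dir($dir)) {
        return false;
    }

    // Write to a temp file in the same directory, then move it into place,
    // so a concurrent request never sees a half-written file.
    $tmpfile = tempnam($dir, 'fc');
    if ($tmpfile === false || file_put_contents($tmpfile, $data) === false) {
        return false;
    }
    chmod($tmpfile, 0644); // tempnam() creates the file with mode 0600.

    return rename($tmpfile, $target);
}

/**
 * Invalidate a cached page. Once the file is gone, the next request
 * falls through to the ErrorDocument handler and regenerates it.
 */
function funky_cache_purge(string $docroot, string $urlPath): bool
{
    $target = funky_cache_path($docroot, $urlPath);
    return $target !== null && is_file($target) && unlink($target);
}
```

Purging is the other half of the trick: deleting the stored file is all it takes to make the next request dynamic again.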

Words have meaning through paradigm

The foundation of Steve McConnell's seminal text, Code Complete, was that software development should be based around the paradigm of construction. That was fundamentally flawed because of the mythical man-month—the man-month term itself originally comes from construction work. We now know McConnell is wrong and software isn't construction, it's engineering. We're called software engineers, not software workers.

My title at Tagged is currently software architect. And I have a "radical" idea that maybe titles are that way because they mean something. Meaning that if I'm hired as a software architect, then I should think like an architect and find my inspiration from architecture.

Fallingwater

Nestled along a creek in the woods of southwestern Pennsylvania is a house with angular features that are cantilevered 40 feet above a waterfall. This was the summer home of the Kaufmann family, owners of a Pittsburgh department store that's now part of Macy's. (I remember this store well, because I spent many days in a neighboring newsstand reading issues of MAD magazine and Cracked.)

When I was a kid, my dad took us to visit the place, and I became one of millions of visitors to Fallingwater, a home that was hailed by Time Magazine on its inception and became known as the quintessential example of the organic architecture of architect Frank Lloyd Wright.

[Image: Fallingwater]

Why is Fallingwater, a summer home for a Pittsburgh family, so obviously beautiful? Hundreds of thousands make the trek (50 miles from the nearest city) each year, the American Institute of Architects voted it the best all-time work of American architecture in 1991, and pictures of it are as instantly recognizable as any natural wonder.

Maybe it'd be enlightening to consider how Frank Lloyd Wright built it. Before he started, he commissioned a survey of the entire topography around the waterfall and had them include all trees and boulders. He then came up with an idea of a cantilevered house that would stretch in a manner that would look like it floated in air above the waterfall.

Perhaps the following details are more important:

From details like these to the whole view taken in at once, one gets a feeling that, in spite of the sharp horizontal and vertical lines of the building, the whole lives in harmony with its environment "instead of lording above [it] in an isolated spot as a man-made imposition."

Frank Lloyd Wright designed on the principles of "organic, democratic, plasticity, continuity." We can see how this example holds true to these values.

Could this building have been built anywhere else?

Why is funky caching so prevalent in the PHP world?

If you look at funky caching, it doesn't need PHP to implement it. This raises the question of why it first appeared in PHP. Why is this obscure design pattern so ubiquitous in the PHP world? In fact, you, as a PHP developer, use it every day when you visit PHP.net and load a page like http://php.net/strstr to figure out the order of the needle in the haystack.

A cynic would say, because PHP is so slow to execute, it needs solutions like this to perform well. The problem with this argument is that no single web language outperforms the fastest static servers out there, or even comes close to the slow ones. There is no web language that wouldn't benefit from this trick.

But there is truth to the cynic's statement. The PHP world may have discovered this first because it trades off speed of execution for speed and ease of development. As Andrei mentioned earlier, that is fundamental to its design. In fact, all dynamically-typed scripting languages make this tradeoff.

The ubiquity of this trick in the PHP world is because it—like Frank Lloyd Wright's Fallingwater—lives in harmony with its environment. The environment is an Apache web server, persistent data store in the form of a relational database, and the demands of large-scale, consumer-facing web sites. Would this solution exist without an ErrorDocument handler built into Apache? Would this solution exist if we didn't persist content on a (relatively) slow data store like a database? Would this solution exist if the consumer didn't demand millisecond response time for dynamic content?

Funky caching in the PHP world lives in harmony with that environment. It lives in harmony with PHP itself.

The architectural principles of PHP

PHP is a language that is designed to solve the web problem.

PHP is a component of web architecture as Maggie mentioned earlier. Without Apache serving it, without a database backing it, without the demands of the Web behind it, without thousands of hosting sites installing it, without hundreds of open source packages written in it, it would be useless.

The language, like Fallingwater, is customized for the problem at hand and complements the environment in which it lives. Just as Wright's design lives true to his principles, so do PHP and its solutions live true to its principles: "cheap, scalable, pragmatic."

PHP Design Patterns

When using PHP, let us not forget PHP's three principles that attract us to the language in the first place:

  1. Cheap (developer time and resources): "A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl." —Philip Greenspun
  2. Scalable (shared-nothing architecture): "That a Java servlet performs better than a PHP script, under optimal conditions [has] nothing to do with scalability. The point is can your application continue to deliver consistent performance as volume increases. PHP delegates all the 'hard stuff' to other systems." —Harry Fuecks
  3. Pragmatic (designed to solve the web problem): "PHP is not about purity in CS principles or architecture; it is about solving the ugly web problem with an admittedly ugly, but extremely functional and convenient solution. If you are looking for purity, you are in the wrong boat. Get out now before you get hit by a wet cat!" —Rasmus Lerdorf

Could this "ugly, but extremely functional and convenient" web language have been built to solve anything other than the ugly web problem?

Bellefield Tower

One block from where my mother used to work, on the corner of Fifth Avenue and Bellefield in Pittsburgh, stands a strange sight. A very modern building wraps around in a Jobsian-loving rounded rectangle, narrowly avoiding a gothic Romanesque tower a century its senior. An uglier and more out-of-place architectural juxtaposition I have never seen.

[Image: Bellefield Tower]

If you weren’t in Pittsburgh in the late 1980s, you wouldn't understand how this could have happened. On this ground once stood the original Bellefield church, built in the 1880s. Since its congregation had been moved farther down Fifth Avenue, the building was sold a century later to developers trying to exploit the new business attracted by the Pittsburgh Supercomputing Center and the joint CMU/Pitt software building. They wanted to level it and build a new building, but were blocked when people mobilized to save the old tower. The developer then proceeded to honor this by demolishing everything but the tower and building the ironically-named "Bellefield Towers" next to it.

You can see the current Bellefield Presbyterian Church as a common example of the gothic architecture of the area. You can also note the Carnegie Library of Pittsburgh and the Cathedral of Learning—both next door, both reflecting the gothic Romanesque architecture, and both figuring prominently in iconic photos of the most famous game in baseball.

Why is Bellefield Towers so obviously ugly? The old Bellefield Church tower stands next to Bellefield Towers with a sawed-off quality to it. The curved, modern architecture of the latter serves only to emphasize how it was built with no consideration of the surrounding environment. The Oakland and Shadyside areas of the city that the old Bellefield Church straddled contain many unique examples of Romanesque gothic architecture. When faced with a gorgeous 100-year-old example of the area's architecture, instead of working with the environment like Frank Lloyd Wright did with Fallingwater (in the same area of Pennsylvania, no less!), the developer simply sawed it off!

I remember watching it happen, and this literal architectural lesson guides me to this day about the follies of architectural hubris in software.

What hubris?

Have you ever seen developers write code without considering the environment in which the code will live?

I guess my big beef with most frameworks is that they're often written with no consideration of the environment, almost by definition. The best frameworks are less frameworks than applications that enforce the constraints of an environment.

As Paul mentioned earlier, even if you build it your way and customize the solution for your application, it's still a framework. But it's a framework most likely to have at least one successful user. You.

"I'm a developer. I can make the software conform to my needs."

Oh really? That sounds a lot like trying to "lord over the environment with an isolated man-made imposition."

"But what I mean is it's all man-made in software. There is no environment."

You don't develop in a community as Chris mentioned earlier? That’s environment. You never took over a project you didn't write or worked at a company with a pre-existing code base? That's environment. You never dealt with an installation problem because your host was configured differently than your development environment? That's environment. You never had business needs trump the little feature creature sitting on your shoulder? That's environment. You've never listened to a user request, as Paul mentioned earlier? That's environment.

"There is no danger of that environment being different."

When I joined my current company, they had a couple of services written in Java, only Zend Accelerator could opcode cache their PHP 4 installation, Oracle RAC powered the back-end, and engineers developed while working in cubes with a relatively heavyweight waterfall development process.

Although I prefer Python to Java for services, we've increased our Java development to almost half of our code base! Although I prefer MySQL to Oracle, we still use Oracle as our back-end. Even the transition to the open office occurred after it became apparent the company had outgrown cubes.

Why? Because your solutions have to work within the environment. Anything else is architectural hubris.

"But that's not an architecture decision."

Let's say it is the early days of social networking, and you join a company that is using Java/J2EE instead of PHP, or Oracle instead of MySQL, or they're using Perl/Mason instead of your favorite (PHP) framework, as Marco mentioned earlier—there are so many to choose from that the number is second only to Java.

Do you go in and say your experience building a CMS or online store trumps their experience working on a nascent social network? Do you replace all the Java engineers with PHP ones? Do you replace Oracle with MySQL? Do you rewrite the site from scratch using your favorite framework?

These things and more have happened.

"So you're always right?"

I'm not saying that in all these instances these architects shouldn't have made the decisions they did. I am not qualified to answer that.

What I do know is that in the vast majority of cases, people went in without considering the existing environment. I do know the dynamics of a Facebook are different from the dynamics of GameSpot or Amazon. I do know a social network is different from a CMS or online store. And all these solutions are very different from ones in the enterprise.

Like building Fallingwater without getting an adequate survey done, every day people make the mistake of not looking before acting. They try to make PHP look like Java with dollar signs, as Luke mentioned earlier. They expect the environment to conform to their reality so they can lord over it with "some isolated man-made imposition."

And in those cases, you're more likely to build a Bellefield Towers than a Fallingwater.

The Golden Gate Bridge

I've long since moved from the woods of Western Pennsylvania to the San Francisco Peninsula. I am fortunate that my weekly run passes with a near-constant view of the most recognizable architecture in the American West:

[Image: Golden Gate Bridge]

What's interesting is that there are much longer spans in the country and the world. Even in the same city, there exists a beautiful bridge that is both longer and of more utility. And yet this bridge represents the icon of San Francisco and the state as a whole.

Why?

I’m not sure, but consider these things:

  • The original design was for a hybrid cantilever and suspension structure. But, it was replaced with a pure suspension design because the former was deemed too ugly. A pure suspension of this length had never been attempted before.
  • Irving Morrow designed the bridge tower, lighting, and pedestrian walkways with an entirely Art Deco decorative influence.
  • The bridge was painted in a specially formulated anti-rust paint in International Orange on demand from locals.

Think a moment about any of those design decisions. Each of them, along with the building of the structure in the first place, was fought as an uphill battle against economists, the rail lines, engineers, the War Department, and others. The Navy alone originally demanded it be painted black with yellow stripes to assure visibility to passing ships.

Can you imagine that?

I run by or cycle over the Golden Gate Bridge once a week at all times of day in all weather conditions, and, whether seen from the north side or the south, from the east or the west, I'm struck by the salient fact that it is iconic because the rust-colored, suspension-only Art Deco structure is just right for the environment it is in.

The rust-colored paint evokes the hills of Marin to the north as well as the setting sun. It is natural and visible enough to be safe. It becomes an icon. Every week, I pass by it and am inspired and thankful I can live in such a beautiful city.

The design pattern

To me, the most salient point of a design pattern comes from its original definition. From Christopher Alexander's book on architecture, The Timeless Way of Building:

Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice.

So, if funky caching is a design pattern, then it too can be used a million times over, without ever doing it the same way twice.

How are we to know which way to do it, or even if it is the right pattern to be using in our situation?

The answer is found in how both a house in Pennsylvania and a bridge in San Francisco represent ultimate expressions of architecture. They are wholly appropriate for the environment in which they stand.

When choosing between a singleton and a global variable, which pattern to use is determined by the environment. A cantilever is wholly appropriate to create the floating look of Fallingwater, but that same pattern would have disrupted the naturalness of the Golden Gate Bridge.

So too must the solutions that use funky caching (or PHP in general) be wholly appropriate for the problem at hand.

In Rasmus's original talk, he suggests that this solution can also be used to search for the closest matching valid URL and redirect, or use the attempted URL text as a DB lookup. We can see PHP.net's solution outlined right there!
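
The "closest matching valid URL" idea can be sketched with PHP's built-in levenshtein() function. This is only a guess at the shape of such a handler, not PHP.net's actual code: the function name closest_match, the $validNames list, and the edit-distance threshold of 2 are all assumptions made for illustration.

```php
<?php

/**
 * Return the known page name closest to what the visitor typed,
 * or null when nothing is within $maxDistance edits.
 * Hypothetical helper; $validNames would come from your own site.
 */
function closest_match(string $requested, array $validNames, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;
    foreach ($validNames as $name) {
        $distance = levenshtein(strtolower($requested), strtolower($name));
        // Older PHP versions return -1 for strings over 255 characters.
        if ($distance >= 0 && $distance < $bestDistance) {
            $best = $name;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// In the ErrorDocument handler, something along these lines:
// $name  = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
// $match = closest_match($name, $validNames);
// if ($match !== null) { header('Location: /' . $match, true, 302); }
```

For a large set of valid names, a linear scan like this would be slow, which is one reason a real site might prefer the database lookup Rasmus also mentions.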

At Plaxo, we had the problem where images are stored in the database but need to be generated in multiple sizes and thumbnails and streamed fast to the user. Databases are slow, lumbering stores. The solution was funky caching:

[Image: Funky Caching]
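
To make the thumbnail case concrete, here is a rough sketch of how an ErrorDocument handler might decode a missing image URL into a resize job. The URL scheme (/photos/<id>_<width>x<height>.jpg), the function name, and the list of allowed sizes are all invented for this example; they are not Plaxo's actual layout.

```php
<?php

/**
 * Decode a request such as /photos/42_64x64.jpg into its parts,
 * or return null if it is not a thumbnail we are willing to generate.
 * The URL layout and size list are assumptions for illustration only.
 */
function parse_thumbnail_request(string $urlPath): ?array
{
    if (!preg_match('#^/photos/(\d+)_(\d+)x(\d+)\.jpg$#', $urlPath, $m)) {
        return null;
    }

    // Only allow sizes the site actually uses, so visitors cannot
    // fill the disk by requesting arbitrary dimensions.
    $allowed = ['64x64', '128x128', '640x480'];
    if (!in_array($m[2] . 'x' . $m[3], $allowed, true)) {
        return null;
    }

    return [
        'id'     => (int) $m[1],
        'width'  => (int) $m[2],
        'height' => (int) $m[3],
    ];
}
```

The handler would then load image `id` from the database, resize it, stream it, and store the result at the requested path so the web server serves it directly from then on.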

Recently, Tagged has run across the very same performance (size and number) issues with JavaScript that Helgi mentioned earlier. The solution: funky caching hooked up to a JavaScript compressor powered by a Java service back-end to dynamically concatenate and compress JavaScript into a unique URL on demand.
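
The concatenation half of that idea might look something like the sketch below. It is not Tagged's implementation: the /js/c/a+b.js URL layout and both function names are hypothetical, and the compression step (a Java service in our case) is left out entirely.

```php
<?php

/** Build the URL a page would reference for a set of scripts. */
function bundle_url(array $scripts): string
{
    return '/js/c/' . implode('+', $scripts) . '.js';
}

/**
 * Given a missing bundle URL, concatenate the named source files.
 * Returns null if the URL is malformed or any file is missing.
 * Names are limited to [a-z0-9_] so the URL cannot escape $sourceDir.
 */
function build_bundle(string $urlPath, string $sourceDir): ?string
{
    if (!preg_match('#^/js/c/([a-z0-9_]+(?:\+[a-z0-9_]+)*)\.js$#', $urlPath, $m)) {
        return null;
    }

    $out = '';
    foreach (explode('+', $m[1]) as $name) {
        $file = $sourceDir . '/' . $name . '.js';
        if (!is_file($file)) {
            return null;
        }
        // The trailing ";\n" guards against files that omit a final semicolon.
        $out .= file_get_contents($file) . ";\n";
    }
    return $out;
}
```

Because the file list is encoded in the URL itself, a new combination of scripts automatically gets a new URL, and the ErrorDocument handler only runs the first time that combination is requested.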

We recently imported 1/64th of our production data over to the staging environment for testing, but the users' images would take too much time and disk space to import. We could just link the images, but then testers didn't know which ones they uploaded and which ones were proxied from the live website. The solution was to spend an hour writing a funky caching proxy. If the image was missing, the ErrorDocument handler would try to grab the image from the production web site and add a watermark.

[Image: Tagged with Funky Caching]

Here is the complete code. Since this is only for testing, there is no need to waste disk space by storing the created file. The performance hit of real-time generation of redundant requests is unnoticeable to QA.

<?php

$watermark = '3129080702_c4e76f71d7_o.png';
$dead_url = 'http://example.com/dead_image.png';

// {{{ start_image($filename, &$data)
/**
* Creates a GD handle for a valid image file.
* @param $filename string the file to open
* @param $data array filled with the getimagesize() result plus a 'ratio' key
* @return resource|null GD handle, or null if the file is not a supported image
*/
function start_image($filename, &$data) {
  $data = @getimagesize($filename);
  if (empty($data) || !$data[1]) { return null; }
  $data['ratio'] = $data[0]/$data[1];
  // getimagesize() reports the type as an IMAGETYPE_* constant, not one of
  // the IMG_* bit flags (IMAGETYPE_PNG is 3, while IMG_PNG is 4).
  switch($data[2]) {
      case IMAGETYPE_GIF: return imagecreatefromgif($filename);
      case IMAGETYPE_PNG: return imagecreatefrompng($filename);
      case IMAGETYPE_JPEG: return imagecreatefromjpeg($filename);
      case IMAGETYPE_WBMP: return imagecreatefromwbmp($filename);
      case IMAGETYPE_XBM: return imagecreatefromxbm($filename);
  }
  return null;
}
// }}}
$requestimg = $_SERVER['REDIRECT_URL'];
if (!$_SERVER['QUERY_STRING']) {
  // Redirect user to invalid image.
  tag_http::redirect($dead_url);
  return '';
}
// Grab image to temp. {{{
$ch = curl_init($_SERVER['QUERY_STRING']);
$tempfile = tempnam('/tmp', 'prod_remote_');
$fp = fopen($tempfile, 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
// }}}
// Configure image and dimensions. {{{
$size_data = array();
$im = start_image($tempfile, $size_data);
if (!$im) {
  unlink($tempfile);
  tag_http::redirect($dead_url);
  return;
}
// }}}
// Get watermark information. {{{
$wm_data = array();
$wm = start_image($watermark, $wm_data);
if (!$wm) {
  unlink ($tempfile);
  tag_http::redirect($dead_url);
  return;
}
// }}}
// Add watermark. {{{
if ($size_data['ratio'] > $wm_data['ratio']) {
  // Image is wider than the watermark.
  $new_smaller_dim = $wm_data[0] * ($size_data[1]/$wm_data[1]);
  $dst_x = ($size_data[0] - $new_smaller_dim)/2;
  $dst_y = 0;
  $dst_w = $new_smaller_dim;
  $dst_h = $size_data[1];
} else {
  // Image is taller than the watermark.
  $new_smaller_dim = $wm_data[1] * ($size_data[0]/$wm_data[0]);
  $dst_x = 0;
  $dst_y = ($size_data[1] - $new_smaller_dim)/2;
  $dst_w = $size_data[0];
  $dst_h = $new_smaller_dim;
}
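// GD expects integer coordinates, so truncate the computed values explicitly.
imagecopyresized($im, $wm, (int) $dst_x, (int) $dst_y, 0, 0, (int) $dst_w, (int) $dst_h, $wm_data[0], $wm_data[1]);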
header(sprintf('%s 200', $_SERVER['SERVER_PROTOCOL']));
header(sprintf('Content-Type: %s',$size_data['mime']));
// }}}
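// Stream the result in the source format (IMAGETYPE_* values from getimagesize()).
switch ($size_data[2]) {
  case IMAGETYPE_GIF: imagegif($im); break;
  case IMAGETYPE_PNG: imagepng($im); break;
  case IMAGETYPE_JPEG: imagejpeg($im); break;
  case IMAGETYPE_WBMP: imagewbmp($im); break;
  case IMAGETYPE_XBM: imagexbm($im, null); break; // imagexbm() requires a filename argument.
}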
imagedestroy($wm);
imagedestroy($im);
unlink($tempfile);

?>

With a bit of creativity, this concept can apply to modern applications where, instead of caching on the filesystem, you cache in memcache; instead of bypassing the application server, you bypass the web servers themselves with a CDN; instead of serving static content from the edge, you serve dynamic pages.
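
As one hedged illustration of the memcache variant, the function below follows the same "generate on miss, store, then serve from the store" shape. The name cached_page and the callable-based signature are assumptions chosen so the sketch runs without any extension installed; with the Memcached extension you could pass [$mc, 'get'] and [$mc, 'set'], since Memcached::get() returns false on a miss.

```php
<?php

/**
 * Return the page for $key, generating and storing it on a miss.
 * $get($key) must return false on a miss; $set($key, $value) stores it.
 * Hypothetical helper written against plain callables for illustration.
 */
function cached_page(string $key, callable $generate, callable $get, callable $set): string
{
    $page = $get($key);
    if ($page === false) {
        // Miss: the equivalent of landing in the ErrorDocument handler.
        $page = $generate();
        $set($key, $page);
    }
    return $page;
}

// With the Memcached extension, usage might look like:
// $mc = new Memcached();
// $mc->addServer('localhost', 11211);
// echo cached_page($_SERVER['REQUEST_URI'], 'render_page', [$mc, 'get'], [$mc, 'set']);
// (render_page is a placeholder for whatever builds your page.)
```

The difference from the filesystem version is that PHP still runs on every request here; what gets bypassed is the expensive generation step, not the interpreter.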

Whether to use it and how to use it is always determined by the environment.

Comments

I hope this tour helps you see software development in a different way: that finding solutions is about using the right solution in a manner that fits with the environment. Even when we do, I don't think we can architect structures that work as harmoniously together as a city such as San Francisco:

[Image: San Francisco]

But one can always hope. :-)

Happy Holidays from me and the PHP community to you and yours.

If you would like to comment on this article or read some additional commentary, please visit this post on my blog.
