WEB Advent 2010 / Output Buffering

When I chose to write about output buffering for this year’s PHP Advent, my depth of knowledge on the subject was very limited. I picked a topic that I could learn well, and then explain thoroughly without writing an entire book. It’s a feature that will likely be new to beginners, but which even intermediate and advanced users may not have used much. Output buffering has simple, practical applications, and it can also play a role in more complicated systems. It is one of those tools that you might not realize you need if you don’t know that it exists, and it is my pleasure to introduce you to it.

PHP’s output buffering trinity

A typical PHP installation actually has three different layers of output buffering. The layer closest to the client is controlled by the output_buffering directive in php.ini. This directive can be set to On, Off, or an integer that represents the number of bytes at which PHP should flush the buffer. The purpose of this buffer is to control how much data is sent to the browser at a time. The options are fairly self-explanatory: Off sends the data immediately, and On collects the entire output of the script and sends it all at once. I’ll call this layer the output buffering layer.

I call the next layer the flush layer. It is another control that can simply be turned On or Off using the implicit_flush directive in php.ini. When implicit_flush is on, every output operation flushes immediately to the output buffering layer; otherwise, you have to call flush() to manually flush this buffer. By default, implicit_flush is disabled, except when using the CLI SAPI. This is a sensible default, because the constant flushing can generate a lot of overhead, particularly when output_buffering is disabled. If the purpose of the output buffering layer is to control how much data is output, this layer’s purpose is to control when data is output.

The last layer is the userspace output buffer, which is controlled by the various ob_* functions. It provides far greater control than the other layers, as well as greater flexibility. While this layer can be used to control how much data is sent and when the sending occurs, those are just two of its many tricks. The true purpose of this layer is to provide control over which data is output. I call this layer the ob layer, and it is the primary focus of this article.
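To see how the first two layers are configured on a given installation, you can read both directives at runtime. This is only a minimal sketch: the $settings variable is my own naming, and the values you see depend on your php.ini and on the SAPI in use.

```php
<?php

// ini_get() returns the current value of a directive as a string,
// or false if the directive does not exist.
$settings = array(
    'output_buffering' => ini_get('output_buffering'),
    'implicit_flush'   => ini_get('implicit_flush'),
);

// var_export() makes empty strings and false visible in the output.
foreach ($settings as $name => $value) {
    echo $name . ' = ' . var_export($value, true) . "\n";
}
```

Running this from the command line and again through a web server is a quick way to see that the two environments can be configured differently.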

The ob layer

Let’s start with a simple example:

<?php

ob_start();
echo "Here is some text.\n";
header('X-Some-Header: Some value');
ob_flush();

This example should be fairly easy to follow. First, we create an output buffer by calling ob_start(). From this point on, anything we output is stored in this buffer, which is why the header() call after the echo still works. When we call ob_flush(), the contents of the buffer created by ob_start() are flushed to the next output buffer layer, which should be the flush layer, in this case. The buffer itself stays open until the script ends. It’s that simple to create an output buffer.

One use of output buffers that you are sure to hear about is the ability to send a header after you output something. Since output is held in a buffer, you can still send headers and avoid the infamous “headers already sent” warning. This is particularly useful if you need to use a function that writes directly to the output, but you aren’t quite ready for it to do so. Some people argue that buffering output so that you can send headers later adds to the complexity of the code. Regardless of whether you agree, this feature only scratches the surface of output buffer utility. Let’s take a look at another example:

<?php

function output_handler($output) {
    return "<OB>\n" . $output . "</OB>\n";
}

ob_start('output_handler');
echo "This output just got handled.\n";
ob_end_flush();
echo "Some text outside of the buffer.\n";

You’ll notice two important additions to this code. First, ob_start() takes a callback or closure as its first argument. (I highly recommend using a string callback, which I’ll explain later.) The function referenced by that argument should take the content of the output buffer as its first argument, and it should return a string containing the processed output. The second thing you ought to notice is the call to ob_end_flush(). This will flush the current buffer and close it, so that future output does not use it. If you ran this code, you would see that only the content from the first echo is wrapped in the <OB> tags:

<OB>
This output just got handled.
</OB>
Some text outside of the buffer.

At this point, it should be easy to start dreaming up some uses for output buffers. The output handler argument gives you a lot of flexibility to process any amount of output without having to manage concatenating all of your output into a single variable.
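As one possible illustration, here is a sketch of a handler that shrinks the output by collapsing runs of whitespace. The collapse_whitespace() name and the regular expression are my own choices for this example, and a real page would need more care (this naive version would also mangle the contents of a <pre> block).

```php
<?php

// Hypothetical handler: replaces every run of whitespace
// (spaces, tabs, newlines) with a single space.
function collapse_whitespace($output) {
    return preg_replace('/\s+/', ' ', $output);
}

ob_start('collapse_whitespace');
echo "<p>\n    Lots   of\n    spare   whitespace.\n</p>\n";
ob_end_flush();
```

The script that produces the output does not need to know that it is being post-processed, so the handler can be added or removed without touching the echo statements.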

You can already do some neat things with what we’ve learned so far, but PHP’s output buffer support goes much further. If you would prefer to apply different buffers to different pieces of output, simply call ob_end_flush() followed by a second ob_start():

<?php

function handler1($output) {
    return "<OB1>\n" . $output . "</OB1>\n";
}

function handler2($output) {
    return "<OB2>\n" . $output . "</OB2>\n";
}

ob_start('handler1');
echo "Output from the first output buffer.\n";
ob_end_flush();

ob_start('handler2');
echo "Output from the second output buffer.\n";
ob_end_flush();

Predictably, this outputs the following:

<OB1>
Output from the first output buffer.
</OB1>
<OB2>
Output from the second output buffer.
</OB2>

Nesting output buffers

You might discover that sometimes you want to use one output buffer on most things, and a separate output buffer on a small portion of your output. In this case, you can nest two (or more) output buffers. To do so, simply call ob_start(), then call it again before calling ob_end_flush(). Your output is handled by the most recently opened buffer first, and then works its way back to the first buffer that you opened.

<?php

function parent_handler($output) {
    return "<PARENT>\n" . $output . "</PARENT>\n";
}

function child_handler($output) {
    return "<CHILD>\n" . $output . "</CHILD>\n";
}

ob_start('parent_handler');
echo "Part of the parent ob.\n";
echo ob_get_level() . "\n";

ob_start('child_handler');
echo "Part of the child ob.\n";
echo ob_get_level() . "\n";

ob_end_flush();

echo "Back in the parent ob.\n";
echo ob_get_level() . "\n";

ob_end_flush();

Here is the output:

<PARENT>
Part of the parent ob.
1
<CHILD>
Part of the child ob.
2
</CHILD>
Back in the parent ob.
1
</PARENT>

Because it is so easy to start a new output buffer, it is important to keep track of which buffers you already have open. In the previous example, you’ll notice the calls to ob_get_level(), which always returns an integer to describe how many output buffers are open. Since there is no way to switch to a parent without closing the child, this also happens to describe the level of the current output buffer.
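One way to put ob_get_level() to work is to remember the level before you start opening buffers, and later close everything back down to that point. The following is a minimal sketch of that idea; the variable names are mine, and it measures against a starting level because some installations already have a buffer open when the script begins.

```php
<?php

// Remember how many buffers were open before we started.
$base = ob_get_level();

ob_start();
echo "First buffer.\n";

ob_start();
echo "Second buffer.\n";

// How many buffers has this script opened so far?
$opened = ob_get_level() - $base;

// Close every buffer this script opened, innermost first.
while (ob_get_level() > $base) {
    ob_end_flush();
}

$remaining = ob_get_level() - $base;

echo "Opened: " . $opened . ", still open: " . $remaining . "\n";
```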

There are two more functions which are useful for keeping track of your open output buffers: ob_get_status() and ob_list_handlers(). ob_get_status() returns an array with the current level and the name of the callback function, as well as some other details about the buffers. If you call it with TRUE as the first argument, you will not only get details about the current output buffer level, but also all of the other levels that are currently open. ob_list_handlers() will return an array of all of the output handlers that will process the current output buffer. Earlier, I recommended using a string callback instead of a closure, because these functions can only tell you the name of the handler function if the handler function actually has a name.
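Here is a rough sketch of both functions at work on two nested buffers. The handler names outer_handler() and inner_handler() are made up for this example. The details are collected while the buffers are open and printed only after they have been discarded, so that the report itself is not buffered. If your installation opens a buffer of its own, expect an extra entry in both lists.

```php
<?php

// Hypothetical handlers, named so that they show up in the listings.
function outer_handler($output) {
    return "<OUTER>\n" . $output . "</OUTER>\n";
}

function inner_handler($output) {
    return "<INNER>\n" . $output . "</INNER>\n";
}

ob_start('outer_handler');
ob_start('inner_handler');

// Collect the details while both buffers are still open.
$handlers = ob_list_handlers();
$status   = ob_get_status(true);

// Discard and close both buffers before reporting.
ob_end_clean();
ob_end_clean();

echo "Handlers: " . implode(', ', $handlers) . "\n";

// With TRUE, ob_get_status() returns one array per open buffer.
foreach ($status as $buffer) {
    echo "Buffer handled by: " . $buffer['name'] . "\n";
}
```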

There are some other important output buffering functions. ob_clean() immediately discards the contents of the current buffer but leaves the buffer open. ob_end_clean() discards the contents of the buffer and then destroys it. ob_get_length() returns an integer which represents the length, in bytes, of the current buffer’s contents. ob_get_contents() returns the entire contents of the buffer as a string, leaving the buffer in place.
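The following sketch walks through these four functions on a single buffer. The variable names are only for illustration. Nothing reaches the client until the final echo statements, because the buffer is discarded and never flushed.

```php
<?php

ob_start();
echo "Hello";

$length   = ob_get_length();    // length of the buffered text, in bytes
$contents = ob_get_contents();  // a copy of the text; the buffer is untouched

ob_clean();                     // empty the buffer, but keep it open
$length_after_clean = ob_get_length();

echo "Bye";
$second = ob_get_contents();

ob_end_clean();                 // discard the contents and close the buffer

echo "First contents: " . $contents . " (" . $length . " bytes)\n";
echo "Length after ob_clean(): " . $length_after_clean . "\n";
echo "Second contents: " . $second . "\n";
```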

There are two curiously named functions: ob_get_clean() and ob_get_flush(). You might expect ob_get_clean() to work more or less like ob_get_contents(), followed by ob_clean(), but you’d be wrong. In fact, ob_get_clean() actually works more like ob_get_contents() followed by ob_end_clean(). It returns the output buffer’s contents as a string, and discards and closes the buffer. Likewise, ob_get_flush() works like ob_get_contents() followed by ob_end_flush(): it returns the contents as a string, flushes them, and closes the buffer. ob_end_get() seems like a more reasonable name to me, but would it really be PHP if all of the names made sense?
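This behavior makes ob_get_clean() a convenient way to capture the output of a function that writes directly to the output. Below is a minimal sketch; print_banner() and capture_output() are hypothetical names that I made up for the example, and the level check is only there to show that the buffer really is closed afterwards.

```php
<?php

// A stand-in for any function that writes directly to the output.
function print_banner() {
    echo "*** Happy Advent ***\n";
}

// Hypothetical helper: runs a function and returns what it printed.
function capture_output($function) {
    ob_start();
    call_user_func($function);
    return ob_get_clean();  // returns the contents and closes the buffer
}

$level_before = ob_get_level();
$banner       = capture_output('print_banner');
$level_after  = ob_get_level();

// The banner is now an ordinary string that we can change before printing.
echo strtoupper($banner);
echo ($level_before === $level_after) ? "Buffer closed.\n" : "Buffer still open.\n";
```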

Conclusion

People are using output buffers to solve all sorts of problems already, and a bit of Googling will lead you to some interesting, terrible, and amazing ideas. Some of them are simple problems like replacing content (censoring profanity, adding HTML abbreviations, etc.) or stripping the unnecessary white space from output to reduce the output size. Others have found more complicated uses for output buffers, including templating engines, custom caching solutions, streaming, and more robust output buffer interfaces. The wide variety of applications that people have found for output buffers is a testament to their flexibility and utility. I hope this introduction has been helpful and has left you with some ideas about how to use output buffers in your next project.
