WEB Advent 2010 / Bits and PHPieces

I like to think of PHP as a mixed bag of tricks. The language itself was born out of some simple, real-world use cases, and that ideology still drives the language forward today. It sometimes isn’t pretty, but PHP isn’t meant to be pretty — it’s meant to solve problems.

One (some would say unfortunate) side-effect of this ideology is that there are quite a few things in the PHP language that exist only to solve one particular, and sometimes obscure, problem. Sure, the addition of an object model in PHP 4 (later rewritten in PHP 5) and the addition of closures and lambda functions in PHP 5.3 both constituted welcome additions to the language. Even the often-derided goto, feared by many a developer, is a general-purpose operator that has a relatively broad area of usefulness. But, what about some of the more obscure things that exist in PHP, or the multitude of extensions?

Down the rabbit hole

My favorite example of obscure functionality is embodied in tick functions and the declare construct. If you’ve never used these before, don’t worry; you’re not alone. What are they, you ask? Thankfully, the PHP documentation team has done a wonderful job describing what this oddity actually does.

“A tick is an event that occurs for every N low-level tickable statements executed by the parser within the declare block. The value for N is specified using ticks=N within the declare block’s directive section.”

So, it looks like this is meant to allow an arbitrary function to be executed “every N low-level tickable statements executed by the parser.” This, of course, comes with some pretty serious overhead; you’re basically telling PHP to execute a bunch of opcodes, stop, do whatever this tick function is supposed to do, and then resume.
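To make that concrete, here is a minimal sketch of a tick function that does nothing but count how often it is called. The callback name count_tick and the variable $tickCount are made up for illustration. The exact number of ticks depends on how PHP compiles the statements, so the sketch reports the count rather than assuming it.

```php
<?php

// Enable ticks for the statements that follow in this file.
declare(ticks=1);

$tickCount = 0;

// Hypothetical callback, named here for illustration only.
function count_tick()
{
    global $tickCount;
    $tickCount++;
}

register_tick_function('count_tick');

// Some ordinary work; the callback fires between these statements.
$sum = 0;
for ($i = 0; $i < 3; $i++) {
    $sum += $i;
}

unregister_tick_function('count_tick');

echo "sum = $sum, tick callback ran $tickCount times\n";
```

Even this tiny loop triggers the callback several times, which gives a feel for where the overhead comes from.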

Now, before you go out and write tick functions for profiling your apps (which is the first thing that comes to mind for most people), let me remind you that this functionality was introduced in the early days of PHP 4 and has generally not seen much use out in the wild. Tick functions are highly inefficient, and most of their usefulness has been superseded by more efficient, special-purpose solutions (e.g., Xdebug for profiling and debugging). Don’t be surprised if tick functions get removed from PHP at some point in the future, either. Considering most developers don’t even know they exist, I don’t believe they’ll be missed.

Curiouser and curiouser

PHP has a lot of built-in functionality for manipulating and searching strings, which isn’t surprising, considering that most web apps consist of fancy ways to display text-based data collected through HTML forms. Most developers will feel right at home with functions like htmlentities() and str_replace(), but what about metaphone() or soundex() to write a spell checker? Or hebrev(), to convert “logical Hebrew text to visual text”? And there’s always my favorite string function, str_rot13(), which brings me back to my forum-trolling days, when movie spoilers and the like were ROT13ed to ensure that casual readers wouldn’t mistakenly read the synopsis for last night’s episode of Friends.
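As a rough sketch of the spell-checker idea, the snippet below suggests dictionary words that share a soundex() key with a misspelled word, then round-trips a string through str_rot13(). The suggest() helper and the four-word dictionary are assumptions made up for this example; a real spell checker would need a proper word list and some ranking of the candidates.

```php
<?php

// Hypothetical word list; a real spell checker would load a dictionary.
$dictionary = array('watch', 'rabbit', 'queen', 'hatter');

// Return the dictionary words whose soundex() key matches the input's key.
function suggest($word, array $dictionary)
{
    $key = soundex($word);
    $matches = array();
    foreach ($dictionary as $candidate) {
        if (soundex($candidate) === $key) {
            $matches[] = $candidate;
        }
    }
    return $matches;
}

$suggestions = suggest('rabit', $dictionary);
echo implode(', ', $suggestions), "\n";

// ROT13 is its own inverse, so applying it twice restores the text.
$spoiler = str_rot13('The queen did it');
echo $spoiler, "\n";
echo str_rot13($spoiler), "\n";
```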

Yet another of my favorite lesser-known functions is token_get_all(). Now, if you’re familiar with some basic compiler theory and know what a lexer is, then this function is hardly surprising. For those unfamiliar with these concepts, don’t fret; the functionality of token_get_all() can be easily demonstrated with a simple example:

<?php

$tokens = token_get_all('<?php echo "Why, this watch is exactly two days slow."; ?>');

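/*
Abridged result; the trailing ';', whitespace, and closing-tag tokens are omitted.
The numeric IDs come from one PHP build and vary between versions.
$tokens = array(
    array(368, '<?php ', 1), array(316, 'echo', 1),
    array(371, " ", 1), array(315, '"Why, this watch is exactly two days slow."', 1)
);
*/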

?>

We can see here that token_get_all() takes a string representing a PHP script and returns an array of tokens. Each token is either a single-character string (for punctuation such as a semicolon) or an array holding a numeric “token identifier,” the matched text, and the line number. Now I don’t know about you, but I haven’t memorized the list of parser tokens yet. Thankfully, there’s token_name() for that:

<?php

echo token_name(368); // T_OPEN_TAG on the PHP build used above; the number varies between versions

?>

Having the ability to tokenize an arbitrary PHP source file is actually quite powerful; it has been used to make PHP-aware diffs, extract code metrics from PHP projects, and even do some lightweight template processing.
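A very small code metric along those lines might look like the sketch below, which counts how often each kind of token appears in a piece of source. The count_token_types() helper is a name invented for this example, and it assumes the tokenizer extension is available, which it is in a default PHP build.

```php
<?php

// Count occurrences of each token type in a string of PHP source.
function count_token_types($source)
{
    $counts = array();
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ';' come back as plain strings.
        $name = is_array($token) ? token_name($token[0]) : $token;
        $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
    }
    return $counts;
}

$counts = count_token_types('<?php echo "Why, this watch is exactly two days slow."; ?>');
print_r($counts);
```

Because the lookup goes through token_name(), the result is keyed by names like T_ECHO rather than by numeric identifiers, so it stays readable across PHP versions.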

We’re all mad here

Now, the list of obscure-but-probably-useful functionality contained within the PHP language is pretty long, but we can’t forget PECL.

Have you ever wanted to figure out the gender of a person, given only their first name? Well, there’s an extension for that. Need to implement a Bloom filter for your project? PECL has one of those as well. What if you need to implement an event-based server inside of PHP? Well, there’s an extension for that, too.

Sentence first, verdict afterwards

PHP was a language born out of simplicity, necessity, and getting things done. While the underlying implementation may have changed drastically over the years, the basic philosophy of the project has not changed much with time. And, while the oddities of the language can be maddening at times (I’m looking at you, needle versus haystack argument ordering) and the global function list can look like a grab bag of seemingly useless functionality, nearly everything exists to solve a problem that developers encountered. Do yourself a favor and take a few moments to look through the documentation, just to see what’s there. You might be pleasantly surprised by what you find.
