WEB Advent 2010 / Debugging

“There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.”

—  Sir Charles Antony Richard Hoare

As you’ve already been told, PHP is the language of the quick and dirty, where a developer’s tendency is toward laziness (efficiency).

We don’t always unit test our code. Sometimes, we take design shortcuts, even though we know they’ll hamper us later on. Business constraints tie our hands and prevent us from doing our jobs in an optimal way.

If we consider what Fred Brooks has to say, we realize that the long-term cost of such shortsightedness far outweighs the advantages. In his seminal work, The Mythical Man-Month, Brooks explains, “over 90% of the costs of a typical system arise in the maintenance phase, and that any successful piece of software will inevitably be maintained.”

Pay no attention to the man behind the curtain

All too often, “maintenance” means taking a bug report, figuring out how to reproduce it, and patching the affected code. Or, in more serious cases, redesigning part of your architecture.

Fortunately, when it comes to debugging, the Swiss Army knife in the PHP developer’s toolbox is Xdebug. Despite the incredible power and utility of this extension, it never ceases to amaze me how many developers haven’t installed it — or worse, have never heard of it.

Installing Xdebug couldn’t be easier. If you’re running a real operating system, installing it is the same as installing any other PECL extension; run pecl install xdebug. If you’re on Windows, you have to find the correct binary, which can be tricky. Fortunately, the author has been kind enough to provide a tool to help.

Once installed, you can see the benefits of Xdebug immediately. It replaces the plain messages generated by PHP errors and uncaught exceptions with formatted errors, including stack traces. (To see error messages in their full HTML glory, be sure html_errors is enabled.)

This is kind of a big deal. The most important (difficult) part of the debugging process is isolation, determining the root cause of the bug and where it occurs. In trying to isolate a bug, stack traces are indispensible; they let you know the path your code took to get there.

In addition to stack traces, Xdebug has two php.ini settings that are extremely useful. The first, xdebug.collect_params, specifies the level of detail of function or method parameters you’d like to see in your stack traces. The default value, 0, shows no parameter information, while a value of 4 shows the full details of every parameter. This may be a little much for most situations. I suggest tweaking this value as the need arises.

The second setting, xdebug.show_local_vars, further augments our ease of isolation by dumping the local variables where the error or exception occurred. Set this value to On to see a formatted dump of all in-scope variables.

What you don’t see

Using the tools presented above, you should now be fully equipped to go forth and tear through your bug lists with impunity. Right?

Wrong!

Two things Xdebug’s variable and parameter collection facilties won’t show you are the state of the current object scope (i.e. $this), and access to any data that exists outside of the method.

Rather than brushing this off as a deficiency in Xdebug itself, consider the implications on how you design your methods. The principal cause of software logic bugs is poorly managed state; you’re more likely to produce untestable, unmaintainable code by using data that can change unpredictably.

When we think about code that’s optimized for predictability, we think of simple things, like string manipulation functions. Consider str_replace(), a highly-testable function. Given any three inputs, we can expect consistent, predictable output. If a call were repeated with the same three parameters, we would get the same result, because str_replace() doesn’t depend on any external state to do its job.

Additionally, predictable code minimizes or eliminates side effects, such as echoing output, modifying a variable that’s passed by reference, or assigning headers. Side effects become problematic, because — in addition to isolating a particular method — you must also isolate what that method has changed elsewhere in the app.

Producing side effects and depending on external state in a method isn’t always as obvious as a reference to $GLOBALS (or superglobals). We also have singletons, the memorialization of global variables in object-oriented design.

There are subtle incursions of state as well, like using date or time information. While not typically thought of as state in an application, it is a piece of information that is constantly changing out of the scope of your code. That’s not to say it’s outright dangerous; just be mindful of how you use it.

For object-oriented code, you need to look no further than $this. Managing state effectively can be difficult in OOP, because OOP (unlike functional programming) doesn’t force you to think about state. Instead, it makes mutable state readily available, promoting laziness around one of the most critical precepts of software design.

Don’t misunderstand; this is not a universal condemnation of object-oriented programming. It’s simply a suggestion to think before you code. If you’ve only ever learned about software design from an object-oriented point of view, I advise you to study a language from another paradigm. You’ll broaden your horizons and gain a better understanding of your craft as a developer.

Getting closure

One final catch when it comes to debugging with Xdebug comes with a new feature of PHP 5.3, closures. When printing a stack trace containing a closure, instead of useful information, Xdebug simply prints {closure}. While we still see the file and line number where the closure was invoked in the stack, this isn’t enough to be able to quickly determine where it was defined.

Instead, we can create a short script that will convert a normal PHP stack trace to one that gives a bit of extra information about where a closure was defined. This code uses components from the Lithium framework in order to simplify the introspection of classes and methods, but the basic premise is simple, and no framework is required. You may opt to directly use PHP’s native Reflection API instead.

<?php

use lithium\analysis\Debugger;
use lithium\analysis\Inspector;

class ClosureLocator {

  public static function trace($stack = null) {
    $stack = $stack ?: Debugger::trace(array('start' => 1, 'format' => 'array'));

    foreach ($stack as $i => $frame) {
      $prev = isset($stack[$i - 1]) ? $stack[$i - 1] : null;
      unset($stack[$i]['args'], $stack[$i]['object']);

      if (strpos($frame['function'], '{closure}') === false || !$prev) {
        $stack[$i]['function'] = $frame['functionRef'];
        continue;
      }
      if ($class = Inspector::classes(array('file' => $prev['file']))) {
        foreach (Inspector::methods(key($class), 'extents') as $method => $extents) {
          if (!($extents[0] <= $prev['line'] && $prev['line'] <= $extents[1])) {
            continue;
          }
          $class = key($class);
          $reference = "{$class}::{$method}";
          $stack[$i]['function'] = "{$reference}()::{closure}";
          break;
        }
      } else {
        $reference = $prev['file'];
        $stack[$i]['function'] = "{$reference}::{closure}";
      }
    }
    return $stack;
  }
}

?>

First, this code generates a specially-formatted stack trace, which is essentially debug_backtrace() with a few sanity checks and formatting niceties. Next, it iterates over the stack, looking for frames containing calls made from closures. Once it finds one (and ensures that there’s a previous call in the stack to reference for pointer information), it attempts to determine whether the closure is defined in a class method or a procedural PHP file. This part of the code is making an assumption, specifically that your code uses a class-to-file naming convention.

If the closure is defined in a class, each of the class’s methods is iterated over and inspected to see whether the line number calling the subsequent stack frame is between the start and end lines of the method’s definition. Once a match is found, we have our origin class and method.

This is a good start. We’re now able to see a filename or method reference where a closure is defined inline in our stack trace. However, for quickest identification, we also want to be able to see which line number a closure was defined on.

For this, we can add the following method to our ClosureLocator class:

public static function definition($reference, $callLine) {
  if (file_exists($reference)) {
    foreach (array_reverse(token_get_all(file_get_contents($reference))) as $token) {
      if (!is_array($token) || $token[2] > $callLine) {
        continue;
      }
      if ($token[0] === T_FUNCTION) {
        return $token[2];
      }
    }
  } else {
    list($class, $method) = explode('::', $reference);
    $classRef = new ReflectionClass($class);
    $methodInfo = Inspector::info($reference);
    $methodDef = join("\n", Inspector::lines($classRef->getFileName(), range(
      $methodInfo['start'] + 1, $methodInfo['end'] - 1
    )));

    foreach (array_reverse(token_get_all("<?php {$methodDef} ?>")) as $token) {
      if (!is_array($token) || $token[2] > $callLine) {
        continue;
      }
      if ($token[0] === T_FUNCTION) {
        return $token[2] + $methodInfo['start'];
      }
    }
  }
}

This method takes either a filename or the name of a class and method, separated by a double colon. If $reference is a filename, the file is tokenized, and we begin iterating over the tokens in reverse order, skipping ahead until we find where the next call (outbound from the closure) took place.

Once we find the line where the call took place, we continue iterating backward over the tokens until a T_FUNCTION is encountered, indicating the beginning of the closure’s definition. The line number of the token is then returned.

The second half of the method works in much the same way. By using some techniques from above, we can determine the line numbers in the file where the method is defined. Then, we extract just the code for the method definition, tokenize it, and begin iterating over the tokens in reverse. After skipping past all of the tokens that appear after the outbound method call, we look for the first instance of T_FUNCTION. Once found, we return the line number relative to the extracted snippet, plus the offset of the start of the method definition (where we extracted the code).

Finally, we’ll add the following lines to the trace() method, just before the closing brace of the outer foreach loop:

$line = static::definition($reference, $prev['line']) ?: '?';
$stack[$i]['function'] .= " @ {$line}";

This invokes our new method and appends the line number to the end of the closure definition reference.

Now, we can call our method from wherever we need a stack trace, and we should be able to see something more informative. Stripped down to just the information we generated, our stack trace should look something like this:

Array
(
    [0] => app\controllers\HelloWorldController::test()::{closure} @ 24
    [1] => lithium\net\http\Router::parse()
    [2] => lithium\net\http\Router::process()
    [3] => lithium\action\{closure}
    [4] => lithium\action\Dispatcher::run()::{closure} @ 108
    [5] => /path/to/app/config/bootstrap/action.php::{closure} @ 42
    [6] => lithium\core\Staticobject::_filter()
    [7] => lithium\action\Dispatcher::run()
    [8] => [main]
)

In conclusion

There you have it. With good tooling and software design practices, you can save yourself a world of pain and save your company from collapsing under the weight of hard-to-maintain projects. What could be more reason to celebrate this holiday season?

Other posts