WEB Advent 2010 / Legacy Jungle

At some point in every developer’s career, they get the opportunity to work on a gnarly, horrible briar patch of a project. Surviving that experience sharpens one’s mind and skills. I like to believe that no developer sets out to create a monstrous mess of code. Instead, messes grow organically. As developers are pressed for time, lack experience, or are faced with ever-shifting requirements, the mess gets bigger until it's too big and hairy to handle. However these messes come to be, it could one day be your job to tame and nurture such a beast. Being prepared will save you stress and will leave you with more hair.

Get your bearings

Knowledge and information are critical when working in a jungle, so obtaining information should be your first priority. Are any of the previous authors around, or are you going to have to face the beast on your own? If you have access to the version control history, read the recent history to understand what’s been going on in the code and where the trouble zones might be. Read everything you can get your hands on: the code, the comments, and the version control history are all clues to what happened and how things came to be. Next, get a sense of the overall health of the code base. Does it run without errors? Does it even work like it’s supposed to? Does it have any tests? Do these tests run? Having answers to these kinds of questions should help broaden your understanding of the project and what challenges you might face.
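One way to turn version control history into a list of likely trouble zones is to count how often each file has changed. The sketch below assumes the project lives in Git, which the article does not specify; `hotspots` is a hypothetical helper name, and the function itself only counts paths fed to it on standard input.

```shell
#!/bin/sh
# hotspots [N]: read one changed-file path per line on stdin (blank lines
# are ignored) and print the N most frequently changed paths as
# "count path". N defaults to 10.
hotspots() {
    limit=${1:-10}
    grep -v '^[[:space:]]*$' |
        sort |
        uniq -c |
        sort -rn |
        head -n "$limit" |
        sed 's/^ *//'
}

# Example use, assuming a Git repository:
#   git log --since="6 months ago" --name-only --pretty=format: | hotspots 10
```

Files near the top of the list are not necessarily bad, but they are where people have been fighting the code most often, so they make a reasonable place to start reading.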

Mapping the jungle

Legacy code often feels like an untamed jungle. Much like a real jungle, a map is essential if you want to survive. Generating an include graph with inclued and using Graphviz to create diagrams of how files get included will help you understand how things work. After installing these tools, you can generate image files by running:

php /path/to/pear/graphviz.php -i /tmp/inclued.xxx.x -o inclued.dot
dot -Tpng -o inclued.png inclued.dot

This type of map can help clarify how a messy code base gets up and running. Generating call graphs can be useful when you have to deal with tangled logic or messy function call stacks. I use Xdebug and tools like Webgrind and xdebugtoolkit to create call graphs and visualize how an application runs. Creating call graphs can give you much-needed insight into the internal structure of twisty code. I also take a lot of notes. They almost always come in handy, either for myself or for my colleagues, as we tame the jungle.
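The two include-graph commands shown earlier handle a single dump file. If the dump directory has collected several files, a small loop can render them all. This is only a sketch: `render_include_graphs` and `GRAPH_SCRIPT` are hypothetical names, and the default script path is the placeholder from the commands above, so it must be pointed at the real script before use.

```shell
#!/bin/sh
# Placeholder path taken from the commands above; override it with the
# real location of the graph script on your system.
GRAPH_SCRIPT=${GRAPH_SCRIPT:-/path/to/pear/graphviz.php}

# render_include_graphs DUMP_DIR OUT_DIR
# For every file named inclued.* in DUMP_DIR, write a .dot file and a
# .png file with the same base name into OUT_DIR.
render_include_graphs() {
    dump_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for dump in "$dump_dir"/inclued.*; do
        [ -f "$dump" ] || continue    # the glob matched nothing
        name=$(basename "$dump")
        php "$GRAPH_SCRIPT" -i "$dump" -o "$out_dir/$name.dot" &&
            dot -Tpng -o "$out_dir/$name.png" "$out_dir/$name.dot"
    done
}

# Example use:
#   render_include_graphs /tmp ./include-graphs
```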

Taming and pruning

You can only work in the jungle for so long before it starts to consume you. At some point, you will want and need to start making changes, whether those changes are for your own sanity or to meet business needs. You should start clearing paths and colonizing the jungle as early as possible. Before attempting to do so, it’s good to do some planning and create safeguards. Eisenhower once said “Plans are worthless, but planning is everything.” While plans may be useless shortly after they are created, discussing and thinking about how the code should function and how it will look in the future is invaluable. It will help you focus on what needs to be done, and if you are working with others, it will help bring consensus on the direction you will take.

With plans in hand, it’s time to create some safeguards. The number of these that you create will depend on the amount of work to be done. I usually make safeguards in the form of automated tests. If your project already has tests, add more until you feel confident that the changes you need to make won’t break everything. PHPUnit and Selenium are fantastic tools for creating automated tests that can give you the confidence you need to make changes. Having a test suite helps ensure that what used to work still does, and that you haven’t made the mess worse. I aim to add tests for the most important and critical parts of a project, and spread outward from there.
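When a code base is too tangled for unit tests at first, a rough characterization check can act as a stopgap safeguard. This is a swapped-in technique, not PHPUnit or Selenium: record what a command prints today, then fail whenever a later run prints something different. The names `check_golden` and `GOLDEN_DIR`, and the script in the usage comment, are hypothetical.

```shell
#!/bin/sh
# Hypothetical directory where recorded outputs are kept.
GOLDEN_DIR=${GOLDEN_DIR:-./golden}

# check_golden NAME COMMAND [ARGS...]
# First run: record COMMAND's output (stdout and stderr) as the expected result.
# Later runs: print a diff and return 1 if the output no longer matches.
check_golden() {
    name=$1
    shift
    mkdir -p "$GOLDEN_DIR" || return 1
    expected="$GOLDEN_DIR/$name.txt"
    actual=$("$@" 2>&1)
    if [ ! -f "$expected" ]; then
        printf '%s\n' "$actual" > "$expected"
        echo "recorded: $name"
        return 0
    fi
    if printf '%s\n' "$actual" | diff -u "$expected" -; then
        echo "ok: $name"
    else
        echo "CHANGED: $name" >&2
        return 1
    fi
}

# Example use (report.php is a made-up script name):
#   check_golden monthly_report php report.php
```

A check like this says nothing about whether the recorded behavior is correct, only that it has not changed, which is often the guarantee you need before refactoring.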

With automated tests in hand, you can start making changes. Depending on the level of mess, there could be a few easy wins. I often start by finding sections with a high rate of code duplication, and moving that code into shared functions. Reducing duplication is an easy way to make maintenance easier, as there will be less code to maintain afterwards.
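To get a first impression of where duplication lives, a crude line-level count can help. The sketch below is a heuristic, not a real clone detector: it only finds identical lines after trimming whitespace, and a dedicated copy/paste detector such as phpcpd will find far more. `repeated_lines` is a hypothetical helper name, and the `.php` file filter is an assumption about the project.

```shell
#!/bin/sh
# repeated_lines DIR [MIN_LENGTH]
# Print "count line" for every whitespace-trimmed line of at least
# MIN_LENGTH characters (default 40) that occurs more than once in the
# .php files under DIR, most frequent first.
repeated_lines() {
    dir=$1
    min_len=${2:-40}
    find "$dir" -type f -name '*.php' -exec cat {} + |
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
        awk -v min="$min_len" 'length($0) >= min' |
        sort |
        uniq -c |
        awk '$1 > 1' |
        sort -rn |
        sed 's/^ *//'
}

# Example use:
#   repeated_lines ./src 40 | head -n 20
```

Long lines that show up many times, such as a repeated query or a copied block of markup, are good candidates for a shared function.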

Untangling code and separating concerns is another fruitful effort. Separating SQL queries and logic from HTML makes duplication easier to spot. Separated concerns also make it easier to write tests, so you can be confident that your code still works.

Legacy code is a fact of life, but if you stay smart, careful, and vigilant, you can transform any haphazard mess of code into a well-kept garden. It just takes time and patience, much like gardens in the real world.
