WEB Advent 2011 / Out with the Old

Jetpacks. Flying cars. Databases able to handle infinite amounts of data. Breakthrough after breakthrough, computer engineering forges a shiny future, and yet…

One of the first interview questions I was asked was to describe, in my own words, the difference between an inner join and an outer join. It is a question I have adopted, and I now ask it to every candidate. Over time, I’ve been interviewing more and more senior developers, most of whom know the correct answer, so this question has become a way to gaze into the developer soul instead, providing a brief glimpse of the developer’s professional experience. Typically, answers take one of the following paths:

  • A dry, textbook answer. Bonus points for using table EMPLOYEES in the example.
  • A MySQL-specific answer, illustrated with a set of tables to support a blogging app.
  • Data relationships (and therefore joins) explained as a set of properties of objects. Most answers here include User objects and Friend relationships.

I see the last answer more frequently these days, which makes sense. NoSQL is making huge inroads in the world of software engineering, and frameworks are popping up left and right, divorcing developers from the need to interact with databases (sorry, “data stores”) directly. The combination of Moore’s Law and developers’ access to as much computing power as they need (or want) creates a situation where developers can get away with abandoning a schema-first approach to app design. This is great for those who have never learned SQL properly, or for those who don’t want to feel shackled by a relatively inflexible schema and prefer to focus on business priorities instead.

Not designing the schema first is cool, but data still has to live somewhere and continue to be managed and monitored. We have not skipped over databases; we’ve just changed, quite drastically, how we think about them. Over time, relational databases battled each other, pushing features and vertical designs (I’m looking at you here, Oracle) that promised end-to-end data management. In the meantime, NoSQL data stores thought about separation of function and specialization. Instead of trusting relational databases to solve the problems behind the CAP theorem, they went all postmodern on data-driven software and deconstructed how data is stored, retrieved, synchronized, and persisted within an app.

Gone are the days when a web app was a database with some PHP in front of it. Web apps are now layers upon layers: a lot of them independent, many of them using completely different technologies, and some interacting with each other via HTTP-based APIs. The art of scaling web apps is the art of shifting bottlenecks around, and in data-first design, the bottleneck was frequently the monolithic database. These days, you might encounter some of the following within your app:

  • A caching layer from which most stored data is retrieved, usually implemented using Memcached or Redis.
  • A deferred-processing layer that decouples the volume of user activity from the expectations of app responsiveness by delaying the computation of some tasks. This is sometimes implemented with a queue-specialized data store: either a great key/value store like Redis or an off-the-shelf product like Gearman.
  • Data that is separated horizontally across many databases — perhaps a few MySQL instances with sharded/replicated data.
  • Data that is separated based on natural parameters, such as user data in a graph database like Neo4j for easy relationship digging, and articles the users write in a MongoDB instance for quick reads and writes.
  • A standalone search index of the content of your app in a search-specialized data store, such as Solr.
  • A layer, invisible to the end user, that siphons samples of data from designated data store locations, then crunches them until they are transformed into information and, later on, knowledge.
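To make the first of these layers concrete, here is a minimal cache-aside sketch in Python. It assumes a hypothetical `InMemoryCache` class (a dict with per-key expiry) standing in for Memcached or Redis, and a hypothetical `load` callback standing in for the database read; a real deployment would talk to the cache server through its client library instead.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Hypothetical stand-in for Memcached/Redis: a dict with per-key expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def cache_aside(cache: InMemoryCache, key: str, load: Callable[[], str],
                ttl: float = 60.0) -> str:
    """Return the cached value, falling back to `load` (the database) on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()  # slow path: read from the primary data store
        cache.set(key, value, ttl)
    return value


if __name__ == "__main__":
    calls = []

    def load_article() -> str:
        # Hypothetical database read; records each call so we can count them.
        calls.append(1)
        return "Out with the Old"

    cache = InMemoryCache()
    print(cache_aside(cache, "article:42", load_article))
    print(cache_aside(cache, "article:42", load_article))
    print(len(calls))  # 1: the second read was served from the cache
```

The app checks the cache first and only touches the database on a miss, which is what lets most reads skip the database entirely.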

The bottlenecks still exist, but they are isolated explicitly in a way that can be either easily refactored or replaced with a new technology when it is invented. Your search might be slow, but in a few months, Solr might get replaced with X. Your articles might require a more error-tolerant write mechanism, but in a few weeks, your MongoDB instances might get replaced with Y.
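One way to get that kind of replaceability is to have the app depend on a narrow interface rather than on a particular product. The sketch below assumes a hypothetical `SearchIndex` protocol with `index` and `search` methods; the two toy backends stand in for Solr and for whatever X replaces it, and neither reflects Solr's actual API.

```python
from typing import Dict, List, Protocol, Set


class SearchIndex(Protocol):
    """The narrow interface the rest of the app depends on (hypothetical)."""

    def index(self, doc_id: str, text: str) -> None: ...

    def search(self, term: str) -> List[str]: ...


class ScanSearchIndex:
    """Toy backend that scans every document's words on each query."""

    def __init__(self) -> None:
        self._docs: Dict[str, List[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        self._docs[doc_id] = text.lower().split()

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return sorted(d for d, words in self._docs.items() if term in words)


class InvertedSearchIndex:
    """Drop-in replacement that maps each word to the ids containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = {}

    def index(self, doc_id: str, text: str) -> None:
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, term: str) -> List[str]:
        return sorted(self._postings.get(term.lower(), set()))


def publish(search: SearchIndex, articles: Dict[str, str]) -> None:
    """App code: it only knows about the SearchIndex interface."""
    for doc_id, text in articles.items():
        search.index(doc_id, text)


if __name__ == "__main__":
    articles = {"a1": "Out with the old", "a2": "In with the new"}
    backends: List[SearchIndex] = [ScanSearchIndex(), InvertedSearchIndex()]
    for backend in backends:
        publish(backend, articles)
        print(type(backend).__name__, backend.search("with"))
```

Both backends return the same results for the same queries, so swapping one for the other touches a single line of wiring rather than every caller.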

From the standpoint of an SQL aficionado, I enjoy seeing how relational databases continue to be used but are forced to evolve to fit the needs of the new world. Your MySQL is no longer responsible for searching or managing user relationships, but it may be used as a queueing mechanism, or perhaps an interim caching layer. Relational databases are no longer used as the end-all solution for data management, but they are not discarded and can still be used in a lot of the new ways developers build web apps. Perhaps, in a way, the slight distrust developers still have of NoSQL (e.g., how certain products are still “immature”) means that relational databases are here to stay purely out of inertia, but this also gives developers a chance to be creative with the existing technology.
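As an illustration of the queueing use, here is a minimal sketch of a job queue kept in a relational table. It uses Python's built-in `sqlite3` module as a stand-in so the example runs without a server; the `jobs` table and the `enqueue`, `claim_next`, and `complete` functions are hypothetical. On MySQL, the claim step would typically lock the row inside a transaction with `SELECT ... FOR UPDATE` (MySQL 8.0 also supports `SKIP LOCKED`, so concurrent workers skip rows already taken).

```python
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT    NOT NULL,
    status  TEXT    NOT NULL DEFAULT 'pending'  -- pending -> running -> done
)
"""


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Add a job to the tail of the queue and return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Atomically mark the oldest pending job as running and return it."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so two workers
    # cannot claim the same row; MySQL would use SELECT ... FOR UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?",
                         (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    """Mark a claimed job as finished."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))


if __name__ == "__main__":
    # isolation_level=None: autocommit, so transactions are managed explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(SCHEMA)
    enqueue(conn, "resize avatar 17")
    enqueue(conn, "send welcome email 17")
    while (job := claim_next(conn)) is not None:
        job_id, payload = job
        print("working on", payload)
        complete(conn, job_id)
```

Claiming and marking the job inside one transaction is the important design choice: without it, two workers could read the same pending row before either one updates it.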

Here are a couple of examples of using MySQL in interesting (and it’s up to you whether unwise) ways:

In the last decade, we have changed how we interact with relational databases and what we expect from them. NoSQL has not (yet) offered us a flying car in the form of a be-all and end-all data store. In the meantime, we must make do with the tools that we already have, while hoping for and working on new inventions!

Until next time! (Starts her jetpack and flies away.)

Developer Gift

Gadgets and software are always nice, but think about making your developer gal or guy’s workspace a bit nicer. Nothing makes your desk feel cozier, saner, and healthier than an awesome plant. My favorite is the easy-to-grow monstera deliciosa, also known as the Swiss cheese plant. If you suspect your gal or guy is terrible at plant management, I suggest a set of nicely potted succulent plants. They’re easy to grow and nigh impossible to kill. Perhaps a terrarium of succulent plants from this Etsy shop?
