WEB Advent 2012 / The Three Ugly Sisters

The Three Ugly Sisters are three classes of attacks which I've tried to highlight in 2012. You might also know them as Cross-Site Scripting, XML Injection, and Insufficient Server-Side Transport Layer Security (or Peerjacking). These three attacks are particularly ugly for PHP programmers because each, in its own way, offers attackers the same advantage: PHP does not defend against them automatically, and they are poorly documented and poorly understood by programmers. This potent mix of default vulnerabilities, programmer ignorance, and poor reference material culminates in the sort of security vulnerabilities that an attacker can find in the wild without trying very hard.

Let’s take a brief look at each of them.

Insufficient server-side Transport Layer Security (Peerjacking)

The easiest way to fathom the mysteries of Transport Layer Security is to remember that your web app sometimes behaves like a modern browser. It can make GET and POST requests like a browser, over HTTPS. It can act as an intermediary, shuffling a user’s personal data to and fro between your server and the server of any third party (including within your own network). It can consume masses of information, store it, and display it back to users, over HTTPS.

Browsers and SSL/TLS have a simple relationship. Browsers implement it correctly and strictly, and they closely monitor both their implementation and the CAs they trust. Any failure on their part would be incredibly embarrassing and harmful to their share of the browser market.

Unfortunately, it is far from unusual to find PHP libraries and applications where SSL/TLS protections have been accidentally or even deliberately disabled. Since SSL/TLS is designed to prevent data interception, request manipulation, request replays, and other attacks that are designed to do harm to users, disabling SSL/TLS or being unaware of how to configure it correctly is not an acceptable behavior in PHP. Yet, it remains ludicrously common. As programmers, we also need to be familiar with the prevailing data privacy legislation, privacy ethics, and corporate guidelines which may apply to user data in the jurisdiction or corporate setting we operate within.

Here are two examples of inappropriate HTTPS usage in PHP, followed by their correctly configured variants: one for PHP Streams and one for the cURL extension. The common factor in the wrong examples is very simple: they disable (or never enable) two essential SSL/TLS checks, which we can call Peer Verification and Domain Matching. Peer Verification guarantees the validity of the SSL certificate offered by the contacted server. Domain Matching ensures that the offered SSL certificate is for the host or domain name we connected to. If we fail to verify the peer's SSL certificate, we'd never notice if it was a self-signed fake, was signed by an untrusted Certificate Authority, or had expired. If we fail to perform Domain Matching, an attacker could use any valid SSL certificate for any domain or host (whether one they purchased or one whose private key they stole), so long as that certificate passes Peer Verification. So you need both Peer Verification and Domain Matching enabled to get the protection SSL/TLS is meant to provide.
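To make Domain Matching concrete, here is a simplified sketch of the comparison it performs. hostMatchesCertName() is a hypothetical helper, not a PHP function, and its rules (exact case-insensitive match, or a single "*" covering the whole left-most label) only roughly follow RFC 6125. OpenSSL and libcurl do this for you; the point is to show what gets skipped when the check is switched off.

```php
<?php
/**
 * Hypothetical helper illustrating Domain Matching: does $host match the
 * name $certName taken from a certificate? Simplified rules: an exact,
 * case-insensitive match, or a "*" wildcard that forms the whole left-most
 * label and stands for exactly one label.
 */
function hostMatchesCertName($host, $certName)
{
    $host     = strtolower(rtrim($host, '.'));
    $certName = strtolower(rtrim($certName, '.'));

    if ($host === $certName) {
        return TRUE;
    }

    $hostLabels = explode('.', $host);
    $certLabels = explode('.', $certName);

    // Only "*.example.com" style wildcards, covering exactly one label.
    if ($certLabels[0] !== '*' || $hostLabels[0] === ''
        || count($certLabels) !== count($hostLabels)) {
        return FALSE;
    }
    // Refuse overly broad patterns such as "*.com".
    if (count($certLabels) < 3) {
        return FALSE;
    }

    // Every label after the wildcard must match exactly.
    return array_slice($certLabels, 1) === array_slice($hostLabels, 1);
}

var_dump(hostMatchesCertName('api.twitter.com', '*.twitter.com'));    // bool(true)
var_dump(hostMatchesCertName('api.twitter.com', 'evil.example.com')); // bool(false)
```

Without this check, a perfectly valid certificate for evil.example.com would be accepted for api.twitter.com.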

PHP streams (the wrong way)

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$result = file_get_contents($url); // No CA file, no checks: before PHP 5.6, streams verify nothing by default

PHP streams (the right way)

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$contextOptions = array(
    'ssl' => array(
        'verify_peer'   => TRUE,                      // Peer Verification
        'cafile'        => __DIR__ . '/cacert.pem',   // trusted CA bundle
        'verify_depth'  => 5,                         // maximum certificate chain depth
        'CN_match'      => 'api.twitter.com'          // Domain Matching ('peer_name' on PHP 5.6+)
    )
);
$sslContext = stream_context_create($contextOptions);
$result = file_get_contents($url, FALSE, $sslContext);
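From PHP 5.6 onwards, streams verify peers by default, and CN_match is deprecated in favour of peer_name, with verify_peer_name switching Domain Matching on or off. A minimal sketch of a helper that states both checks explicitly; secureStreamContext() is a hypothetical name, and you are assumed to supply the CA file path yourself.

```php
<?php
/**
 * Hypothetical helper: build a stream context that insists on both
 * Peer Verification and Domain Matching, using PHP 5.6+ option names.
 */
function secureStreamContext($host, $caFile)
{
    return stream_context_create(array(
        'ssl' => array(
            'verify_peer'       => TRUE,    // Peer Verification
            'verify_peer_name'  => TRUE,    // Domain Matching
            'peer_name'         => $host,   // name the certificate must match
            'cafile'            => $caFile, // trusted CA bundle
            'verify_depth'      => 5,
            'allow_self_signed' => FALSE,
        ),
    ));
}

// Usage (performs a real HTTPS request, so it is only shown here):
// $ctx    = secureStreamContext('api.twitter.com', __DIR__ . '/cacert.pem');
// $result = file_get_contents($url, FALSE, $ctx);
```

Spelling the options out, even where they match the defaults, documents intent and survives a php.ini or library that changes them.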

cURL (the wrong way)

$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$req = curl_init($url);
curl_setopt($req, CURLOPT_SSL_VERIFYPEER, FALSE); // Disable Peer Verification
curl_setopt($req, CURLOPT_SSL_VERIFYHOST, 0); // Disable Host Matching
/** OR **/
curl_setopt($req, CURLOPT_SSL_VERIFYHOST, TRUE); // TRUE = 1 when it should be set to 2!
$result = curl_exec($req);

cURL (the right way)

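$url = 'https://api.twitter.com/1/statuses/public_timeline.json';
$req = curl_init($url);
curl_setopt($req, CURLOPT_RETURNTRANSFER, TRUE); // return the body instead of printing it
// CURLOPT_SSL_VERIFYPEER (TRUE) and CURLOPT_SSL_VERIFYHOST (2) stay at their secure defaults
$result = curl_exec($req);

// If verification failed because no usable CA bundle is configured, retry with a bundled one.
// Error 77 is CURLE_SSL_CACERT_BADFILE (problem reading the CA certificate file).
$error = curl_errno($req);
if ($error == CURLE_SSL_PEER_CERTIFICATE || $error == CURLE_SSL_CACERT
    || $error == 77) {
    curl_setopt($req, CURLOPT_CAINFO, __DIR__ . '/cert-bundle.crt');
    $result = curl_exec($req);
}
curl_close($req);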

While cURL on a platform like Ubuntu will typically be configured with access to a system bundle of trusted Certificate Authority certificates (serving the same purpose as the manually-added cert-bundle.crt and cacert.pem files above), we should assume that this is probably not the default on many servers (and certainly not when using PHP's HTTP stream wrapper). Make sure to configure a path to such a file where necessary, and ensure the libraries you use do the same!
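One way to "configure a path to such a file" is to look for a CA bundle in a few known places before falling back to one shipped with your application. A sketch under stated assumptions: findCaBundle() is hypothetical, the two system paths are the usual Debian/Ubuntu and Red Hat/CentOS locations, and openssl.cafile (PHP 5.6+) and curl.cainfo are the php.ini settings PHP itself consults.

```php
<?php
/**
 * Hypothetical helper: return the first readable CA bundle path, or NULL.
 * Candidates are checked in order, so list the preferred source first.
 */
function findCaBundle(array $candidates)
{
    foreach ($candidates as $path) {
        // ini_get() returns FALSE for unknown settings and '' for unset ones.
        if (is_string($path) && $path !== '' && is_readable($path)) {
            return $path;
        }
    }
    return NULL;
}

$caBundle = findCaBundle(array(
    ini_get('openssl.cafile'),            // php.ini setting (PHP 5.6+)
    ini_get('curl.cainfo'),               // php.ini setting for ext/curl
    '/etc/ssl/certs/ca-certificates.crt', // Debian/Ubuntu
    '/etc/pki/tls/certs/ca-bundle.crt',   // Red Hat/CentOS
    __DIR__ . '/cacert.pem',              // bundled with your application
));

if ($caBundle === NULL) {
    // Fail closed: stop HTTPS requests rather than disabling verification.
    error_log('No CA bundle found; HTTPS requests should not proceed.');
}
```

The important design choice is the failure path: a missing bundle should stop the request, never quietly set CURLOPT_SSL_VERIFYPEER to FALSE.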

As an exercise in awareness, search the libraries you use (or GitHub, if feeling brave) for the configuration names above. The same exercise applies to the following sections, too. These vulnerabilities are not only common but also easy to locate with grep: the presence, or absence, of the key configuration constants and function names needed to enable or disable the relevant protections gives them away.
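The grep exercise can also be scripted as a crude first pass. In this sketch findInsecureTlsSettings() is hypothetical, and its patterns cover only the option names used in this article, so it will miss settings that are built dynamically or split across lines, and it may flag commented-out code.

```php
<?php
/**
 * Hypothetical audit helper: return the 1-based line numbers of lines that
 * appear to switch off Peer Verification or Domain Matching.
 */
function findInsecureTlsSettings($source)
{
    $patterns = array(
        // cURL: CURLOPT_SSL_VERIFYPEER set to FALSE or 0
        '/CURLOPT_SSL_VERIFYPEER\s*(?:,|=>)\s*(false|0)\b/i',
        // cURL: CURLOPT_SSL_VERIFYHOST set to anything other than 2
        '/CURLOPT_SSL_VERIFYHOST\s*(?:,|=>)\s*(false|true|0|1)\b/i',
        // Streams: 'verify_peer' or 'verify_peer_name' set to FALSE or 0
        '/[\'"]verify_peer(_name)?[\'"]\s*=>\s*(false|0)\b/i',
        // Streams: self-signed certificates explicitly allowed
        '/[\'"]allow_self_signed[\'"]\s*=>\s*(true|1)\b/i',
    );

    $hits = array();
    foreach (preg_split('/\R/', $source) as $index => $line) {
        foreach ($patterns as $pattern) {
            if (preg_match($pattern, $line)) {
                $hits[] = $index + 1;
                break; // one hit per line is enough
            }
        }
    }
    return $hits;
}

// Usage (hypothetical file name):
// print_r(findInsecureTlsSettings(file_get_contents('SomeHttpClient.php')));
```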

For further reading, see Insufficient Transport Layer Security (HTTPS, TLS and SSL).

XML injection (XMLi)

This class of attacks revolves around PHP's reliance on the libxml2 library, which underpins the DOM, SimpleXML, and XMLReader extensions. While it is typical to view XML as a simple text format, we should remember that XML can drive libxml2 (via PHP) to perform a number of unexpected actions. One of these is resolving external entity references.

A simple entity like &amp; expands to a lone ampersand when you use DOM to extract the text it's included in. You can also define your own custom entities in XML. A custom entity defined as a very long string, and then repeated a great many times in an XML document, can turn a document of modest file size into a rapidly expanding, RAM-eating monster as each reference is resolved into another large chunk of text. This is a Quadratic Blowup Attack. For example:

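<?xml version="1.0"?>
<!DOCTYPE results [<!ENTITY long "SOME_SUPER_LONG_STRING">]>
<results>
    <result>Now include &long; lots of times to expand
    the in-memory size of this XML structure</result>
    <result>&long;&long;&long;&long;&long;&long;&long;&long;
    Keep it going... &long;&long;&long;&long;</result>
</results>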
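To put rough numbers on a Quadratic Blowup document, the sketch below builds one without parsing it. buildQuadraticBlowup() is a hypothetical helper and the sizes are arbitrary assumptions: the document grows with the sum of the entity length and the number of references, while the expanded text grows with their product.

```php
<?php
/**
 * Hypothetical helper: one custom entity of $entityLength characters,
 * referenced $references times. Do not feed large values to a parser.
 */
function buildQuadraticBlowup($entityLength, $references)
{
    $value = str_repeat('A', $entityLength);
    $refs  = str_repeat('&long;', $references);
    return '<?xml version="1.0"?>'
         . '<!DOCTYPE results [<!ENTITY long "' . $value . '">]>'
         . '<results><result>' . $refs . '</result></results>';
}

$entityLength = 10000;
$references   = 10000;
$xml = buildQuadraticBlowup($entityLength, $references);

// The document itself is roughly entityLength + 6 * references bytes...
printf("Document size: %d bytes\n", strlen($xml));
// ...but once every &long; is substituted, the text is their product.
printf("Expanded text: %d bytes\n", $entityLength * $references);
```

A document of well under 100 KB here describes around 100 MB of text, without a single nested entity.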

This raises a serious risk — we can get XML from third-party services, from users via the browser (e.g., Ajax), and even from our own local filesystem in configuration files both written by us and distributed with third-party libraries and applications. That’s a lot of exposure to XML — a lot of potential targets for an attacker.

Custom entities can also refer to external resources, i.e., an external entity. Such resources can be retrieved using a PHP stream reference such as a file path, a URL, or a PHP filter wrapper URI (which can base64-encode retrieved non-XML content). This feature, enabled in PHP by default, means that any interpreted XML can potentially access such resources, have them inserted into the textual content of an XML element, and then have them displayed back to the attacker. Local files readable by PHP, and local URLs protected only by localhost-style access controls, are therefore exposed to Information Disclosure. It may also make the victim an unwilling participant in a DDoS attack on itself or a third party.

<?xml version="1.0"?>
<!DOCTYPE results [
    <!ENTITY harmless SYSTEM
    "php://filter/read=convert.base64-encode/resource=/path/to/local/file">
]>
<results>
    <result>I am &harmless; - honest!</result>
</results>
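PHP 5.4 added libxml_set_external_entity_loader(), which routes external entity lookups through a callback of your choosing. A minimal sketch of a deny-all policy, assuming you never need external DTDs or entities; the loader applies to all libxml-based parsing for the rest of the request, not only to one document.

```php
<?php
// Hypothetical policy: refuse to resolve any external entity or DTD.
libxml_set_external_entity_loader(
    function ($publicId, $systemId, array $context) {
        // Returning NULL tells libxml the resource could not be loaded.
        error_log('Blocked external entity: ' . $systemId);
        return NULL;
    }
);

// Passing NULL later restores PHP's default loader:
// libxml_set_external_entity_loader(NULL);
```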

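The defense to XML Injection is to disable the loading of external entities and, wherever possible, to refuse custom entities altogether. Luckily, libxml2 has an innate defense against exponential attacks such as the Billion Laughs attack, where deeply nested entities (all referring to each other) can be packed into an even smaller XML file.

libxml_disable_entity_loader(TRUE); // Deprecated in PHP 8.0, where external entity loading is off by default
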
And, optionally, as a sanity check before accepting any XML for processing:

$dom = new DOMDocument;
$dom->loadXML($xml); // $xml holds the untrusted document
foreach ($dom->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        throw new \InvalidArgumentException(
            'Invalid XML: Detected use of illegal DOCTYPE'
        );
    }
}
For further reading, see XML Injection Attacks.

Cross-Site Scripting (XSS)

XSS is never far from the minds of PHP developers. One area that still needs improvement is weaning PHP developers away from the idea that htmlspecialchars() is the only defense required. In reality, XSS can target not only HTML but also JavaScript, URIs, and CSS. Outside the HTML context, htmlspecialchars() is utterly useless, and our obsession with it borders on misinformation, since articles and books on the topic frequently omit anything beyond HTML escaping. These other contexts need escaping methods of their own, an approach referred to as context-based escaping.
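To see why htmlspecialchars() alone falls short, consider a value placed inside a JavaScript string in an inline event handler. The browser HTML-decodes attribute values before handing them to the JavaScript engine, so the escaping is simply undone. This sketch imitates that decoding step with html_entity_decode(); the payload and the greet() markup are illustrative assumptions.

```php
<?php
// Attacker-controlled value, destined for a JS string inside an attribute.
$name = "'); alert(document.cookie); //";

// HTML escaping only: safe as far as the HTML parser is concerned...
$attr = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
$html = '<button onclick="greet(\'' . $attr . '\')">Hi</button>';
echo $html, "\n";

// ...but the browser decodes the attribute value before running it as
// JavaScript, so the script engine sees the original, unescaped quote.
$scriptSeenByBrowser = "greet('" . html_entity_decode($attr, ENT_QUOTES, 'UTF-8') . "')";
echo $scriptSeenByBrowser, "\n";
// greet(''); alert(document.cookie); //')
```

The value needs JavaScript escaping first, and HTML attribute escaping on top of that.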

Aside from context-based escaping, there is the continual need to be wary of allowing users to submit textual data as HTML. The two usual methods of letting users use HTML securely are to employ something like HTML Purifier (the only solution of this type I recommend) or a simpler intermediary syntax such as BBCode or Markdown. Markdown has gained popularity among programmers; it's the primary format used by GitHub. The problem with intermediary languages is that they are converted into HTML, so you still need to run HTML Purifier on their output! It's a subtle misunderstanding, but intermediary languages are for the benefit of users, not the security of your application. Markdown even accepts raw HTML (including script tags) as valid Markdown syntax.

In terms of context-based escaping, your best choice is to use an established solution that has been peer reviewed and can be rapidly tested. Zend Framework 2 offers the reusable \Zend\Escaper\Escaper class as a starting point. If you use Symfony 2, the Twig library includes \Zend\Escaper-compatible escaping methods for HTML, HTML attributes, JavaScript, and CSS.

Outside of this approach, you should be wary of custom JavaScript escaping. A recent recommendation emerging online is to use the json_encode() function to escape JavaScript string literals and integers. These recommendations usually neglect to mention that JSON encoding and JavaScript escaping are not the same thing. If you use json_encode(), be aware that it only supports UTF-8, and that you must pass its escaping flags (JSON_HEX_TAG, JSON_HEX_AMP, JSON_HEX_APOS, and JSON_HEX_QUOT), which are available from PHP 5.3. Avoid it altogether on PHP 5.2: it doesn't even escape the ampersand, which can be used to construct HTML entities that are interpretable in XML-serialised HTML5 or in any other HTML where no DOCTYPE has been defined for the document (i.e., what you expect to be CDATA may end up being treated as PCDATA in some circumstances).
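If you do use json_encode() for a JavaScript string literal on PHP 5.3+, the flags above replace <, >, &, ' and " with \uXXXX escapes. A minimal sketch, where jsStringLiteral() is a hypothetical name; it assumes UTF-8 input and is not a substitute for a reviewed escaper such as \Zend\Escaper.

```php
<?php
/**
 * Hypothetical helper: encode a UTF-8 string as a JavaScript string
 * literal, including the surrounding double quotes, via json_encode().
 */
function jsStringLiteral($value)
{
    $flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
    $json  = json_encode((string) $value, $flags);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // json_encode() refuses invalid UTF-8 instead of escaping it.
        throw new \InvalidArgumentException('Value could not be encoded (invalid UTF-8?)');
    }
    return $json;
}

echo '<script>var q = ' . jsStringLiteral("</script><b>&'\"") . ';</script>', "\n";
```

Note that the output already includes its own quotes, so it must not be wrapped in a second pair inside the template.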

For further reading, see XSS (Cross Site Scripting) Prevention Cheat Sheet and Escaping RFC for PHP Core.
