Drupal performance - the next step

While Drupal's performance is always of interest, it has a hard time defending itself against the features people want to add.

There are different ways to address this, but the "fewer features" approach is usually not defensible.

To defend itself from the feature onslaught, Drupal tries to load as few lines of PHP code as possible, which helps to increase performance. A PHP opcode cache (such as APC) helps even more and points the way to where further improvements can be made: outside of conventional PHP.

One idea that has come up several times is to create a Drupal-specific PHP extension that reimplements some of Drupal's frequently used functions in C. Such an extension exists. It implements Drupal's check_plain and Drupal 7's drupal_static. One of our goals is to examine the gains brought by this approach.

Another very interesting approach is to remove PHP entirely from the code that is actually executed by the webserver. There have been a number of attempts over the years, but one can't say that any of them has found wide adoption yet.

HipHop is a recent addition to this group of compilers. It was released by Facebook, which uses it in production to reduce load across its infrastructure. That production use makes it interesting to see whether it can be used with Drupal. Essentially, HipHop translates your PHP code to C++, which is then compiled to a binary.

Besides the actual performance improvements, one should not disregard the total effort required to add HipHop or a PHP extension to the toolchain and keep it there:

To use the Drupal PHP extension, you only need a minimal patch to Drupal core. You then need to compile the extension and install it, which is also rather easy, but you will need to find a way to keep track of the extension and make sure that it is available, e.g. on new webservers you set up. However, to get more out of the extension, you'll need to reimplement all the Drupal functions that you have identified as being costly. Should the core Drupal function change, you will need to update your C reimplementation. On the other hand, some internal Drupal functions don't change all that much, and the C code can be reused across several Drupal versions.

To use HipHop, you first need to install a whole lot of libraries that aren't usually installed on a webserver. To make this easier, Tag1 Consulting has developed HipHop RPMs for CentOS. There are also Debian packages maintained elsewhere. After you have successfully installed HipHop, you need to compile your PHP application. This is rather easy if you assemble a list of the files that make up your PHP application and tell HipHop to compile all of them. The compilation will take quite a bit of time, but will succeed in the end if the application is compatible. Drupal isn't completely compatible with HipHop. For Drupal 6 there is a small patchset on drupal.org which replaces calls to preg_replace(). Drupal 7 currently does compile with HipHop, but there is a bug in HipHop's implementation of the MySQL PDO layer that will not allow you to actually run it.
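Assembling the file list can be scripted. The following is only a rough sketch in Python, not part of the setup used for the measurements: the function names, the list of file extensions, and the idea of skipping directories are assumptions made for illustration. How the resulting list is handed to the compiler depends on your HipHop version and is not shown here.

```python
import os

# Assumption: the file extensions Drupal commonly uses for PHP source files.
PHP_EXTENSIONS = (".php", ".inc", ".module", ".install", ".engine", ".profile")


def collect_php_files(root, skip_dirs=()):
    """Return the sorted paths (relative to root) of PHP source files below root.

    skip_dirs holds directories, relative to root, that should be left out,
    for example modules you do not want compiled into the binary.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.relpath(os.path.join(dirpath, d), root) not in skip_dirs
        ]
        for name in filenames:
            if name.endswith(PHP_EXTENSIONS):
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def write_file_list(root, list_path, skip_dirs=()):
    """Write one relative path per line to list_path and return the list."""
    files = collect_php_files(root, skip_dirs)
    with open(list_path, "w") as handle:
        handle.write("\n".join(files) + "\n")
    return files
```

Calling write_file_list("/path/to/drupal", "files.list") would then produce a plain text file with one path per line; both paths are placeholders.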

Since the resulting HipHop binary is not a PHP application anymore, there will be subtle differences in behavior. One example is Drupal's way of enabling and disabling modules: if you disable a module under regular PHP, its file will not be read in by PHP and thus its functions will not be available. If you disable a module in HipHop-compiled Drupal, you can still switch the status entry in the system table, but the functions of the module will still be available. You thus need to make sure you only compile in the modules you actually want to run, to rule out unforeseen consequences.

We have compared Drupal's performance for a total of ten setups:

A) Drupal 6

1) Drupal running under PHP as an Apache module

2) Same as 1), with the Drupal PHP extension

3) Same as 1), with APC opcode cache

4) Same as 2), with APC opcode cache

5) Drupal compiled with HipHop

6) Drupal compiled with HipHop, only required files

B) Drupal 7

1) - 4) as in A)

All installations use the same database, with the D7 version upgraded from the D6 version. The database has a few thousand nodes, users, and comments, as well as taxonomy terms. The actual content composition isn't very important since we are interested in PHP and not MySQL performance.

All measurements are done with a modified version of siege. The modification increases the precision of the measurements (or rather the number of digits written to the results file).

All the final measurements were done on a Rackspace virtual machine with 2 GB of RAM and four cores.

We have done tests with two pages: a single node with comments, and the forum overview page. All requests were made as an anonymous user with the page cache disabled. This is appropriate as we are only interested in raw PHP performance. We requested several thousand pages for each test, with a concurrency of one.

Evaluating the tests proved much more difficult than anticipated. As a standard procedure, most people are happy to specify the average of the measured quantity. More diligent people then also specify the standard deviation as a measure of the statistical significance of the result, as is appropriate for a normal distribution. On closer examination, the resulting distribution of measurements turned out to resemble a mixture of skewed distributions, and thus this procedure isn't appropriate.

Ideally, one would try to find out how to do one or more of the following:

  • Find reasons for the appearance of a mixture of distributions
  • Suppress some of these reasons for the existence of this mixture
  • Find a mathematical description for the mixture
  • Find a suitable model to express the statistical error for this mixture

While interesting, all the above is difficult and exceeds the experience of the experimenter with hard-core statistics. I can make the raw data available to somebody who has more of that.

We therefore do the following: the one-sigma boundary of the normal distribution marks the area where 68% of all results of the experiment can be found. We find a similar boundary by computing the same 68% threshold from our data and give the resulting difference to the mean as an error estimate. Since we have taken a significant number of measurements, this should not be too far off. Regardless, it needs to be regarded with some caution.
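To make the procedure concrete, here is a minimal Python sketch of one possible reading of it: take the mean, then find the smallest distance from the mean that contains 68% of the samples. This is not the script that was used for the evaluation. The function name is made up for illustration, the symmetric interval around the mean is an assumption, and the sample values are invented.

```python
import math


def empirical_one_sigma(samples, coverage=0.68):
    """Return (mean, half_width) such that at least `coverage` of the
    samples lie within mean +/- half_width.

    Assumption: a symmetric interval around the mean; an asymmetric
    interval based on percentiles would be another valid reading.
    """
    if not samples:
        raise ValueError("need at least one sample")
    mean = sum(samples) / len(samples)
    # Distance of every sample from the mean, smallest first.
    deviations = sorted(abs(x - mean) for x in samples)
    # Index of the smallest deviation that covers the requested share.
    k = max(math.ceil(coverage * len(deviations)) - 1, 0)
    return mean, deviations[k]


# Invented response times in seconds with a slow tail, for illustration only.
times = [0.10, 0.11, 0.10, 0.12, 0.11, 0.10, 0.13, 0.11, 0.25, 0.30]
mean, err = empirical_one_sigma(times)
print(f"{mean:.3f} +/- {err:.3f} s")
```

For normally distributed data this half-width comes out close to the standard deviation; for skewed data the two can differ noticeably, because a few slow outliers inflate the standard deviation more than they move the 68% boundary.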

So, here are the graphs of the evaluation of the main forum page.

There's a PDF of the graphics too.

The obvious result is that HipHop has by far the largest advantage over a "normal" Drupal install (i.e. PHP+APC): a whopping 30%. The gains from the PHP extension, in contrast, are rather minor in every case (2-3%).

An additional result is that, sadly, Drupal 7 is much slower (60%), at least for this page.

Now, what's the conclusion? The conclusion is that HipHop can give you gains not easily possible with other methods. Does that mean everybody should run it? That depends on the effort you are willing to put into it. For Drupal 6, you would need to check whether Drupal still behaves as Drupal should. For this it would be nice to have unit tests. Unfortunately, there aren't many. Drupal 7 has good unit test coverage, but as explained, HipHop needs to be fixed to run it properly.

The big advantage of HipHop over the PHP extension can be easily explained by the fact that the extension only translates a small part of Drupal to a high-performance language, whereas HipHop does so for the complete code base. You can turn this argument around and say that a 2-3% improvement is a lot considering that it was achieved by implementing one (D6) or two (D7) functions. This makes sense, too, but in order to achieve a larger gain, you'd have to reimplement a lot of functions by hand. The HipHop approach has a lot of appeal here, since you can continue to write all code in PHP.
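The argument above is essentially Amdahl's law: if only a fraction of the runtime is spent in the code you speed up, the overall gain is capped by that fraction. The Python sketch below illustrates this with hypothetical numbers. The function name and the fractions are assumptions for illustration, not values measured in this test.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: overall speedup when `fraction` of the runtime
    becomes `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Hypothetical shares of the runtime spent in the reimplemented code,
# each assumed to run ten times faster after the rewrite.
for fraction in (0.03, 0.10, 0.50, 1.00):
    gain = overall_speedup(fraction, 10.0)
    print(f"{fraction:4.0%} of runtime rewritten -> {gain:.2f}x overall")
```

Even if the rewritten functions cost nothing at all, code that accounts for 3% of the runtime can never yield more than about a 3% overall gain, which is why reimplementing a handful of functions cannot compete with translating the whole code base.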

The next steps should be:

1) Evaluate other types of pages (single node view)

2) Look into general system performance under load.

This project was sponsored by examiner.com.