Tag1 Consulting

Performance and Scalability Experts


Watching remote tests run

It can be incredibly helpful when you're troubleshooting Behat tests to watch the tests execute. It's fairly straightforward to install Selenium locally and watch @javascript tests execute in your browser of choice, but it's a bit more challenging to do the same on a remote server.

Here's how I set up to do that on a remote Ubuntu 14.04 server.

VNC on the Server

  1. Install dependencies:

    sudo apt-get install xvfb tightvncserver xterm firefox

Not enough entropy

I was writing documentation for using VNC to watch Behat tests being executed with the selenium2 driver on a remote server, when I ran into a strange behavior.

I'd set up Behat 3 on my desktop and was successfully running Selenium Server 2.42.2 with Firefox 31. After following the same setup process I'd used locally on a clean Digital Ocean VM, the Behat tests wouldn't run.
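The excerpt stops before the diagnosis, so the following is only a generic sketch for checking how much entropy a Linux kernel reports, which is a common first check on a freshly created VM. The function name `entropy_status` and the threshold of 200 are illustrative assumptions, not values taken from the original post.

```shell
# entropy_status [FILE] [THRESHOLD]
# Reads the kernel's entropy estimate and reports whether it is below a
# threshold. FILE defaults to the standard Linux procfs entry; the default
# threshold of 200 is an arbitrary illustrative value.
entropy_status() {
    source_file=${1:-/proc/sys/kernel/random/entropy_avail}
    threshold=${2:-200}

    # Bail out if the estimate cannot be read (for example on non-Linux hosts).
    [ -r "$source_file" ] || { echo "unknown"; return 2; }

    avail=$(cat "$source_file")
    if [ "$avail" -lt "$threshold" ]; then
        echo "low ($avail)"
        return 1
    fi
    echo "ok ($avail)"
}
```

Running `entropy_status` with no arguments on the remote server and again on the desktop gives a quick side-by-side comparison.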

Drush RPMs

I was recently working on scripting some OS installs of CentOS 5 and 6. As part of the deployment, I required that drush be installed. Now, I’ve considered using the drush package found in EPEL, but it doesn’t meet my needs for a number of reasons:

  • It is built for Drupal 6.
  • It has a dependency on the Drupal 6 package in EPEL, meaning I have to install that package if I want to pull in drush.

Tackling oversized cache items in Drupal

Drupal’s highly dynamic and modular nature means that many of the central core and contrib subsystems and modules need to maintain a large amount of meta-data.

Rebuilding the data every request would be very expensive, and usually when one part of the data is needed during part of the request, another part will be needed later during the same request. Since just about every request needs to check variable_get(), load entities with fields attached etc., the meta-data needs to be loaded too.

The pattern followed by most subsystems is to put the data into a single large cache item and statically cache it. The more modules you have, the larger these cache items become — since more modules mean more variables, hook_schema() and hook_theme() implementations, etc. And the same happens via configuration with field instances, content types and default views.

This affects many of the central core subsystems — without which it’s impossible to run a Drupal site — as well as some of the most popular contrib modules. The theme, schema, path alias, variables, field API and modules system all have similar issues in core. Views and CCK have similar issues in contrib.

With just a stock Drupal core install, none of this is too noticeable, but once you hit 100 or 200 installed modules, suddenly every request needs to fetch and unserialize() potentially dozens of megabytes of data. Some of the largest cache items like the theme registry can grow too large for MAX_ALLOWED_PACKET or the memcache default slab size. Since the items are statically cached, these caches can easily add 30MB or 40MB to PHP memory usage combined.
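As a rough way to spot which items are at risk, the sketch below filters a tab-separated listing of cache IDs and sizes in bytes against a limit. The function name `oversized_items` and the sample data are made up for illustration. The 1 MB default assumes memcached's stock item size limit, and a listing like this could come from a query such as `SELECT cid, LENGTH(data) FROM cache`, assuming the default database cache tables are in use.

```shell
# oversized_items [LIMIT_BYTES]
# Reads "cache id<TAB>size in bytes" lines on stdin and prints only the
# entries larger than LIMIT_BYTES (default: 1048576, i.e. 1 MB).
oversized_items() {
    limit=${1:-1048576}
    awk -F '\t' -v limit="$limit" \
        '$2 + 0 > limit { printf "%s\t%d\n", $1, $2 }'
}

# Example with made-up sizes; only the first entry exceeds 1 MB:
#   printf 'theme_registry:garland\t3500000\nvariables\t200000\n' | oversized_items
```

Passing a different limit, for example the configured max_allowed_packet value in bytes, checks the same listing against the database side.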

The full extent of this problem became apparent when I profiled the WebWise Symantec Connect site (Drupal.org case study). Symantec Connect currently runs on Drupal 6, and as a complex site with a lot of social functionality, it has a reasonably large number of installed modules.

Memcached and PECL memcache on CentOS and Fedora

At Tag1 Consulting we do a lot of work on increasing web site performance, especially around Drupal sites. One of the common tools we use is memcached combined with the Drupal Memcache module. In Drupal, there are a number of different caches which are stored in the (typically MySQL) database by default. This is good for performance as it cuts down on potentially large/slow SQL queries and PHP execution needed to display content on a site.

Entity is your friend

We are currently creating a website where you have episodes. Each episode has a video which has rights attached to it. The rights are fed into the system by an XML feed. Each right has a type, a start of availability, an end of availability, and a price. We need to store these somewhere...

Stop Disabling SELinux!

I see a lot of people coming by #centos and similar channels asking for help when they’re experiencing a problem with their Linux system. It amazes me how many people describe their problem, and then say something along the lines of, “and I disabled SELinux...”. Most of the time SELinux has nothing to do with the problem, and if SELinux is the cause of the problem, why would you throw out the extra security by disabling it completely rather than configuring it to work with your application?
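Before deciding SELinux is at fault, it helps to check whether it has denied anything at all. The sketch below lists the commands that appear in AVC denial records in an audit log. The function name `denied_commands` is hypothetical, the default path assumes the usual auditd log location on CentOS, and the pattern assumes the standard `avc: denied ... comm="..."` record layout. Dedicated tools such as ausearch and audit2allow do this job far more thoroughly.

```shell
# denied_commands [AUDIT_LOG]
# Prints the unique command names found in SELinux AVC denial records.
# An empty result suggests SELinux has not denied anything in that log.
denied_commands() {
    log_file=${1:-/var/log/audit/audit.log}
    sed -n 's/.*avc: *denied.*comm="\([^"]*\)".*/\1/p' "$log_file" | sort -u
}
```

If the application you are debugging does not show up in the output, disabling SELinux is unlikely to change anything.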

Imported DISQUS Comments Not Showing Up On Drupal Nodes

DISQUS is a popular "social commenting" platform. It is integrated with many hosted blog platforms and open source CMSes, including Drupal. A client of ours exported the comments from their old WordPress blog and then imported them into DISQUS. The comments showed up in the DISQUS dashboard, but when you clicked their corresponding URLs, the imported comments did not appear in Drupal. While the Drupal module looks for comments on the node/X URLs, DISQUS was storing them at the old WordPress URLs, which were implemented as path aliases in this case.

Tag1 Sponsors Narayan Newton For Drupal.org Infrastructure

Tag1 Consulting is sponsoring my work on Drupal.org Infrastructure. What this means is that instead of working on drupal.org whenever I can, I get to spend 20 paid hours per week on drupal.org infrastructure. In return for this, I have agreed to write a blog entry per month describing some of my work in detail. These will be entries covering security, performance, high-availability configuration and anything else interesting in my work on drupal.org. Hopefully these will be useful.

I look forward to spending more time securing and improving the performance of drupal.org and would like to thank everyone at Tag1 and our clients for this opportunity.

Tracking contrib and core patches with schema changes

During performance and scalability reviews of sites, we regularly find ourselves submitting patches to contrib modules and core to resolve performance issues.

Most large Drupal installations we work with use a variation of this workflow to track patches:

  • Upload the patch to an issue on Drupal.org if it's not already there
  • Add the patch to a /patches folder in revision control
  • Document what the patch does, the Drupal.org nid, and a reference to the ticket in the client's issue tracker in /patches/README.txt
  • Apply the patch to the code base
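The local steps of that workflow can be sketched as a small shell function. This is a minimal sketch, not the exact tooling used on any client project: the name `track_patch`, the README entry layout, and the assumption that patches use git-style `a/` and `b/` path prefixes (hence `-p1`) are all illustrative. Uploading the patch to Drupal.org remains a manual step.

```shell
# track_patch PATCH_FILE DRUPAL_NID TICKET_REF DESCRIPTION
# Run from the root of the code base. Copies the patch into ./patches,
# records it in patches/README.txt, and applies it.
track_patch() {
    patch_file=$1
    nid=$2
    ticket=$3
    description=$4
    name=$(basename "$patch_file")

    # Refuse to record a patch that does not apply cleanly.
    patch -p1 --dry-run < "$patch_file" > /dev/null || return 1

    # Keep a copy of the patch under revision control.
    mkdir -p patches || return 1
    cp "$patch_file" "patches/$name" || return 1

    # One README entry per patch: file name, Drupal.org nid, ticket, summary.
    printf '%s\n  Drupal.org nid: %s\n  Ticket: %s\n  %s\n\n' \
        "$name" "$nid" "$ticket" "$description" >> patches/README.txt || return 1

    # Apply the recorded copy to the code base.
    patch -p1 < "patches/$name"
}
```

Checking that the patch applies before recording it keeps the README from listing patches that never made it into the code base.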