Sam Boyer

Tag1 Quo: Versions, Versions Everywhere, Part 1

When Tag1 decided to build Tag1 Quo, we knew there was one question we’d have to answer over, and over, and over again: is there a security update available for this extension? Answering that question - at scale, for many websites, across many extensions, through all the possible versions they might have - is the heart of what Quo does.

The problem seems simple enough, but doing it at such scale, for “all” versions, and getting it right, has some deceptive difficulties. Given a site with an extension at a particular version, we need to know where it sits on the continuum of all versions that exist for that extension (we often refer to that as the “version universe,”), and whether any of the newer versions contain security fixes.

There are a few different approaches we could’ve taken to this problem. The one we ultimately settled on was a bit more abstracted than what might initially seem necessary. It was also not a “typical” Drupal solution. In this blog series, I’ll cover both the theoretical foundation of the problem, and the approach we took to implementation.

What’s a version?

Let's start at the beginning.

Quo works by having existing Drupal 6 (for now!) sites install an agent module, which periodically sends a JSON message back to the Quo servers indicating what extensions are present on that site, and at what versions. Those “versions” are derived from .info files, using functions fashioned after the ones used in Drupal core. Once that JSON arrives at Quo’s servers, we have to decide how to interpret the version information for each extension.

All versions arrive as strings, and the first decision Quo has to make is whether that string is “valid” or not. Validation, for the most part, means applying the same rules as applied by the Drupal.org release system. Let’s demonstrate that through some examples:

6.x-1.0

Drupal extension versions are a bit different than most software versions in the wild, because the first component, 6.x, explicitly carries information about compatibility, not with itself, but with its ecosystem: it tells us the version of Drupal core that the extension is supposedly compatible with.

The next part, 1.0, holds two bits of information: the major (1) and minor (0) versions. It’s wholly up to the author to decide what numbers to use there. The only additional meaning layered on is that releases with a 0 major version, like 6.x-0.1, are considered to be prerelease, and thus don’t receive coverage from the Drupal.org security team. Of course, Quo’s raison d’etre is that Drupal 6 is no longer supported by the security team, so that distinction doesn’t really matter anymore.

Now, 6.x-1.0 is an easy example, but it doesn’t cover the range of what’s possible. For example:

6.x-1.0-alpha1

This adds the prerelease field - alpha1. This field is optional, but if it does appear, it indicates the version to be some form of prerelease - and thus, as above, not covered by the security team. Drupal.org’s release system also places strong restrictions on the words that can appear there: alpha, beta, rc, and unstable. Additionally, the release system requires that the word must be accompanied by a number - 1, in this case.

There are a couple of notably different forms that valid Drupal versions can come in. There can be dev releases:

6.x-1.x-dev

Dev releases can only have the compatibility version and a major version, and cannot have a prerelease field. They’re supposed to represent a “line” of development, where that line is then dotted by any individual releases with the same major version.

And of course, Drupal core itself has versions:

6.43

Core versions have the same basic structure as the major version/minor version structure in an extension. Here, the 43 is a minor version - a dot along the development line of the 6 core version.

These examples illustrate what’s allowed by the Drupal.org release system, and thus, the shape that all individual extensions’ version universe will have to take. All together, we can say there are five discrete components of a version:

  • Core version
  • Major version
  • Minor version
  • Prerelease type
  • Prerelease number

Viewed in this way, it’s a small step to abstracting the notion of version away from a big stringy blob, and towards those discrete components. Specifically, we want to translate these versions into a 5-dimensional coördinate system, or 5-tuple: {core, major, minor, prerelease_type, prerelease_num}, where each of these dimensions has an integer value. Four of the components are already numbers, so that’s easy, but prerelease type is a string. However, because there’s a finite set of values that can appear for prerelease type, it’s easy to map those strings to integers:

  • Unstable = 0
  • Alpha = 1
  • Beta = 2
  • Rc = 3
  • (N/A - not a prerelease) = 4

With this mapping for prerelease types, we can now represent 6.x-1.0-alpha1 as {6,1,0,1,1}, or 6.x-2.3 as {6,2,3,4,0}.

However, that’s not quite the end of the story. The Quo service is all about delivering on the Drupal 6 Long Term Support (D6LTS) promise: providing and backporting security fixes for Drupal 6 extensions, now that they’re no longer supported by the security team.

Because such fixes are no longer official, we can’t necessarily expect there to be proper releases for them. At the same time, it’s still possible for maintainers to release new versions of 6.x modules, so we can’t just reuse the existing numbering scheme - the maintainer might later release a conflicting version.

For example, if D6LTS providers need to patch version 6.x-1.2 of some module, then we can’t release the patched version as 6.x-1.3, because we don’t have the authority to [get maintainers to] roll official releases (we’re not the security team, even though several members work for Tag1), and the maintainer could release 6.x-1.3 at any time, with or without our fix. Instead, we have to come up with some new notation that works alongside the existing version notation, without interfering with it.

Converting to the coördinate system gives us a nice tip in the right direction, though - we need a sixth dimension to represent the LTS patch version. And “dimension” isn’t metaphorical: for any given version, say {6, 1, 0, 1, 1} (that is, 6.x-1.0-alpha1), we may need to create an LTS patch to it, making it {6, 1, 0, 1, 1, 1}. And then later, maybe we have to create yet another: {6, 1, 0, 1, 1, 2}.

Now, we also have to extend the string syntax to support this sixth dimension - remember, strings are how the agent reports a site’s extension versions! It’s easy enough to say “let’s just add a bit to the end,” like we did with the coördinates, but we’re trying to design a reliable system here - we have to understand the implications of such changes.

Fortunately, this turns out to be quite easy: {6, 1, 0, 1, 1, 1} becomes 6.x-1.0-alpha1-p1; {6, 2, 3, 4, 0, 1} becomes 6.x-2.3-p1. This works well specifically because the strings in the prerelease type field are constrained to unstable, alpha, beta, and rc - unlike in semver, for example:

     A pre-release version MAY be denoted by appending a hyphen and a series
     of dot separated identifiers immediately following the patch version.
     Identifiers MUST comprise only ASCII alphanumerics and hyphen
     [0-9A-Za-z-].

     ...identifiers consisting of only digits are compared numerically and
     identifiers with letters or hyphens are compared lexically in ASCII
     sort order.

In semver, prerelease information can be any alphanumeric string, can be repeated, and are compared lexicographically (that is, alpha < beta < rc < unstable). If Drupal versions were unbounded in this way, then a -p1 suffix would be indistinguishable from prerelease information, creating ambiguity and making conflicts possible. But they’re not! So, this suffix works just fine.

Now, a coordinate system is fine and dandy, but at the end of the day, it’s just an abstracted system for representing the information in an individual version. That’s important, but the next step is figuring out where a particular version sits in the universe of versions for a given extension. Specifically, the question Quo needs to ask is if there’s a “newer” version of the component available (and if so, whether that version includes security fixes). Basically, we need to know if one version is “less” than another.

And that’s where we’ll pick up in the next post!