Tag1 Consulting

Performance and Scalability Experts

SearchBench In The Cloud

Comments

Charts

Submitted by admin on Wed, 07/09/2008 - 10:03.

I added an export feature to SearchBench and used it to export data from another run of tests. I imported this data into the Gnumeric spreadsheet program and generated the attached graphs. The issue facing Xapian in these tests is a much higher frequency of rogue queries that take significantly longer than the average query. My reading on Xapian makes me suspect this is caused by disk I/O contention, but it will take additional tests to know for sure. SearchBench should generate averages from 100% of the results, and from the median 95% of the results, to offer more comprehensive comparisons.
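To make the two averages concrete, here is a minimal Python sketch. It assumes "the median 95%" means the middle 95% of query times once sorted, trimming 2.5% from each end; SearchBench may define it differently. The function name and the sample timings are hypothetical and are not taken from the actual test runs.

```python
from statistics import mean

def trimmed_mean(timings, keep_percent=95):
    """Mean of the central keep_percent of the sorted timings.

    Assumption: the trimmed share is split evenly between the
    fastest and slowest queries.
    """
    if not timings:
        raise ValueError("timings must not be empty")
    ordered = sorted(timings)
    # Samples dropped from each tail, rounded down (integer arithmetic).
    cut = len(ordered) * (100 - keep_percent) // 200
    return mean(ordered[cut:len(ordered) - cut])

# Hypothetical query times in seconds: 38 typical queries and 2 rogue ones.
timings = [0.10] * 38 + [2.0, 5.0]

overall = mean(timings)          # average over 100% of the results
central = trimmed_mean(timings)  # average over the middle 95%
print(f"all results: {overall:.3f}s, middle 95%: {central:.3f}s")
```

Reporting both numbers side by side shows how much a handful of rogue queries pulls the overall average away from the typical query time.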


sounds cool

Submitted by Anonymous (not verified) on Thu, 08/21/2008 - 19:13.

sounds cool

Very great study

Submitted by Robert Douglass (not verified) on Wed, 07/09/2008 - 05:29.

I'm really happy to hear that Drupal's search is getting some respect =) Great benchmarks. You didn't specify whether you tested Drupal 6 search, which has significant performance advantages over Drupal 5. It is also interesting to note that Doug Green is working on backporting the Drupal 6 search improvements to Drupal 5: http://drupal.org/node/146466

There are also a couple of things that are yet to be done to make D7 search even faster: http://drupal.org/node/258998
Djun Kim has also thought up a nice strategy for adding some caching to search and we should see the results of that work eventually as well.

Let me know if you want help setting up an ApacheSolr test: http://drupal.org/project/apachesolr

The setup is very easy. You download Solr (either 1.2 or the latest nightly from the upcoming 1.3 is fine). In the tarball is an example application. Move the schema.xml file from the Drupal module to the conf directory of the example application, start the Solr server with java -jar start.jar, and hit cron to index your site.

One question I had about Xapian: does it return a list of nids to Drupal, or does it return rendered search results? Drupal's search returns nids, and Drupal then does a node_load on each search result, which is horribly wasteful since all that is being built is a small extract from the text. Solr stores all the fields of the node and returns them with the search result, so no node_load is needed. This is part of the reason why Solr performs so well, and I imagine both Xapian and core search could be programmed to behave this way as well.
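The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration with made-up data: the node store, both search functions, and the load counter are hypothetical stand-ins, not Drupal, Xapian, or Solr code.

```python
# Hypothetical node store standing in for the Drupal database.
NODES = {nid: {"title": f"Node {nid}", "body": "lorem ipsum " * 50}
         for nid in range(1, 6)}
stats = {"loads": 0}  # counts full node fetches

def node_load(nid):
    """Stand-in for node_load(): one full fetch per call."""
    stats["loads"] += 1
    return NODES[nid]

def search_ids(query):
    """Backend that returns only node ids (query is ignored here)."""
    return [1, 3, 5]

def search_stored(query):
    """Backend that returns stored fields with each hit."""
    return [{"nid": nid,
             "title": NODES[nid]["title"],
             "extract": NODES[nid]["body"][:40]}
            for nid in (1, 3, 5)]

def render_from_ids(query):
    """Build results by loading every matching node in full."""
    results = []
    for nid in search_ids(query):
        node = node_load(nid)
        results.append((node["title"], node["body"][:40]))
    return results

def render_from_stored(query):
    """Build the same results from fields returned by the backend."""
    return [(hit["title"], hit["extract"]) for hit in search_stored(query)]
```

Both render functions produce the same titles and extracts, but only the first one pays for a full load per result, and that per-result cost is what the stored-field approach avoids.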

benchmarks

Submitted by admin on Wed, 07/09/2008 - 08:20.

The current tests were against Drupal 5, but I also intend to compare Drupal 6 and Drupal 7 as time permits. There are many things waiting to be benchmarked! :)

I am certainly interested in benchmarking Solr, too. However, I am first focused on adding more functionality to SearchBench so that the results gathered are more useful.

The Xapian module currently returns a list of nids and lets Drupal do what it wants with them. It does not replace Drupal's search GUI; instead, it reuses the GUI from the search module, and as such each result does involve a node_load() whether it needs it or not. I have not looked into ways to optimize Xapian just yet, but this is also planned.

My current focus is on building a useful benchmarking environment, and useful benchmarking tools. Once these are built, then I will focus on collecting extensive data.

An issue to bear in mind

Submitted by catch (not verified) on Wed, 07/09/2008 - 08:33.

An issue to bear in mind with Drupal 5 (and one of the changes in D6) is that for valid search results, Drupal writes a temporary table, in nearly all situations to disk. So I think running those benchmarks with only valid searches would probably return very different results. All this work is great though, and I'm following along.