SearchBench has received a couple of useful updates since yesterday's initial cloud tests. It can now generate search queries based on a site's actual content, and it can export benchmark results. With these features, SearchBench can be used to perform some real performance comparisons.
Once again I set up these tests on an extra large EC2 instance. I still have not performed any tuning, and I continue to compare Drupal 5's core search against Xapian. My initial benchmarks show that Xapian is more than 6x faster than Drupal's core search when a query actually returns results. In addition, Xapian can index a large site in about a third of the time that Drupal 5's built-in search needs. Read on for actual benchmark results and graphs.
These tests make it clear that it's important to use legitimate search terms when benchmarking search performance. SearchBench's new ability to extract wordlists from a site's actual content allows the tool to provide much more useful data. Again, note that neither Xapian nor MySQL has been tuned for these results, and that future benchmarks will aim to better understand how various tunings and configurations affect search performance.
Most of these queries did not return any search results. The few slowdowns you see occur on the queries for which Xapian did return results.
These are the same queries that were used in the previous test. Note that Drupal core's search did not return results for any of them. It would be interesting to compare the queries for which Xapian returns results but Drupal core does not, and to understand why the results differ.
In this test, SearchBench generated wordlists from words extracted from actual content on the website being tested. As a result, many of the queries returned results, which shows up as the performance slowdown above.
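SearchBench's own extraction code isn't reproduced in this post. The following Python sketch is only a rough illustration of the idea: build a frequency-ranked wordlist from page text, then sample search queries from it. Everything in it is an assumption made for illustration and not SearchBench's implementation. That includes the tokenizing regex, the tiny stop-word list, the minimum word length, and the function names `build_wordlist` and `generate_queries`.

```python
import random
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not SearchBench's).
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "it", "for", "on", "with"}


def build_wordlist(documents, min_length=3, max_words=1000):
    """Return up to max_words distinct words from the documents, most frequent first."""
    counts = Counter()
    for doc in documents:
        # Lowercase and split on anything that isn't an ASCII letter or digit.
        for word in re.findall(r"[a-z0-9]+", doc.lower()):
            if len(word) >= min_length and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(max_words)]


def generate_queries(wordlist, count, words_per_query=1, seed=None):
    """Build `count` queries, each made of distinct words sampled from the wordlist."""
    rng = random.Random(seed)  # a fixed seed makes the query set reproducible
    return [" ".join(rng.sample(wordlist, words_per_query)) for _ in range(count)]


pages = ["Drupal search with Xapian", "Xapian indexes Drupal content"]
words = build_wordlist(pages)
queries = generate_queries(words, 100, words_per_query=2, seed=42)
```

Because the queries come from words that actually appear on the site, many of them will match documents. Seeding the generator is one way to replay an identical query set against both Xapian and Drupal core search.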
Some hard numbers from the Xapian test above:
| Metric | Xapian |
|---|---|
| Total tests | 3 |
| Searches per test | 100 |
| Total time | 71.5365 seconds |
| Average time per test | 23.8455 seconds |
| Average time per query | 0.23845 seconds |
| Longest query | 0.66174 seconds |
| Shortest query | 0.12636 seconds |
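The derived rows in this table follow from the totals. The average time per test is the total time divided by the 3 tests, and the average time per query divides that by the 100 searches per test. The sketch below shows how such a summary could be computed from raw per-query timings. It is a minimal Python sketch: the function name `summarize` and its nested-list input layout are hypothetical and are not SearchBench's actual export format.

```python
def summarize(timings):
    """Summarize benchmark timings.

    `timings` holds one list per test run, each containing per-query
    durations in seconds. This layout is hypothetical, not SearchBench's
    actual export format.
    """
    all_queries = [q for run in timings for q in run]
    total = sum(all_queries)
    return {
        "total_tests": len(timings),
        # Assumes every run issued the same number of searches.
        "searches_per_test": len(timings[0]),
        "total_time": total,
        "avg_per_test": total / len(timings),
        "avg_per_query": total / len(all_queries),
        "longest_query": max(all_queries),
        "shortest_query": min(all_queries),
    }


# Cross-check the derived rows of the Xapian table from its totals alone.
total_time, tests, searches_per_test = 71.5365, 3, 100
avg_per_test = total_time / tests
avg_per_query = avg_per_test / searches_per_test
print(f"average per test: {avg_per_test:.4f} s, per query: {avg_per_query:.5f} s")
```

The exact per-query average is 71.5365 / 300 = 0.238455 seconds. That value sits on a rounding boundary, so the table's 0.23845 differs from it only in how the last digit was rounded.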
Thanks to SearchBench, the queries used in this test are identical to those used in the previous Xapian test, allowing a more precise comparison between the two search solutions. Drupal core searches slow down noticeably when they return actual results. Much of this slowdown is likely due to the creation of temporary tables, an issue that has been significantly improved in Drupal 6. This improvement is being backported to Drupal 5 as an optional patch, and I plan to run additional benchmarks against it.
Some hard numbers from the Drupal core search test above:
| Metric | Drupal 5 core search |
|---|---|
| Total tests | 3 |
| Searches per test | 100 |
| Total time | 433.8613 seconds |
| Average time per test | 144.6204 seconds |
| Average time per query | 1.44620 seconds |
| Longest query | 4.90253 seconds |
| Shortest query | 0.11557 seconds |
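The 6x+ figure can be checked against the two results tables by dividing each Drupal core timing by the matching Xapian timing. The following minimal Python sketch does that division. The numbers are copied directly from the tables in this post. The helper name `speedup` is purely illustrative.

```python
# Timings in seconds, copied from the two results tables in this post.
XAPIAN = {
    "total_time": 71.5365,
    "avg_per_query": 0.23845,
    "longest_query": 0.66174,
    "shortest_query": 0.12636,
}
DRUPAL_CORE = {
    "total_time": 433.8613,
    "avg_per_query": 1.44620,
    "longest_query": 4.90253,
    "shortest_query": 0.11557,
}


def speedup(baseline, candidate, metric):
    """Return how many times larger baseline[metric] is than candidate[metric]."""
    return baseline[metric] / candidate[metric]


for metric in XAPIAN:
    print(f"{metric}: {speedup(DRUPAL_CORE, XAPIAN, metric):.2f}x")
```

The total-time and per-query ratios both land just above 6, consistent with the 6x+ figure. The worst case is wider still, at about 7.4x. The shortest-query ratio comes out below 1: Drupal core's single fastest query was slightly quicker than Xapian's. Xapian's advantage therefore shows up in the average and slow cases.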
The raw search data from the above benchmarks can be found in this Gnumeric spreadsheet.
Many more benchmarks are planned, as detailed in my earlier blog post. SearchBench is being developed as a tool to better understand search performance and scalability. Tag1 Consulting is focused on defining solid recommendations and best practices for getting optimal performance from LAMP-powered search solutions, and on continuing to improve Drupal's scalability.