Until recently, I was a student employee at the Oregon State University Open Source Lab. My career there ended, like many, with that painful process known as graduation. I got invaluable experience at the lab, not the least of which was the knowledge I gained as their main (only) database administrator. One of my great pleasures in that position was learning how to configure MySQL replication and manage clusters of replicating database servers. Even the simple case of a single master and a single slave has its edge cases.
Drupal
Chapter 1 Rough Draft Complete
Submitted by jeremy on Sun, 07/27/2008 - 16:27.
I have completed a rough draft of the first chapter of "Drupal Performance and Scalability". The first chapter of this online book is divided into four sections, the first of which focuses on the importance of fully defining your performance and scalability goals, helping you to identify what you need to accomplish and how to set concrete and attainable goals. The second section discusses monitoring and measuring your ongoing progress, helping you decide what you need to monitor, and how to monitor it. The third section stresses the importance of making regular backups, discussing what needs to be backed up, and offering example scripts for backing up your entire website, including the database. Finally, the fourth section takes an in-depth look at using revision control tools to manage your website, providing useful recipes showing how Git can track changes to your website, helping you update to new releases and push those updates into production.
It is important to realize that this is a rough draft, and as such it may contain spelling or grammatical errors, it may be missing key points, and the writing style may not be very polished. However, the book has to start somewhere, and this is the first step toward the end goal of publishing a useful and freely available online resource. I welcome all criticisms, suggestions and feedback. If you find errors in the text or have specific comments, you can help with this writing project by posting your feedback on the appropriate page. The current status of this project is tracked here.
Tuning Search In Drupal 5
Submitted by jeremy on Sat, 07/19/2008 - 10:40.
In previous search benchmarks, I utilized random content generated with Drupal's devel module. In these latest benchmarks, I used an actual sanitized copy of the Drupal.org community website database, with email addresses and passwords removed. The first tests were intended to confirm that Xapian continues to perform well with large amounts of actual data. Additional tests were performed to measure the effect of various MySQL tunings and configurations. The following data was derived from several hundred benchmarks run on an Amazon AWS instance over the past week using the SearchBench module.
These tests confirm that Xapian continues to offer better search performance than Drupal's core search module. Contrary to popular belief, the data also shows that using the InnoDB storage engine for search tables significantly outperforms using the MyISAM storage engine for search tables, especially when your database server has sufficient RAM. Finally, the data confirms that allocating additional RAM for MySQL's temporary tables can improve search performance.
Online Performance and Scalability Book
Submitted by jeremy on Fri, 07/18/2008 - 09:32.
Tag1 Consulting is focused on improving Drupal's performance and scalability. We also believe that when information is freely shared, everyone wins. Toward these ends, we are working on an online book titled "Drupal Performance and Scalability". The book is divided into five main sections: Drupal Performance, Front End Performance, Improved Caching and Searching, Optimizing the Database Layer, and Drupal In The Cloud. The book is primarily aimed toward users running Drupal on the LAMP stack, with chapters applicable to everything from low-end shared hosts to large-scale multi-server installations.
By publishing online, we aim to encourage you to participate in the book writing process as an editor and a technical reviewer. You will currently find the book's complete outline online, along with descriptions of each planned section and chapter. As the book evolves, it will continue to be updated online in real time. We encourage you to post comments with suggestions, critical feedback, grammatical corrections, or anything else relevant to our ongoing effort.
Comparing Xapian and Drupal 5's Core Search
Submitted by jeremy on Wed, 07/09/2008 - 15:09.
SearchBench has received a couple of useful updates since yesterday's initial cloud tests. It can generate search queries based on actual content, and it can export search benchmark results. With these features, it is now possible to use SearchBench to perform some actual performance comparisons.
Once again I set up these tests on an extra large EC2 instance. I still have not performed any tuning, and I continue to test Drupal 5 core search against Xapian search. My initial benchmarks show that Xapian offers a very significant 6x+ performance advantage over Drupal's core search when a given search query actually returns results. In addition, Xapian is able to index a large site in about a third of the time of Drupal 5's built-in search. Read on for actual benchmark results and graphs.
SearchBench In The Cloud
Submitted by jeremy on Tue, 07/08/2008 - 19:56.
I ran some initial Drupal search benchmarks with SearchBench on Amazon's EC2 cloud service. These first tests were primarily focused on confirming that SearchBench and EC2 are a good match. They utilized a single server instance, and did not include any server tuning.
I used the devel module to create 5,000 random nodes and 10,000 random comments. I indexed this content both with Drupal's core search module, and with the contributed Xapian module. I then used SearchBench to create 1,000 random search queries with one to ten words in each query, with phrasing and negation set to random. Finally, I ran the identical search test three times in a row, comparing Xapian's performance to Drupal's core search performance. I was impressed to see how well Drupal's core search performed in these tests, and plan many more tests to better understand the strengths and weaknesses of each search technology.
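To make the query generation step concrete, here is a minimal sketch in Python of how a set of random search queries like this could be built. It is not SearchBench's actual code (SearchBench is a Drupal module). The function names, the sample vocabulary, and the 20% phrase and 10% negation probabilities are all assumptions made for illustration. It assumes the usual search syntax of double quotes for a phrase and a leading minus sign for negation.

```python
import random


def random_query(vocabulary, rng, min_words=1, max_words=10):
    """Build one query containing min_words..max_words words.

    Hypothetical helper: the probabilities below are illustrative only.
    """
    count = rng.randint(min_words, max_words)
    words = [rng.choice(vocabulary) for _ in range(count)]

    terms = []
    i = 0
    while i < len(words):
        # Randomly group two adjacent words into a quoted phrase.
        if i + 1 < len(words) and rng.random() < 0.2:
            term = '"%s %s"' % (words[i], words[i + 1])
            i += 2
        else:
            term = words[i]
            i += 1
        # Randomly negate a term, but keep the first term positive so
        # the query always has something to match.
        if terms and rng.random() < 0.1:
            term = "-" + term
        terms.append(term)
    return " ".join(terms)


def generate_queries(vocabulary, n=1000, seed=42):
    """Return n random queries; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [random_query(vocabulary, rng) for _ in range(n)]


if __name__ == "__main__":
    # Assumed sample vocabulary; a real run would draw words from site content.
    sample = ["drupal", "search", "xapian", "module", "node", "comment"]
    for query in generate_queries(sample, n=5):
        print(query)
```

Seeding the generator matters for a benchmark like this one: the same list of queries can then be replayed against each search backend, so any difference in timing comes from the backend rather than from the queries.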
Introducing SearchBench
Submitted by jeremy on Sun, 06/29/2008 - 18:26.
There have been some ongoing scalability issues affecting Drupal.org's built-in search functionality for some time now. Less interested in outsourcing search to a big black box such as Google, I spent some time helping clean up the Xapian module, making it possible to completely replace Drupal's built-in SQL-powered search functionality with a Xapian-powered engine. With the basic search functionality complete, there was still a need to actually compare the performance of the two solutions.
Toward this goal, over the weekend I launched a new project called SearchBench, a Drupal module for benchmarking Drupal's search performance. As the module evolves, I hope it will prove extremely useful for comparing the performance and scalability of the many free and open source search options available to Drupal-powered websites.
Why Drupal.org Should Join the Ad Bard Network
Submitted by jeremy on Mon, 05/26/2008 - 11:49.
The Ad Bard Network was conceived because I have a need for relevant, non-obnoxious advertisements on my website, KernelTrap.org. I have maintained KernelTrap for many years, as a hobby in my spare time, and as a way to stay involved in the open source world. I enjoy this hobby, but it requires a lot of time and commitment keeping the website updated every day. I've long dreamed of finding a way to make a little income to help justify the time I invest into my hobby.
Displaying advertisements on KernelTrap has a lot of potential for earning income, but I failed to find an advertising network that was compatible with my beliefs and requirements. I need an ad network that won't flood my website with animated GIFs, Flash videos and pop-ups. I want to know exactly what information is being collected about my readers. I want to earn a fair share of the profits, and to know how much the advertising network is making off my website. I want to be fully in control of what types of ads and what specific ads appear on my website. And the ads need to load extremely quickly, not slowing down my web pages or loading scripts within scripts within scripts.
The Ad Bard Network has grown out of these needs, already exceeding my own requirements and becoming a viable and useful fundraising mechanism for all free and open source projects and websites.
Drupal and Amazon EC2 Quick Start
Submitted by jeremy on Tue, 05/13/2008 - 22:42.
With all the excitement surrounding cloud computing, and specifically Amazon's EC2 (Elastic Compute Cloud) Beta service, I decided it was time to give it a try myself. Without much personal background in the new service, I found that there is an overwhelming number of pages talking about EC2, and even Drupal on EC2, but I didn't locate a simple guide to quickly get me up and running. Having now spent a few hours today learning the basics, I'm jotting down these quick notes to help the next person interested in trying the same, and in the hopes of attracting useful tips from other AWS users.
MySQL Engines: MyISAM vs. InnoDB
Submitted by nnewton on Wed, 04/23/2008 - 17:54.
This article provides a comparison between the MyISAM and InnoDB storage engines for MySQL. InnoDB is commonly considered to perform worse than MyISAM, but this article aims to dispel this myth by describing the differences between these engines and what makes InnoDB a good fit for many database needs.
In addition, a look at when it is better to use MyISAM and a case study of the drupal.org site provide insight for determining which engine is best for a given situation.
