MySQL Index Hinting

Did you know that you can give MySQL a hint on which index to use? It’s called an Index Hint and can be part of a query where you feel MySQL is making the wrong choice (although that’s quite unlikely these days).

SELECT * FROM table1 USE INDEX (your_index)
    WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

Where your_index is the name of the index you’d like to use for this query. You can supply several comma-separated indexes and MySQL will pick the one it thinks is best. Alternatively you can tell MySQL to IGNORE INDEX too! Do benchmark though before making the final decision ;)
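
For example, supplying several candidate indexes or ignoring one looks something like this (index_one and index_two are just placeholder names):

SELECT * FROM table1 USE INDEX (index_one, index_two)
    WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

SELECT * FROM table1 IGNORE INDEX (your_index)
    WHERE col1 = 1 AND col2 = 2 AND col3 = 3;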

Pickle vs JSON — Which is Faster?

If you’re here for the short answer — JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps). I’ve been thinking about this since I wrote the ObjectProperty and JsonProperty classes for Google App Engine. They’re both easy to use and work as expected. I did have some trouble with ObjectProperty but I figured it out in the end.

As my previous posts mention, the ObjectProperty class uses Python’s pickle module, while JsonProperty works with simplejson (bundled with Python 2.6 and above, available through django.utils in Google App Engine). I decided to measure the performance of these two.

Unfortunately I couldn’t do much benchmarking on top of Google App Engine, since there’s too much lag between the application server and Google’s Datastore, so I decided to write simple benchmarks and find out which is faster — pickle or JSON. I started out by constructing a dataset to pickle and “json”: some random lists, dictionaries and nested dictionaries containing lorem ipsum texts.

I then used Python’s timeit module to measure how long it took to “dumps” and “loads” the dataset using pickle and simplejson. I also measured the length of the resulting pickle/JSON strings to see which would be smaller, and guess what — JSON wins in all rounds. I ran the tests 10, 20, 50, 100, 500 and 1000 times for reading, writing and length comparison. Below are three charts illustrating the results:

As you can see, dumps with JSON is much faster — by almost 1500%, which is 15 times faster than pickling! Now let’s see what happens with loads:

Loads shows even more goodies for JSON lovers — a massive 2500%, how’s that!? Of course some of you might be concerned with size, memory usage, etc. Since there’s no good method of measuring the actual bytes, I used Python’s len function to simply measure the number of characters in the resulting pickle/JSON string.

So yes, JSON is faster in all three aspects. If you’d like to experiment yourself, feel free to use the source code I wrote. Beware of running the 500/1000 tests, those can take hours ;)
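
If you don’t want to dig through the full source, here’s roughly what the benchmark boils down to. This is just a minimal sketch (Python 2, as in the original setup), and the dataset below is a trivial stand-in for the random lorem ipsum structures I actually used:

import timeit

setup = """
import pickle
try:
    import simplejson as json
except ImportError:
    import json

# Trivial stand-in for the random lorem ipsum dataset used in the real benchmark
data = {'numbers': range(1000),
        'nested': dict(('key%d' % i, {'lorem': 'ipsum dolor sit amet'}) for i in range(100))}

pickled = pickle.dumps(data)
jsoned = json.dumps(data)
"""

for stmt in ('pickle.dumps(data)', 'json.dumps(data)',
             'pickle.loads(pickled)', 'json.loads(jsoned)'):
    print stmt, timeit.timeit(stmt, setup=setup, number=1000)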

The benchmark was done on an Ubuntu 10.10 64-bit machine with Python 2.6 installed, but I don’t think the results would differ much on other setups. The conclusion is that if you need to store complex objects such as functions or class instances, you have to use pickle, while if you only need to store simple objects, lists and nested dictionaries, you’re better off with JSON.

Thank you for reading and retweeting ;)

Update: If you’re sticking to pickling objects and have the freedom to use C compiled libraries, then go ahead with cPickle instead of pickle, although that still lags behind JSON (about twice as slow at both loading and dumping). As for App Engine, I tried running a short benchmark with cPickle vs simplejson from the django.utils package; the results were better for pickle, but still not enough to beat JSON, which came out about 30% faster. I know there are other libraries worth mentioning here, but my hands are tied since I’m running App Engine with Python 2.5 and not allowed to install extra modules ;) Cheers and thanks for the comments!
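
If you want to try cPickle yourself, the usual drop-in pattern on Python 2.x is:

# cPickle exposes the same dumps/loads API as pickle, just implemented in C
try:
    import cPickle as pickle
except ImportError:
    import pickle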

Driving the (ve) Server at Media Temple

It’s been a few weeks since Media Temple launched their new (ve) Server, and I’ve been testing it out for a few days now. I’m actually hosting my blog there to see some real traffic load, and my first impressions are awesome!

I started off with the simplest 512 MB server and transferred a few websites to the new platform. I’m not too used to the Ubuntu Linux operating system, but I found my way around quickly. They do have other operating system options, but Ubuntu is the one they recommend. The first few tests showed that my load time decreased dramatically compared to my Amazon EC2 instance, which I was quite happy about. The next step was to run a few load tests using the Apache Benchmark tool (ab), and very soon I realized that I got quite a few failed requests, memory shortages and other strange stuff.

Media Temple’s (ve) servers are hosted on the Virtuozzo platform by Parallels, and after browsing their documentation I found out that there’s no swap space available for Virtuozzo containers. They do allow around 80% of burstable RAM (so you get around 1 GB when running 512 MB) but when that runs out, you’re left with nothing, not even some swap space on your hard drive. Some heavy load tests showed 30% request failure, which is quite horrible.

Media Temple doesn’t give much information on the new platform via the support system, and when memory shortage questions come up in their user forums, they advise you to upgrade, of course! Well, I wouldn’t like to upgrade just to run a couple of load tests, and what about Digg traffic? Should I predict that and upgrade before the spike? Then downgrade again to save some cash? Of course not.

A good option I found here is to tune Apache a little and reduce its resource limits. This will not increase performance, but it can guarantee a fail-safe workflow. We wouldn’t want our users to see a blank page (or a memory shortage error) when a spike hits; we’d rather have them wait a little longer and still get the requested page. The settings mostly depend on what software and services you’re running and the RAM available in your container.

You might want to reduce the KeepAliveTimeout in your Apache settings (mine’s now set to 5), and the rest is up to the mpm_prefork module. You’ll have to modify your settings and then run some tests until you’re comfortable with the results. Mine are the following:

<IfModule mpm_prefork_module>
    StartServers 3
    MinSpareServers 2
    MaxSpareServers 5
    MaxClients 10
    MaxRequestsPerChild 0
</IfModule>

This is on a 512 MB (~400 MB more burstable) container. An Apache Benchmark test showed that 100 concurrent (simultaneous) requests completed in 26 seconds with 0% failed requests; that’s 3.84 requests per second, which is quite good. For comparison, the same test run against the mashable.com website took 30 seconds at 3.32 requests per second, also with 0% failures. Also check out the other MPMs for Apache, which could give good results too.
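
For reference, the tests I’m talking about boil down to something along these lines (the URL is a placeholder, and you may want to vary the request count and concurrency):

ab -n 100 -c 100 http://example.com/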

This definitely requires more fine-tuning, and if the page load time becomes too high then yes, there is a reason to upgrade, but don’t forget about other performance tricks such as CDNs, gzip (deflate) compression and so on. When you’re done with Apache, move on to MySQL and PHP fine-tuning; there are some tricks there too that can give you extra speed and performance.
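
Enabling deflate, for instance, is only a few lines in your Apache configuration, assuming mod_deflate is installed and enabled:

<IfModule mod_deflate.c>
    # Compress text-based responses before they go over the wire
    AddOutputFilterByType DEFLATE text/html text/css text/plain application/javascript application/x-javascript
</IfModule>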

I’ll keep playing around with this server, plus I’ve purchased a 1 GB (ve) this morning, so there are quite a lot of tests still to run. Anyway, if you’re looking for a good, high-performance VPS, then Media Temple is definitely a choice to consider. For only $30/mo you can get quite a good-looking virtual server. It’s more interesting than their old dedicated virtual servers (although still in beta). Cheers, and don’t forget to retweet this post ;)

W3 Total Cache with Amazon S3 and CloudFront

A few days ago Frederick Townes, author of W3 Total Cache for WordPress, released an update to this wonderful plugin, and yes, it now fully supports Amazon S3 and CloudFront as the content delivery network! This is a major one for me, as I manually upload most of my static assets to my CloudFront account, which can take quite a lot of time. The W3 Total Cache plugin does that for you in seconds! Post attachments, images, JavaScript, CSS... all of those can go to CloudFront in just 4 clicks. Frederick also mentioned that the upcoming update will hold more surprises, which keeps me wondering.

I also tried out the other options for page and database caching. A few tests showed that memcache was faster than APC, so that’s what I settled on for database caching. Page caching was switched to enhanced, which I believe is a new option. The site performance graph in Google Webmaster Tools shows pretty good performance for November and December (very close to 1.5 seconds), although the overall average is still up at 3.5 seconds, which according to Google is slower than 59% of sites. This is probably caused by the force majeure events in September and October, when page load time peaked at over 7 seconds.

One more funny fact about Google’s Site Performance and Page Speed tools is the “Minimize DNS lookups” section, which most of the time shows a single entry:

The domains of the following URLs only serve one resource each. If possible, avoid the extra DNS lookups by serving these resources from existing domains: http://www.google-analytics.com/ga.js

Interesting. Perhaps I should copy that JavaScript file and serve it from my CDN; I wonder if that would work. Oh, but then I’d be missing all the nifty updates to Google Analytics, like the most recent one called Asynchronous Tracking, which is very neat by the way!

Loading jQuery from a CDN in WordPress

This may seem like an easy task but it’s quite tricky in WordPress. Using a CDN these days is popular, cheap and helps speed up your website by taking the load off your web server. I personally love Amazon CloudFront! The tips at Google Code suggest you serve all your static content from different domains, preferably ones without cookies, so CDNs are perfect.

The problem with WordPress is script dependencies, and this applies not only to jQuery but to all the other predefined JavaScript libraries (Prototype, script.aculo.us, Thickbox — see wp_enqueue_script for more info). It’s all about the handles: plugins that use jQuery will probably list the jquery handle in their dependencies, which will automatically make WordPress include the standard jQuery from its wp-includes directory. This means that using the code:

wp_enqueue_script("my-handle", "https://konstantin.blog/jquery.js");

you might end up including two instances of the jQuery library, one from your CDN (konstantin.blog in the example above) and another from the WordPress wp-includes directory, which will end up in a total mess. Strangely though, you cannot redefine an already registered handle, such as jquery, like this:

wp_enqueue_script("jquery", "https://konstantin.blog/jquery.js");

The JavaScript library will still be loaded from the default location (wp-includes on your local web server). So the right way to do it is with a little hack in your functions.php file (if you’re doing it within your theme) or in a plugin file (if you’re doing it within a plugin):

// Swap the source of the jquery handle for the CDN copy, leave everything else untouched
add_filter('script_loader_src', 'my_script_loader_src', 10, 2);
function my_script_loader_src($src, $handle) {
	if ($handle == "jquery")
		return "https://konstantin.blog/js/jquery.1.3.2.min.js";

	return $src;
}

Then any call to wp_enqueue_script with the jquery handle will output the correct path to your CDN copy of jQuery. Oh, and please try not to use generic function names like my_script_loader_src; I used that just as an example. We don’t want any function name conflicts, and we can’t expect other plugin/theme developers to use non-generic names ;)
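
To illustrate, with the filter above in place a theme can keep enqueueing jQuery the usual way and the CDN URL will be picked up automatically (the function name below is just a made-up example):

// Hypothetical example: enqueue jQuery as usual, the filter above rewrites its URL to the CDN
add_action('wp_enqueue_scripts', 'my_theme_scripts');
function my_theme_scripts() {
	wp_enqueue_script('jquery');
}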

Every Millisecond Counts: Page Speed for Firebug

Here’s a little video that we saw at Arvind’s and Sreeram’s presentation about speeding up the web at the Google Developer Day 2009 conference in Moscow. Inspiring, isn’t it?

Arvind and Sreeram talked about a very nice plugin for Firefox (built upon Firebug) called Page Speed, developed and maintained by the Googlers. You can read more about the plugin on the official page at Google Code: Page Speed for Firefox/Firebug, plus a bunch of cool tips and tricks right here: Let’s make the web faster. I used to run one called YSlow by Yahoo, but the Googlers seem to have done a better job.

I ran the speed tests on my homepage and got quite a few sweet suggestions, mainly about combining and minifying my CSS and JavaScript files, distributing static content to different cookie-less domains, and a couple more. Combining and minifying CSS and JS would have been quite difficult in WordPress, given all the plugins loading their own, if it weren’t of course for the W3 Total Cache plugin. In only a few minutes I managed to combine all JavaScript and stylesheets into single minified versions, which are recreated whenever a plugin is updated. After doing that, running the same test didn’t flag that problem anymore. Distributing static content to different domains is one more issue that would have been solved by that brilliant cache plugin and its CDN features, but I guess I’ll have to wait for Amazon CloudFront compatibility.

One more thing I love about Page Speed is that it doesn’t just state the problem, it also provides the solution, or at least an easy guide to it. With only a few warnings left, my overall Page Speed score is now okay. I hope to optimize further later this month for even faster access, and perhaps sign up with a PubSubHubbub service (Brett Slatkin had a fantastic presentation on that at GDD too), which I can finally pronounce correctly: Hubbub for short.

Multiple Sites Driven By One WordPress Installation

This is an early experiment. I’ve also filed this post under the “personal” category, because you wouldn’t want your clients to have too much access, especially if they share a single WordPress installation. Now, I know there’s the WordPress MU project, but I guess I can’t use it in this case, because WordPress MU assumes your URLs will be within the same domain (either subdomains or directories).

The reason I want multiple sites driven by a single WordPress installation is that I’m really tired of upgrading every time. Upgrading the WordPress core once in a while is okay, but when you’ve got a list of 30 plugins, it’s a pain in the neck upgrading two or three every day on every single blog and website you run. Automatic updates are not an option, as I want to look at what I’m updating to before actually doing it, at least once.

I won’t be doing this from scratch. I’ll start by merging this blog and the Foller.me blog into a single installation. Single doesn’t mean they share the same database; all they share is the WordPress core files, plugins and themes. Yes, this may be dangerous, because not all plugins store their data in the database (though I believe they should, at least when they’re capable of it). Now imagine NextGEN Gallery (or any other gallery plugin) being shared across two websites within one WordPress installation. The albums are stored in one folder called gallery, so there might be a conflict if two albums have the same name. There might be an option to store the files in a different directory; I hope that option is stored in the database, and I’ll check on that later.

One more issue: remember I said personal projects, and assigned the post to the personal category? If you’ve got clients hosted on WordPress, and you’re doing admin things for them but they DO have admin rights in their own panels, then I wouldn’t go with this setup, as it’ll be quite difficult to stop them from changing each other’s shared themes and plugins. Get my point?

Okay, now the trick will be in the wp-config.php file. We’ll basically look at the incoming host using a regular expression or whatever. If it’s kovshenin.com, we connect to database 1; if it’s blog.foller.me, we connect to the second database, and so on. Pretty simple, huh? If you’re a total freak, you might want to try changing just the table prefix, thus having multiple websites, one WordPress installation, one database and a boatload of tables ;)
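
Here’s a minimal sketch of what that could look like in wp-config.php; the database names and credentials below are made up for illustration:

// Pick a database based on the incoming host (hypothetical names)
if (isset($_SERVER['HTTP_HOST']) && preg_match('/foller\.me$/', $_SERVER['HTTP_HOST'])) {
	define('DB_NAME', 'follerme_blog');
} else {
	define('DB_NAME', 'kovshenin_blog');
}

define('DB_USER', 'username');
define('DB_PASSWORD', 'password');
define('DB_HOST', 'localhost');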

I’ve no idea if this will affect overall performance, but keeping total visitors under ~20,000 per day should be just fine ;) I’ll get back to you with another post next week, hopefully with some tests and results. Cheers!