A Brief Introduction to Google App Engine

If you’ve been following my tweets you might have noticed that I’ve been playing around with Python, Django and Google App Engine for the last few weeks. Honestly, I fell in love with Python the second I wrote my first few lines of code, but that’s not what this post is about. This post is about my short experience with App Engine.

My first experience with App Engine was back in 2009, when I deployed my hello world application, made sure it worked on appspot and then forgot about it, since I was more keen on finishing my Foller.me project in PHP. I didn’t really get the idea of cloud computing and distributed applications at that time, since I had no experience in heavy load at all. When Foller.me was reviewed on Mashable, I thought that 16k people visited my website that day, but in reality the figures were different, since the whole thing just stopped working.

What do you think Google runs on? Do they run on water vapor? I mean, cloud – it’s databases, and operating systems, and memory, and microprocessors, and the Internet!

Larry Ellison, CEO & Co-founder, Oracle

I spent a little less than a year on Amazon EC2 and got familiar with their tools and services, which are quite cool if you have a few hundred extra bucks to spend every month. Yeah, the reason I left Amazon for Media Temple was the cost. A few months later Amazon announced Cloud Computing Free of Charge for a whole year, but it was too late for me. I was already playing around with Python and Django, thinking about Google App Engine.

The Google Developer Day conference in Moscow made me reconsider App Engine, which had gained a few cool improvements, especially for businesses. So I made my final decision to branch my Juice Toolkit project, which I was already working on at that point, into a GAE version.

It did take me a few days to rewrite my Django code for Google’s Datastore, and I hope it was worth it. In the master branch of Juice I used the django-mptt package to work with trees, which unfortunately did not work out of the box with App Engine, so yeah, I wrote my own trees in App Engine for threaded comments, pages and taxonomy. Of course it’s still not final and requires some refactoring and optimization, but that wasn’t very difficult ;) And memcache, oh I love App Engine’s memcache. The documented limits say I can use up to 1 megabyte of memory storage with my free account, but memcache reports that current usage is over 2 megabytes and Google ain’t charging!

I cannot say that I now know much about App Engine, since there’s still quite a lot to learn, but I do think that beginners like me would love a list of short tips to help them on their way from Django to Google App Engine:

  • Django Models are not compatible with Google App Engine models, you have to rewrite them
  • The Django admin and authentication systems are built on Django models, thus they will not work
  • There is an appengine-admin project, but you’ll be better off with Google’s Datastore Viewer
  • Django’s shell (interactive console) is replaced by the development server’s interactive console, usually located at localhost:8080/_ah/admin/interactive (it lacks a Run Program on Ctrl+Enter shortcut)
  • When you pass around objects linked together (for instance parent and child comments), note that obj and obj.key() are slightly different: a passed-around obj is a snapshot that may be stale, while fetching the entity again by its key() is more likely to give you a fresh copy.
  • If you need to run something heavy, don’t be lazy, use the Task Queue API
  • Profile your code on appspot. Google says that your free account will be able to serve 5 million page views per month. I did a little calculation: 5 million / 30 days ≈ 167k page views per day, or roughly 2 page views per second on average. Google gives 6.5 CPU hours (23.4 million CPU milliseconds) every day, which works out to about 7 page views per CPU second. This means that you have to spend less than ~140 CPU milliseconds generating each page in order to serve 5 million per month. That should be your goal. Quite simple, eh? Use memcache.
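The arithmetic above is easy to redo with your own numbers. Here’s a small sketch in shell and awk, assuming a 30-day month. cpu_budget is just a name made up for this example, and its only inputs are the two figures quoted above.

```shell
#!/bin/sh
# Rough App Engine free-quota budget: monthly page views and free CPU hours per day.
# Assumes a 30-day month, as in the calculation above.
cpu_budget() {
    awk -v v="$1" -v h="$2" 'BEGIN {
        per_day = v / 30                  # page views per day
        cpu_ms  = h * 3600 * 1000         # free CPU milliseconds per day
        printf "views/day: %.0f\n", per_day
        printf "views/sec (wall clock): %.2f\n", per_day / 86400
        printf "CPU ms per view: %.1f\n", cpu_ms / per_day
        printf "views per CPU second: %.1f\n", 1000 / (cpu_ms / per_day)
    }'
}

cpu_budget 5000000 6.5
```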

I guess that’s enough for a start. If you’d like to stay tuned to what I’m up to, make sure you follow the Juice Toolkit project at GitHub. Note that there’s a branch called GAE, since the master branch is completely based on Django. Thank you for retweeting!

Amazon Web Services: Cloud Computing Free of Charge

Howly shmoly, just read the announcement of Amazon’s Free Usage Tier offering an EC2 micro instance free of charge for a whole year! Sounds cool, doesn’t it? Well let’s go back a few months and analyze the reason why I left Amazon in favor of Media Temple’s (ve) service: Amazon is way too expensive for a young geek like me, barely having the money to pay rent for my lousy apartment in Moscow ;)

Well, that’s not the only reason, but I’m now quite comfortable with (mt)’s services, except their tech support, but that’s not what this post is about. I must have flooded my Twitter with messages about Django and Python. Honestly, I fell in love with Python a few months ago (in theory) and started writing code at the beginning of October, but then again, this is not what the post is about (can somebody tell me why I’m going off-topic today?)

Back to AWS. The news is good, but the fact that they mention “new customers” frightens me:

These free tiers are only available to new AWS customers and are available for 12 months following your AWS sign-up date. When your free usage expires or if your application use exceeds the free usage tiers, you simply pay standard, pay-as-you-go service rates (see each service page for full pricing details).

I’ve been with them for over a year, paying a bunch of money every month, so unfortunately I’m not a new customer anymore. So they’re not really targeting old customers who were unsatisfied with something, but new ones who have never tried EC2 (S3 and all the rest). Again, this is only a trial, unlike Google App Engine, which happens to love Python code.

So my thinking is – is this all a coincidence, or is it a light for me towards AppEngine, Python and Google? Stay tuned: @kovshenin :)

Moving Away from the Amazon Cloud

I wrote quite a few posts about Amazon Web Services and I hosted my blog there too for a while, but after some time I decided to switch back to a cheaper hosting provider and leave Amazon for the big projects inside our company. This turned out to be quite tricky.

Moving away from the Amazon cloud has some pitfalls you should watch out for. So this post is not only a note to myself about how to do it right next time, but also a note for you readers on how to hopefully save some time and money. Due to lack of experience and not reading everything carefully the first time, it took me two months and around $35 just to move away from the cloud. That’s the kind of money I’d spend on a new book, certainly not just to make Amazon $35 richer ;)

I made a rough checklist below of stuff to watch out for, with Amazon’s prices as of October 2010:

  • When terminating all instances in the cloud, make sure you check every region (US East, US West, Ireland and Singapore) – pricing starts at around $0.10/hr
  • Clear your S3 buckets, and remove them – Amazon charges $0.15 per GB-month for S3 storage
  • Remove your EBS volumes and snapshots from all regions – $0.11-0.18 per GB-month for storage/snapshots
  • Elastic IP addresses – Amazon charges $0.01 per hour for each non-attached IP address, that’s about $7.20 per month per address!
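To get a feel for how quickly forgotten resources add up, here’s a tiny sketch that turns an hourly rate into a monthly bill. It assumes a 30-day month (720 hours), and monthly_cost is a hypothetical helper. The rates in the usage lines are the October 2010 figures from the list above.

```shell
#!/bin/sh
# Monthly cost of something billed per hour and left running (or left unattached).
# Assumes a 30-day month of 720 hours; the optional second argument is how many of them.
monthly_cost() {
    awk -v r="$1" -v n="${2:-1}" 'BEGIN { printf "%.2f\n", r * n * 24 * 30 }'
}

monthly_cost 0.01      # one unattached Elastic IP
monthly_cost 0.10 2    # two forgotten instances at ~$0.10/hr
```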

And please, do double check if there’s anything else in your AWS Management Console, especially if you get a notification from your bank next month. Make sure you scan all available regions! Another way to terminate your AWS account is to instruct your bank not to pay to Amazon at all ;)

If there’s anything else you would add to the list above, make sure to leave a comment below or poke me on Twitter (@kovshenin). To stay tuned and never miss a post, subscribe to my RSS feed. Cheers!

Driving the (ve) Server at Media Temple

It’s been a few weeks since Media Temple launched their new (ve) Server, and I’ve been testing it out for a few days now. I’m actually hosting my blog there to experience some real traffic load, and my first impressions are awesome!

I started off with the simplest 512 MB server and transferred a few websites to the new platform. I’m not too used to the Ubuntu Linux operating system, but I found my way around quickly. They do have other operating system options, but Ubuntu is the one they recommend. The first few tests showed that my load time decreased dramatically compared to my Amazon EC2 instance, which I was quite happy with. The next step was to run a few load tests using the Apache Benchmark tool (ab), and very soon I noticed quite a few failed requests, memory shortages and other strange stuff.

Media Temple’s (ve) servers are hosted on the Virtuozzo platform by Parallels, and after browsing their documentation I found out that there’s no swap space available for Virtuozzo containers. They do allow around 80% extra as burstable RAM (so you get around 1 GB when running 512 MB), but when that runs out, you’re left with nothing, not even some swap space on your hard drive. Some heavy load tests showed a 30% request failure rate, which is quite horrible.
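If you want to see which limit you’re actually hitting, Virtuozzo containers (like OpenVZ ones) usually expose per-container resource counters in /proc/user_beancounters. A non-zero failcnt in that file means an allocation was refused. The sketch below assumes the file exists in your (ve) container and has the usual layout: a version line, a header line, then one row per resource with failcnt in the last column. bc_failures is a made-up name.

```shell
#!/bin/sh
# Print every beancounter whose failcnt (last column) is non-zero.
# Assumes the standard /proc/user_beancounters layout; pass another file to test.
bc_failures() {
    awk 'NR > 2 {
        # The first resource row carries a "uid:" prefix; drop it so $1 is the resource name.
        if ($1 ~ /:$/) { $1 = ""; $0 = $0 }
        if ($NF + 0 > 0) print $1, $NF
    }' "${1:-/proc/user_beancounters}"
}

# Usage inside the container (right after a load test): bc_failures
```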

Media Temple don’t give much information on the new platform via the support system, and in answers to memory shortage questions in their user forums they advise you to upgrade, of course! Well, I wouldn’t like to upgrade just to run a couple of load tests, and what about Digg traffic? Should I predict that and upgrade before the spike? Then downgrade again to save some cash? Of course not.

A good option I found here is to tune Apache a little bit and reduce its resource limits. This will not increase performance, but it can keep requests from failing when memory runs out. We wouldn’t like our users to see a blank page (or a memory shortage error) when a spike hits; we would rather have them wait a bit longer and still get the requested page. The right settings mostly depend on what software and services you’re running and on the RAM available in your container.
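A common rule of thumb for sizing prefork’s MaxClients is to divide the RAM you can spare for Apache by the average size of one Apache child. The sketch below implements only that heuristic, not an official formula. You measure the inputs yourself; the numbers in the usage line are hypothetical, and whether to count burstable RAM is your call. avg_child_mb relies on procps ps, which Ubuntu ships, to average the RSS of the running Apache children.

```shell
#!/bin/sh
# Rule-of-thumb MaxClients: (RAM for the container - RAM kept for other services) / RAM per child.
max_clients() {
    total_mb=$1      # RAM available to the container
    reserved_mb=$2   # RAM kept back for MySQL, the OS and other services
    per_child_mb=$3  # average resident size of one Apache child
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

# Average Apache child RSS in MB (the process is apache2 on Ubuntu, httpd on some other distros).
avg_child_mb() {
    ps -C "${1:-apache2}" -o rss= | awk '{ s += $1; n++ } END { if (n) printf "%d\n", s / n / 1024 }'
}

max_clients 512 112 40   # -> 10
```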

You might want to reduce the KeepAliveTimeout in your apache settings (mine’s now set to 5), and the rest is up to the mpm prefork module. You’ll have to modify your settings and then run some tests until you’re comfortable with the results. Mine are the following:

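<IfModule mpm_prefork_module>
    # Start with a few children and keep a small pool of idle ones
    StartServers 3
    MinSpareServers 2
    MaxSpareServers 5
    # Hard cap on simultaneous requests; further connections wait in the queue
    MaxClients 10
    # 0 means children are never recycled
    MaxRequestsPerChild 0
</IfModule>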

This is on a 512 MB (plus ~400 MB burstable) container. An Apache Benchmark test showed that 100 concurrent (simultaneous) requests completed in 26 seconds with 0% failed requests, which makes 3.84 requests per second, quite good. For comparison, the same test run against the mashable.com website took 30 seconds at 3.32 requests per second, and of course a 0% failure rate. Also check out Apache’s other MPMs, which could give good results too.
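To repeat a test like this, ab’s -n (total requests) and -c (concurrency) flags do the job. Mapping a "100 concurrent requests" test to ab -n 100 -c 100 is my assumption, and the URL below is a placeholder. The helper ab_summary (a hypothetical name) pulls the time taken, the failed request count and the requests per second out of ab’s report, so runs are easy to compare.

```shell
#!/bin/sh
# Summarise an ApacheBench report read from a file or from stdin.
# Example run:  ab -n 100 -c 100 http://example.com/ | ab_summary
ab_summary() {
    awk -F': *' '
        /^Time taken for tests/ { split($2, t, " "); time = t[1] }
        /^Failed requests/      { failed = $2 + 0 }
        /^Requests per second/  { split($2, r, " "); rps = r[1] }
        END { printf "time=%s failed=%d rps=%s\n", time, failed, rps }
    ' "$@"
}
```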

This definitely requires more fine-tuning, and if the page load time becomes too high then yes, there is a reason to upgrade, but don’t forget about other performance tricks such as CDNs, gzip (deflate) and others. When you’re done with Apache, proceed to MySQL fine-tuning and PHP configuration; there are some tricks there too to give you some extra speed and performance.

I’ll keep playing around with this server, plus I purchased a 1 GB (ve) this morning, so there are quite a lot of tests to run. Anyway, if you’re looking for a good, high-performance VPS, Media Temple is definitely a choice to consider. For only $30/mo you get quite a good virtual server. It is more interesting than their old dedicated virtual servers (although still in beta). Cheers, and don’t forget to retweet this post ;)

Amazon Web Services: EC2 in North California

January is going crazy for me down here in Moscow, lots of stuff happening, loads of work. No time to tweet, no time to blog. As I mentioned in my earlier post, I quit my job at GSL and am now working at a new local startup. I’ll make sure to announce it as soon as the website is ready, so stay tuned ;) Anyway, as I wrote back in December, I’m moving all my stuff to the new EC2 Northern California region, and I guess I can say that I’m finally done.

The process is not too different from simply moving to a new dedicated host or to a new EC2 instance in the same region, though there are a few nuances. I was surprised to note that the S3 Fox plugin for Firefox hasn’t yet added the new region (Europe is present though). I thought it might not work for some reason (S3 and EC2 being in different regions), but fortunately it does. I also went back to the good old mod_php for Apache instead of running mod_suphp, which gave me a tiny boost in performance. All the configurations were straightforward: copy from one EC2 instance, paste into the other. Not without a few changes, of course.

My Elastic IP address also changed, and the old one had been whitelisted by Twitter! So I guess I’ll have to write to them again for the new whitelisting. Oh well. One more interesting thing is that I’m now running on an EBS-backed instance, which Amazon introduced not so long ago. I wouldn’t have to worry about losing my stuff on a terminated or rebooted machine, as the whole drive lives on an EBS volume. So backups are now practically instant via the AWS Management Console: they’re called Snapshots, and they take one click and a few minutes ;) Now if I’d like to terminate one EC2 instance and start the whole thing over on another one, I’d just restore from the EBS volume or a Snapshot! Unless, of course, I decide to move to another region. I believe EBS volumes and Snapshots are restricted to a region, and an EBS volume can only be attached to an instance in the same Availability Zone, which makes sense: I wouldn’t like to run an EC2 instance in one data center, backed by a hard drive located in a different one.

Another good question is Amazon CloudFront. Well, since the S3 buckets haven’t changed, CloudFront should work the way it used to despite the move. Or at least I hope so ;)

Amazon Web Services: Moving to a New Region

I wrote about Optimizing Your Amazon Web Services Costs back in November, where I mentioned some of the upsides of Reserved Instances at Amazon but didn’t mention any downsides, and here we are. Two weeks later Amazon announced the opening of the Northern California Region. I thought it wouldn’t differ from the Virginia data center, but still decided to give it a shot for a few hours.

I didn’t do much benchmarking, but hey, I’m running a Twitter app.. Foller.me, remember? This means that access times to the Twitter API are crucial, so I started off with some basic pinging, and the pings from California seemed to be a few times faster than the ones from Virginia. Next, I ran Xdebug and analyzed the cachegrind files for a few requests to different profile pages. Sweet to know that 95% of the time taken to load a page is curl accessing the Twitter API ;) this means that my code is well optimized. The overall results in the California region were ~40% better than Virginia, so I thought of moving there. The problem was that I already had a 1-year contract with Amazon for an instance in the Virginia region.
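Comparing regions by ping is easy to script. This sketch extracts the average round-trip time from ping’s summary line. Both Linux and BSD print that line as slash-separated min/avg/max values, so run it from an instance in each region and compare the results. avg_rtt is a made-up helper, and the host in the usage comment is a placeholder.

```shell
#!/bin/sh
# Average RTT from ping's summary line, e.g.
#   rtt min/avg/max/mdev = 10.1/12.3/15.0/1.2 ms           (Linux iputils)
#   round-trip min/avg/max/stddev = 10.1/12.3/15.0/1.2 ms  (BSD/macOS)
# Splitting on "/" puts the average in field 5.
avg_rtt() {
    awk -F/ '/min\/avg\/max/ { print $5 }' "$@"
}

# Usage:  ping -c 10 api.example.com | avg_rtt
```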

I wrote to Amazon via their contact form and asked about transferring a reservation from one region to another, of course with additional charges (the California region is slightly more expensive), and soon got a negative reply. They mentioned that reserved instances are not transferable from one region to another, but that I can always cancel my reservation in one region and open a new one in the other. They didn’t mention any refunds, so I decided to ask, but soon, scrolling through their FAQ, I found this:

Q: Can I move a Reserved Instance from one Region or Availability Zone to another?
A: No. Each Reserved Instance is associated with a specific Region and Availability Zone, which is fixed for the lifetime of the Reserved Instance and cannot be changed.

Q: Can I cancel a Reserved Instance?
A: The one-time payment for a Reserved Instances is not refundable. However, you can choose not to run or entirely stop using your Reserved Instance at any time, at which point you will not incur any further usage charges.

So I asked myself, why the heck would anybody want to cancel a reserved instance if they don’t get a refund? The conversation kept going on Twitter. Friends mentioned that I could purchase an additional reserved instance in the California region and then sell computing time on the one I have in Virginia, but that sounded too sarcastic. I felt unlucky and sad, and thought I should stick to the instance I had in Virginia. If only I had waited a few more weeks before making the purchase…

This morning I received another email from Amazon, stating that although they don’t usually do this sort of stuff, they got approval to process the cancellation with a refund just this one time, so now I’m free to reserve an instance in Northern California, happy holidays! Well, on Christmas Eve, this feels like a gift and I’m very excited about launching all my stuff in the new region, hopefully in January. So, thank you Amazon and Happy Holidays to all of you.

Cloud Tips: Amazon EC2 & Rejected Email

A few weeks ago I set up my email address in /etc/aliases for user root (and the others) and started to actually read my root email from time to time (I wonder why I never did that before). Anyway, what bugged me straight away was that some emails were being rejected and not delivered, yielding the following errors (I removed some numbers):

Deferred: 450 4.7.1 : Helo command rejected: Host not found
421 invalid sender domain 'domU.compute-1.internal' (misconfigured dns?)

And some others that looked alike. Tonnes of them, every four hours! Emails to other addresses were delivered fine, though. I had WordPress notification messages delivered to my email and never lost a message. I also tried sending out a few using the mail command via SSH, and everything was okay. For a second I thought that maybe those addresses were simply invalid, but wouldn’t the server reply with an “Invalid recipient” error? Probably. Here’s what I got from the Amazon Web Services support forums:

It seems that some remote mail servers complain about your server
identifying itself in the SMTP dialogue as domU.compute-1.internal,
while its external name is ec2.compute-1.amazonaws.com

Makes total sense. Some servers do check where the email is coming from, and of course the .internal domain is unresolvable (hence the “dns” misconfiguration error). I had to identify myself with an external, resolvable name, so I copied the external name into the /etc/mailname file. Well, it’s been a week now and I haven’t received any more delivery errors, so that must have worked.
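Here’s the same fix as a small shell sketch. It assumes the EC2 instance metadata service at 169.254.169.254 answers plain requests for public-hostname; instances that enforce IMDSv2 need a session token first. write_mailname is a hypothetical helper that refuses to write an empty or .internal name, so a failed lookup can’t reproduce the original problem.

```shell
#!/bin/sh
# Write an externally resolvable hostname into a mailname file (normally /etc/mailname).
write_mailname() {
    name=$1
    target=$2
    case $name in
        ''|*.internal)
            echo "refusing unresolvable name: '$name'" >&2
            return 1 ;;
    esac
    printf '%s\n' "$name" > "$target"
}

# Usage on the instance (as root):
#   write_mailname "$(curl -s http://169.254.169.254/latest/meta-data/public-hostname)" /etc/mailname
```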

Optimizing Your Amazon Web Services Costs

I’ve been with Amazon for quite a long time now, and you must have heard that their web hosting services aren’t very cheap. The average total for one instance per month (including EBS, S3 and all the others) was around $120 at the start. That was back in July 2009, when I had no idea how all this stuff works. With a lot of experimenting I managed to cut my monthly per-instance costs by around 40%. Below are a few tips that can help you lower your Amazon Web Services charges:

  • Use reserved EC2 Instances where possible. Amazon charges $0.085 per hour for an m1.small Linux instance in the US, that’s around $61 per month and $745 per year. A reserved instance costs me $227 for one year, plus $0.03 per running hour, which makes it around $490 per year for an m1.small instance. Use reserved instances only if you’re sure that you’ll be using them for a whole year. You can save even more if you purchase a reserved instance for three years.
  • Storage: EBS vs EC2 instance storage. Pick EC2! That’s right, EC2! EBS charges you for provisioned storage, IO requests and snapshots. These may rise pretty quickly if you’re running MySQL on an EBS volume – very risky! Run your MySQL on EC2. The PHP files and everything else should preferably be on EC2 as well. You can use your EBS volume for tiny backups of core PHP files if you’re running more than one EC2 instance.
  • EBS is cheaper than S3. S3 should only be used in cases where you have to serve your static content from different servers (perhaps through CloudFront), and maybe to store some backups too (don’t forget to remove the old ones!), but EBS is cheaper, even with snapshots.
  • CloudFront is okay. It does speed up your website, but be aware that it’s more expensive for requests served in Japan and Hong Kong.
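Here is the reserved vs on-demand comparison from the first tip, redone in awk with the same prices and a 365-day year of continuous running. The last line also shows the break-even point: the number of running hours after which the cheaper hourly rate has covered the upfront fee. ri_compare is a made-up name.

```shell
#!/bin/sh
# ri_compare ON_DEMAND_PER_HOUR UPFRONT RESERVED_PER_HOUR
# Yearly cost of one always-on instance both ways, plus the break-even running hours.
ri_compare() {
    awk -v od="$1" -v up="$2" -v rh="$3" 'BEGIN {
        hours = 24 * 365
        printf "on-demand/year: %.2f\n", od * hours
        printf "reserved/year: %.2f\n", up + rh * hours
        printf "break-even hours: %.0f\n", up / (od - rh)
    }'
}

ri_compare 0.085 227 0.03
```

With these prices the reservation pays for itself after roughly 4,127 running hours, which is about 172 days of continuous use.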

There you go. With these tips you should be able to get Amazon hosting for around $90/month, unless of course you have a website with 3 million visitors per day ;) Also, for those of you wondering: I haven’t used Rackspace, but I did compare their prices to Amazon’s and they’re more expensive.

Cloud Tips: Rediscovering Amazon CloudFront

So, three months later I realized I wasn’t using CloudFront at all! Huh? I took a deeper look at my Amazon Web Services bill last month and found out that I wasn’t even charged for CloudFront! But hey, I delivered all my static content through CloudFront distributions from S3 and I had a subdomain mapped to those distributions and everything was working fine (thought I).. Let’s see:

Amazon CloudFront delivers your content using a global network of edge locations. Requests for your objects are automatically routed to the nearest edge location, so content is delivered with the best possible performance.

Right, and that’s probably what they charge for in the CloudFront section, so the fact is that I haven’t been using it at all. Gathering all the static content from the so-called “origin server” is far from what CloudFront can do. What I’ve been doing for the past few months is simply delivering content from my S3 bucket, which is also good, but “good” is not enough. I browsed through the AWS Management Console for hours and couldn’t find out what I was doing wrong; the server kept pulling the content from the origin. Then I finally realized that after creating a distribution I was given two addresses: one was the origin server, the second one was the CloudFront server (a .cloudfront.net subdomain, underlined in red). So the settings I got wrong were at the DNS level, not in the Management Console.


So I logged back into my registrar, found the DNS management options, switched my CNAMEs to the CloudFront domain instead of the origin bucket and hoped that everything would work. The very next day I got my very first bill for Amazon CloudFront: three cents! Hurray! I’m not sure if this is well written in the documentation for CloudFront and S3 (I doubt that people read them), but I have a few friends who have experienced the same problem, and why show the address of the origin bucket in the first place? Weird. The S3 Firefox Organizer groups both fields into one, and that’s even more weird. Oh well, glad I sorted it out.
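To double-check where your static subdomain actually points, dig can show its CNAME. The sketch below classifies the target as CloudFront (*.cloudfront.net) or an S3 origin (*.s3.amazonaws.com). cdn_target and classify_target are hypothetical helpers, and static.example.com is a placeholder.

```shell
#!/bin/sh
# Classify a CNAME target; dig prints names with a trailing dot, so accept both forms.
classify_target() {
    case $1 in
        *.cloudfront.net|*.cloudfront.net.)     echo cloudfront ;;
        *.s3.amazonaws.com|*.s3.amazonaws.com.) echo s3-origin ;;
        *)                                      echo other ;;
    esac
}

# Usage:  cdn_target static.example.com
cdn_target() {
    classify_target "$(dig +short CNAME "$1" | head -n 1)"
}
```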

Cloud Tips: Automatic Backups to S3

In a previous post about backing up EC2 MySQL to an Amazon S3 bucket we covered dumping MySQL datasets, compressing them and uploading them to S3. After a few weeks of test-driving the shell script, I came up with a new version that checks, repairs and optimizes all tables before generating the dump. This is pretty important, as mysqldump stops at the first error it runs into (data corruption, crashed tables, etc.), so the archive you upload to S3 would be incomplete. Here’s the script:

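#!/bin/bash
# Stop on any error, including a mysqldump failure inside the pipeline
set -euo pipefail
filename=mysql.$(date +%Y-%m-%d).sql.gz
echo "Checking and repairing all tables"
# Note: -p must be glued to the password (-ppassword); --password=... avoids the pitfall
mysqlcheck --user=username --password=password --auto-repair --check --all-databases
echo "Optimizing all tables"
mysqlcheck --user=username --password=password --optimize --all-databases
echo "Generating MySQL dump: ${filename}"
mysqldump --user=username --password=password --all-databases | gzip -c9 > "/tmp/${filename}"
echo "Uploading ${filename} to S3 bucket"
php /ebs/data/s3-php/upload.php "${filename}"
echo "Removing local ${filename}"
rm -f "/tmp/${filename}"
echo "Complete"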

There you go. If you remember my previous example, I stored the temporary backup file on Amazon EBS (Elastic Block Store), which is not quite appropriate. Amazon charges for EBS storage, reads and writes, so why the extra cost? Dump everything into your temp folder on EC2 and remove it afterwards. Don’t forget to make changes in your upload.php script ($local_dir settings). Also, just as a personal note and for people who didn’t figure out how to upload archives with data to S3, here’s another version of the script, which takes your public_html (www, htdocs, etc.) directory, archives and compresses it, and uploads it to an Amazon S3 bucket:

#!/bin/bash
set -euo pipefail
# A tar archive, so name it .tar.gz rather than .sql.gz
filename=data.$(date +%Y-%m-%d).tar.gz
echo "Collecting data"
tar -czf "/tmp/${filename}" /ebs/home/yourusername/www
echo "Uploading ${filename} to S3 bucket"
php /ebs/data/s3-php/upload.php "${filename}"
echo "Removing local ${filename}"
rm -f "/tmp/${filename}"
echo "Complete"
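The reason for checking tables first is to avoid uploading a broken archive, so it can also help to verify the MySQL dump itself before the upload step. This sketch assumes mysqldump ran with its default comments enabled (no --skip-comments), in which case the dump ends with a "-- Dump completed" line. verify_dump is a hypothetical helper; it also runs gzip -t to catch a damaged archive.

```shell
#!/bin/sh
# Return non-zero if a gzipped mysqldump is corrupt or looks truncated.
verify_dump() {
    gzip -t "$1" 2>/dev/null || { echo "corrupt archive: $1" >&2; return 1; }
    gzip -dc "$1" | tail -n 1 | grep -q '^-- Dump completed' || {
        echo "dump looks truncated: $1" >&2
        return 1
    }
}

# Usage, e.g. right before the upload step of the MySQL script:
#   verify_dump "/tmp/${filename}" || exit 1
```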

Oh, and have you noticed? Amazon has changed the design a little bit, and woah! They’ve finally changed the way they display the Secret Access Key, without a trailing space character! Congrats Amazon, it took you only a few months.