Moscow Django Meetup

I’m not a huge fan of Django, although Python is one of my favorite languages. Last week I heard about a Django meetup happening in Moscow, so I decided to give it a go. It was fun: there were three speakers and around forty attendees in a really nice venue. The three topics were the PyCharm IDE, continuous integration, and a different (and rather strange) approach to handling HTTP requests in Django.

Anyway, it was nice to meet new people and see how their meetup is going. It was also nice to have a beer and chat about Git vs. Mercurial vs. Subversion. Will I attend next time? Maybe ;)

Encode Entities Inside PRE Tags

Here’s a little Python script that searches through a given file for pre tags and encodes anything in between. This is useful when you’re escaping from syntax highlighting plugins and replacing every occurrence of the code shortcode with a pre tag.

import re, sys

# First argument is the filename, output is filename.encoded
filename = sys.argv[1]
f = open(filename)
output = open('%s.encoded' % filename, 'w+')

# Read the whole file, fire the regular expressions
contents = f.read()
expr = re.compile(r'<pre>(.*?)</pre>', re.MULTILINE|re.DOTALL)
matches = expr.findall(contents)

# Loop through each match and replace < > with &lt; and &gt;
for match in matches:
	contents = contents.replace(match, match.replace('<', '&lt;').replace('>', '&gt;'))

# Write the output file and close both files
output.write(contents)
output.close()
f.close()

Most syntax highlighting plugins will encode all entities on the fly for you, so when you stop using them your code might break. Also, most highlighting plugins will render your TinyMCE visual editor useless when working with code, and I think it’s quite common to work with code in the visual editor in WordPress. At least Twenty Ten and Twenty Eleven understand that ;)

However, as the replacement part shows, I don’t really encode all entities but rather replace the greater-than and less-than symbols. That’s enough for most cases, but if you need real entity encoding you should use the cgi.escape function, which is similar to htmlspecialchars in PHP.
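For reference, here’s what full entity encoding looks like with the standard library. On Python 2 that’s cgi.escape; on Python 3 it was replaced by html.escape (the snippet below is the Python 3 spelling):

```python
import html

snippet = 'if a < b and b > c: print("&")'

# html.escape encodes &, < and > (and quotes too, unless quote=False)
print(html.escape(snippet, quote=False))
# -> if a &lt; b and b &gt; c: print("&amp;")
```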

Feed this script your database dump and it’ll create a new file with an .encoded suffix which you can feed back to MySQL. Please note, though, that this script reads the entire input file into memory, which may lead to slow execution, high memory usage and swapping when working with large files. It worked fine on my 30 megabyte database though.

Happy 2011: In 10+ Different Programming Languages

The end of the year 2010 is near, so I’ve prepared this post for all you coders, developers and other geeks. You know how when you build applications (especially web applications) you often leave a copyright notice on every page saying “Copyright 2010, All Rights Reserved” or whatever? Right, but what developers tend to forget is that time passes by and they have to go back and change 2010 to 2011 in January. That’s why we often browse websites, especially in January and February, that still carry last year’s copyright.

The best trick is not to hardcode the year in your templates and skins, but to print the current year dynamically using date functions, so below is a list of snippets that print the current year in 10 different programming languages.
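In Python, for example, the trick looks like this (the other languages follow the same idea):

```python
import datetime

# Print the current year dynamically instead of hardcoding it
year = datetime.date.today().year
print("Copyright %d, All Rights Reserved" % year)
```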


Pickle vs JSON — Which is Faster?

If you’re here for the short answer — JSON is 25 times faster in reading (loads) and 15 times faster in writing (dumps). I’ve been thinking about this since I wrote the ObjectProperty and JsonProperty classes for Google App Engine. They’re both easy to use and work as expected. I did have some trouble with ObjectProperty but I figured it out in the end.

As my previous posts mention, the ObjectProperty class uses Python’s pickle module, while JsonProperty works with simplejson (bundled with Python 2.6 and above, available through django.utils in Google App Engine). I decided to measure the performance of these two.

Unfortunately I couldn’t do much benchmarking on top of Google App Engine since there’s too much lag between the application server and Google’s Datastore so I decided to write simple benchmarks and find out which is faster — pickle or JSON. I started out by constructing a dataset which I’ll be pickling and “jsoning”, which resulted in some random lists, dictionaries and nested dictionaries containing lorem ipsum texts.

I then used Python’s timeit module to measure how long it took to “dumps” and “loads” the dataset using pickle and simplejson. I also measured the length of the resulting pickle/JSON strings to see which would be smaller in size, and guess what: JSON wins in all rounds. I ran the tests 10, 20, 50, 100, 500 and 1000 times for reading, writing and length comparison. Below are three charts illustrating the results:

As you see, dumps in JSON are much faster — by almost 1500%, and that is 15 times faster than Pickling! Now let’s see what happens with loads:

Loads shows even more goodies for JSON lovers — a massive 2500%, how’s that!? Of course some of you might be concerned with size, memory usage, etc. Since there’s no good method of measuring the actual bytes, I used Python’s len function to simply measure the number of characters in the resulting pickle/JSON string.

So yes, JSON is faster in all three aspects. If you’d like to experiment yourself, feel free to use the source code I wrote. Beware of running the 500/1000 tests, those can take hours ;)
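A minimal sketch of that kind of timing harness, in case you don’t want to grab the full source (the dataset here is an illustrative stand-in for the lorem ipsum structures, and note that on recent Python 3 releases pickle has been heavily optimized, so the numbers will differ from the ones above):

```python
import json
import pickle
import timeit

# A small stand-in dataset: lists, dicts and nested dicts
data = {'items': [{'id': i, 'text': 'lorem ipsum ' * 5,
                   'tags': ['a', 'b', 'c'], 'meta': {'nested': {'n': i}}}
                  for i in range(100)]}

n = 1000
json_dumps = timeit.timeit(lambda: json.dumps(data), number=n)
pickle_dumps = timeit.timeit(lambda: pickle.dumps(data), number=n)

json_str = json.dumps(data)
pickle_str = pickle.dumps(data)
json_loads = timeit.timeit(lambda: json.loads(json_str), number=n)
pickle_loads = timeit.timeit(lambda: pickle.loads(pickle_str), number=n)

print('dumps: json %.4fs, pickle %.4fs' % (json_dumps, pickle_dumps))
print('loads: json %.4fs, pickle %.4fs' % (json_loads, pickle_loads))
print('size:  json %d, pickle %d' % (len(json_str), len(pickle_str)))
```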

The benchmark was done on an Ubuntu 10.10 64-bit machine with Python 2.6 installed, but I don’t think the results will differ much on other platforms. The conclusion is that if you need to store complex objects, such as functions, class instances, etc., you have to use pickle, while if you’re only looking for a way to store simple objects, lists and nested dictionaries, then you’re better off with JSON.

Thank you for reading and retweeting ;)

Update: If you’re sticking to pickling objects and have the freedom to use C-compiled libraries, then go ahead with cPickle instead of pickle, although it still lags behind JSON (about twice as slow at both loading and dumping). As for App Engine, I tried running a short benchmark with cPickle vs simplejson from the django.utils package; the results were better for pickle, but still not enough to beat JSON, which was 30% faster. I know there are other libraries worth mentioning here, but my hands are tied since I’m running App Engine with Python 2.5 and I’m not allowed to install extra modules ;) Cheers and thanks for the comments!

App Engine: JSON Objects in the Google Datastore

This is a follow-up to my previous post about Python Objects in the Google Datastore, where I showed an easy way of storing any Python object in the Google Datastore by pickling it. Pickling works fine and can literally store functions, classes and class instances in the Datastore, which is great, but as I discussed earlier, JSON could work too, so here I’ll introduce a similar approach to storing JSON objects (strings, lists and dictionaries) inside Google.

The problem with pickled objects is that they’re very difficult to maintain. Google provides a front-end for the Datastore where you can run GQL queries, watch statistics and even manage data on the fly. Since pickled objects are stored as bytes, as opposed to JSON strings, they tend to break when you edit an entity in the Google admin, since the values sent back to the server by your browser are usually strings. I haven’t found a good way around this yet (update: I did find a way), which is why I temporarily switched to JSON, which is easier to edit and maintain.

Here’s the draft code I’m using to store JSON objects in Google App Engine, which I called (surprise) JsonProperty:

from django.utils import simplejson
from google.appengine.ext import db

class JsonProperty(db.TextProperty):
	def validate(self, value):
		# TODO: check that the value is actually JSON-serializable
		return value

	def get_value_for_datastore(self, model_instance):
		# Serialize the property value to a JSON string before storing
		result = super(JsonProperty, self).get_value_for_datastore(model_instance)
		result = simplejson.dumps(result)
		return db.Text(result)

	def make_value_from_datastore(self, value):
		# Deserialize the JSON string; fall back to the raw value on failure
		try:
			value = simplejson.loads(str(value))
		except ValueError:
			pass

		return super(JsonProperty, self).make_value_from_datastore(value)

Note that Google still runs Python 2.5 hence the simplejson library is only available through the django.utils package (it ships with Python starting from version 2.6).

So the logic is quite clear here: we use dumps and loads to write and read strings to and from the Datastore. Perhaps it’s lacking some code in the validate method; we do, after all, need to check whether the given value is convertible to JSON and raise an exception if not. But I’ll leave that to the next version of this snippet, as I’m currently only using it to prototype stuff and maintain small and medium-sized projects — so do use this at your own risk ;)
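For the record, the missing check could look something like this. It’s a standalone sketch using the stdlib json module rather than django.utils.simplejson, and the helper name is mine:

```python
import json

def validate_jsonable(value):
    # Raise ValueError if the value cannot be serialized to JSON
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        raise ValueError("Value is not JSON-serializable: %r" % (value,))
    return value
```

Dropping a check like that into the validate method would make bad values fail loudly at assignment time instead of at read time.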

As a usage sample you can take this simple code which can be run from the App Engine console:

class MyEntity(db.Model):
    name = db.StringProperty()
    obj = JsonProperty()

my_obj = {'key-1': 'value-1', 'key-2': 'value-2'}
entity = MyEntity(name="my-name", obj=my_obj)
entity.put()

entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
    print entity.obj # outputs the dictionary object

And don’t forget that you can manually browse the Datastore using your App Engine dashboard and see how such entities are actually stored.

I wrote about performance in that previous post, but honestly, I didn’t have the time to measure it, so if you guys do, feel free to leave your benchmark results in the comments section. Some graphs would be cool too. Oh, and thanks for retweeting!

App Engine: Python Objects in the Google Datastore

Howdy! Hope you’re all well and getting ready for the upcoming holidays! I’ve written more Google App Engine code this past month than ever before, and I’ve finally arrived at a file and data structure that I myself can understand. Hopefully we’re all aware that the Google Datastore is not relational and that most of the frequently used data types are supported: StringProperty, IntegerProperty, ListProperty, EmailProperty and many others. But are those enough? Hopefully yes, but this post introduces a new property type I’ve written mainly for myself: the ObjectProperty.

Have you ever thought of storing Python dictionaries or nested lists in the Google Datastore? What about custom objects with their attributes? I’ve gone a little bit further – ObjectProperty lets you store any Python object inside the Datastore. As you might know, everything in Python is an object, lists are objects, dictionaries are objects, functions are objects, classes are objects, objects are objects (duh!).

ObjectProperty uses the Python object serialization module called pickle (and yes, it’s available in Python 2.5). It “pickles” the input values before storing them in the Datastore, then “unpickles” them upon retrieval. This is very similar to JSON, but hey, JSON can’t store functions ;)

The code is fairly simple for now and works out of the box for me. I did make some assumptions about human error, hence the exception handling, but to be honest there are no tests yet. It’s still at the idea stage, so test cases will be built later on, which means that for now you should use this carefully, at your own risk, provided as is, without any guarantees (obviously). Here’s the ObjectProperty class:

from google.appengine.ext import db
import pickle

# Use this property to store arbitrary Python objects.
class ObjectProperty(db.BlobProperty):
	def validate(self, value):
		# Make sure the value can actually be pickled
		try:
			pickle.dumps(value)
			return value
		except pickle.PicklingError:
			return super(ObjectProperty, self).validate(value)

	def get_value_for_datastore(self, model_instance):
		# Pickle the value before storing it as a blob
		result = super(ObjectProperty, self).get_value_for_datastore(model_instance)
		result = pickle.dumps(result)
		return db.Blob(result)

	def make_value_from_datastore(self, value):
		# Unpickle on the way out; fall back to the raw value on failure
		try:
			value = pickle.loads(str(value))
		except Exception:
			pass
		return super(ObjectProperty, self).make_value_from_datastore(value)

So as you can see, I’ve overridden the few methods which the App Engine docs suggest for creating new properties. I used the pickle module to dump and load objects to and from the Datastore, and I also try pickling the object during validation. If that part is time-consuming for you (perhaps you’re pickling large objects), you might want to simply return the value.

A quick usage example

Here’s a fairly simple usage example, should be quite easy to follow. You can try it out using App Engine’s interactive console:

class MyEntity(db.Model):
	name = db.StringProperty()
	obj = ObjectProperty() # Kudos

# Let's create an entity
my_object = { 'string': 'Hello World', 'integer': 51, 'list': [1, 2, 3], 'dict': { 'a': 1, 'b': 2, 'c': [4,5,6] } }
entity = MyEntity(name="Konstantin", obj=my_object)
entity.put() # Saves the entity to the datastore

# Then retrieve that value from the datastore
entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
	print "%s: %s" % (entity.name, entity.obj)

# Should output: Konstantin: { 'integer': 51, 'list': [1, 2, 3], 'string': 'Hello World', 'dict': { 'a': 1, 'c': [4, 5, 6], 'b': 2 } }

Voila. And here’s how it looks like in the datastore:

(dp0 S'integer' p1 I51 sS'list' p2 (lp3 I1 aI2 aI3 asS'string' p4 S'Hello ...

Now please note that such fields are not indexed, nor queryable via GQL. This means that you cannot look up an entity whose “list called c inside the obj dictionary contains the number 5”. That would be quite nice, but it would definitely be a performance bottleneck (we’re using Google App Engine and the Google Datastore for performance, after all).

Where can this be used?

It is up to you where to use such a property, but I can think of many places. For instance, an entity called Account with a standard set of fields for typical user accounts. Now let’s tie social logins to such accounts – Facebook, Twitter, LinkedIn. Each service uses its own authentication method; Twitter, for instance, uses OAuth. OAuth works with tokens, request tokens and access tokens, so define a twitter property as an ObjectProperty and store all the tokens in a simple Python dictionary. This works because you’ll never look up tokens without looking up the Account first, and once you’ve looked up the user account, your tokens are already unpickled and available. Magic, isn’t it?
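Stripped of the Datastore machinery, that tokens example boils down to a plain pickle round-trip (the payload keys below are made up for illustration):

```python
import pickle

# A hypothetical set of OAuth tokens, as a twitter ObjectProperty might hold
tokens = {
    'request_token': 'hypothetical-request-token',
    'access_token': 'hypothetical-access-token',
    'access_token_secret': 'hypothetical-secret',
}

blob = pickle.dumps(tokens)      # roughly what get_value_for_datastore stores
restored = pickle.loads(blob)    # roughly what make_value_from_datastore returns
assert restored == tokens
```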

As I already mentioned, this is still at the idea stage and with some feedback from you guys, I can hopefully decide whether to carry on or leave it as a private snippet for myself. Thanks for reading!

Update: I was concerned about not being able to maintain pickled objects in the Datastore admin, but wrapping the value in db.Blob() solved that problem; such values are simply not editable through the admin. You might also want to check out JSON Objects in the Google Datastore and a comparison of the two in a post called Pickle vs JSON — Which is Faster?

Installing Python 2.5 on Ubuntu Linux 10.10

If you’ve been working with App Engine and noticed that some things work on your development server but not in production, it may be related to different Python versions. The latest Linux distributions, including Ubuntu 10.04 and 10.10, ship with Python 2.6 pre-installed, but Google App Engine still runs Python 2.5 (an issue has been created to add Python 2.6 support; make sure you vote it up).

Their roadmap mentions nothing about upgrading. So in order to make your development server look more like your production servers, you’ll have to get Python 2.5, which is not that trivial at first.

Luckily, Felix Krull has published an archive of new and old Python packages, so let’s use that source to get Python 2.5 running on a new Ubuntu box:

sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python2.5

Yup, that was easy! Let’s now see if both Python 2.5 and Python 2.6 are available:

$ python2.5
Python 2.5.5 (r255:77872, Sep 14 2010, 15:51:01)

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)

All done! Oh and don’t forget to launch your App Engine development server using python2.5 (installing it is not enough):

$ python2.5 dev_appserver.py .

As a bonus to this post, I’d like to share with you my way of working with App Engine, not in terms of code, but in terms of libraries organization. If you’re writing code for App Engine you’re probably working on more than one project at a time, hence you’ll need to use the SDK more than once.

So instead of copying it and replacing Python packages for every project, simply move the google_appengine folder to /usr/share, and in every App Engine project create a symbolic link called .gae that points to that location. The SDK will automatically locate all the Google libraries, and the development server is easy to launch:

$ ln -s /usr/share/google_appengine/ .gae
$ python2.5 .gae/dev_appserver.py .

Don’t forget the dot at the end, since it tells the SDK which project to launch. And make sure you don’t push the .gae directory to your source control ;) Happy coding!

Foller.me Has Got a New Home: Google App Engine

That’s right, Google App Engine! For those of you who don’t remember, Foller.me is a Twitter app I wrote back in 2009. Honestly, I never realized performance would be an issue until the day I got featured on Mashable. More about that in a previous post.

If you’re not familiar with Foller.me, I’ll explain it here in a couple of words:

Foller.me is a web application based on the Twitter API, used to gather Twitter analytics from any profile – in seconds! Foller.me scans your Twitter profile, parses your latest tweets and followers. Tweets are separated into topics, mentions and hashtags. Followers are mapped out on a Google Map.

This weekend I made a choice I thought I never would – I’m moving Foller.me over to Google App Engine. It all came after running a couple of experiments in my Juice Toolkit project (note that there’s a GAE branch).

I realized that it wouldn’t take me too much time to rewrite Foller.me in the new language I’m learning – Python. And I was right: the first few tests were up and running on Saturday, and there’s now a live preview, including more fixes and updates, on appspot. The clouds seem to work fine, geography is even better than I expected (with a few more updates), and the interface got a few extra buttons!

Moreover, the new version of Foller.me has moved to GitHub, which means it is now an open source project! Anybody can dive into the code and suggest a few patches or updates. Contributors are always welcome ;) Also, if anybody is interested in the PHP code that runs the old version, let me know, I’ll be more than glad to share it.

Now, for the bad news – I’m closing down the Foller.me blog, since there’s no time to post and not that much I could write about. I hope the announcements, commits and wikis on GitHub and Twitter will be enough. The API will be temporarily shut down during the move and for a few weeks after, but I’ll bring it back to life, perhaps with a few more fixes. The domain will probably move to a www mirror, since App Engine doesn’t allow naked domains at this stage (see this issue).

So yeah, Foller.me will live, and donations are always welcome. The app will remain free of charge, and so will the API. But there’s a plan to launch a premium service on Foller.me – detailed profile analytics. I’m able to give you even more statistics and analytics on your Twitter profile, but of course not instantly; my rough estimate is 1-2 days per report. This will generate a very sophisticated PDF report about your profile, including much more information about your followers and the people you’re following, their relations, lists, graphs and charts, and even more geographical data.

All this is still in the stage of an idea or a concept, lacking a business plan. So my question to you right now is – would you be interested in such a service? And if yes, how much money are you willing to pay for one report? Please answer in the comments section. Everybody who answered will get their first reports free of charge (if this is ever implemented).

Facebook Fans Count Using Python and the Graph API

I noticed some peeps struggling to show off their Facebook fan counts on their websites using the Facebook API. I’ve shown before how to do it in PHP, and this quick post is about Python. Honestly, you’ll laugh as soon as you read the following three lines of code:

import facebook
graph = facebook.GraphAPI()
print "Mashable has %s fans" % graph.get_object('/mashable')['fan_count']

Yeah, and that is one more reason to love Facebook. Now, you might be wondering where I got that facebook module I imported? It’s called the Facebook Python SDK; it’s free and open source, hosted on GitHub, and seems to be official.

There’s other info about public objects available via that method as well; you can get the full list by printing the whole object, which is a dictionary:

print graph.get_object('/mashable')

Don’t forget to cache the objects, since you wouldn’t want to fire requests at Facebook every time your page loads. If you’re thinking about other Graph API methods, make sure you read the Authorization section of the Graph API Overview, since you’ll need OAuth tokens to make those requests.
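A minimal in-process cache for those lookups could be as simple as this (the TTL, the helper name and the fallback behavior are all my own choices here, not part of the SDK):

```python
import time

_cache = {}

def get_cached_object(graph, path, ttl=300):
    # Return a Graph API object, reusing a cached copy for up to ttl seconds
    now = time.time()
    if path in _cache:
        value, stored_at = _cache[path]
        if now - stored_at < ttl:
            return value
    value = graph.get_object(path)
    _cache[path] = (value, now)
    return value
```

In a real deployment you’d likely swap the module-level dictionary for memcache, but the lookup pattern stays the same.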

Heh, maybe I should continue this series of posts for a dozen of other languages too? ;) Thanks for sharing!

A Brief Introduction to Google App Engine

If you’ve been following my tweets you might have noticed that I’ve been running around Python and Django and Google App Engine for the last few weeks. Honestly, I fell in love with Python the second I wrote my first few lines of code, but it is not what this post is about. This post is about my short experience with App Engine.

My first experience with App Engine was back in 2009, when I deployed my hello world application, made sure it worked on appspot and then forgot about it, since I was more keen on finishing my Foller.me project in PHP. I didn’t really get the idea of cloud computing and distributed applications at the time, since I had no experience with heavy loads at all. When Foller.me was reviewed on Mashable, I thought 16k people had visited my website that day, but in reality the figures were different, since the whole thing just stopped working.

What do you think Google runs on? Do they run on water vapor? I mean, cloud – it’s databases, and operating systems, and memory, and microprocessors, and the Internet!

Larry Ellison, CEO & Co-founder, Oracle

I spent a little less than a year on Amazon EC2 and got familiar with their tools and services – quite cool if you have a few hundred extra bucks to spend every month. Yeah, the reason I dropped Amazon in favor of MediaTemple was the cost. A few months later Amazon announced cloud computing free of charge for a whole year, but it was too late for me; I was already playing around with Python and Django, and thinking about Google App Engine.

The Google Developer Day conference in Moscow made me reconsider App Engine with a few cool improvements, especially for businesses. So I made my final decision to branch my Juice Toolkit project into a GAE version, which I was already working on at that point.

It did take me a few days to rewrite my Django code for Google’s Datastore, and I hope it was worth it. I used the django-mptt package to work with trees in the master branch of Juice, which unfortunately did not work out of the box with App Engine, so yeah, I wrote my own trees in App Engine for threaded comments, pages and taxonomy. Of course it’s still not final and requires some refactoring and optimization, but it wasn’t very difficult ;) And memcache, oh, I love App Engine’s memcache. The limitations docs say I can use up to 1 megabyte of memory storage with my free account, but memcache reports that current usage is over 2 megabytes and Google ain’t charging!

I cannot say that I now know much about App Engine, since there’s still quite a lot to learn, but I do think that beginners like me would love a list of short tips to help them on their way from Django to Google App Engine:

  • Django Models are not compatible with Google App Engine models, you have to rewrite them
  • The Django admin and authentication systems are built on Django models, thus they will not work
  • There is an appengine-admin project, but you’ll be better off with Google’s Datastore Viewer
  • Django’s shell (interactive console) is replaced by Google’s online shell, usually located at localhost:8080/_ah/admin/interactive (it is lacking the Run Program on Ctrl+Enter feature)
  • When you pass around objects linked together (for instance parent and child comments), note that obj and obj.key() are slightly different: the .key() method is more likely to get you a fresh copy of the object.
  • If you need to run something heavy, don’t be lazy, use the Task Queue API
  • Profile your code on appspot. Google says that your free account will be able to serve 5 million page views per month. I did a little calculation: 5 million / 30 days ~ 160k page views per day. Google gives 6.5 CPU hours every day, so you should be able to serve ~ 7 page views per second. This means that you have to spend less than ~ 150 CPU milliseconds to generate each page in order to serve 5 million per month. That should be your goal. Quite simple, eh? Use memcache.
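The arithmetic from that last bullet, spelled out:

```python
# Back-of-the-envelope CPU budget per page on the free App Engine quota
pages_per_month = 5000000
pages_per_day = pages_per_month / 30.0      # ~166k page views per day
cpu_seconds_per_day = 6.5 * 3600            # 6.5 free CPU hours = 23400 seconds
budget_ms = cpu_seconds_per_day / pages_per_day * 1000
print("CPU budget per page: ~%d ms" % budget_ms)
# prints: CPU budget per page: ~140 ms
```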

I guess that’s enough for a start. If you’d like to stay tuned to what I’m up to, make sure you follow the Juice Toolkit project on GitHub. Note that there’s a branch called GAE, since the master branch is completely based on Django. Thank you for retweeting!