App Engine: JSON Objects in the Google Datastore

This is a follow-up to my previous post about Python Objects in the Google Datastore, where I showed an easy way of storing any Python object inside the Google Datastore by pickling it. Pickling works fine and it can literally store functions, classes and class instances in the datastore, which is great, but as I discussed earlier, JSON could work too, so here I’ll introduce a similar approach to storing JSON objects (strings, lists and dictionaries) inside the Google Datastore.

The problem with pickling objects is that they’re very difficult to maintain. Google provides a front-end for their Datastore where you can run GQL queries, watch statistics and even manage data on the fly. Since pickled objects are stored in bytes as opposed to JSON strings, I figured that when editing an entity using the Google admin, pickled objects tend to break, since the values sent back to the server by your browser are usually strings. I haven’t found a good way around this yet (update: I did find a way), which is why I temporarily switched to JSON, which is easier to edit and maintain.

Here’s the draft code I’m using to store JSON objects in Google App Engine, which I called (surprise) JsonProperty:

from django.utils import simplejson
from google.appengine.ext import db

class JsonProperty(db.TextProperty):
	def validate(self, value):
		# No checks yet: any value is accepted as-is.
		return value

	def get_value_for_datastore(self, model_instance):
		# Serialize the Python value to a JSON string before saving.
		result = super(JsonProperty, self).get_value_for_datastore(model_instance)
		result = simplejson.dumps(result)
		return db.Text(result)

	def make_value_from_datastore(self, value):
		# Decode the stored JSON; keep the raw value if it doesn't parse.
		try:
			value = simplejson.loads(str(value))
		except Exception:
			pass

		return super(JsonProperty, self).make_value_from_datastore(value)

Note that Google still runs Python 2.5, hence the simplejson library is only available through the django.utils package (the standard library only gained an equivalent json module in version 2.6).
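If the same module has to run both on App Engine’s Python 2.5 and on a newer local Python, one common pattern is to try the standard json module first and fall back to Django’s copy. This is just a sketch of that idea, not something the snippet above does:

```python
try:
    import json  # standard library, Python 2.6 and later
except ImportError:
    # Python 2.5 on App Engine: fall back to the copy bundled with Django.
    from django.utils import simplejson as json

# Either way, the rest of the code only talks to the name "json".
encoded = json.dumps({'key-1': 'value-1'})
decoded = json.loads(encoded)
```

Both modules expose the same dumps and loads functions, so nothing else in the property would need to change.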

So the logic is quite clear here: we use dumps and loads to dump and load strings to and from the Datastore. Perhaps it’s lacking some code in the validate method: we do, after all, need to check whether the given value is convertible to JSON and raise an exception if not. But I’ll leave that to the next version of this snippet, as I’m currently only using it to prototype stuff and maintain small and medium-sized projects, so do use this at your own risk ;)
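For the curious, the missing check could look roughly like the sketch below. It is not part of the snippet above: validate_json_value is a hypothetical helper, and it uses the standard json module rather than django.utils.simplejson so that it runs without the App Engine SDK. Inside a real property you would probably raise db.BadValueError instead of ValueError.

```python
import json


def validate_json_value(value):
    """Return value unchanged if it can be encoded as JSON, else raise ValueError."""
    try:
        json.dumps(value)
    except (TypeError, ValueError) as error:
        # TypeError: unsupported type, e.g. a function or a set.
        # ValueError: e.g. a structure that contains itself.
        raise ValueError("Value is not JSON serializable: %s" % error)
    return value


# A plain dictionary passes through untouched.
accepted = validate_json_value({'key-1': 'value-1', 'key-2': [1, 2, 3]})

# A set is not a JSON type, so validation fails.
try:
    validate_json_value(set([1, 2, 3]))
    rejected = False
except ValueError:
    rejected = True
```

The obvious cost is that every value gets encoded twice, once to validate and once to store, which is the same trade-off the pickle version makes.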

As a usage sample you can take this simple code which can be run from the App Engine console:

class MyEntity(db.Model):
    name = db.StringProperty()
    obj = JsonProperty()

my_obj = {'key-1': 'value-1', 'key-2': 'value-2'}
entity = MyEntity(name="my-name", obj=my_obj)
entity.put()

entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
    print entity.obj # outputs the dictionary object

And don’t forget that you can manually browse the Datastore using your App Engine dashboard and see how such entities are actually stored.

I wrote about performance in that previous post, but honestly, I didn’t have the time to measure it, so if you guys do feel free to leave your benchmark results in the comments section. Some graphs would be cool too. Oh and thanks for retweeting!

App Engine: Python Objects in the Google Datastore

Howdy! Hope you’re all good and getting ready for the upcoming holidays! Last month I wrote more Google App Engine code than ever before. I finally got to a file and data structure that I myself can understand. Hopefully we’re all aware that the Google Datastore is not relational and that most of the frequently used data types are supported: StringProperty, IntegerProperty, ListProperty, EmailProperty and many others. But are those enough? Hopefully yes, but this post introduces a new property type I’ve written mainly for myself: the ObjectProperty.

Have you ever thought of storing Python dictionaries or nested lists in the Google Datastore? What about custom objects with their attributes? I’ve gone a little bit further – ObjectProperty lets you store any Python object inside the Datastore. As you might know, everything in Python is an object, lists are objects, dictionaries are objects, functions are objects, classes are objects, objects are objects (duh!).

ObjectProperty uses the Python object serialization module called pickle (and yes, it’s available in Python 2.5). It “pickles” the input values before storing them in the Datastore, then “unpickles” them upon retrieval. This is very similar to JSON, but hey, JSON can’t store functions ;)
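To make the difference concrete, here’s a tiny sketch that runs without App Engine. The data is made up for illustration, and it uses the standard json module for the comparison. One thing worth knowing: pickle stores functions and classes by reference (module and name), not their code, so whatever you pickle must still be importable when you unpickle it.

```python
import datetime
import json
import math
import pickle

# Made-up sample data: a string, a date instance and a function.
data = {
    'name': 'Konstantin',
    'joined': datetime.date(2010, 12, 1),
    'callback': math.sqrt,
}

# pickle round-trips the whole structure, function included.
restored = pickle.loads(pickle.dumps(data))

# json gives up on the date and the function.
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False
```

After running this, restored['callback'] is math.sqrt again and can be called as usual, while json_ok is False.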

The code is fairly simple for now and works out of the box for me. I did make some human error assumptions hence the exception handling, but to be honest, there were no tests. It’s still at the idea stage so test cases will be built later on, which means that for now you should use this carefully, at your own will, provided as is, without any guarantees (obviously). Here’s the ObjectProperty class:

from google.appengine.ext import db
import pickle

# Use this property to store objects.
class ObjectProperty(db.BlobProperty):
	def validate(self, value):
		try:
			result = pickle.dumps(value)
			return value
		except pickle.PicklingError, e:
			return super(ObjectProperty, self).validate(value)

	def get_value_for_datastore(self, model_instance):
		result = super(ObjectProperty, self).get_value_for_datastore(model_instance)
		result = pickle.dumps(result)
		return db.Blob(result)

	def make_value_from_datastore(self, value):
		try:
			value = pickle.loads(str(value))
		except:
			pass
		return super(ObjectProperty, self).make_value_from_datastore(value)

So as you see, I’ve overridden a few methods, which is what the App Engine docs suggest for creating new properties. I used the pickle module to dump and load objects to and from the Datastore, and I also try to pickle the object during validation. If that part is time-consuming for you (perhaps you’re pickling large objects) you might want to simply return value.

A quick usage example

Here’s a fairly simple usage example, should be quite easy to follow. You can try it out using App Engine’s interactive console:

class MyEntity(db.Model):
	name = db.StringProperty()
	obj = ObjectProperty() # Kudos

# Let's create an entity
my_object = { 'string': 'Hello World', 'integer': 51, 'list': [1, 2, 3], 'dict': { 'a': 1, 'b': 2, 'c': [4,5,6] } }
entity = MyEntity(name="Konstantin", obj=my_object)
entity.put() # Saves the entity to the datastore

# Then retrieve that value from the datastore
entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
	print "%s: %s" % (entity.name, entity.obj)

# Should output: Konstantin: { 'integer': 51, 'list': [1, 2, 3], 'string': 'Hello World', 'dict': { 'a': 1, 'c': [4, 5, 6], 'b': 2 } }

Voila. And here’s what it looks like in the datastore:

(dp0 S'integer' p1 I51 sS'list' p2 (lp3 I1 aI2 aI3 asS'string' p4 S'Hello ...

Now please note that such fields are not indexed and cannot be used in GQL queries. This means that you cannot look up an entity whose “list called c inside the obj dictionary contains the number 5”. That would be quite nice, but it would definitely be a performance bottleneck (we’re using Google App Engine and the Google Datastore for performance after all).
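If you really do need that kind of lookup, the only option I can see is to fetch the entities and filter them in Python. Here’s a rough sketch of the idea with plain tuples standing in for entities; the names and data are hypothetical and no Datastore is involved:

```python
import pickle

# Stand-ins for stored entities: (name, pickled obj) pairs,
# roughly what an ObjectProperty keeps in the Datastore.
stored = [
    ('first', pickle.dumps({'dict': {'c': [4, 5, 6]}})),
    ('second', pickle.dumps({'dict': {'c': [7, 8, 9]}})),
]


def names_where_c_contains(rows, number):
    """Unpickle every row and keep the names whose obj['dict']['c'] has number."""
    matches = []
    for name, blob in rows:
        obj = pickle.loads(blob)
        if number in obj.get('dict', {}).get('c', []):
            matches.append(name)
    return matches


result = names_where_c_contains(stored, 5)
```

Every row gets unpickled just to be inspected, which is exactly the bottleneck mentioned above, so this only makes sense for small result sets.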

Where can this be used?

It is up to you where to use such a property, but I can think of many. For instance, an entity called Account with some standard set of fields for typical user accounts. Now let’s tie social logins to such accounts – Facebook, Twitter, LinkedIn. Each service uses its own authentication method, for instance Twitter uses OAuth. Now, OAuth works with tokens, request tokens and access tokens, so define a twitter property as an ObjectProperty and store all the tokens in a simple Python dictionary. This works because you’ll never look up tokens without looking up the Account first, and once you have looked up the user account, your tokens are already unpickled and available. Magic, isn’t it?

As I already mentioned, this is still at the idea stage and with some feedback from you guys, I can hopefully decide whether to carry on or leave it as a private snippet for myself. Thanks for reading!

Update: I was concerned about not being able to maintain pickled objects in the Datastore admin, but that db.Blob() function solved the problems, so the values are now not editable through the admin. You might also want to check out JSON Objects for the Google Datastore and a comparison of the two in a post called Pickle vs JSON — Which is Faster?

Installing Python 2.5 on Ubuntu Linux 10.10

If you’ve been working on App Engine and you’ve noticed that some stuff works on your development server but not in production, it may be related to the different versions of Python. The latest Linux releases, including Ubuntu 10.04 and 10.10, come with Python 2.6 pre-installed, but Google App Engine still runs Python 2.5 (an issue has been created to add Python 2.6 support, make sure you vote that up).

Their roadmap mentions nothing about upgrading. So in order to make your development server look more like your production servers, you’ll have to get Python 2.5, which is not that trivial at first.

So, Felix Krull has published an archive of new and old Python packages, so let’s use that source to get Python 2.5 running on a new Ubuntu box:

sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python2.5

Yup, that was easy! Let’s now see if both Python 2.5 and Python 2.6 are available:

$ python2.5
Python 2.5.5 (r255:77872, Sep 14 2010, 15:51:01)

$ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)

All done! Oh and don’t forget to launch your App Engine development server using python2.5 (installing it is not enough):

$ python2.5 dev_appserver.py .

As a bonus to this post, I’d like to share with you my way of working with App Engine, not in terms of code, but in terms of how the libraries are organized. If you’re writing code for App Engine you’re probably working on more than one project at a time, hence you’ll need to use the SDK more than once.

So instead of copying it, replacing Python packages, etc, simply move the google_appengine folder to /usr/share and in every App Engine project create a symbolic link called .gae that points to that location. The SDK will automatically locate all the Google libraries and the development server is easy to launch:

$ ln -s /usr/share/google_appengine/ .gae
$ python2.5 .gae/dev_appserver.py .

Don’t forget the dot at the end, since it tells the SDK which project to launch. And make sure you don’t push the .gae directory to your source control ;) Happy coding!

Foller.me Has Got a New Home: Google App Engine

That’s right, Google App Engine! For those of you who don’t remember, Foller.me is a Twitter app I wrote back in 2009. Honestly, I never realized back then that performance is an issue until the day I got featured on Mashable. More about it in a previous post.

If you’re not familiar with Foller.me, I’ll explain it here in a couple of words:

Foller.me is a web application based on the Twitter API, used to gather Twitter analytics from any profile – in seconds! Foller.me scans your Twitter profile, parses your latest tweets and followers. Tweets are separated into topics, mentions and hashtags. Followers are mapped out on a Google Map.

This weekend I made a choice I thought I never would – I’m moving Foller.me over to Google App Engine. It all came after running a couple of experiments in my Juice Toolkit project (note that there’s a GAE branch).

I realized that it wouldn’t take me too much time to rewrite Foller.me using the new language I’m learning – Python. And I was right, the first few tests were up and running on Saturday, and there’s now a live preview including more fixes and updates on App Spot. The clouds seem to work fine, geography is even better than I expected (with a few more updates) and the interface got a few extra buttons!

Moreover, the new version of Foller.me has been moved over to GitHub, which means that it is now an open source project! Anybody can dive into the code and suggest a few patches or updates. Contributors are always welcome ;) Also, if there’s anybody interested in the PHP code that runs on the old version, let me know, I’ll be more than glad to share it.

Now, for the bad news – I’m closing down the Foller.me blog, since there’s no time to post, and not that much I could write about. I hope that the announcements, commits and wikis on GitHub and Twitter will be enough. The API will be temporarily shut down during the move and for a few weeks after, but I’ll bring it back to life, perhaps with a few more fixes. The domain will probably move to a www mirror, since App Engine doesn’t allow naked domains at this stage (see this issue).

So yeah, Foller.me will live, donations are always welcome. The app will remain free of charge and so will the API. But there’s a plan to launch a premium service on Foller.me – detailed profile analytics. I’m able to give you even more statistics and analytics on your Twitter profile, but of course not instantly – my rough calculations are 1-2 days. This will generate a very sophisticated PDF report about your profile, including much more information about your followers, about people you’re following, their relations, lists, graphs and charts, and even more geographical data.

All this is still in the stage of an idea or a concept, lacking a business plan. So my question to you right now is – would you be interested in such a service? And if yes, how much money are you willing to pay for one report? Please answer in the comments section. Everybody who answered will get their first reports free of charge (if this is ever implemented).

A Brief Introduction to Google App Engine

If you’ve been following my tweets you might have noticed that I’ve been running around Python and Django and Google App Engine for the last few weeks. Honestly, I fell in love with Python the second I wrote my first few lines of code, but it is not what this post is about. This post is about my short experience with App Engine.

My first experience with App Engine was back in 2009, when I deployed my hello world application, made sure it worked on appspot and then forgot about it, since I was more keen on finishing my Foller.me project in PHP. I didn’t really get the idea of cloud computing and distributed applications at that time, since I had no experience in heavy load at all. When Foller.me was reviewed on Mashable, I thought that 16k people visited my website that day, but in reality the figures were different, since the whole thing just stopped working.

What do you think Google runs on? Do they run on water vapor? I mean, cloud – it’s databases, and operating systems, and memory, and microprocessors, and the Internet!

Larry Ellison, CEO & Co-founder, Oracle

I spent a little less than a year on Amazon EC2, got familiar with their tools and services – quite cool if you have a few hundred extra bucks to spend every month. Yeah, the reason I dropped out from Amazon in favor of MediaTemple was the cost. A few months later Amazon announced Cloud Computing Free of Charge for a whole year, but it was too late for me. I was already playing around with Python and Django, thinking about Google App Engine.

The Google Developer Day conference in Moscow made me reconsider App Engine with a few cool improvements, especially for businesses. So I made my final decision to branch my Juice Toolkit project into a GAE version, which I was already working on at that point.

It did take me a few days to rewrite my Django code for Google’s Datastore support and I hope that it’s worth it. I used the django-mptt package to work with trees in the master branch of Juice, which unfortunately did not work out of the box with App Engine, so yeah, I wrote my own trees in App Engine for threaded comments, pages and taxonomy. Of course it’s still not final and requires some refactoring and optimization, but that wasn’t very difficult ;) And memcache, oh I love App Engine’s memcache. Their limitations docs said I can use up to 1 megabyte of memory storage with my free account, but memcache reports that current usage is over 2 megabytes and Google ain’t charging!

I cannot say that I now know much about App Engine, since there’s still quite a lot to learn, but I do think that beginners like me would love a list of short tips that would help them on their way from Django to Google App Engine:

  • Django Models are not compatible with Google App Engine models, you have to rewrite them
  • The Django admin and authentication systems are built on Django models, thus they will not work
  • There is an appengine-admin project, but you’ll be better off with Google’s Datastore Viewer
  • Django’s shell (interactive console) is replaced by Google’s online shell, usually located at localhost:8080/_ah/admin/interactive (it is lacking the Run Program on Ctrl+Enter feature)
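  • When you pass around objects linked together (for instance parent and child comments), note that obj and obj.key() are different things: obj is the entity you already have in memory, while obj.key() is only its key, and fetching by that key (for instance with db.get(obj.key())) is more likely to get you a fresh copy of your object.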
  • If you need to run something heavy, don’t be lazy, use the Task Queue API
  • Profile your code on appspot. Google says that your free account will be able to serve 5 million page views per month. I did a little calculation: 5 million / 30 days ~ 167k page views per day. Google gives 6.5 CPU hours every day, which works out to ~ 7 page views per CPU second. This means that you have to spend less than ~ 140 CPU milliseconds to generate each page in order to serve 5 million per month. That should be your goal. Quite simple, eh? Use memcache.
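
Here’s the arithmetic from that last tip spelled out, in case you want to plug in your own numbers. The three constants are the figures quoted above; everything else is derived from them, and a 30-day month is assumed:

```python
# Figures quoted above for a free App Engine account.
PAGE_VIEWS_PER_MONTH = 5000000
DAYS_PER_MONTH = 30
CPU_HOURS_PER_DAY = 6.5

page_views_per_day = PAGE_VIEWS_PER_MONTH / float(DAYS_PER_MONTH)
cpu_seconds_per_day = CPU_HOURS_PER_DAY * 3600

# CPU budget per page, and its inverse.
cpu_ms_per_page = cpu_seconds_per_day / page_views_per_day * 1000
pages_per_cpu_second = page_views_per_day / cpu_seconds_per_day

# Average traffic if the views were spread evenly over the day.
pages_per_wall_second = page_views_per_day / (24 * 3600)

print("page views per day:    %d" % round(page_views_per_day))
print("CPU ms per page:       %.1f" % cpu_ms_per_page)
print("pages per CPU second:  %.2f" % pages_per_cpu_second)
print("pages per wall second: %.2f" % pages_per_wall_second)
```

Note the difference between the last two numbers: about 7 pages per second of CPU time, but only about 2 pages per second of actual traffic on average.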

I guess that’s enough for a start. If you’d like to stay tuned to what I’m up to, make sure you follow the Juice Toolkit project on GitHub. Note that there’s a branch called GAE, since the master branch is completely based on Django. Thank you for retweeting!

Google Developer Day Moscow 2010

It’s been a good Friday last week, although I was a little bit late for the show. The event was held in Crocus City Hall in Moscow, which is quite wicked unless you drive there by car. Google Developer Day Moscow 2010, we all waited so long for it (one whole year actually) and it turned out to be… fascinating, as usual!

Starting early morning we got some coffee (which I was late for) and took our place in the main hall for the keynote by Eric Tholome and Gene Sokolov and a few other speakers who introduced their sections: Chrome & HTML5 was amazing, 2d and 3d graphics, filesystem API and hardware access, thus – speech recognition, device orientation and more. Chrome Web Store is coming soon (developer preview available). Cloud Computing with the new AppEngine for Business, plus a short introduction to Spring Roo. The Android introduction was quite boring. Other sections (Monetization and Social Web) didn’t get their five minutes during the keynote.

After that we all went out to have some fun, drank coke, played MindBall, PS3 and air hockey. This part turned out to be much more exciting than last year ;) and then came the presentations. I’ll list below the ones I’ve been at, others were promised to be listed on Google Code Blog.

Google Web Toolkit

Fred Sauer (@fredsa) gave us yet another short intro to GWT, mentioned again that the Google AdWords interface is built completely using their toolkit which is wonderful. Yeah, we heard that last year, did anything change? Well yeah, Fred spoke a little bit more about Spring Roo and then off to Eclipse. We’ve seen Eclipse last year too, but it seems that they made some improvements on the Google Plugin for Eclipse and introduced Speed Tracer which is quite exciting.

We went once more through the features of GWT, a brief GWT 2.1 introduction and yet another MVP presentation (for the ones that missed it last year).

This whole presentation made me install Eclipse immediately. I downloaded and installed the Google Plugin with AppEngine and GWT enabled, I switched my workspace to PyDev, created a new Google AppEngine Hello-World project, hit Deploy to AppEngine and bang! It told me that Eclipse cannot deploy my project to AppEngine since it’s not an AppEngine project. What? Goodbye Eclipse, see you next year! ;)

AppEngine for Business

I missed the first “What’s new in AppEngine” topic by Fred, but Patrick Chanezon (@chanezon) outlined some of the exciting bits in his topic. Patrick introduced us to AppEngine for Business: SLA, Support, Hosted SQL, Custom Domain SSL and Enterprise Admin Console (sounds awesome, doesn’t it) – but yet again, I’m not that keen on trying it, especially with the feeling that they’ve done everything right, but only for Java, while Python is lagging behind. I’m okay with the current console and limitations, so thank you Google ;)

Once again, we’ve been told about Eclipse, the Google Plugin for Eclipse and how easy it is to deploy an application to AppEngine (Java, *sigh*). Patrick then gave us a short intro to the Google Apps Marketplace and took questions, which were mostly about feeds, commissions, etc.

VC Investment for Your Company

This was quite interesting with Ilya Ponomarev (@iponomarev) and Don Dodge (@dondodge) on stage. They discussed doing business in Russia, startups, business incubators and Skolkovo Innovation Center. Surprisingly Ilya mentioned Timothy Post (@timothypost) and Runet Labs as the ones launching Techstars in Russia.

Ilya and Don took many questions, most of which were either boring, or from journalists ;) At the end of the session, Don disappeared and Ilya gathered a group outside in the main hall and spent another hour answering questions (some of which were silly again). But yeah, it’s good to hear that stuff like this is at least being discussed. A good quote from Don about looking for VC investment in your startup:

One person can have a delusion. But if three people are crazy, okay, we’ll give you the money!

Don Dodge at Google Developer Day Moscow 2010

Well, that’s quite it! At the end of all the sessions we got Google Developer Day and Google Chrome t-shirts, beer and wine, again, this seems to be a tradition. I’ve gathered a Twitter list of people I met, heard about and seen at Google Developer Day, you can find it right over here: @kovshenin/gddru – feel free to poke me if there’s somebody I forgot to add to that list.

Anyways, it’s been a great day, hope to be there next year!

Amazon Web Services: Cloud Computing Free of Charge

Howly shmoly, just read the announcement of Amazon’s Free Usage Tier offering an EC2 micro instance free of charge for a whole year! Sounds cool, doesn’t it? Well let’s go back a few months and analyze the reason why I left Amazon in favor of Media Temple’s (ve) service: Amazon is way too expensive for a young geek like me, barely having the money to pay rent for my lousy apartment in Moscow ;)

Well that’s not the only reason, but I’m now quite comfortable with (mt)’s services, except their tech support, but that’s not what this post is about. I must have flooded my Twitter with messages about Django and Python. Honestly, I fell in love with Python a few months ago (in theory) then started scratching code in the beginning of October, but then again, this is not what the post is about (can somebody tell me why I’m going off-topic today?)

Back to AWS. The news is good, but the fact that they mention “new customers” frightens me:

These free tiers are only available to new AWS customers and are available for 12 months following your AWS sign-up date. When your free usage expires or if your application use exceeds the free usage tiers, you simply pay standard, pay-as-you-go service rates (see each service page for full pricing details).

I’ve been with them for over a year, paying a bunch of money every month, so I’m not a new customer for them anymore, unfortunately. So they’re not really targeting old customers who were unsatisfied with something, but new ones who have never tried EC2 (S3 and all the rest). Again, this is only a trial, unlike Google AppEngine, which happens to love Python code.

So my thinking is – is this all a coincidence, or is it a light for me towards AppEngine, Python and Google? Stay tuned: @kovshenin :)