App Engine: Python Objects in the Google Datastore

Howdy! Hope you’re all good and getting ready for the upcoming holidays! I’ve been writing Google App Engine code last month more than ever before. I finally got to a file and data structure that I myself can understand. Hopefully we’re all aware that the Google Datastore is not relational and that most of the frequently used data types are supported: StringProperty, IntegerProperty, ListProperty, EmailProperty and many others. But are those enough? Hopefully yes, but this post introduces a new property type I’ve written mainly for myself: the ObjectProperty.

Have you ever thought of storing Python dictionaries or nested lists in the Google Datastore? What about custom objects with their attributes? I’ve gone a little bit further – ObjectProperty lets you store any Python object inside the Datastore. As you might know, everything in Python is an object, lists are objects, dictionaries are objects, functions are objects, classes are objects, objects are objects (duh!).

ObjectProperty uses the Python object serialization module called pickle (and yes, it’s available in Python 2.5). It “pickles” the input values before storing them in the Datastore, then “unpickles” them upon retrieval. This is very similar to JSON, but hey, JSON can’t store functions ;)

The code is fairly simple for now and works out of the box for me. I did make some human error assumptions hence the exception handling, but to be honest, there were no tests. It’s still at the idea stage so test cases will be built later on, which means that for now you should use this carefully, at your own will, provided as is, without any guarantees (obviously). Here’s the ObjectProperty class:

from google.appengine.ext import db
import pickle

# Use this property to store objects.
class ObjectProperty(db.BlobProperty):
	def validate(self, value):
		try:
			result = pickle.dumps(value)
			return value
		except pickle.PicklingError, e:
			return super(ObjectProperty, self).validate(value)

	def get_value_for_datastore(self, model_instance):
		result = super(ObjectProperty, self).get_value_for_datastore(model_instance)
		result = pickle.dumps(result)
		return db.Blob(result)

	def make_value_from_datastore(self, value):
		try:
			value = pickle.loads(str(value))
		except:
			pass
		return super(ObjectProperty, self).make_value_from_datastore(value)

So as you see, I’ve overridden a few methods which the App Engine docs suggests to create new properties. I used the pickle module to dump and load objects to and from the Datastore, as well as gave a try to pickle an object during validation. If that part is time-consuming for you (perhaps you’re pickling large objects) you might want to simply return value.

A quick usage example

Here’s a fairly simple usage example, should be quite easy to follow. You can try it out using App Engine’s interactive console:

class MyEntity(db.Model):
	name = db.StringProperty()
	obj = ObjectProperty() # Kudos

# Let's create an entity
my_object = { 'string': 'Hello World', 'integer': 51, 'list': [1, 2, 3], 'dict': { 'a': 1, 'b': 2, 'c': [4,5,6] } }
entity = MyEntity(name="Konstantin", obj=my_object)
entity.put() # Saves the entity to the datastore

# Then retrieve that value from the datastore
entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
	print "%s: %s" % (entity.name, entity.obj)

# Should output: Konstantin: { 'integer': 51, 'list': [1, 2, 3], 'string': 'Hello World', 'dict': { 'a': 1, 'c': [4, 5, 6], 'b': 2 } }

Voila. And here’s how it looks like in the datastore:

(dp0 S'integer' p1 I51 sS'list' p2 (lp3 I1 aI2 aI3 asS'string' p4 S'Hello ...

Now please note that such fields are not indexed, nor applicable for GQL queries. This means that you cannot lookup an entity whose “list called c inside the obj dictionary contains the number 5”, although that would be quite nice but definitely a bottleneck in performance (we’re using Google App Engine and the Google Datastore for performance after all).

Where can this be used?

It is up to you where to use such a property, but I can think of many. For instance, an entity called Account with some standard set of fields for typical user accounts. Now let’s tie social logins to such accounts – Facebook, Twitter, LinkedIn. Each service uses its own authentication method, for instance Twitter uses OAuth. Now, OAuth works with tokens, request tokens and access tokens, so define a twitter property as an ObjectProperty and store all the tokens in a simple Python dictionary. This works because you’ll never lookup tokens without looking up the Account first, and once you did look up the user account, your tokens are already unpickled and available. Magic, isn’t it?

As I already mentioned, this is still at the idea stage and with some feedback from you guys, I can hopefully decide whether to carry on or leave it as a private snippet for myself. Thanks for reading!

Update: I was concerned about not being able to maintain pickled objects in the Datastore admin, but that db.Blob() function solved the problems, so the values are now not editable through the admin. You might also want to check out JSON Objects for the Google Datastore and a comparison of the two in a post called Pickle vs JSON — Which is Faster?

About the author

Konstantin Kovshenin

WordPress Core Contributor, ex-Automattician, public speaker and consultant, enjoying life in Moscow. I blog about tech, WordPress and DevOps.

12 comments