Howdy! Hope you’re all good and getting ready for the upcoming holidays! I wrote more Google App Engine code last month than ever before, and I finally arrived at a file and data structure that I can actually understand. Hopefully we’re all aware that the Google Datastore is not relational and that most of the frequently used data types are supported: StringProperty, IntegerProperty, ListProperty, EmailProperty and many others. But are those enough? Hopefully yes, but this post introduces a new property type I’ve written mainly for myself: the ObjectProperty.
Have you ever thought of storing Python dictionaries or nested lists in the Google Datastore? What about custom objects with their attributes? I’ve gone a little bit further: ObjectProperty lets you store any Python object that can be pickled inside the Datastore. As you might know, everything in Python is an object: lists are objects, dictionaries are objects, functions are objects, classes are objects, objects are objects (duh!).
ObjectProperty uses the Python object serialization module called pickle (and yes, it’s available in Python 2.5). It “pickles” the input values before storing them in the Datastore, then “unpickles” them upon retrieval. This is similar in spirit to JSON, but hey, pickle also handles custom class instances, and it can store module-level functions and classes by reference (it records their names, not their code) ;)
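To make the pickle and unpickle steps concrete, here is a minimal sketch that runs outside App Engine. The `original` dictionary is just sample data made up for illustration; nothing here touches the Datastore.

```python
import pickle

# A nested structure similar to what you might keep in a single property.
original = {
    'string': 'Hello World',
    'integer': 51,
    'list': [1, 2, 3],
    'dict': {'a': 1, 'b': 2, 'c': [4, 5, 6]},
}

# "Pickle": serialize the object into a byte string.
pickled = pickle.dumps(original)

# "Unpickle": rebuild an equal (but separate) object from that byte string.
restored = pickle.loads(pickled)

print(restored == original)  # True: same contents
print(restored is original)  # False: a new object was built
```

The round trip gives back an equal copy rather than the same object, which is what you would expect from a value that has been written to storage and read back.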
The code is fairly simple for now and works out of the box for me. I did allow for human error, hence the exception handling, but to be honest, there are no tests yet. It’s still at the idea stage, so test cases will come later. For now, use it carefully and at your own risk: it’s provided as is, without any guarantees (obviously). Here’s the ObjectProperty class:
from google.appengine.ext import db
import pickle

# Use this property to store objects.
class ObjectProperty(db.BlobProperty):
    def validate(self, value):
        try:
            result = pickle.dumps(value)
            return value
        except pickle.PicklingError, e:
            return super(ObjectProperty, self).validate(value)

    def get_value_for_datastore(self, model_instance):
        result = super(ObjectProperty, self).get_value_for_datastore(model_instance)
        result = pickle.dumps(result)
        return db.Blob(result)

    def make_value_from_datastore(self, value):
        try:
            value = pickle.loads(str(value))
        except:
            pass
        return super(ObjectProperty, self).make_value_from_datastore(value)
So as you see, I’ve overridden the few methods that the App Engine docs suggest overriding when creating new properties. I used the pickle module to dump and load objects to and from the Datastore, and I also try to pickle the object during validation. If that step is time-consuming for you (perhaps you’re pickling large objects), you might want to simply return value from validate instead.
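The order in which those hooks do their work can be sketched without App Engine at all. The three plain functions below are hypothetical stand-ins that only mirror the idea behind validate, get_value_for_datastore and make_value_from_datastore; they are not part of the App Engine API and they leave out the calls to the parent class.

```python
import pickle


def validate(value):
    """Stand-in for validate: accept the value only if pickle can dump it."""
    pickle.dumps(value)  # raises an exception if the value cannot be pickled
    return value


def to_datastore(value):
    """Stand-in for get_value_for_datastore: object -> pickled byte string."""
    return pickle.dumps(value)


def from_datastore(stored):
    """Stand-in for make_value_from_datastore: pickled byte string -> object."""
    try:
        return pickle.loads(stored)
    except Exception:
        # Same fallback idea as the class above: if the stored bytes are
        # not a valid pickle, hand them back untouched.
        return stored


# Hypothetical sample value, written and read back.
value = {'tokens': ['abc', 'def']}
stored = to_datastore(validate(value))
print(from_datastore(stored) == value)  # True
```

Skipping the pickling inside `validate` would save one extra `pickle.dumps` call per assignment, at the cost of finding out about an unpicklable value only when the entity is saved.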
A quick usage example
Here’s a fairly simple usage example that should be quite easy to follow. You can try it out using App Engine’s interactive console:
class MyEntity(db.Model):
    name = db.StringProperty()
    obj = ObjectProperty()  # Kudos

# Let's create an entity
my_object = {
    'string': 'Hello World',
    'integer': 51,
    'list': [1, 2, 3],
    'dict': { 'a': 1, 'b': 2, 'c': [4,5,6] }
}
entity = MyEntity(name="Konstantin", obj=my_object)
entity.put()  # Saves the entity to the datastore

# Then retrieve that value from the datastore
entities = MyEntity.all()
entities = entities.fetch(10)
for entity in entities:
    print "%s: %s" % (entity.name, entity.obj)

# Should output:
# Konstantin: { 'integer': 51, 'list': [1, 2, 3], 'string': 'Hello World', 'dict': { 'a': 1, 'c': [4, 5, 6], 'b': 2 } }
Voila. And here’s what it looks like in the datastore:
(dp0 S'integer' p1 I51 sS'list' p2 (lp3 I1 aI2 aI3 asS'string' p4 S'Hello ...
Now please note that such fields are not indexed and cannot be used in GQL queries. This means that you cannot look up an entity whose “list called c inside the obj dictionary contains the number 5”. That would be quite nice, but it would definitely be a performance bottleneck (we’re using Google App Engine and the Google Datastore for performance after all).
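If you do need that kind of check, it has to happen in Python after the entities have been fetched. Here’s a rough sketch of the idea using plain dictionaries as stand-ins for the unpickled obj values; the sample data and the `has_five_in_c` helper are made up for illustration.

```python
# Stand-ins for the `obj` values of entities already fetched from the
# Datastore (hypothetical data; a real app would loop over query results).
fetched_objects = [
    {'name': 'first', 'dict': {'c': [4, 5, 6]}},
    {'name': 'second', 'dict': {'c': [7, 8, 9]}},
    {'name': 'third', 'dict': {}},
]


def has_five_in_c(obj):
    """True when the list stored under obj['dict']['c'] contains 5."""
    return 5 in obj.get('dict', {}).get('c', [])


# Filter in application code, since the Datastore cannot do it for us.
matches = [obj['name'] for obj in fetched_objects if has_five_in_c(obj)]
print(matches)  # ['first']
```

Every candidate entity has to be fetched and unpickled before it can be tested, so this only makes sense for small result sets.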
Where can this be used?
It is up to you where to use such a property, but I can think of many uses. For instance, take an entity called Account with some standard set of fields for typical user accounts. Now let’s tie social logins to such accounts: Facebook, Twitter, LinkedIn. Each service uses its own authentication method, for instance Twitter uses OAuth. Now, OAuth works with tokens, request tokens and access tokens, so define a twitter property as an ObjectProperty and store all the tokens in a simple Python dictionary. This works because you’ll never look up tokens without looking up the Account first, and once you have looked up the user account, your tokens are already unpickled and available. Magic, isn’t it?
As I already mentioned, this is still at the idea stage and with some feedback from you guys, I can hopefully decide whether to carry on or leave it as a private snippet for myself. Thanks for reading!
Update: I was concerned about not being able to maintain pickled objects in the Datastore admin, but wrapping the pickled value in db.Blob() solved the problem: the values are no longer editable through the admin. You might also want to check out JSON Objects for the Google Datastore and a comparison of the two in a post called Pickle vs JSON — Which is Faster?
Also see Nick Johnson's aetycoon project for the similar "PickleProperty" and a whole bunch of other variations on the custom property theme.
Wow, looks good David, thanks for sharing! It would be interesting to know, though, what kinds of objects people want to store in the datastore and whether pickle is more efficient than JSON. Perhaps there's a need for a JsonProperty too, but that would require an extra lib.
A JSON library is available by default in App Engine, so there's no need to add your own. Just do:
from django.utils import simplejson
A bug has been filed with Google to change it to work like normal simplejson so you can just do:
import simplejson
I would love to know which is faster, Pickle or JSON.
I currently use JSON for my complex property needs, but pickle might simplify some code.
Michael, thanks, yeah, I figured out it was bundled with Django. I did some experimenting with JSON objects too in a separate post called JSON Objects in the Google Datastore. Again, this is a simple draft which I'm using to prototype. Enterprise-level code might be available in the future.
As for benchmarking, well I'm waiting for somebody from my readers to do it ;)
Cheers!
Turns out that JSON is much faster than pickling: Pickle vs JSON — Which is Faster?
Dude, thanks so much for this post!!
No probs
I don't understand why you did this:
<pre>    except:
        pass
    return super(ObjectProperty, self).make_value_from_datastore(value)
</pre>
Is it to convert the value into a valid one?
Anon, it's to grab the value from the datastore. Look up the make_value_from_datastore function.