Pickle vs JSON — Which is Faster?


If you’re here for the short answer: in my tests, JSON was about 25 times faster at reading (loads) and 15 times faster at writing (dumps). I’ve been thinking about this since I wrote the ObjectProperty and JsonProperty classes for Google App Engine. Both are easy to use and work as expected, although I did have some trouble with ObjectProperty before I figured it out.

As my previous posts mention, the ObjectProperty class uses Python’s pickle module, while JsonProperty works with simplejson (bundled as the json module in Python 2.6 and above, and available through django.utils in Google App Engine). I decided to measure the performance of the two.

Unfortunately, I couldn’t do much benchmarking on top of Google App Engine, since there’s too much lag between the application server and Google’s Datastore. Instead, I decided to write simple standalone benchmarks to find out which is faster: pickle or JSON. I started out by constructing a dataset to pickle and “jsonify”: some random lists, dictionaries and nested dictionaries containing lorem ipsum text.

I then used Python’s timeit module to measure how long it took to “dumps” and “loads” the dataset using pickle and simplejson. I also measured the length of the resulting pickle/JSON strings to see which would be smaller, and guess what: JSON wins in all rounds. I ran the tests 10, 20, 50, 100, 500 and 1000 times for reading, writing and length comparison. Below are three charts illustrating the results:
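The measurement described above can be sketched roughly as follows. This is a minimal sketch, not the script behind the charts: build_dataset and benchmark are hypothetical helpers, the shape of the dataset is an assumption, and it uses the standard-library json module on a current Python rather than simplejson on Python 2.6, so the numbers it prints will not match the ones reported here.

```python
import json
import pickle
import random
import timeit


def build_dataset(entries=50, seed=0):
    """Build a hypothetical dataset of lists, dicts and nested dicts of text."""
    rng = random.Random(seed)
    words = "lorem ipsum dolor sit amet consectetur adipiscing elit".split()

    def text(n):
        return " ".join(rng.choice(words) for _ in range(n))

    return [
        {
            "title": text(5),
            "tags": [text(1) for _ in range(10)],
            "meta": {"body": text(50), "nested": {"note": text(20)}},
        }
        for _ in range(entries)
    ]


def benchmark(dataset, number=10):
    """Return the seconds taken by `number` runs of dumps and loads per module."""
    results = {}
    for name, module in (("pickle", pickle), ("json", json)):
        encoded = module.dumps(dataset)  # encode once so loads has input
        results[name] = {
            "dumps": timeit.timeit(lambda: module.dumps(dataset), number=number),
            "loads": timeit.timeit(lambda: module.loads(encoded), number=number),
        }
    return results


if __name__ == "__main__":
    data = build_dataset()
    for name, timings in benchmark(data).items():
        print(name, timings)
```

Raising number mirrors the 10 to 1000 repetitions used in the tests; timeit returns the total time for all repetitions, not the average.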

As you can see, dumps is much faster with JSON: by almost 1500%, or roughly 15 times faster than pickling! Now let’s see what happens with loads:

Loads looks even better for JSON lovers: a massive 2500%, or about 25 times faster. Of course, some of you might be concerned with size, memory usage and so on. Since I had no good way of measuring the actual memory footprint, I used Python’s len function to simply count the characters in the resulting pickle/JSON string (in Python 2 a str is a byte string, so this is also its size in bytes).
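For anyone repeating the size comparison on Python 3, here is a small sketch under stated assumptions. There, pickle.dumps returns bytes while json.dumps returns str, so the JSON text is encoded to UTF-8 first to compare bytes with bytes. The encoded_sizes helper and the sample object are hypothetical and only for illustration.

```python
import json
import pickle


def encoded_sizes(obj):
    """Return the serialized size of obj in bytes for pickle and JSON."""
    pickled = pickle.dumps(obj)               # bytes on Python 3
    jsoned = json.dumps(obj).encode("utf-8")  # str -> bytes, so len() counts bytes
    return {"pickle": len(pickled), "json": len(jsoned)}


# Hypothetical sample data, not the dataset used for the charts.
sample = {
    "title": "lorem ipsum",
    "tags": ["dolor", "sit", "amet"],
    "nested": {"count": 3},
}

if __name__ == "__main__":
    print(encoded_sizes(sample))
```

The pickle size depends on the protocol in use, so the comparison can come out differently from one Python version to the next.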

So yes, JSON wins in all three aspects. If you’d like to experiment yourself, feel free to use the source code I wrote. Beware of running the 500/1000 tests: those can take hours ;)

The benchmark was run on an Ubuntu 10.10 64-bit machine with Python 2.6 installed, but I don’t think the results will differ much on other systems. The conclusion is that if you need to store complex objects, such as functions, class instances, etc., you have to use pickle, while if you’re only looking for a way to store simple objects, lists and nested dictionaries, then you’re better off with JSON.
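To illustrate that trade-off, here is a minimal sketch. As an assumption for the example, a datetime.date and a set stand in for “complex objects”: pickle round-trips them unchanged, while json raises TypeError until they are converted to built-in JSON types by hand.

```python
import datetime
import json
import pickle

# Hypothetical record containing values that are not JSON built-in types.
record = {"when": datetime.date(2011, 1, 1), "tags": set(["lorem", "ipsum"])}

# pickle restores the date and the set exactly as they were.
restored = pickle.loads(pickle.dumps(record))

# json rejects the same record as it stands.
try:
    json.dumps(record)
    json_error = None
except TypeError as exc:
    json_error = exc

# A manual conversion to strings and lists makes it JSON-compatible,
# at the cost of extra code in both directions.
as_json = json.dumps(
    {"when": record["when"].isoformat(), "tags": sorted(record["tags"])}
)

if __name__ == "__main__":
    print(restored == record)
    print(type(json_error).__name__)
    print(as_json)
```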

Thank you for reading and retweeting ;)

Update: If you’re sticking to pickling objects and have the freedom to use C compiled libraries, then go ahead with cPickle instead of pickle, although it still lags behind JSON, which is about twice as fast in both loading and dumping. As for App Engine, I tried running a short benchmark with cPickle vs simplejson from the django.utils package. The results were better than with plain pickle, but still not enough to beat JSON, which was 30% faster. I know there are other libraries worth mentioning here, but my hands are tied since I’m running App Engine with Python 2.5 and not allowed to install extra modules ;) Cheers and thanks for the comments!
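One common way to pick up the C implementations when they are available is an import fallback, sketched below. This is an illustration rather than the code used in the benchmark. On Python 3 there is no cPickle module and the standard pickle module already uses its C accelerator when it can, so the fallback branch is the one that runs there. The sample data is hypothetical.

```python
# Prefer the C pickle implementation on Python 2; fall back on Python 3.
try:
    import cPickle as pickle
except ImportError:
    import pickle

# Prefer simplejson if it is installed; otherwise use the bundled json module.
try:
    import simplejson as json
except ImportError:
    import json

# Hypothetical sample data for a quick round trip through both modules.
data = {"title": "lorem ipsum", "tags": ["dolor", "sit"]}

pickled = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)
jsoned = json.dumps(data)

if __name__ == "__main__":
    print(pickle.loads(pickled) == data, json.loads(jsoned) == data)
```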

8 thoughts on “Pickle vs JSON — Which is Faster?”

  1. Tweets that mention Pickle vs JSON — Which is Faster? -- Topsy.com

  2. Thanks for doing this, it confirms what I thought, but it's always nice to see the real numbers. I guess I made the right decision to use JSON. If you think about it, it makes sense, since JSON only handles the simple dictionary, list, string, number, boolean and None/null types, which is fine for most things. I think that even with the conversion you would have to do, storing most objects as JSON will be faster than pickle. Unless you are dealing with very complex object structures, you are probably better off with JSON.

    • Michael, thanks for your comment. One more bonus is that JSON lets you pass data directly to the output of asynchronous JavaScript (AJAX) calls without having to re-encode it, which makes things even simpler for web applications. Cheers!

  3. There are some problems with your tests. Why are you comparing the pickle (cPickle) protocol, which can serialize all types of objects (anything that implements the serialization protocol, __reduce__), with the JSON protocol, which can serialize only built-in types? It would be more correct to compare the "json" and "marshal" modules. I replaced the "pickle" module with "marshal" in your script, and "marshal" was more than 4 times faster.

    • Hi! That's a fairly good question, thank you. Yes, I did try marshal in that same script by simply replacing "import pickle" with "import marshal as pickle" and then running the tests — marshal was a little less than 2 times faster, so maybe this is dependent on the architecture or whatever, since you got four.

      But anyways, marshal is faster than cPickle by 4 times in loading and 8 times in dumping, but I wouldn't go with marshal, quoting from the docs: "The marshal module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of .pyc files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you're serializing and de-serializing Python objects, use the pickle module instead."

      It's up to you though, but I really wanted to compare JSON and Pickle since I'm not yet ready to use any others ;)

      Cheers, and thanks again for your comment!

  4. The big win for pickle is when you have complex object hierarchies that need to be serialized. You have to count all the code to convert your objects into a format that is compatible with JSON and then the code to convert them back to your program's internal representation in any benchmark too. Not to mention the headache of trying to maintain all that extra code.

  5. You can also use the more recent pickle protocols. For example, change line 86 to:

    pickle_result.append(pickle.dumps(entry, pickle.HIGHEST_PROTOCOL))

    On my machine, this makes the loads twice as fast. JSON is still much faster.
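The effect of the protocol argument can be checked with a short sketch like the one below. It compares protocol 0, the ASCII format that is the default on Python 2, with pickle.HIGHEST_PROTOCOL. The entry dictionary and the time_loads helper are hypothetical, and timings will vary by machine and Python version.

```python
import pickle
import timeit

# Hypothetical entry, standing in for one item of the benchmark dataset.
entry = {
    "title": "lorem ipsum",
    "tags": ["dolor"] * 100,
    "nested": {"values": list(range(100))},
}


def time_loads(protocol, number=1000):
    """Return (size in bytes, seconds for `number` loads) for a protocol."""
    blob = pickle.dumps(entry, protocol)
    seconds = timeit.timeit(lambda: pickle.loads(blob), number=number)
    return len(blob), seconds


if __name__ == "__main__":
    for label, protocol in (("protocol 0", 0), ("highest", pickle.HIGHEST_PROTOCOL)):
        size, seconds = time_loads(protocol)
        print(label, size, "bytes", round(seconds, 4), "s")
```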
