Did a quick comparison of some data serialization options for Python. My requirements for the serialization format were the following:

  • Input data is typically either a list or a dictionary.
  • Interoperability is important: the format must be usable from at least C.
  • A human readable format is desirable but not necessary.

Based on the requirements, I took a look at the following Python packages:

  • umsgpack (MessagePack)
  • yaml (PyYAML)
  • cjson (JSON)
  • ujson (JSON)

The following Python script was used to test out the various packages:

# Python 2 script: uses print statements and time.clock().
import time

import umsgpack
import yaml
import cjson
import ujson

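# 10,000 identical records, each mixing an integer, a list, and a string.
DATA = [{'val1':12345, 'val2':[1,2,3,4,5], 'val3':"12345"} for _ in range(10000)]

def test_serialization(name, encode, decode):
    """Time one encode/decode round trip of DATA and print the results."""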
    print name
    print "  Encoding..."
    t_start = time.clock()
    packed = encode(DATA)
    print "    time = %f seconds" % (time.clock() - t_start)
    print "    size = %u kilobytes" % (len(packed) / 1024)
    print "  Decoding..."
    t_start = time.clock()
    unpacked = decode(packed)
    print "    time = %f seconds" % (time.clock() - t_start)
    print "    same = %r" % (DATA == unpacked)

test_serialization("umsgpack", umsgpack.packb, umsgpack.unpackb)
test_serialization("yaml", yaml.dump, yaml.load)
test_serialization("cjson", cjson.encode, cjson.decode)
test_serialization("ujson", ujson.encode, ujson.decode)
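
The script above is Python 2 (print statements, and time.clock() was removed in Python 3.8). For anyone repeating the comparison on Python 3, here is a minimal sketch of the same harness. The name run_benchmark and the use of the standard library json module as a stand-in encoder are my assumptions for illustration, not part of the original test. Any encode/decode pair from the packages above could be passed in the same way.

```python
import json
import time

# Same test data as the original script.
DATA = [{'val1': 12345, 'val2': [1, 2, 3, 4, 5], 'val3': "12345"} for _ in range(10000)]

def run_benchmark(name, encode, decode, data=DATA):
    """Time one encode/decode round trip and return the measurements.

    run_benchmark is a hypothetical helper name; it mirrors
    test_serialization but returns a dict instead of printing.
    """
    t_start = time.perf_counter()
    packed = encode(data)
    encode_seconds = time.perf_counter() - t_start

    t_start = time.perf_counter()
    unpacked = decode(packed)
    decode_seconds = time.perf_counter() - t_start

    return {
        'name': name,
        'encode_seconds': encode_seconds,
        'decode_seconds': decode_seconds,
        'size_kilobytes': len(packed) // 1024,
        'same': data == unpacked,
    }

# Standard library json used only as a stand-in encoder/decoder.
result = run_benchmark("json", json.dumps, json.loads)
print(result['name'])
print("  encode = %f seconds" % result['encode_seconds'])
print("  decode = %f seconds" % result['decode_seconds'])
print("  size   = %u kilobytes" % result['size_kilobytes'])
print("  same   = %r" % result['same'])
```

time.perf_counter() measures wall-clock time, so the numbers are not directly comparable to time.clock() results, and a single run is noisy. Repeating each measurement a few times and taking the minimum would give steadier figures.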

The result of running this script on my laptop (Intel Core i7 2670QM) is the following:

umsgpack
  Encoding...
    time = 0.390241 seconds
    size = 341 kilobytes
  Decoding...
    time = 0.430256 seconds
    same = True
yaml
  Encoding...
    time = 8.266586 seconds
    size = 527 kilobytes
  Decoding...
    time = 15.943908 seconds
    same = True
cjson
  Encoding...
    time = 0.030977 seconds
    size = 576 kilobytes
  Decoding...
    time = 0.022119 seconds
    same = True
ujson
  Encoding...
    time = 0.013703 seconds
    size = 478 kilobytes
  Decoding...
    time = 0.018000 seconds
    same = True
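
The two JSON encoders report different sizes for the same data. One plausible explanation, which I have not confirmed against either library, is whitespace after separators. The sketch below uses only the standard library json module (not cjson or ujson) to compare spaced and compact output for the same DATA.

```python
import json

# Same test data as the benchmark script.
DATA = [{'val1': 12345, 'val2': [1, 2, 3, 4, 5], 'val3': "12345"} for _ in range(10000)]

# Default separators are ", " and ": " (one space after each).
spaced = json.dumps(DATA)

# Compact separators drop that whitespace.
compact = json.dumps(DATA, separators=(",", ":"))

spaced_kb = len(spaced) // 1024
compact_kb = len(compact) // 1024

print("spaced  = %u kilobytes" % spaced_kb)   # 576
print("compact = %u kilobytes" % compact_kb)  # 478
```

These figures line up with the 576 and 478 kilobytes reported for cjson and ujson, which suggests, but does not prove, that the gap comes down to separator whitespace rather than anything deeper.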

For my particular application, speed is more important than the size of the serialized data. The clear winner for speed is ujson. For size, msgpack is the most compact (341 KB versus 478 KB for ujson), which makes sense since it is a binary format.
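
To put numbers on that comparison, here is a small sketch that expresses each package's round-trip time and output size relative to ujson. The figures are copied from the results above. The RESULTS table and the relative_to helper are names I made up for this illustration.

```python
# (encode seconds, decode seconds, size in kilobytes), copied from the run above.
RESULTS = {
    "umsgpack": (0.390241, 0.430256, 341),
    "yaml": (8.266586, 15.943908, 527),
    "cjson": (0.030977, 0.022119, 576),
    "ujson": (0.013703, 0.018000, 478),
}

def relative_to(baseline="ujson"):
    """Return {name: (round-trip time ratio, size ratio)} against the baseline."""
    base_enc, base_dec, base_size = RESULTS[baseline]
    ratios = {}
    for name, (enc, dec, size) in RESULTS.items():
        time_ratio = (enc + dec) / (base_enc + base_dec)
        size_ratio = size / float(base_size)
        ratios[name] = (time_ratio, size_ratio)
    return ratios

ratios = relative_to("ujson")
for name in ("umsgpack", "yaml", "cjson", "ujson"):
    time_ratio, size_ratio = ratios[name]
    print("%-8s round trip %6.1fx ujson, size %3.0f%% of ujson"
          % (name, time_ratio, size_ratio * 100))
```

On these numbers, umsgpack produces output about 71% the size of ujson's but takes roughly 26 times as long for a full round trip, which is why ujson wins when speed matters more than size.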

Overall, I am very impressed by the performance of ujson. Given the ubiquity of JSON for web-based data, it makes sense that ultra-optimized libraries would exist for it. While I love YAML as a data format, the PyYAML library is not fast enough for applications that need quick encoding and decoding.

Hi, I am Jeff Rimko!
A computer engineer and software developer in the greater Pittsburgh, Pennsylvania area.