Roll your own persistence in python¶

http://blog.codejams.net/pycon/

Performance boost over Sqlite + SQLAlchemy

Example: Invoice¶

Invoices have lineitems

lineitems may be split into accounting codes

invoice has payments

Must record amount paid on each accounting code for AR

Invoice/lineitem

simple JSON creation

Implementation¶

Choose a way to persist string keys and string values

Choose a serialization format

Add features as ‘middleware’

add querying by wrapping our db in a MapReduce implementation

Serve our database over HTTP using WSGI

Storage: Roll your own¶

class DictMixin(object):
    def keys(self): pass
    def __getitem__(self,key): pass
    def __setitem__(self,key,value): pass
    def __delitem__(self,key)__: pass

Other ideas::

Use gdata, amazon, or other on-line db

Ways¶

dumbdb:

written in Python, a flat file

fallback option

dbm:

Unix only

gbdm:

Non-standard format

dbhash:

BSD DB library

Shove:

Recommends shove which lets you use various Amazon, gdata and other handy ways.

Found on cheese shop

Tradeoffs¶

Scaling

Editing of data

Serialization¶

cPickle

marshal and repr/eval is not safe

JSON - yay!

YAML

XML

Serialization tradeoffs¶

interoptability

speed (cPickle, JSON are fast)

security

Adding features¶

class UserDict(object):
    pass
class JSONSerializer(UserDict):
    pass

Use formEncode to validate a document has the correct schema

Data replication & versioning¶

class VersionDict(UserDict):
    def __getitem__(self,key): do magic code
    def __setitem__(self,key): do magic code

Map-Reduce Querying¶

use Google’s tool to handle storage and data optimization

The server¶

class HTTPDict(UserDict): lots to do (disk space, memory usage, parallelizing, concurrency, locking, transactions, many others)

Summary¶

Worse is faster and in some ways better

UserDict amd DictMixin are fascinating

Document oriented databases are fascinating