Highly scalable services in Python

Original title: Wysoko skalowalne serwisy pythonowe w OnetPoczcie (Highly scalable Python services at Onet Poczta)

  • by Igor Waligóra

    • From Warsaw
    • Works on high-integrity systems at Onet
  • About Onet

    • Over 70 developers use Python full time
    • https://twitter.com/onetpl
    • News portal and email provider
    • One of the oldest sites in Poland

Note

I’m translating this entire talk from Polish. I don’t speak Polish beyond knowing how to say “Thanks” and “Questions?”.

Talk Description (translated from Polish)

In this presentation we show our model for distributing email services and how we built it with Python. We present methods for building stable, scalable systems on top of low-level libraries, along with our model for replicating and integrating them. We offer a glimpse into how one of the largest email systems on the Polish Internet works.

System Architecture

Scale of the email system

  • 200 servers
  • 20 database servers
  • 1 Petabyte of data
  • 4.7 million users
  • 80 thousand requests a minute
  • Spam
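The headline figures are easier to compare once they are converted to per-second and per-user numbers. A back-of-the-envelope sketch, assuming decimal units (1 PB = 10^15 bytes) and ignoring replication and index overhead, which the talk did not break down; per_second and bytes_per_user are illustrative helpers, not part of the talk.

```python
# Assumptions: decimal units (1 PB = 10**15 bytes); replication and index
# overhead are ignored because the talk did not break the storage down.
PETABYTE = 10**15            # bytes
USERS = 4_700_000            # 4.7 million users
REQUESTS_PER_MINUTE = 80_000


def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


def bytes_per_user(total_bytes: float, users: int) -> float:
    """Crude average storage per user (real mailbox sizes are skewed)."""
    return total_bytes / users


requests_per_second = per_second(REQUESTS_PER_MINUTE)          # about 1,333
megabytes_per_user = bytes_per_user(PETABYTE, USERS) / 10**6   # about 213
```

That works out to roughly 1,333 requests per second and an average of about 213 MB per user. The same conversion puts the 500 thousand spam hits a minute quoted under Results at roughly 8,333 per second.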

Server types

  • SMTP

    • Postfix > MDA > Storage
  • POP3 / IMAP

    • Dovecot > Storage
  • Webmail

    • PHP / JavaScript > Python Server > Storage
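On the SMTP path, Postfix hands each incoming message to a mail delivery agent (MDA), which writes it into storage. Onet's storage layer (libStorage) is not shown in the talk, so this minimal sketch substitutes the standard library's mailbox.Maildir class as the storage backend; the deliver function and the directory path are hypothetical.

```python
import mailbox


def deliver(raw: bytes, maildir_path: str) -> str:
    """Store one raw message in a Maildir and return its key.

    Hypothetical stand-in for the "MDA > Storage" step: Maildir replaces
    Onet's own storage layer (libStorage), which the talk does not show.
    """
    box = mailbox.Maildir(maildir_path, create=True)  # creates cur/, new/, tmp/
    try:
        # add() accepts raw bytes; a plain message is filed under new/ (unread)
        return box.add(raw)
    finally:
        box.close()
```

Postfix can pass a message to a program like this on standard input (for example through its pipe(8) delivery agent), where a thin wrapper would call deliver(sys.stdin.buffer.read(), path).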

Persistence Server

  • See RFC 822, the Internet message format (since superseded by RFC 2822 and RFC 5322)
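An RFC 822 message is a block of header fields, a blank line, and then the body. Python's standard email package parses this format; the sketch below only illustrates the format and is not Onet's implementation. The sample message and the summarize helper are made up for illustration.

```python
from email import policy
from email.parser import BytesParser

# A minimal RFC 822-style message: header fields, a blank line, then the body.
RAW = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Hi Bob!\r\n"
)


def summarize(raw: bytes) -> dict:
    """Parse raw message bytes and return a few header fields and the body."""
    # policy.default makes the parser return an EmailMessage with get_content()
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    return {
        "from": str(msg["From"]),
        "subject": str(msg["Subject"]),
        "body": msg.get_content(),  # decoded text of a text/plain body
    }
```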

Their Python server

  • Tornado
  • JSON-RPC
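The webmail front end talks to this server over JSON-RPC. The talk does not show the handlers or say which JSON-RPC version is used, so this is a minimal, framework-agnostic JSON-RPC 2.0 dispatcher using only the standard library; in a Tornado deployment, a request handler's post() method could pass the request body to something like handle(). The METHODS registry and the ping and add methods are hypothetical.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of RPC methods exposed to the webmail front end.
METHODS: Dict[str, Callable[..., Any]] = {}


def rpc_method(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function as an RPC method under its own name."""
    METHODS[func.__name__] = func
    return func


@rpc_method
def ping() -> str:
    return "pong"


@rpc_method
def add(a: int, b: int) -> int:
    return a + b


def _error(req_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "error": {"code": code, "message": message}})


def handle(body: str) -> str:
    """Dispatch one JSON-RPC 2.0 request and return the JSON response text."""
    try:
        request = json.loads(body)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(request, dict):  # batches are not handled in this sketch
        return _error(None, -32600, "Invalid Request")
    req_id = request.get("id")
    method = request.get("method")
    func = METHODS.get(method) if isinstance(method, str) else None
    if func is None:
        return _error(req_id, -32601, "Method not found")
    params = request.get("params", [])
    try:
        # Positional (list) or named (object) parameters are both allowed.
        result = func(**params) if isinstance(params, dict) else func(*params)
    except TypeError:
        # Simplification: this also catches TypeErrors raised inside a method.
        return _error(req_id, -32602, "Invalid params")
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

For example, handle('{"jsonrpc": "2.0", "method": "add", "params": [2, 3], "id": 1}') returns '{"jsonrpc": "2.0", "id": 1, "result": 5}'. Keeping the dispatcher independent of Tornado makes it easy to unit-test without starting a server.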

libOP - API

  • libAUTH
  • libDB
  • libOCACHE
  • libANTYSPAM
  • libStorage

Making Python faster

Write in C, make bindings in Cython, import into Python

// op.c
int mparser_fetch(const struct mparser_server *mparser, etc){
    [...]
}
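Building a Cython extension needs a C compiler and a build step, so as a self-contained stand-in this sketch uses the standard library's ctypes module, a different binding technique from the Cython approach the talk describes, to call a function from the C standard library. The shape is the same: load a compiled library, declare the C signature, call it from Python. It assumes a POSIX system; c_strlen is an illustrative name.

```python
import ctypes
import ctypes.util

# Locate the C standard library (e.g. "libc.so.6" on Linux). If find_library
# returns None, CDLL(None) loads the running program's symbols, which include
# libc on POSIX systems.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call the C strlen() on a byte string through ctypes."""
    return libc.strlen(data)
```

c_strlen(b"hello") returns 5. With Cython, the equivalent would be a .pyx file that declares the C function in a cdef extern from block and is compiled into an extension module.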

Workflow:

  • Start from C code that does what they need.

  • Wrap it with Cython bindings.

  • Call the Cython modules from Python.

  • Declare all dependencies (the C library, the Cython components, and the Python code) in setup.py files so everything can be deployed easily.
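Python code that calls a compiled Cython module often guards the import, so the package still works (more slowly) where the extension has not been built. This is a common pattern rather than something the talk describes; the module name _mparser_speedups and the count_headers function are hypothetical.

```python
# Prefer the compiled (e.g. Cython-built) extension; fall back to pure Python.
# "_mparser_speedups" is a hypothetical module name used for illustration.
try:
    from _mparser_speedups import count_headers  # compiled fast path
    HAVE_C_EXTENSION = True
except ImportError:
    HAVE_C_EXTENSION = False

    def count_headers(raw: bytes) -> int:
        """Pure-Python fallback: count header fields before the first blank line."""
        count = 0
        for line in raw.splitlines():
            if not line:
                break  # a blank line ends the header section
            if not line[:1].isspace():  # folded continuation lines start with whitespace
                count += 1
        return count
```

Callers import count_headers the same way in both cases, and HAVE_C_EXTENSION lets tests or logs report which implementation is active.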

System Summary

  • C
  • Cython
  • Tornado
  • Python

Benefits

  • Challenging but successful implementation

  • Good performance

  • Optimized to handle heavy load

    • 20x faster than standard Python
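Speed-up figures like this one depend heavily on the workload, so they are worth measuring on your own code. A minimal sketch using the standard library's timeit module, comparing a pure-Python loop with the builtin sum(), whose loop runs in C in CPython; it is not the talk's benchmark, and the ratio will vary by machine and Python version.

```python
import timeit


def py_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit Python loop (interpreted bytecode)."""
    total = 0
    for i in range(n):
        total += i
    return total


def c_sum(n: int) -> int:
    """Sum 0..n-1 with the builtin sum(), implemented in C in CPython."""
    return sum(range(n))


def speedup(n: int = 100_000, number: int = 20) -> float:
    """How many times faster c_sum is than py_sum (best of 3 repeats each)."""
    slow = min(timeit.repeat(lambda: py_sum(n), number=number, repeat=3))
    fast = min(timeit.repeat(lambda: c_sum(n), number=number, repeat=3))
    return slow / fast
```

Taking the minimum of several repeats reduces noise from other processes; check that both versions return the same result before trusting the ratio.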

Critical tools

  • PyPI
  • Virtualenv
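
$ sudo dpkg -i libop_1.1.0_amd64.deb
$ mkvirtualenv mparser
(mparser) $ pip install -r requirements.txt

mkvirtualenv (from virtualenvwrapper) creates the environment and activates it; use workon mparser to re-enter it in a later shell.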

Results

  • Very good performance
  • 99.8% uptime
  • Able to handle 500 thousand spam hits a minute
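An uptime percentage is easier to reason about as a downtime budget. A worked sketch, assuming the 99.8% figure applies over a full 365-day year (the talk did not state the measurement window); downtime_hours is an illustrative helper.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def downtime_hours(uptime_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime allowed in the period for a given uptime percentage."""
    return (1 - uptime_percent / 100) * period_hours
```

At 99.8% uptime that is (1 - 0.998) x 8760, or about 17.5 hours of downtime a year, roughly 1.4 hours in a 30-day month.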

Summary

  • Build good systems

  • C libraries are the way to go

  • Use Python to build your services, but lean on C libraries for the performance-critical parts

  • Processes

    • Scrum
    • DevOps