Healthy Webapps Through Continuous Introspection

by Erik van Zijst

Case study: Wasted cycles on Bitbucket

=> SSHD => conq (Python) => git/hg

  • conq is our custom SSH shell
  • conq imports Django ORM and Bitbucket code
  • takes ~1.41 seconds to start, and ~50 are spawned per second

Solution after analysis: Stop the imports and just write native SQL

  • 16 times faster to start up (0.09s vs 1.41s)
  • 60% load decrease on all web servers!
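
A back-of-the-envelope check of why startup time mattered so much. This is only a sketch: it assumes the ~50 spawns per second is a total across the servers and that startup is mostly CPU-bound, neither of which the slide states. The helper name busy_cores is made up for illustration.

```python
# Figures come from the case study; how they are combined is an assumption.
SPAWNS_PER_SECOND = 50
STARTUP_BEFORE_S = 1.41  # conq importing the Django ORM and Bitbucket code
STARTUP_AFTER_S = 0.09   # conq using native SQL only


def busy_cores(startup_seconds, spawns_per_second=SPAWNS_PER_SECOND):
    """CPU-seconds spent on startup per wall-clock second (roughly, cores kept busy)."""
    return startup_seconds * spawns_per_second


speedup = STARTUP_BEFORE_S / STARTUP_AFTER_S

if __name__ == "__main__":
    print("speedup: %.1fx" % speedup)
    print("cores busy on startup, before: %.1f" % busy_cores(STARTUP_BEFORE_S))
    print("cores busy on startup, after: %.1f" % busy_cores(STARTUP_AFTER_S))
```

Under those assumptions, startup alone kept dozens of cores busy before the change and only a handful afterwards, which is in line with a large load drop.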

Lessons learned

  • Test your stuff
  • Monitor your servers

Common problems

Slowness in Web Apps

  • slow SQL queries (or too many!)
  • lock contention
    • between threads
    • database table/row locks
    • file locks (git/hg)
  • excessive IO (disk/network)
  • evil regex: r'^(a+)+$'
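
To see why that last pattern is "evil", here is a small timing sketch. The nested quantifier only hurts when the match fails, so the subject ends in a character that can never match. The helper name match_seconds and the chosen lengths are illustrative; each extra 'a' should roughly double the time on CPython's backtracking engine.

```python
import re
import time

EVIL = re.compile(r'^(a+)+$')


def match_seconds(n):
    """Time EVIL against n 'a's followed by a '!' that can never match."""
    subject = 'a' * n + '!'
    started = time.perf_counter()
    result = EVIL.match(subject)
    return result, time.perf_counter() - started


if __name__ == "__main__":
    # Keep n small: the running time grows exponentially with n.
    for n in (14, 16, 18, 20):
        _, elapsed = match_seconds(n)
        print("n=%d: %.4fs" % (n, elapsed))
```

A subject made only of 'a's matches immediately. It is the failing case that makes the engine try every way of splitting the run between the inner and outer groups.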


  • 503 when the worker pool is full
  • 500 when a request times out (Gunicorn SIGKILLs the worker)

The latter is best avoided, as it destroys forensic evidence and leaves stale state behind (e.g. lock files).
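
One way to keep some forensic evidence is to dump a traceback shortly before the hard kill would happen. The sketch below uses the standard library's faulthandler watchdog. The names traceback_if_slow and handle_request are hypothetical, and the tiny thresholds are only there to make the demo quick; in practice the threshold would sit a little below the worker timeout. Note that faulthandler supports a single pending timer per process, so this suits one-request-at-a-time workers.

```python
import faulthandler
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def traceback_if_slow(seconds, log_file):
    """Dump all thread tracebacks to log_file if the block runs longer than `seconds`."""
    faulthandler.dump_traceback_later(seconds, file=log_file)
    try:
        yield
    finally:
        # The block finished (or raised) in time: disarm the watchdog.
        faulthandler.cancel_dump_traceback_later()


def handle_request(duration):
    """Hypothetical stand-in for a request handler that takes `duration` seconds."""
    time.sleep(duration)


def demo():
    """Run one slow 'request' and return whatever the watchdog logged."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        with traceback_if_slow(0.2, log_file):
            handle_request(0.5)
        log_file.seek(0)
        return log_file.read()


if __name__ == "__main__":
    print(demo())
```

The request itself is not interrupted here; the point is only that the stack is on record before anything kills the worker.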



Designed to profile your production environment without impacting performance
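
A minimal sketch of the general idea, not the tool from the slide: profile only the calls you explicitly opt in, so every other request runs without profiler overhead. It uses the standard library's cProfile and pstats; profile_call and slow_view are hypothetical names.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return result, buffer.getvalue()


def slow_view(n):
    """Hypothetical stand-in for a request handler worth profiling."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    _, report = profile_call(slow_view, 100000)
    print(report)
```

In a web app the opt-in could be a request parameter or header checked by middleware; that part is left out here.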


interruptingcow: designed to let you catch a hung or locked-up operation and bubble it up as an exception

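import re
from interruptingcow import timeout

try:
    with timeout(20.0, RuntimeError):
        # evil regex: the trailing '!' can never match, which forces
        # catastrophic backtracking until the timeout fires
        re.match(r'^(a+)+$', 'a' * 100 + '!')
except RuntimeError:
    print('Interrupted')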
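
For readers curious how such a timeout can work, here is a sketch of one common approach on POSIX systems: arm an interval timer and raise from the SIGALRM handler. This is an illustration under those assumptions, not interruptingcow's actual source. The names alarm_timeout and run_with_deadline are made up, and the approach only works in the main thread.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def alarm_timeout(seconds, exception=RuntimeError):
    """Raise `exception` inside the block if it runs longer than `seconds`."""
    def on_alarm(signum, frame):
        raise exception("timed out after %.2fs" % seconds)

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        # Disarm the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def run_with_deadline(work_seconds, deadline_seconds):
    """Sleep for work_seconds and report whether the deadline cut it short."""
    try:
        with alarm_timeout(deadline_seconds):
            time.sleep(work_seconds)
        return "finished"
    except RuntimeError:
        return "interrupted"


if __name__ == "__main__":
    print(run_with_deadline(0.01, 1.0))
    print(run_with_deadline(1.0, 0.05))
```

Because the exception is raised in the thread that was running the slow code, the usual try/finally cleanup still runs, which is what avoids the stale state a SIGKILL leaves behind.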