June 6, 2011 Knowing Things¶
By Jeff Schenk
Note
There was way too much heckling and too many comments from one person who kept taking him off topic, so I asked Jeff to not take so many questions. That did not go over well with the hecklers, but asking about it on the mailing list uncovered a serious problem in the user group. Now things are better.
Ad.ly’s story¶
- Kinda knowing things is easy
- Really knowing a lot of complex things with certainty is harder
- Knowing things with precision is really important
Thing Types¶
- Integers (how many people clicked)
- Strings
- Booleans
- Bling (money)
- Time
Bling¶
Do you want to misplace money and get fired? No? Use Decimal:
from decimal import Decimal
moneys = Decimal('100.01')
Decimal division does not round to cents on its own, so use quantize:
>>> moneys / 2
Decimal('50.005')
>>> # quantize takes a Decimal exponent; the default rounding mode is ROUND_HALF_EVEN
>>> (moneys / 2).quantize(Decimal('0.01'))
Decimal('50.00')
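A minimal sketch of why Decimal matters for money, assuming amounts are kept to the cent. The names `CENT` and `split_evenly` are illustrative and not from the talk. Decimal's default rounding mode is ROUND_HALF_EVEN (banker's rounding), so the sketch passes `rounding=` explicitly to make the choice visible.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")  # hypothetical constant: the smallest unit we keep


def split_evenly(amount, ways):
    """Divide a Decimal amount and round each share to the cent."""
    return (amount / ways).quantize(CENT, rounding=ROUND_HALF_UP)


float_total = 0.1 + 0.2                          # binary floats drift
decimal_total = Decimal("0.1") + Decimal("0.2")  # exact decimal arithmetic
share = split_evenly(Decimal("100.01"), 2)

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
print(share)                            # 50.01
```

Which rounding mode is right depends on the accounting rules in play; the point is only that the choice is explicit rather than accidental.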
Time¶
- Timezones suck
- Computers like integers
- So they store time as hours since the epoch
- All the work is done in about 10 lines of Python code
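The talk's actual code was not captured in these notes. As a rough sketch of the idea, assuming timestamps are timezone-aware and bucketed by whole hours in UTC, the conversion might look like this. The helper names `to_epoch_hours` and `from_epoch_hours` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_hours(dt):
    """Return whole hours since the Unix epoch for a timezone-aware datetime."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone first")
    # Subtracting aware datetimes compares them in UTC, whatever their zones.
    return (dt - EPOCH) // timedelta(hours=1)


def from_epoch_hours(hours):
    """Return the UTC datetime at the start of the given hour bucket."""
    return EPOCH + timedelta(hours=hours)


# Fixed UTC-7 offset used purely for illustration (no DST handling).
pacific = timezone(timedelta(hours=-7))
local_click = datetime(2011, 6, 6, 19, 30, tzinfo=pacific)

bucket = to_epoch_hours(local_click)
print(from_epoch_hours(bucket))  # 2011-06-07 02:00:00+00:00
```

Storing the integer bucket sidesteps timezone handling until display time, when the integer is converted back for whichever zone the reader is in.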
Storage¶
SQL
- Joins are death
- If you join, you will die
- Intelligent indexes are super
- If you’re going to group by it or filter on it, you probably want it indexed.
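The indexing advice can be tried out with the standard library's sqlite3 module. This is only a sketch: the `clicks` table, its columns, and the index name are made up for illustration, and the database used at Ad.ly was not named in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (campaign_id INTEGER, hour INTEGER)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 10), (1, 11)],
)

# Index the column we filter and group on.
conn.execute("CREATE INDEX idx_clicks_campaign ON clicks (campaign_id)")

# Ask SQLite how it would run a filtered query; the last column of each
# row is a human-readable description of the plan step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM clicks WHERE campaign_id = ?",
    (1,),
).fetchall()
uses_index = any("idx_clicks_campaign" in row[-1] for row in plan)

counts = conn.execute(
    "SELECT campaign_id, COUNT(*) FROM clicks "
    "GROUP BY campaign_id ORDER BY campaign_id"
).fetchall()

print(uses_index)
print(counts)  # [(1, 3), (2, 1)]
```

Checking the query plan is a cheap way to confirm that an index is actually being used before relying on it.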
Pre-Aggregate
- When you’re working with a lot of data, you need to aggregate chunks as you go.
- My Guess: A lot of Celery tasks!
- Results are spooned into a single report table that breaks normalization
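A small sketch of aggregating chunks as you go, using plain Python rather than Celery tasks. The event shape, the `chunks` helper, and the `(campaign, hour)` report key are all assumptions made for illustration; in a real pipeline each chunk might be handled by a separate worker and the totals written to the report table.

```python
from collections import Counter
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def aggregate_chunk(events):
    """Count clicks per (campaign, hour) for one chunk of raw events."""
    return Counter((e["campaign"], e["hour"]) for e in events)


raw_events = [
    {"campaign": "a", "hour": 10},
    {"campaign": "a", "hour": 10},
    {"campaign": "b", "hour": 10},
    {"campaign": "a", "hour": 11},
    {"campaign": "b", "hour": 10},
]

# The running report stands in for the denormalized report table.
report = Counter()
for chunk in chunks(raw_events, 2):
    report.update(aggregate_chunk(chunk))

print(report[("a", 10)], report[("b", 10)], report[("a", 11)])  # 2 2 1
```

Because the per-chunk counts simply add together, chunks can be processed in any order or in parallel and still produce the same report.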
heapq¶
The heapq module's algorithms make merging sorted iterables really powerful:
import heapq
# query1 and query2 must each already be sorted
for result in heapq.merge(query1, query2):
    # results arrive merged and in order
    print(result)
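For a version that runs on its own, here is a sketch with two small sorted lists standing in for query results. The `(timestamp, label)` rows are invented for illustration.

```python
import heapq

# Each input must already be sorted; heapq.merge does not sort for you.
results_db1 = [(1, "a"), (4, "d"), (7, "g")]
results_db2 = [(2, "b"), (3, "c"), (9, "z")]

# heapq.merge returns a lazy iterator, so rows are pulled from each
# input only as needed rather than loaded into memory up front.
merged = list(heapq.merge(results_db1, results_db2))

print(merged)
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (7, 'g'), (9, 'z')]
```

The laziness is what makes this useful for large query results: the merged stream can be paged or truncated without reading every row from every source.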
Caching is key!¶
- They need flexibility to slice and dice the data
- Once it’s been sliced, they want to be able to view, page, and sort the data
- Redis gives the speed of cache with the power to sort and page
- They use redis-py as their library
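The notes do not say which Redis data structures were used. One plausible approach, offered here only as an assumption, is a sorted set: Redis keeps members ordered by score, so a page is just a range query. The helper names and the key are hypothetical; `zadd` with a mapping and `zrevrange` are redis-py calls (the mapping form of `zadd` assumes redis-py 3.0 or later).

```python
def cache_report(client, key, rows):
    """Store {member: score} rows in a sorted set so Redis keeps them ordered."""
    client.zadd(key, rows)


def fetch_page(client, key, page, per_page=10):
    """Return one page of (member, score) pairs, highest score first."""
    start = (page - 1) * per_page
    stop = start + per_page - 1  # ZREVRANGE bounds are inclusive
    return client.zrevrange(key, start, stop, withscores=True)


# Usage against a real server (assumes Redis is running on localhost):
#   import redis
#   r = redis.Redis()
#   cache_report(r, "report:clicks", {"campaign-a": 120, "campaign-b": 45})
#   fetch_page(r, "report:clicks", page=1, per_page=1)
```

Passing the client in as an argument keeps the helpers easy to exercise with a stand-in object when no Redis server is available.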
Questions¶
- Test coverage?