Schema Design

by Kevin Hanson, Solutions Architect, 10gen

Parallels

Table == Collection
Row == Document
Column == Field
Index == Index
Join == Embedding & Linking
Schema Object == None

Do we link or do we embed?

Blog posts and comments

Embedding is faster: one read returns the post together with its comments.
But large embeds can make the master document slow. Ex: a post with a billion comments.
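
To make the embed-versus-link choice concrete, here is a minimal sketch using plain Python dicts in place of MongoDB documents. The field names (`comments`, `post_id`) and the helper functions are assumptions made for illustration; they are not from the talk.

```python
# Embedded: comments live inside the post document.
embedded_post = {
    "_id": 1,
    "title": "Schema Design",
    "comments": [
        {"author": "alice", "text": "Nice post"},
        {"author": "bob", "text": "Thanks"},
    ],
}

# Linked: comments are separate documents that reference the post by _id.
linked_post = {"_id": 1, "title": "Schema Design"}
linked_comments = [
    {"_id": 101, "post_id": 1, "author": "alice", "text": "Nice post"},
    {"_id": 102, "post_id": 1, "author": "bob", "text": "Thanks"},
]


def comments_embedded(post):
    """One lookup: the comments arrive with the post."""
    return post["comments"]


def comments_linked(post, comments):
    """Second lookup: find the comments that point at this post's _id."""
    return [c for c in comments if c["post_id"] == post["_id"]]
```

Both functions return the same two comments. The embedded form needs only the post itself, while the linked form needs a second pass over a separate collection.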

Comment gets its own copy of the master blog post
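
One possible reading of this note is a denormalized layout in which each comment document carries a copy of the post fields it needs. The sketch below assumes that reading; the field names and the `rename_post` helper are hypothetical.

```python
post = {"_id": 1, "title": "Schema Design", "author": "kevin"}


def make_comment(comment_id, post, text):
    # Copy the post fields the comment needs, so reading a comment
    # does not require a second lookup of the post.
    return {
        "_id": comment_id,
        "text": text,
        "post": {"_id": post["_id"], "title": post["title"]},
    }


comments = [
    make_comment(101, post, "Nice post"),
    make_comment(102, post, "Thanks"),
]


def rename_post(post, comments, new_title):
    # The price of denormalizing: every copy has to be updated too.
    post["title"] = new_title
    updated = 0
    for c in comments:
        if c["post"]["_id"] == post["_id"]:
            c["post"]["title"] = new_title
            updated += 1
    return updated
```

Reads get cheaper because the comment is self-contained, but a change to the post fans out to every copy.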

Caching via memcached, Redis, etc. is functionally a denormalized instance of the data set.
NoSQL means you cut out the middleman

More thoughts on denormalized data

Pushing to an array infinitely
- Document will grow larger than its pre-allocated size
- Document may exceed the max doc size of 16MB
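
As a rough back-of-the-envelope check on the 16MB limit, the sketch below estimates how many comments fit in one document. The 500-byte average comment size is an assumed figure, and the calculation ignores BSON per-field overhead, so treat the result as an upper bound.

```python
MAX_DOC_BYTES = 16 * 1024 * 1024  # 16MB document size limit


def max_embedded_comments(avg_comment_bytes, base_doc_bytes=0):
    """Upper bound on the comments that fit, ignoring encoding overhead."""
    return (MAX_DOC_BYTES - base_doc_bytes) // avg_comment_bytes


# Assumed average comment size of 500 bytes.
print(max_embedded_comments(500))  # 33554
```

Under these assumptions a post runs out of room after tens of thousands of comments, far short of the billion-comment example.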

Logic idea:

The first 200 comments are inserted into the blog document.
After that, comments go into a linked comment document.
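
The logic above could be sketched as follows, with plain Python structures standing in for MongoDB collections. The `add_comment` helper, the `overflow_comments` list, and storing one linked document per overflow comment are illustrative assumptions. The notes only say that comments past the first 200 are linked.

```python
EMBED_LIMIT = 200  # threshold from the notes


def add_comment(post, overflow, comment):
    """Embed the comment while the post has room, otherwise link it."""
    if len(post["comments"]) < EMBED_LIMIT:
        post["comments"].append(comment)
    else:
        # Linked comment document: stored separately, points back at the post.
        overflow.append({"post_id": post["_id"], **comment})


post = {"_id": 1, "title": "Schema Design", "comments": []}
overflow_comments = []  # stands in for a separate comments collection

for i in range(250):
    add_comment(post, overflow_comments, {"n": i, "text": f"comment {i}"})

print(len(post["comments"]), len(overflow_comments))  # 200 50
```

The post document stays bounded in size, and the common case of a post with few comments is still served from a single document.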

Bad shard key:

Sharding on a "date" field while constantly inserting the most recent data...
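
A small simulation can show why an ever-increasing key is a problem under range-based sharding. The three-shard layout and the split points are hypothetical, and the sketch ignores chunk splitting and balancing, which a real cluster performs over time.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical range-sharded cluster with three shards, split by day number:
#   shard 0: day < 100, shard 1: 100 <= day < 200, shard 2: day >= 200
SPLIT_POINTS = [100, 200]


def shard_for(day):
    """Return the index of the shard whose key range contains this day."""
    return bisect_right(SPLIT_POINTS, day)


# New data always carries the most recent dates, so the key only grows.
new_inserts = range(200, 1200)
writes_per_shard = Counter(shard_for(day) for day in new_inserts)

print(writes_per_shard)  # Counter({2: 1000})
```

Every new insert lands on the shard that owns the highest key range, so one shard absorbs all of the write load while the others sit idle.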

Good example:

Sharding blog posts on "author"

Note

TODO find out why the Good example is actually good