Reddit's Scalability

An interesting scalability presentation from one of the founders of reddit [a text based summary of the presentation]. One of the interesting parts is that the restrictions of a relational database impeded the user experience and the speed or ability to make changes or updates to the website.

They keep a Thing Table and a Data Table. Everything in Reddit is a Thing: users, links, comments, subreddits, awards, etc. Things keep common attribute like up/down votes, a type, and creation date.

The Data table has three columns: thing id, key, value. There's a row for every attribute. There's a row for title, url, author, spam votes, etc. When they add new features they didn't have to worry about the database anymore. They didn't have to add new tables for new things or worry about upgrades. Easier for development, deployment, maintenance.

The price is you can't use cool relational features. There are no joins in the database and you must manually enforce consistency. No joins means it's really easy to distribute data to different machines. You don't have to worry about foreign keys are doing joins or how to split the data up. Worked out really well. Worries of using a relational database are a thing of the past.

I haven't worked on a site that has to scale that level. At my previous employer it was not unusual for the javascript components I wrote to have to deal with two million ajax calls a day which was humbling to see. Get that wrong and it is an immediate bad user experience.

With my current employer we have the challenges of SOA and scaling across disparate applications that have their own needs for data consistency. Doing that in a transparent and consistent manner is the difficult part.
cam 2010-05-18 18:57:45.0