How Reddit Achieved 100k Reads/Sec for Media Metadata
Lessons from Reddit's System Design to Ace Your Next Interview
Reddit used this database to handle ~100,000 reads/sec at ~17 ms (p99) 👇
Use the proven techniques in your next System Design Interview 🔥
Background
Media metadata falls into two categories:
Data on the post model, e.g. video thumbnails, playback URLs, bitrates, etc.
Metadata associated with the lifecycle of the media asset, e.g. processing state, encoding information, S3 file location, etc.
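As a rough illustration, the two categories could be modelled like this. This is a minimal sketch only; every field name below is an assumption for illustration, not Reddit's actual schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostMediaMetadata:
    """Metadata living on the post model (the read-heavy path)."""
    post_id: int
    thumbnail_url: Optional[str] = None   # video thumbnail
    playback_url: Optional[str] = None    # playback URL for the player
    bitrate_kbps: Optional[int] = None    # available bitrate


@dataclass
class MediaAssetLifecycle:
    """Metadata tracking the lifecycle of the underlying media asset."""
    asset_id: str
    processing_state: str                 # e.g. "uploaded", "transcoding", "ready"
    encoding_info: Optional[str] = None   # codec / encoding profile details
    s3_location: Optional[str] = None     # where the file lives in S3
```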
Challenges
Metadata distributed across multiple systems, resulting in inconsistent storage formats and varying query patterns.
Lack of proper mechanisms for auditing changes, analyzing content and categorizing metadata.
Functional Requirements
Move all existing media metadata into a unified storage system.
Handle more than 100,000 read requests per second with very low latency.
Data creation and updates see much lower traffic than reads, so slightly higher latency is tolerable there.
Solution
Reddit opted for AWS Aurora Postgres over Cassandra because of the difficulty of running ad-hoc debugging queries in Cassandra, and the risk that data not denormalized up front would be unsearchable.
1. Abstraction
Reads and writes to the database are abstracted behind APIs.
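A minimal sketch of what such an abstraction could look like, assuming a Postgres backend accessed via psycopg 3; the interface, class, and table names are illustrative, not Reddit's actual API.

```python
from typing import Any, Optional, Protocol

from psycopg.types.json import Jsonb  # psycopg 3 JSONB adapter


class MediaMetadataStore(Protocol):
    """Clients depend on this interface, never on the database directly."""
    def get(self, post_id: int) -> Optional[dict[str, Any]]: ...
    def upsert(self, post_id: int, metadata: dict[str, Any]) -> None: ...


class PostgresMediaMetadataStore:
    """One concrete backend; during a migration, a dual-writing wrapper or the
    legacy store can sit behind the same interface."""

    def __init__(self, conn) -> None:
        self._conn = conn  # a psycopg 3 connection (or pooled connection)

    def get(self, post_id: int) -> Optional[dict[str, Any]]:
        row = self._conn.execute(
            "SELECT metadata FROM media_metadata WHERE post_id = %s",
            (post_id,),
        ).fetchone()
        return row[0] if row else None

    def upsert(self, post_id: int, metadata: dict[str, Any]) -> None:
        # Writes are lower-traffic, so a slightly slower upsert is acceptable.
        self._conn.execute(
            "INSERT INTO media_metadata (post_id, metadata) VALUES (%s, %s) "
            "ON CONFLICT (post_id) DO UPDATE SET metadata = EXCLUDED.metadata",
            (post_id, Jsonb(metadata)),
        )
```

Because callers only ever see the interface, the backing store can be swapped or dual-written during the migration without any client changes.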
2. Data Migration
Migration Process
Enable dual writes into the metadata APIs from clients of media metadata.
Backfill data from the older databases into the new metadata store.
Enable dual reads on media metadata from the service clients.
Monitor data comparisons for each read and fix data gaps (a dual-read sketch follows this list).
Slowly ramp up read traffic to the new database to make sure it can scale.
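Steps 3 and 4 can be pictured roughly like this: the old store stays the source of truth, every read is shadowed against the new store, and mismatches are recorded rather than surfaced to users. A sketch under those assumptions; the store and reporter objects are illustrative.

```python
import logging

log = logging.getLogger("dual_read")


def get_media_metadata(post_id, old_store, new_store, diff_reporter):
    """Serve from the legacy store, shadow-read the new store, record gaps."""
    primary = old_store.get(post_id)           # still the source of truth
    try:
        shadow = new_store.get(post_id)        # shadow read against the new store
        if shadow != primary:
            # Record the difference for later analysis; never fail the request.
            diff_reporter.record(post_id, primary, shadow)
    except Exception:
        log.exception("shadow read failed for post_id=%s", post_id)
    return primary
```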
Scenarios leading to data differences
Data transformation bugs in the service layer.
Writes into the new metadata store fail while writes into the source database succeed.
A race condition where data from the backfill process in step 2 overwrites newer data written by the service in step 1.
How to fix Data Differences
A Kafka consumer was set up to listen to a stream of data change events from the source database.
The consumer then validates each event against the media metadata store.
If any data inconsistencies are detected, the consumer reports the differences to another data table in the database.
This allows engineers to query and analyze the data issues.
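A rough sketch of such a validation consumer, assuming kafka-python and a hypothetical CDC topic name; the topic, event shape, and diff table are assumptions, not details from the source.

```python
import json

from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "media-metadata-changes",                  # hypothetical CDC topic name
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)


def validate(event, metadata_store, diff_table):
    """Compare a change event from the source DB with the new metadata store."""
    post_id = event["post_id"]
    expected = event["payload"]                # state according to the source DB
    actual = metadata_store.get(post_id)       # state in the new Postgres store
    if actual != expected:
        # Persist the inconsistency to a dedicated table so engineers can
        # query and analyse it later, instead of fixing it inline.
        diff_table.insert(post_id=post_id, expected=expected, actual=actual)


def run_validation_loop(metadata_store, diff_table):
    for message in consumer:
        validate(message.value, metadata_store, diff_table)
```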
3. Scaling Strategies
The media metadata store was optimised for reads.
At 100k requests per second, read latency was 2.6 ms at p50, 4.7 ms at p90, and 17 ms at p99,
roughly 50% faster than the previous data system.
3a. Table Partitioning
Media metadata is projected to reach roughly 50 TB by 2030.
To address this, Reddit implemented table partitioning in Postgres using pg_partman, a partition management extension,
and chose range-based partitioning (with post_id as the partition key) instead of hash-based partitioning.
Since post_id increases monotonically with time, range-based partitioning effectively splits the table into distinct time periods.
This approach offers two important advantages:
Most read operations are for recently created posts, so the Postgres engine can keep the indexes of the most recent partitions cached, minimizing disk I/O. The hot working set stays in memory, which boosts query performance.
Many batch read requests cover multiple post IDs from the same time period, so the data is likely retrieved from a single partition (a DDL sketch of this layout follows).
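A minimal sketch of the resulting layout: a table range-partitioned on post_id, with each partition covering a contiguous block of IDs, i.e. a time window. In production pg_partman automates creating and retiring partitions like these; the table names, ranges, and connection string below are illustrative assumptions.

```python
import psycopg  # psycopg 3

STATEMENTS = [
    # Parent table, range-partitioned on the monotonically increasing post_id.
    """
    CREATE TABLE IF NOT EXISTS media_metadata (
        post_id  BIGINT NOT NULL,
        metadata JSONB  NOT NULL,
        PRIMARY KEY (post_id)
    ) PARTITION BY RANGE (post_id)
    """,
    # Each partition holds a contiguous block of post_ids, i.e. a time period.
    """
    CREATE TABLE IF NOT EXISTS media_metadata_p0
        PARTITION OF media_metadata FOR VALUES FROM (0) TO (10000000)
    """,
    """
    CREATE TABLE IF NOT EXISTS media_metadata_p1
        PARTITION OF media_metadata FOR VALUES FROM (10000000) TO (20000000)
    """,
]

with psycopg.connect("dbname=media") as conn:  # hypothetical DSN
    for stmt in STATEMENTS:
        conn.execute(stmt)
```

Queries that filter on post_id are pruned to the matching partition, which is what keeps the hot working set small and the indexes of recent partitions in memory.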
3b. JSONB
Storing the media metadata in serialized JSONB format for reads effectively turned the table into a NoSQL-like key-value store.
Benefits
It efficiently fetches all the fields together using a single key
It eliminates the need for joins, simplifying the querying logic
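Under the same assumed schema as the partitioning sketch above, the read path reduces to a single key lookup on the JSONB column; psycopg 3 decodes jsonb values straight into Python dicts. Function names are illustrative.

```python
from typing import Any, Optional

import psycopg  # psycopg 3


def fetch_metadata(conn: psycopg.Connection, post_id: int) -> Optional[dict[str, Any]]:
    """One key lookup returns every metadata field at once; no joins needed."""
    row = conn.execute(
        "SELECT metadata FROM media_metadata WHERE post_id = %s",
        (post_id,),
    ).fetchone()
    return row[0] if row else None


def fetch_metadata_batch(conn: psycopg.Connection, post_ids: list[int]) -> dict[int, dict]:
    """Batch reads for post IDs from the same period likely hit one partition."""
    rows = conn.execute(
        "SELECT post_id, metadata FROM media_metadata WHERE post_id = ANY(%s)",
        (post_ids,),
    ).fetchall()
    return {pid: meta for pid, meta in rows}
```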
Which other database would you choose to optimise for reads?
Understanding the Foundational Principles is the key to effective learning!
Follow along to Improve System Design Skills.