How Reddit Achieved 100k Reads/Sec for Media Metadata

Lessons from Reddit's System Design to Ace Your Next Interview

Reddit used this database to handle ~100,000 reads/sec at ~17 ms (p99) 👇

Use the proven techniques in your next System Design Interview 🔥

Background

Media metadata is classified into two categories (see the sketch after this list):

  1. Data on the post model, e.g. video thumbnails, playback URLs, bitrates, etc.

  2. Metadata associated with the lifecycle of the media asset, e.g. processing state, encoding information, S3 file location, etc.
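
To make the two categories concrete, here is a minimal sketch of what such records might look like; the field names are illustrative assumptions, not Reddit's actual schema.

```python
# Illustrative data model for the two metadata categories above.
# Field names are assumptions, not Reddit's actual schema.
from dataclasses import dataclass


@dataclass
class PostMediaMetadata:
    """Metadata surfaced on the post model."""
    post_id: int
    thumbnail_url: str
    playback_url: str
    bitrate_kbps: int


@dataclass
class MediaAssetLifecycle:
    """Metadata tracking the lifecycle of the media asset."""
    asset_id: str
    processing_state: str   # e.g. "uploaded", "encoding", "ready"
    encoding_info: str      # codec / resolution details
    s3_location: str        # where the encoded file lives
```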

Challenges

  1. Metadata distributed across multiple systems, resulting in inconsistent storage formats and varying query patterns.

  2. Lack of proper mechanisms for auditing changes, analyzing content and categorizing metadata.

Functional Requirements

  1. Move all existing media metadata into a Unified Storage.

  2. Handle >100,000 read requests per second with very low latency

  3. Data creation and updates have lower traffic than reads, so slightly higher latency is tolerable

Solution

Reddit opted for AWS Aurora Postgres over Cassandra because of the challenges of running ad-hoc queries for debugging in Cassandra, and the risk that some data might not be denormalized and would therefore be unsearchable.

1. Abstraction

Reads and writes to the database are abstracted behind APIs, so clients never talk to the database directly.
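
As a rough illustration of that abstraction (hypothetical names, not Reddit's actual code), clients call a thin API instead of the database:

```python
# A hypothetical API layer that hides the database from clients.
class MediaMetadataAPI:
    def __init__(self, store):
        self._store = store  # underlying database client, injected

    def get_metadata(self, post_id: int) -> dict:
        # Clients never see tables or SQL, only this call.
        return self._store.read(post_id)

    def upsert_metadata(self, post_id: int, metadata: dict) -> None:
        # Writes go through the same abstraction, so the storage
        # backend can change without touching callers.
        self._store.write(post_id, metadata)
```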

2. Data Migration

Migration Process

  1. Enable dual writes into our metadata APIs from clients of media metadata (see the dual-write/dual-read sketch after this list)

  2. Backfill data from older databases to our metadata store

  3. Enable dual reads on media metadata from our service clients

  4. Monitor data comparisons for each read and fix data gaps

  5. Slowly ramp up the read traffic to our database to make sure it can scale
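
A hedged sketch of steps 1, 3 and 4 above: shadow writes into the new store, shadow reads with comparison, and logging of any gaps. The old_store/new_store interfaces and the logging choices are stand-ins, not Reddit's implementation.

```python
import logging

log = logging.getLogger("metadata-migration")


def dual_write(post_id: int, metadata: dict, old_store, new_store) -> None:
    old_store.write(post_id, metadata)       # source of truth during migration
    try:
        new_store.write(post_id, metadata)   # step 1: shadow write to the new store
    except Exception:
        log.exception("shadow write failed for post %s", post_id)


def dual_read(post_id: int, old_store, new_store) -> dict:
    old_value = old_store.read(post_id)
    try:
        new_value = new_store.read(post_id)  # step 3: shadow read from the new store
        if new_value != old_value:           # step 4: compare and record data gaps
            log.warning("metadata mismatch for post %s", post_id)
    except Exception:
        log.exception("shadow read failed for post %s", post_id)
    return old_value                         # the old store still serves traffic
```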

Scenarios leading to data differences

  1. Data transformation bugs in the service layer.

  2. Writes into the new metadata store failed, while writes into the source database succeeded.

  3. Race conditions when data from the backfill process in step 2 overwrites newer data from service writes in step 1.

How to fix Data Differences

  • A Kafka consumer was set up to listen to a stream of data change events from the source database.

  • The consumer then performs data validation with the media metadata store.

  • If any data inconsistencies are detected, the consumer reports the differences to another data table in the database.

  • This allows engineers to query and analyze the data issues (a minimal consumer sketch follows below).

Reddit's Kafka-based solution to fix data differences
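
Here is a rough sketch of that reconciliation consumer using the kafka-python client; the topic name and the two callbacks are assumptions, not Reddit's actual implementation.

```python
import json

from kafka import KafkaConsumer  # kafka-python client


def run_reconciler(load_from_metadata_store, record_difference):
    """Validate source-DB change events against the new metadata store."""
    consumer = KafkaConsumer(
        "source-db-change-events",        # hypothetical CDC topic
        bootstrap_servers="localhost:9092",
        value_deserializer=json.loads,
    )
    for event in consumer:
        change = event.value              # row as written to the source database
        expected = change["data"]
        actual = load_from_metadata_store(change["post_id"])
        if actual != expected:
            # Write the discrepancy to a separate table so engineers
            # can query and analyze the data issues later.
            record_difference(change["post_id"], expected, actual)
```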

3. Scaling Strategies

Media metadata was optimised for reads.

At 100k requests per second, read latency was 2.6 ms at p50, 4.7 ms at p90, and 17 ms at p99, which is 50% faster than the previous data system.

3a. Table Partitioning

The size of media metadata is projected to reach roughly 50 TB by 2030.

To address this, they implemented table partitioning in Postgres using the partition management extension pg_partman.

They chose range-based partitioning (with post_id as the partition key) instead of hash-based partitioning.

Since post_id increases monotonically with time, range-based partitioning splits the table into distinct time periods (see the DDL sketch after the list below).

This approach offers two important advantages:

  1. Most read operations are for recently created posts, allowing the Postgres engine to cache the indexes of the most recent partitions and minimize disk I/O. The hot working set remains in memory, enhancing query performance.

  2. Many batch read requests target multiple post IDs from the same time period, so the data is likely retrieved from a single partition.
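
A minimal sketch of range partitioning on post_id, using plain Postgres DDL executed from Python. The table name, ID ranges, and connection string are assumptions; in practice pg_partman manages partition creation rather than the hand-written partitions shown here.

```python
# Range partitioning on post_id: each partition covers a contiguous ID range,
# which maps to a time period because post_id grows monotonically.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS media_metadata (
    post_id  BIGINT NOT NULL,
    metadata JSONB  NOT NULL,
    PRIMARY KEY (post_id)
) PARTITION BY RANGE (post_id);

CREATE TABLE IF NOT EXISTS media_metadata_p0
    PARTITION OF media_metadata FOR VALUES FROM (0) TO (10000000);
CREATE TABLE IF NOT EXISTS media_metadata_p1
    PARTITION OF media_metadata FOR VALUES FROM (10000000) TO (20000000);
"""

with psycopg2.connect("dbname=media user=postgres") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```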

3b. JSONB

Storing the media metadata in serialized JSONB format for reads effectively turned the table into a NoSQL-like key-value store (see the read sketch after the list below).

Benefits

  • It efficiently fetches all the fields together using a single key

  • It eliminates the need for joins, simplifying the querying logic
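
A minimal sketch of the resulting key-value read path, reusing the hypothetical media_metadata table from the partitioning sketch above: one indexed lookup returns the full JSONB document per post, with no joins.

```python
import psycopg2


def get_media_metadata(post_ids):
    """Fetch the full JSONB metadata document for a batch of post IDs."""
    with psycopg2.connect("dbname=media user=postgres") as conn:
        with conn.cursor() as cur:
            # A single key lookup per post returns every metadata field,
            # so no joins are needed to assemble the response.
            cur.execute(
                "SELECT post_id, metadata FROM media_metadata"
                " WHERE post_id = ANY(%s)",
                (list(post_ids),),
            )
            return dict(cur.fetchall())  # {post_id: metadata dict}
```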

Which other database would you choose to optimise for reads?


Understanding the Foundational Principles is the key to effective learning!

Follow along to Improve System Design Skills.
