Mastering Database Scalability: Figma's Approach to Vertical Partitioning, Explained

A Deep Dive into How Figma Strategically Vertically Partitioned Their Database to Improve Performance

Figma used this strategy to vertically partition their database.

Learn the tradeoffs for your next System Design Interview 👇

Definition

Vertical partitioning splits a database by moving groups of tables and/or columns into a separate database.
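As a minimal sketch of the idea, the snippet below "partitions" one table into its own database. SQLite stands in for Postgres here, and the table names (`users`, `figma_files`) are hypothetical, not Figma's actual schema:

```python
import sqlite3

# Original monolithic database with two tables.
main_db = sqlite3.connect(":memory:")
main_db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
main_db.execute("CREATE TABLE figma_files (id INTEGER PRIMARY KEY, owner_id INTEGER)")

# Suppose figma_files is the hot table: give it its own database.
files_db = sqlite3.connect(":memory:")
files_db.execute("CREATE TABLE figma_files (id INTEGER PRIMARY KEY, owner_id INTEGER)")

# Copy the rows over, then drop the table from the original database.
rows = main_db.execute("SELECT id, owner_id FROM figma_files").fetchall()
files_db.executemany("INSERT INTO figma_files VALUES (?, ?)", rows)
main_db.execute("DROP TABLE figma_files")
```

After this, queries against `figma_files` must be routed to the second database — which is exactly the coordination problem the rest of the article is about.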

Background

Initially, Figma ran on a single large Amazon RDS database. During peak traffic they observed upwards of 65% CPU utilization, and database latencies became increasingly unpredictable.

First Tactical Fixes

  1. Upgrade database to the largest instance available (from r5.12xlarge to r5.24xlarge) to maximize CPU utilization runway

  2. Create multiple read replicas to scale read traffic

  3. Establish new databases for new use cases to limit growth on original database

  4. Add PgBouncer as a connection pooler to limit the impact of a growing number of connections

On further analysis, they learned:

  • Writes contributed a significant portion of database utilization.

  • Not all reads could be moved to replicas due to application sensitivity to replication lag.

Hence, they still needed to offload more work from the original database.
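The read-replica routing from fix #2, combined with the replication-lag caveat above, can be sketched as a simple query router. The `route` helper and instance names are illustrative assumptions, not Figma's actual code:

```python
import random

PRIMARY = "primary"
REPLICAS = ["replica-1", "replica-2"]

def route(sql: str, lag_sensitive: bool = False) -> str:
    """Pick which database instance should serve this query."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    if is_read and not lag_sensitive:
        # Reads that tolerate replication lag can fan out to replicas.
        return random.choice(REPLICAS)
    # Writes, and reads sensitive to replication lag, stay on the primary.
    return PRIMARY
```

This is why replicas alone weren't enough: every write, plus every lag-sensitive read, still lands on the primary.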

Possible Solutions

1. Horizontally Scaling the Database

They observed that many popular managed solutions are not natively compatible with Postgres.

So, they would either have to find a Postgres-compatible managed solution, or self-host.

2. Migrating to NoSQL

It would require

  • Complex double read and write migration

  • Significant application side changes

3. Postgres-compatible NewSQL

They would’ve had one of the largest single-cluster footprints for cloud-managed distributed Postgres.

So they didn’t want to bear the burden of being the first customer to hit certain scaling issues while having little control over managed solutions.

4. Self Hosting

It would mean

  • Upfront work to acquire the training, knowledge, and skills to support self-hosting.

  • Large operational cost

5. Vertically Partition the DB

They would move groups of tables onto their own databases.

This proved to have both short- and long-term benefits:

  1. It immediately relieves load on the original database

  2. It provides a path forward for horizontally sharding subsets of tables in the future

How they implemented Vertical Partitioning

They first needed to identify which tables to partition into their own database. They chose tables based on two factors:

  1. Impact: Moving the tables should move a significant portion of workload

  2. Isolation: The tables should not be strongly connected to other tables

1. Measuring impact

  • They looked at Average Active Sessions (AAS) for queries (average number of active threads dedicated to a given query at a certain point in time).

  • They calculated this by querying pg_stat_activity at 10-millisecond intervals to identify CPU waits associated with each query, then aggregating the results by table name.
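The aggregation step can be sketched as follows. Each sample is one poll of active queries (as pg_stat_activity would report them); the table-name extraction here is a naive stand-in for real query parsing, and the table names are hypothetical:

```python
from collections import defaultdict

# Three ~10 ms polls of currently active queries (simulated).
samples = [
    ["SELECT * FROM files", "UPDATE files SET x = 1", "SELECT * FROM users"],
    ["SELECT * FROM files"],
    ["UPDATE files SET x = 1", "SELECT * FROM users"],
]

def table_of(query: str) -> str:
    # Naive: take the word after FROM or UPDATE.
    words = query.split()
    for kw in ("FROM", "UPDATE"):
        if kw in words:
            return words[words.index(kw) + 1]
    return "unknown"

def average_active_sessions(samples):
    """Average number of active sessions attributed to each table."""
    totals = defaultdict(int)
    for sample in samples:          # one sample = one poll
        for query in sample:
            totals[table_of(query)] += 1
    return {t: n / len(samples) for t, n in totals.items()}
```

Here `files` averages 4/3 active sessions against 2/3 for `users`, making it the higher-impact candidate.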

2. Measuring isolation

  • Figma's backend is Ruby-based and uses ActiveRecord to write queries

  • They created validators that hooked into ActiveRecord and sent production query and transaction information to Snowflake, looking for queries and transactions that consistently referenced the same group of tables

  • If these workloads turned out to be costly, those tables would be identified as prime candidates for vertical partitioning
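One way to score isolation from that transaction data is to measure what fraction of transactions touching a candidate group stay entirely inside it. This is a sketch under assumed data shapes (sets of table names per transaction), not Figma's actual analysis:

```python
# Tables touched per transaction, as the validators might report them.
transactions = [
    {"files", "file_versions"},
    {"files", "file_versions"},
    {"users", "orgs"},
    {"files", "users"},   # cross-group transaction: hurts isolation
]

def isolation_score(candidate_group, transactions):
    """Fraction of transactions touching the group that stay inside it."""
    touching = [t for t in transactions if t & candidate_group]
    if not touching:
        return 1.0
    isolated = [t for t in touching if t <= candidate_group]
    return len(isolated) / len(touching)
```

A group scoring close to 1.0 can be moved to its own database without breaking cross-database transactions; here `{files, file_versions}` scores 2/3 because one transaction also touches `users`.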

Table Migration

Data movement had to be coordinated across thousands of application backend instances, so that queries could be routed to the new database at the correct moment.

They wanted a solution that met the following goals:

  1. Limit potential availability impact to <1 minute

  2. Automate the procedure so it is easily repeatable

  3. Have the ability to undo a recent partition

Solution

At a high level, they implemented the following

  1. Prepare client applications to query from multiple database partitions

  2. Replicate tables from original database to a new database until replication lag is near 0

  3. Pause activity on original database

  4. Wait for databases to synchronize

  5. Reroute query traffic to the new database

  6. Resume activity
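The steps above can be sketched as an orchestration skeleton with a rollback path. The function and its callbacks are hypothetical simplifications; the real coordination spans thousands of backend instances and PgBouncer:

```python
def cutover(log, replication_synced=lambda: True):
    """Run the cutover steps in order; roll back if sync fails."""
    log.append("prepare clients for multiple partitions")
    log.append("replicate tables until lag near 0")
    log.append("pause traffic on original database")
    if not replication_synced():
        # Undo path: traffic resumes against the original database.
        log.append("roll back: resume traffic on original")
        return False
    log.append("reroute queries to new database")
    log.append("resume traffic")
    return True
```

Note that traffic is paused before the sync check, so a failed synchronization leaves the original database authoritative — which is what makes the operation undoable.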

1. Preparing client application

They leveraged PgBouncer, creating separate PgBouncer services to virtually partition traffic.

Partitioning the PgBouncer layer first would give clients leeway to route queries incorrectly.

They would be able to detect the routing mismatch, but since both PgBouncers have the same target database, the client would still successfully query data.


The start state looks as follows:

State of the database after partitioning PgBouncer

State of the database after partitioning data

Once they verified that applications were prepared with separate connections for each PgBouncer (and were sending traffic appropriately), they proceeded.

2. Data Replication

There are two ways to replicate data: Streaming Replication or Logical Replication.

They chose logical replication because it allowed them to:

  1. Port over a subset of tables, so they could start with a much smaller storage footprint in the destination database

  2. Replicate to a database running a different Postgres major version, meaning they could perform minimal-downtime major version upgrades with this tooling

  3. Set up reverse replication, which allowed them to roll back the operation

With terabytes of production data, logical replication was quite slow: it copies rows in bulk but updates indexes inefficiently, one row at a time.

So they removed indexes in the destination database and rebuilt them after the initial copy, reducing the copy time to a matter of hours.
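The ordering of that optimization can be sketched as a plan generator. The SQL statement strings and index names are illustrative placeholders, not the exact DDL Figma ran:

```python
def copy_plan(table, indexes):
    """Order of operations: drop indexes, bulk copy, then rebuild."""
    plan = [f"DROP INDEX {idx}" for idx in indexes]                 # before copy
    plan.append(f"COPY {table} FROM source")                        # bulk row copy
    plan += [f"CREATE INDEX {idx} ON {table}" for idx in indexes]   # rebuild once
    return plan
```

Rebuilding an index once over the full table is far cheaper than maintaining it incrementally for every copied row, which is where the speedup comes from.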

Modifications to the new database were also replicated back to the old database (via a reverse replication stream), in case they needed to roll back.

3. Coordinating Query Rerouting

They performed the partitioning operation in two phases (first partitioning the PgBouncers, then the data).

Overview of the operation:

  • They briefly stopped all relevant database traffic so that logical replication could synchronize the new database.

  • PgBouncer paused new connections and revoked clients’ query privileges on the partitioned tables in the original database.

  • After a brief grace period, they canceled any remaining in-flight queries.

  • With traffic paused, they used LSNs (Log Sequence Numbers) to determine whether the two databases were synchronized. They sampled an LSN from the original database and waited for the replica to replay past it; at that point, the data was identical in both the original database and the replica.
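The LSN check can be sketched in code. Postgres prints LSNs as two hex halves (e.g. `16/B374D848`) that compare as a 64-bit integer; the polling helper below is a simplified assumption about how the wait might be structured:

```python
import time

def lsn_to_int(lsn: str) -> int:
    """Convert a textual Postgres LSN like '16/B374D848' to a 64-bit int."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def wait_until_synced(target_lsn, get_replayed_lsn, poll_s=0.01, timeout_s=1.0):
    """Poll the replica until its replayed LSN passes the sampled target."""
    deadline = time.monotonic() + timeout_s
    target = lsn_to_int(target_lsn)
    while time.monotonic() < deadline:
        if lsn_to_int(get_replayed_lsn()) >= target:
            return True   # replica has replayed past the sampled point
        time.sleep(poll_s)
    return False          # timed out: abort and roll back the operation
```

Because writes are paused, the sampled LSN is the last change the replica ever needs to replay, so passing it means the two databases hold identical data.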

A visualization of their synchronization mechanism


After checking that the replica was synchronized, they stopped replication and promoted the replica to a standalone database.

Reverse replication was set up as previously mentioned. Then they resumed traffic in PgBouncer, with queries now routed to the new database.

A summary of the procedure

Conclusion

They observed a ~30 second period of partial availability impact (~2% of requests dropped).

Each database partition is operating with greatly increased headroom. Their largest partition has CPU utilization hovering around 10%.


Understanding the Foundational Principles is the key to Effective Learning!

Follow along to Improve System Design Skills.

