Database Scaling

TL;DR — Quick Summary

  • Vertical scaling = bigger hardware (limited, expensive). Horizontal scaling = more servers (complex, but scales much further).
  • Read replicas distribute read traffic — primary handles writes, replicas handle reads. Most apps benefit from this first.
  • Sharding splits data across servers — powerful but eliminates cross-shard JOINs and ACID transactions.
  • Scaling order: indexes → caching → read replicas → vertical scale → partitioning → sharding.

Lesson Overview

📈 When Your Database Becomes the Bottleneck

A single well-tuned PostgreSQL server can handle millions of queries per day. But as your application grows, you'll hit limits — too many concurrent connections, too much data for one disk, or write throughput that exceeds what a single server can handle.

Database scaling is about extending those limits. There are two fundamental directions:

  • Vertical Scaling (Scale Up): Bigger hardware — more CPU, more RAM, faster SSD. Simplest approach. Has a hard ceiling and gets expensive fast.
  • Horizontal Scaling (Scale Out): More servers — distribute load across multiple machines. Far more scalable, but introduces distributed system complexity.

🔄 Read Replicas — The First Step

For most applications, reads vastly outnumber writes (10:1, 100:1 ratios are common). Read replicas solve this by maintaining copies of the primary database that handle all read traffic. The primary handles only writes, streaming changes to replicas via the replication log.

🗂️ Sharding — Horizontal Write Scaling

Sharding partitions data across multiple independent database servers — each server (shard) owns a subset of the data. User IDs 1-1,000,000 go to Shard 1; 1,000,001-2,000,000 go to Shard 2, etc. Each shard is a full, independent database.

Sharding is powerful but introduces significant complexity: cross-shard JOINs are impossible, global transactions span multiple databases, and rebalancing shards when they get uneven is painful.
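The routing logic the application layer takes on can be sketched in a few lines. This is illustrative only — the shard count, the pool stand-ins, and the modulo scheme are assumptions, not a production design:

```javascript
// Minimal shard-routing sketch. Strings stand in for per-shard connection
// pools so the example is self-contained; real code would hold e.g. mysql2
// pools pointed at each shard's host.
const NUM_SHARDS = 4;
const shardPools = ['shard0', 'shard1', 'shard2', 'shard3'];

// Hash-based routing: every query for a given user must land on the same shard.
function shardFor(userId) {
  return shardPools[userId % NUM_SHARDS];
}

console.log(shardFor(42)); // 'shard2' — single-user queries route cleanly
// But a query spanning users on different shards (a JOIN, a global aggregate)
// now requires fanning out to every shard and merging results in the app.
```

Note the trade-off the sketch makes visible: single-key lookups stay simple, while anything cross-shard becomes an application-level scatter-gather.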

⚡ Caching — Keep Hot Data in Memory

Before scaling your database, add a cache layer. Redis or Memcached can handle millions of reads per second from memory. If 90% of traffic is reading the same product pages or user profiles, caching eliminates 90% of database queries entirely.
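The pattern behind this is usually cache-aside: check the cache first, and only fall through to the database on a miss. A minimal sketch, using an in-memory Map as a stand-in for Redis and a simulated DB call (the `fetchProductFromDb` helper and TTL value are made up for illustration):

```javascript
// Cache-aside sketch. A Map stands in for Redis so the example is
// self-contained; fetchProductFromDb simulates a real database query.
const cache = new Map();
const TTL_MS = 60_000; // hypothetical 60-second time-to-live

let dbQueries = 0; // counts how often we actually hit the "database"
function fetchProductFromDb(id) {
  dbQueries++;
  return { id, name: `Product ${id}` };
}

function getProduct(id) {
  const hit = cache.get(id);
  if (hit && hit.expires > Date.now()) return hit.value; // cache hit: no DB query
  const value = fetchProductFromDb(id);                  // cache miss: query the DB...
  cache.set(id, { value, expires: Date.now() + TTL_MS }); // ...then populate the cache
  return value;
}

getProduct(1); getProduct(1); getProduct(1);
console.log(dbQueries); // 1 — two of the three reads never touched the database
```

The same shape works with a real Redis client; the extra concern in production is invalidation — deleting or overwriting the cached entry whenever the underlying row changes.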

Scaling sequence: 1) Optimize queries + indexes. 2) Add caching. 3) Add read replicas. 4) Vertical scale. 5) Consider sharding. Each step is an order of magnitude more complex than the last — don't skip ahead.

Conceptual Deep Dive

Think of a read replica like a library with multiple copies of popular books. If thousands of people want to read 'Harry Potter,' instead of one person accessing the original at a time, you print 10 copies. Each replica is a copy of the primary — reads are distributed across all copies, dramatically increasing read throughput.

Sharding is like physically dividing the library: Location A holds books A-M, Location B holds books N-Z. Two locations means twice the storage and twice the write throughput. But if you want to search across ALL books, you need to query both locations and merge the results — complexity grows fast.

Architecture & Data Flow

Production Scaling Architecture (diagram)
Scaling Decision Tree (diagram)

Implementation Lab

Read Replica Routing in Application Code
const mysql = require('mysql2/promise');

// Primary: ALL writes go here
const primaryPool = mysql.createPool({
  host: process.env.DB_PRIMARY_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: 'myapp',
  connectionLimit: 10
});

// Read replicas: ALL reads go here (load balanced via round-robin DNS or app logic)
const replicaPool = mysql.createPool({
  host: process.env.DB_REPLICA_HOST,  // can be a DNS round-robin to multiple replicas
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: 'myapp',
  connectionLimit: 50  // replicas can handle more concurrent reads
});

// Route queries based on operation type
async function getUser(id) {
  const [rows] = await replicaPool.execute('SELECT * FROM users WHERE id = ?', [id]);
  return rows[0];
}

async function createUser(data) {
  // Writes ALWAYS go to the primary
  const [result] = await primaryPool.execute(
    'INSERT INTO users (name, email) VALUES (?, ?)',
    [data.name, data.email]
  );
  return result.insertId;
}

// IMPORTANT: read-after-write consistency problem!
// After createUser(), immediately calling getUser() via a replica
// might return 'not found' due to replication lag.
// Solution: read from the PRIMARY immediately after a write:
async function createAndFetchUser(data) {
  const id = await createUser(data);
  // Read from the PRIMARY to avoid replication lag
  const [rows] = await primaryPool.execute('SELECT * FROM users WHERE id = ?', [id]);
  return rows[0];
}
Connection Pooling — Efficient Connection Management
const { Pool } = require('pg');  // PostgreSQL

// Without pooling: each request opens/closes a DB connection.
// Opening a connection takes 50-200ms — terrible for high traffic.

// WITH pooling: reuse a pool of pre-established connections
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,                        // max connections in pool
  idleTimeoutMillis: 30000,       // close idle connections after 30s
  connectionTimeoutMillis: 2000,  // time out if no connection is available within 2s
});

// Connections are automatically returned to the pool after each query
async function getProducts(category) {
  // 'pool.query' automatically acquires and releases a connection
  const result = await pool.query(
    'SELECT * FROM products WHERE category = $1 LIMIT 50',
    [category]
  );
  return result.rows;
}

// Monitor pool health
pool.on('error', (err) => {
  console.error('Unexpected error on idle client', err);
});

console.log(`Pool: ${pool.totalCount} total, ${pool.idleCount} idle, ${pool.waitingCount} waiting`);
Database-Level Partitioning (PostgreSQL)
-- Table partitioning: split one large table into smaller physical partitions
-- but query them as one table
 
-- Create partitioned parent table (orders, partitioned by year)
CREATE TABLE orders (
  id          BIGINT,
  user_id     INT NOT NULL,
  total       DECIMAL(10, 2),
  created_at  TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
 
-- Create individual partitions
CREATE TABLE orders_2023
  PARTITION OF orders
  FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024
  PARTITION OF orders
  FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- PostgreSQL automatically routes queries to the right partition
-- Query only hits the orders_2024 partition (fast!)
SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';
 
-- Drop old data instantly (vs slow DELETE)
-- Dropping a partition is instantaneous, even for 100M rows
DROP TABLE orders_2022;  -- instantly removes all 2022 orders

Best Practices — Interactive Comparison

Use a connection pool — opening a new DB connection per request is catastrophic at scale

// Creates a brand new DB connection on EVERY request.
// At 500 req/sec = 500 simultaneous connections.
// DB connection limit hit = errors for all users.
app.get('/users/:id', async (req, res) => {
  const conn = await mysql.createConnection(dbConfig);
  // Each connection takes 50-200ms to open
  const [rows] = await conn.execute(
    'SELECT * FROM users WHERE id = ?', [req.params.id]
  );
  await conn.end(); // connection destroyed immediately
  res.json(rows[0]);
});

Pro Tips — Senior Dev Insights

1. Use PgBouncer (PostgreSQL) or ProxySQL (MySQL) as a connection-pooling proxy — they can multiplex thousands of app connections onto just 20-50 actual database connections.
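A minimal pgbouncer.ini sketch of that setup — hostnames and sizes here are placeholders, not recommendations:

```ini
; pgbouncer.ini — illustrative values only
[databases]
myapp = host=10.0.0.5 port=5432 dbname=myapp

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; multiplex clients per transaction, not per session
max_client_conn = 2000    ; app-side connections PgBouncer will accept
default_pool_size = 20    ; actual connections opened to Postgres
```

The app connects to port 6432 instead of Postgres directly; PgBouncer hands each transaction one of the 20 real connections and reclaims it when the transaction ends.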

2. pg_partman is a PostgreSQL extension that automatically creates and maintains date-range partitions — perfect for logs, events, and time-series data.

3. For read-heavy workloads, consider materialized views — precomputed query results stored as a table, refreshed on a schedule. A 5-second complex report query becomes a 0.001s table scan.
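As a sketch — the view and column names below are made up for illustration, reusing the orders table from the partitioning example:

```sql
-- Materialized view sketch: precompute an expensive aggregate once,
-- then serve reads from the stored result.
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', created_at) AS day,
       SUM(total) AS revenue,
       COUNT(*)  AS orders
FROM orders
GROUP BY 1;

-- Reads are now a cheap scan of the precomputed table
SELECT * FROM daily_sales WHERE day = '2024-06-01';

-- Refresh on a schedule (cron, pg_cron, etc.). The CONCURRENTLY variant
-- avoids blocking readers but requires a unique index on the view.
REFRESH MATERIALIZED VIEW daily_sales;
```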

4. AWS Aurora, Google Cloud Spanner, and CockroachDB are 'NewSQL' databases — relational + ACID, but designed for horizontal scaling from the ground up. Worth considering before building custom sharding.

⚖️ Scaling Strategies Compared

| Feature                      | Caching | Read Replicas | Vertical Scale | Partitioning | Sharding  |
|------------------------------|---------|---------------|----------------|--------------|-----------|
| Improves read performance    | ✓       | ✓             | ✓              | ✓            | ✓         |
| Improves write performance   | ✗       | ✗             | ✓              | ✓            | ✓         |
| Complexity                   | Low     | Low           | None           | Medium       | Very High |
| Maintains cross-data JOINs   | ✓       | ✓             | ✓              | ✓            | ✗         |
| Maintains ACID transactions  | ✓       | ✓             | ✓              | ✓            | ✗         |
| Cost                         | Low     | Medium        | High           | Low          | Very High |

Common Developer Pitfalls

  • Premature sharding — the vast majority of applications never need it. Optimizing queries + caching + read replicas handles 99% of scaling needs.
  • Not using connection pooling — without it, high concurrency exhausts database connections and degrades performance.
  • Ignoring replication lag — reading from replicas immediately after writes causes read-after-write inconsistency that confuses users.
  • Picking a bad shard key — using a low-cardinality key (like status or country) creates 'hot shards' where all traffic goes to one shard.

Interview Mastery

Q: What is database replication, and what problems does it solve?

Database replication maintains one or more replica copies of the primary database on separate servers. The primary streams all changes (via the binlog in MySQL, the WAL in PostgreSQL) to replicas, which apply them asynchronously. It solves: (1) Read scalability — distribute read queries across multiple servers instead of hammering one, (2) High availability — if the primary fails, a replica can be promoted to primary in minutes (failover), (3) Geographic distribution — put replicas close to users in different regions to reduce latency.

Q: What is sharding, and what are its drawbacks?

Sharding partitions data across multiple independent database servers by a shard key (e.g., user_id % number_of_shards). It enables horizontal write scaling beyond what one server can handle. Drawbacks: (1) Cross-shard JOINs are impossible — data in different shards can't be joined with SQL, (2) No cross-shard transactions — ACID guarantees only work within one shard, (3) Uneven distribution — some shards ('hot shards') can become overloaded if the shard key isn't chosen carefully, (4) Schema changes must be applied to every shard, (5) Application-level routing logic adds complexity.

Q: Why is connection pooling important at scale?

Opening a database connection is expensive — 50-200ms for authentication, TLS negotiation, and session setup. Without pooling, a server handling 1,000 concurrent requests opens 1,000 connections, which can exceed the database's maximum connection limit and exhaust memory. Connection pooling maintains a pre-established pool of connections that are reused across requests. When a query completes, the connection returns to the pool for the next request. Tools like PgBouncer (PostgreSQL) and ProxySQL (MySQL) act as pooling proxies, multiplexing thousands of application connections onto dozens of actual database connections.

Real-World Blueprint

"Shopify handles billions of dollars in sales during Black Friday. Their database scaling strategy uses a tiered approach: (1) Redis caches product pages, cart data, and session data — absorbing 95% of traffic. (2) Read replicas handle catalog browse queries. (3) The primary database handles only order writes. (4) Separate database shards per merchant for large enterprise accounts. They've spoken publicly about this 'pods' architecture — each merchant's data is isolated in its own shard, preventing one large merchant from affecting others' performance."

Hands-on Lab Exercises

1. Set up a PostgreSQL primary + replica using Docker Compose. Write a script that routes all writes to the primary and reads to the replica.
2. Implement Redis caching for a database query: measure response time with a cold cache, a warm cache, and after invalidation.
3. Create a partitioned PostgreSQL table by date range. Insert 1 million rows, then run EXPLAIN ANALYZE to see partition pruning in action.
4. Simulate replication lag: on a replica, run a slow query and watch the replication lag metric increase in pg_stat_replication.

Real-World Practice Scenarios

Your app's users table has 50M rows and profile page loads take 3 seconds. Walk through the scaling sequence you'd apply, from simplest to most complex.

After adding read replicas, users report seeing their profile update on the edit page but then seeing old data when navigating away. What's happening and how do you fix it?

You're designing a multi-tenant SaaS with 10,000 business customers. Some large customers have millions of records. How do you design the data isolation and scaling strategy?

Your Black Friday traffic is 50x normal load, but only for 6 hours. You can't shard just for that — what scaling strategies work for predictable traffic spikes?