Database Scaling
TL;DR — Quick Summary
- Vertical scaling = bigger hardware (limited, expensive). Horizontal scaling = more servers (complex, unlimited).
- Read replicas distribute read traffic — primary handles writes, replicas handle reads. Most apps benefit from this first.
- Sharding splits data across servers — powerful but gives up cross-shard JOINs and cross-shard ACID transactions.
- Scaling order: indexes → caching → read replicas → vertical scale → partitioning → sharding.
Lesson Overview
📈 When Your Database Becomes the Bottleneck
A single well-tuned PostgreSQL server can handle millions of queries per day. But as your application grows, you'll hit limits — too many concurrent connections, too much data for one disk, or write throughput that exceeds what a single server can handle.
Database scaling is about extending those limits. There are two fundamental directions:
- Vertical Scaling (Scale Up): Bigger hardware — more CPU, more RAM, faster SSD. Simplest approach. Has a hard ceiling and gets expensive fast.
- Horizontal Scaling (Scale Out): More servers — distribute load across multiple machines. Far more scalable, but introduces distributed system complexity.
🔄 Read Replicas — The First Step
For most applications, reads vastly outnumber writes (10:1, 100:1 ratios are common). Read replicas solve this by maintaining copies of the primary database that handle all read traffic. The primary handles only writes, streaming changes to replicas via the replication log.
🗂️ Sharding — Horizontal Write Scaling
Sharding partitions data across multiple independent database servers — each server (shard) owns a subset of the data. User IDs 1-1,000,000 go to Shard 1; 1,000,001-2,000,000 go to Shard 2, etc. Each shard is a full, independent database.
Sharding is powerful but introduces significant complexity: cross-shard JOINs are impossible, global transactions span multiple databases, and rebalancing shards when they get uneven is painful.
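The routing step can be sketched as a pure function. This is a minimal illustration, not a production router — the shard count and range size below are hypothetical values, not recommendations:

```javascript
// Minimal shard-routing sketch — NUM_SHARDS and the 1M-row range
// size are illustrative assumptions, not from a real deployment.
const NUM_SHARDS = 4;

// Range-based: user IDs 1-1,000,000 -> shard 0, next million -> shard 1, ...
function rangeShard(userId) {
  return Math.floor((userId - 1) / 1_000_000) % NUM_SHARDS;
}

// Hash-based: spreads sequential IDs evenly across shards,
// at the cost of turning ID-range scans into cross-shard queries
function hashShard(userId) {
  return userId % NUM_SHARDS;
}

console.log(rangeShard(999_999));   // still in the first million -> shard 0
console.log(rangeShard(1_000_001)); // second million -> shard 1
```

Range-based keys keep adjacent rows together (good for range scans, bad for hot spots on new IDs); hash-based keys balance load but scatter ranges — the rebalancing pain mentioned above applies to both.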
⚡ Caching — Keep Hot Data in Memory
Before scaling your database, add a cache layer. Redis or Memcached can handle millions of reads per second from memory. If 90% of traffic is reading the same product pages or user profiles, caching eliminates 90% of database queries entirely.
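The pattern described here is cache-aside: check the cache first, fall back to the database on a miss, then populate the cache. A plain `Map` stands in for Redis in this sketch so it is self-contained; in production you would swap in a Redis client and give entries a TTL:

```javascript
// Cache-aside sketch — a Map stands in for Redis (assumption),
// and fetchProductFromDb is a stand-in for a real DB query.
const cache = new Map();

async function fetchProductFromDb(id) {
  return { id, name: `product-${id}` }; // pretend this cost a DB round trip
}

async function getProduct(id) {
  const key = `product:${id}`;
  if (cache.has(key)) return cache.get(key);    // hit: no DB query at all
  const product = await fetchProductFromDb(id); // miss: hit the DB once
  cache.set(key, product);                      // later reads served from memory
  return product;
}

// Invalidate on write so this node never serves stale data
async function updateProduct(id, fields) {
  // ...write to the primary database here...
  cache.delete(`product:${id}`);
}
```

The delete-on-write step is what makes the 90% claim safe — without invalidation, cached reads drift from the database after every update.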
Scaling sequence: 1) Optimize queries + indexes. 2) Add caching. 3) Add read replicas. 4) Vertical scale. 5) Consider sharding. Each step is an order of magnitude more complex than the last — don't skip ahead.
Conceptual Deep Dive
Think of a read replica like a library with multiple copies of popular books. If thousands of people want to read 'Harry Potter,' instead of one person accessing the original at a time, you print 10 copies. Each replica is a copy of the primary — reads are distributed across all copies, dramatically increasing read throughput.
Sharding is like physically dividing the library: Location A has books A-M, Location B has books N-Z. Twice the storage, twice the write throughput. But if you want to search across ALL books, you need to query both locations and merge the results — complexity grows fast.
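The "query both locations and merge the results" step is a scatter-gather, and it moves work that SQL used to do into the application. A sketch with stubbed shards (the shard data here is invented for illustration):

```javascript
// Scatter-gather sketch: fan the same query out to every shard,
// then merge, re-sort, and apply the LIMIT in the application.
// The shard contents below are invented example data.
const shards = [
  { query: async () => [{ title: 'Dune', loans: 40 }, { title: 'Emma', loans: 12 }] },
  { query: async () => [{ title: 'Neuromancer', loans: 25 }] },
];

async function topBooks(limit) {
  // scatter: run the query on all shards in parallel
  const partials = await Promise.all(shards.map((s) => s.query()));
  // gather: a global ORDER BY ... LIMIT must be redone in app code
  return partials.flat()
    .sort((a, b) => b.loans - a.loans)
    .slice(0, limit);
}
```

Note that each shard must return its own top `limit` rows for this merge to be correct — a detail the database handled for free before sharding.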
Architecture & Data Flow
Implementation Lab
const mysql = require('mysql2/promise');
// Primary: ALL writes go here
const primaryPool = mysql.createPool({
host: process.env.DB_PRIMARY_HOST,
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
database: 'myapp',
connectionLimit: 10
});
// Read replicas: ALL reads go here (load balanced via round-robin DNS or app logic)
const replicaPool = mysql.createPool({
host: process.env.DB_REPLICA_HOST, // can be a DNS round-robin to multiple replicas
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
database: 'myapp',
connectionLimit: 50 // replicas can handle more concurrent reads
});
// Route queries based on operation type
async function getUser(id) {
const [rows] = await replicaPool.execute('SELECT * FROM users WHERE id = ?', [id]);
return rows[0];
}
async function createUser(data) {
// Write ALWAYS goes to primaryALWAYS goes to primary
const [result] = await primaryPool.execute(
'INSERT INTO users (name, email) VALUES (?, ?)'INTO users (name, email) VALUES (?, ?)',
[data.name, data.email]
);
return result.insertId;
}
// IMPORTANT: Read-after-write consistency problem!
// After createUser(), immediately calling getUser() via replica
// might return 'not found' due to replication lag!
// Solution: read from PRIMARY immediately after a write:
async function createAndFetchUser(data) {
const id = await createUser(data);
// Read from PRIMARY to avoid replication lag
const [rows] = await primaryPool.execute('SELECT * FROM users WHERE id = ?', [id]);
return rows[0];
}
const { Pool } = require('pg'); // PostgreSQL
// Without pooling: each request opens/closes a DB connection
// Opening a connection takes 50-200ms — terrible for high traffic
// WITH pooling: reuse a pool of pre-established connections
const pool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20, // max connections in pool
idleTimeoutMillis: 30000, // close idle connections after 30s
connectionTimeoutMillis: 2000, // timeout if no connection available in 2s
});
// Connections are automatically returned to pool after each query
async function getProducts(category) {
// 'pool.query' automatically acquires and releases a connection
const result = await pool.query(
'SELECT * FROM products WHERE category = $1 LIMIT 50',
[category]
);
return result.rows;
}
// Monitor pool health
pool.on('error', (err) => {
console.error('Unexpected error on idle client', err);
});
console.log(`Pool: ${pool.totalCount} total, ${pool.idleCount} idle, ${pool.waitingCount} waiting`);
-- Table partitioning: split one large table into smaller physical partitions
-- but query them as one table
-- Create partitioned parent table (orders, partitioned by year)
CREATE TABLE orders (
id BIGINT,
user_id INT NOT NULL,
total DECIMAL(10, 2),
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create individual partitions
CREATE TABLE orders_2023
PARTITION OF orders
FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2024
PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
-- PostgreSQL automatically routes queries to the right partition
-- Query only hits orders_2024 partition (fast!)
SELECT * FROM orders
WHERE created_at >= '2024-06-01' AND created_at < '2024-07-01';
-- Drop old data instantly (vs slow DELETE)
-- Dropping a partition is instantaneous, even for 100M rows
DROP TABLE orders_2022; -- instantly removes all 2022 orders
Best Practices — Interactive Comparison
Use a connection pool — opening a new DB connection per request is catastrophic at scale
// Creates a brand new DB connection on EVERY request
// At 500 req/sec = 500 simultaneous connections
// DB connection limit hit = errors for all users
app.get('/users/:id', async (req, res) => {
const conn = await mysql.createConnection(dbConfig);
// Each connection takes 50-200ms to open
const [rows] = await conn.execute(
'SELECT * FROM users WHERE id = ?', [req.params.id]
);
await conn.end(); // connection destroyed immediately
res.json(rows[0]);
});
// Pool created ONCE at startup — shared across all requests
const pool = mysql.createPool({
...dbConfig,
connectionLimit: 20, // max 20 actual DB connections
waitForConnections: true,
queueLimit: 100
});
// Each request borrows a connection and returns it
app.get('/users/:id', async (req, res) => {
// No connection overhead — connection already open
const [rows] = await pool.execute(
'SELECT * FROM users WHERE id = ?', [req.params.id]
);
// Connection auto-returned to pool after query
res.json(rows[0]);
});
Pro Tips — Senior Dev Insights
Use PgBouncer (PostgreSQL) or ProxySQL (MySQL) as a connection pooling proxy — they can multiplex thousands of app connections onto just 20-50 actual database connections.
pg_partman is a PostgreSQL extension that automatically creates and maintains date-range partitions — perfect for logs, events, and time-series data.
For read-heavy workloads, consider materialized views — precomputed query results stored as a table, refreshed on a schedule. A 5-second complex report query becomes a 0.001s table scan.
AWS Aurora, Google Cloud Spanner, CockroachDB are 'NewSQL' databases — relational + ACID, but designed for horizontal scaling from the ground up. Worth considering before building custom sharding.
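The PgBouncer tip above might look like the following minimal config sketch. The host, database name, and pool sizes are placeholder values for illustration, not tuning advice:

```ini
; pgbouncer.ini sketch — all values here are illustrative placeholders
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
; transaction pooling: a server connection is held only for the duration
; of one transaction, so thousands of clients can share a few dozen
; real database connections
pool_mode = transaction
default_pool_size = 20
max_client_conn = 2000
```

Applications then connect to port 6432 instead of 5432; note that transaction pooling breaks session-level features like prepared statements held across transactions, so check your driver's settings before enabling it.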
⚖️ Scaling Strategies Compared
| Feature | Caching | Read Replicas | Vertical Scale | Partitioning | Sharding |
|---|---|---|---|---|---|
| Improves read performance | ✓ | ✓ | ✓ | ✓ | ✓ |
| Improves write performance | ✗ | ✗ | ✓ | Partial | ✓ |
| Complexity | Low | Low | None | Medium | Very High |
| Maintains cross-data JOINs | ✓ | ✓ | ✓ | ✓ | ✗ |
| Maintains ACID transactions | ✓ | ✓ | ✓ | ✓ | ✗ |
| Cost | Low | Medium | High | Low | Very High |
Common Developer Pitfalls
Premature sharding — the vast majority of applications never need it. Optimizing queries + caching + read replicas handles 99% of scaling needs.
Not using connection pooling — without it, high concurrency exhausts database connections and degrades performance.
Ignoring replication lag — reading from replicas immediately after writes causes read-after-write inconsistency that confuses users.
Picking a bad shard key — using a low-cardinality key (like status or country) creates 'hot shards' where all traffic goes to one shard.
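The shard-key pitfall is easy to demonstrate: route a sample of rows by a low-cardinality key and by a high-cardinality key, then compare the per-shard counts. The sample data and the tiny hash function below are invented for illustration:

```javascript
// Hot-shard sketch: compare shard balance for two candidate shard keys.
const NUM_SHARDS = 4;

// Tiny deterministic string hash — for illustration only, not production use
function hashString(s) {
  let h = 0;
  for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
  return h;
}

function shardCounts(rows, keyFn) {
  const counts = new Array(NUM_SHARDS).fill(0);
  for (const row of rows) counts[hashString(String(keyFn(row))) % NUM_SHARDS]++;
  return counts;
}

// 1,000 invented orders: 90% from one country (low-cardinality key),
// but each with a unique user_id (high-cardinality key)
const rows = Array.from({ length: 1000 }, (_, i) => ({
  user_id: i,
  country: i < 900 ? 'US' : 'DE',
}));

console.log(shardCounts(rows, (r) => r.country)); // one hot shard takes ~900 rows
console.log(shardCounts(rows, (r) => r.user_id)); // roughly even split
```

With `country` as the key, one shard absorbs 90% of the traffic no matter how clever the hash is; cardinality of the key, not the hash function, determines balance.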
Interview Mastery
Q: What is database replication and what problem does it solve?
Database replication maintains one or more replica copies of the primary database on separate servers. The primary streams all changes (via binlog in MySQL, WAL in PostgreSQL) to replicas, which apply them asynchronously. It solves: (1) Read scalability — distribute read queries across multiple servers instead of hammering one, (2) High availability — if the primary fails, a replica can be promoted to primary in minutes (failover), (3) Geographic distribution — put replicas close to users in different regions to reduce latency.
Q: What is database sharding and what are its drawbacks?
Sharding partitions data across multiple independent database servers by a shard key (e.g., user_id % number_of_shards). It enables horizontal write scaling beyond what one server can handle. Drawbacks: (1) Cross-shard JOINs are impossible — data in different shards can't be joined with SQL, (2) No cross-shard transactions — ACID guarantees only work within one shard, (3) Uneven distribution — some shards ('hot shards') can become overloaded if the shard key isn't chosen carefully, (4) Schema changes must be applied to every shard, (5) Application-level routing logic adds complexity.
Q: What is connection pooling and why is it essential?
Opening a database connection is expensive — 50-200ms for authentication, TLS negotiation, and session setup. Without pooling, a server handling 1,000 concurrent requests opens 1,000 connections, which can exceed the database's maximum connection limit and exhaust memory. Connection pooling maintains a pre-established pool of connections that are reused across requests. When a query completes, the connection returns to the pool for the next request. Tools like PgBouncer (PostgreSQL) and ProxySQL (MySQL) act as pooling proxies, multiplexing thousands of application connections onto dozens of actual database connections.
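The pool mechanics described here can be sketched generically: a fixed set of pre-created resources, an acquire that waits when all are busy, and a release that hands the resource straight to the next waiter. This is a toy illustration, not a replacement for `pg.Pool` or the `mysql2` pool:

```javascript
// Toy connection pool: pre-create N "connections" and recycle them.
class TinyPool {
  constructor(createConn, size) {
    this.idle = Array.from({ length: size }, () => createConn()); // pre-established
    this.waiters = [];
  }
  async acquire() {
    if (this.idle.length > 0) return this.idle.pop();
    // all connections busy: park the caller until someone releases
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release(conn) {
    const next = this.waiters.shift();
    if (next) next(conn);      // hand straight to the next waiter
    else this.idle.push(conn); // or return it to the idle set
  }
}
```

Real pools add health checks, idle timeouts, and a bounded wait queue — compare `idleTimeoutMillis`, `connectionTimeoutMillis`, and `queueLimit` in the lab code above.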
Real-World Blueprint
"Shopify handles billions of dollars in sales during Black Friday. Their database scaling strategy uses a tiered approach: (1) Redis caches product pages, cart data, and session data — absorbing 95% of traffic. (2) Read replicas handle catalog browse queries. (3) The primary database handles only order writes. (4) Separate database shards per merchant for large enterprise accounts. They've spoken publicly about this 'pods' architecture — each merchant's data is isolated in its own shard, preventing one large merchant from affecting others' performance."
Hands-on Lab Exercises
Set up a PostgreSQL primary + replica using Docker Compose. Write a script that routes all writes to primary and reads to replica.
Implement Redis caching for a database query: measure response time with cold cache, warm cache, and after invalidation.
Create a partitioned PostgreSQL table by date range. Insert 1 million rows, then run EXPLAIN ANALYZE to see partition pruning in action.
Simulate replication lag: on a replica, run a slow query and watch the replication lag metric increase in pg_stat_replication.
Real-World Practice Scenarios
Your app's users table has 50M rows and profile page loads take 3 seconds. Walk through the scaling sequence you'd apply, from simplest to most complex.
After adding read replicas, users report seeing their profile update on the edit page but then seeing old data when navigating away. What's happening and how do you fix it?
You're designing a multi-tenant SaaS with 10,000 business customers. Some large customers have millions of records. How do you design the data isolation and scaling strategy?
Your Black Friday traffic is 50x normal load, but only for 6 hours. You can't shard just for that — what scaling strategies work for predictable traffic spikes?
© 2026 DevHub Engineering • All Proprietary Rights Reserved
Generated on March 7, 2026 • Ver: 4.0.2
Document Class: Master Education
Confidential Information • Licensed to User