
Database Migrations


TL;DR — Quick Summary

  • Migrations are version-controlled, numbered SQL files that evolve your schema incrementally and reproducibly.
  • Every migration needs both up (apply) and down (rollback) functions.
  • Always add new columns as nullable first to avoid table locks on large tables.
  • Never modify an already-deployed migration — always create a new numbered migration file instead.

Lesson Overview

🚀 The Schema Change Problem

Your application's database schema will change constantly as features are added. The challenge: how do you change the schema in development, staging, and production reliably, with your whole team, without manual SQL scripts scattered in Slack messages?

Database migrations are version-controlled SQL scripts that evolve your schema step by step. Each migration is a discrete, numbered change — like a commit for your database structure.

📋 How Migration Tools Work

Tools like Flyway, Liquibase, Prisma Migrate, Knex, Alembic (Python), and ActiveRecord (Rails) all share the same core approach:

  1. Store migrations as numbered files in your repo (001_create_users.sql, 002_add_email_column.sql)
  2. Track which migrations have run in a special schema history table
  3. On deployment, run only new migrations that haven't been applied yet
  4. Each migration has an up function (apply the change) and a down function (reverse it)
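The "run only new migrations" step can be sketched in a few lines. This is a hypothetical in-memory model: `applied` stands in for the schema history table, and the file names are invented for illustration.

```javascript
// In-memory stand-in for the schema history table (what has already run).
const applied = new Set(['001_create_users', '002_add_email_column']);

// Stand-in for the numbered files found in the migrations/ directory.
const migrationFiles = [
  '003_add_status_column',
  '001_create_users',
  '004_create_posts',
  '002_add_email_column',
];

// Core loop shared by Flyway, Knex, Alembic, etc.: sort by number,
// skip anything already recorded, and return the rest in order.
function pendingMigrations(files, history) {
  return files
    .slice()                               // don't mutate the input
    .sort()                                // zero-padded prefixes sort correctly
    .filter((name) => !history.has(name));
}

console.log(pendingMigrations(migrationFiles, applied));
// [ '003_add_status_column', '004_create_posts' ]
```

Each new migration the tool runs is then recorded in the history table, so the next deploy starts from where this one left off.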

⚠️ Zero-Downtime Migrations

Adding a column is trivial in development. In production with 500 concurrent users, an ALTER TABLE on a large table can lock it for minutes. Zero-downtime migration strategies include:

  • Expand-Contract pattern: Add the new column (nullable), deploy code that writes to both, backfill existing data, then drop the old column.
  • Online DDL: MySQL 8+ and PostgreSQL support non-blocking index creation and column additions.
  • Shadow tables: Create the new table structure, migrate data in background, swap with atomic rename.

Rule: never run a raw ALTER TABLE that locks a large production table during peak hours. Plan zero-downtime migration strategy for any schema change on tables with >1M rows.

Conceptual Deep Dive

Think of migrations like Git commits for your database schema. Just as you wouldn't hand-edit code files directly on a production server, you don't manually run SQL ALTER TABLE statements in production. Instead, every schema change is a migration file — version controlled, peer reviewed, tested in staging, and automatically applied on deployment.

The migration history table is the source of truth: it records every migration that has ever been applied, with a timestamp and checksum. When you deploy, the tool reads this table, compares it to your migration files, and runs only the new ones in order.

Architecture & Data Flow

Migration Deployment Flow (diagram)
Expand-Contract Pattern — Zero Downtime Column Rename (diagram)

Implementation Lab

Knex.js Migrations — Up and Down
// migrations/20240115_001_create_users_table.js
exports.up = async function(knex) {
  await knex.schema.createTable('users', function(table) {
    table.increments('id').primary();
    table.string('name', 100).notNullable();
    table.string('email', 255).notNullable().unique();
    table.string('password_hash', 255).notNullable();
    table.enum('status', ['active', 'inactive', 'banned']).defaultTo('active');
    table.timestamps(true, true);  // created_at, updated_at
  });
};
 
exports.down = async function(knex) {
  await knex.schema.dropTableIfExists('users');
};
 
// --- Next migration ---
// migrations/20240120_002_add_profile_fields.js
exports.up = async function(knex) {
  await knex.schema.alterTable('users', function(table) {
    table.string('avatar_url').nullable();  // always nullable for new columns!
    table.text('bio').nullable();
    table.date('birthdate').nullable();
    table.index('status');  // add index
  });
};
 
exports.down = async function(knex) {
  await knex.schema.alterTable('users', function(table) {
    table.dropColumn('avatar_url');
    table.dropColumn('bio');
    table.dropColumn('birthdate');
    table.dropIndex('status');
  });
};
Flyway SQL Migrations
-- File: V1__Create_users_table.sql  (Flyway naming: V{version}__{description})
CREATE TABLE users (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  name       VARCHAR(100) NOT NULL,
  email      VARCHAR(255) NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  UNIQUE INDEX ux_users_email (email)
);
 
-- File: V2__Create_posts_table.sql
CREATE TABLE posts (
  id         BIGINT PRIMARY KEY AUTO_INCREMENT,
  user_id    BIGINT NOT NULL,
  title      VARCHAR(255) NOT NULL,
  body       TEXT,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
  INDEX idx_posts_user_id (user_id)
);
 
-- File: V3__Add_soft_delete_to_users.sql
-- Safe: nullable column addition doesn't require a full table lock
ALTER TABLE users
ADD COLUMN deleted_at TIMESTAMP NULL DEFAULT NULL,
ADD INDEX idx_users_deleted_at (deleted_at);
Zero-Downtime Migration — Rename a Column Safely
-- UNSAFE: Direct rename breaks any code still reading the old name
-- ALTER TABLE users RENAME COLUMN username TO display_name;
-- During a rolling deploy, servers running old code fail the moment this runs!
 
-- SAFE: Expand-Contract pattern (3 separate deployments)
 
-- PHASE 1: Add new column (nullable — no default needed, fast)
ALTER TABLE users ADD COLUMN display_name VARCHAR(100) NULL;
 
-- PHASE 2: Backfill existing data (run in batches to avoid long lock)
UPDATE users SET display_name = username
WHERE display_name IS NULL
LIMIT 10000;  -- run repeatedly in small batches
-- Application now writes to BOTH username AND display_name
 
-- PHASE 3: Make new column NOT NULL, drop old column
-- (only after app code only uses display_name)
ALTER TABLE users MODIFY display_name VARCHAR(100) NOT NULL;
ALTER TABLE users DROP COLUMN username;
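The PHASE 2 backfill is usually driven by a small script that repeats the batched UPDATE until no rows remain. A sketch with in-memory rows standing in for the table (row count and batch size are illustrative):

```javascript
// In-memory stand-in for a users table mid-migration.
const rows = Array.from({ length: 25 }, (_, i) => ({
  username: `user${i}`,
  display_name: null,
}));

// One batch = one "UPDATE users SET display_name = username
//               WHERE display_name IS NULL LIMIT batchSize".
// Returns the affected row count, like a real UPDATE does.
function backfillBatch(table, batchSize) {
  let updated = 0;
  for (const row of table) {
    if (updated >= batchSize) break;
    if (row.display_name === null) {
      row.display_name = row.username;
      updated++;
    }
  }
  return updated;
}

// Keep going until a batch touches zero rows; in production you would
// also pause between batches to let replication catch up.
let batches = 0;
while (backfillBatch(rows, 10) > 0) batches++;
console.log(`backfilled in ${batches} batches`); // 3 batches for 25 rows
```

Because each batch holds row locks only briefly, reads and writes continue throughout the backfill.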

Best Practices — Interactive Comparison

New columns must be nullable first — NOT NULL on a large table locks it for minutes in production

-- On engines without instant ADD COLUMN (MySQL < 8.0, PostgreSQL < 11),
-- this rewrites ALL 10 million rows to set the default,
-- locking the entire table for minutes while your app is degraded
ALTER TABLE users
ADD COLUMN display_name VARCHAR(100) NOT NULL DEFAULT '';
 
-- Users see: ERROR 1205 Lock wait timeout exceeded

Pro Tips — Senior Dev Insights

  1. Always back up production before running migrations: mysqldump -u root -p mydb > backup_pre_migration_$(date +%Y%m%d).sql

  2. For large tables, use tools designed for zero-downtime migrations: gh-ost (MySQL), pg_repack (PostgreSQL), or pt-online-schema-change (Percona).

  3. Use Prisma Migrate for TypeScript/Node projects — it generates migrations from your schema definition and keeps them in sync with your TypeScript types.

  4. Adopt a convention for migration naming: YYYYMMDD_NNN_description.sql makes it easy to see what changed and when, and sorts correctly in any file browser.
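The YYYYMMDD_NNN_description convention above is easy to enforce with a tiny helper (hypothetical function, zero-padding the sequence so names sort correctly):

```javascript
// Build a sortable migration filename: YYYYMMDD_NNN_description.sql
function migrationFilename(date, seq, description) {
  const ymd = date.toISOString().slice(0, 10).replace(/-/g, ''); // 20240115
  const nnn = String(seq).padStart(3, '0'); // zero-pad so 2 sorts before 10
  return `${ymd}_${nnn}_${description}.sql`;
}

console.log(migrationFilename(new Date('2024-01-15'), 1, 'create_users'));
// 20240115_001_create_users.sql
```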

⚖️ Popular Migration Tools

Feature                   | Flyway        | Liquibase        | Prisma Migrate   | Knex       | Alembic
Primary language          | SQL / Java    | SQL / XML / YAML | TypeScript       | JavaScript | Python
Auto-generates migrations | No            | Via diff tooling | Yes              | No         | Yes
Rollback support          | Pro only      | Yes              | Manual           | Yes        | Yes
Schema drift detection    | Via checksums | Yes              | Yes              | No         | Limited
Popular with              | Java / Spring | Enterprise       | Next.js / Prisma | Node.js    | Python / Flask

Common Developer Pitfalls

  • Editing deployed migration files — this breaks the checksum check and desynchronizes environments.

  • Adding NOT NULL columns without defaults to large production tables — causes a full table scan and lock for minutes.

  • Not taking a database backup before running production migrations — you lose the ability to restore if something goes wrong.

  • Running migrations manually instead of in CI/CD — leads to environment drift where staging and production have different schemas.

Interview Mastery

Q: Why should you never edit an already-deployed migration file?
Migration tools store a checksum of each migration file in the schema history table. If you modify a file, the checksum no longer matches what was applied. The migration tool will detect this and throw an error, refusing to run. In the best case, this blocks your CI/CD pipeline. In the worst case, if you somehow bypass the check, you'll have different schema states across environments with no way to reconcile them. New migrations should always be new numbered files.

Q: Explain the expand-contract pattern for zero-downtime schema changes.
The expand-contract (also called parallel change) pattern involves three phases: (1) Expand — add the new column/table alongside the old one (purely additive, no locks). (2) Migrate — update application code to write to both old and new structures simultaneously; backfill historical data in small batches. (3) Contract — once all data is migrated and no code references the old structure, drop the old column/table. This approach never locks tables and is fully reversible at each phase.

Q: How do you recover from a failed production migration?
Strategies depend on the migration type: (1) If it failed mid-run and the tool left the DB in a partial state, manually ROLLBACK or run the down migration. (2) If the migration was pure DDL (no data loss), write and run a corrective migration. (3) If data was corrupted, restore from the pre-migration backup (this is why you always take a backup before running migrations in production). (4) Fix the migration file, reset it in the schema history table, and re-run. Always keep a pre-migration backup for production deployments.

Real-World Blueprint

"GitHub runs hundreds of database migrations per year on tables with billions of rows. Their engineering blog has documented how adding a single column to the repositories table (with 40 million rows) required weeks of planning. They used GitHub's own tool gh-ost (GitHub's online schema migration tool for MySQL) to ghost-copy the table with the new schema, replicate changes in real time, and atomically swap tables — zero downtime, zero locks, zero data loss. The migration was a single migration file in their repo, reviewed by the DBA team, and executed automatically."

Hands-on Lab Exercises

  1. Set up Knex or Flyway in a Node.js or Java project. Create 5 sequential migrations building a complete blog schema.

  2. Practice the full rollback cycle: run migrations up, verify the schema, run migrations down, verify the rollback worked.

  3. Implement the expand-contract pattern on a sample table: add a new column, backfill it, then drop the old column across 3 migration files.

  4. Set up a CI pipeline (GitHub Actions) that automatically runs migrations on push to main, and fails the build if a migration fails.

Real-World Practice Scenarios

  1. You need to add a NOT NULL column with no default to a 10-million-row users table in production without downtime. What's your strategy?

  2. A developer modified an already-deployed migration file to fix a typo. The CI/CD pipeline is now failing with a checksum error. How do you fix it?

  3. You deployed a migration that renamed a column. The new deployment is failing. You need to roll back the code AND the migration. What's the sequence?

  4. Your app runs on 5 servers. During deployment, servers 1-2 are running the new code but servers 3-5 are still running old code. The new code expects a new column. How should the migration be timed?