Database Migrations
TL;DR: Quick Summary
- Migrations are version-controlled, numbered SQL files that evolve your schema incrementally and reproducibly.
- Every migration needs both up (apply) and down (rollback) functions.
- Always add new columns as nullable first to avoid table locks on large tables.
- Never modify an already-deployed migration; always create a new numbered migration file instead.
Lesson Overview
The Schema Change Problem
Your application's database schema will change constantly as features are added. The challenge: how do you change the schema in development, staging, and production reliably, with your whole team, without manual SQL scripts scattered in Slack messages?
Database migrations are version-controlled SQL scripts that evolve your schema step by step. Each migration is a discrete, numbered change, like a commit for your database structure.
How Migration Tools Work
Tools like Flyway, Liquibase, Prisma Migrate, Knex, Alembic (Python), and ActiveRecord (Rails) all share the same core approach:
- Store migrations as numbered files in your repo (001_create_users.sql, 002_add_email_column.sql)
- Track which migrations have run in a special schema history table
- On deployment, run only new migrations that haven't been applied yet
- Each migration has an up function (apply the change) and a down function (reverse it)
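The loop those steps describe can be sketched in plain JavaScript. This is a minimal model, not any particular tool's implementation: the `history` Set stands in for the schema history table, and `apply` stands in for actually executing a migration's up step.

```javascript
// Minimal sketch of the core loop shared by migration tools (assumed model).
async function runPending(migrationFiles, history, apply) {
  const pending = migrationFiles
    .filter((file) => !history.has(file.name))
    // Numbered filenames sort lexicographically into apply order.
    .sort((a, b) => a.name.localeCompare(b.name));
  for (const migration of pending) {
    await apply(migration);       // run the migration's `up` step
    history.add(migration.name);  // record it so it never runs twice
  }
  return pending.map((m) => m.name);
}
```

Because already-applied names are filtered out first, re-running the deploy step is safe: it is a no-op when nothing is pending.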
Zero-Downtime Migrations
Adding a column is trivial in development. In production, with 500 concurrent users, an ALTER TABLE on a large table can lock it for minutes. Zero-downtime migration strategies include:
- Expand-Contract pattern: Add new column (nullable), backfill data, update code to write to both, deploy, drop old column.
- Online DDL: MySQL 8+ and PostgreSQL support non-blocking index creation and column additions.
- Shadow tables: Create the new table structure, migrate data in background, swap with atomic rename.
Rule: never run a raw ALTER TABLE that locks a large production table during peak hours. Plan a zero-downtime migration strategy for any schema change on tables with >1M rows.
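The backfill step used by the expand-contract strategy can be sketched as a small driver loop. Here `updateBatch` is a hypothetical function standing in for an `UPDATE ... WHERE new_col IS NULL LIMIT n` statement; it must resolve to the number of rows it changed.

```javascript
// Sketch: run a backfill in small batches so no long lock is ever held.
async function backfillInBatches(updateBatch, batchSize = 5000) {
  let total = 0;
  for (;;) {
    const changed = await updateBatch(batchSize);
    if (changed === 0) return total; // nothing left: backfill complete
    total += changed;
    // In a real run you might pause here between batches to reduce load.
  }
}
```

The loop terminates exactly when a batch touches zero rows, which mirrors the "repeat until 0 rows updated" instruction in the SQL examples below.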
Conceptual Deep Dive
Think of migrations like Git commits for your database schema. Just as you wouldn't hand-edit code files directly on a production server, you don't manually run SQL ALTER TABLE statements in production. Instead, every schema change is a migration file: version controlled, peer reviewed, tested in staging, and automatically applied on deployment.
The migration history table is the source of truth: it records every migration that has ever been applied, with a timestamp and checksum. When you deploy, the tool reads this table, compares it to your migration files, and runs only the new ones in order.
Architecture & Data Flow
Implementation Lab
// migrations/20240115_001_create_users_table.js
exports.up = async function(knex) {
await knex.schema.createTable('users', function(table) {
table.increments('id').primary();
table.string('name', 100).notNullable();
table.string('email', 255).notNullable().unique();
table.string('password_hash', 255).notNullable();
table.enum('status', ['active', 'inactive', 'banned']).defaultTo('active');
table.timestamps(true, true); // created_at, updated_at
});
};
exports.down = async function(knex) {
await knex.schema.dropTableIfExists('users');
};
// --- Next migration ---
// migrations/20240120_002_add_profile_fields.js
exports.up = async function(knex) {
await knex.schema.alterTable('users', function(table) {
table.string('avatar_url').nullable(); // always nullable for new columns!
table.text('bio').nullable();
table.date('birthdate').nullable();
table.index('status'); // add index
});
};
exports.down = async function(knex) {
await knex.schema.alterTable('users', function(table) {
table.dropColumn('avatar_url');
table.dropColumn('bio');
table.dropColumn('birthdate');
table.dropIndex('status');
});
};

-- File: V1__Create_users_table.sql (Flyway naming: V{version}__{description})
CREATE TABLE users (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE INDEX ux_users_email (email)
);
-- File: V2__Create_posts_table.sql
CREATE TABLE posts (
id BIGINT PRIMARY KEY AUTO_INCREMENT,
user_id BIGINT NOT NULL,
title VARCHAR(255) NOT NULL,
body TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE,
INDEX idx_posts_user_id (user_id)
);
-- File: V3__Add_soft_delete_to_users.sql
-- Safe: nullable column addition doesn't require a full table lock
ALTER TABLE users
ADD COLUMN deleted_at TIMESTAMP NULL DEFAULT NULL,
ADD INDEX idx_users_deleted_at (deleted_at);

-- UNSAFE: Direct rename locks the table
-- ALTER TABLE users RENAME COLUMN username TO display_name;
-- This is fine for small tables but locks large tables for minutes!
-- SAFE: Expand-Contract pattern (3 separate deployments)
-- PHASE 1: Add new column (nullable β no default needed, fast)
ALTER TABLE users ADD COLUMN display_name VARCHAR(100) NULL;
-- PHASE 2: Backfill existing data (run in batches to avoid long lock)
UPDATE users SET display_name = username
WHERE display_name IS NULL
LIMIT 10000; -- run repeatedly in small batches
-- Application now writes to BOTH username AND display_name
-- PHASE 3: Make new column NOT NULL, drop old column
-- (only after app code only uses display_name)
ALTER TABLE users MODIFY display_name VARCHAR(100) NOT NULL;
ALTER TABLE users DROP COLUMN username;

Best Practices: Unsafe vs. Safe
New columns must be nullable first: NOT NULL on a large table locks it for minutes in production.

-- UNSAFE: this scans ALL 10 million rows to set the default
-- Locks the entire table for minutes!
-- Your entire app is degraded while this runs
ALTER TABLE users
ADD COLUMN display_name VARCHAR(100) NOT NULL DEFAULT '';
-- Users see: ERROR 1205 Lock wait timeout exceeded

-- SAFE: Phase 1: Add nullable (runs in milliseconds)
ALTER TABLE users
ADD COLUMN display_name VARCHAR(100) NULL;
-- Phase 2: Backfill in small batches (no lock held)
UPDATE users SET display_name = name
WHERE display_name IS NULL LIMIT 5000;
-- Repeat until 0 rows updated
-- Phase 3: Enforce NOT NULL after full backfill
ALTER TABLE users
MODIFY display_name VARCHAR(100) NOT NULL;

Pro Tips: Senior Dev Insights
Always back up production before running migrations: mysqldump -u root -p mydb > backup_pre_migration_$(date +%Y%m%d).sql
For large tables, use tools designed for zero-downtime migrations: gh-ost (MySQL), pg_repack (PostgreSQL), or pt-online-schema-change (Percona).
Use Prisma Migrate for TypeScript/Node projects β it generates migrations from your schema definition and keeps them in sync with your TypeScript types.
Adopt a convention for migration naming: YYYYMMDD_NNN_description.sql makes it easy to see what changed and when, and sorts correctly in any file browser.
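The naming convention from the tip above can be enforced mechanically. The regex below is an assumption, one reasonable encoding of YYYYMMDD_NNN_description; adjust it to your team's actual convention.

```javascript
// Sketch: validate an assumed YYYYMMDD_NNN_description migration name.
const MIGRATION_NAME = /^\d{8}_\d{3}_[a-z0-9_]+\.(sql|js)$/;

function isValidMigrationName(name) {
  return MIGRATION_NAME.test(name);
}

// Zero-padded dates and sequence numbers make a plain string sort chronological.
function sortMigrations(names) {
  return [...names].sort();
}
```

A check like this fits naturally into a pre-commit hook or CI step, rejecting misnamed files before they ever reach the migration runner.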
Popular Migration Tools
| Feature | Flyway | Liquibase | Prisma Migrate | Knex | Alembic |
|---|---|---|---|---|---|
| Primary language | SQL / Java | SQL / XML / YAML | TypeScript | JavaScript | Python |
| Auto-generates migrations | No | Yes | Yes | No | Yes |
| Rollback support | Pro only | Yes | Manual | Yes | Yes |
| Schema drift detection | No | Yes | Yes | No | No |
| Popular with | Java / Spring | Enterprise | Next.js / Prisma | Node.js | Python / Flask |
Common Developer Pitfalls
- Editing deployed migration files: this breaks the checksum check and desynchronizes environments.
- Adding NOT NULL columns without defaults to large production tables: causes a full table scan and lock for minutes.
- Not taking a database backup before running production migrations: you lose the ability to restore if something goes wrong.
- Running migrations manually instead of in CI/CD: leads to environment drift where staging and production have different schemas.
Interview Mastery
Q: Why is it dangerous to modify an existing migration file after it has been deployed?
A: Migration tools store a checksum of each migration file in the schema history table. If you modify a file, the checksum no longer matches what was applied. The migration tool will detect this and throw an error, refusing to run. In the best case, this blocks your CI/CD pipeline. In the worst case, if you somehow bypass the check, you'll have different schema states across environments with no way to reconcile them. New migrations should always be new numbered files.
Q: What is the expand-contract pattern for zero-downtime migrations?
A: The expand-contract (also called parallel change) pattern has three phases: (1) Expand: add the new column/table alongside the old one (purely additive, no locks). (2) Migrate: update application code to write to both old and new structures simultaneously; backfill historical data in small batches. (3) Contract: once all data is migrated and no code references the old structure, drop the old column/table. This approach never locks tables and is fully reversible at each phase.
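The "write to both" step of the migrate phase can be sketched at the application layer. Here `db.run` is a hypothetical query helper, and the column names reuse the username/display_name rename from the earlier SQL example.

```javascript
// Sketch of the phase-2 dual write during an expand-contract migration.
async function setUserName(db, userId, newName) {
  // Writing both columns keeps rows correct whether they are read through
  // the old `username` column or the new `display_name` column.
  await db.run(
    'UPDATE users SET username = ?, display_name = ? WHERE id = ?',
    [newName, newName, userId]
  );
}
```

Once every code path reads only display_name, this dual write (and then the old column) can be removed in the contract phase.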
Q: How do you handle a failed migration in production?
A: Strategies depend on the migration type: (1) If it failed mid-run and the tool left the DB in a partial state, manually ROLLBACK or run the down migration. (2) If the migration was pure DDL (no data loss), write and run a corrective migration. (3) If data was corrupted, restore from the pre-migration backup (this is why you always take a backup before running migrations in production). (4) Fix the migration file, reset it in the schema history table, and re-run. Always keep a pre-migration backup for production deployments.
Real-World Blueprint
"GitHub runs hundreds of database migrations per year on tables with billions of rows. Their blog has documented how adding a single column to the repositories table (with 40 million rows) required weeks of planning. They used GitHub's own tool gh-ost (GitHub's online schema-migration tool for MySQL) to ghost-copy the table with the new schema, replicate changes in real time, and atomically swap tables: zero downtime, zero locks, zero data loss. The migration was a single migration file in their repo, reviewed by the DBA team, and executed automatically."
Hands-on Lab Exercises
Set up Knex or Flyway in a Node.js or Java project. Create 5 sequential migrations building a complete blog schema.
Practice the full rollback cycle: run migrations up, verify the schema, run migrations down, verify the rollback worked.
Implement the expand-contract pattern on a sample table: add a new column, backfill it, then drop the old column across 3 migration files.
Set up a CI pipeline (GitHub Actions) that automatically runs migrations on push to main, and fails the build if a migration fails.
Real-World Practice Scenarios
You need to add a NOT NULL column with no default to a 10-million-row users table in production without downtime. What's your strategy?
A developer modified an already-deployed migration file to fix a typo. The CI/CD pipeline is now failing with a checksum error. How do you fix it?
You deployed a migration that renamed a column. The new deployment is failing. You need to rollback the code AND the migration. What's the sequence?
Your app runs on 5 servers. During deployment, servers 1-2 are running the new code but servers 3-5 are still running old code. The new code expects a new column. How should the migration be timed?
© 2026 DevHub Engineering • All Proprietary Rights Reserved
Generated on March 7, 2026 • Ver: 4.0.2
Document Class: Master Education
Confidential Information β’ Licensed to User