Database Design & Normalization
TL;DR β Quick Summary
- Primary keys uniquely identify rows; foreign keys enforce relationships between tables.
- Normalization eliminates redundancy: 1NF = atomic values, 2NF = no partial dependencies, 3NF = no transitive dependencies.
- Relationships: 1:1, 1:Many (most common), Many:Many (needs junction table).
- Denormalization is a deliberate performance trade-off β always measure before doing it.
Lesson Overview
ποΈ Database Design: Getting It Right from the Start
A poorly designed database is like a bad foundation β everything built on top of it will be unstable, slow, and painful to change. Database design is the process of structuring your data to minimize redundancy, maximize integrity, and ensure long-term scalability.
π Keys β The Foundation of Relationships
- Primary Key (PK): Uniquely identifies every row in a table. Cannot be NULL. Usually an auto-incrementing integer or UUID.
- Foreign Key (FK): A column in one table that references the Primary Key of another table. Enforces referential integrity β you can't have an order that references a non-existent user.
- Composite Key: Two or more columns together that form a unique identifier (e.g.,
user_id + product_idin a favorites table). - Unique Key: Ensures all values in a column are distinct β like an email address.
π Normalization β Eliminating Redundancy
Normalization is a set of rules (called Normal Forms) for organizing data. Each form addresses a specific type of data anomaly:
- 1NF (First Normal Form): Every cell holds one atomic value. No repeating groups or arrays in columns.
- 2NF: Everything in 1NF, plus every non-key column depends on the entire primary key (eliminates partial dependencies).
- 3NF: Everything in 2NF, plus non-key columns depend only on the primary key β not on other non-key columns (eliminates transitive dependencies).
The goal of normalization is not to achieve the highest normal form possible β it's to eliminate update anomalies. A user's city stored in 10,000 order rows means 10,000 rows to update if they move.
π Types of Relationships
- One-to-One: One user has one profile. Store as a separate table with a FK back to users.
- One-to-Many: One user has many orders. Most common relationship in relational databases.
- Many-to-Many: Many students take many courses. Requires a junction table (enrollments) with two foreign keys.
Conceptual Deep Dive
orders table that stores the customer's city in every row:
order_id | customer_id | customer_city | product | price
1 | 5 | New York | Book | 15.00
2 | 5 | New York | Pen | 2.00
If customer 5 moves to Los Angeles, you have to update every single one of their order rows. Miss one and your data is inconsistent β this is called an update anomaly.
The fix: store the city in a customers table and just reference customer_id from orders. Now a customer's city lives in exactly one place.Architecture & Data Flow
Implementation Lab
-- Users table (1NF: atomic values, single identity column)
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE, -- unique constraint
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Posts (One-to-Many: one user β many posts)
CREATE TABLE posts (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
title VARCHAR(255) NOT NULL,
body TEXT,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
-- Tags (Many-to-Many: posts β tags via junction table)
CREATE TABLE tags (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE post_tags (
post_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (post_id, tag_id), -- composite primary key
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);-- ON DELETE CASCADE: delete user β auto-delete all their posts
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
-- ON DELETE SET NULL: delete category β set product's category_id to NULL
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL
-- ON DELETE RESTRICT (default): prevent deleting a user who has posts
-- The DB throws an error β you must delete posts first
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
-- Checking foreign key relationships (MySQL)
SELECT
TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'users';Best Practices β Interactive Comparison
Store multiple values in a proper junction table, never as a comma-separated column
-- Comma-separated IDs in one cell β violates 1NF
CREATE TABLE posts (
id INT PRIMARY KEY,
tags VARCHAR(255) -- stores '1,3,7,12'3,7,12'
);
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values-- Comma-separated IDs in one cell β violates 1NF
CREATE TABLE posts (
id INT PRIMARY KEY,
tags VARCHAR(255) -- stores '1,3,7,12'3,7,12'
);
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values-- Many-to-many via junction table
CREATE TABLE post_tags (
post_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (post_id, tag_id),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
-- Clean, indexable, and queryable:
SELECT p.* FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
WHERE pt.tag_id = 3;Pro Tips β Senior Dev Insights
Use a diagramming tool like dbdiagram.io or DrawSQL to visualize your schema before writing any SQL β catching design mistakes on paper is far cheaper than migrations.
Add updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP to tables that change β invaluable for debugging and cache invalidation.
For money/currency columns, always use DECIMAL(10, 2), never FLOAT β floating-point arithmetic has rounding errors that cause financial disasters.
Consider soft deletes β adding a deleted_at timestamp column instead of actually deleting rows. Recovering accidentally deleted data is trivial.
βοΈ Relationship Types β Design Guide
| Feature | One-to-One | One-to-Many | Many-to-Many |
|---|---|---|---|
| Example | User β Profile | User β Orders | Students β Courses |
| FK lives in | Either table | Child table | Junction table |
| Junction table needed? | |||
| Most common type? | |||
| Can store extra relation metadata? |
Common Developer Pitfalls
tags = '1,2,3') β this violates 1NF and makes querying nearly impossible.Forgetting to add foreign key constraints and relying on application code alone for referential integrity β the DB can't enforce what it doesn't know about.
TEXT for long content, TINYINT for booleans, DECIMAL for money.Creating junction tables without a composite primary key, allowing duplicate relationship entries.
Interview Mastery
1NF: Each column holds a single atomic value β no arrays, no comma-separated lists in a cell. Each row is unique. 2NF: Satisfies 1NF, and every non-key attribute depends on the whole primary key (only relevant when using composite keys β eliminates partial dependency). 3NF: Satisfies 2NF, and no non-key column depends on another non-key column (eliminates transitive dependency). Example: storing city in orders when city depends on customer, not the order.
Referential integrity means every foreign key value must either be NULL or match an existing primary key in the referenced table. Foreign keys enforce this automatically at the database level β you cannot insert an order with a user_id that doesn't exist in the users table, and you cannot delete a user who still has orders (unless ON DELETE CASCADE is set). This prevents orphaned records and data inconsistency.
Denormalization (adding intentional redundancy) is appropriate when: (1) read performance is critical and JOINs are too slow at scale, (2) you're building reporting/analytics tables (data warehouses use star schemas which are denormalized), (3) the data changes infrequently so update anomalies are rare, (4) you're caching computed values (like a post's comment count stored directly in the posts table). Always measure first β premature denormalization creates maintenance nightmares.
Real-World Blueprint
users table, tweets table (FK to users), follows junction table (user_id + follower_id). The tweet timeline query β 'show me tweets from everyone I follow, ordered by time' β is essentially a many-to-many join across these normalized tables. At Twitter's scale, this simple schema handles 6,000+ tweets per second.Hands-on Lab Exercises
Design a schema for an e-commerce platform: users, products, orders, and order_items. Identify all relationships (1:1, 1:N, N:M) and draw an ERD.
Take a denormalized 'spreadsheet' table and normalize it to 3NF step by step, creating the appropriate tables and foreign keys.
ON DELETE CASCADE to a foreign key and verify it works by deleting a parent record and checking the child records are gone.Create a many-to-many relationship between students and courses using a junction table with additional metadata (enrollment date, grade).
Real-World Practice Scenarios
post_tags = '1,3,7'). What problems does this cause and how do you redesign it?Your orders table stores the product name and price directly instead of referencing a products table. A product's price changes β what data integrity problem does this create?
You need to track friendship between users (many-to-many, self-referential). How do you design the junction table?
(order_id, product_id) and also stores product_name. Which normal form does this violate and why?Database Design & Normalization
TL;DR β Quick Summary
- Primary keys uniquely identify rows; foreign keys enforce relationships between tables.
- Normalization eliminates redundancy: 1NF = atomic values, 2NF = no partial dependencies, 3NF = no transitive dependencies.
- Relationships: 1:1, 1:Many (most common), Many:Many (needs junction table).
- Denormalization is a deliberate performance trade-off β always measure before doing it.
Overview
ποΈ Database Design: Getting It Right from the Start
A poorly designed database is like a bad foundation β everything built on top of it will be unstable, slow, and painful to change. Database design is the process of structuring your data to minimize redundancy, maximize integrity, and ensure long-term scalability.
π Keys β The Foundation of Relationships
- Primary Key (PK): Uniquely identifies every row in a table. Cannot be NULL. Usually an auto-incrementing integer or UUID.
- Foreign Key (FK): A column in one table that references the Primary Key of another table. Enforces referential integrity β you can't have an order that references a non-existent user.
- Composite Key: Two or more columns together that form a unique identifier (e.g.,
user_id + product_idin a favorites table). - Unique Key: Ensures all values in a column are distinct β like an email address.
π Normalization β Eliminating Redundancy
Normalization is a set of rules (called Normal Forms) for organizing data. Each form addresses a specific type of data anomaly:
- 1NF (First Normal Form): Every cell holds one atomic value. No repeating groups or arrays in columns.
- 2NF: Everything in 1NF, plus every non-key column depends on the entire primary key (eliminates partial dependencies).
- 3NF: Everything in 2NF, plus non-key columns depend only on the primary key β not on other non-key columns (eliminates transitive dependencies).
The goal of normalization is not to achieve the highest normal form possible β it's to eliminate update anomalies. A user's city stored in 10,000 order rows means 10,000 rows to update if they move.
π Types of Relationships
- One-to-One: One user has one profile. Store as a separate table with a FK back to users.
- One-to-Many: One user has many orders. Most common relationship in relational databases.
- Many-to-Many: Many students take many courses. Requires a junction table (enrollments) with two foreign keys.
Architecture & Logic Flow
βοΈ Relationship Types β Design Guide
| Feature | One-to-One | One-to-Many | Many-to-Many |
|---|---|---|---|
| Example | User β Profile | User β Orders | Students β Courses |
| FK lives in | Either table | Child table | Junction table |
| Junction table needed? | |||
| Most common type? | |||
| Can store extra relation metadata? |
Deep Dive Analysis
Here's a concrete example of why normalization matters. Imagine an <code>orders</code> table that stores the customer's city in every row: <code>order_id | customer_id | customer_city | product | price</code> <code>1 | 5 | New York | Book | 15.00</code> <code>2 | 5 | New York | Pen | 2.00</code> If customer 5 moves to Los Angeles, you have to update every single one of their order rows. Miss one and your data is <strong>inconsistent</strong> β this is called an <em>update anomaly</em>. The fix: store the city in a <code>customers</code> table and just reference <code>customer_id</code> from orders. Now a customer's city lives in exactly one place.
Implementation Reference
-- Users table (1NF: atomic values, single identity column)
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(100) NOT NULL,
email VARCHAR(255) NOT NULL UNIQUE, -- unique constraint
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Posts (One-to-Many: one user β many posts)
CREATE TABLE posts (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT NOT NULL,
title VARCHAR(255) NOT NULL,
body TEXT,
published BOOLEAN DEFAULT FALSE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
-- Tags (Many-to-Many: posts β tags via junction table)
CREATE TABLE tags (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE post_tags (
post_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (post_id, tag_id), -- composite primary key
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);-- ON DELETE CASCADE: delete user β auto-delete all their posts
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
-- ON DELETE SET NULL: delete category β set product's category_id to NULL
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL
-- ON DELETE RESTRICT (default): prevent deleting a user who has posts
-- The DB throws an error β you must delete posts first
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
-- Checking foreign key relationships (MySQL)
SELECT
TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'users';Comparative Best Practices
Store multiple values in a proper junction table, never as a comma-separated column
-- Comma-separated IDs in one cell β violates 1NF
CREATE TABLE posts (
id INT PRIMARY KEY,
tags VARCHAR(255) -- stores '1,3,7,12'3,7,12'
);
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values-- Comma-separated IDs in one cell β violates 1NF
CREATE TABLE posts (
id INT PRIMARY KEY,
tags VARCHAR(255) -- stores '1,3,7,12'3,7,12'
);
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values-- Many-to-many via junction table
CREATE TABLE post_tags (
post_id INT NOT NULL,
tag_id INT NOT NULL,
PRIMARY KEY (post_id, tag_id),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
-- Clean, indexable, and queryable:
SELECT p.* FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
WHERE pt.tag_id = 3;Common Pitfalls
- β’Storing multiple values in a single column (<code>tags = '1,2,3'</code>) β this violates 1NF and makes querying nearly impossible.
- β’Forgetting to add foreign key constraints and relying on application code alone for referential integrity β the DB can't enforce what it doesn't know about.
- β’Using VARCHAR(255) for every column regardless of content β use appropriate types: <code>TEXT</code> for long content, <code>TINYINT</code> for booleans, <code>DECIMAL</code> for money.
- β’Creating junction tables without a composite primary key, allowing duplicate relationship entries.
Key Takeaways
Hands-on Practice
- βDesign a schema for an e-commerce platform: users, products, orders, and order_items. Identify all relationships (1:1, 1:N, N:M) and draw an ERD.
- βTake a denormalized 'spreadsheet' table and normalize it to 3NF step by step, creating the appropriate tables and foreign keys.
- βAdd <code>ON DELETE CASCADE</code> to a foreign key and verify it works by deleting a parent record and checking the child records are gone.
- βCreate a many-to-many relationship between students and courses using a junction table with additional metadata (enrollment date, grade).
Expert Pro Tips
Interview Preparation
Q: Explain the difference between 1NF, 2NF, and 3NF.
Master Answer:
<strong>1NF:</strong> Each column holds a single atomic value β no arrays, no comma-separated lists in a cell. Each row is unique. <strong>2NF:</strong> Satisfies 1NF, and every non-key attribute depends on the <em>whole</em> primary key (only relevant when using composite keys β eliminates partial dependency). <strong>3NF:</strong> Satisfies 2NF, and no non-key column depends on another non-key column (eliminates transitive dependency). Example: storing city in orders when city depends on customer, not the order.
Q: What is referential integrity and how do foreign keys enforce it?
Master Answer:
Referential integrity means every foreign key value must either be NULL or match an existing primary key in the referenced table. Foreign keys enforce this automatically at the database level β you cannot insert an order with a <code>user_id</code> that doesn't exist in the users table, and you cannot delete a user who still has orders (unless ON DELETE CASCADE is set). This prevents orphaned records and data inconsistency.
Q: When would you intentionally denormalize a database?
Master Answer:
Denormalization (adding intentional redundancy) is appropriate when: (1) read performance is critical and JOINs are too slow at scale, (2) you're building reporting/analytics tables (data warehouses use star schemas which are denormalized), (3) the data changes infrequently so update anomalies are rare, (4) you're caching computed values (like a post's comment count stored directly in the posts table). Always measure first β premature denormalization creates maintenance nightmares.
Industrial Blueprint
"Twitter's core schema is a classic normalization example: <code>users</code> table, <code>tweets</code> table (FK to users), <code>follows</code> junction table (user_id + follower_id). The tweet timeline query β 'show me tweets from everyone I follow, ordered by time' β is essentially a many-to-many join across these normalized tables. At Twitter's scale, this simple schema handles 6,000+ tweets per second."
Simulated Scenarios
Β© 2026 DevHub Engineering β’ All Proprietary Rights Reserved
Generated on March 7, 2026 β’ Ver: 4.0.2
Document Class: Master Education
Confidential Information β’ Licensed to User