Databases

Database Design & Normalization

4 min read

Focus: DATABASES

⚡

TL;DR — Quick Summary

Primary keys uniquely identify rows; foreign keys enforce relationships between tables.
Normalization eliminates redundancy: 1NF = atomic values, 2NF = no partial dependencies, 3NF = no transitive dependencies.
Relationships: 1:1, 1:Many (most common), Many:Many (needs junction table).
Denormalization is a deliberate performance trade-off — always measure before doing it.

Lesson Overview

🏗️ Database Design: Getting It Right from the Start

A poorly designed database is like a bad foundation — everything built on top of it will be unstable, slow, and painful to change. Database design is the process of structuring your data to minimize redundancy, maximize integrity, and ensure long-term scalability.

🔑 Keys — The Foundation of Relationships

Primary Key (PK): Uniquely identifies every row in a table. Cannot be NULL. Usually an auto-incrementing integer or UUID.
Foreign Key (FK): A column in one table that references the Primary Key of another table. Enforces referential integrity — you can't have an order that references a non-existent user.
Composite Key: Two or more columns together that form a unique identifier (e.g., user_id + product_id in a favorites table).
Unique Key: Ensures all values in a column are distinct — like an email address.

📐 Normalization — Eliminating Redundancy

Normalization is a set of rules (called Normal Forms) for organizing data. Each form addresses a specific type of data anomaly:

1NF (First Normal Form): Every cell holds one atomic value. No repeating groups or arrays in columns.
2NF: Everything in 1NF, plus every non-key column depends on the entire primary key (eliminates partial dependencies).
3NF: Everything in 2NF, plus non-key columns depend only on the primary key — not on other non-key columns (eliminates transitive dependencies).

The goal of normalization is not to achieve the highest normal form possible — it's to eliminate update anomalies. A user's city stored in 10,000 order rows means 10,000 rows to update if they move.

🔗 Types of Relationships

One-to-One: One user has one profile. Store as a separate table with a FK back to users.
One-to-Many: One user has many orders. Most common relationship in relational databases.
Many-to-Many: Many students take many courses. Requires a junction table (enrollments) with two foreign keys.

Conceptual Deep Dive

Here's a concrete example of why normalization matters. Imagine an orders table that stores the customer's city in every row: order_id | customer_id | customer_city | product | price 1 | 5 | New York | Book | 15.00 2 | 5 | New York | Pen | 2.00 If customer 5 moves to Los Angeles, you have to update every single one of their order rows. Miss one and your data is inconsistent — this is called an update anomaly. The fix: store the city in a customers table and just reference customer_id from orders. Now a customer's city lives in exactly one place.

Architecture & Data Flow

Blog Platform — Entity Relationship Diagram

Rendering diagram…

Normalization Journey — Unnormalized to 3NF

Rendering diagram…

Implementation Lab

Well-Designed Schema — Blog Platform

-- Users table (1NF: atomic values, single identity column)
CREATE TABLE users (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  name        VARCHAR(100) NOT NULL,
  email       VARCHAR(255) NOT NULL UNIQUE,  -- unique constraint
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Posts (One-to-Many: one user → many posts)
CREATE TABLE posts (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  user_id     INT NOT NULL,
  title       VARCHAR(255) NOT NULL,
  body        TEXT,
  published   BOOLEAN DEFAULT FALSE,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
 
-- Tags (Many-to-Many: posts ↔ tags via junction table)
CREATE TABLE tags (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL UNIQUE
);
 
CREATE TABLE post_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  PRIMARY KEY (post_id, tag_id),           -- composite primary key
  FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id)  REFERENCES tags(id) ON DELETE CASCADE
);

Referential Integrity — ON DELETE Behavior

-- ON DELETE CASCADE: delete user → auto-delete all their posts
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
 
-- ON DELETE SET NULL: delete category → set product's category_id to NULL
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL
 
-- ON DELETE RESTRICT (default): prevent deleting a user who has posts
-- The DB throws an error — you must delete posts first
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT
 
-- Checking foreign key relationships (MySQL)
SELECT
  TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'users';

Best Practices — Interactive Comparison

Store multiple values in a proper junction table, never as a comma-separated column

sql

-- Comma-separated IDs in one cell — violates 1NF
CREATE TABLE posts (
  id   INT PRIMARY KEY,
  tags VARCHAR(255)  -- stores '1,3,7,12'3,7,12'
);
 
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values

Violates 1NF — comma-separated IDs

sql

-- Comma-separated IDs in one cell — violates 1NF
CREATE TABLE posts (
  id   INT PRIMARY KEY,
  tags VARCHAR(255)  -- stores '1,3,7,12'3,7,12'
);
 
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values

Better approach

Proper Junction Table

sql

-- Many-to-many via junction table
CREATE TABLE post_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  PRIMARY KEY (post_id, tag_id),
  FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id)  REFERENCES tags(id) ON DELETE CASCADE
);
 
-- Clean, indexable, and queryable:
SELECT p.* FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
WHERE pt.tag_id = 3;

Pro Tips — Senior Dev Insights

Use a diagramming tool like dbdiagram.io or DrawSQL to visualize your schema before writing any SQL — catching design mistakes on paper is far cheaper than migrations.

Add updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP to tables that change — invaluable for debugging and cache invalidation.

For money/currency columns, always use DECIMAL(10, 2), never FLOAT — floating-point arithmetic has rounding errors that cause financial disasters.

Consider soft deletes — adding a deleted_at timestamp column instead of actually deleting rows. Recovering accidentally deleted data is trivial.

⚖️ Relationship Types — Design Guide

Feature	One-to-One	One-to-Many	Many-to-Many
Example	User → Profile	User → Orders	Students ↔ Courses
FK lives in	Either table	Child table	Junction table
Junction table needed?
Most common type?
Can store extra relation metadata?

Common Developer Pitfalls

Storing multiple values in a single column (tags = '1,2,3') — this violates 1NF and makes querying nearly impossible.

Forgetting to add foreign key constraints and relying on application code alone for referential integrity — the DB can't enforce what it doesn't know about.

Using VARCHAR(255) for every column regardless of content — use appropriate types: TEXT for long content, TINYINT for booleans, DECIMAL for money.

Creating junction tables without a composite primary key, allowing duplicate relationship entries.

Interview Mastery

1NF: Each column holds a single atomic value — no arrays, no comma-separated lists in a cell. Each row is unique. 2NF: Satisfies 1NF, and every non-key attribute depends on the whole primary key (only relevant when using composite keys — eliminates partial dependency). 3NF: Satisfies 2NF, and no non-key column depends on another non-key column (eliminates transitive dependency). Example: storing city in orders when city depends on customer, not the order.

Referential integrity means every foreign key value must either be NULL or match an existing primary key in the referenced table. Foreign keys enforce this automatically at the database level — you cannot insert an order with a user_id that doesn't exist in the users table, and you cannot delete a user who still has orders (unless ON DELETE CASCADE is set). This prevents orphaned records and data inconsistency.

Denormalization (adding intentional redundancy) is appropriate when: (1) read performance is critical and JOINs are too slow at scale, (2) you're building reporting/analytics tables (data warehouses use star schemas which are denormalized), (3) the data changes infrequently so update anomalies are rare, (4) you're caching computed values (like a post's comment count stored directly in the posts table). Always measure first — premature denormalization creates maintenance nightmares.

Real-World Blueprint

Twitter's core schema is a classic normalization example: users table, tweets table (FK to users), follows junction table (user_id + follower_id). The tweet timeline query — 'show me tweets from everyone I follow, ordered by time' — is essentially a many-to-many join across these normalized tables. At Twitter's scale, this simple schema handles 6,000+ tweets per second.

Hands-on Lab Exercises

Design a schema for an e-commerce platform: users, products, orders, and order_items. Identify all relationships (1:1, 1:N, N:M) and draw an ERD.

Take a denormalized 'spreadsheet' table and normalize it to 3NF step by step, creating the appropriate tables and foreign keys.

Add ON DELETE CASCADE to a foreign key and verify it works by deleting a parent record and checking the child records are gone.

Create a many-to-many relationship between students and courses using a junction table with additional metadata (enrollment date, grade).

Real-World Practice Scenarios

A developer stored comma-separated tag IDs in a single column (post_tags = '1,3,7'). What problems does this cause and how do you redesign it?

Your orders table stores the product name and price directly instead of referencing a products table. A product's price changes — what data integrity problem does this create?

You need to track friendship between users (many-to-many, self-referential). How do you design the junction table?

A table has the composite key (order_id, product_id) and also stores product_name. Which normal form does this violate and why?

DevHub

Global Software Engineering Curriculum

Generated Tracking ID

DH-TX-data-rela

Databases • Module Reference

Database Design & Normalization

⚡

TL;DR — Quick Summary

Primary keys uniquely identify rows; foreign keys enforce relationships between tables.
Normalization eliminates redundancy: 1NF = atomic values, 2NF = no partial dependencies, 3NF = no transitive dependencies.
Relationships: 1:1, 1:Many (most common), Many:Many (needs junction table).
Denormalization is a deliberate performance trade-off — always measure before doing it.

Overview

🏗️ Database Design: Getting It Right from the Start

🔑 Keys — The Foundation of Relationships

Primary Key (PK): Uniquely identifies every row in a table. Cannot be NULL. Usually an auto-incrementing integer or UUID.
Foreign Key (FK): A column in one table that references the Primary Key of another table. Enforces referential integrity — you can't have an order that references a non-existent user.
Composite Key: Two or more columns together that form a unique identifier (e.g., user_id + product_id in a favorites table).
Unique Key: Ensures all values in a column are distinct — like an email address.

📐 Normalization — Eliminating Redundancy

Normalization is a set of rules (called Normal Forms) for organizing data. Each form addresses a specific type of data anomaly:

1NF (First Normal Form): Every cell holds one atomic value. No repeating groups or arrays in columns.
2NF: Everything in 1NF, plus every non-key column depends on the entire primary key (eliminates partial dependencies).
3NF: Everything in 2NF, plus non-key columns depend only on the primary key — not on other non-key columns (eliminates transitive dependencies).

🔗 Types of Relationships

One-to-One: One user has one profile. Store as a separate table with a FK back to users.
One-to-Many: One user has many orders. Most common relationship in relational databases.
Many-to-Many: Many students take many courses. Requires a junction table (enrollments) with two foreign keys.

Architecture & Logic Flow

Diagram

Rendering diagram…

Diagram

Rendering diagram…

⚖️ Relationship Types — Design Guide

Feature	One-to-One	One-to-Many	Many-to-Many
Example	User → Profile	User → Orders	Students ↔ Courses
FK lives in	Either table	Child table	Junction table
Junction table needed?
Most common type?
Can store extra relation metadata?

Deep Dive Analysis

Here's a concrete example of why normalization matters. Imagine an <code>orders</code> table that stores the customer's city in every row: <code>order_id | customer_id | customer_city | product | price</code> <code>1 | 5 | New York | Book | 15.00</code> <code>2 | 5 | New York | Pen | 2.00</code> If customer 5 moves to Los Angeles, you have to update every single one of their order rows. Miss one and your data is inconsistent — this is called an update anomaly. The fix: store the city in a <code>customers</code> table and just reference <code>customer_id</code> from orders. Now a customer's city lives in exactly one place.

Implementation Reference

Well-Designed Schema — Blog Platform

sql

-- Users table (1NF: atomic values, single identity column)
CREATE TABLE users (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  name        VARCHAR(100) NOT NULL,
  email       VARCHAR(255) NOT NULL UNIQUE,  -- unique constraint
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Posts (One-to-Many: one user → many posts)
CREATE TABLE posts (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  user_id     INT NOT NULL,
  title       VARCHAR(255) NOT NULL,
  body        TEXT,
  published   BOOLEAN DEFAULT FALSE,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

-- Tags (Many-to-Many: posts ↔ tags via junction table)
CREATE TABLE tags (
  id   INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(50) NOT NULL UNIQUE
);

CREATE TABLE post_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  PRIMARY KEY (post_id, tag_id),           -- composite primary key
  FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id)  REFERENCES tags(id) ON DELETE CASCADE
);

Referential Integrity — ON DELETE Behavior

sql

-- ON DELETE CASCADE: delete user → auto-delete all their posts
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE

-- ON DELETE SET NULL: delete category → set product's category_id to NULL
FOREIGN KEY (category_id) REFERENCES categories(id) ON DELETE SET NULL

-- ON DELETE RESTRICT (default): prevent deleting a user who has posts
-- The DB throws an error — you must delete posts first
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE RESTRICT

-- Checking foreign key relationships (MySQL)
SELECT
  TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_NAME = 'users';

Comparative Best Practices

Store multiple values in a proper junction table, never as a comma-separated column

sql

-- Comma-separated IDs in one cell — violates 1NF
CREATE TABLE posts (
  id   INT PRIMARY KEY,
  tags VARCHAR(255)  -- stores '1,3,7,12'3,7,12'
);
 
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values

Violates 1NF — comma-separated IDs

sql

-- Comma-separated IDs in one cell — violates 1NF
CREATE TABLE posts (
  id   INT PRIMARY KEY,
  tags VARCHAR(255)  -- stores '1,3,7,12'3,7,12'
);
 
-- Impossible to query efficiently:
-- WHERE tags LIKE '%3%' also matches '13', '30'
-- Cannot add a foreign key constraint
-- Cannot index individual tag values

Better approach

Proper Junction Table

sql

-- Many-to-many via junction table
CREATE TABLE post_tags (
  post_id INT NOT NULL,
  tag_id  INT NOT NULL,
  PRIMARY KEY (post_id, tag_id),
  FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id)  REFERENCES tags(id) ON DELETE CASCADE
);
 
-- Clean, indexable, and queryable:
SELECT p.* FROM posts p
JOIN post_tags pt ON p.id = pt.post_id
WHERE pt.tag_id = 3;

Common Pitfalls

•Storing multiple values in a single column (<code>tags = '1,2,3'</code>) — this violates 1NF and makes querying nearly impossible.
•Forgetting to add foreign key constraints and relying on application code alone for referential integrity — the DB can't enforce what it doesn't know about.
•Using VARCHAR(255) for every column regardless of content — use appropriate types: <code>TEXT</code> for long content, <code>TINYINT</code> for booleans, <code>DECIMAL</code> for money.
•Creating junction tables without a composite primary key, allowing duplicate relationship entries.

Key Takeaways

• Use <code>ON DELETE CASCADE</code> carefully — it silently deletes child records. Make sure that's truly what you want before using it in production.

• Choose between <code>INT AUTO_INCREMENT</code> and <code>UUID</code> for primary keys: integers are faster and smaller; UUIDs are globally unique and better for distributed systems.

• Always add <code>created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP</code> to every table — you'll always wish you had it later.

• Denormalization (intentionally adding redundancy) is sometimes the right choice for read-heavy systems where JOIN performance matters more than update simplicity.

Hands-on Practice

✓Design a schema for an e-commerce platform: users, products, orders, and order_items. Identify all relationships (1:1, 1:N, N:M) and draw an ERD.
✓Take a denormalized 'spreadsheet' table and normalize it to 3NF step by step, creating the appropriate tables and foreign keys.
✓Add <code>ON DELETE CASCADE</code> to a foreign key and verify it works by deleting a parent record and checking the child records are gone.
✓Create a many-to-many relationship between students and courses using a junction table with additional metadata (enrollment date, grade).

Expert Pro Tips

"Use a diagramming tool like dbdiagram.io or DrawSQL to visualize your schema before writing any SQL — catching design mistakes on paper is far cheaper than migrations."

"Add <code>updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP</code> to tables that change — invaluable for debugging and cache invalidation."

"For money/currency columns, always use <code>DECIMAL(10, 2)</code>, never <code>FLOAT</code> — floating-point arithmetic has rounding errors that cause financial disasters."

"Consider <code>soft deletes</code> — adding a <code>deleted_at</code> timestamp column instead of actually deleting rows. Recovering accidentally deleted data is trivial."

Interview Preparation

Q: Explain the difference between 1NF, 2NF, and 3NF.

Master Answer:

1NF: Each column holds a single atomic value — no arrays, no comma-separated lists in a cell. Each row is unique. 2NF: Satisfies 1NF, and every non-key attribute depends on the whole primary key (only relevant when using composite keys — eliminates partial dependency). 3NF: Satisfies 2NF, and no non-key column depends on another non-key column (eliminates transitive dependency). Example: storing city in orders when city depends on customer, not the order.

Q: What is referential integrity and how do foreign keys enforce it?

Master Answer:

Referential integrity means every foreign key value must either be NULL or match an existing primary key in the referenced table. Foreign keys enforce this automatically at the database level — you cannot insert an order with a <code>user_id</code> that doesn't exist in the users table, and you cannot delete a user who still has orders (unless ON DELETE CASCADE is set). This prevents orphaned records and data inconsistency.

Q: When would you intentionally denormalize a database?

Master Answer:

Industrial Blueprint

"Twitter's core schema is a classic normalization example: <code>users</code> table, <code>tweets</code> table (FK to users), <code>follows</code> junction table (user_id + follower_id). The tweet timeline query — 'show me tweets from everyone I follow, ordered by time' — is essentially a many-to-many join across these normalized tables. At Twitter's scale, this simple schema handles 6,000+ tweets per second."

Simulated Scenarios

"A developer stored comma-separated tag IDs in a single column (<code>post_tags = '1,3,7'</code>). What problems does this cause and how do you redesign it?"

"Your orders table stores the product name and price directly instead of referencing a products table. A product's price changes — what data integrity problem does this create?"

"You need to track friendship between users (many-to-many, self-referential). How do you design the junction table?"

"A table has the composite key <code>(order_id, product_id)</code> and also stores <code>product_name</code>. Which normal form does this violate and why?"

DevHub

Generated on March 7, 2026 • Ver: 4.0.2

Document Class: Master Education

Confidential Information • Licensed to User