Databases

NoSQL Databases

TL;DR — Quick Summary

NoSQL = 'Not Only SQL'. Four families: Document, Key-Value, Column-family (wide-column), Graph.
MongoDB for flexible JSON documents; Redis for ultra-fast caching and real-time data structures.
CAP theorem: a distributed system cannot guarantee Consistency, Availability, and Partition Tolerance all at once; when a network partition occurs it must choose between consistency and availability.
Most production systems use SQL + NoSQL together — the right tool for the right data.

Lesson Overview

🔄 When Relational Isn't the Right Tool

Relational databases are exceptional for structured, consistent data with clear relationships. But modern applications often deal with data that is variable in structure, massive in scale, or naturally hierarchical — use cases where the rigidity of SQL schemas becomes a liability rather than an asset.

NoSQL databases emerged to solve these specific problems. The term means "Not Only SQL" — it's not a rejection of SQL, but a recognition that different data problems need different data models.

🗂️ The Four NoSQL Families

Document stores (MongoDB, CouchDB, Firestore): Store JSON-like documents with flexible schemas. Great for product catalogs, user profiles, content management.
Key-Value stores (Redis, DynamoDB, Memcached): Ultra-fast lookup by key. Great for caching, sessions, real-time leaderboards.
Column-family stores (Cassandra, HBase): Optimized for writing and reading massive amounts of time-series or event data. Used by Netflix, Twitter, Discord.
Graph databases (Neo4j, Amazon Neptune): Nodes and edges. Perfect for social networks, fraud detection, recommendation engines.
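
To make the four families concrete, the sketch below shapes similar user data the way each family would store it. Plain JavaScript objects stand in for the databases, and all sample values and field names are hypothetical.

```javascript
// Document: a self-contained JSON-like record with nested, optional fields.
const documentRecord = { _id: 'u1', name: 'Alice', addresses: [{ city: 'Oslo' }] };

// Key-value: an opaque value looked up by exactly one key.
const keyValue = new Map([['user:u1', JSON.stringify({ name: 'Alice' })]]);

// Column-family: rows grouped under a partition key and ordered by a clustering key (ts).
const columnFamily = {
  u1: [
    { ts: 1, event: 'login' },
    { ts: 2, event: 'follow', target: 'u2' },
  ],
};

// Graph: nodes plus labeled edges between them.
const graph = {
  nodes: [{ id: 'u1', name: 'Alice' }, { id: 'u2', name: 'Bob' }],
  edges: [{ from: 'u1', to: 'u2', label: 'FOLLOWS' }],
};

// Typical access for each shape
const city = documentRecord.addresses[0].city;                       // nested field read
const cachedName = JSON.parse(keyValue.get('user:u1')).name;         // lookup by key
const recentEvents = columnFamily.u1.filter((row) => row.ts >= 2);   // range scan in one partition
const followed = graph.edges.filter((e) => e.from === 'u1').map((e) => e.to); // edge traversal

console.log(city, cachedName, recentEvents.length, followed);
```

The point is the access pattern each shape favors, not the syntax: nested reads, single-key lookups, ordered scans within a partition, and edge traversal.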

📐 The CAP Theorem

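In a distributed system, you can guarantee at most two of these three properties at the same time. Since network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts:

Consistency: Every read gets the most recent write or an error
Availability: Every request gets a response (not necessarily the latest data)
Partition Tolerance: The system continues operating despite network partitions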

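Many NoSQL databases, such as Cassandra and DynamoDB in their default configurations, favor Availability + Partition Tolerance (AP) and offer eventual consistency: data converges to the same state eventually, but may briefly be inconsistent across nodes. Others, such as MongoDB and HBase, favor Consistency + Partition Tolerance (CP), and several let you tune the trade-off per operation.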
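
One common way to reason about this trade-off is the quorum rule used by Dynamo-style stores: with n replicas, w write acknowledgements, and r replicas consulted per read, reads are guaranteed to overlap the latest acknowledged write when r + w > n. The sketch below only encodes that arithmetic; the function names are illustrative and not part of any driver.

```javascript
// n = replicas per key, w = acknowledgements required for a write,
// r = replicas consulted for a read.
function readsOverlapWrites(n, w, r) {
  // If r + w > n, every read set shares at least one replica with every write set.
  return r + w > n;
}

function describe(n, w, r) {
  return readsOverlapWrites(n, w, r)
    ? 'overlapping quorums: reads include a replica holding the latest acknowledged write'
    : 'no guaranteed overlap: reads may be stale (eventual consistency)';
}

console.log(describe(3, 1, 1)); // fast, but reads may be stale
console.log(describe(3, 2, 2)); // quorum reads and writes overlap
```

Lower values of w and r favor latency and availability; higher values favor consistency at the cost of waiting on more replicas.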

Choose NoSQL when: schema is variable, you need horizontal scaling at massive scale, or the data model naturally fits documents/graphs. Choose SQL when: data is highly relational, ACID compliance is required, or the team needs strong consistency.

Conceptual Deep Dive

Think of the NoSQL families as different filing systems for different kinds of offices.
Document store: a filing cabinet where every folder (document) has its own custom layout. One user has an address, another has 3 addresses, another has none. No rigid form to fill out.
Key-value: a coat check. You hand in your coat and get a ticket number (key). That's it. Incredibly fast retrieval, but zero structure beyond key → value.
Column-family: a shelf of ledgers, one per customer (partition key), where new entries are appended in order and each ledger can carry its own columns. It is optimized for heavy writes and for reading one ledger's entries in sequence, which is different from the analytics-oriented columnar storage used by data warehouses.
Graph: a whiteboard with circles (people/things) connected by labeled arrows (relationships), perfect for 'how is Alice connected to Bob through 3 hops?'
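
The graph question above ("how is Alice connected to Bob through 3 hops?") is a bounded shortest-path search. A graph database runs this kind of traversal natively; the sketch below shows the same idea as a breadth-first search over an in-memory adjacency list. The sample data and the `findPath` helper are hypothetical.

```javascript
// Undirected "knows" graph stored as an adjacency list (sample data).
const knows = {
  alice: ['carol'],
  carol: ['alice', 'dave'],
  dave: ['carol', 'bob'],
  bob: ['dave'],
  erin: [],
};

// Breadth-first search: returns the shortest path from start to target
// that uses at most maxHops edges, or null if there is none.
function findPath(graph, start, target, maxHops) {
  const queue = [[start]];
  const visited = new Set([start]);
  while (queue.length > 0) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === target) return path;
    if (path.length - 1 >= maxHops) continue; // hop budget used up
    for (const next of graph[node] ?? []) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push([...path, next]);
      }
    }
  }
  return null;
}

console.log(findPath(knows, 'alice', 'bob', 3)); // [ 'alice', 'carol', 'dave', 'bob' ]
console.log(findPath(knows, 'alice', 'bob', 2)); // null
```

In a relational schema the same question needs one self-join per hop, which is why relationship-heavy queries are where graph stores tend to pay off.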

Architecture & Data Flow

[Diagram: NoSQL families mapped to use cases]

[Diagram: CAP theorem, pick any two]

Implementation Lab

MongoDB — Document Operations

// ES module syntax (for example a .mjs file), which allows top-level await
import { MongoClient } from 'mongodb';

const client = new MongoClient(process.env.MONGO_URI);
await client.connect();
const db = client.db('ecommerce');
const products = db.collection('products');

// INSERT a document with flexible schema
await products.insertOne({
  name: 'Wireless Headphones',
  price: 79.99,
  category: 'electronics',
  specs: { battery: '30h', connectivity: 'Bluetooth 5.0' },
  tags: ['wireless', 'audio', 'portable'],
  inStock: true
});

// FIND with filters, projection, and sorting
const results = await products.find(
  { category: 'electronics', price: { $lt: 100 }, inStock: true },
  { projection: { name: 1, price: 1, _id: 0 } }  // return only name and price
).sort({ price: 1 }).limit(10).toArray();

// UPDATE: add a field or modify existing
await products.updateMany(
  { category: 'electronics' },
  { $set: { taxRate: 0.08 }, $inc: { viewCount: 1 } }
);

// AGGREGATION PIPELINE (MongoDB's counterpart to SQL WHERE + GROUP BY + ORDER BY)
const salesByCategory = await products.aggregate([
  { $match: { inStock: true } },
  { $group: { _id: '$category', count: { $sum: 1 }, avgPrice: { $avg: '$price' } } },
  { $sort: { count: -1 } }
]).toArray();

// Release the connection when finished
await client.close();
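
To see what the aggregation pipeline above computes without a running MongoDB server, the following sketch reproduces the three stages ($match, $group, $sort) over an in-memory array. The sample documents are hypothetical, and this is a plain JavaScript illustration of the semantics, not the driver's implementation.

```javascript
// Sample documents standing in for the products collection.
const sampleProducts = [
  { name: 'Headphones', category: 'electronics', price: 80, inStock: true },
  { name: 'Speaker', category: 'electronics', price: 120, inStock: true },
  { name: 'Novel', category: 'books', price: 15, inStock: true },
  { name: 'Old Phone', category: 'electronics', price: 50, inStock: false },
];

function summarizeByCategory(docs) {
  // $match: keep only in-stock documents
  const matched = docs.filter((doc) => doc.inStock === true);

  // $group: one bucket per category, tracking count and price total
  const groups = new Map();
  for (const doc of matched) {
    const group = groups.get(doc.category) ?? { count: 0, total: 0 };
    group.count += 1;
    group.total += doc.price;
    groups.set(doc.category, group);
  }

  // Shape the output like the pipeline result, then $sort by count descending
  return [...groups.entries()]
    .map(([category, g]) => ({ _id: category, count: g.count, avgPrice: g.total / g.count }))
    .sort((a, b) => b.count - a.count);
}

const summary = summarizeByCategory(sampleProducts);
console.log(summary);
// [ { _id: 'electronics', count: 2, avgPrice: 100 }, { _id: 'books', count: 1, avgPrice: 15 } ]
```

Running the stages inside the database instead of in application code avoids shipping every document over the network just to count it.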

Redis — Caching and Real-Time Data

// ES module syntax (for example a .mjs file), which allows top-level await
import { createClient } from 'redis';

const client = createClient({ url: process.env.REDIS_URL });
await client.connect();

// Cache-aside pattern: check Redis first, fallback to DB
// `db` is assumed to be an already-connected SQL client
async function getUser(userId) {
  const cacheKey = `user:${userId}`;

  // Check cache first
  const cached = await client.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Miss: fetch from DB, store in cache
  const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  await client.setEx(cacheKey, 3600, JSON.stringify(user));  // expire in 1hr
  return user;
}

// Real-time leaderboard with sorted sets
await client.zAdd('game:leaderboard', [
  { score: 1500, value: 'alice' },
  { score: 2300, value: 'bob' },
  { score: 1800, value: 'charlie' }
]);

// Get top 10 players
const topPlayers = await client.zRangeWithScores('game:leaderboard', 0, 9, { REV: true });

// Session storage with automatic expiry
// (sessionId, sessionData, and pageId are assumed to come from the current request)
await client.setEx(`session:${sessionId}`, 1800, JSON.stringify(sessionData));

// Atomic counter (no race conditions)
await client.incr(`page:${pageId}:views`);
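
To follow the cache-aside flow and TTL expiry without a running Redis server, here is a minimal in-memory stand-in. `TtlCache` and `getWithCache` are hypothetical helpers written for this illustration; they mimic the behavior of `get` and `setEx` but are not the Redis client API.

```javascript
// In-memory cache with per-key expiry.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now;           // injectable clock, handy for tests
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  set(key, value, ttlSeconds) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily drop expired entries
      return null;
    }
    return entry.value;
  }
}

// Cache-aside read: try the cache, fall back to the loader, then populate the cache.
function getWithCache(cache, key, ttlSeconds, loadFromDb) {
  const cached = cache.get(key);
  if (cached !== null) return cached;
  const fresh = loadFromDb(key);
  cache.set(key, fresh, ttlSeconds);
  return fresh;
}

// Usage with a fake clock and a counting loader.
let clock = 0;
let dbReads = 0;
const cache = new TtlCache(() => clock);
const loadUser = (key) => { dbReads += 1; return { key, name: 'Alice' }; };

getWithCache(cache, 'user:1', 3600, loadUser); // miss, reads the database
getWithCache(cache, 'user:1', 3600, loadUser); // hit, served from cache
clock += 3600 * 1000;                          // one hour passes
getWithCache(cache, 'user:1', 3600, loadUser); // expired, reads the database again
console.log(dbReads); // 2
```

Three requests cost two database reads here; the TTL bounds both how stale a value can get and how long an unused key occupies memory.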

Best Practices — Interactive Comparison

Design MongoDB documents around your query patterns — embed data you always read together

Over-normalized — SQL thinking in MongoDB

javascript

// Normalizing in MongoDB like SQL
// requires 3 separate round trips to render a post
const post = await posts.findOne({ _id: postId });
const author = await users.findOne({ _id: post.userId });
// Use a distinct name: `const comments = await comments.find(...)` would shadow the collection and throw
const postComments = await comments.find({ postId }).toArray();

// N+1 queries per page load
// Defeats MongoDB's document model entirely

Better approach

Embed for your access pattern

javascript

// Embed what you always load together — one query
const post = await posts.findOne({ _id: postId });
// Returns everything in one round trip:
// {
//   title: 'SQL vs NoSQL',
//   author: { name: 'Alice', avatar: '/alice.jpg' },
//   tags: ['databases', 'nosql'],
//   comments: [{ body: 'Great post!', user: 'Bob' }]
// }
// Single query, blazing fast, no joins needed

Pro Tips — Senior Dev Insights

MongoDB's aggregation pipeline is incredibly powerful — it can do most things SQL GROUP BY + JOIN can do, with stages like $lookup (JOIN), $unwind (flatten arrays), $facet (multiple aggregations in one pass).

Use Redis Cluster for horizontal scaling and Redis Sentinel for high availability — single-node Redis is a single point of failure in production.

Consider PostgreSQL JSONB before adopting MongoDB — PostgreSQL can store and efficiently query JSON documents while still supporting full ACID transactions and JOINs with relational data.

For time-series data, TimescaleDB (PostgreSQL extension) or InfluxDB are purpose-built — far better than generic MongoDB or Redis for sensor/metric data.

⚖️ SQL vs NoSQL — Head to Head

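Feature	PostgreSQL	MongoDB	Redis	Cassandra
Schema	Strict, typed	Flexible JSON	None (key-value)	Column families
ACID transactions	Yes	Yes (single-document atomic by default; multi-document since 4.0)	Limited (MULTI/EXEC, no rollback)	No (lightweight transactions only)
Horizontal write scale	Limited (single primary without extensions)	Yes (sharding)	Yes (Redis Cluster)	Yes
Complex JOINs	Yes	Limited ($lookup)	No	No
Best for	Relational data	Variable structure	Caching / sessions	Time-series / logs
Consistency model	Strong	Tunable	Strong (single node)	Tunable (eventual by default)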

Common Developer Pitfalls

Treating MongoDB like a SQL database and creating heavily normalized document structures — this leads to costly N+1 query problems (multiple separate queries instead of one embedded document).

Using Redis without TTLs — memory fills up indefinitely until the server crashes or starts evicting critical data.

Choosing NoSQL because it's 'cool' or 'modern' — start with PostgreSQL for most apps. Add NoSQL components when you have a specific need.

Not planning for eventual consistency — building UI that shows 'Your like was saved!' when the data might not be replicated yet, causing confusing user experiences.

Interview Mastery

Q: What is the difference between SQL and NoSQL databases? When would you choose one over the other?

SQL: structured schema, strong ACID transactions, powerful JOINs across tables, great for complex relational data. Choose it when data is relational, consistency is critical (financial systems), and the team needs complex queries. NoSQL: flexible/dynamic schemas, horizontal scaling, optimized for specific access patterns (key lookup, document retrieval, graph traversal). Choose it when schema varies between records (product catalogs), you need massive write throughput (event logging), or the data is naturally document/graph shaped. Most production systems use both: PostgreSQL for core data, Redis for caching, MongoDB for flexible content.

Q: What is eventual consistency and when is it acceptable?

Eventual consistency means that given enough time without new updates, all replicas will converge to the same value, but in the short term different nodes may return different data. It's acceptable when: (1) slightly stale data is okay (social media like counts, product view counts), (2) read performance is more important than perfect accuracy, (3) the system needs to remain available during network partitions. It's NOT acceptable for financial transactions, inventory management (can't oversell), or any operation where reading stale data causes real harm.
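
The "stale now, converged later" behavior can be shown with a toy model of asynchronous replication. `ReplicatedCounter` is a hypothetical class for illustration only; real databases replicate through logs and handle failures, which this sketch ignores.

```javascript
class ReplicatedCounter {
  constructor(replicaCount) {
    this.primary = 0;
    this.replicas = new Array(replicaCount).fill(0);
    this.pending = []; // writes accepted by the primary but not yet applied to replicas
  }
  // The primary acknowledges a write immediately; replicas catch up later.
  addLikes(count) {
    this.primary += count;
    this.pending.push(count);
  }
  // Reads served by a replica can lag behind the primary.
  readFromReplica(index) {
    return this.replicas[index];
  }
  // Apply all pending writes to every replica (the "given enough time" step).
  replicate() {
    for (const count of this.pending) {
      this.replicas = this.replicas.map((value) => value + count);
    }
    this.pending = [];
  }
}

const likes = new ReplicatedCounter(2);
likes.addLikes(1);
const staleRead = likes.readFromReplica(0);     // 0: the replica has not seen the write yet
likes.replicate();
const convergedRead = likes.readFromReplica(0); // 1: replicas now match the primary
console.log({ staleRead, convergedRead });
```

A like counter tolerates the stale read; an inventory check would not, which is the distinction the answer above draws.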

Q: What is the cache-aside pattern and how does Redis implement it?

Cache-aside (also called lazy loading) works like this: (1) Application checks the cache first. (2) If found (cache hit), return the cached value. (3) If not found (cache miss), query the database, store the result in cache with a TTL, return the value. Redis suits this pattern because reads are typically sub-millisecond, keys can expire automatically (TTL), and values can hold any serialized data. The trade-off: cache misses still hit the database, so the first request for new data is slow. A related risk is the thundering herd problem, where many requests miss the same cache key at once and all query the database; it is mitigated with cache locking, so that only one request rebuilds the entry while the others wait.
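
One way to sketch cache locking inside a single process is a "single-flight" loader: concurrent misses for the same key share one in-flight database query. The names `inFlight`, `loadOnce`, and `slowQuery` are illustrative assumptions, not part of any Redis client. Across several application servers a shared lock would be needed instead, for example one taken with the Redis SET command's NX and EX options.

```javascript
const inFlight = new Map(); // key -> Promise of the value being loaded

function loadOnce(key, loader) {
  if (inFlight.has(key)) return inFlight.get(key); // join the load already running
  const promise = Promise.resolve()
    .then(() => loader(key))
    .finally(() => inFlight.delete(key)); // allow a fresh load next time
  inFlight.set(key, promise);
  return promise;
}

// Usage: 100 simultaneous requests trigger a single (simulated) database query.
let dbQueries = 0;
const slowQuery = (key) =>
  new Promise((resolve) => {
    dbQueries += 1;
    setTimeout(() => resolve({ key, name: 'Alice' }), 10);
  });

const requests = Array.from({ length: 100 }, () => loadOnce('user:1', slowQuery));
const done = Promise.all(requests).then((results) => {
  console.log(dbQueries, results.length); // 1 100
  return results;
});
```

The entry is removed from `inFlight` once the load settles, so a later miss starts a new query rather than reusing an old result.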

Real-World Blueprint

"Discord handles 4 billion messages per day and needed a database that could write and read massive amounts of time-series message data. They migrated from MongoDB (document store) to Cassandra (column-family) and eventually to ScyllaDB — because Cassandra's write-optimized LSM tree architecture handles billions of append-heavy writes far better than MongoDB's document model. Their chat history is NoSQL; their user accounts and billing remain in PostgreSQL. The right database for the right job."

Hands-on Lab Exercises

Create a MongoDB collection for a product catalog with at least 20 products of varying schemas. Write queries using $match, $group, and $sort in an aggregation pipeline.

Implement a Redis cache-aside pattern for a user profile endpoint. Measure response time with and without the cache.

Build a real-time leaderboard using Redis sorted sets — add scores, retrieve top 10, update scores atomically.

Compare the same data stored in MongoDB (document) vs PostgreSQL (relational) — perform 3 queries on each and compare the query style and performance.

Real-World Practice Scenarios

Your app stores user sessions. Currently they're in PostgreSQL and the session table has 50M rows, causing slow login queries. How would you migrate to Redis?

A product catalog has items where some are books (author, ISBN, pages), some are electronics (brand, wattage, warranty), some are food (ingredients, calories). Why does this data fit NoSQL better than SQL?

Your social app needs to find 'friends of friends' (2 degrees of separation) for recommendations. Which database type is best suited for this and why?

You're storing IoT sensor data — 10,000 devices sending readings every second, 1 year retention, mostly append-only. Which NoSQL database type fits best and why?

Key Takeaways

• MongoDB's flexible schema is both a superpower and a footgun. Enforce schema validation at the application layer or use MongoDB's built-in JSON Schema validation to prevent garbage data.

• Design MongoDB documents around your query patterns, not your data relationships. If you always load a post with its comments, embed comments in the post document — don't normalize like SQL.

• Redis data is in-memory, so plan your memory budget. Use TTL (Time To Live) on all cached data to prevent memory bloat. Never cache data forever unless you have an explicit invalidation strategy.

• For Redis in production, enable persistence (AOF or RDB snapshots) if the cached data would be expensive to rebuild — otherwise a Redis restart means a cold cache.
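
As an illustration of the first takeaway, application-layer validation can be as small as a function that runs before each insert. The rules below are assumptions based on the product example used earlier, not a schema defined by this lesson; MongoDB's built-in `$jsonSchema` validator can enforce similar rules on the server side.

```javascript
// Returns a list of problems; an empty list means the document is acceptable.
function validateProduct(doc) {
  const errors = [];
  if (typeof doc.name !== 'string' || doc.name.trim() === '') {
    errors.push('name must be a non-empty string');
  }
  if (typeof doc.price !== 'number' || !Number.isFinite(doc.price) || doc.price < 0) {
    errors.push('price must be a non-negative number');
  }
  if (typeof doc.category !== 'string') {
    errors.push('category must be a string');
  }
  const tagsOk = Array.isArray(doc.tags) && doc.tags.every((tag) => typeof tag === 'string');
  if (doc.tags !== undefined && !tagsOk) {
    errors.push('tags must be an array of strings when present');
  }
  return errors;
}

const good = { name: 'Wireless Headphones', price: 79.99, category: 'electronics', tags: ['audio'] };
const bad = { name: '', price: '79.99', category: 'electronics' };

console.log(validateProduct(good)); // []
console.log(validateProduct(bad));  // two errors: name and price
```

Optional fields such as `tags` stay optional, so the schema remains flexible while the fields every query depends on are still checked.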

DevHub

Generated on March 7, 2026 • Ver: 4.0.2
