Databases

NoSQL Databases

TL;DR: Quick Summary

  • NoSQL = 'Not Only SQL'. Four families: Document, Key-Value, Column-family, Graph.
  • MongoDB for flexible JSON documents; Redis for ultra-fast caching and real-time data structures.
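  • CAP theorem: a distributed system can guarantee at most 2 of 3: Consistency, Availability, Partition Tolerance. Because partitions cannot be ruled out, the real trade-off is consistency versus availability while a partition lasts.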
  • Most production systems use SQL + NoSQL together: the right tool for the right data.

Lesson Overview

When Relational Isn't the Right Tool

Relational databases are exceptional for structured, consistent data with clear relationships. But modern applications often deal with data that is variable in structure, massive in scale, or naturally hierarchical. In these use cases the rigidity of SQL schemas becomes a liability rather than an asset.

NoSQL databases emerged to solve these specific problems. The term means "Not Only SQL": it is not a rejection of SQL, but a recognition that different data problems need different data models.

The Four NoSQL Families

  • Document stores (MongoDB, CouchDB, Firestore): Store JSON-like documents with flexible schemas. Great for product catalogs, user profiles, content management.
  • Key-Value stores (Redis, DynamoDB, Memcached): Ultra-fast lookup by key. Great for caching, sessions, real-time leaderboards.
  • Column-family stores (Cassandra, HBase): Optimized for writing and reading massive amounts of time-series or event data. Used by Netflix, Twitter, Discord.
  • Graph databases (Neo4j, Amazon Neptune): Nodes and edges. Perfect for social networks, fraud detection, recommendation engines.
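
To make the four families more concrete, here is a rough sketch of how data might be shaped in each model, using plain JavaScript structures as stand-ins. No real database is involved, and every name (`customerDoc`, `kvStore`, `eventRows`, `follows`) is a hypothetical chosen for illustration.

```javascript
// Plain in-memory stand-ins for the four NoSQL data models.
// All names and values are illustrative assumptions, not a real schema.

// Document: one self-contained, nested record per entity.
const customerDoc = {
  _id: 'u1',
  name: 'Alice',
  addresses: [{ city: 'Lisbon' }, { city: 'Porto' }], // shape may vary per document
};

// Key-value: an opaque value looked up by a single key.
const kvStore = new Map();
kvStore.set('session:u1', JSON.stringify({ cart: ['sku-42'] }));

// Column-family: rows grouped under a partition key and kept in timestamp order.
const eventRows = new Map(); // partition key -> rows sorted by ts
eventRows.set('u1', [
  { ts: 1, event: 'login' },
  { ts: 2, event: 'view_product' },
]);

// Graph: nodes plus labeled edges between them.
const follows = [
  { from: 'u1', to: 'u2', label: 'FOLLOWS' },
  { from: 'u2', to: 'u3', label: 'FOLLOWS' },
];

// Each model makes a different question cheap to answer.
const cityCount = customerDoc.addresses.length;         // document: read nested data
const session = JSON.parse(kvStore.get('session:u1'));  // key-value: lookup by key
const u1Events = eventRows.get('u1');
const latestEvent = u1Events[u1Events.length - 1].event; // column-family: newest row in a partition
const followedByU1 = follows
  .filter((edge) => edge.from === 'u1')
  .map((edge) => edge.to);                              // graph: follow outgoing edges
```

The point is the access pattern, not the syntax: each structure is cheap for exactly one kind of question and awkward for the others.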

The CAP Theorem

In distributed systems, you can only guarantee two of these three properties simultaneously:

  • Consistency: Every read gets the most recent write
  • Availability: Every request gets a response (not necessarily the latest data)
  • Partition Tolerance: The system continues operating despite network partitions

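Network partitions cannot be ruled out in practice, so the real choice is between consistency and availability while a partition lasts. Many NoSQL databases (Cassandra in its default configuration, for example) choose Availability + Partition Tolerance (AP) and offer eventual consistency: data converges to the same state eventually, but may briefly be inconsistent across nodes. Others, such as HBase, favor consistency (CP), and several let you tune the trade-off per operation.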
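
Eventual consistency is easier to see in a toy model. The sketch below simulates two replicas with delayed replication in plain JavaScript. `Replica`, `write`, and `replicate` are hypothetical names, and real systems replicate in the background rather than on an explicit call.

```javascript
// Toy model of asynchronous replication between two replicas.
// Replica, write, and replicate are hypothetical names for illustration only.
class Replica {
  constructor() {
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}

const primary = new Replica();
const secondary = new Replica();
const replicationLog = []; // writes accepted by primary, not yet applied to secondary

// AP-style write: acknowledge immediately, replicate later.
function write(key, value) {
  primary.data.set(key, value);
  replicationLog.push({ key, value });
}

// Apply all pending writes to the secondary, in order.
function replicate() {
  while (replicationLog.length > 0) {
    const { key, value } = replicationLog.shift();
    secondary.data.set(key, value);
  }
}

write('likes:post1', 10);
const staleRead = secondary.read('likes:post1');     // undefined: not replicated yet
const freshRead = primary.read('likes:post1');       // 10
replicate();
const convergedRead = secondary.read('likes:post1'); // 10: replicas have converged
```

Between `write` and `replicate`, a client routed to the secondary sees old data. That window is what "briefly inconsistent" means.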

Choose NoSQL when: schema is variable, you need horizontal scaling at massive scale, or the data model naturally fits documents/graphs. Choose SQL when: data is highly relational, ACID compliance is required, or the team needs strong consistency.

Conceptual Deep Dive

Think of the different NoSQL types as different kinds of filing systems for different kinds of offices:

  • Document store = a filing cabinet where every folder (document) has its own custom layout. One user has an address, another has 3 addresses, another has none. No rigid form to fill out.
  • Key-value = a coat check. You hand in your coat and get a ticket number (key). That's it. Incredibly fast retrieval, but zero structure beyond key → value.
  • Column-family = a wall of logbooks, one per row key. Each logbook collects its own set of columns, and new entries are appended in order. It is optimized for heavy write volumes and for reading one logbook's recent entries, not for ad hoc queries across all of them.
  • Graph = a whiteboard with circles (people/things) connected by labeled arrows (relationships), perfect for 'how is Alice connected to Bob through 3 hops?'
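
The "how many hops from Alice to Bob" question can be sketched as a breadth-first search over an adjacency list. This is only an in-memory illustration of the traversal a graph database is built to run efficiently. The `friends` graph and the `hopsBetween` function are hypothetical.

```javascript
// Breadth-first search over an adjacency list.
// The graph below and the function name hopsBetween are hypothetical.
const friends = {
  alice: ['carol'],
  carol: ['alice', 'dave'],
  dave: ['carol', 'bob'],
  bob: ['dave'],
};

// Returns the minimum number of hops between two people, or -1 if unconnected.
function hopsBetween(graph, start, target) {
  if (start === target) return 0;
  const visited = new Set([start]);
  let frontier = [start];
  let hops = 0;
  while (frontier.length > 0) {
    hops += 1;
    const next = [];
    for (const person of frontier) {
      for (const neighbor of graph[person] || []) {
        if (neighbor === target) return hops;
        if (!visited.has(neighbor)) {
          visited.add(neighbor);
          next.push(neighbor);
        }
      }
    }
    frontier = next; // move one hop further out
  }
  return -1;
}

const aliceToBob = hopsBetween(friends, 'alice', 'bob');
```

In a relational schema the same question usually needs one self-join per hop, which is why relationship-heavy queries are the usual argument for a graph store.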

Architecture & Data Flow

[Diagram: NoSQL Families, Use Case Map]
[Diagram: CAP Theorem, Pick Any Two]

Implementation Lab

MongoDB: Document Operations
const { MongoClient } = require('mongodb');

// MONGO_URI must be set in the environment before running this script
const client = new MongoClient(process.env.MONGO_URI);

async function main() {
  await client.connect();
  const db = client.db('ecommerce');
  const products = db.collection('products');

  // INSERT a document with flexible schema
  await products.insertOne({
    name: 'Wireless Headphones',
    price: 79.99,
    category: 'electronics',
    specs: { battery: '30h', connectivity: 'Bluetooth 5.0' },
    tags: ['wireless', 'audio', 'portable'],
    inStock: true
  });

  // FIND with filters, projection, and sorting
  const results = await products.find(
    { category: 'electronics', price: { $lt: 100 }, inStock: true },
    { projection: { name: 1, price: 1, _id: 0 } }  // return only name and price
  ).sort({ price: 1 }).limit(10).toArray();

  // UPDATE: add a field and increment a counter on every matching document
  await products.updateMany(
    { category: 'electronics' },
    { $set: { taxRate: 0.08 }, $inc: { viewCount: 1 } }
  );

  // AGGREGATION PIPELINE (here the equivalent of SQL WHERE + GROUP BY + ORDER BY)
  const salesByCategory = await products.aggregate([
    { $match: { inStock: true } },
    { $group: { _id: '$category', count: { $sum: 1 }, avgPrice: { $avg: '$price' } } },
    { $sort: { count: -1 } }
  ]).toArray();

  console.log(results, salesByCategory);
}

// Top-level await is not available in CommonJS, so run everything inside main()
main().catch(console.error).finally(() => client.close());
Redis: Caching and Real-Time Data
const redis = require('redis');  // written against the node-redis v4 API

const client = redis.createClient({ url: process.env.REDIS_URL });

// Cache-aside pattern: check Redis first, fall back to the database.
// `db` is an assumed, already-connected SQL client whose query() resolves to the user row.
async function getUser(db, userId) {
  const cacheKey = `user:${userId}`;

  // Check cache first
  const cached = await client.get(cacheKey);
  if (cached) return JSON.parse(cached);

  // Miss: fetch from DB, store in cache
  const user = await db.query('SELECT * FROM users WHERE id = ?', [userId]);
  if (user) {
    await client.setEx(cacheKey, 3600, JSON.stringify(user));  // expire in 1hr
  }
  return user;
}

async function main() {
  await client.connect();

  // Real-time leaderboard with sorted sets
  await client.zAdd('game:leaderboard', [
    { score: 1500, value: 'alice' },
    { score: 2300, value: 'bob' },
    { score: 1800, value: 'charlie' }
  ]);

  // Get top 10 players, highest score first
  const topPlayers = await client.zRangeWithScores('game:leaderboard', 0, 9, { REV: true });
  console.log(topPlayers);

  // Session storage with automatic expiry (example values, 30 minutes)
  const sessionId = 'abc123';
  const sessionData = { userId: 42 };
  await client.setEx(`session:${sessionId}`, 1800, JSON.stringify(sessionData));

  // Atomic counter (no race conditions)
  const pageId = 'home';
  await client.incr(`page:${pageId}:views`);

  await client.quit();
}

main().catch(console.error);

Best Practices: Interactive Comparison

Design MongoDB documents around your query patterns: embed data you always read together

// Anti-pattern: normalizing in MongoDB like SQL
// requires 3 separate round trips to render a post
const post = await posts.findOne({ _id: postId });
const author = await users.findOne({ _id: post.userId });
const postComments = await comments.find({ postId }).toArray();  // renamed: a const cannot shadow the `comments` collection it reads from

// Repeated for every post on a page, this becomes the N+1 query problem
// Defeats MongoDB's document model entirely
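
In contrast to the normalized layout above, an embedded design keeps everything the page needs in one document. The sketch below uses an in-memory `Map` as a stand-in for the `posts` collection, so it runs without a server. The field names and the `findPost` helper are assumptions. With the real driver the read would be a single `posts.findOne({ _id: postId })`.

```javascript
// Sketch of the embedded alternative: one document holds everything the page needs.
// The in-memory `posts` Map and all field names are assumptions for illustration.
const posts = new Map();

posts.set('p1', {
  _id: 'p1',
  title: 'Why embed?',
  author: { _id: 'u1', name: 'Alice' }, // snapshot of the author fields the page shows
  comments: [
    { user: 'Bob', text: 'Nice post' },
    { user: 'Carol', text: 'Agreed' },
  ],
});

let roundTrips = 0;

// Stand-in for a single database read.
function findPost(postId) {
  roundTrips += 1;
  return posts.get(postId);
}

const post = findPost('p1');
const rendered = {
  title: post.title,
  byline: post.author.name,
  commentCount: post.comments.length,
};
```

Embedding fits data that is read together and stays bounded in size. MongoDB caps a single document at 16 MB, so an unbounded comment stream is usually better kept in its own collection.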

Pro Tips: Senior Dev Insights

1. MongoDB's aggregation pipeline is incredibly powerful: it can do most things SQL GROUP BY + JOIN can do, with stages like $lookup (JOIN), $unwind (flatten arrays), and $facet (multiple aggregations in one pass).

2. Use Redis Cluster for horizontal scaling and Redis Sentinel for high availability. Single-node Redis is a single point of failure in production.

3. Consider PostgreSQL JSONB before adopting MongoDB. PostgreSQL can store and efficiently query JSON documents while still supporting full ACID transactions and JOINs with relational data.

4. For time-series data, TimescaleDB (PostgreSQL extension) or InfluxDB are purpose-built, and far better than generic MongoDB or Redis for sensor/metric data.
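
The aggregation stages named in the first tip can be combined as in the sketch below. A pipeline is only an array of stage objects, so it can be built and inspected without a running server. The collection and field names (`orders`, `customers`, `customerId`, `total`, `country`) are hypothetical.

```javascript
// A pipeline is an array of stage objects, so it can be built without a server.
// Collection and field names (customers, customerId, total, country) are hypothetical.
const pipeline = [
  // JOIN: pull the matching customer document(s) into each order as an array.
  {
    $lookup: {
      from: 'customers',
      localField: 'customerId',
      foreignField: '_id',
      as: 'customer',
    },
  },
  // Flatten the `customer` array so each order carries one embedded customer object.
  { $unwind: '$customer' },
  // Run two independent aggregations over the same input in a single pass.
  {
    $facet: {
      revenueByCountry: [
        { $group: { _id: '$customer.country', revenue: { $sum: '$total' } } },
        { $sort: { revenue: -1 } },
      ],
      largestOrders: [{ $sort: { total: -1 } }, { $limit: 5 }],
    },
  },
];

// Stage names in order, as a quick sanity check of the pipeline's shape.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);

// With a connected driver this would run as:
//   const [report] = await db.collection('orders').aggregate(pipeline).toArray();
```

`$facet` returns a single document whose keys are the facet names, which is why the commented call destructures one element from the result array.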

SQL vs NoSQL: Head to Head

Feature                | PostgreSQL      | MongoDB            | Redis                | Cassandra
Schema                 | Strict, typed   | Flexible JSON      | None (key-value)     | Column families
ACID transactions      |                 | Single doc only    |                      |
Horizontal write scale |                 |                    |                      |
Complex JOINs          |                 |                    |                      |
Best for               | Relational data | Variable structure | Caching / sessions   | Time-series / logs
Consistency model      | Strong          | Tunable            | Strong (single node) | Eventual

Common Developer Pitfalls

  • Treating MongoDB like a SQL database and creating heavily normalized document structures. This leads to costly N+1 query problems (multiple separate queries instead of one embedded document).
  • Using Redis without TTLs. Memory fills up until the server crashes or starts evicting critical data.
  • Choosing NoSQL because it's 'cool' or 'modern'. Start with PostgreSQL for most apps, and add NoSQL components when you have a specific need.
  • Not planning for eventual consistency. A UI that shows 'Your like was saved!' when the data might not be replicated yet causes confusing user experiences.

Interview Mastery

SQL: structured schema, strong ACID transactions, powerful JOINs across tables, great for complex relational data. Choose when data is relational, consistency is critical (financial systems), and team needs complex queries. NoSQL: flexible/dynamic schemas, horizontal scaling, optimized for specific access patterns (key lookup, document retrieval, graph traversal). Choose when schema varies between records (product catalogs), you need massive write throughput (event logging), or the data is naturally document/graph shaped. Most production systems use both: PostgreSQL for core data, Redis for caching, MongoDB for flexible content.

Eventual consistency means that given enough time without new updates, all replicas will converge to the same value, but in the short term, different nodes may return different data. It's acceptable when: (1) slightly stale data is okay (social media likes counts, product view counts), (2) read performance is more important than perfect accuracy, (3) the system needs to remain available during network partitions. It's NOT acceptable for financial transactions, inventory management (can't oversell), or any operation where reading stale data causes real harm.

Cache-aside (also called lazy loading) works like this: (1) Application checks the cache first. (2) If found (cache hit), return the cached value. (3) If not found (cache miss), query the database, store the result in cache with a TTL, return the value. Redis is perfect for this because it's sub-millisecond for reads, supports automatic key expiration (TTL), and can store any serialized data. The trade-off: cache misses still hit the database, so the first request for new data is slow. A related risk is the thundering herd problem, where many requests miss the same cache key at once and all hit the database. It is mitigated with cache locking, so that only one request rebuilds the entry while the others wait.
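
One way to soften the thundering herd inside a single process is request coalescing: concurrent misses for the same key share one in-flight database load. This is a simpler cousin of cache locking, not a distributed lock. Across several app servers the lock would have to live in Redis itself, for example a `SET` with the `NX` option and an expiry. `createCoalescedLoader` and `fakeDbLoad` are hypothetical names.

```javascript
// In-process request coalescing: concurrent misses for one key share one database load.
// createCoalescedLoader and fakeDbLoad are hypothetical; this is not a distributed lock.
function createCoalescedLoader(loadFromDb) {
  const inFlight = new Map(); // key -> promise for a load that has not settled yet

  return function load(key) {
    const pending = inFlight.get(key);
    if (pending) return pending; // another caller is already loading this key

    const promise = new Promise((resolve) => resolve(loadFromDb(key)))
      .finally(() => inFlight.delete(key)); // allow a fresh load once this one settles
    inFlight.set(key, promise);
    return promise;
  };
}

let dbCalls = 0;

// Stand-in for the slow database query on a cache miss.
async function fakeDbLoad(key) {
  dbCalls += 1;
  return { key, loadedAt: Date.now() };
}

const load = createCoalescedLoader(fakeDbLoad);
const first = load('user:42');
const second = load('user:42'); // arrives before the first load settles
const sharedPromise = first === second;
```

In a cache-aside function, `load` would wrap only the miss path, so a burst of requests for an expired key produces one query instead of one per request.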

Real-World Blueprint

"Discord handles 4 billion messages per day and needed a database that could write and read massive amounts of time-series message data. They migrated from MongoDB (document store) to Cassandra (column-family) and eventually to ScyllaDB, because Cassandra's write-optimized LSM tree architecture handles billions of append-heavy writes far better than MongoDB's document model. Their chat history is NoSQL; their user accounts and billing remain in PostgreSQL. The right database for the right job."

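The "write-optimized LSM tree" idea mentioned above can be illustrated with a toy model: writes land in an in-memory table, and full tables are flushed as immutable sorted segments. This is a deliberately simplified sketch under stated assumptions, not how Cassandra or ScyllaDB is implemented. `TinyLsm` and its flush threshold are hypothetical.

```javascript
// Toy LSM tree: writes go to an in-memory table; full tables are flushed as sorted segments.
// TinyLsm and its flush threshold are illustrative assumptions, not a real storage engine.
class TinyLsm {
  constructor(flushThreshold = 2) {
    this.flushThreshold = flushThreshold;
    this.memtable = new Map(); // newest writes, mutable
    this.segments = [];        // flushed tables, oldest first, never modified
  }

  put(key, value) {
    this.memtable.set(key, value); // a write is only an in-memory insert
    if (this.memtable.size >= this.flushThreshold) this.flush();
  }

  flush() {
    const sorted = [...this.memtable.entries()]
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    this.segments.push(sorted); // stands in for one sequential write of a sorted file
    this.memtable = new Map();
  }

  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    // Search the newest segment first so later writes shadow older ones.
    for (let i = this.segments.length - 1; i >= 0; i -= 1) {
      const hit = this.segments[i].find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }
}

const store = new TinyLsm(2);
store.put('msg:1', 'hello');
store.put('msg:2', 'world');  // reaches the threshold and triggers a flush
store.put('msg:1', 'hello!'); // newer version lives in the memtable
const latest = store.get('msg:1');
const flushed = store.get('msg:2');
const segmentCount = store.segments.length;
```

Writes never modify existing segments, which is what makes append-heavy workloads cheap. The cost moves to reads, which may have to check several segments.
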
Hands-on Lab Exercises

1. Create a MongoDB collection for a product catalog with at least 20 products of varying schemas. Write queries using $match, $group, and $sort in an aggregation pipeline.

2. Implement a Redis cache-aside pattern for a user profile endpoint. Measure response time with and without the cache.

3. Build a real-time leaderboard using Redis sorted sets: add scores, retrieve top 10, update scores atomically.

4. Compare the same data stored in MongoDB (document) vs PostgreSQL (relational): perform 3 queries on each and compare the query style and performance.

Real-World Practice Scenarios

Your app stores user sessions. Currently they're in PostgreSQL and the session table has 50M rows, causing slow login queries. How would you migrate to Redis?

A product catalog has items where some are books (author, ISBN, pages), some are electronics (brand, wattage, warranty), some are food (ingredients, calories). Why does this data fit NoSQL better than SQL?

Your social app needs to find 'friends of friends' (2 degrees of separation) for recommendations. Which database type is best suited for this and why?

You're storing IoT sensor data: 10,000 devices sending readings every second, 1 year retention, mostly append-only. Which NoSQL database type fits best and why?