Joins & Complex Queries
TL;DR β Quick Summary
- INNER JOIN = intersection (both sides must match). LEFT JOIN = all left rows + matched right rows.
- LEFT JOIN +
WHERE right.id IS NULL= find rows in left table with NO match in right table. - CTEs (
WITH name AS) make complex multi-step queries readable and maintainable. - Window functions (
OVER()) calculate across rows without collapsing them like GROUP BY does.
Lesson Overview
π JOINs: Bringing Tables Together
Real-world data lives across multiple tables. JOINs let you combine rows from two or more tables based on a related column β usually a foreign key relationship. A well-written JOIN is one of the most powerful tools in a developer's SQL toolkit.
ποΈ Types of JOINs
- INNER JOIN: Returns only rows where the join condition matches in both tables. Most common. Think: intersection.
- LEFT JOIN (LEFT OUTER): Returns ALL rows from the left table, plus matching rows from the right. Non-matching right-side values are
NULL. Use when you need all records from one table regardless of matches. - RIGHT JOIN: Mirror of LEFT JOIN β all rows from right table. Rarely used (just swap the table order and use LEFT JOIN).
- FULL OUTER JOIN: All rows from both tables. NULLs where no match exists. PostgreSQL supports this natively; MySQL requires a UNION workaround.
- CROSS JOIN: Every row from the left table paired with every row from the right. Returns MΓN rows. Rarely intentional β an accidental CROSS JOIN can return billions of rows.
- SELF JOIN: A table joined to itself. Used for hierarchical data like org charts or finding related records in the same table.
π¦ Subqueries and CTEs
Sometimes one query isn't enough. You can nest queries inside each other:
- Subquery (inline): A query inside another query's WHERE, FROM, or SELECT clause.
- CTE (Common Table Expression): Named temporary result sets using
WITH. Far more readable than nested subqueries. - Window Functions: Perform calculations across rows related to the current row β like
ROW_NUMBER(),RANK(),LAG(),SUM() OVER.
The Venn diagram mental model for JOINs: INNER JOIN = intersection, LEFT JOIN = full left circle, FULL OUTER = full union of both circles.
Conceptual Deep Dive
users table and an orders table. An INNER JOIN gives you only users who have placed orders. A LEFT JOIN gives you all users β those with orders will have order data, and users with no orders will have NULL in the order columns.
When would you choose LEFT over INNER? Use LEFT JOIN when the absence of a match is meaningful information β like 'show me all users AND their order count (even users with zero orders).' Use INNER JOIN when you only care about records that exist in both tables.Architecture & Data Flow
Implementation Lab
-- INNER JOIN: only users WHO HAVE orders
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN: ALL users, with order info if it exists
-- Users with no orders show NULL for order columns
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Find users who have NEVER placed an order
SELECT u.name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL; -- the NULL trick: no matching row = never ordered
-- SELF JOIN: find employees and their manager's name
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;-- Count orders per user, including users with zero orders
SELECT
u.name,
u.email,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name, u.email
ORDER BY lifetime_value DESC;
-- Posts with author name AND tag list
SELECT
p.title,
u.name AS author,
GROUP_CONCAT(t.name ORDER BY t.name SEPARATOR ', ') AS tags
FROM posts p
INNER JOIN users u ON p.user_id = u.id
LEFT JOIN post_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id
GROUP BY p.id, p.title, u.name;-- CTE: readable multi-step query
WITH monthly_revenue AS (
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS month,
SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY month
),
revenue_with_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
ROUND((revenue - prev_month_revenue) / prev_month_revenue * 100, 2) AS growth_pct
FROM revenue_with_growth
ORDER BY month;
-- Window function: rank users by order count
SELECT
name,
order_count,
RANK() OVER (ORDER BY order_count DESC) AS rank,
NTILE(4) OVER (ORDER BY order_count DESC) AS quartile
FROM user_order_counts;Best Practices β Interactive Comparison
Use LEFT JOIN to include records with no related data β INNER JOIN silently drops them
-- INNER JOIN silently excludes users
-- who have never placed an order
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.id;
-- Users with 0 orders are completely invisible
-- Your report is missing data and you won't know it-- INNER JOIN silently excludes users
-- who have never placed an order
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.id;
-- Users with 0 orders are completely invisible
-- Your report is missing data and you won't know it-- LEFT JOIN keeps ALL users
-- Users with no orders show count = 0
SELECT
u.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
ORDER BY lifetime_value DESC;Refactor nested subqueries into readable CTEs
-- 4-level nesting β impossible to debug or modify
SELECT * FROM (
SELECT user_id, SUM(total) AS rev FROM (
SELECT * FROM orders
WHERE status = 'completed'
) o GROUP BY user_id
) r WHERE r.rev > (
SELECT AVG(total_rev) FROM (
SELECT user_id, SUM(total) AS total_rev
FROM orders GROUP BY user_id
) x
);-- 4-level nesting β impossible to debug or modify
SELECT * FROM (
SELECT user_id, SUM(total) AS rev FROM (
SELECT * FROM orders
WHERE status = 'completed'
) o GROUP BY user_id
) r WHERE r.rev > (
SELECT AVG(total_rev) FROM (
SELECT user_id, SUM(total) AS total_rev
FROM orders GROUP BY user_id
) x
);-- Each step is named and readable
WITH completed_orders AS (
SELECT user_id, total
FROM orders
WHERE status = 'completed'
),
user_revenue AS (
SELECT user_id, SUM(total) AS rev
FROM completed_orders
GROUP BY user_id
),
avg_revenue AS (
SELECT AVG(rev) AS avg_rev FROM user_revenue
)
SELECT ur.*
FROM user_revenue ur, avg_revenue
WHERE ur.rev > avg_revenue.avg_rev;Pro Tips β Senior Dev Insights
Run EXPLAIN on JOIN queries to see if the database is doing nested loops (slow) vs hash joins (fast) β the difference can be orders of magnitude.
In PostgreSQL, LATERAL JOIN is extremely powerful β it allows the right side of the join to reference columns from the left side, like a correlated subquery but joinable.
For many-to-many JOINs returning duplicate rows, consider EXISTS or IN instead of a JOIN when you only need to check existence, not retrieve data.
EXPLAIN ANALYZE (PostgreSQL) runs the query and shows actual vs estimated row counts β invaluable for identifying bad query planner estimates.
βοΈ JOIN Types at a Glance
| Feature | INNER | LEFT | RIGHT | FULL OUTER | CROSS |
|---|---|---|---|---|---|
| Returns unmatched left rows | |||||
| Returns unmatched right rows | |||||
| Can produce NULLs | |||||
| Common in production | |||||
| Risk of row explosion |
Common Developer Pitfalls
Using INNER JOIN when LEFT JOIN is needed β losing users/products with no related records silently.
Forgetting to include all non-aggregated SELECT columns in the GROUP BY clause β causes 'ambiguous column' errors or wrong results.
Accidentally creating a CROSS JOIN by forgetting the ON clause β can return billions of rows and crash your database.
Nesting subqueries 4-5 levels deep instead of using CTEs β creates unreadable, unmaintainable query spaghetti.
Interview Mastery
INNER JOIN returns only rows where the join condition matches in both tables β rows with no match are excluded entirely. LEFT JOIN returns all rows from the left table regardless of whether a match exists in the right table; non-matching right-side columns are NULL. Use INNER when you only want matched records; use LEFT when you want all records from the primary table and optional data from the secondary.
A CTE (Common Table Expression) is a named temporary result set defined with WITH name AS (...) before the main query. Use CTEs when: (1) you need to reference the same subquery multiple times in a query, (2) the logic is complex enough that nesting subqueries becomes unreadable, (3) you need recursive queries (like traversing org charts). CTEs are not materialized in most databases β they're just syntactic sugar for readability, but they make complex queries dramatically easier to understand and debug.
GROUP BY collapses many rows into one row per group β you lose the individual row details. Window functions (using OVER()) perform calculations across a set of rows related to the current row, but do not collapse the rows β you can see both the individual row and the aggregate value in the same result. Example: GROUP BY gives you 'total sales per department'. A window function gives you 'each employee's salary AND their department's average salary in the same row'.
Real-World Blueprint
"LinkedIn's 'People You May Know' feature uses self-joins on the connections table: find users A and C who both have a connection to user B, but A and C aren't directly connected yet. This is a classic multi-level self-join query. The analytics pipeline that powers LinkedIn's recruiter insights uses CTEs and window functions to calculate rolling averages, percentile rankings, and month-over-month growth β all in SQL."
Hands-on Lab Exercises
Write a query using INNER, LEFT, and RIGHT JOIN on the same two tables and compare the row counts returned by each.
Find all users who have never made a purchase using the LEFT JOIN + IS NULL pattern.
Rewrite a deeply nested subquery as a CTE and compare readability.
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) to rank employees within each department by salary.Real-World Practice Scenarios
You need a report showing every product and its total sales revenue β including products that have never been sold (show 0). Which JOIN and aggregation do you use?
A query with 5 nested subqueries is timing out and impossible to debug. How do you refactor it using CTEs?
You need to find 'second highest salary per department' β GROUP BY alone can't do this. How do you use a window function to solve it?
Your JOIN query returns more rows than expected β 3x the actual data. What likely went wrong and how do you diagnose it?
Joins & Complex Queries
TL;DR β Quick Summary
- INNER JOIN = intersection (both sides must match). LEFT JOIN = all left rows + matched right rows.
- LEFT JOIN +
WHERE right.id IS NULL= find rows in left table with NO match in right table. - CTEs (
WITH name AS) make complex multi-step queries readable and maintainable. - Window functions (
OVER()) calculate across rows without collapsing them like GROUP BY does.
Overview
π JOINs: Bringing Tables Together
Real-world data lives across multiple tables. JOINs let you combine rows from two or more tables based on a related column β usually a foreign key relationship. A well-written JOIN is one of the most powerful tools in a developer's SQL toolkit.
ποΈ Types of JOINs
- INNER JOIN: Returns only rows where the join condition matches in both tables. Most common. Think: intersection.
- LEFT JOIN (LEFT OUTER): Returns ALL rows from the left table, plus matching rows from the right. Non-matching right-side values are
NULL. Use when you need all records from one table regardless of matches. - RIGHT JOIN: Mirror of LEFT JOIN β all rows from right table. Rarely used (just swap the table order and use LEFT JOIN).
- FULL OUTER JOIN: All rows from both tables. NULLs where no match exists. PostgreSQL supports this natively; MySQL requires a UNION workaround.
- CROSS JOIN: Every row from the left table paired with every row from the right. Returns MΓN rows. Rarely intentional β an accidental CROSS JOIN can return billions of rows.
- SELF JOIN: A table joined to itself. Used for hierarchical data like org charts or finding related records in the same table.
π¦ Subqueries and CTEs
Sometimes one query isn't enough. You can nest queries inside each other:
- Subquery (inline): A query inside another query's WHERE, FROM, or SELECT clause.
- CTE (Common Table Expression): Named temporary result sets using
WITH. Far more readable than nested subqueries. - Window Functions: Perform calculations across rows related to the current row β like
ROW_NUMBER(),RANK(),LAG(),SUM() OVER.
The Venn diagram mental model for JOINs: INNER JOIN = intersection, LEFT JOIN = full left circle, FULL OUTER = full union of both circles.
Architecture & Logic Flow
βοΈ JOIN Types at a Glance
| Feature | INNER | LEFT | RIGHT | FULL OUTER | CROSS |
|---|---|---|---|---|---|
| Returns unmatched left rows | |||||
| Returns unmatched right rows | |||||
| Can produce NULLs | |||||
| Common in production | |||||
| Risk of row explosion |
Deep Dive Analysis
Imagine you have a <code>users</code> table and an <code>orders</code> table. An <strong>INNER JOIN</strong> gives you only users who have placed orders. A <strong>LEFT JOIN</strong> gives you <em>all</em> users β those with orders will have order data, and users with no orders will have <code>NULL</code> in the order columns. When would you choose LEFT over INNER? Use <strong>LEFT JOIN</strong> when the absence of a match is meaningful information β like 'show me all users AND their order count (even users with zero orders).' Use <strong>INNER JOIN</strong> when you only care about records that exist in both tables.
Implementation Reference
-- INNER JOIN: only users WHO HAVE orders
SELECT u.name, o.total
FROM users u
INNER JOIN orders o ON u.id = o.user_id;
-- LEFT JOIN: ALL users, with order info if it exists
-- Users with no orders show NULL for order columns
SELECT u.name, o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id;
-- Find users who have NEVER placed an order
SELECT u.name
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE o.id IS NULL; -- the NULL trick: no matching row = never ordered
-- SELF JOIN: find employees and their manager's name
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;-- Count orders per user, including users with zero orders
SELECT
u.name,
u.email,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.name, u.email
ORDER BY lifetime_value DESC;
-- Posts with author name AND tag list
SELECT
p.title,
u.name AS author,
GROUP_CONCAT(t.name ORDER BY t.name SEPARATOR ', ') AS tags
FROM posts p
INNER JOIN users u ON p.user_id = u.id
LEFT JOIN post_tags pt ON p.id = pt.post_id
LEFT JOIN tags t ON pt.tag_id = t.id
GROUP BY p.id, p.title, u.name;-- CTE: readable multi-step query
WITH monthly_revenue AS (
SELECT
DATE_FORMAT(created_at, '%Y-%m') AS month,
SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY month
),
revenue_with_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue
FROM monthly_revenue
)
SELECT
month,
revenue,
ROUND((revenue - prev_month_revenue) / prev_month_revenue * 100, 2) AS growth_pct
FROM revenue_with_growth
ORDER BY month;
-- Window function: rank users by order count
SELECT
name,
order_count,
RANK() OVER (ORDER BY order_count DESC) AS rank,
NTILE(4) OVER (ORDER BY order_count DESC) AS quartile
FROM user_order_counts;Comparative Best Practices
Use LEFT JOIN to include records with no related data β INNER JOIN silently drops them
-- INNER JOIN silently excludes users
-- who have never placed an order
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.id;
-- Users with 0 orders are completely invisible
-- Your report is missing data and you won't know it-- INNER JOIN silently excludes users
-- who have never placed an order
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
INNER JOIN orders o ON u.id = o.user_id
GROUP BY u.id;
-- Users with 0 orders are completely invisible
-- Your report is missing data and you won't know it-- LEFT JOIN keeps ALL users
-- Users with no orders show count = 0
SELECT
u.name,
COUNT(o.id) AS order_count,
COALESCE(SUM(o.total), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id
ORDER BY lifetime_value DESC;Refactor nested subqueries into readable CTEs
-- 4-level nesting β impossible to debug or modify
SELECT * FROM (
SELECT user_id, SUM(total) AS rev FROM (
SELECT * FROM orders
WHERE status = 'completed'
) o GROUP BY user_id
) r WHERE r.rev > (
SELECT AVG(total_rev) FROM (
SELECT user_id, SUM(total) AS total_rev
FROM orders GROUP BY user_id
) x
);-- 4-level nesting β impossible to debug or modify
SELECT * FROM (
SELECT user_id, SUM(total) AS rev FROM (
SELECT * FROM orders
WHERE status = 'completed'
) o GROUP BY user_id
) r WHERE r.rev > (
SELECT AVG(total_rev) FROM (
SELECT user_id, SUM(total) AS total_rev
FROM orders GROUP BY user_id
) x
);-- Each step is named and readable
WITH completed_orders AS (
SELECT user_id, total
FROM orders
WHERE status = 'completed'
),
user_revenue AS (
SELECT user_id, SUM(total) AS rev
FROM completed_orders
GROUP BY user_id
),
avg_revenue AS (
SELECT AVG(rev) AS avg_rev FROM user_revenue
)
SELECT ur.*
FROM user_revenue ur, avg_revenue
WHERE ur.rev > avg_revenue.avg_rev;Common Pitfalls
- β’Using INNER JOIN when LEFT JOIN is needed β losing users/products with no related records silently.
- β’Forgetting to include all non-aggregated SELECT columns in the GROUP BY clause β causes 'ambiguous column' errors or wrong results.
- β’Accidentally creating a CROSS JOIN by forgetting the ON clause β can return billions of rows and crash your database.
- β’Nesting subqueries 4-5 levels deep instead of using CTEs β creates unreadable, unmaintainable query spaghetti.
Key Takeaways
Hands-on Practice
- βWrite a query using INNER, LEFT, and RIGHT JOIN on the same two tables and compare the row counts returned by each.
- βFind all users who have never made a purchase using the LEFT JOIN + IS NULL pattern.
- βRewrite a deeply nested subquery as a CTE and compare readability.
- βUse <code>ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC)</code> to rank employees within each department by salary.
Expert Pro Tips
Interview Preparation
Q: What is the difference between INNER JOIN and LEFT JOIN?
Master Answer:
<strong>INNER JOIN</strong> returns only rows where the join condition matches in <em>both</em> tables β rows with no match are excluded entirely. <strong>LEFT JOIN</strong> returns <em>all</em> rows from the left table regardless of whether a match exists in the right table; non-matching right-side columns are <code>NULL</code>. Use INNER when you only want matched records; use LEFT when you want all records from the primary table and optional data from the secondary.
Q: What is a CTE and when would you use it over a subquery?
Master Answer:
A <strong>CTE (Common Table Expression)</strong> is a named temporary result set defined with <code>WITH name AS (...)</code> before the main query. Use CTEs when: (1) you need to reference the same subquery multiple times in a query, (2) the logic is complex enough that nesting subqueries becomes unreadable, (3) you need recursive queries (like traversing org charts). CTEs are not materialized in most databases β they're just syntactic sugar for readability, but they make complex queries dramatically easier to understand and debug.
Q: What is the difference between a window function and GROUP BY?
Master Answer:
<strong>GROUP BY</strong> collapses many rows into one row per group β you lose the individual row details. <strong>Window functions</strong> (using <code>OVER()</code>) perform calculations across a set of rows related to the current row, but <em>do not collapse</em> the rows β you can see both the individual row and the aggregate value in the same result. Example: GROUP BY gives you 'total sales per department'. A window function gives you 'each employee's salary AND their department's average salary in the same row'.
Industrial Blueprint
"LinkedIn's 'People You May Know' feature uses self-joins on the connections table: find users A and C who both have a connection to user B, but A and C aren't directly connected yet. This is a classic multi-level self-join query. The analytics pipeline that powers LinkedIn's recruiter insights uses CTEs and window functions to calculate rolling averages, percentile rankings, and month-over-month growth β all in SQL."
Simulated Scenarios
Β© 2026 DevHub Engineering β’ All Proprietary Rights Reserved
Generated on March 7, 2026 β’ Ver: 4.0.2
Document Class: Master Education
Confidential Information β’ Licensed to User