← Back to Chapters

Correlated Subqueries

? Correlated Subqueries

? Quick Overview

A correlated subquery is a subquery that refers to columns from the outer query. Unlike a standalone subquery, it is evaluated once per row of the outer query, because it depends on values from that row.

? Key Concepts

  • Dependent on outer row: The inner query references an alias or column from the outer query.
  • Row-by-row evaluation: The database executes the subquery repeatedly for each row of the outer query.
  • Use cases: comparisons to aggregated values per-group, existence checks, filtering with dynamic criteria.
  • Performance: can be slower than joins for large datasets; consider rewriting as a JOIN or using window functions when possible.

⚙️ Syntax / Theory

Basic pattern:

  • The outer query selects rows from a table (aliased).
  • The inner query uses that alias to compute a value specific to the current outer row.

Typical form:

Pattern

SELECT ... FROM outer_table o
WHERE column OPERATOR (
SELECT aggregate(...) FROM other_table t
WHERE t.fk = o.pk
);

? Code Example — Employees with salary > department average

? View Code Example
// Correlated subquery: compare each employee's 
salary to their department's average
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees x
WHERE x.department_id = e.department_id
);

Explanation: The inner query SELECT AVG(salary) FROM employees x WHERE x.department_id = e.department_id refers to e.department_id, which comes from the outer query. The DB calculates the average for the current employee's department and compares it to that employee's salary.

? Code Example — Find customers with total orders above average (correlated)

? View Code Example
// For each customer, 
check if their total order amount exceeds the average customer total
SELECT c.customer_id, c.name
FROM customers c
WHERE (
SELECT SUM(o.amount) 
FROM orders o
WHERE o.customer_id = c.customer_id
) > (
SELECT AVG(total_sum) FROM (
  SELECT SUM(amount) AS total_sum FROM orders GROUP BY customer_id
) AS t
);

This mixes a correlated subquery (left) with a standalone aggregate (right). The left side computes per-customer sum by referring to c.customer_id.

? Code Example — EXISTS correlated subquery

? View Code Example
// Use EXISTS to test related rows per outer row 
(more efficient for existence checks)
SELECT p.product_id, p.name
FROM products p
WHERE EXISTS (
SELECT 1 FROM order_items oi
WHERE oi.product_id = p.product_id
AND oi.quantity > 100
);

Explanation: The inner query returns rows only if the outer product has order items with quantity > 100. The EXISTS stops at the first match, which can be performant.

? Live Output / Example Result

Sample result for employees > dept average

Columns: employee_id | name | salary | department_id

Example rows (conceptual):

// Example output rows
101 | "Asha"   | 95000 | 10
145 | "Rahul"  | 87000 | 20
178 | "Priya"  | 92000 | 10

?️ Use Cases

  • Row-level filters that compare a row to an aggregate computed for its group.
  • Conditional existence checks for related records (orders, logs, events).
  • Per-row validation rules in migrations or cleanup scripts.

? Tips & Best Practices

  • Prefer EXISTS for boolean existence checks; it's usually faster than IN or equality comparisons in correlated contexts.
  • For aggregation per-group, consider window functions (e.g., AVG(...) OVER (PARTITION BY department_id)) to avoid repeated computation.
  • If a correlated subquery is causing performance issues, try rewriting with a JOIN or a derived table that pre-aggregates results.
  • Always check query plans for large tables; correlated subqueries often show nested loop operations which may be costly.

? Try It Yourself / Practice Tasks

  1. Create a correlated query to list departments where the highest-paid employee earns more than twice the department average.
  2. Rewrite the "employees above department average" query using a window function and compare results and performance.
  3. Use a correlated DELETE (carefully) to remove records in a staging table that match criteria in the main table.