? Correlated Subqueries

? Quick Overview

A correlated subquery is a subquery that refers to columns from the outer query. Unlike a standalone subquery, it is evaluated once per row of the outer query, because it depends on values from that row.

? Key Concepts

Dependent on outer row: The inner query references an alias or column from the outer query.
Row-by-row evaluation: The database executes the subquery repeatedly for each row of the outer query.
Use cases: comparisons to aggregated values per-group, existence checks, filtering with dynamic criteria.
Performance: can be slower than joins for large datasets; consider rewriting as a JOIN or using window functions when possible.

⚙️ Syntax / Theory

Basic pattern:

The outer query selects rows from a table (aliased).
The inner query uses that alias to compute a value specific to the current outer row.

Typical form:

Pattern

SELECT ... FROM outer_table o
WHERE column OPERATOR (
SELECT aggregate(...) FROM other_table t
WHERE t.fk = o.pk
);

? Code Example — Employees with salary > department average

? View Code Example

// Correlated subquery: compare each employee's 
salary to their department's average
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees x
WHERE x.department_id = e.department_id
);

Explanation: The inner query SELECT AVG(salary) FROM employees x WHERE x.department_id = e.department_id refers to e.department_id, which comes from the outer query. The DB calculates the average for the current employee's department and compares it to that employee's salary.

? Code Example — Find customers with total orders above average (correlated)

? View Code Example

// For each customer, 
check if their total order amount exceeds the average customer total
SELECT c.customer_id, c.name
FROM customers c
WHERE (
SELECT SUM(o.amount) 
FROM orders o
WHERE o.customer_id = c.customer_id
) > (
SELECT AVG(total_sum) FROM (
  SELECT SUM(amount) AS total_sum FROM orders GROUP BY customer_id
) AS t
);

This mixes a correlated subquery (left) with a standalone aggregate (right). The left side computes per-customer sum by referring to c.customer_id.

? Code Example — EXISTS correlated subquery

? View Code Example

// Use EXISTS to test related rows per outer row 
(more efficient for existence checks)
SELECT p.product_id, p.name
FROM products p
WHERE EXISTS (
SELECT 1 FROM order_items oi
WHERE oi.product_id = p.product_id
AND oi.quantity > 100
);

Explanation: The inner query returns rows only if the outer product has order items with quantity > 100. The EXISTS stops at the first match, which can be performant.

? Live Output / Example Result

Sample result for employees > dept average

Columns: employee_id | name | salary | department_id

Example rows (conceptual):

// Example output rows
101 | "Asha"   | 95000 | 10
145 | "Rahul"  | 87000 | 20
178 | "Priya"  | 92000 | 10

?️ Use Cases

Row-level filters that compare a row to an aggregate computed for its group.
Conditional existence checks for related records (orders, logs, events).
Per-row validation rules in migrations or cleanup scripts.

? Tips & Best Practices

Prefer EXISTS for boolean existence checks; it's usually faster than IN or equality comparisons in correlated contexts.
For aggregation per-group, consider window functions (e.g., AVG(...) OVER (PARTITION BY department_id)) to avoid repeated computation.
If a correlated subquery is causing performance issues, try rewriting with a JOIN or a derived table that pre-aggregates results.
Always check query plans for large tables; correlated subqueries often show nested loop operations which may be costly.

? Try It Yourself / Practice Tasks

Create a correlated query to list departments where the highest-paid employee earns more than twice the department average.
Rewrite the "employees above department average" query using a window function and compare results and performance.
Use a correlated DELETE (carefully) to remove records in a staging table that match criteria in the main table.