USD ($)
$
United States Dollar
Euro Member Countries
India Rupee

Aggregations and Subqueries

Lesson 8/24 | Study Time: 23 Min

Aggregations and subqueries are powerful SQL features used to analyze and summarize data in relational databases.

In backend development, they help transform large datasets into meaningful insights, such as totals, averages, rankings, and filtered summaries.

Aggregation functions work with GROUP BY to combine rows based on shared values, while HAVING applies conditions to aggregated results.

Subqueries, including EXISTS, allow queries to depend on the results of other queries, enabling more flexible and expressive data retrieval.

GROUP BY and Aggregate Functions

GROUP BY categorizes rows based on one or more columns, pairing seamlessly with aggregate functions like COUNT, SUM, AVG, MIN, and MAX to produce summary statistics.

Think of it as grouping items in a shopping cart before tallying totals—crucial for backend reports without pulling entire tables into your app.

Aggregates compute values across each group, preventing row-by-row processing that could overwhelm your server.

Common Aggregates



Consider an orders table for an e-commerce backend:

sql
SELECT customer_id, COUNT(*) as order_count, SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id;


This groups orders by customer, revealing high-value users for targeted marketing— a staple in backend analytics endpoints.

Pro Tip: Always include non-aggregated columns in GROUP BY to comply with SQL standards (e.g., PostgreSQL, MySQL 5.7+), avoiding errors like "column must appear in GROUP BY."

HAVING: Filtering Groups

While WHERE filters individual rows before grouping, HAVING applies conditions after aggregation, sifting groups based on computed values.

It's your backend's way of saying, "Show me only the summaries that matter," like loyal customers who've spent over $1,000.

HAVING uses the same operators as WHERE but on aggregates, enabling precise post-group filtering without subqueries.


Here's the process in action:

Apply WHERE to raw rows (optional).


1. GROUP BY to form summaries.

2. Compute aggregates.

3. HAVING to filter groups.

4. Select final output.


Example: Identify top-spending customers.

sql
SELECT customer_id, AVG(total_amount) as avg_order
FROM orders
WHERE order_date >= '2025-01-01'
GROUP BY customer_id
HAVING AVG(total_amount) > 50
ORDER BY avg_order DESC;

This delivers backend-ready data for VIP segmentation, filtering out low-value groups efficiently.

Subqueries with EXISTS and IN

Subqueries nest queries inside others, powering complex logic like "find customers with orders in both categories." EXISTS checks for row existence (faster for large datasets), while IN matches values—key for correlated backend checks without joins.

Use subqueries when joins feel forced; they're readable for validation endpoints.

EXISTS vs. IN comparison:


Practical Example: Flag active customers (with recent orders).

sql
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2025-06-01'
);

This correlated subquery runs per outer row, ideal for backend user status APIs—EXISTS shines as it short-circuits on matches.


For non-correlated

sql
SELECT * FROM products
WHERE category_id IN (
SELECT category_id FROM categories WHERE is_active = true
);

Window Functions: ROW_NUMBER and RANK

Window functions perform calculations across row sets without collapsing groups like GROUP BY, defined by an OVER clause with partitions and ordering. ROW_NUMBER assigns unique sequential numbers; RANK handles ties by skipping numbers—perfect for backend leaderboards or pagination.

They revolutionize analytics, letting you rank per group while retaining all rows.


Key Variants


1. ROW_NUMBER(): Unique position (1,2,3...).

2. RANK(): Ties share rank, gaps follow (1,2,2,4).

3. DENSE_RANK(): Ties share, no gaps (1,2,2,3).


Example: Rank customers by spending within regions.

sql
SELECT
customer_id, region, total_spent,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spent DESC) as row_num,
RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) as rank_num
FROM (
SELECT customer_id, region, SUM(total_amount) as total_spent
FROM orders GROUP BY customer_id, region
) AS customer_totals;

This generates backend-friendly top-N lists: ROW_NUMBER for unique pagination, RANK for tied contests.

Window frame syntax: OVER (PARTITION BY col ORDER BY col ROWS BETWEEN ...) for running totals, but start with basics for most APIs.

Combining Techniques for Backend Power

Real backends blend these: Use GROUP BY + HAVING inside subqueries feeding EXISTS, topped with window functions.

sql
SELECT c.name, r.rank_num
FROM customers c
JOIN (
SELECT customer_id,
RANK() OVER (ORDER BY total_spent DESC) as rank_num
FROM (
SELECT customer_id, SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(total_amount) > 1000
) high_spenders
) r ON c.customer_id = r.customer_id
WHERE r.rank_num <= 10;

This fetches top 10 high-spenders—optimized for your FastAPI response serializer.


Best Practices:


Sales Campaign

Sales Campaign

We have a sales campaign on our promoted courses and products. You can purchase 1 products at a discounted price up to 15% discount.