Aggregations and Subqueries

Lesson 8/24 | Study Time: 23 Min

Course: Database Fundamentals for Backend Development

Aggregations and subqueries are powerful SQL features used to analyze and summarize data in relational databases.

In backend development, they help transform large datasets into meaningful insights, such as totals, averages, rankings, and filtered summaries.

Aggregation functions work with GROUP BY to combine rows based on shared values, while HAVING applies conditions to aggregated results.

Subqueries, including EXISTS, allow queries to depend on the results of other queries, enabling more flexible and expressive data retrieval.

GROUP BY and Aggregate Functions

GROUP BY categorizes rows based on one or more columns, pairing seamlessly with aggregate functions like COUNT, SUM, AVG, MIN, and MAX to produce summary statistics.

Think of it as grouping items in a shopping cart before tallying totals—crucial for backend reports without pulling entire tables into your app.

Aggregates compute values across each group, preventing row-by-row processing that could overwhelm your server.

Common Aggregates

Consider an orders table for an e-commerce backend:

sql

SELECT customer_id, COUNT(*) as order_count, SUM(total_amount) as total_spent

FROM orders

GROUP BY customer_id;

This groups orders by customer, revealing high-value users for targeted marketing— a staple in backend analytics endpoints.

Pro Tip: Always include non-aggregated columns in GROUP BY to comply with SQL standards (e.g., PostgreSQL, MySQL 5.7+), avoiding errors like "column must appear in GROUP BY."

HAVING: Filtering Groups

While WHERE filters individual rows before grouping, HAVING applies conditions after aggregation, sifting groups based on computed values.

It's your backend's way of saying, "Show me only the summaries that matter," like loyal customers who've spent over $1,000.

HAVING uses the same operators as WHERE but on aggregates, enabling precise post-group filtering without subqueries.

Here's the process in action:

Apply WHERE to raw rows (optional).

1. GROUP BY to form summaries.

2. Compute aggregates.

3. HAVING to filter groups.

4. Select final output.

Example: Identify top-spending customers.

sql

SELECT customer_id, AVG(total_amount) as avg_order

FROM orders

WHERE order_date >= '2025-01-01'

GROUP BY customer_id

HAVING AVG(total_amount) > 50

ORDER BY avg_order DESC;

This delivers backend-ready data for VIP segmentation, filtering out low-value groups efficiently.

Subqueries with EXISTS and IN

Subqueries nest queries inside others, powering complex logic like "find customers with orders in both categories." EXISTS checks for row existence (faster for large datasets), while IN matches values—key for correlated backend checks without joins.

Use subqueries when joins feel forced; they're readable for validation endpoints.

EXISTS vs. IN comparison:

Practical Example: Flag active customers (with recent orders).

sql

SELECT c.customer_id, c.name

FROM customers c

WHERE EXISTS (

    SELECT 1 FROM orders o

    WHERE o.customer_id = c.customer_id

    AND o.order_date >= '2025-06-01'

);

This correlated subquery runs per outer row, ideal for backend user status APIs—EXISTS shines as it short-circuits on matches.

For non-correlated

sql

SELECT * FROM products

WHERE category_id IN (

    SELECT category_id FROM categories WHERE is_active = true

);

Window Functions: ROW_NUMBER and RANK

Window functions perform calculations across row sets without collapsing groups like GROUP BY, defined by an OVER clause with partitions and ordering. ROW_NUMBER assigns unique sequential numbers; RANK handles ties by skipping numbers—perfect for backend leaderboards or pagination.

They revolutionize analytics, letting you rank per group while retaining all rows.

Key Variants

1. ROW_NUMBER(): Unique position (1,2,3...).

2. RANK(): Ties share rank, gaps follow (1,2,2,4).

3. DENSE_RANK(): Ties share, no gaps (1,2,2,3).

Example: Rank customers by spending within regions.

sql

SELECT 

    customer_id, region, total_spent,

    ROW_NUMBER() OVER (PARTITION BY region ORDER BY total_spent DESC) as row_num,

    RANK() OVER (PARTITION BY region ORDER BY total_spent DESC) as rank_num

FROM (

    SELECT customer_id, region, SUM(total_amount) as total_spent

    FROM orders GROUP BY customer_id, region

) AS customer_totals;

This generates backend-friendly top-N lists: ROW_NUMBER for unique pagination, RANK for tied contests.

Window frame syntax: OVER (PARTITION BY col ORDER BY col ROWS BETWEEN ...) for running totals, but start with basics for most APIs.

Combining Techniques for Backend Power

Real backends blend these: Use GROUP BY + HAVING inside subqueries feeding EXISTS, topped with window functions.

sql

SELECT c.name, r.rank_num

FROM customers c

JOIN (

    SELECT customer_id,

           RANK() OVER (ORDER BY total_spent DESC) as rank_num

    FROM (

        SELECT customer_id, SUM(total_amount) as total_spent

        FROM orders

        GROUP BY customer_id

        HAVING SUM(total_amount) > 1000

    ) high_spenders

) r ON c.customer_id = r.customer_id

WHERE r.rank_num <= 10;

This fetches top 10 high-spenders—optimized for your FastAPI response serializer.

Best Practices:

Previous Lesson Next Lesson

Troy Lane

Product Designer

Profile

Class Sessions

1- Introduction to Databases 2- Relational Model 3- Normalization Principles 4- Data Definition 5- Data Manipulation 6- Query Optimization 7- Joins: INNER, LEFT, RIGHT, FULL, and Self-Joins for Complex Data Retrieval 8- Aggregations and Subqueries 9- Stored Procedures, Functions, and Triggers for Reusable Backend Logic 10- ACID Transactions 11- Concurrency Issues 12- Locking Mechanisms and Deadlock Prevention 13- ER Diagrams and Schema Design 14- Indexing Strategies 15- Partitioning and Sharding Basics for Handling Large-Scale Backend Data 16- Connection Management 17- Query Building 18- Performance Tuning 19- Key-value and Document Stores 20- Column-Family and Graph Databases 21- Hybrid Approaches 22- Authentication and Authorization 23- SQL Injection Prevention, Encryption, and Auditing 24- Backup, Recovery, and Monitoring

Aggregations and Subqueries

Troy Lane

Class Sessions

Sales Campaign