6 Simple Ways to Leverage Advanced SQL Techniques

7 min read · Apr 10, 2023

Powerful SQL techniques to take your data manipulation skills to the next level.

Photo by Caspar Camille Rubin on Unsplash

SQL is the foundation of data analysis and transformation, but some cases demand a more intricate approach. That’s where advanced SQL techniques come in.

1. Window Functions for Complex Calculations

Window functions are like the Swiss Army knife of SQL — versatile, powerful, and always ready to help you tackle any data challenge. They allow you to perform complex calculations across rows of a result set that are related to the current row.

Running Total

SELECT 
category,
SUM(revenue) OVER (PARTITION BY category ORDER BY date) AS running_total
FROM sales_data;

In this example, the SUM function is paired with the OVER clause, which defines the window the function operates over. PARTITION BY restarts the calculation for each product category, and ORDER BY date makes the sum accumulate in date order, so each row shows the running total of revenue within its category up to that date.
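
If you want a single running total across every category instead of one per category, you can simply drop the PARTITION BY clause. A minimal sketch against the same hypothetical sales_data table:

SELECT
date,
category,
-- no PARTITION BY: one cumulative total across all rows, in date order
SUM(revenue) OVER (ORDER BY date) AS overall_running_total
FROM sales_data;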

Moving Average

SELECT
date,
AVG(amount) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg
FROM
sales
ORDER BY
date;

Here, the AVG function is used with the OVER clause to calculate the moving average. The ORDER BY clause specifies that the data should be ordered by date, and the ROWS BETWEEN 6 PRECEDING AND CURRENT ROW clause specifies that the average should be calculated for the current row and the six preceding rows.
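
The frame also combines with PARTITION BY, so you can keep a separate moving average per region; a sketch, assuming the sales table has a region column as in the next example:

SELECT
date,
region,
-- average of the current row and the six preceding rows, per region
AVG(amount) OVER (
PARTITION BY region
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS regional_moving_avg
FROM
sales
ORDER BY
region, date;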

Top n Rows

SELECT date, region, amount, sales_rank
FROM (
SELECT
date,
region,
amount,
DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS sales_rank
FROM
sales
) ranked
WHERE
sales_rank <= 3;

Here, the DENSE_RANK function is used with the OVER clause to assign a rank to each row based on the sales amount in descending order, and the PARTITION BY clause restarts the ranking for each region. Because window functions are evaluated after the WHERE clause, the ranking is computed in a derived table and the outer query then filters to the top three rows per region. (The alias sales_rank also avoids rank, which is a reserved word in some dialects.)

Note that we use DENSE_RANK rather than RANK. Both give tied values the same rank, but RANK skips numbers after a tie (1, 2, 2, 4), while DENSE_RANK keeps the ranks consecutive (1, 2, 2, 3).
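
For a concrete comparison, ranking the same tied amounts with both functions side by side makes the gap behavior easy to see; a quick sketch on the same sales table:

SELECT
region,
amount,
-- ties share a rank; the next rank is skipped (1, 2, 2, 4)
RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_with_gaps,
-- ties share a rank; ranks stay consecutive (1, 2, 2, 3)
DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS dense_rank_no_gaps
FROM
sales;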

Percentiles

SELECT
student_id,
score,
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY score) OVER (PARTITION BY student_id) AS percentile_90
FROM
scores;

Here, the PERCENTILE_CONT function is used with the OVER clause to calculate the 90th percentile score for each student. The PARTITION BY clause specifies that the calculation should be done separately for each student, and the ORDER BY clause specifies that the calculation should be done in order of the score. The WITHIN GROUP clause specifies that the percentile calculation should be done within the group of scores for each student.
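
Be aware that PERCENTILE_CONT as a window function (WITHIN GROUP combined with OVER) is a dialect-specific feature, supported in SQL Server and Oracle for example. In PostgreSQL it is an ordered-set aggregate, so the per-student version would be written with GROUP BY instead; a sketch under that assumption:

-- PostgreSQL-style: PERCENTILE_CONT as an ordered-set aggregate
SELECT
student_id,
PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY score) AS percentile_90
FROM
scores
GROUP BY
student_id;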

Window functions in SQL are a powerful tool that allow for complex calculations across rows of a result set. With just a few lines of code, you can calculate running totals, percentiles, and other fancy stats that will make you the envy of all your SQL peers.

2. Window Functions for Row Comparison

Window functions are not just great for complex calculations, but also handy for comparing and grouping data, making them the perfect tool for data analysis multitaskers.

Get the Previous or Next Value

SELECT
order_date,
order_total,
LAG(order_total) OVER (ORDER BY order_date) AS previous_order_total,
order_total - LAG(order_total) OVER (ORDER BY order_date) AS order_total_diff
FROM
orders;

This query calculates the difference between the current order total and the previous order total for each order, using the LAG function and the ORDER BY clause.
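
LEAD is the mirror image of LAG, peeking at the following row instead of the previous one; a minimal sketch on the same orders table:

SELECT
order_date,
order_total,
-- value from the next row in date order (NULL for the last row)
LEAD(order_total) OVER (ORDER BY order_date) AS next_order_total
FROM
orders;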

Get the First or Last Value

SELECT
customer_id,
order_date,
order_total,
FIRST_VALUE(order_total) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_order_total,
LAST_VALUE(order_total) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_order_total
FROM
orders;

This query gets the first and last order total for each customer, using the FIRST_VALUE and LAST_VALUE functions with the PARTITION BY and ORDER BY clauses to define the window. The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame matters for LAST_VALUE: the default frame ends at the current row, which would make LAST_VALUE return each row's own order total rather than the customer's most recent one.

Histogram Buckets

SELECT
product_id,
price,
WIDTH_BUCKET(price, 0, 100, 4) AS bucket_number
FROM
products;

This query groups the products into four histogram buckets based on their price, using the WIDTH_BUCKET function and the minimum value, maximum value, and number of buckets specified in the function’s arguments. The result set includes the product ID, price, and the assigned bucket number.
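
WIDTH_BUCKET produces equal-width buckets and isn't available in every dialect (PostgreSQL and Oracle have it). If you want buckets with roughly equal row counts instead, the NTILE window function is a widely supported alternative; a minimal sketch:

SELECT
product_id,
price,
-- four buckets with roughly equal numbers of products, ordered by price
NTILE(4) OVER (ORDER BY price) AS price_quartile
FROM
products;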

Window functions provide a versatile toolset that can help you perform row comparison and analysis with ease.

3. Common Table Expressions and Subqueries

The ongoing debate in the SQL community over common table expressions and subqueries is as hotly contested as the age-old argument of whether Die Hard is a Christmas movie or not. Whether you prefer the elegance of CTEs or the versatility of subqueries, the choice ultimately comes down to what works best for your specific data needs.

Is Die Hard a Christmas movie? Probably not. Do I still watch it every holiday season? Of course I do! You don’t have to choose between CTEs and subqueries, just when to use them to make your queries run like John McClane, efficient and unstoppable (or at least avoid getting stuck in the ventilation shaft).

CTEs

WITH revenue_by_category AS (
SELECT category, SUM(revenue) AS total_revenue
FROM sales_data
GROUP BY category
)
SELECT
category,
total_revenue,
total_revenue / (SELECT SUM(total_revenue) FROM revenue_by_category) AS revenue_share
FROM revenue_by_category

CTEs work within a contained scope and allow for more readability and reusability, which can make queries easier to understand and maintain.
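
That reusability really pays off when you chain several CTEs in a single WITH clause, each step reading from the previous one; a sketch building on the same revenue_by_category example:

WITH revenue_by_category AS (
SELECT category, SUM(revenue) AS total_revenue
FROM sales_data
GROUP BY category
),
ranked_categories AS (
-- second CTE reuses the first by name
SELECT
category,
total_revenue,
RANK() OVER (ORDER BY total_revenue DESC) AS revenue_rank
FROM revenue_by_category
)
SELECT category, total_revenue
FROM ranked_categories
WHERE revenue_rank <= 5;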

Subqueries

SELECT *
FROM customers
WHERE customer_id IN (
SELECT customer_id
FROM orders
GROUP BY customer_id
ORDER BY COUNT(*) DESC
LIMIT 10
)

On the other hand, subqueries keep everything inline: they can pull data from other tables and filter results within a single statement, without naming an intermediate result. Whether they outperform an equivalent CTE depends on the database, since some engines (older PostgreSQL versions, for example) treat CTEs as optimization fences while modern optimizers usually plan the two the same way.
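
Subqueries can also be correlated, meaning the inner query references the outer row and is logically evaluated once per customer; a minimal sketch, assuming the customers table has a customer_name column:

SELECT
customer_id,
customer_name,
-- correlated subquery: looks up orders for each customer row
(SELECT MAX(order_date)
FROM orders o
WHERE o.customer_id = c.customer_id) AS last_order_date
FROM customers c;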

CTEs and subqueries are both powerful features of SQL that can help you perform complex queries. Understanding when to use each one will help you write more efficient and effective SQL queries.

4. Recursive Common Table Expressions (CTEs)

Loops? In SQL? You bet! With recursive common table expressions (CTEs), you can create loops that allow you to explore hierarchical data structures, like trees and graphs, in a way that’s both powerful and flexible. By calling a query within itself, you can break down complex data analysis tasks into smaller, more manageable parts, and iterate over them until you’ve uncovered the insights you’re looking for.

Here is an example of a recursive CTE that calculates the total cost of a product and all of its components:

WITH RECURSIVE product_cost AS (
SELECT id, cost, id AS root_product_id
FROM products
WHERE id = 1
UNION ALL
SELECT products.id, products.cost, product_cost.root_product_id
FROM products
JOIN product_components ON products.id = product_components.component_id
JOIN product_cost ON product_components.product_id = product_cost.id
)
SELECT root_product_id, SUM(cost) AS total_cost
FROM product_cost
GROUP BY root_product_id
In this example, the recursive CTE called product_cost calculates the total cost of a product and all of its components. The base case is defined in the first part of the CTE, where the cost of the initial product (id = 1) is selected and its id is carried along as root_product_id. The recursive case is defined in the second part, where each product that appears as a component of something already in the CTE is joined in through the product_components table, so the recursion walks down the component tree until no more components are found. Finally, the main query sums the cost of the product and all of its components by grouping on root_product_id.
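
The same pattern works for organizational hierarchies. Here is a sketch that walks a hypothetical employees table (employee_id, name, manager_id, matching the self-join example below), tracking how many levels down each report sits; the depth check also doubles as a guard against runaway recursion if the data ever contains a cycle:

WITH RECURSIVE reporting_chain AS (
-- base case: the manager we start from
SELECT employee_id, name, 0 AS depth
FROM employees
WHERE employee_id = 1
UNION ALL
-- recursive case: everyone who reports to someone already in the chain
SELECT e.employee_id, e.name, rc.depth + 1
FROM employees e
JOIN reporting_chain rc ON e.manager_id = rc.employee_id
WHERE rc.depth < 10
)
SELECT name, depth
FROM reporting_chain
ORDER BY depth, name;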

5. Self Joins

Self joins are like looking at a mirror image of your data — you’re essentially joining a table to itself to compare rows within the same table. This is useful when you need to analyze relationships and connections between data points within a single table.

To perform a self join, you give the table a different alias for each instance in the join. This lets you tell the two copies of the table apart when referring to columns in your SELECT statement.

SELECT e.name AS employee_name, m.name AS manager_name
FROM employees e
JOIN employees m ON e.manager_id = m.employee_id;

In this example, we’re joining the “employees” table to itself, matching each employee’s “manager_id” to another row’s “employee_id”. The result of this query is a table with two columns: “employee_name” and “manager_name”.
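
One caveat: a plain (inner) join silently drops anyone without a manager, such as the CEO. Switching to a LEFT JOIN keeps those rows, with a NULL manager_name:

SELECT e.name AS employee_name, m.name AS manager_name
FROM employees e
-- LEFT JOIN keeps employees whose manager_id is NULL
LEFT JOIN employees m ON e.manager_id = m.employee_id;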

6. PIVOT

Data can be a messy affair, but with the PIVOT operator in SQL, you can tidy it up like a pro. By converting your row-based data into columnar format, you can see things in a whole new light. Just be aware that not all SQL dialects support this magical transformation (PIVOT is specific to SQL Server and Oracle, for example), so keep an eye out for any compatibility issues.

If you’re working with a SQL dialect that doesn’t support PIVOT, don’t fret! You can still achieve similar results using CASE statements.

Here’s an example of how you can pivot data using PIVOT:

SELECT *
FROM (
SELECT category, year, revenue
FROM sales
) sales
PIVOT (
SUM(revenue)
FOR year IN ([2018], [2019], [2020])
) AS sales_pivot;

And here is how you can achieve these results using CASE statements:


SELECT
category,
SUM(CASE WHEN year = 2018 THEN revenue ELSE 0 END) AS `2018`,
SUM(CASE WHEN year = 2019 THEN revenue ELSE 0 END) AS `2019`,
SUM(CASE WHEN year = 2020 THEN revenue ELSE 0 END) AS `2020`
FROM sales
GROUP BY category;

In both examples, the SQL query is selecting data from the “sales” table and grouping it by the “category” column. We then use “SUM” functions with “PIVOT” or “CASE” statements to create three new columns that contain the revenue for each category in each of the years 2018, 2019, and 2020.

Want to Learn More?

Click here to find out how to use your SQL knowledge to learn data manipulation in Python with Pandas!

Written by Ashley Biddle

Ashley is a highly skilled Data Architect with a passion for building and maintaining complex data ecosystems.
