SQL Case: The Swiss Army Knife
Ever felt like you needed a Swiss Army knife for your SQL adventures? Well, let me introduce you to the CASE
expression. This multifaceted tool has been traditionally used in the SELECT
expression mostly, but its true potential and versatility can be unlocked beyond that. In this blog, we’re re-discovering the quirky ways you can fully utilize the CASE
expression in SQL right from the Beginner and to the Pro levels.
Getting Started
We will be using MySQL throughout this tutorial, so before we deep dive further let’s create some sample dataset in MySQL to run our queries against.
-- Create and insert into sales table
CREATE TABLE IF NOT EXISTS sales (
id INT,
product VARCHAR(20),
color VARCHAR(10),
status_id INT,
units INT
);
INSERT INTO sales VALUES
(1, 'iPhone 15 Pro Max', 'Black', 1, 10),
(2, 'iPhone 15 Pro', 'Black', 1, 20),
(3, 'iPhone 15', 'Gray', 2, 15),
(4, 'MacBook Air 13', 'Black', 3, 25),
(5, 'MacBook Air 15', 'Gray', 4, 5),
(6, 'MacBook Air 13', 'Black', NULL, 15);
-- Create and insert into names table
CREATE TABLE IF NOT EXISTS names (
fname VARCHAR(20),
lname VARCHAR(20)
);
INSERT INTO names VALUES
('John','Smith'),
('James','Lee'),
(NULL,'Sam Miller');
-- Create and insert into ages table
CREATE TABLE IF NOT EXISTS ages (
fname VARCHAR(20),
lname VARCHAR(20),
age INT
);
INSERT INTO ages VALUES
('John','Smith',30),
('James','Lee',35),
('Sam','Miller',40);
Here is a sample of how our table will look like.
sales
id | product | color | status_id | units |
---|---|---|---|---|
1 | iPhone 15 Pro Max | Black | 1 | 10 |
2 | iPhone 15 Pro | Black | 1 | 20 |
3 | iPhone 15 | Gray | 2 | 15 |
4 | MacBook Air 13 | Black | 3 | 25 |
5 | MacBook Air 15 | Gray | 4 | 5 |
6 | MacBook Air 13 | Black | NULL | 15 |
CASE Syntax
If you are familiar with the IF
function in Excel or the for
loop in Python, then you already know how the CASE
expression works. To keep it simple, the CASE
expression evaluates column values based on specified conditions and outputs a specific result. If the conditions do not match any column value, a default result can be specified.
In the following query, we are evaluating the values in the col
column with three conditions. The CASE expression will,
- Evaluate the conditions starting from the top and output the corresponding result if the condition is true in the
result_col
. - If there is no true condition, it evaluates to the
ELSE
part. - If there is no true condition and no
ELSE
part written, it outputs the result asNULL
.
SELECT
col,
CASE
WHEN col = value1 THEN result1
WHEN col = value2 THEN result2
WHEN col = value3 THEN result3
ELSE result4
END result_col
FROM tab;
We are not restricted to evaluating just one condition on one column at a time, but can run as many conditions on as many columns to generate a resulting output using a combination of logical operators.
SELECT
col1,
col2,
CASE
WHEN col1 = value1 THEN result1
WHEN col2 = value2 THEN result2
WHEN col1 = value3 AND col2 = value4 THEN result3
WHEN col1 = value5 OR col2 = value6 THEN result4
ELSE result5
END
FROM tab;
The
CASE
expression evaluates to the first true condition for a row. Once the first true condition has been met, it will proceed to the next row and will skip evaluating the rest of the conditions. Hence, it is quite important to arrange the conditions in the right order from the top.CASE in SELECT
Traditionally, we all have used CASE
expression in the SELECT
statement. So, let’s run some queries against our sample dataset to warm-up with it. In the following exercise, we are writing two CASE
expressions to,
- Group the Products into higher-level Categories.
- Translate the Status IDs to more reader-friendly Status descriptions.
SELECT
id,
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END category,
product,
color,
status_id,
CASE
WHEN status_id = 1 THEN 'Pending'
WHEN status_id = 2 THEN 'Processing'
WHEN status_id = 3 THEN 'Rejected'
WHEN status_id = 4 THEN 'Completed'
ELSE 'Error'
END status_desc,
units
FROM sales;
id | category | product | color | status_id | status_desc | units |
---|---|---|---|---|---|---|
1 | Phone | iPhone 15 Pro Max | Black | 1 | Pending | 10 |
2 | Phone | iPhone 15 Pro | Black | 1 | Pending | 20 |
3 | Phone | iPhone 15 | Gray | 2 | Processing | 15 |
4 | Laptop | MacBook Air 13 | Black | 3 | Rejected | 25 |
5 | Laptop | MacBook Air 15 | Gray | 4 | Completed | 5 |
6 | Laptop | MacBook Air 13 | Black | NULL | Error | 15 |
With that, we have improved our data with the category
and the status_desc
columns and made it more reader-friendly, as compared to the way it was previously.
Nested CASE
If you thought we are limited to only single-dimension of the CASE
expression, then think again. We have the freedom to go into deeper levels with the CASE
expression by simply using NESTING
, aka, CASE
within a CASE
.
In the following example, we are creating two status_desc
columns with the same results, but with the following differences
status_desc_1
: Using aCASE
expression to evaluate multiple conditions.status_desc_2
: Using a NestedCASE
expression to evaluate multiple conditions.
SELECT
id,
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END category,
product,
color,
status_id,
-- CASE with multiple conditionals
CASE
WHEN status_id = 1 AND units > 15 THEN 'Pending - Bulk Order'
WHEN status_id = 1 AND units < 15 THEN 'Pending - Small Order'
WHEN status_id = 1 THEN 'Pending'
WHEN status_id = 2 THEN 'Processing'
WHEN status_id = 3 THEN 'Rejected'
WHEN status_id = 4 THEN 'Completed'
ELSE 'Error'
END status_desc_1,
-- Nested CASE
CASE
WHEN status_id = 1 THEN (
CASE
WHEN units > 15 THEN 'Pending - Bulk Order'
WHEN units < 15 THEN 'Pending - Small Order'
ELSE 'Pending'
END
)
WHEN status_id = 2 THEN 'Processing'
WHEN status_id = 3 THEN 'Rejected'
WHEN status_id = 4 THEN 'Completed'
ELSE 'Error'
END status_desc_2,
units
FROM sales;
id | category | product | color | status_id | status_desc_1 | status_desc_2 | units |
---|---|---|---|---|---|---|---|
1 | Phone | iPhone 15 Pro Max | Black | 1 | Pending - Small Order | Pending - Small Order | 10 |
2 | Phone | iPhone 15 Pro | Black | 1 | Pending - Bulk Order | Pending - Bulk Order | 20 |
3 | Phone | iPhone 15 | Gray | 2 | Processing | Processing | 15 |
4 | Laptop | MacBook Air 13 | Black | 3 | Rejected | Rejected | 25 |
5 | Laptop | MacBook Air 15 | Gray | 4 | Completed | Completed | 5 |
6 | Laptop | MacBook Air 13 | Black | NULL | Error | Error | 15 |
CASE
expressions, the decision about which one to use depends upon various factors like code readability, complexity, etc.CASE as an alternative
In the following query, we are converting the numerical price column to a more reader-friendly format with commas. This can be easily achieved using the in-built FORMAT()
function in MySQL. But, if we have to achieve the same result without the use of a function, we can use a CASE
expression to achieve the same result.
WITH data AS (
SELECT 1000 AS amount UNION
SELECT 10000 AS amount UNION
SELECT 100000 AS amount UNION
SELECT 1000000 AS amount UNION
SELECT 10000000 AS amount UNION
SELECT 100000000 AS amount
)
SELECT
amount,
CONCAT('$', FORMAT(amount, 0)) formatted_amount_1,
CASE
WHEN LENGTH(amount) <= 3 THEN CONCAT('$', CONVERT(amount,CHAR(10)))
WHEN LENGTH(amount) > 3 AND LENGTH(amount) <= 6
THEN CONCAT('$', SUBSTRING(amount, 1, LENGTH(amount) - 3), ',', SUBSTRING(amount, -3))
WHEN LENGTH(amount) > 6 AND LENGTH(amount) <= 9
THEN CONCAT('$', SUBSTRING(amount, 1, LENGTH(amount) - 6), ',', SUBSTRING(amount, LENGTH(amount) - 5, 3), ',', SUBSTRING(amount, -3))
END AS formatted_amount_2
FROM data;
amount | formatted_amount_1 | formatted_amount_2 |
---|---|---|
1000 | $1,000 | $1,000 |
10000 | $10,000 | $10,000 |
100000 | $100,000 | $100,000 |
1000000 | $1,000,000 | $1,000,000 |
10000000 | $10,000,000 | $10,000,000 |
100000000 | $100,000,000 | $100,000,000 |
CASE in JOIN
Now that you have had a good hands-on on the basic CASE
expression in the SELECT
statement, let’s jump into some intermediate difficulty territory. Let’s consider the following two tables that we created earlier. We are assuming that the data has not been normalized properly in the names
table, and we have to join them together to get all the records in the ages
table.
names
fname | lname |
---|---|
John | Smith |
James | Lee |
NULL | Sam Miller |
ages
fname | lname | age |
---|---|---|
John | Smith | 30 |
James | Lee | 35 |
Sam | Miller | 40 |
In our case the following simple JOIN
condition will not be able to join all the values as a result of the misplaced data entry in the third row of the names
table.
SELECT ages.*
FROM ages
INNER JOIN names
ON ages.fname = names.fname
AND ages.lname = names.lname;
fname | lname | age |
---|---|---|
John | Smith | 30 |
James | Lee | 35 |
Well, how about using a CASE
expression to save the day? The CASE
expression gives us the flexibility to go around the erroneous record and make the JOIN
work again.
SELECT ages.*
FROM ages
INNER JOIN names
ON CONCAT_WS(' ', ages.fname, ages.lname)
= (CASE
WHEN names.fname IS NULL THEN names.lname
ELSE CONCAT_WS(' ',names.fname, names.lname)
END);
fname | lname | age |
---|---|---|
John | Smith | 30 |
James | Lee | 35 |
Sam | Miller | 40 |
CASE in WHERE
In our previous example of using the SELECT
statement, we explicitly created a status_desc
column to convert the status_id
into a reader-friendly status description. If we want to filter that data on a specific status description, we can easily use the status_desc
column to do that.
But, what if we want to avoid creating a column to just filter based on a status description? In that case, we can use the CASE
statement in the WHERE
clause to filter accordingly.
SELECT
id, product, color, status_id, units
FROM sales
WHERE
CASE
WHEN status_id = 1 THEN 'Pending'
WHEN status_id = 2 THEN 'Processing'
WHEN status_id = 3 THEN 'Rejected'
WHEN status_id = 4 THEN 'Completed'
END = 'Pending';
id | product | color | units | status_id |
---|---|---|---|---|
1 | iPhone 15 Pro Max | Black | 10 | 1 |
2 | iPhone 15 Pro | Black | 20 | 1 |
The above example shows an example which might not necessarily require a CASE
expression, but when it comes to complex filtering a CASE
expression will be the better option.
In the following example, we are evaluating a bunch of conditions together. We can use multiple conditions as shown below, but we are definitely making it hard for the reader to understand what we are trying to achieve.
SELECT product, color
FROM sales
WHERE NOT (
product IN ('iPhone 15', 'iPhone 15 Pro') AND
color = 'Black' AND
(status_id IS NULL OR status_id != 2)
);
product | color |
---|---|
iPhone 15 Pro Max | Black |
iPhone 15 | Gray |
MacBook Air 13 | Black |
MacBook Air 15 | Gray |
MacBook Air 13 | Black |
However, using a CASE
expression to substitute the above filtering is better in this case, as we are make the query reader-friendly while achieving the same result as above.
SELECT product, color
FROM sales
WHERE
CASE
WHEN product IN ('iPhone 15','iPhone 15 Pro') AND color = 'Black' AND status_id != 2 THEN 'Exclude'
ELSE 'Include'
END = 'Include'
;
product | color |
---|---|
iPhone 15 Pro Max | Black |
iPhone 15 | Gray |
MacBook Air 13 | Black |
MacBook Air 15 | Gray |
MacBook Air 13 | Black |
CASE in ORDER BY
Traditionally, when it comes to ordering the columns using the ORDER BY
clause, we have to rely on the columns and their values to order them in the ascending or descending order. For e.g., the sales
table can be sorted as per two types of columns,
- the
VARCHAR
columns to sort the values as per ASCII value. - the
INT
columns to sort the values by the numerical value.
id | product | color | status_id | units |
---|---|---|---|---|
1 | iPhone 15 Pro Max | Black | 1 | 10 |
2 | iPhone 15 Pro | Black | 1 | 20 |
3 | iPhone 15 | Gray | 2 | 15 |
4 | MacBook Air 13 | Black | 3 | 25 |
5 | MacBook Air 15 | Gray | 4 | 5 |
6 | MacBook Air 13 | Black | NULL | 15 |
But, what if we want a custom ordering of the above data, but by using the Dollar Amount of the product? In the absence of amount
column in the table, we can easily resort to using the CASE
statement to arrange the data manually by giving a numerical value to the product
column, that does not need to show up as a separate column.
In the above dataset, we already know that MacBook Air 15 will be the most expensive product and the iPhone 15 the least expensive, and the rest of the products can be sorted manually based on the product
name itself. So, let’s arrange that manually using a CASE
expression.
SELECT *
FROM sales
ORDER BY
(CASE
WHEN product = 'MacBook Air 15' THEN 1
WHEN product = 'MacBook Air 13' THEN 2
WHEN product = 'iPhone 15 Pro Max' THEN 3
WHEN product = 'iPhone 15 Pro' THEN 4
WHEN product = 'iPhone 15' THEN 5
END)
;
id | product | color | status_id | units |
---|---|---|---|---|
5 | MacBook Air 15 | Gray | 4 | 5 |
4 | MacBook Air 13 | Black | 3 | 25 |
6 | MacBook Air 13 | Black | NULL | 15 |
1 | iPhone 15 Pro Max | Black | 1 | 10 |
2 | iPhone 15 Pro | Black | 1 | 20 |
3 | iPhone 15 | Gray | 2 | 15 |
CASE in GROUP BY
Now, let’s switch gears to using CASE
in an aggregation. In the following example, we want to establish the total units sold at the product category
level. We can accomplish this using a CASE
expression as follows,
SELECT
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END category,
SUM(units) total_units
FROM sales
GROUP BY product;
category | total_units |
---|---|
Phone | 10 |
Phone | 20 |
Phone | 15 |
Laptop | 40 |
Laptop | 5 |
But, if we look at the above result, it seems that although the CASE
expression we implemented is correct the data has not been grouped at a higher product category level as expected. This is due to the data being aggregated at the lower product
level, instead of the newly created higher category
level.
To rectify this issue, we need to make sure the CASE
expression defined in the SELECT
statement is also explicitly mentioned in the GROUP BY
clause to group the data at the higher category
level.
SELECT
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END category,
SUM(units) total_units
FROM sales
GROUP BY
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END;
category | total_units |
---|---|
Phone | 45 |
Laptop | 45 |
CASE to Pivot
MySQL does not natively support the PIVOT
function which are offered by Oracle SQL, Microsoft T-SQL, etc. To overcome this, we can recreate the same using the CASE
expression.
In the following example, we are using the status_id
dimension column and creating separate columns out of its members (pending
, processing
, rejected
, etc.) while also grouping them at a higher product category
level to get a total count of the various order status_id
the product category
falls under.
SELECT
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END category,
SUM(CASE WHEN status_id = 1 THEN 1 ELSE 0 END) AS pending,
SUM(CASE WHEN status_id = 2 THEN 1 ELSE 0 END) AS processing,
SUM(CASE WHEN status_id = 3 THEN 1 ELSE 0 END) AS rejected,
SUM(CASE WHEN status_id = 4 THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN status_id IS NULL THEN 1 ELSE 0 END) AS error
FROM sales
GROUP BY
CASE
WHEN product LIKE 'iPhone%' THEN 'Phone'
WHEN product LIKE 'MacBook%' THEN 'Laptop'
END;
category | pending | processing | rejected | completed | error |
---|---|---|---|---|---|
Phone | 2 | 1 | 0 | 0 | 0 |
Laptop | 0 | 0 | 1 | 1 | 1 |
CASE in DML
So far we have looked at CASE
as part of the DQL (Data Query Language), but let’s expand our horizon to the DML (Data Manipulation Language) as well.
CASE in UPDATE
In the CASE in SELECT section above, we used a CASE
expression to convert the status_id
column to reader-friendly status_desc
column. Instead of writing a CASE
expression every time we query the data, let’s create a column in the sales
table and update it with the status_desc
info. This can be achieved using the following,
-- Add column
ALTER TABLE sales
ADD status_desc VARCHAR(20);
-- Update the status_desc column
UPDATE sales
SET status_desc =
(CASE
WHEN status_id = 1 THEN 'Pending'
WHEN status_id = 2 THEN 'Processing'
WHEN status_id = 3 THEN 'Rejected'
WHEN status_id = 4 THEN 'Completed'
ELSE 'Error'
END)
;
id | product | color | status_id | units | status_desc |
---|---|---|---|---|---|
1 | iPhone 15 Pro Max | Black | 1 | 10 | Pending |
2 | iPhone 15 Pro | Black | 1 | 20 | Pending |
3 | iPhone 15 | Gray | 2 | 15 | Processing |
4 | MacBook Air 13 | Black | 3 | 25 | Rejected |
5 | MacBook Air 15 | Gray | 4 | 5 | Completed |
6 | MacBook Air 13 | Black | NULL | 15 | Error |
CASE in DELETE
Similarly to updating the table, we can combine the DELETE
and CASE
expression to conditionally evaluate and delete specific rows from the table as follows,
DELETE FROM sales
WHERE TRUE =
(CASE
WHEN color = 'Black' AND product NOT LIKE 'iPhone%' THEN TRUE
ELSE FALSE
END)
;
CASE Exercises
Let’s test out our newfound knowledge of the CASE
expression and solve the following two riddles:
Q: What percentage of all tournament games did Italy win?
country | result |
---|---|
Germany | Win |
Spain | Loss |
Italy | Win |
Germany | Loss |
Italy | Loss |
Expected result:
italy_win_pct |
---|
0.2000 |
Reveal Answer
WITH data AS (
SELECT 'Germany' country, 'Win' result UNION ALL
SELECT 'Spain', 'Loss' UNION ALL
SELECT 'Italy', 'Win' UNION ALL
SELECT 'Germany', 'Loss' UNION ALL
SELECT 'Italy', 'Loss'
)
SELECT
SUM(CASE WHEN country = 'Italy' AND result = 'Win' THEN 1 END)/ COUNT(1) italy_win_pct1,
AVG(CASE WHEN country = 'Italy' AND result = 'Win' THEN 1 ELSE 0 END) italy_win_pct2
FROM data;
italy_win_pct1 | italy_win_pct2 |
---|---|
0.2000 | 0.2000 |
With the above query we have achieved the same result, but with 2 different approaches as shown in the columns,
italy_win_pct1
: UsingCASE
expression with a standard numerator/denominator approach.italy_win_pct1
: UsingCASE
expression but with a clever way of using theAVERAGE
aggregation.
Q: What is the percentage of all rows where the old_date
and new_date
values match?
id | old_date | new_date |
---|---|---|
1 | 2023-01-01 | 2023-01-01 |
2 | 2023-01-01 | 2023-01-02 |
3 | 2023-01-01 | 2023-01-01 |
Expected result:
date_match_pct |
---|
0.6667 |
Reveal Answer
WITH dates AS (
SELECT 1 id,'2023-01-01' old_date,'2023-01-01' new_date UNION
SELECT 2,'2023-01-01','2023-01-02' UNION
SELECT 3,'2023-01-01','2023-01-02'
)
SELECT
SUM(CASE WHEN old_date <> new_date THEN 1 END) / COUNT(1) date_match_pct1,
AVG(CASE WHEN old_date <> new_date THEN 1 ELSE 0 END) date_match_pct2,
AVG(CASE WHEN old_date <> new_date THEN 1 END) date_match_pct3
FROM dates;
date_match_pct1 | date_match_pct2 | date_match_pct3 |
---|---|---|
0.6667 | 0.6667 | 1.0000 |
The first two columns are the correct results using 2 different approaches. The third column gives the wrong result because we excluded the crucial ELSE
portion from the CASE
expression. Always make sure you are including the ELSE portion as well, in this case.
Summary
If you have gone through all the above examples and have been to grasp the magnificent and multipurpose CASE
expression firmly, I will grant you the coveted status of the CASE
Ninja 🥷 and wish you luck on your future SQL adventures.