Select Star SQL
Notes of what I've been learning from https://selectstarsql.com/.
The table used for practice is filled with data about death executions that happened in Texas.
- A query has several blocks, like
WHEREis a filter applied in the table given to
SELECTcan do math. If you want numbers with decimals and is getting an integer, multiply by 1.0.
- Aggregation functions (COUNT, AVG, MIN, MAX, SUM, etc.)
- Take multiple rows of data and combine them into one number.
- Most of the functions act on non-NULL values.
WHEREhappens before aggregations.
- It changes the behavior of aggregation functions. It makes the functions perform their operations on groups of rows, instead of the entire table.
HAVINGto filter after a
HAVINGcan take the results of aggregations into consideration.
- We can use nested queries (specially useful to calculate percentages).
Beazley's Last Statement
SELECT and FROM
Most important parts of a query:
Most of the time we are
SELECTing some fields
FROM a table.
So, the arguments for
SELECT can go wild, but
FROM receives only a table.
SELECT can also do math:
SELECT 50 * 2, 51 / 2 -- 100, 25 -- if you want decimal numbers, -- one of the terms must have decimals SELECT 51.0 / 2 -- 25.5
Used as a filter. Example:
-- younger than 25 SELECT * FROM executions WHERE ex_age < 25
To filter text, we can use '=', but
LIKE makes some things easier. Example:
SELECT * FROM executions WHERE first_name LIKE '%roy'
Implications of using
- the string becomes case insensitive
%as a multicharacter wildcard
_as a single character wildcard
challenge: What was the Beazley's last sentence before being executed?
SELECT last_sentence FROM executions WHERE last_name LIKE '%beazley%'
Claims of Innocence
- Aggregation functions take multiple rows of data and combine them into one number.
- Most of the functions act on non-NULL values.
Most common ones:
a strange query!
SELECT first_name, COUNT(*) FROM executions
Although it's a valid query it's probably not what you want.
As an aggregation function,
COUNT(*) is trying to return a single value, but
first_name is trying to return multiple values (one entry for each row).
It works like a big if-else statement:
CASE WHEN <clause> THEN <result> WHEN <clause> THEN <result> ... ELSE <result> END
It's useful for times when we want to go through a table multiple times in the same query. Like in this example where we want to count the executions in two different counties:
SELECT COUNT( CASE WHEN county='Harris' THEN 1 ELSE NULL END) as harris_county_counter, COUNT( CASE WHEN county='Bexar' THEN 1 ELSE NULL END) as bexar_county_counter FROM executions
COUNT() only counts non-NULL values. That's why we use
ELSE NULL in the query above.
challenge: What's the percentage of executed people who claim innocence?
To solve this we need to go through the table twice:
- count the claims of innocence
- count the amount of people who said a last sentence
Once we have these numbers, we do the math.
This needs to be solved in one query.
SELECT 100.0 * COUNT( CASE WHEN last_statement LIKE '%innocent%' THEN 1 ELSE NULL END) / COUNT( CASE WHEN last_statement IS NOT NULL THEN 1 ELSE NULL END) FROM executions
100.0 * multiplication is important to make the division with decimals.
The Long Tail
GROUP BY clause is normally used along with "aggregate" functions. These functions perform special operations on an entire table, set or group, of rows (rather than on each row) and then return one row of values for each group.
WHERE block happens before grouping aggregation. This is reflected in the order of syntax since the
WHERE block always precedes the
GROUP BY block.
Things to keep in mind:
HAVINGcan act on aggregation functions
HAVING clause was added to SQL because the
WHERE keyword cannot be used with aggregate functions.
You can think of
HAVING as a post-aggregation
In this example we are getting the list of counties in which more than 2 inmates aged >= 50:
SELECT county FROM executions WHERE ex_age >= 50 GROUP BY county HAVING COUNT(*) > 2
Who has the longest last sentence?
We want the first and last name of the inmate with the longest last statement.
To achieve this we need to know the length of the longest last statement and then query the info about the row where such a long statement is.
SELECT first_name, last_name FROM executions WHERE LENGTH(last_statement) = ( SELECT MAX(LENGTH(last_statement)) FROM executions )
Percentage of execution on each county
For this we need the amount of executions on each county, and then divide by the total amount of executions.
Here's the query:
SELECT county, 100.0 * COUNT(*) / (SELECT COUNT(*) FROM executions) AS percentage FROM executions GROUP BY county ORDER BY percentage DESC
challenge: Which counties has more executions?
The solution is in the example above. 👆