Ok if you've been following along with my SQL tutorials you should
now have a pretty good grasp of how to make queries and get the data
you need. However, the data is still the data. You can use aggregate
functions to sum or count things, but so far you've been limited in
actually creating new data itself with your
But what if you wanted to create a new group to denote say what
line a player would play on on an NHL team? The way to do that is
CASE SQL statement. If you are familiar with other
programming languages, then a
CASE clause is similar to
IF/THEN statement or
those that may be more familiar with R or Python.
CASE allows you to set conditions on your data and
if the data meets those conditions then it assigns the data a label.
Reading that may not make any sense but once you see it in action
everything will hopefully be a little clearer. So like the example
above lets try and determine the lines of a team from our
playerstats database. If you don't have that setup I
suggest visting my Postgres Setup
tutorial and also the SQL Basics Part 1
and Part 2.
Ok let's look at the query we are going to use to determine that:
SELECT *, CASE WHEN toi_per_gp < 11 THEN 4 WHEN toi_per_gp between 11 and 14 THEN 3 WHEN toi_per_gp between 14 and 17 THEN 2 ELSE 1 END AS line FROM (SELECT player, team, sum(toi) as toi, sum(g) as goals, sum(a1 + a2) as assists, count(player) as GP, trunc((sum(toi)/sum(gp)))::numeric, 2) as toi_per_gp FROM playerstats where season = '20172018' GROUP BY player, team) as stats WHERE team = 'CAR' ORDER BY line DESC;
This query will then return results like this:
player | team | toi | goals | assists | gp | toi_per_gp | line ---------------------+------+---------+-------+---------+----+------------+------ MARTIN.NECAS | CAR | 6.9 | 0 | 0 | 1 | 6.90 | 4 TREVOR.CARRICK | CAR | 7.53333 | 0 | 0 | 1 | 7.53 | 4 LUCAS.WALLMARK | CAR | 78.7333 | 1 | 0 | 9 | 8.74 | 4 PHILLIP.DI GIUSEPPE | CAR | 490.35 | 4 | 8 | 47 | 10.43 | 4 JOSH.JOORIS | CAR | 325.817 | 3 | 3 | 31 | 10.51 | 4 JOAKIM.NORDSTROM | CAR | 792.883 | 2 | 5 | 75 | 10.57 | 4 NICOLAS.ROY | CAR | 10.7667 | 0 | 0 | 1 | 10.76 | 4 MARCUS.KRUGER | CAR | 520.183 | 1 | 5 | 48 | 10.83 | 4 JANNE.KUOKKANEN | CAR | 44.25 | 0 | 0 | 4 | 11.06 | 4 WARREN.FOEGELE | CAR | 23.85 | 2 | 1 | 2 | 11.92 | 4 LEE.STEMPNIAK | CAR | 432.117 | 3 | 6 | 35 | 12.34 | 3 ROLAND.MCKEOWN | CAR | 134.1 | 0 | 3 | 10 | 13.40 | 3 VALENTIN.ZYKOV | CAR | 137.067 | 3 | 4 | 10 | 13.70 | 3 BROCK.MCGINN | CAR | 1126.08 | 16 | 14 | 78 | 14.43 | 2
And you can see by the results we've group each player into a line by their TOI per game played. Obviously there's nothing exact about these groupings in fact they are rather arbitrary bins, but it definitely helps you see quickly which players are more favored by their coach and which ones are not.
So now we've got out results let's break this query
down piece by piece since it includes
a lot of things you haven't seen before in the other tutorials. The first
thing we'll look at (and the other main part of the title) is the
SELECT statment in the query
'What is that?' you may be asking yourself because if you've
followed along with the tutorials all you've seen so far are queries
with just one
SELECT in each query. This is what is known
as a sub query, or sub
SELECT, and is a critical concept
to add to your SQL toolbox.
FROM (SELECT player, team, sum(toi) as toi, sum(g) as goals, sum(a1 + a2) as assists, count(player) as GP, trunc((sum(toi)/sum(gp))::numeric, 2) as toi_per_gp FROM playerstats where season = '20172018' GROUP BY player, team) as stats WHERE team = 'CAR' ORDER BY line DESC;
This part above is the actual subquery of the query in question. Usually subqueries are contained in parenthesis and after them is a name that will be used as an alias to refer to the subquery in other parts of the original query
So a subquery is just another query that you execute before you pass
it to the original query you want to execute. Oftentimes in an SQL
database the data will not be in the perfect form you need it to be.
So in order to shape the data into the from you need it to be you'll
execute a sub query to create a new table on which you'll then
execute functions or in this instance our
Basically what I'm doing with my sub query is I'm creating a brand
new table that I will then
SELECT from to creat my results.
In this case I'm creating a new table from which I am then selecting
from to run my
CASE statement on. One way to determine if
you are working with a subquery is that it will always be encased
Inside those parentheses is just a standard
seen plenty of times before, but there are a couple of new things I want
to go over first. First is the
trunc is a function that is short for truncate which
means shorten. Basically what we are doing here is rounding the results of
However in Postgres
trunc needs a
input. If you remember back when we were creating out table schemas
we had to define what value type each column would be. In Postgres
numeric is one of those value types. Unlike
numeric in postgres is allowed to have a user
defined percision hence the reason its needed for the
function. The way we change our the
real results to a
numerice is with the
The double colons allows us to cast the type of the column, i.e. what
data format it is, to another within our
SELECT query. In
this instance if we didn't cast the results of our division to a
numeric type the SQL would return an error.
Ok let's move on to the second part of this article and look at the
CASE statement part of the query as you see below.
SELECT *, CASE WHEN toi_per_gp < 11 THEN 4 WHEN toi_per_gp between 11 and 14 THEN 3 WHEN toi_per_gp between 14 and 17 THEN 2 ELSE 1 END AS line
* you've seen before that just means pull all the
columns of the table we are feeding our
FROM clause. But in
addition to that we are adding the
CASE statement to calculate
what line the players are on by their ice time per game played. So if you
look at the sub query above
toi_per_gp was the column where we
calculated a players average TOI per game.
Using that column will pass it's values to our
The syntax for a
CASE statement in Postgres follows this syntax:
CASE WHEN logical_statement THEN value ELSE another_value END new_column_name.
CASE checks to see if the value in the logical statement
is true and if so it puts the
value in the
column for that row. If it's not true then it will place
You can chain the
WHEN statements as I've done above to create
multiple conditions and the resultant values that will be put in the new
column we are creating. The code above looks at the
that value is less than 11 it places a 4 in the
line column, if its
between 11 and 14 minutes then they are third liners and so on.
The last part of the statement is the
ELSE clause. This lets you assign
a value for that row in the column whenever none of the logical conditions you
set have been met. In our example if a player didn't meet any of the conditions
he would be playing over 17 minutes a night on average which allows us to just
call him a first liner without even checking. It's just a catch all for all the
values you don't need to set a specific check for. If you plan only every value
being caught in your logical checks then you don't even need, but most of the time
that won't be true.
The conditions that you are checking for in a
can be whatever you want them to be. You can check to see if a value in a column
is a certain phrase, or if the sum of two different columns is greater than another
column; it's up to you but any time you need to do a logical check in SQL
CASE is the tool you need.