Home

Projects

Tutorials

Analysis

SQL Basics Part 2

by Matthew Barlowe


Ok if you've been following these in order our last tutorial covered on the SELECT SQL command and some of the clauses associated with it. I however negelected to mention two more important clause I commonly use and that you will find useful to add to your repertoire.

HAVING

The clause we are going to look at is the HAVING clause. If you remember at the end of the last tutorial linked above we were dealing with agregate functions. But what if you want to limit the rows returned in a SELCT query based on the value returned by the aggregate function? Let's refresh our memory and look at one of these queries first:


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01'
GROUP BY player ORDER BY goals DESC;
      

Ok say we only wanted to see goal scorers that have scored over 30 goals instead of all the goal scorers in the NHL. If you tried to put that in a WHERE clause Postgres would throw an error. That's because, if you remember from last tutorial, that the WHERE clause acts before the sum(). WHERE won't know what goals is at all because in the order of operations it will look for goals before you have even declared it.

"What if we just passed it WHERE g > 30? That should work right?" Nope, because as we are working with individual games you won't find a single player who has scored 30 goals in a game so the WHERE will just exclude all rows and return an empty table.

This is where the HAVING clause comes in play. HAVING is like a WHERE clause but it comes after the agregate functions are run. So lets change our query from above:


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01'
GROUP BY player HAVING sum(g) > 30 ORDER BY goals DESC;
      

And this will be your result:


       player       | goals | a1 | a2
--------------------+-------+----+----
 ALEX.OVECHKIN      |    43 | 20 | 15
 PATRIK.LAINE       |    43 | 12 | 13
 EVGENI.MALKIN      |    40 | 27 | 22
 WILLIAM.KARLSSON   |    39 | 12 | 15
 TYLER.SEGUIN       |    38 | 23 |  8
 NATHAN.MACKINNON   |    38 | 35 | 18
 ERIC.STAAL         |    37 | 18 | 14
 NIKITA.KUCHEROV    |    36 | 29 | 27
 ANDERS.LEE         |    35 | 13 |  7
 CONNOR.MCDAVID     |    33 | 31 | 25
 JOHN.TAVARES       |    32 | 26 | 15
 TAYLOR.HALL        |    32 | 27 | 18
 JAMES.VAN RIEMSDYK |    31 | 11 |  5
 SEAN.MONAHAN       |    31 | 21 | 10
 RICKARD.RAKELL     |    31 | 16 | 15
 BRAD.MARCHAND      |    31 | 27 | 17
(16 rows)
      

So now we have all the goal scorers who have score more than 30 goals in our time frame we passed to the SQL query. The only way you will be able to filter future aggregate functions will be with the HAVING command so make sure to keep it in mind. Another thing I will mention is that the HAVING command is executed before the SELECT command so it won't have any knowledge of the aliases you have created. This means that you will need to pass it the whole aggregate function in order to filter the results as desired.

BETWEEN

Ok our last introductory command/clause for basic select statements is going to be the BETWEEN clause. So using our command above we are going to limit the total goals between the values of 20 and 25 like so:


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01'
GROUP BY player HAVING sum(g) BETWEEN 20 and 25 ORDER BY goals DESC;
      

Here are our results:


             player       | goals | a1 | a2
--------------------+-------+----+----
 LOGAN.COUTURE      |    30 | 16 |  9
 SEAN.COUTURIER     |    29 | 24 | 12
 ANZE.KOPITAR       |    29 | 27 | 26
 DAVID.PASTRNAK     |    29 | 25 | 16
 PHIL.KESSEL        |    29 | 35 | 15
 JASON.ZUCKER       |    29 | 18 |  9
 BROCK.BOESER       |    29 | 13 | 13
...
 ALEX.DEBRINCAT     |    25 | 11 |  9
 PATRICK.KANE       |    25 | 28 | 17
 ALEX.RADULOV       |    25 | 24 | 14
 ALEKSANDER.BARKOV  |    25 | 24 | 23
 SEBASTIAN.AHO      |    25 | 20 | 12
 CLAUDE.GIROUX      |    25 | 30 | 26
 BRAYDEN.SCHENN     |    25 | 19 | 16
 EVANDER.KANE       |    25 | 12 | 13
      

This isn't the whole selection as the result you should get will be 30 rows but as you see the top is 30 and the bottom is 25. That's because BETWEEN is inclusive. Now you could have done the same thing like this HAVING sum(g)<=30 AND sum(g)>=25 but I feel that BETWEEN is much easier to use. Not only that its much easier to read and for others to interpret what your code is doing which is something you should always have in the back of your mind as you are writing it.

BETWEEN just doesn't work with HAVING you can also use it with your WHERE clauses as well and in the exact same way. Say if we wanted to look at goals scored in the month of October last year we would use WHERE game_date BETWEEN '2017-10-01' and '2017-10-31' and it would limit the games we passed to our GROUP BY clause to those that occured in October only.

Sources

SQL HAVING Clause

A Beginner's Guide to the True Order of Operations in SQLThis is an excellent piece on the order of operations so to speak in an SQL SELECT query. A must read.

SQL BETWEEN Operator