Home

Projects

Tutorials

Analysis

SQL Basics Part 3

by Matthew Barlowe


Ok last time I was talking about SELECT clauses I discussed using OR operator to dictate two states in your WHERE clause that would filter the results. Well with this tutorial I'm going to show you a little shortcut for when you want to set multiple filters on the query results.

Using the IN Operator

Ok let's look at the last query from the last tutorial:


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01'
GROUP BY player HAVING sum(g) BETWEEN 20 and 25 ORDER BY goals DESC;
      

Ok if you remember that returned the players who had scored between 20 and 25 goals in the data after 10/01/2017. But let's say we wanted to see only the results from a few teams instead of all the teams. We could chain together a bunch of OR statements, but we don't want to do that.

Not only would that make for a harder to read SQL query, it would also be more typing and I'm never a fan of typing more than I have to. So lets change the query to show us only the goal scorers from the Flyers, the Rangers, and Carolina.


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01' AND team IN ('PHI', 'NYR', 'CAR')
GROUP BY player HAVING sum(g) BETWEEN 20 and 25 ORDER BY goals DESC;
      

This will produce a result that looks like this:


     player      | goals | a1 | a2
-----------------+-------+----+----
 MICHAEL.GRABNER |    25 |  4 |  2
 SEBASTIAN.AHO   |    21 | 15 | 10
 CLAUDE.GIROUX   |    20 | 26 | 24
 WAYNE.SIMMONDS  |    20 | 10 |  7
 JEFF.SKINNER    |    20 |  9 |  8
(5 rows)
      

As you can see we have limited our results to just those teams in the IN. Remember once again this data only goes up to the end of February so it won't have known that Grabner was traded to the New Jersey Devils. And even if it was up to date since each row is a single game it would still return Grabner's stats as a Ranger since he scored over 20 goals. It would not return any data from his New Jersey games though as the team wouldn't be one of the teams in the IN list.

IN will work for any column that is categorical such as team name, player name, and game date. It will also work for numerical columns as well just return the rows where the value is in the IN list. So if you wanted to look at games say for players where they scored two goals you would adjust the query like this:


SELECT player, sum(g) AS goals, sum(a1) AS a1, sum(a2) AS a2
FROM playerstats WHERE game_date > '2017-10-01' AND g IN (2)
GROUP BY player ORDER BY goals DESC;
      

And it would return results like this:


            player      | goals | a1 | a2
------------------+-------+----+----
 SEAN.COUTURIER   |    12 |  1 |  1
 RICK.NASH        |     8 |  0 |  1
 MICHAEL.GRABNER  |     6 |  0 |  0
 TEUVO.TERAVAINEN |     6 |  2 |  0
 SEBASTIAN.AHO    |     6 |  1 |  0
 TAYLOR.HALL      |     6 |  1 |  0
 SCOTT.LAUGHTON   |     4 |  0 |  0
 CHRIS.KREIDER    |     4 |  1 |  0
 JEFF.SKINNER     |     4 |  0 |  0
 NICO.HISCHIER    |     4 |  0 |  1
(10 rows)
      

So as you can see the IN operator is a very powerful way to slice your data while remaining easy to work. If you have many different categories to need to filter for than IN probably isn't the best choice and you need to find more efficient solutions such as JOIN. Still it's a good tool to keep handy and will be one you use regularly.

Sources

SQL IN operator

Max Number of Parameters with IN