Ok if you’ve read the first tutorial on installing Postgres and getting your first database up and running, you’re now wondering “So how do I access all this data?” Well that’s where this tutorial comes in as we are going to cover some of the basic commands in SQL to get the data we want. Ok let’s get started and dive right in to our data.

      Column   |       Type       | Collation | Nullable | Default | Storage  | Stats target | Description
    -----------+------------------+-----------+----------+---------+----------+--------------+-------------
     yearID    | bigint           |           |          |         | plain    |              |
     teamID    | text             |           |          |         | extended |              |
     lgID      | text             |           |          |         | extended |              |
     playerID  | text             |           |          |         | extended |              |
     G_all     | bigint           |           |          |         | plain    |              |
     GS        | double precision |           |          |         | plain    |              |
     G_batting | bigint           |           |          |         | plain    |              |
     G_defense | double precision |           |          |         | plain    |              |
     G_p       | bigint           |           |          |         | plain    |              |
     G_c       | bigint           |           |          |         | plain    |              |
     G_1b      | bigint           |           |          |         | plain    |              |
     G_2b      | bigint           |           |          |         | plain    |              |
     G_3b      | bigint           |           |          |         | plain    |              |
     G_ss      | bigint           |           |          |         | plain    |              |
     G_lf      | bigint           |           |          |         | plain    |              |
     G_cf      | bigint           |           |          |         | plain    |              |
     G_rf      | bigint           |           |          |         | plain    |              |
     G_of      | bigint           |           |          |         | plain    |              |
     G_dh      | double precision |           |          |         | plain    |              |
     G_ph      | double precision |           |          |         | plain    |              |
     G_pr      | double precision |           |          |         | plain    |              |

This table in the database lists the number of appearances at each position for each player, each season and each team. G_1b is games at first base and so on. If you need more info about the data in the database the data dictionary can be found here

Side note here. When I was creating this database file for you I forgot that Postgres only stores column names in lower case. As you can see above it does recognize upper case when displaying the column. The problem comes when you want to select a column. If you tried to execute SELECT playerID from appearances limit 10; you would get an error saying HINT: Perhaps you meant to reference the column "appearances.playerID". But even if you added appearances to your column selection you would still get the same error. THe solution is to add "" around each column name so that the column is searched for on a case sensitive basis. But that’s annoying so lets fix it.

Logout of your current account with \q and log back in with the superuser account that is the same name as your computer user name and run this code:

\t on
select 'ALTER TABLE '||'"'||table_name||'"'||' RENAME COLUMN '||'"'||column_name||'"'||' TO ' || lower(column_name)||';'
from information_schema.columns
where table_schema = 'public' and lower(column_name) != column_name
\g /tmp/go_to_lower
\i /tmp/go_to_lower

Copy this and paste it into the terminal and hit enter and all your column names will be in lower case. I put this in the tutorial for a very good reason. I could have gone back and fixed the file and changed a couple things and made it look like I’m perfect, but I’m not. Mistakes happen. A lot. Things are going break and a vital skill is to learn how to fix them. In this case I literally googled ‘convert all columnames to lowercase postgres’ and the answer was the second link on google.

Do I know exactly what that code does? Nope but what better place to learn than in a situation with no ramifications for it being wrong. Now knowing what to google comes with experience, but a good place to start is to just cut and paste the error into google and start clicking. This is what every professional programmer does and the ones that say they don’t are lying. Ok back to the tutorial.

playerid  | teamid | g_1b | g_2b | g_cf
-----------+--------+------+------+------
 abercda01 | TRO    |    0 |    0 |    0
 addybo01  | RC1    |    0 |   22 |    0
 allisar01 | CL1    |    0 |    2 |   29
 allisdo01 | WS3    |    0 |    0 |    0
 ansonca01 | RC1    |    1 |    2 |    0
 armstbo01 | FW1    |    0 |    0 |   11
 barkeal01 | RC1    |    0 |    0 |    0
 barnero01 | BS1    |    0 |   16 |    0
 barrebi01 | FW1    |    0 |    0 |    0
 barrofr01 | BS1    |    0 |    1 |    0
(10 rows)
SELECT playerid, teamid, g_1b, g_2b, g_cf from appearances WHERE yearid = 2016 limit 10;
 yearid | playerid  | teamid | g_1b | g_2b | g_cf
--------+-----------+--------+------+------+------
   2016 | abadfe01  | BOS    |    0 |    0 |    0
   2016 | abadfe01  | MIN    |    0 |    0 |    0
   2016 | abreujo02 | CHA    |  152 |    0 |    0
   2016 | achteaj01 | LAA    |    0 |    0 |    0
   2016 | ackledu01 | NYA    |   13 |    1 |    0
   2016 | adamecr01 | COL    |    0 |   11 |    0
   2016 | adamsau01 | CLE    |    0 |    0 |    0
   2016 | adamsma01 | SLN    |   86 |    0 |    0
   2016 | adlemti01 | CIN    |    0 |    0 |    0
   2016 | adriaeh01 | SFN    |    0 |    7 |    0
(10 rows)

This has returned all the players that had an appearance in the 2016 baseball season. You can also chain these commands as well using logical switches such as AND or OR like so:

SELECT playerid, teamid, g_1b, g_2b, g_cf from appearances WHERE yearid = 2016 AND g_1b > 100;
SELECT playerid, sum(h) FROM batting WHERE yearid=2017 GROUP BY playerid ORDER BY sum(h) DESC LIMIT 10;

Which gives us this output:

 playerid  | sum
-----------+-----
 blackch02 | 213
 altuvjo01 | 204
 inciaen01 | 201
 gordode01 | 201
 hosmeer01 | 192
 ozunama01 | 191
 andruel01 | 191
 abreujo02 | 189
 lemahdj01 | 189
 arenano01 | 187
SELECT playerid, sum(h) AS Hits FROM batting WHERE yearid=2017 GROUP BY playerid ORDER BY sum(h) DESC LIMIT 10;
 playerid  | hits
-----------+------
 blackch02 |  213
 altuvjo01 |  204
 inciaen01 |  201
 gordode01 |  201
 hosmeer01 |  192
 ozunama01 |  191
 andruel01 |  191
 abreujo02 |  189
 lemahdj01 |  189
 arenano01 |  187
(10 rows)

A couple other notes on this query. We have to do the GROUP BY because this table has a row for each player for each team they played for in the season. A lot of players do play for the same team all season but a lot don’t. If you didn’t do the group by you would miss out on those other team hits for that player. The next is the ORDER BY this tells the database how to sort the results in this case by hits descending, i.e. from largest to smallest, with the DESC keyword. When you alias the sum(h) column you can just using ORDER BY hits next time. ORDER BY will sort ascending by default unless you tell it otherwise

Our next tutorial will cover JOIN so you can finall tell which players are actually accumulating the stats you are calculating in you queries.