Introduction to Git and Github

by Matthew Barlowe


So if you've started doing a little coding, either for fun or, like me, to try your hand at data analysis of the NHL, then you've probably heard people talk a lot about Git and Github. And if you were like me, you've even tried to learn it or use it, but got so frustrated that you gave up and figured it wasn't worth the time.

Well, this article will try to clear up some of that confusion, because once Git clicks for you, you'll never code without it. It's such a powerful tool to add to your coding arsenal. But before we get into setting up Git and using it, let's talk about what Git is and what it does.

What is Git?

First off, let's differentiate between Git and Github. Many people use these terms interchangeably, but they are very different things. Git is what is called a Version Control System, or VCS. To put it simply, it's software that tracks the changes you make to the programs you write. That's all it is at its essence. Like a Google doc with change tracking turned on, Git remembers the changes you make to your files and, more importantly, the history of those changes. This allows Git to restore those files to any previously saved state in the history of the code.

Now, Git itself is self-contained on your computer; nobody can see your changes except you. Github, on the other hand, is a website where you can store your code and the history of your changes saved by Git. This is where you will send your work when you want to share it with the world. So keep that in mind when reading the rest of this article: Git is the program on your computer, and Github is the website where you will send your code when you want to share it. This also adds the benefit of an extra backup in case something happens to your system.

Why use Git?

Track Changes

As noted above, Git keeps track of the changes you've made to your code throughout its development, if used properly. Every time I add a new feature to my code I save it in Git, and using the Git log I can see all the changes I've made. So if I write something and it messes everything up, I can just go back one step in the Git history and get things back up and running.

Obviously this may not be a big deal if you are writing code that only you will use. But there have been plenty of times I've made changes one night and forgotten what I did when I went back to work on it the next day. And if you are working with multiple people on a project, this log of changes is invaluable for seeing what others have done in the code base. Git also records the author, date, and time of each change, so you can see who made changes and when.

Branching

Perhaps the most important part of Git is that you can create branches of the code you are working on in order to fix bugs or create new features. You may be asking now, "Why not just create a copy of the file and work on two copies?" Well, if you're only working on one file then you may not need Git, but often you'll be working on multiple files that interact with each other.

Trying to keep track manually of the working copies and the copies being worked on would quickly become confusing, and could lead to bugs and errors that you'd have to waste further time tracking down. Git handles all that behind the scenes so you don't have to. You can create a new branch and make as many changes as you like without breaking any of the files you are using, until you have properly tested the new versions.

To use a personal example, I recently added the ability to query goalie stats to my Twitter bot. Instead of taking the bot down and working on the file until the features were added, I created a new branch where I worked on the new features until they were finished. Once done, I merged those changes back into my main script. And in doing this I only worked on one file the whole time.

It Works with Everything

Git is language agnostic. It works with any programming language out there; whether it's Python, R, C++, or something else, Git doesn't care. It tracks changes to any file you give it, although it is most useful with plain text files, where it can show exactly which lines changed. In fact, Git works just as well for writing a novel or a story as it does for writing code! Its near universality makes it a common ground for developers to meet and collaborate on projects.

Setting up Git

The first step to getting Git up and running is to go to Github and set up an account. Once that's done, open up a terminal and we'll get started installing Git. If you don't have Homebrew installed on your system, please head here and read this tutorial to get up to date, as I'll be using Homebrew to install Git.

If you have Homebrew installed and running, all you need to do is type this at the command line:


$ brew install git
      

Brew does its usual stuff, and after it's done, type which git to make sure we are using the git that brew installed. The output should be /usr/local/bin/git (on Apple Silicon Macs, Homebrew installs under /opt/homebrew instead, so adjust the paths below to match). If it is not, then we need to change your $PATH variable to put the Homebrew version first. To do that, type these commands one at a time and hit enter after each one:


$ echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bash_profile
$ source ~/.bash_profile
$ echo $PATH
      

Without getting into too much detail about the commands, what you are doing there is placing /usr/local/bin at the front of your $PATH so that the bash shell will look there first for the commands you are trying to run. Next we'll run some commands to set up Git to use the account you just set up on Github when you want to push your work to a repository.
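To make the "look there first" idea concrete, here is a rough Python sketch of how a shell resolves a command name by walking $PATH from left to right. The find_command and make_fake_git helpers are made up for this illustration, and two temporary directories stand in for /usr/local/bin and /usr/bin so nothing on your system is touched. It assumes a Mac or Linux machine.

```python
import os
import stat
import tempfile


def find_command(name, path_value):
    """Return the first executable called `name` found while scanning
    the PATH-style string `path_value` from left to right."""
    for directory in path_value.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def make_fake_git(directory):
    """Create a tiny executable file named 'git' (a stand-in, not real Git)."""
    path = os.path.join(directory, "git")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


# Two throwaway directories stand in for /usr/local/bin and /usr/bin.
homebrew_dir = tempfile.mkdtemp()
system_dir = tempfile.mkdtemp()
homebrew_git = make_fake_git(homebrew_dir)
system_git = make_fake_git(system_dir)

# Whichever directory is listed first wins the lookup.
homebrew_first = os.pathsep.join([homebrew_dir, system_dir])
system_first = os.pathsep.join([system_dir, homebrew_dir])
print(find_command("git", homebrew_first) == homebrew_git)  # True
print(find_command("git", system_first) == system_git)      # True
```

Both directories contain a git, and the only thing that changes the answer is the order of the entries, which is exactly why the export line puts the Homebrew directory at the front.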


$ git config --global user.name "Your Name Here"
$ git config --global user.email "your_email@youremail.com"
      

"Your Name Here" and "your_email@youremail.com" should be the username you created on Github and the email you created it with. After that's done, we're going to set up Git to save your username and password to the OSX keychain, so that you don't have to type them in every time you want to push something to one of your repositories on Github.


$ git config --global credential.helper osxkeychain
      

The first time you push something to a repository you'll have to enter your username and password (Github now expects a personal access token in place of your account password for pushes over HTTPS), but after that they will be saved to the OSX keychain and you won't need to enter them anymore. Now that we have everything set up, let's start creating our first repository, or repo for short. A repository is just a fancy name for where we will store our code.

Creating Your First Repository

OK, let's create a new directory to store our first repo.


$ mkdir gittutorial
$ cd gittutorial
$ git init
      

So with those three commands we created a new directory called gittutorial, changed into that directory, and then created the repository with the git init command. If everything worked correctly, you should see this text as output:


Initialized empty Git repository in /Users/MattBarlowe/gittutorial/.git/
      

Instead of /Users/MattBarlowe it will show whatever username you are logged in as on your Mac. So now that we have the repository initialized (init is short for initialize), let's look at our next command:


$ git status
      

Which will return this:


On branch master

No commits yet

nothing to commit (create/copy files and use "git add" to track)
      

This is the status of our repository, and it tells us three important things. First, it tells us we are on branch master. Master is the first branch Git creates when you initialize the repository. (Newer Git setups may call this branch main instead; if yours does, use that name wherever this tutorial says master.) This is the main branch you will be working on in any repository. You can rename it if you want to, but I'd advise against it as it's common usage to stick with the default name.

The next line tells us that no commits have been made to the repository yet. In Git speak, a 'commit' is a point where you have saved your changes to Git. No commits means we haven't saved any changes, which is to be expected for a newly created repository. The third line means there are no files that we can even commit, which again shouldn't be surprising for a newly created directory.

So let's create a file using whichever text editor you prefer. I use Vim myself, but if you want to use Nano or Emacs that's fine as well. We'll write the simple Python script shown below and save it as gittutorial.py.


for x in range(10):
    print(x)
      

After you've saved the file, exit back out to the command line and run git status, and you should see this:


On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	gittutorial.py

nothing added to commit but untracked files present (use "git add" to track)
      

As you can see, the branch is the same and we still haven't made any commits, but now we have a new section called Untracked files. This means that Git has noticed we created a new file, but is not yet tracking it. As the output suggests, our next command, git add, begins the process of Git tracking our changes. Now type:


$ git add gittutorial.py
      

Nothing will print out but if you type git status again you will see this:


On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   gittutorial.py
      

So is our file saved? No! At least not yet. All we've done is add the file to the staging area, so that when we commit the changes to our repository the file will be saved with the commit. If you're a little confused, don't worry; a lot of people are confused by this step. Basically, you are telling Git which files you want to save when you actually save the state of your repository. When you create a commit, Git doesn't know which changes to include unless you tell it. All git add does is tell Git exactly which files you want saved in the next commit.
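If the staging area still feels abstract, a toy model may help. The ToyRepo class below is invented purely for illustration and is not how Git stores anything internally; it only mimics the idea that a commit saves what was staged, not everything sitting in the directory. The notes.txt file is a made-up extra file for the example.

```python
class ToyRepo:
    """A toy model of Git's three areas. Not how Git stores data internally."""

    def __init__(self):
        self.working = {}   # file name -> current contents on disk
        self.staged = {}    # what `git add` has queued for the next commit
        self.commits = []   # list of (message, snapshot) tuples

    def write(self, name, contents):
        self.working[name] = contents

    def add(self, name):
        # Copy the file as it is right now into the staging area.
        self.staged[name] = self.working[name]

    def commit(self, message):
        # A new snapshot is the previous one plus whatever was staged.
        previous = self.commits[-1][1] if self.commits else {}
        snapshot = {**previous, **self.staged}
        self.commits.append((message, snapshot))
        self.staged = {}
        return snapshot


repo = ToyRepo()
repo.write("gittutorial.py", "for x in range(10):\n    print(x)\n")
repo.write("notes.txt", "scratch notes")   # hypothetical extra file
repo.add("gittutorial.py")                 # only this file is staged
snapshot = repo.commit("First Commit")
print(sorted(snapshot))                    # ['gittutorial.py']
```

Because notes.txt was never added, it is left out of the snapshot, just as an untracked file is left out of a real commit.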

So let's actually make our first commit. Type this into the command line:


$ git commit -m "First Commit"
      

So you've made your first commit! The '-m' flag to git commit allows you to add a message to the commit so you can describe what you've done. You should always put a message on your commit so you or others know exactly what you changed with that commit. The message can be anything, but it's best to keep it short, and always put it between quotation marks.

You have now officially saved the state of your repository with Git. Think of a commit as taking a snapshot of your code in time. Except unlike a photo, Git allows you to go back and forth along the timeline of your code, i.e. the branch, and stop at any point in the development where you made a commit. So now that we've committed our changes, let's type git status and see what it tells us.


On branch master
nothing to commit, working tree clean
      

Once we've committed our changes there is nothing more to commit. That is, until you make new changes, and then you'll need to repeat the process. OK, let's get some more practice with this. Go back into your gittutorial.py file, change the number in the range function to 20, save, and then exit. Now type git status and you'll see this:


On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   gittutorial.py

no changes added to commit (use "git add" and/or "git commit -a")
      

So now that we've made changes to the file, we need to run git add gittutorial.py and then git commit -m "Changed number in range from 10 to 20". And now you have committed your changes to the repository for the second time. So let's introduce a new command that lets you look at the history of your changes.


$ git log --oneline
      

Which will produce this:


88e287f (HEAD -> master) Changed number in range from 10 to 20
c9801bb First Commit
      

This is a log of all the changes you've committed so far in this repository. As we've only made two commits, there are only two entries. The --oneline flag shrinks your log down to one line per entry to make it easier to read. To get more info for each commit you can just type git log. Let's go over the output of the log, as there are several important things in it.

The string of characters at the front of each entry is what's called the SHA. This is a unique hash that is assigned to each commit and used to keep track of it. It's actually much longer than those seven characters (the full SHA is 40 hexadecimal characters), but it's shortened for the oneline output. You can also look up a specific commit if you know its SHA. The next thing is the (HEAD -> master). This shows which branch you currently have checked out, and that we are working on the latest commit in that branch.
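For the curious, here is a small sketch of where a SHA comes from. Git names every object it stores by running SHA-1 over a short header plus the object's content. A commit's content includes its author, date, message, and parent, which makes it awkward to reproduce by hand, so this example hashes plain file contents (what Git calls a blob) instead. The git_blob_sha function name is made up for this illustration.

```python
import hashlib


def git_blob_sha(contents: bytes) -> str:
    """SHA-1 that Git assigns to a file's contents (a "blob" object)."""
    # The header is the object type, a space, the size in bytes, and a NUL byte.
    header = b"blob " + str(len(contents)).encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


script = b"for x in range(10):\n    print(x)\n"
full_sha = git_blob_sha(script)
print(len(full_sha))   # 40
print(full_sha[:7])    # the short form, like the one shown by --oneline
```

Change a single character in the contents and the SHA changes completely, which is how Git can tell any two versions apart.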

Now let's check out one of the most powerful features of git which is creating new branches. Branches, and moving back and forth between them, really lie at the heart of what makes Git great. Branches allow developers to work independently on the same files and develop different features simultaneously. Understanding how they work is important to understand Git.

Branching

So let's create a new branch, which we'll call newbranch because I don't feel very creative.


$ git checkout -b newbranch
      

git checkout is the main way to switch between branches in Git; when we add the '-b' flag, it tells the command to create a new branch with the name we give it and then switch to that branch. If we wanted to switch back to master, all we would need to type is git checkout master. Now let's go back into our Python file, add the line print('This is a new feature') after the for loop block, and save it. We'll add and commit these changes as we did before, using the commit message "Added print statement". To make sure you're on newbranch, run git status and check.

Ok now type git log --oneline again and look at the output.


cf87ca3 (HEAD -> newbranch) Added print statement
88e287f (master) Changed number in range from 10 to 20
c9801bb First Commit
      

Now you can see both branches in the log, master and newbranch, and HEAD is pointing at the most recent commit on our current branch, where we added the print statement to the script. Here with one command we can see that newbranch is one step, or one commit, ahead of master. This means that up until the last commit on newbranch the two branches were exactly the same. You can see that the master branch is still at the last commit we made on that branch.

OK, now that's done, we are going to merge our new feature back into the master branch. Obviously, with only one added line of code this is a little overkill, but imagine you were working on a project with multiple files and thousands of lines of code, and you'll start to see the potential of Git and why everybody uses it. The merge is run from the branch you are merging into, which here is master. If you are ever unsure of which branch you are on, run git status and check; shortly I'll show you a trick to help with that.

So we'll switch back to the master branch with git checkout master and then follow that with a git merge newbranch which will produce this output:


Fast-forward
 gittutorial.py | 1 +
 1 file changed, 1 insertion(+)
      

This means that our merge was successful! From the output you can see we only changed one file and inserted one line. Merges don't always go this smoothly, so next we'll look at what happens with the dreaded merge conflict.
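The word Fast-forward in that output is worth a quick look. Because master had no new commits of its own, Git did not need to combine anything; it simply moved the master label forward to the commit newbranch was already on. Here is a toy sketch of that idea using the SHAs from the log above. The parents and branches dictionaries and both helper functions are invented for this illustration, and the model only handles commits with a single parent.

```python
# Each commit maps to its parent; the SHAs are the ones from the log above.
parents = {
    "c9801bb": None,        # First Commit
    "88e287f": "c9801bb",   # Changed number in range from 10 to 20
    "cf87ca3": "88e287f",   # Added print statement
}
branches = {"master": "88e287f", "newbranch": "cf87ca3"}


def is_ancestor(older, newer):
    """True if `older` can be reached by walking parent links back from `newer`."""
    commit = newer
    while commit is not None:
        if commit == older:
            return True
        commit = parents[commit]
    return False


def merge_fast_forward(target, source):
    """Move `target` to `source` when target has no commits of its own."""
    if not is_ancestor(branches[target], branches[source]):
        return False   # histories diverged; a real merge commit would be needed
    branches[target] = branches[source]
    return True


print(merge_fast_forward("master", "newbranch"))  # True
print(branches["master"])                         # cf87ca3
```

When both branches have commits the other lacks, the ancestor check fails and a plain fast-forward is no longer possible.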

Merge Conflict

So we've done our first merge and everything went smoothly. Now we are going to look at what happens when things don't go smoothly. While still on the master branch, open up our Python script and, right underneath our new feature print statement, type print('This is a feature on the master branch'). Then run git add gittutorial.py and git commit -m "Added new master feature" to commit the change to the master branch. Now let's switch back over to newbranch with git checkout newbranch. Once there, open the same Python file and add print('This is a newbranch feature') as the fourth line. Add and commit this file just like you did on the master branch, except change the commit message to "Added new newbranch feature".

Now let's switch back to master with the checkout command and try to merge the two branches with git merge newbranch. This should be the output you get:


Auto-merging gittutorial.py
CONFLICT (content): Merge conflict in gittutorial.py
Automatic merge failed; fix conflicts and then commit the result.
      

So as you can see, the merge failed because each branch added a different line of code at the same place in the file. To fix this we'll open up the code in our text editor (Vim for me, but use whatever you prefer) and you'll see this:


for x in range(20):
    print(x)
print('This is a new feature')
<<<<<<< HEAD
print('This is a feature on the master branch')
=======
print('This is a newbranch feature')
>>>>>>> newbranch
      

Git has clearly marked where the conflict is in our file. It will do this throughout the file wherever a conflict occurs, so if you're working with a large file just search for HEAD and you can quickly zip through the file and change what you want. Here Git gives us three choices: we can keep the master version, the newbranch version, or both. We'll keep both for this exercise since it won't cause any problems. It obviously won't be this easy all the time, but this is the method you'll use to handle these issues in Git. Since I'm keeping both, all I need to do is delete the HEAD, equal signs, and newbranch marker lines and I'm good to go.

A quick note here: HEAD refers to the tip of the branch you are merging into. In this case it is the master branch, but it could be any branch that you are merging into. OK, so after I've deleted the marker lines, I will add and commit the file with the message "Merged master and newbranch".
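Deleting the three marker lines by hand is all this exercise needs, but it may help to see the same edit spelled out as code. The keep_both_sides function below is a made-up helper, not part of Git, and it only fits the "keep both" choice; in a real conflict you would usually read both versions and decide what the final code should be.

```python
def keep_both_sides(text):
    """Drop Git's conflict marker lines, keeping the lines from both branches."""
    kept = []
    for line in text.splitlines():
        is_marker = (
            line.startswith("<<<<<<< ")
            or line == "======="
            or line.startswith(">>>>>>> ")
        )
        if not is_marker:
            kept.append(line)
    return "\n".join(kept) + "\n"


conflicted = """for x in range(20):
    print(x)
print('This is a new feature')
<<<<<<< HEAD
print('This is a feature on the master branch')
=======
print('This is a newbranch feature')
>>>>>>> newbranch
"""

print(keep_both_sides(conflicted))
```

The result is the five-line script with both print statements, which is the file that gets added and committed to finish the merge.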

Let's get a visual representation of what all just happened. Type git log --oneline --graph and hit enter which should produce this output:


*   fe2e9e1 (HEAD -> master) Merged master and newbranch
|\
| * 688fd3a (newbranch) Added new newbranch feature
* | b6090d1 Added new master feature
|/
* cf87ca3 Added print statement
* 88e287f Changed number in range from 10 to 20
* c9801bb First Commit
      

As you can see here, the asterisks in the left column are our master branch, and then we see newbranch branch out and come back in, which represents our merge. Each asterisk represents a commit on that branch. And now you know the basics of Git! There's tons more complex stuff in Git, but these are some of the main commands, and along with the next ones I'm going to show you they represent the majority of commands you'll use on a day-to-day basis.

Using Github

OK, so we've done a lot of talking about Git. And remember, everything we've discussed so far takes place ONLY ON YOUR COMPUTER. No one else can see that code and those changes unless you allow them access to your computer. That's never a bright idea, which is why we have Github. On Github you are basically storing all the changes you make on your computer on the internet, so other people can use your code or recommend improvements via pull requests. But how do you get what we've done so far to Github itself?

To get our code to Github we are going to do what people call pushing the repository. It's called that because the command to send stuff to Github is push as opposed to pull which brings code from Github to your computer. So to get our code to Github, you'll need to log into Github and at the top you'll see a plus sign. Click on that and a drop down menu will appear and then click create new repository which will bring you to this screen.

Creating a Github Repo

Name the repository but don't change anything else, and click "Create Repository." The next screen will have a URL that looks something like this: https://github.com/mcbarlowe/gittutorial.git. Copy that URL, and then in the terminal type git remote add origin your_url, replacing your_url with the URL you copied. After that, type git push -u origin master and you should get output that looks something like this:


Counting objects: 18, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (10/10), done.
Writing objects: 100% (18/18), 1.48 KiB | 759.00 KiB/s, done.
Total 18 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), done.
To https://github.com/mcbarlowe/gittutorial.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'origin'.
      

And now go to the URL on Github and you'll see your very own brand new repository on the website. If you've gotten this far, great job! I know it's hard and confusing. It was for me as well, and it took me about four or five failed attempts at trying to learn how to use Git before I finally understood it. But once it clicks you'll wonder how you ever lived without it.

And now every time you make a commit and want to push it up to Github, all you have to type is git push and within moments your code is posted. How cool is that? And if you want to push a branch other than master, just check out that branch and type git push -u origin branchname. After that, whenever you have that branch checked out, just type git push and it will be pushed to your Github repository as well.

Git Advice

Here's a couple rules of thumb to follow to save yourself some Git headaches

One last thing. I mentioned above an easy way to help keep track of which branch you are on. To do this we'll need to edit your .bash_profile, which is located in your home directory. On Linux and Unix systems the home directory is always symbolized with the ~ symbol, so to edit it you would type vim ~/.bash_profile (or use whatever text editor you prefer). Once you have it open, add these lines:


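# Git branch in prompt.
parse_git_branch() {
    # Keep only the starred (current) branch line and wrap its name in parentheses.
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}
# Assumed prompt layout (user@host, current directory, branch); adjust to taste.
export PS1="\u@\h \W\$(parse_git_branch) \$ "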
      

Now save the file and at the command line type source ~/.bash_profile and you should see the git branch you are currently on in the terminal prompt line like this:


Username@computer-name website (master) $
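If the sed part of that function looks like line noise, here is roughly the same filter written in Python. The current_branch_label function name is made up for this sketch, and the sample text is what git branch typically prints in our tutorial repository: one line per branch, with an asterisk in front of the one you have checked out.

```python
import re


def current_branch_label(git_branch_output):
    """Mimic the sed filter: keep the starred line and wrap its name in parentheses."""
    for line in git_branch_output.splitlines():
        match = re.match(r"\* (.*)", line)
        if match:
            return " (" + match.group(1) + ")"
    return ""   # no starred line, e.g. when not inside a repository


sample = "* master\n  newbranch\n"
print(repr(current_branch_label(sample)))   # ' (master)'
```

The leading space in the result is what separates the branch name from the directory name in the prompt.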
      

OK, that's it for now with this tutorial. I know it must be a lot to take in; trust me, it was a lot to write. If you have any questions, you can always check the sources for further info. And as always, I'm available on Twitter @matt_barlowe, or you can email me at barloweanalytics@gmail.com if you have longer questions. Good luck with your future in version control; I bet you'll do great.

Sources

Beginners Setup Guide for Git

Mac Setup Guide: Git This link covers a lot of other topics for setting up your Mac as a development environment. And will probably cover things I won't get to for a while. Check it out if you're wanting to do more advanced stuff.

What is Git?

Understanding Git Repositories

What is Version Control

Engineering Stack Discussion on Git structure

What is Git and Why You Should Use Version Control if You are a Developer

Caching your Github Password in Git

Adding an Existing Project to Github

What is the Difference between Git Pull and Git Fetch

Git Docs Page