

Note: Please think of this document as a living document, which you are free to improve (like a wiki) with minor edits, new sections that others might find useful (e.g., using git directly in RStudio), or additional resources that you think others might find useful. After reading through this, you should have a working understanding of what version control is and why it is useful. You should also be able to start using version control in your own work through a combination of GitHub and either GitKraken or the command line interface.



Introduction: What is version control?

Version control is any system that records changes made within a set of files over time so that different versions of files can be managed and, if necessary, recovered. Put more intuitively, version control is a way of taking a snapshot in time (called a ‘commit’) of all the files in one of your folders (called ‘repositories’); as you make changes to the files within your folder, you can always come back to previous snapshots that you’ve taken (if, e.g., you make a change that you regret, or need information from a previous point in time). You can even have multiple different versions of the same folder existing in parallel (called branches). You can think of it as an extra step on top of ‘saving’ a file – a step that solidifies a key point in time for your work, records how it changed from previous and subsequent points in time, and records who made the change, when, and why.
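For readers who like to see the idea in concrete terms, here is a minimal command line sketch of taking two snapshots and then recovering a file from the earlier one. The temporary folder, the file name shopping.txt, and the user name and email are all placeholders chosen for illustration; they are not part of any real project.

```shell
# Hypothetical scratch repository in a temporary folder, so nothing real is touched
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Identity for this repository only (placeholder values)
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot (commit)
echo "Tomatoes" > shopping.txt
git add shopping.txt
git commit --quiet -m "Add shopping list"

# Change the file and take a second snapshot
echo "Avocado" > shopping.txt
git add shopping.txt
git commit --quiet -m "Replace tomatoes with avocado"

# The timeline of snapshots, most recent first
git log --oneline

# Bring back the file as it was one commit ago
git checkout HEAD~1 -- shopping.txt
cat shopping.txt   # prints: Tomatoes
```

The last two commands are the 'come back to a previous snapshot' step: the older version of the file is copied into the folder, while both commits remain in the history.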

Version control is indispensable for large coding projects with multiple developers collaborating on the same code, but it’s also a very useful tool for the workflow of scientific research. Using version control can allow you to better manage data files, analysis files (e.g., R code), manuscript files, and more in a way that keeps things clean and removes the anxiety of losing track of which file is the ‘right’ one.

Version control is also an excellent tool for doing open science. By keeping a record of how your data, analysis, and manuscripts change over time, the process of doing science becomes more transparent. By uploading your progress to GitHub, you can make the whole process of doing science accessible to others, and have evidence of priority and accuracy in your conclusions (you can also keep repositories private, though this costs a small amount unless you are a student).


An example of the timeline of commits for a recent project. Bold titles on top show more recent changes committed to the repository; the bold messages are written at the time of committing, and make it easier to see important changes added over time. In GitHub, you can click on these bold titles to see what changes were made since the last commit; from here, you can also see the whole repository as it was at the time of the commit (you can also do this by clicking on the < > buttons on the right).



There are many different types of version control available. Here, I am going to focus only on git version control software, which has the advantage of being free, open source, available on all platforms (Linux, Mac, and Windows), and the most popular software among research scientists. The software was invented by Linus Torvalds, the same developer who created the Linux kernel.

In this introduction to using version control, I am going to focus heavily on using two software tools that work with git, GitHub and GitKraken. Like git, both GitHub and GitKraken are free for basic use, though more advanced options can come with a small cost. These two tools make using git much easier, especially if you don’t like the idea of working within the command line. GitHub offers a massive online platform where you can store your git repositories, discover and download new repositories, and collaborate with other GitHub users (e.g., in organisations such as the Stirling Coding Club). GitKraken provides a nice graphical user interface for using git, visualising your repository, and linking to GitHub. As you become proficient with git, you might find yourself thinking less in terms of individual files and file versions, and more in terms of commits and branches with inter-related files.

First, I am going to briefly talk about how to use git entirely within GitHub on your browser. This requires fewer steps than using GitKraken, but it can only get you so far because you cannot work with the changes that you make directly on GitHub. For example, if you edit an R file in GitHub, you would have no way to run the code without pulling the file from GitHub to your local repository.

Things to do before getting started

If you are just starting out with git, I recommend signing up for a GitHub account, then downloading GitKraken. You can do everything that I explain below with only these two tools. For those who prefer to use the command line interface, I have included instructions for how to do this below too. Learning to use git in the command line is probably useful if you already use the command line in your normal workflow (or if you are interested in doing so!), but if it’s not something that you work with already, learning it here is probably more trouble than it’s worth.

How to use git in GitHub

In GitHub, the process of creating, making changes to, and committing files is fairly straightforward, and it’s possible to create and manage a repository entirely within GitHub in the browser. The downside to this approach is that you cannot actually run code in GitHub, as you can in RStudio or some other program. You cannot create a DOCX or PDF file for a manuscript, or run your R analysis, in GitHub. Hence, using git in GitHub is probably best thought of as an occasional tool for editing when you are away from your office computer, or when you just need to make a very quick fix that you will pull from GitHub later. To create a new file, simply go to your GitHub repository and click the button “Create new file”.


This will take you to a new screen where you can start writing directly in your browser, and you can save the file as whatever name you want. When you’re finished, you can scroll to the bottom of the page to make a commit.

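In general, it is a good idea to try to come up with an informative commit message. Traditionally, git commit messages are written in the present tense, phrased as an instruction (e.g., ‘Fix typo in introduction’ rather than ‘Fixed typo in introduction’).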
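The same habit carries over to the command line. As a rough sketch, the commit below uses a short summary line plus a longer explanation; the scratch folder, file name, message text, and user details are all made up for illustration.

```shell
# Hypothetical scratch repository with placeholder identity
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"

echo "Tomatoes" > shopping.txt
git add shopping.txt

# The first -m is the short summary line; the second -m becomes a separate paragraph
git commit --quiet -m "Add shopping list" \
    -m "Start a list of items to buy so that later changes can be tracked."

# Show only the summary line of the latest commit
subject="$(git log -1 --format=%s)"
echo "$subject"
```

The summary line is what appears as the bold title in a commit timeline, so it is worth keeping it short and specific.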



Unlike when using tools outside of your browser such as GitKraken or the command line, there is no need to push anything to or pull anything from GitHub because you’re making changes directly on GitHub, so the edits that you commit are saved straight into the GitHub repository. It is important to note, however, that changes you make on GitHub need to be pulled (i.e., downloaded) onto your local computer before you start making changes locally – e.g., in RStudio. They won’t appear automatically.
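To make the last point concrete, the sketch below imitates the situation entirely on one computer. A folder called hosted stands in for the repository on GitHub and a folder called local stands in for your own copy; both names, the file analysis.R, and the user details are assumptions made for this illustration, and no real GitHub account is involved.

```shell
work="$(mktemp -d)"

# "hosted" stands in for the repository on GitHub (an assumption for this sketch)
git init --quiet "$work/hosted"
cd "$work/hosted"
git config user.name "Example User"
git config user.email "user@example.com"
echo "x <- 1" > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"

# "local" stands in for the copy on your own computer
git clone --quiet "$work/hosted" "$work/local"

# An edit committed on the hosted side, as if made in the browser
echo "y <- 2" >> analysis.R
git commit --quiet -am "Add second variable"

# The local copy does not change until you pull
cd "$work/local"
before="$(git rev-list --count HEAD)"
git pull --quiet --ff-only
after="$(git rev-list --count HEAD)"
echo "commits before pull: $before, after pull: $after"
```

With a real GitHub repository the only difference is that the clone address is the repository's URL rather than a folder on your computer.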

With no need to push or pull in GitHub itself, we can move to using branches in GitHub. You can create a branch whenever you edit a file by selecting the ‘Create a new branch for this commit and start a pull request’ radio button below your commit message.


Creating a new branch will immediately take you to the pull request screen below. This will send a request to merge the change that you’ve just made with the master branch (i.e., a ‘pull request’) – it lets you know that the branches can be merged without any merge conflict, and shows the change that you made at the bottom of the screen (removing ‘Tomatoes’ and adding ‘Avacodo’).
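Pull requests themselves are a GitHub feature, but the branch and merge underneath them are plain git. As a hedged sketch of roughly what happens behind that screen, the commands below make a change on a new branch, show the difference, and merge it back. The branch name swap-item, the file shopping.txt, and the user details are hypothetical, and the starting branch is looked up rather than assumed, since it may be called master or main depending on your setup.

```shell
# Hypothetical scratch repository with placeholder identity
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
printf 'Tomatoes\nBread\n' > shopping.txt
git add shopping.txt
git commit --quiet -m "Add shopping list"

# Remember the name of the starting branch (often master or main)
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Make the change on a new branch (the branch name is hypothetical)
git checkout --quiet -b swap-item
printf 'Avocado\nBread\n' > shopping.txt
git commit --quiet -am "Replace tomatoes with avocado"

# Show what merging would bring in: one line removed, one line added
git --no-pager diff "$main_branch" swap-item

# Merge the branch back into the starting branch
git checkout --quiet "$main_branch"
git merge --quiet swap-item
cat shopping.txt
```

Because nothing else changed on the starting branch in the meantime, the merge goes through without conflict, which is the same situation the pull request screen reports.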