Note: Please think of this document as a living document, which you are free to improve (like a wiki) with minor edits, new sections that others might find useful, or additional resources that you find that you think others might also find useful. After reading through this, you will be able to write a basic R package, which can be installed from GitHub.
Packages are bundles of code and data that can be written by anyone in the R community. R Packages can serve any number of uses, and range from well documented and widely used statistical libraries to packages of functions that tell knock-knock jokes. If you have been using R for even a short length of time, you have probably needed to install and use the functions published in an R package. In these notes, I will walk you through the basics of writing your own R package. Even if you never intend to do this for your own code, I hope that this process will make you more familiar with the R packages that you use in your research, and how those packages are made.
A lot of R users are probably familiar with the Comprehensive R Archive Network (CRAN), a massive repository that currently holds over 13000 published R packages. Packages on CRAN are published for the R community and installed in RStudio using the function install.packages. But not every R package is or should be uploaded to CRAN. Packages can be uploaded to and downloaded from GitHub, or even just built for personal use (some R users have their own personal R packages with documented functions that they have written and regularly use in their own research).
Here I will walk through the process of writing a very simple R package, uploading it to GitHub, and downloading it from GitHub. Throughout these notes, I will present only the RStudio version of package development, but package development can also be done using the command line (though there is really no reason to do this, as RStudio makes the whole process much easier). There are some packages that need to be installed before we start developing.
Before getting started, we need to install the devtools and roxygen2 packages. The former contains a large bundle of tools needed in package development, while the latter is used to easily write documentation.
install.packages("devtools");
install.packages("roxygen2");
It might be necessary to restart RStudio after installing the above packages.
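If you are not sure whether these packages are already installed, you can check first. The following is only a minimal sketch; the helper name missing_packages is hypothetical (it is not part of devtools or roxygen2), and the install call is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Hypothetical helper: return the packages in 'pkgs' that are not installed
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE if the package can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("devtools", "roxygen2"))
print(to_install)

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```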
Assume that we want to create an R package that includes two functions. The first function will convert temperatures from degrees Fahrenheit to degrees Celsius, while the second function will convert temperatures from degrees Celsius to degrees Fahrenheit. The first thing we need to do is create a new folder somewhere on our computer that will hold the whole R package (there are other ways of doing this, but I am showing the way that I tend to use most often).
The above shows the new folder ‘SCC_R_package’. For now, this folder is empty. The first thing that we need to do is to create a new folder inside of ‘SCC_R_package’ called ‘R’.
Inside this folder is where we will store the actual R scripts with the coded functions. Any number of ‘.R’ files can be included in the folder, and each file can have any number of functions. You could, for example, give each function its own file, or just have one file with many R functions. For large projects, I find it easiest to group similar functions in the same R file. In our new R package, I will write both functions in the same file called ‘temp_conversion.R’, which has the code below.
F_to_C <- function(F_temp){
  C_temp <- (F_temp - 32) * 5/9;
  return(C_temp);
}

C_to_F <- function(C_temp){
  F_temp <- (C_temp * 9/5) + 32;
  return(F_temp);
}
That is the whole file for now; just nine lines of code.
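Because the two functions are inverses of one another, a quick way to check them is to convert some temperatures in one direction and then back again. The sketch below repeats the two function definitions so that it can be run on its own; the test temperatures are arbitrary values chosen for illustration.

```r
F_to_C <- function(F_temp){
  C_temp <- (F_temp - 32) * 5/9;
  return(C_temp);
}

C_to_F <- function(C_temp){
  F_temp <- (C_temp * 9/5) + 32;
  return(F_temp);
}

# Both functions are vectorised, so a whole vector can be converted at once
temps_F    <- c(-40, 32, 98.6, 212)
temps_C    <- F_to_C(temps_F)
round_trip <- C_to_F(temps_C)

# all.equal() allows for tiny floating point differences
print(all.equal(round_trip, temps_F))
```

Using all.equal rather than == avoids false alarms caused by floating point rounding.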
The next thing that we need to do is create a new file called DESCRIPTION in the SCC_R_package directory (note, not in ‘R’, but just outside of it). This will be a plain text file with no extension, and it will hold some of the meta-data on the R package. For now, the whole file is just the following four lines, specifying the package name, type, title, and version number.
Package: SCCTempConverter
Type: Package
Title: Temperature Conversion Package for Demonstration
Version: 0.0.1.0
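The DESCRIPTION file uses a simple ‘Field: value’ layout, which base R can read with read.dcf. As a rough sketch, the code below writes the same four lines to a temporary file (the temporary file is an assumption made so that the example is self-contained) and reads the fields back.

```r
# Write the four DESCRIPTION lines to a temporary file
desc_file <- tempfile("DESCRIPTION_")
writeLines(c(
  "Package: SCCTempConverter",
  "Type: Package",
  "Title: Temperature Conversion Package for Demonstration",
  "Version: 0.0.1.0"
), desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)
print(colnames(desc))
print(desc[1, "Package"])

# package_version() checks that the version string is well formed
pkg_version <- package_version(desc[1, "Version"])
print(pkg_version)
```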
If we really wanted to call it quits, this is technically an R package, albeit an extremely basic one. We could load it using the code below after first reading in the devtools library.
library(devtools);
load_all("."); # Working directory should be in the package SCC_R_package
Note that the working directory needs to be set correctly to the R package directory (e.g., using the setwd function, or by choosing Session > Set Working Directory from the pull down menu of RStudio). In doing this, the above functions F_to_C and C_to_F are now read into R and we can use them to convert temperatures.
F_to_C(79);
## [1] 26.11111
C_to_F(20);
## [1] 68
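The steps so far can also be carried out entirely from R. The sketch below rebuilds the same minimal layout (an ‘R’ folder holding temp_conversion.R, with DESCRIPTION beside it) inside a temporary directory; the temporary location is an assumption made so that the example does not touch your own files.

```r
# Create SCC_R_package/R inside a temporary directory
pkg_dir <- file.path(tempdir(), "SCC_R_package")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# R/temp_conversion.R holds the two functions
writeLines(c(
  "F_to_C <- function(F_temp){",
  "  C_temp <- (F_temp - 32) * 5/9;",
  "  return(C_temp);",
  "}",
  "",
  "C_to_F <- function(C_temp){",
  "  F_temp <- (C_temp * 9/5) + 32;",
  "  return(F_temp);",
  "}"
), file.path(pkg_dir, "R", "temp_conversion.R"))

# DESCRIPTION sits beside the R folder, not inside it
writeLines(c(
  "Package: SCCTempConverter",
  "Type: Package",
  "Title: Temperature Conversion Package for Demonstration",
  "Version: 0.0.1.0"
), file.path(pkg_dir, "DESCRIPTION"))

# List everything that the package directory now contains
pkg_files <- list.files(pkg_dir, recursive = TRUE)
print(pkg_files)

# With devtools installed, the package could then be loaded by path,
# without changing the working directory:
# devtools::load_all(pkg_dir)
```

Passing the path to load_all directly is an alternative to setting the working directory first.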
This is not a good stopping point for writing a package though, because we really should include some sort of documentation explaining what the package is for and helping users know what functions do.
To get started on a proper R package complete with documentation, the best thing to do is to create a new R project. To do this in RStudio, go to File > New Project...; the box below should pop up.
Note that we could have started with a project right away, creating a new folder with the New Directory option. Instead, we will create the project in our Existing Directory, SCC_R_package, by choosing the middle option. The following box should appear.
The box above is asking for the local directory in which the project will be stored. Mine is shown above, but yours will be different depending on where SCC_R_package is stored. After clicking ‘Create Project’, you should be able to see the project inside the package directory.
The R project is shown above as SCC_R_package.Rproj.
Note that there are a couple of other new things in the directory above, including .Rproj.user and .Rbuildignore. These are hidden files, so you might not see these in your own directory unless you explicitly ask your computer to show hidden files. The folder .Rproj.user is not really important; it stores some more meta-data about the package development. The file .Rbuildignore is not important for now, but could be useful later; this is just a plain text file that tells R to ignore selected files or folders when building the package (e.g., if we wanted to include a folder for our own purposes that is not needed or wanted for building the package). The interface in RStudio should now look something like the below.
The colours you use might vary, but you should see the ‘SCC_R_package’ in the upper right indicating the project name.
If we want others to use the functions that we have written, we need to provide some documentation for them. Documentation shows up in the ‘Help’ tab of RStudio when running the function help. You can run the following code to see what I mean.
help(lm);
Note that the code below does the same thing as the code above.
?lm
You should see a tab pop up somewhere in RStudio that displays a help page with a helpful explanation of the lm function in R.
You can make one of these help pages in RStudio using the roxygen2 package. To do this, we need to add to the functions written in the temp_conversion.R file. The code below shows a simple example.
#' Fahrenheit conversion
#'
#' Convert degrees Fahrenheit temperatures to degrees Celsius
#' @param F_temp The temperature in degrees Fahrenheit
#' @return The temperature in degrees Celsius
#' @examples
#' temp1 <- F_to_C(50);
#' temp2 <- F_to_C( c(50, 63, 23) );
#' @export
F_to_C <- function(F_temp){
  C_temp <- (F_temp - 32) * 5/9;
  return(C_temp);
}

#' Celsius conversion
#'
#' Convert degrees Celsius temperatures to degrees Fahrenheit
#' @param C_temp The temperature in degrees Celsius
#' @return The temperature in degrees Fahrenheit
#' @examples
#' temp1 <- C_to_F(22);
#' temp2 <- C_to_F( c(-2, 12, 23) );
#' @export
C_to_F <- function(C_temp){
  F_temp <- (C_temp * 9/5) + 32;
  return(F_temp);
}
Note that the total length of the code has increased considerably to add in the documentation, but we now have some helpful reminders of how to use each function. The first line (e.g., #' Fahrenheit conversion) shows the function title, and the line after the blank #' line shows the description. Additional tags such as @param and @examples are used to write different subsections of the help file. These are not the only tags available; for more details about the Roxygen format, see Karl Broman’s page or Hadley Wickham’s introduction to roxygen2. Using the above format, the roxygen2 package makes it easy to create help files in R’s documentation (.Rd) format. All that we need to do is make sure that the project is open and that the working directory is correct (typing getwd() should return the directory of our R package), then run the below in the console.
library(roxygen2); # Read in the roxygen2 R package
roxygenise(); # Builds the help files
Here is what our package directory looks like now.
Note that two things have been added. The first is a new directory called ‘man’, which holds the help files that we have written. The second is a plain text file NAMESPACE, which tells R which functions the package makes available to users (here, the two functions tagged with @export). You do not need to edit NAMESPACE manually; in fact, the file itself tells you not to edit it. Here are the entire contents of NAMESPACE.
# Generated by roxygen2: do not edit by hand
export(C_to_F)
export(F_to_C)
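To see how R itself reads these directives, base R provides parseNamespaceFile. The sketch below is only an illustration: it recreates the NAMESPACE file in a temporary folder (an assumption made to keep the example self-contained) and lists the exported function names.

```r
# Recreate the NAMESPACE file inside a temporary package folder
lib_dir  <- tempfile("lib_")
pkg_name <- "SCCTempConverter"
dir.create(file.path(lib_dir, pkg_name), recursive = TRUE)
writeLines(c(
  "# Generated by roxygen2: do not edit by hand",
  "export(C_to_F)",
  "export(F_to_C)"
), file.path(lib_dir, pkg_name, "NAMESPACE"))

# parseNamespaceFile() looks for <package.lib>/<package>/NAMESPACE
ns <- parseNamespaceFile(pkg_name, package.lib = lib_dir)

# The 'exports' element holds the names made available to users
exported <- ns$exports
print(exported)
```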
Inside the ‘man’ folder, there are two new R documentation (.Rd) files, one for each function. Both are plain text files. Here are the contents of F_to_C.Rd.
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/temp_conversion.R
\name{F_to_C}
\alias{F_to_C}
\title{Fahrenheit conversion}
\usage{
F_to_C(F_temp)
}
\arguments{
\item{F_temp}{The temperature in degrees Fahrenheit}
}
\value{
The temperature in degrees Celsius
}
\description{
Convert degrees Fahrenheit temperatures to degrees Celsius
}
\examples{
temp1 <- F_to_C(50);
temp2 <- F_to_C( c(50, 63, 23) );
}
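An .Rd file like the one above can be turned into a readable help page with the tools package that ships with R. The following is a rough sketch under the assumption that the file contents are written to a temporary file first; the underline_titles option is switched off here only so that the plain text output is easier to read.

```r
# The contents of F_to_C.Rd (backslashes are doubled inside R strings)
rd_lines <- c(
  "\\name{F_to_C}",
  "\\alias{F_to_C}",
  "\\title{Fahrenheit conversion}",
  "\\usage{",
  "F_to_C(F_temp)",
  "}",
  "\\arguments{",
  "\\item{F_temp}{The temperature in degrees Fahrenheit}",
  "}",
  "\\value{",
  "The temperature in degrees Celsius",
  "}",
  "\\description{",
  "Convert degrees Fahrenheit temperatures to degrees Celsius",
  "}",
  "\\examples{",
  "temp1 <- F_to_C(50);",
  "temp2 <- F_to_C( c(50, 63, 23) );",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Parse the Rd file, then render it as plain text
rd       <- tools::parse_Rd(rd_file)
txt_file <- tempfile(fileext = ".txt")
tools::Rd2txt(rd, out = txt_file, options = list(underline_titles = FALSE))

help_text <- readLines(txt_file)
cat(help_text, sep = "\n")
```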
We can load the package now and ask for help with F_to_C.
?F_to_C;
RStudio will present the below in the ‘Help’ tab.
Now that we have the key functions and documentation, we can upload this to GitHub for the world to see and use.
Note that putting the R package on GitHub is not a requirement, but it is probably the easiest way to share your work. Before uploading the R package to GitHub, I will add one more folder to the repository.
I use the arbitrarily named ‘notebook’ folder to hold various files that I want to be available to me in development, but not actually present in the R package. I can make the R package ignore this in build by adding a single line of code to the ‘.Rbuildignore’ file mentioned earlier. Below are the entire contents of the ‘.Rbuildignore’ file.
^.*\.Rproj$
^\.Rproj\.user$
notebook*
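The lines ^.*\.Rproj$ and ^\.Rproj\.user$ were already added automatically by RStudio. Each line of ‘.Rbuildignore’ is a regular expression (not a shell wildcard) that R matches, ignoring case, against file and folder paths relative to the package directory. My added line notebook* therefore matches any path containing ‘noteboo’ followed by zero or more ‘k’ characters. This includes the folder ‘notebook’ and everything in it (e.g., ‘notebook/file1.txt’), but also any other folder or file with these characters in its name (e.g., ‘notebook2/file1.txt’ or ‘notebook_stuff.txt’). A stricter pattern such as ^notebook$ would ignore only the ‘notebook’ folder itself. I will now add these notes and all the images I have used to this folder.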
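To get a feel for what a given pattern will catch, you can try it against some example paths with grepl. This is only a sketch that approximates how the patterns are applied (Perl-style regular expressions, ignoring case); the helper name is_ignored and the example paths are hypothetical.

```r
# Hypothetical helper: which paths would a .Rbuildignore pattern match?
is_ignored <- function(pattern, paths) {
  grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)
}

paths <- c("notebook",
           "notebook/file1.txt",
           "notebook2/file1.txt",
           "notebook_stuff.txt",
           "R/temp_conversion.R",
           "DESCRIPTION")

# The pattern used in these notes matches anything containing 'noteboo'
loose  <- is_ignored("notebook*", paths)

# An anchored pattern matches only the path that is exactly 'notebook'
strict <- is_ignored("^notebook$", paths)

print(data.frame(path = paths, loose = loose, strict = strict))
```

In a real build, files inside an ignored folder are left out along with the folder, so the anchored pattern would still exclude ‘notebook/file1.txt’ even though the pattern does not match that path directly.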
With the notebook folder now added, I need to initialise a new GitHub repository (see version control notes for help). After doing this for Stirling Coding Club’s organisation, here is what it looks like on GitHub.
The R package is now live. Anyone can download it by using the install_github function in the devtools package. To do so, type the below into the RStudio console.
library(devtools) # Make sure that the devtools library is loaded
install_github("StirlingCodingClub/SCC_R_package");
Our R package is now installed. We can start using it by reading it in as a normal package.
library(SCCTempConverter);
F_to_C(30);
## [1] -1.111111
That is it! We can share the location of the R package with colleagues who we think might make use of its R functions. If you want to, you can stop here, but I will press on with a few more helpful tips and tricks in the next section.
Additional subdirectories
The subdirectories (i.e., folders) that I have walked you through are not the only ones that are useful to include in an R package. Here, for example, is what the directory of the GMSE R package looks like.
There is a lot of extra stuff here, but the following are what each folder contains:
data: data files provided with the package, stored in the .rda format (e.g., using save() in R), which can be loaded using data when a package is read into R (e.g., data(cars) in base R).
One more folder that could be useful but is not in the GMSE R package above is the following:
Building a source package
We can build a source package (i.e., a compressed .tar.gz version of the R package) in RStudio by selecting Build > Build Source Package. This will create the compressed package file outside of the package directory, which is what we would need to build if we wanted to submit our package to CRAN.
Tagging a version
It is sometimes helpful to ‘tag’ a particular commit in git to identify a particular version or ‘release’ of your R package (e.g., in GMSE). I did not go into detail about using git tags in the version control session, but the general idea is that a tag is a meaningful name attached to a commit, used in place of the commit’s long hash; the tag therefore marks a snapshot of a particular point in the history of the repository that is of particular interest. In the command line, tagging works as below.
git tag -a v0.0.1.0 -m "my first version of SCC_R_package"
git push -u origin v0.0.1.0
Note that the BASH code above would create the tag ‘v0.0.1.0’ with the quoted message in the first line. In the second line, it would push the tag to GitHub. We can do the same thing in GitKraken with a more friendly graphical user interface.
To tag any commit, right click on the commit and select ‘Create tag here’. This allows you to name the commit, and the name will show up on the left hand side in GitKraken.
See the ‘SCCTempConverter.v0.0.1.0’ tag on the left. To push this tag to GitHub, right click on this tag and select ‘Push SCCTempConverter.v0.0.1.0 to origin’. We can now see that there is one release in the GitHub repository.
If we click on this, we would see the version we tagged.