R packages: What, Why and How?

HTA Hackathon Belfast 2024

Milan Malfait

2024-08-06

What

  • R packages are the fundamental units of reproducible R code
  • They bundle together code, data, documentation, and tests, and are easy to share with others

R package structure

But Why?

  • Code sharing / reproducibility
  • Convenience
  • Conventions, good practices
  • It makes you look really cool

How

What if I told you you can create an R package

in just 10 minutes

🀯

{devtools} to the rescue

library(devtools)
create_package("~/htahackathon2024/stringsplitter")
use_git()
use_r("split-string.R")
  1. Set up package structure
  2. Set up version control
  3. Add our first function

Add our first function

R/split-string.R
splitstring <- function(x, sep = ",") {
  strsplit(x, split = sep)[[1]]
}
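
One detail worth knowing about this function: strsplit() returns a list and, by default, treats the separator as a regular expression, so separators such as "." can behave unexpectedly. The sketch below illustrates this; splitstring_fixed() is a hypothetical variant introduced only for illustration and is not part of the package built in these slides.

```r
splitstring <- function(x, sep = ",") {
  strsplit(x, split = sep)[[1]]
}

# strsplit() returns a list with one element per input string;
# [[1]] extracts the character vector for the first (and only) string.
as_list <- strsplit("a,b,c", split = ",")

# "." is a regex wildcard, so every character is treated as a separator
# and the letters are lost.
regex_result <- splitstring("a.b.c", sep = ".")

# Hypothetical variant that treats the separator as a literal string.
splitstring_fixed <- function(x, sep = ",") {
  strsplit(x, split = sep, fixed = TRUE)[[1]]
}
fixed_result <- splitstring_fixed("a.b.c", sep = ".")
```

Whether to expose fixed = TRUE as an argument or keep the regex behaviour is a design choice; either way it is worth stating in the documentation.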

🏎️ Test drive

load_all()
splitstring("a,b,c")
#> [1] "a" "b" "c"


check()
#> ... (output truncated) ...
#> ── R CMD check ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔  checking DESCRIPTION meta-information ...
#> ✔  checking R files for syntax errors ...
#> ✔  checking whether the package can be loaded ...
#> ✔  checking R code for possible problems (1.3s)
#> ✔  checking examples ...
#>
#>
#> ── R CMD check results ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── stringsplitter 0.0.1 ────
#>
#> 0 errors ✔ | 0 warnings ✔ | 0 notes ✔

RStudio keyboard shortcuts

  • Shift + Ctrl/Cmd + L to load all
  • Shift + Ctrl/Cmd + E to run checks

📝 Add documentation

Write some documentation in roxygen format for your new function

Tip

In RStudio, go to Code -> Insert Roxygen skeleton to make your life easier

📝 Document your function

R/split-string.R
#' Split a string into a vector of strings
#'
#' @param x a character string to be split
#' @param sep the separator on which to split
#'
#' @return a vector of strings
#' @export
#'
#' @examples
#' splitstring("alfa,bravo,charlie")
#' splitstring("alfa,bravo charlie", sep = " ")
splitstring <- function(x, sep = ",") {
  strsplit(x, split = sep)[[1]]
}

Generate the help pages by running document()

or Ctrl/Cmd + Shift + D in RStudio

📝 Make splitstring() available

The @export tag in the roxygen comments above is what makes splitstring() available to users of the package.

document() will update the NAMESPACE file for us
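
For reference, after running document() on a package whose only exported function is splitstring(), the generated NAMESPACE file should look roughly like this. The header comment is written by {roxygen2}, and the file is not meant to be edited by hand.

```text
# Generated by roxygen2: do not edit by hand

export(splitstring)
```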

📦 Build and install

install()
#> ── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> ✔  checking for file ‘/Users/milan/htahackathon2024/stringsplitter/DESCRIPTION’ ...
#> ─  preparing ‘stringsplitter’:
#> ✔  checking DESCRIPTION meta-information ...
#> ─  checking for LF line-endings in source and make files and shell scripts
#> ─  checking for empty or unneeded directories
#> ─  building ‘stringsplitter_0.0.1.tar.gz’
#>
#> ... output truncated ...
#>
#> ** testing if installed package can be loaded from temporary location
#> ** testing if installed package can be loaded from final location
#> ** testing if installed package keeps a record of temporary installation path
#> * DONE (stringsplitter)

🎉 Success!

library(stringsplitter)
splitstring("a,b,c")
#> [1] "a" "b" "c"

Some Good Practices

One of the things research programmers struggle with is the transition from exploration to infrastructure, i.e., from “coding to figure out what the problem is” to “I’m building a reusable tool”. Habits from the first are often carried over to the second.

– Tweet from Greg Wilson, 2018

Don’t trust rm(list = ls())

Important

If the first line of your R script is

rm(list = ls())

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥
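
The reason not to trust rm(list = ls()) is that it only removes objects from the workspace; changed options and attached packages stay as they were, so the session is not actually fresh. A minimal sketch of this, using the arbitrary names my_result and the {tools} package (which ships with R) purely for illustration:

```r
my_result <- 42            # an object in the workspace
options(digits = 4)        # a changed global option
library(tools)             # an attached package

rm(list = ls())            # the supposed "reset"

# Only the object is gone; the option and the attached package remain.
object_survived  <- exists("my_result", inherits = FALSE)  # FALSE
option_survived  <- getOption("digits") == 4               # TRUE
package_survived <- "package:tools" %in% search()          # TRUE

options(digits = 7)        # put the option back to R's default
```

Restarting the R session clears all three kinds of state at once.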

Restart your R session regularly

Recommended RStudio settings

Change these settings in RStudio

Restart RStudio

Don’t use setwd()

Important

If the first line of your R script is

setwd("C:\Users\myself\path\that\only\I\have")

I will come into your office and SET YOUR COMPUTER ON FIRE 🔥
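
One common alternative is to keep every path relative to the project's top-level directory. The sketch below assumes the script is run from that directory (for example, by opening the project in RStudio) and uses a hypothetical data/input.csv file that is not part of the original slides.

```r
# Hypothetical layout: <project root>/data/input.csv
# Build the path from its parts instead of hard-coding an absolute location.
input_path <- file.path("data", "input.csv")

# The path is relative, so it resolves against the current working directory.
# Nothing in the script depends on one particular machine's folder layout.
if (file.exists(input_path)) {
  input <- read.csv(input_path)
}
```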

Advanced topics

Debugging

What debugging feels like

Code formatting

Code formatting meme

Because readability is important

Do’s and don’ts

# Don’t do this
if(x<100){
  y<-200}

# Do this instead
if (x < 100) {
  y <- 200
}


# NEVER use the shorthand versions of TRUE and FALSE
# Why? This is why:
T <- FALSE # this is valid R code

If you really want to mess with someone’s R code, see evil.R
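
To make the point about T and F concrete: they are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words. A minimal sketch, with the variable names (shadowed, restored, true_is_protected) chosen only for illustration:

```r
T <- FALSE                 # T is an ordinary variable, so it can be reassigned
shadowed <- T              # any later code using T now gets FALSE
rm(T)                      # remove our variable; the built-in T is visible again
restored <- T

# TRUE is a reserved word, so the same trick fails with an error.
# The assignment is wrapped in parse() only so that this script keeps running.
true_is_protected <- tryCatch(
  {
    eval(parse(text = "TRUE <- FALSE"))
    FALSE
  },
  error = function(e) TRUE
)

c(shadowed = shadowed, restored = restored, true_is_protected = true_is_protected)
```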

Resources