Stockholm University, VT2022
Computer Lab 1: R Intro
Resources (all available for free)
- “An Introduction to R” ebook that starts from scratch and includes
videos (for beginner R users) link
- introduction by the developers of R (for R beginners with other
programming experience) link
- “R for Data Science” ebook for a hands-on introduction to data
science in R with a particular focus on packages from the tidyverse
(for beginner and intermediate R users) link
- “Efficient R Programming” ebook on how to efficiently write code,
and how to write efficient code (for intermediate and advanced
R users) link
- “R Packages” ebook on how to write and publish R packages (for
intermediate and advanced R users) link
- advanced tips and tricks with a nerdy presentation link
- “Advanced R” ebook that goes deep, yet is easy to follow link
What is R?
- open-source programming language developed for statistical analysis,
computation, and visualization
R compared to Stata, SPSS, or SAS
- has no point-and-click interface
- has a steeper learning curve
- is more difficult to use if you “just want to run regressions”
- is much more flexible (useful when cleaning data, or when using more
involved estimation procedures)
- has a vastly superior online support community and learning
resources (almost any question you might have has probably been
answered on Stack Overflow)
- is a “fully fledged” programming language, i.e. can do more
R compared to Matlab, Octave, or Julia
- is more focussed on statistical analysis (i.e. it is easier to do)
- has better data-handling capabilities
- is typically slower
R compared to Python
- is more focussed on statistical analysis (i.e. it is easier to do)
- has better out-of-the-box data-handling capabilities
- is less of an all-round language
R and RStudio
- R is the language in the background, RStudio is “optional”
- RStudio is an IDE (integrated development environment), and a very
good one at that (de facto standard)
- use it to write R scripts, (interactively) execute code, look at
plots, and look at data
- note that correct R code can run without RStudio
Download R here
Download RStudio here
Using R
RStudio
- default window arrangement
- left is console, the current R instance
- top right is environment (current variables), history (previously
executed commands)
- bottom right is very important: help, plots, and more
- working with projects is highly recommended
- forces you to keep code and data structured
- idea is to keep everything related to a “project” in a single folder
and subfolders
- both scripts and data
- allows you to use relative file paths
- reproducible by just sharing that folder
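The point about relative file paths can be sketched as follows. This is only an illustration: the subfolder data and the file inflation.csv are hypothetical names, not part of the lab material, and the sketch assumes the working directory is the project folder (which is the case when the project is open in RStudio).

```r
# Build a path relative to the project folder.
# "data" and "inflation.csv" are hypothetical names for illustration.
data_path <- file.path("data", "inflation.csv")
data_path

# Only read the file if it actually exists, so the sketch runs anywhere
if (file.exists(data_path)) {
  inflation <- read.csv(data_path)
  head(inflation)
}
```

Because the path does not start at a drive letter or a home directory, the same script keeps working when the whole project folder is moved or shared.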
Overpowered calculator
1 + 1
## [1] 2
2 - 1
## [1] 1
2 * 5
## [1] 10
4.4 / 2
## [1] 2.2
2^5
## [1] 32
log(10)
## [1] 2.302585
exp(2)
## [1] 7.389056
Assignments
x <- 2
x
## [1] 2
3 * x
## [1] 6
log(x)
## [1] 0.6931472
- assignment possible using <- or =
- R convention is to use <-
- = also used for keyword arguments in function calls (more later)
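A small sketch of the two uses side by side; the variable names are chosen only for illustration.

```r
# <- assigns a value to a name
x <- c(4, 8, NA)

# Inside a function call, = matches a value to an argument name:
# na.rm = TRUE tells mean() to drop the missing value before averaging
m <- mean(x, na.rm = TRUE)
m
```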
Data Modes
- three basic data modes: numeric, character, and logical
y <- "tjena"
z <- TRUE
- mode determines what can be done with the object
try(log(y))
## Error in log(y) : non-numeric argument to mathematical function
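To check which mode an object has before using it, mode() and the is.*() helpers can be used. A minimal sketch, where a, y, and z are illustrative names:

```r
a <- 2
y <- "tjena"
z <- TRUE

# mode() reports the data mode of each object
modes <- c(mode(a), mode(y), mode(z))
modes

# is.numeric() is a quick check before calling a mathematical function
is.numeric(a)
is.numeric(y)

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
n_true <- sum(c(TRUE, FALSE, TRUE))
n_true
```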
Data Structures
- four important data structures: vector, matrix, data.frame, and list
- special cases:
- a scalar is a length-1 vector
- a matrix is a 2-dimensional array
- a data.frame is a special list
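The three special cases can be checked directly in the console. A minimal sketch with illustrative object names:

```r
# A scalar is a vector of length 1
s <- 2
scalar_is_vector <- is.vector(s) && length(s) == 1

# A matrix is an array with two dimensions
m <- matrix(1:4, nrow = 2)
matrix_is_2d_array <- is.array(m) && length(dim(m)) == 2

# A data.frame is a list (of equally long columns)
d <- data.frame(x = c(1, 2))
df_is_list <- is.list(d)

c(scalar_is_vector, matrix_is_2d_array, df_is_list)
```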
vector and matrix
- can contain only elements of one mode
vec1 <- c(1, 2)
vec2 <- c("tja", "tjena")
vec3 <- c(TRUE, FALSE)
matr1 <- matrix(c(1, 2, 3, 4), nrow = 2)
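What happens when modes are mixed anyway: R silently converts all elements to a common mode. A short sketch of that behaviour, with illustrative names:

```r
# Mixing numeric, character, and logical: everything becomes character
mixed <- c(1, "tja", TRUE)
mode(mixed)
mixed

# Mixing numeric and logical: TRUE becomes 1
num_log <- c(1, TRUE)
mode(num_log)
num_log
```

This conversion happens without a warning, so it is worth checking the mode of a vector when results look odd.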
list
- can contain anything (even more lists)
list1 <- list(2, "tja", TRUE, list("bla bla"))
- list is very versatile and flexible, but you have to be careful
- more or less irrelevant for us now
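For later reference, one example of why care is needed: single and double brackets behave differently on a list. A minimal sketch using the same list as above:

```r
list1 <- list(2, "tja", TRUE, list("bla bla"))

# [[ ]] extracts the element itself
second <- list1[[2]]
second

# [ ] returns a (shorter) list that contains the element
sub <- list1[2]
is.list(sub)

# Nested lists are reached by chaining [[ ]]
nested <- list1[[4]][[1]]
nested
```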
data.frame
- workhorse for data analysis
- like a matrix, but data modes can differ between columns
df1 <- data.frame(x = c(1, 2), y = c("tja", "tjena"))
df2 <- data.frame(x = vec1, y = vec2)
df1
##   x     y
## 1 1   tja
## 2 2 tjena
df2
##   x     y
## 1 1   tja
## 2 2 tjena
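Individual columns of a data.frame are accessed by name with $, and each column is an ordinary vector. A minimal sketch using the same data as above:

```r
df1 <- data.frame(x = c(1, 2), y = c("tja", "tjena"))

# $ extracts one column as a vector
df1$x
mean_x <- mean(df1$x)
mean_x

# Number of rows and columns
dims <- dim(df1)
dims

# Compact overview of the columns and their types
str(df1)
```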
Subsetting
- access individual elements, or ranges of elements using []
- R is one-indexed: first element of vector is x[1] (unlike e.g.
Python, where it is x[0])
# shorthand to create a vector containing 1, 2, ..., 10
x <- 1:10
x[3]
## [1] 3
x[5:8]
## [1] 5 6 7 8
- matrix subsetting similar, use [row, column]
y <- matrix(1:4, nrow = 2, byrow = TRUE)
y[1, 1]
## [1] 1
y[, 2]
## [1] 2 4
- note that the last command returns a “directionless” vector; we
extract a column of the matrix, but get a “standard” vector (i.e. not
row or column)
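If the column structure should be kept, the argument drop = FALSE can be added when subsetting. A short sketch of the difference, with illustrative names:

```r
y <- matrix(1:4, nrow = 2, byrow = TRUE)

# Default: the result is a plain vector without dimensions
col_vec <- y[, 2]
dim(col_vec)

# drop = FALSE keeps the result as a matrix with 2 rows and 1 column
col_mat <- y[, 2, drop = FALSE]
dim(col_mat)
```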
Time Series
- R comes with built-in time series capabilities
x <- ts(1:12)
class(x)
## [1] "ts"
# above is kind of useless, could have just as well used x <- 1:12
# useful feature is to add date and frequency info
x <- ts(1:12, start = c(2022, 1), frequency = 12)
# note x-axis label
plot(x)

- using ts() not strictly necessary, but useful for understanding data
and easier plotting
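One way the date information helps with understanding the data: a ts object can be queried and subset by date rather than by position. A minimal sketch; the name summer is only for illustration.

```r
# Monthly series starting in January 2022
x <- ts(1:12, start = c(2022, 1), frequency = 12)

# Query the stored date information
frequency(x)
start(x)

# Select June to August 2022 by date instead of by index
summer <- window(x, start = c(2022, 6), end = c(2022, 8))
summer
```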
Functions
- every operation is a function
- performs an action, or sequence of actions
- has a name, and arguments
- takes objects as inputs
- e.g. c(1, 2) takes two scalars as inputs, and
combines/concatenates them into one vector
- all functions have help-pages, describing what the function does,
input arguments, and output
help(c)
?c
x <- seq(1, 5, by = 1)
sum(x)
## [1] 15
mean(x)
## [1] 3
r <- rnorm(10)
- function arguments can have default values, look at the help page of
rnorm
- functions are organized in packages
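The default values of rnorm can be seen with args(), and a quick check confirms that leaving them out is the same as writing them explicitly. A minimal sketch; set.seed() is used only to make the random draws repeatable.

```r
# Show the arguments of rnorm together with their defaults
args(rnorm)

# Relying on the defaults ...
set.seed(1)
a <- rnorm(3)

# ... gives the same draws as spelling them out
set.seed(1)
b <- rnorm(3, mean = 0, sd = 1)
identical(a, b)

# Overriding one default: draws centred at 10 instead of 0
set.seed(1)
shifted <- rnorm(3, mean = 10)
shifted
```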
Packages
- look at base package in RStudio
- great advantage of open-source languages like R or Python is the
huge universe of user-written packages
- anyone can write and publish a package
- CRAN (Comprehensive R Archive Network) is the central repository
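How a function relates to its package can be sketched with the stats package, which ships with R and is attached by default. The package::function notation makes the origin of a function explicit.

```r
# Packages currently attached to the session
search()

x <- c(1, 2, 3)

# sd() comes from the stats package; both calls run the same function
sd_a <- sd(x)
sd_b <- stats::sd(x)
identical(sd_a, sd_b)
```

Packages that do not ship with R have to be installed once with install.packages() and then attached with library() in every new session.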
Is a package “good”?
- when googling for how to do things in R, sometimes you find very
particular packages that promise to help you
- but is a package “good” (i.e. bug-free and does what it says)?
- google “cran [package name]”, and click the first link
- e.g. “cran dynlm” points us to here
- check the package's metadata:
- version 0.3-6: be alert, early version of the package
- published 2019: OK
- author Achim Zeileis with email address at @r-project.org: good sign
- another example: “cran did” here
- advanced version number
- published very recently
- authored by one of the developers of the estimator himself -> good
sign
- beware of low version numbers, old publication dates, and random
authors: the code might contain bugs
- CRAN runs technical checks of the code before publication (i.e. do
the functions run without errors/warnings), but does not check if the
numbers are correct
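Part of this metadata can also be read from within R once a package is installed. The sketch below uses the stats package only because it is always available; for a CRAN package such as dynlm the same calls should work after installation.

```r
# Version of an installed package
v <- packageVersion("stats")
v

# Version objects can be compared against a version string
v >= "1.0.0"

# The full DESCRIPTION file of the package, as a list-like object
desc <- packageDescription("stats")
desc$Package
desc$Version
```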
Console and Scripts
- essentially two modes of operation: interactive with the console, or
“organized” with scripts
- R scripts are text documents (like .txt, .csv, .py, .do, …) with file
extension .R
- write your code sequentially in the script, and execute either all
at once from the command line, or line-by-line in RStudio
Hands-On
Installing Michael’s TS package
- open RStudio, create project (new directory, call it “Lab-1” or
something)
- download .tar.gz file from athena, put it into the folder just
created for the project
- go to “Console” in RStudio, execute
list.files()
- should show "TS_1.0-2022.1.tar.gz"
- execute
install.packages("urca")
install.packages("vars")
install.packages("TS_1.0-2022.1.tar.gz", type = "source", repos = NULL)
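To confirm that the three installations worked, one option is to ask R whether each package can be loaded. This check is a suggestion, not part of the lab instructions; the names pkgs and installed are illustrative.

```r
# Packages installed in the steps above
pkgs <- c("urca", "vars", "TS")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise; quietly = TRUE suppresses messages
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed
```

Any FALSE entry points to the package whose installation should be repeated.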
Writing our first script
- create new R script, save it in the project directory
- add comment at top of file describing what this is
- load TS package
- load KPIF data
- calculate average inflation
# Calculate average inflation rate in Sweden
library(TS)
data("KPIF")
print("The average inflation rate in Sweden is")
mean(KPIF)
- save file
- in RStudio go to bottom left, “Terminal”
- type Rscript "[filename].R", enter
- the script is then run in batch mode
- Michael requires this for all assignments and fails you if it does
not work, so test this!
- reasons for it not to work (on Michael’s computer):
- you use setwd() with absolute paths in the script (specify all paths
relative to project directory)
- the order of commands is wrong (need to load the package first,
before we can load the data)
- a simple coding error
More?
- RStudio with integrated tutorials through learnr package
- I have not used it, cannot judge
- introduces the tidyverse, which is a particular “style” of using R
- access in RStudio in top-right
- swirl package
- interactive in the R console
- that’s how I learned the basics
install.packages("swirl")
library(swirl)
swirl()