Stockholm University, VT2022
In package Ecdat there is the data set
MoneyUS 1954:Q1–1994:Q4. Use the inflation variable
infl to make pseudo-out–of–sample forecasts for the
horizons \(h = 1, 2, 3, 4\) with first
forecast origin 1984:Q4. Data from 1985:Q1–1994:Q4 is saved for
evaluation purposes. That makes in total 37 forecast origins where all
horizons can be compared to outcome data. Use auto.arima()
to make recursive and rolling forecasts. Test if they are biased. Test
also if the forecasts with the same horizon have the same forecast
precision or not.
In package Ecdat there is the data set
MoneyUS 1954:Q1–1994:Q4. Use the inflation variable
infl to make pseudo-out–of–sample forecasts for the
horizons \(h = 1, 2, 3, 4\) with first
forecast origin 1984:Q4.
library(forecast)
data(MoneyUS, package = "Ecdat")
infl <- MoneyUS[, "infl"]
plot(infl)
R stores quarterly time indices as 1984:Q1 →
1984.00, 1984:Q2 → 1984.25, 1984:Q3 →
1984.50, 1984:Q4 → 1984.75hmax <- 4
origin_1 <- 1984.75
Data from 1985:Q1–1994:Q4 is saved for evaluation purposes. That makes in total 37 forecast origins where all horizons can be compared to outcome data.
n_origins <- length(window(infl, start = origin_1)) - hmax
n_origins
## [1] 37
Use auto.arima() to make recursive and rolling
forecasts.
forecast() resultsmatrix of NAs as
placeholdersre_models <- vector("list", n_origins)
re_forecasts <- vector("list", n_origins)
re <- ts(
data = matrix(NA, nrow = n_origins, ncol = hmax),
start = origin_1,
frequency = frequency(infl)
)
colnames(re) <- paste0("h=", 1:hmax)
origin, it starts with origin_1
(1984:Q4, or 1984.75), and adds 1 quarter (0.25) at each stepwindow() of
infl, extending the end to originmean in the
forecast() output)listsfor (i in 1:n_origins) {
origin <- origin_1 + (i - 1) / 4
dat <- window(infl, end = origin)
m <- auto.arima(dat)
f <- forecast(m, h = hmax)
re[i, ] <- f$mean
re_models[[i]] <- m
re_forecasts[[i]] <- f
}
For a plot like lecture 9, slide 53 here’s some advanced coding. The
idea is that we have one 4-step timeseries for every origin (so 37
ts objects). We want to plot them all individually, but in
the same graph. Note, that the forecast() returns a
ts object with start and
frequency parameters for the point forecast
(mean). This is exactly what we want! But how do we extract
and store these 37 timeseries?
A list comes to mind for storage. To extract them, we
can either run a for-loop, or use one of the
*apply() functions:
re_list <- lapply(re_forecasts, `[[`, "mean")
Remember, we stored the complete results of all
forecast() calls in the list re_forecasts.
lapply now goes through every element of the
list, and applies a function to it. The function in this
case is [[, the subset-operator for lists.
Wait, that’s a little weird?
Usually we use it like list[[1]], when we want to get
the first element of list. But it is actually implemented
as a function in R, so we could also write
`[[`(list, 1) to get the first element of
list. And we make use of exactly that fact here.
So what is then "mean" in the lapply()
call? If you look at ?lapply, you see that it has
... as the third argument. Here we can supply arguments for
the function supplied for FUN. In this case we supply
"mean", which is the name of the list-element
we want to extract using [[.
Alternatively, we can extract it like this.
re_list <- lapply(re_forecasts, function(x) x[["mean"]])
Here we use a so-called anonymous function. This function is not
stored, and has no name (hence anonymous), but is only supplied inside
lapply(). This function is going to be executed for every
element of the list re_forecasts, and every
element from the list supplied as x to the
function. Then we use the subset operator more or less normally. The
results of running the function for each element are then returned as
yet another list.
We have all the ts objects in a list, but
we want them as a mts object, a multivariate timeseries.
For this we are going to use do.call(), a very particular
function (I have used it maybe 3 times in 5 years). This function allows
us to call/execute a function, but instead of specifying arguments in
the function call, we can supply them in a list.
To combine the ts into a mts, we use
ts.union(). It can take any number of ts
objects and combines them into one mts. So what we do
is
re_plot <- do.call(ts.union, re_list)
which is equivalent to
ts.union(re_list[[1]], re_list[[2]], ..., re_list[[37]]).
Now we have a multivariate timeseries with 37 columns, one for each forecast origin. So let’s plot them.
ts.plot(
re_plot,
main = "Recursive ARIMA forecasts",
col = "grey"
)
lines(window(infl, start = origin_1))
legend(
"topleft",
legend = c("Inflation rate", "4-step forecast"),
lty = 1,
col = c("black", "grey")
)
We also plot the actual timeseries, using lines(). By
using this function, all series in the ts.plot() call will
have col = "grey", and the actual series will have the
default color black. This function adds to the previous plot by default.
And of course a legend.
This was a little detour into lapply and
do.call. The code is advanced and not at all expected of
you. But maybe some of you appreciate it.
window()ro_models <- vector("list", n_origins)
ro_forecasts <- vector("list", n_origins)
ro <- ts(
data = matrix(NA, nrow = n_origins, ncol = hmax),
start = origin_1,
frequency = frequency(infl)
)
colnames(ro) <- paste0("h=", 1:hmax)
for (i in 1:n_origins) {
ro_start <- start(infl) + (i - 1) / 4
origin <- origin_1 + (i - 1) / 4
dat <- window(infl, start = ro_start, end = origin)
m <- auto.arima(dat)
f <- forecast(m, h = hmax)
ro[i, ] <- f$mean
ro_models[[i]] <- m
ro_forecasts[[i]] <- f
}
ro_list <- lapply(ro_forecasts, `[[`, "mean")
ro_plot <- do.call(ts.union, ro_list)
ts.plot(
ro_plot,
main = "Rolling ARIMA forecasts",
col = "grey"
)
lines(window(infl, start = origin_1))
legend(
"topleft",
legend = c("Inflation rate", "4-step forecast"),
lty = 1,
col = c("black", "grey")
)
Test if they are biased.
outcome, the actually realized inflation
outcome <- ts(
data = matrix(NA, nrow = n_origins, ncol = hmax),
start = origin_1,
frequency = frequency(infl)
)
for (i in 1:n_origins) {
# we do not want to include the origin itself
start_date <- origin_1 + i / 4
end_date <- start_date + 0.75
outcome[i, ] <- window(infl, start = start_date, end = end_date)
}
re
or ro from outcomere_fe <- outcome - re
ro_fe <- outcome - ro
bias_test <- function(h, fe) {
# Runs a bias test and returns the test's p-value. Supply point forecasts in a
# mts object, with the horizons as columns.
require(dynlm)
require(lmtest)
require(sandwich)
model <- dynlm(fe[, h] ~ 1)
vari <- NeweyWest(model, lag = h - 1)
test <- coeftest(model, vcov. = vari)
pval <- test[1, 4]
return(pval)
}
bias_pvals <- data.frame(
h = 1:hmax,
recursive = NA,
rolling = NA
)
for (h in 1:hmax) {
bias_pvals$recursive[h] <- bias_test(h, re_fe)
bias_pvals$rolling[h] <- bias_test(h, ro_fe)
}
bias_pvals
## h recursive rolling
## 1 1 0.2592169 0.2261795
## 2 2 0.5172674 0.4702605
## 3 3 0.6105550 0.5496581
## 4 4 0.6954412 0.6529150
Test also if the forecasts with the same horizon have the same forecast precision or not.
dm.test() from forecastdata.frame to store the results inprec_pvals <- data.frame(
h = 1:hmax,
pval = NA
)
for (h in 1:hmax) {
test <- dm.test(re_fe[, h], ro_fe[, h], h = h)
prec_pvals$pval[h] <- test$p.value
}
prec_pvals
## h pval
## 1 1 0.6458881
## 2 2 0.8033437
## 3 3 0.5205088
## 4 4 0.2878299