Stockholm University, VT2022

Computer Lab 4: Exercises

Exercise 3

In package Ecdat there is the dataset MoneyUS, M1 in the US 1954:Q1–1994Q4 in the variable m.

  1. Use auto.arima() with default settings and make a note about the model order.
  2. Which is the order of integration according to ACF and KPSS?
  3. Identify, estimate and contol at least one ARIMA-model (not the model selected by auto.arima()).
  4. Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or model outliers in the residuals?
  5. Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?
  6. Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.
  7. Write a commented script that performs the analysis and documents the results.

Solution

a)

Use auto.arima() with default settings and make a note about the model order.

  • auto.arima() is in the forecast package
  • load the package and the data, and look at it
library(forecast)
library(urca)
library(TS)
library(lmtest)
library(portes)
library(tseries)

data(MoneyUS, package = "Ecdat")
m1 <- MoneyUS[, "m"]
  • information on the timeseries are in the Ecdat documentation
  • look at it like this
    • ? to call the help function
    • Ecdat:: to specify in which namespace to look
      • by default MoneyUS is not in R’s namespace
      • only when loading it with data(...) it enters
      • it is defined inside Ecdat’s namespace
    • MoneyUS is what we want info on
?Ecdat::MoneyUS
  • plot the series
plot(m1)

  • looks non-stationary (“long-term” movements), maybe some seasonal component (higher-frequency ups and downs)
  • now estimate the model
model_auto <- auto.arima(m1)

summary(model_auto)
## Series: m1 
## ARIMA(0,1,1)(1,0,0)[4] 
## 
## Coefficients:
##           ma1    sar1
##       -0.5683  0.0355
## s.e.   0.0650  0.0795
## 
## sigma^2 = 2.18:  log likelihood = -293.99
## AIC=593.98   AICc=594.13   BIC=603.26
## 
## Training set error measures:
##                       ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 0.001053475 1.462914 1.13101 -35.92468 62.39479 0.7383566
##                     ACF1
## Training set -0.04178698
  • what does ARIMA(0,1,1)(1,0,0)[4] mean?
    • lecture 6 slide 39
    • information about model selection
    • notation on the slides and in documentation is ARIMA(p, d, q)(P, D, Q)[s]
    • first parentheses (0, 1, 1) means
      • \(p = 0\) order of the autoregressive term
      • \(d = 1\) order of integration is 1 (becomes stationary in first difference)
      • \(q = 1\) order of the moving-average term
    • next parentheses, seasonal structure (1, 0, 0)[4]
      • \(P = 1\) order of seasonal autoregressive term
      • \(D = 0\) order of integration of the seasonal structure
      • \(Q = 0\) order of seasonal moving-average term
      • \(s = 4\) frequency of seasonal structure, quarterly in this case
b)

Which is the order of integration according to ACF and KPSS?

  • auto.arima() estimates order of integration \(d = 1\), let us check
    • order of integration is the number of times we have to difference to get a stationary series
  • sidenote:
    • stationary process are by definition \(d = 0\), they look “flat”
    • unit-root processes (\(x_t = x_{t - 1} + e_t\) with \(\mathbb{E}[e_t] = 0\)) are \(d = 1\), they look increasing, or decreasing (or both over periods of time)
    • higher-order integration rarely analysed, but they typically look like exponential increases/decreases
  • test for stationarity of the original series
ACF
  • first, visually using acf()
acf(m1)

  • very high autocorrelation over a long horizon in the series, indicative of non-stationarity
  • compare to acf() of first difference of the series
dm1 <- diff(m1)

acf(dm1)

  • large negative autocorrelation at one lag, but essentially zero for longer horizons
  • indicative of first difference stationarity
  • let’s plot the first difference
plot(dm1)

  • looks stationary, maybe some residual seasonal component
KPSS test
  • test more formally with KPSS test
    • the Null hypothesis is stationarity
    • allows for constant (type = "mu"), and/or linear trend (type = "tau")
    • implemented in the urca package
ur.kpss(dm1, type = "mu") |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0924 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  • |> “pipes” the output of the left-hand side into the function given on the right
  • interpretation:
    • test-statistic is smaller than all critical values, we can not reject the Null
    • assume the series is stationary
  • compare with test result of the original series
ur.kpss(m1, type = "mu") |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.7744 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  • test-statistic slightly bigger than critical value at 1% level
    • reject the Null of stationarity
  • in either case we choose type = "mu", because there seems to be no linear trend in either series
ADF test
  • another possibility is the augmented Dickey-Fuller test (ADF-test; lecture 5, slide 46)
  • estimates the model

\[ \Delta x_t = a + b t + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t \]

  • parameter of interest is \(\theta = \phi - 1\)
  • more robust to autocorrelation by including past differences as regressors
  • Michael wrote a function in his TS package
  • this function estimates 3 versions of the model above:
    • \(a = b = 0\)
    • \(b = 0\)
    • full model
  • presents test statistics and critical values
ADF.test(m1)
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.9   0.9   0.97 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -1.88 -3.99 -3.43 -3.13
## phi3      1.88  8.43  6.49  5.47
## tau2     -1.95 -3.46 -2.88 -2.57
## phi1      1.93  6.52  4.63  3.81
## tau1     -1.15 -2.58 -1.95 -1.62
  • explanation:
    • tau1: test statistic of basic Dickey-Fuller test (lecture 5, slides 25ff)
      • estimates the model \(\Delta x_t = \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t\); no drift (constant), no trend
      • \(H_0: \theta = 0\) and \(H_1: \theta < 0\), where \(\theta = \phi - 1\)
      • tau1 is essentially the \(t\)-statistic of the estimate of \(\theta\)
      • but we have to compare to a different distribution (the DF-distribution)
    • phi1: joint test statistic of Dickey-Fuller test with drift (lecture 5, slides 38f)
      • estimates the model \(\Delta x_t = a + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t\); drift, no trend
      • \(H_0: \theta = a = 0\) and \(H_1: \neg H_0\)
      • phi1 is an \(F\)-statistic on the estimates of \(\theta\) and \(a\)
      • again with special critical values
    • tau2: test statistic of Dickey-Fuller test with drift
      • same model as for phi1, but now test only for AR coefficient
      • \(H_0: \theta = 0\), \(H_1: \theta < 0\)
    • phi3: joint test statistic of Dickey-Fuller test with drift and trend (lecture 5 slides 36ff)
      • estimates the model \(\Delta x_t = a + b t + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t\); drift and trend
      • \(H_0: \theta = b = 0\) and \(H_1: \neg H_0\)$
      • phi3 is an \(F\)-statistic on the estimates of \(\theta\) and \(b\)
    • tau3: test statistic of Dickey-Fuller test with drift and trend
      • same model as phi3, but only test the AR coefficient
      • \(H_0: \theta = 0\), \(H_1: \theta < 0\)
  • interpretation
    • AR1
      • the \(\operatorname{AR}(1)\) coefficient \(\phi\)
      • close to 1 for all model specifications
    • lags
      • number of lagged differences included in the ADF tests, selected by information criteria
    • tau1:
      • greater than all critical values (smaller in absolute value)
      • can not reject Null of unit root in the no-drift, no-trend model
    • phi1:
      • smaller than all critical values (also in absolute value)
      • can not reject the Null of unit root and no drift in the drift, no-trend model
    • tau2:
      • greater than all critical values (smaller in absolute value)
      • can not reject the Null of unit root in the drift, no-trend model
    • phi3:
      • smaller than all critical values (also in absolute value)
      • can not reject the Null of unit root and no trend in the unrestricted model
    • tau3:
      • greater than all critical values (smaller in absolute value)
      • can not reject the null of unit root in the unrestricted model
  • that’s a lot, but we could never reject the Null of unit root in any of the models
  • we can be pretty confident that the model is non-stationary
  • for completeness test the first differenced series
ADF.test(dm1)
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    -0.79   -0.78   -0.78 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3    -12.94 -3.99 -3.43 -3.13
## phi3     83.77  8.43  6.49  5.47
## tau2    -12.95 -3.46 -2.88 -2.57
## phi1     83.84  6.52  4.63  3.81
## tau1    -12.99 -2.58 -1.95 -1.62
  • the estimated \(\phi\)s are all far from 1
  • the test statistics are all far beyond the critical values
    • strong rejection of the unit root Null in the first-differenced series
    • further support of the order of integration being \(d = 1\)
c)

Identify, estimate and contol at least one ARIMA-model (not the model selected by auto.arima()).

  • we know from b) that \(d = 1\), so now find some way of determining \(p\) and \(q\)
  • visual inspection is the first method of choice
    • look at both ACF and PACF of the differenced series
# split the plot into 1 row and 2 columns
par(mfrow = c(1, 2))

acf(dm1)
pacf(dm1)

  • pretty large negative autocorrelation with 1 lag, and decaying partial autocorrelation
    • hints at \(\operatorname{MA} (1)\), or \(\operatorname{ARIMA} (0, 1, 1)\) here
  • let’s estimate some models
model_011 <- arima(m1, order = c(0, 1, 1))
model_111 <- arima(m1, c(1, 1, 1))
model_012 <- arima(m1, c(0, 1, 2))
  • look at coefficients
    • load lmtest package for easy overview
coeftest(model_011)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.565614   0.063691 -8.8806 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(model_111)
## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.16328    0.14278 -1.1435 0.2528158    
## ma1 -0.44407    0.13244 -3.3529 0.0007996 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(model_012)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.604621   0.076843 -7.8682 3.597e-15 ***
## ma2  0.090762   0.084707  1.0715     0.284    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • \(\operatorname{MA} (1)\) term is the only significant coefficient in all models
    • indicative, but no “real” criterion
d)

Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or model outliers in the residuals?

  • now look at the models’ residuals
    • Ljung-Box test to test for autocorrelation
    • Jarque-Bera test to test for normality (if the residuals follow a Normal/Gaussian distribution)
Autocorrelation
  • start with autocorrelation, Ljung-Box test of the residuals
    • load portes package
    • portes::LjungBox() works with results of arima()
    • check class of the result, and the help file of the function
class(model_011)
## [1] "Arima"
?LjungBox
  • for the obj argument it lists class Arima objects as inputs
# rule of thumb lag order
q <- floor(0.75 * nobs(model_011)^(1/3))

LjungBox(model_011, lags = 1:q)
##  lags statistic df   p-value
##     1 0.3492421  0 0.0000000
##     2 1.2981118  1 0.2545584
##     3 1.3456665  2 0.5102608
##     4 1.5377723  3 0.6735803
  • results are the same as
LjungBox(resid(model_011), lags = 1:q, order = 1)
##  lags statistic df   p-value
##     1 0.3492421  0 0.0000000
##     2 1.2981118  1 0.2545584
##     3 1.3456665  2 0.5102608
##     4 1.5377723  3 0.6735803
  • high \(p\)-values, can not reject the null of no autocorrelation
  • test the other models as well
LjungBox(model_111, lags = 1:q)
##  lags   statistic df   p-value
##     1 0.000568582  0        NA
##     2 0.001191449  0 0.0000000
##     3 0.001273873  1 0.9715285
##     4 0.071279074  2 0.9649881
LjungBox(model_012, lags = 1:q)
##  lags   statistic df   p-value
##     1 0.003186927  0        NA
##     2 0.020036528  0 0.0000000
##     3 0.043943566  1 0.8339586
##     4 0.086467415  2 0.9576875
  • again, for relevant lags \(p\)-values are high
  • all models fare well with regards to autocorrelation in the residuals
Normality
  • another metric is normality of the residuals
  • in an ideal world, residuals follow a Normal/Gaussian distribution
    • often made technical assumption to derive results of different models and estimators
    • typically not satisfied in reality
  • first test is visual again:
    • Q-Q plot, compare quantiles of empirical distribution of residuals to theoretical counterparts
    • identify the range of values of outliers
# split the plot into 1 row and 3 columns
par(mfrow = c(1, 3))

qqnorm(resid(model_011), main = "Residuals ARIMA(0, 1, 1)")
qqline(resid(model_011))

qqnorm(resid(model_111), main = "Residuals ARIMA(1, 1, 1)")
qqline(resid(model_111))

qqnorm(resid(model_012), main = "Residuals ARIMA(0, 1, 2)")
qqline(resid(model_012))

  • residuals of all models “pretty normal” around the zero mean, but more extreme both for very positive and very negative values
  • all models look very similar, difficult to judge
  • formal test is Jarque-Bera
    • implemented in tseries packages
    • the Null is normality of the residuals
jarque.bera.test(resid(model_011))
## 
##  Jarque Bera Test
## 
## data:  resid(model_011)
## X-squared = 4.5068, df = 2, p-value = 0.105
  • \(p\)-value large, so we can not reject the Null of normality
jarque.bera.test(resid(model_111))
## 
##  Jarque Bera Test
## 
## data:  resid(model_111)
## X-squared = 4.8522, df = 2, p-value = 0.08838
jarque.bera.test(resid(model_012))
## 
##  Jarque Bera Test
## 
## data:  resid(model_012)
## X-squared = 5.0357, df = 2, p-value = 0.08063
  • \(p\)-values smaller, at the 10% significance level we do reject normality of the residuals for these models
    • indicative of the first model being “more correctly” specified
Controlling for Outliers
  • ad-hoc procedure to add dummy variables for outliers in the original timeseries
  • can only be done ex-post (we have to know the outliers)
  • interpretation can be difficult, how do we know they are really outliers?
  • Jarque-Bera test is sensitive to outliers though, so let’s try
  • approach is to control for the time of the minimum and maximum of the residuals
# ARIMA(0, 1, 1)
res_011 <- resid(model_011)

# dummy series, FALSE at all times, except for the when the minimum is attained
min_011 <- res_011 == min(res_011)
max_011 <- res_011 == max(res_011)

# cbind() to combine both vectors into a 2-column matrix
minmax_011 <- cbind(min_011, max_011)

# ARIMA(1, 1, 1)
res_111 <- resid(model_111)

min_111 <- res_111 == min(res_111)
max_111 <- res_111 == max(res_111)

minmax_111 <- cbind(min_111, max_111)

# ARIMA(0, 1, 2)
res_012 <- resid(model_012)

min_012 <- res_012 == min(res_012)
max_012 <- res_012 == max(res_012)

minmax_012 <- cbind(min_012, max_012)
  • estimate models again, including these regressors
model_011o <- arima(
  m1,
  order = c(0, 1, 1),
  xreg = minmax_011
  )

model_111o <- arima(
  m1,
  order = c(1, 1, 1),
  xreg = minmax_111
  )

model_012o <- arima(
  m1,
  order = c(0, 1, 2),
  xreg = minmax_012
  )
  • and run the Jarque-Bera tests again
jarque.bera.test(resid(model_011o))
## 
##  Jarque Bera Test
## 
## data:  resid(model_011o)
## X-squared = 0.56017, df = 2, p-value = 0.7557
jarque.bera.test(resid(model_111o))
## 
##  Jarque Bera Test
## 
## data:  resid(model_111o)
## X-squared = 0.39482, df = 2, p-value = 0.8209
jarque.bera.test(resid(model_012o))
## 
##  Jarque Bera Test
## 
## data:  resid(model_012o)
## X-squared = 0.41489, df = 2, p-value = 0.8127
  • \(p\)-values are much larger now, can not reject the Null of normality for any model
  • looking at how the test statistic of the Jarque-Bera is calculated link
    • estimated kurtosis (the fourth moment) squared
    • this number gets relatively huge for the outliers, removing them drastically decreases the value of the test statistic
e)

Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?

  • auto.arima() selected \(\operatorname{ARIMA} (0, 1, 1) (1, 0, 0)_4\)
  • let’s run the tests
    • there is a bug in the LjungBox() function that prevents it from working with output from models estimated with the forecast package
LjungBox(
  resid(model_auto),
  lags = 1:q,
  order = length(coef(model_auto))
  )
##  lags statistic df   p-value
##     1 0.2916394  0        NA
##     2 1.2594192  0 0.0000000
##     3 1.3322025  1 0.2484138
##     4 1.3322176  2 0.5137036
jarque.bera.test(resid(model_auto))
## 
##  Jarque Bera Test
## 
## data:  resid(model_auto)
## X-squared = 4.9381, df = 2, p-value = 0.08466
  • performance comparable to the other models, slightly worse than the non-seasonal \(\operatorname{ARIMA} (0, 1, 1)\)
f)

Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.

  • information criteria as a more formal model selection
  • allows easy comparison of nested models (which all our models are to each other)
  • AIC and BIC just two criteria with slightly different properties
ic_rank <- data.frame(
  model = c("auto", "011", "111", "012"),
  aic = c(AIC(model_auto), AIC(model_011), AIC(model_111), AIC(model_012)),
  bic = c(BIC(model_auto), BIC(model_011), BIC(model_111), BIC(model_012))
)

ic_rank[order(ic_rank$aic), ]
##   model      aic      bic
## 2   011 592.1781 598.3656
## 3   111 592.9723 602.2536
## 4   012 593.0403 602.3216
## 1  auto 593.9788 603.2601
  • \(\operatorname{ARIMA} (0, 1, 1)\) is the best by both AIC and BIC
  • why is does auto.arima() not return the best model?
  • devil is in the detail (literally)
    • check help page of auto.arima(), Details section:
      • “The default arguments are designed for rapid estimation of models for many time series. If you are analysing just one time series, and can afford to take some more time, it is recommended that you set stepwise=FALSE and approximation=FALSE.”
      • so let’s try that
      • the function is cutting some corners to produce results more quickly
model_auto_full <- auto.arima(
  m1,
  stepwise = FALSE,
  approximation = FALSE
  )

summary(model_auto_full)
## Series: m1 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.5656
## s.e.   0.0637
## 
## sigma^2 = 2.169:  log likelihood = -294.09
## AIC=592.18   AICc=592.25   BIC=598.37
## 
## Training set error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0008987197 1.463818 1.134902 -35.13617 61.64933 0.7408974
##                     ACF1
## Training set -0.04572789
  • note that it now takes a few seconds to estimate
    • the function now brute-force estimates all models for \(p = q = 0, 1, ..., 5\); \(d = 0, 1, 2\); \(P = Q = 0, 1, 2\); and \(D = 0, 1\) and compares them
    • that’s \(6 \times 6 \times 3 \times 3 \times 2 = 648\) models
  • result is \(\operatorname{ARIMA} (0, 1, 1)\), nice!
g)

Write a commented script that performs the analysis and documents the results.

  • write our interpretation as comments in the code and run it
R CMD BATCH --vanilla "exercise-3.R" "output-3.txt"

  1. ↩︎