Econometrics 3b

Computer Lab 4: Exercises

Exercise 3

In package Ecdat there is the dataset MoneyUS, M1 in the US 1954:Q1–1994Q4 in the variable m.

Use auto.arima() with default settings and make a note about the model order.
Which is the order of integration according to ACF and KPSS?
Identify, estimate and contol at least one ARIMA-model (not the model selected by auto.arima()).
Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or model outliers in the residuals?
Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?
Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.
Write a commented script that performs the analysis and documents the results.

Solution

a)

Use auto.arima() with default settings and make a note about the model order.

auto.arima() is in the forecast package
load the package and the data, and look at it

library(forecast)
library(urca)
library(TS)
library(lmtest)
library(portes)
library(tseries)

data(MoneyUS, package = "Ecdat")
m1 <- MoneyUS[, "m"]

information on the timeseries are in the Ecdat documentation
look at it like this
- ? to call the help function
- Ecdat:: to specify in which namespace to look
  - by default MoneyUS is not in R’s namespace
  - only when loading it with data(...) it enters
  - it is defined inside Ecdat’s namespace
- MoneyUS is what we want info on

?Ecdat::MoneyUS

plot the series

plot(m1)

looks non-stationary (“long-term” movements), maybe some seasonal component (higher-frequency ups and downs)
now estimate the model

model_auto <- auto.arima(m1)

summary(model_auto)

## Series: m1 
## ARIMA(0,1,1)(1,0,0)[4] 
## 
## Coefficients:
##           ma1    sar1
##       -0.5683  0.0355
## s.e.   0.0650  0.0795
## 
## sigma^2 = 2.18:  log likelihood = -293.99
## AIC=593.98   AICc=594.13   BIC=603.26
## 
## Training set error measures:
##                       ME     RMSE     MAE       MPE     MAPE      MASE
## Training set 0.001053475 1.462914 1.13101 -35.92468 62.39479 0.7383566
##                     ACF1
## Training set -0.04178698

what does ARIMA(0,1,1)(1,0,0)[4] mean?
- lecture 6 slide 39
- information about model selection
- notation on the slides and in documentation is ARIMA(p, d, q)(P, D, Q)[s]
- first parentheses (0, 1, 1) means
  - $p = 0$ order of the autoregressive term
  - $d = 1$ order of integration is 1 (becomes stationary in first difference)
  - $q = 1$ order of the moving-average term
- next parentheses, seasonal structure (1, 0, 0)[4]
  - $P = 1$ order of seasonal autoregressive term
  - $D = 0$ order of integration of the seasonal structure
  - $Q = 0$ order of seasonal moving-average term
  - $s = 4$ frequency of seasonal structure, quarterly in this case

b)

Which is the order of integration according to ACF and KPSS?

auto.arima() estimates order of integration $d = 1$, let us check
- order of integration is the number of times we have to difference to get a stationary series
sidenote:
- stationary process are by definition $d = 0$, they look “flat”
- unit-root processes ($x_t = x_{t - 1} + e_t$ with $\mathbb{E}[e_t] = 0$) are $d = 1$, they look increasing, or decreasing (or both over periods of time)
- higher-order integration rarely analysed, but they typically look like exponential increases/decreases
test for stationarity of the original series

ACF

first, visually using acf()

acf(m1)

very high autocorrelation over a long horizon in the series, indicative of non-stationarity
compare to acf() of first difference of the series

dm1 <- diff(m1)

acf(dm1)

large negative autocorrelation at one lag, but essentially zero for longer horizons
indicative of first difference stationarity
let’s plot the first difference

plot(dm1)

looks stationary, maybe some residual seasonal component

KPSS test

test more formally with KPSS test
- the Null hypothesis is stationarity
- allows for constant (type = "mu"), and/or linear trend (type = "tau")
- implemented in the urca package

ur.kpss(dm1, type = "mu") |> summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.0924 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

|> “pipes” the output of the left-hand side into the function given on the right
interpretation:
- test-statistic is smaller than all critical values, we can not reject the Null
- assume the series is stationary
compare with test result of the original series

ur.kpss(m1, type = "mu") |> summary()

## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.7744 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

test-statistic slightly bigger than critical value at 1% level
- reject the Null of stationarity
in either case we choose type = "mu", because there seems to be no linear trend in either series

ADF test

another possibility is the augmented Dickey-Fuller test (ADF-test; lecture 5, slide 46)
estimates the model

\[ \Delta x_t = a + b t + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t \]

parameter of interest is $\theta = \phi - 1$
more robust to autocorrelation by including past differences as regressors
Michael wrote a function in his TS package
this function estimates 3 versions of the model above:
- $a = b = 0$
- $b = 0$
- full model
presents test statistics and critical values

ADF.test(m1)

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.9   0.9   0.97 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -1.88 -3.99 -3.43 -3.13
## phi3      1.88  8.43  6.49  5.47
## tau2     -1.95 -3.46 -2.88 -2.57
## phi1      1.93  6.52  4.63  3.81
## tau1     -1.15 -2.58 -1.95 -1.62

explanation:
- tau1: test statistic of basic Dickey-Fuller test (lecture 5, slides 25ff)
  - estimates the model $\Delta x_t = \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t$; no drift (constant), no trend
  - $H_0: \theta = 0$ and $H_1: \theta < 0$, where $\theta = \phi - 1$
  - tau1 is essentially the $t$-statistic of the estimate of $\theta$
  - but we have to compare to a different distribution (the DF-distribution)
- phi1: joint test statistic of Dickey-Fuller test with drift (lecture 5, slides 38f)
  - estimates the model $\Delta x_t = a + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t$; drift, no trend
  - $H_0: \theta = a = 0$ and $H_1: \neg H_0$
  - phi1 is an $F$-statistic on the estimates of $\theta$ and $a$
  - again with special critical values
- tau2: test statistic of Dickey-Fuller test with drift
  - same model as for phi1, but now test only for AR coefficient
  - $H_0: \theta = 0$, $H_1: \theta < 0$
- phi3: joint test statistic of Dickey-Fuller test with drift and trend (lecture 5 slides 36ff)
  - estimates the model $\Delta x_t = a + b t + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t$; drift and trend
  - $H_0: \theta = b = 0$ and $H_1: \neg H_0$$
  - phi3 is an $F$-statistic on the estimates of $\theta$ and $b$
- tau3: test statistic of Dickey-Fuller test with drift and trend
  - same model as phi3, but only test the AR coefficient
  - $H_0: \theta = 0$, $H_1: \theta < 0$
interpretation
- AR1
  - the $\operatorname{AR}(1)$ coefficient $\phi$
  - close to 1 for all model specifications
- lags
  - number of lagged differences included in the ADF tests, selected by information criteria
- tau1:
  - greater than all critical values (smaller in absolute value)
  - can not reject Null of unit root in the no-drift, no-trend model
- phi1:
  - smaller than all critical values (also in absolute value)
  - can not reject the Null of unit root and no drift in the drift, no-trend model
- tau2:
  - greater than all critical values (smaller in absolute value)
  - can not reject the Null of unit root in the drift, no-trend model
- phi3:
  - smaller than all critical values (also in absolute value)
  - can not reject the Null of unit root and no trend in the unrestricted model
- tau3:
  - greater than all critical values (smaller in absolute value)
  - can not reject the null of unit root in the unrestricted model
that’s a lot, but we could never reject the Null of unit root in any of the models
we can be pretty confident that the model is non-stationary
for completeness test the first differenced series

ADF.test(dm1)

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    -0.79   -0.78   -0.78 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3    -12.94 -3.99 -3.43 -3.13
## phi3     83.77  8.43  6.49  5.47
## tau2    -12.95 -3.46 -2.88 -2.57
## phi1     83.84  6.52  4.63  3.81
## tau1    -12.99 -2.58 -1.95 -1.62

the estimated $\phi$s are all far from 1
the test statistics are all far beyond the critical values
- strong rejection of the unit root Null in the first-differenced series
- further support of the order of integration being $d = 1$

c)

Identify, estimate and contol at least one ARIMA-model (not the model selected by auto.arima()).

we know from b) that $d = 1$, so now find some way of determining $p$ and $q$
visual inspection is the first method of choice
- look at both ACF and PACF of the differenced series

# split the plot into 1 row and 2 columns
par(mfrow = c(1, 2))

acf(dm1)
pacf(dm1)

pretty large negative autocorrelation with 1 lag, and decaying partial autocorrelation
- hints at $\operatorname{MA} (1)$, or $\operatorname{ARIMA} (0, 1, 1)$ here
let’s estimate some models

model_011 <- arima(m1, order = c(0, 1, 1))
model_111 <- arima(m1, c(1, 1, 1))
model_012 <- arima(m1, c(0, 1, 2))

look at coefficients
- load lmtest package for easy overview

coeftest(model_011)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.565614   0.063691 -8.8806 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(model_111)

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1 -0.16328    0.14278 -1.1435 0.2528158    
## ma1 -0.44407    0.13244 -3.3529 0.0007996 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(model_012)

## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -0.604621   0.076843 -7.8682 3.597e-15 ***
## ma2  0.090762   0.084707  1.0715     0.284    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$\operatorname{MA} (1)$ term is the only significant coefficient in all models
- indicative, but no “real” criterion

d)

Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or model outliers in the residuals?

now look at the models’ residuals
- Ljung-Box test to test for autocorrelation
- Jarque-Bera test to test for normality (if the residuals follow a Normal/Gaussian distribution)

Autocorrelation

start with autocorrelation, Ljung-Box test of the residuals
- load portes package
- portes::LjungBox() works with results of arima()
- check class of the result, and the help file of the function

class(model_011)

## [1] "Arima"

?LjungBox

for the obj argument it lists class Arima objects as inputs

# rule of thumb lag order
q <- floor(0.75 * nobs(model_011)^(1/3))

LjungBox(model_011, lags = 1:q)

##  lags statistic df   p-value
##     1 0.3492421  0 0.0000000
##     2 1.2981118  1 0.2545584
##     3 1.3456665  2 0.5102608
##     4 1.5377723  3 0.6735803

results are the same as

LjungBox(resid(model_011), lags = 1:q, order = 1)

##  lags statistic df   p-value
##     1 0.3492421  0 0.0000000
##     2 1.2981118  1 0.2545584
##     3 1.3456665  2 0.5102608
##     4 1.5377723  3 0.6735803

high $p$-values, can not reject the null of no autocorrelation
test the other models as well

LjungBox(model_111, lags = 1:q)

##  lags   statistic df   p-value
##     1 0.000568582  0        NA
##     2 0.001191449  0 0.0000000
##     3 0.001273873  1 0.9715285
##     4 0.071279074  2 0.9649881

LjungBox(model_012, lags = 1:q)

##  lags   statistic df   p-value
##     1 0.003186927  0        NA
##     2 0.020036528  0 0.0000000
##     3 0.043943566  1 0.8339586
##     4 0.086467415  2 0.9576875

again, for relevant lags $p$-values are high
all models fare well with regards to autocorrelation in the residuals

Normality

another metric is normality of the residuals
in an ideal world, residuals follow a Normal/Gaussian distribution
- often made technical assumption to derive results of different models and estimators
- typically not satisfied in reality
first test is visual again:
- Q-Q plot, compare quantiles of empirical distribution of residuals to theoretical counterparts
- identify the range of values of outliers

# split the plot into 1 row and 3 columns
par(mfrow = c(1, 3))

qqnorm(resid(model_011), main = "Residuals ARIMA(0, 1, 1)")
qqline(resid(model_011))

qqnorm(resid(model_111), main = "Residuals ARIMA(1, 1, 1)")
qqline(resid(model_111))

qqnorm(resid(model_012), main = "Residuals ARIMA(0, 1, 2)")
qqline(resid(model_012))

residuals of all models “pretty normal” around the zero mean, but more extreme both for very positive and very negative values
all models look very similar, difficult to judge
formal test is Jarque-Bera
- implemented in tseries packages
- the Null is normality of the residuals

jarque.bera.test(resid(model_011))

## 
##  Jarque Bera Test
## 
## data:  resid(model_011)
## X-squared = 4.5068, df = 2, p-value = 0.105

$p$-value large, so we can not reject the Null of normality

jarque.bera.test(resid(model_111))

## 
##  Jarque Bera Test
## 
## data:  resid(model_111)
## X-squared = 4.8522, df = 2, p-value = 0.08838

jarque.bera.test(resid(model_012))

## 
##  Jarque Bera Test
## 
## data:  resid(model_012)
## X-squared = 5.0357, df = 2, p-value = 0.08063

$p$-values smaller, at the 10% significance level we do reject normality of the residuals for these models
- indicative of the first model being “more correctly” specified

Controlling for Outliers

ad-hoc procedure to add dummy variables for outliers in the original timeseries
can only be done ex-post (we have to know the outliers)
interpretation can be difficult, how do we know they are really outliers?
Jarque-Bera test is sensitive to outliers though, so let’s try
approach is to control for the time of the minimum and maximum of the residuals

# ARIMA(0, 1, 1)
res_011 <- resid(model_011)

# dummy series, FALSE at all times, except for the when the minimum is attained
min_011 <- res_011 == min(res_011)
max_011 <- res_011 == max(res_011)

# cbind() to combine both vectors into a 2-column matrix
minmax_011 <- cbind(min_011, max_011)

# ARIMA(1, 1, 1)
res_111 <- resid(model_111)

min_111 <- res_111 == min(res_111)
max_111 <- res_111 == max(res_111)

minmax_111 <- cbind(min_111, max_111)

# ARIMA(0, 1, 2)
res_012 <- resid(model_012)

min_012 <- res_012 == min(res_012)
max_012 <- res_012 == max(res_012)

minmax_012 <- cbind(min_012, max_012)

estimate models again, including these regressors

model_011o <- arima(
  m1,
  order = c(0, 1, 1),
  xreg = minmax_011
  )

model_111o <- arima(
  m1,
  order = c(1, 1, 1),
  xreg = minmax_111
  )

model_012o <- arima(
  m1,
  order = c(0, 1, 2),
  xreg = minmax_012
  )

and run the Jarque-Bera tests again

jarque.bera.test(resid(model_011o))

## 
##  Jarque Bera Test
## 
## data:  resid(model_011o)
## X-squared = 0.56017, df = 2, p-value = 0.7557

jarque.bera.test(resid(model_111o))

## 
##  Jarque Bera Test
## 
## data:  resid(model_111o)
## X-squared = 0.39482, df = 2, p-value = 0.8209

jarque.bera.test(resid(model_012o))

## 
##  Jarque Bera Test
## 
## data:  resid(model_012o)
## X-squared = 0.41489, df = 2, p-value = 0.8127

$p$-values are much larger now, can not reject the Null of normality for any model
looking at how the test statistic of the Jarque-Bera is calculated link
- estimated kurtosis (the fourth moment) squared
- this number gets relatively huge for the outliers, removing them drastically decreases the value of the test statistic

e)

Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?

auto.arima() selected $\operatorname{ARIMA} (0, 1, 1) (1, 0, 0)_4$
let’s run the tests
- there is a bug in the LjungBox() function that prevents it from working with output from models estimated with the forecast package

LjungBox(
  resid(model_auto),
  lags = 1:q,
  order = length(coef(model_auto))
  )

##  lags statistic df   p-value
##     1 0.2916394  0        NA
##     2 1.2594192  0 0.0000000
##     3 1.3322025  1 0.2484138
##     4 1.3322176  2 0.5137036

jarque.bera.test(resid(model_auto))

## 
##  Jarque Bera Test
## 
## data:  resid(model_auto)
## X-squared = 4.9381, df = 2, p-value = 0.08466

performance comparable to the other models, slightly worse than the non-seasonal $\operatorname{ARIMA} (0, 1, 1)$

f)

Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.

information criteria as a more formal model selection
allows easy comparison of nested models (which all our models are to each other)
AIC and BIC just two criteria with slightly different properties

ic_rank <- data.frame(
  model = c("auto", "011", "111", "012"),
  aic = c(AIC(model_auto), AIC(model_011), AIC(model_111), AIC(model_012)),
  bic = c(BIC(model_auto), BIC(model_011), BIC(model_111), BIC(model_012))
)

ic_rank[order(ic_rank$aic), ]

##   model      aic      bic
## 2   011 592.1781 598.3656
## 3   111 592.9723 602.2536
## 4   012 593.0403 602.3216
## 1  auto 593.9788 603.2601

$\operatorname{ARIMA} (0, 1, 1)$ is the best by both AIC and BIC
why is does auto.arima() not return the best model?
devil is in the detail (literally)
- check help page of auto.arima(), Details section:
  - “The default arguments are designed for rapid estimation of models for many time series. If you are analysing just one time series, and can afford to take some more time, it is recommended that you set stepwise=FALSE and approximation=FALSE.”
  - so let’s try that
  - the function is cutting some corners to produce results more quickly

model_auto_full <- auto.arima(
  m1,
  stepwise = FALSE,
  approximation = FALSE
  )

summary(model_auto_full)

## Series: m1 
## ARIMA(0,1,1) 
## 
## Coefficients:
##           ma1
##       -0.5656
## s.e.   0.0637
## 
## sigma^2 = 2.169:  log likelihood = -294.09
## AIC=592.18   AICc=592.25   BIC=598.37
## 
## Training set error measures:
##                        ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 0.0008987197 1.463818 1.134902 -35.13617 61.64933 0.7408974
##                     ACF1
## Training set -0.04572789

note that it now takes a few seconds to estimate
- the function now brute-force estimates all models for $p = q = 0, 1, ..., 5$; $d = 0, 1, 2$; $P = Q = 0, 1, 2$; and $D = 0, 1$ and compares them
- that’s $6 \times 6 \times 3 \times 3 \times 2 = 648$ models
result is $\operatorname{ARIMA} (0, 1, 1)$, nice!

g)

Write a commented script that performs the analysis and documents the results.

write our interpretation as comments in the code and run it

R CMD BATCH --vanilla "exercise-3.R" "output-3.txt"

Econometrics 3b

Thore Petersen¹

06.04.2022

Computer Lab 4: Exercises

Exercise 3

Solution

a)

b)

ACF

KPSS test

ADF test

c)

d)

Autocorrelation

Normality

Controlling for Outliers

e)

f)

g)

Econometrics 3b

Thore Petersen1

06.04.2022

Computer Lab 4: Exercises

Exercise 3

Solution

a)

b)

ACF

KPSS test

ADF test

c)

d)

Autocorrelation

Normality

Controlling for Outliers

e)

f)

g)

Thore Petersen¹