Stockholm University, VT2022
In the package Ecdat there is the dataset MoneyUS, which contains M1 in the US 1954:Q1–1994:Q4 in the variable m.
The tasks are:

1. Use auto.arima() with default settings and make a note about the model order.
2. Which is the order of integration according to ACF and KPSS?
3. Identify, estimate and control at least one ARIMA model (not the model selected by auto.arima()).
4. Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or modelling outliers in the residuals?
5. Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?
6. Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.
7. Write a commented script that performs the analysis and documents the results.

Use auto.arima() with default settings and make a note about the model order.
auto.arima() is in the forecast package.
library(forecast)
library(urca)
library(TS)
library(lmtest)
library(portes)
library(tseries)
data(MoneyUS, package = "Ecdat")
m1 <- MoneyUS[, "m"]
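A quick sanity check that R stores the series as quarterly (the seasonal period [4] reported by auto.arima() below confirms it):

frequency(m1)  # should return 4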
How do we find the Ecdat documentation? ? calls the help function, and Ecdat:: specifies in which namespace to look: MoneyUS is not in R's global namespace (it only enters the search path once data(...) is called; it lives in Ecdat's namespace), and MoneyUS is what we want info on. Hence:
?Ecdat::MoneyUS
plot(m1)
model_auto <- auto.arima(m1)
summary(model_auto)
## Series: m1
## ARIMA(0,1,1)(1,0,0)[4]
##
## Coefficients:
## ma1 sar1
## -0.5683 0.0355
## s.e. 0.0650 0.0795
##
## sigma^2 = 2.18: log likelihood = -293.99
## AIC=593.98 AICc=594.13 BIC=603.26
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.001053475 1.462914 1.13101 -35.92468 62.39479 0.7383566
## ACF1
## Training set -0.04178698
What does ARIMA(0,1,1)(1,0,0)[4] mean? The general notation is ARIMA(p, d, q)(P, D, Q)[s]. Here (0, 1, 1) means no AR term, one regular difference and one MA term, while (1, 0, 0)[4] means one seasonal AR term with seasonal period 4 (quarterly data), no seasonal differencing and no seasonal MA term.
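In lag-operator form (note that R writes the MA polynomial with a plus sign), the fitted model is

\[ (1 - \Phi_1 L^4)(1 - L)\, x_t = (1 + \theta_1 L)\, \varepsilon_t, \qquad \hat{\Phi}_1 = 0.0355, \quad \hat{\theta}_1 = -0.5683, \]

that is, \( \Delta x_t = 0.0355\, \Delta x_{t-4} + \varepsilon_t - 0.5683\, \varepsilon_{t-1} \).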
Which is the order of integration according to ACF and KPSS?
auto.arima() estimates the order of integration as \(d = 1\); let us check with acf().
acf(m1)
Next, acf() of the first difference of the series:
dm1 <- diff(m1)
acf(dm1)
plot(dm1)
type = "mu"), and/or linear trend
(type = "tau")urca packageur.kpss(dm1, type = "mu") |> summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0924
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
|> “pipes” the output of the left-hand side into the function given on the right.
ur.kpss(m1, type = "mu") |> summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.7744
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
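The pipe is only syntax: the same call can be written by nesting the functions instead.

summary(ur.kpss(m1, type = "mu"))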
type = "mu", because there
seems to be no linear trend in either series\[ \Delta x_t = a + b t + \theta x_{t - 1} + \sum_{i = 2}^k \delta_i \Delta x_{t - i} + u_t \]
TS packageADF.test(m1)
## # # # # # # # # # # # # # # # # # # # #
## # # # # # # # # # # # # # # # # # # # #
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # #
## type: trend drift none
## AR1: 0.9 0.9 0.97
## lags: 12 12 12
## # # # # # # # # # # # # # # # # # # # #
## statistic 1pct 5pct 10pct
## tau3 -1.88 -3.99 -3.43 -3.13
## phi3 1.88 8.43 6.49 5.47
## tau2 -1.95 -3.46 -2.88 -2.57
## phi1 1.93 6.52 4.63 3.81
## tau1 -1.15 -2.58 -1.95 -1.62
tau1: test statistic of the basic Dickey-Fuller test (lecture 5, slides 25ff); tau1 is essentially the \(t\)-statistic of the estimate of \(\theta\).
phi1: joint test statistic of the Dickey-Fuller test with drift (lecture 5, slides 38f); phi1 is an \(F\)-statistic on the estimates of \(\theta\) and \(a\).
tau2: test statistic of the Dickey-Fuller test with drift; as phi1, but testing only the AR coefficient.
phi3: joint test statistic of the Dickey-Fuller test with drift and trend (lecture 5, slides 36ff); phi3 is an \(F\)-statistic on the estimates of \(\theta\) and \(b\).
tau3: test statistic of the Dickey-Fuller test with drift and trend; as phi3, but testing only the AR coefficient.
AR1: the estimated first-order autoregressive coefficient in each specification; values close to 1 point towards a unit root.
lags: the number of lags included in each regression.

Comparing each statistic with its 5% critical value for the level series:
tau1: \(-1.15 > -1.95\), the unit-root null is not rejected.
phi1: \(1.93 < 4.63\), not rejected.
tau2: \(-1.95 > -2.88\), not rejected.
phi3: \(1.88 < 6.49\), not rejected.
tau3: \(-1.88 > -3.43\), not rejected.
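To see concretely where tau1 comes from, here is a minimal sketch of the "none"-type Dickey-Fuller regression estimated by hand with lm(). It assumes, for illustration, that ADF.test() counts its 12 lags as lagged differences in this way, so the t value should be close to, though not necessarily identical to, tau1 above.

k <- 12
dx <- diff(m1)                    # Delta x_t
emb <- embed(dx, k + 1)           # column 1: dx_t; columns 2..(k+1): lags 1..k
y <- emb[, 1]
dlags <- emb[, -1]
x_lag1 <- m1[(k + 1):length(dx)]  # x_{t-1}, aligned with y
fit <- lm(y ~ 0 + x_lag1 + dlags) # "none": no intercept, no trend
summary(fit)$coefficients["x_lag1", "t value"]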
The same test on the differenced series:
ADF.test(dm1)
## # # # # # # # # # # # # # # # # # # # #
## # # # # # # # # # # # # # # # # # # # #
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # #
## type: trend drift none
## AR1: -0.79 -0.78 -0.78
## lags: 12 12 12
## # # # # # # # # # # # # # # # # # # # #
## statistic 1pct 5pct 10pct
## tau3 -12.94 -3.99 -3.43 -3.13
## phi3 83.77 8.43 6.49 5.47
## tau2 -12.95 -3.46 -2.88 -2.57
## phi1 83.84 6.52 4.63 3.81
## tau1 -12.99 -2.58 -1.95 -1.62
Now every statistic is far beyond its 1% critical value, so the unit-root null is clearly rejected for the differenced series: m1 is I(1), in agreement with the KPSS results and with auto.arima()'s choice of \(d = 1\).

Identify, estimate and control at least one ARIMA model (not the model selected by auto.arima()).
# split the plot into 1 row and 2 columns
par(mfrow = c(1, 2))
acf(dm1)   # the ACF suggests the MA order
pacf(dm1)  # the PACF suggests the AR order
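# reset the plotting layout before continuing
par(mfrow = c(1, 1))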
model_011 <- arima(m1, order = c(0, 1, 1))
model_111 <- arima(m1, order = c(1, 1, 1))
model_012 <- arima(m1, order = c(0, 1, 2))
coeftest() in the lmtest package gives an easy overview of the estimates.
coeftest(model_011)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.565614 0.063691 -8.8806 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(model_111)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.16328 0.14278 -1.1435 0.2528158
## ma1 -0.44407 0.13244 -3.3529 0.0007996 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(model_012)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.604621 0.076843 -7.8682 3.597e-15 ***
## ma2 0.090762 0.084707 1.0715 0.284
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Only in the ARIMA(0,1,1) model are all coefficients significant; the additional ar1 and ma2 terms in the other two models are not.

Use the Ljung-Box and Jarque-Bera tests to motivate which model is the best. Is it possible to improve the models you first selected by changing the model order or modelling outliers in the residuals?
LjungBox() is in the portes package. How do we know that portes::LjungBox() works with results of arima()? Check the class of the result, and the help file of the function.
class(model_011)
## [1] "Arima"
?LjungBox
Under the obj argument, the help file lists objects of class Arima as valid inputs.
# rule-of-thumb lag order
q <- floor(0.75 * nobs(model_011)^(1/3))
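# here q = 4: nobs is about 163 (one observation is lost to differencing),
# and floor(0.75 * 163^(1/3)) = floor(4.1) = 4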
LjungBox(model_011, lags = 1:q)
## lags statistic df p-value
## 1 0.3492421 0 0.0000000
## 2 1.2981118 1 0.2545584
## 3 1.3456665 2 0.5102608
## 4 1.5377723 3 0.6735803
# equivalently: test the residuals directly, telling LjungBox() that one
# parameter (ma1) was estimated
LjungBox(resid(model_011), lags = 1:q, order = 1)
## lags statistic df p-value
## 1 0.3492421 0 0.0000000
## 2 1.2981118 1 0.2545584
## 3 1.3456665 2 0.5102608
## 4 1.5377723 3 0.6735803
LjungBox(model_111, lags = 1:q)
## lags statistic df p-value
## 1 0.000568582 0 NA
## 2 0.001191449 0 0.0000000
## 3 0.001273873 1 0.9715285
## 4 0.071279074 2 0.9649881
LjungBox(model_012, lags = 1:q)
## lags statistic df p-value
## 1 0.003186927 0 NA
## 2 0.020036528 0 0.0000000
## 3 0.043943566 1 0.8339586
## 4 0.086467415 2 0.9576875
None of the Ljung-Box tests rejects the null of no residual autocorrelation, so all three models pass. Next we inspect the normality of the residuals, first visually with Q-Q plots.
# split the plot into 1 row and 3 columns
par(mfrow = c(1, 3))
qqnorm(resid(model_011), main = "Residuals ARIMA(0, 1, 1)")
qqline(resid(model_011))
qqnorm(resid(model_111), main = "Residuals ARIMA(1, 1, 1)")
qqline(resid(model_111))
qqnorm(resid(model_012), main = "Residuals ARIMA(0, 1, 2)")
qqline(resid(model_012))
jarque.bera.test() is in the tseries package.
jarque.bera.test(resid(model_011))
##
## Jarque Bera Test
##
## data: resid(model_011)
## X-squared = 4.5068, df = 2, p-value = 0.105
jarque.bera.test(resid(model_111))
##
## Jarque Bera Test
##
## data: resid(model_111)
## X-squared = 4.8522, df = 2, p-value = 0.08838
jarque.bera.test(resid(model_012))
##
## Jarque Bera Test
##
## data: resid(model_012)
## X-squared = 5.0357, df = 2, p-value = 0.08063
The Jarque-Bera p-values are borderline (0.08–0.11), so it is worth trying to model the most extreme residuals with dummy variables.
# ARIMA(0, 1, 1)
res_011 <- resid(model_011)
# dummy series: FALSE at all times, except when the minimum is attained
min_011 <- res_011 == min(res_011)
max_011 <- res_011 == max(res_011)
# cbind() to combine both vectors into a 2-column matrix
minmax_011 <- cbind(min_011, max_011)
# ARIMA(1, 1, 1)
res_111 <- resid(model_111)
min_111 <- res_111 == min(res_111)
max_111 <- res_111 == max(res_111)
minmax_111 <- cbind(min_111, max_111)
# ARIMA(0, 1, 2)
res_012 <- resid(model_012)
min_012 <- res_012 == min(res_012)
max_012 <- res_012 == max(res_012)
minmax_012 <- cbind(min_012, max_012)
model_011o <- arima(
m1,
order = c(0, 1, 1),
xreg = minmax_011
)
model_111o <- arima(
m1,
order = c(1, 1, 1),
xreg = minmax_111
)
model_012o <- arima(
m1,
order = c(0, 1, 2),
xreg = minmax_012
)
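It is also worth checking whether the dummy coefficients themselves are significant; coeftest() from above works here as well, for example:

coeftest(model_011o)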
jarque.bera.test(resid(model_011o))
##
## Jarque Bera Test
##
## data: resid(model_011o)
## X-squared = 0.56017, df = 2, p-value = 0.7557
jarque.bera.test(resid(model_111o))
##
## Jarque Bera Test
##
## data: resid(model_111o)
## X-squared = 0.39482, df = 2, p-value = 0.8209
jarque.bera.test(resid(model_012o))
##
## Jarque Bera Test
##
## data: resid(model_012o)
## X-squared = 0.41489, df = 2, p-value = 0.8127
With the dummies included, the Jarque-Bera statistics drop sharply and normality of the residuals is no longer rejected for any of the three models.

Now analyse the model selected by auto.arima() with default settings. Which model is selected? Use the Ljung-Box and Jarque-Bera tests to analyse the model. How does it compare to the model(s) you selected?
auto.arima() selected \(\operatorname{ARIMA} (0, 1, 1) (1, 0, 0)_4\). There is a limitation in the LjungBox() function that prevents it from working directly with output from models estimated with the forecast package, so we test the residuals instead and pass the number of estimated parameters via order.
LjungBox(
resid(model_auto),
lags = 1:q,
order = length(coef(model_auto))
)
## lags statistic df p-value
## 1 0.2916394 0 NA
## 2 1.2594192 0 0.0000000
## 3 1.3322025 1 0.2484138
## 4 1.3322176 2 0.5137036
jarque.bera.test(resid(model_auto))
##
## Jarque Bera Test
##
## data: resid(model_auto)
## X-squared = 4.9381, df = 2, p-value = 0.08466
The diagnostics are much the same as for our own models: no sign of residual autocorrelation, and borderline normality (Jarque-Bera p = 0.085). Note also that the seasonal AR coefficient (0.0355, s.e. 0.0795) is not significant.

Rank the models (your model(s) and the model selected by auto.arima()) with AIC and BIC. Which model should we select? Comment on any difference compared to the choice of auto.arima(). If there is a difference, explain why.
ic_rank <- data.frame(
model = c("auto", "011", "111", "012"),
aic = c(AIC(model_auto), AIC(model_011), AIC(model_111), AIC(model_012)),
bic = c(BIC(model_auto), BIC(model_011), BIC(model_111), BIC(model_012))
)
ic_rank[order(ic_rank$aic), ]
## model aic bic
## 2 011 592.1781 598.3656
## 3 111 592.9723 602.2536
## 4 012 593.0403 602.3216
## 1 auto 593.9788 603.2601
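Ordering by BIC instead gives the same ranking here, as the bic column already shows:

ic_rank[order(ic_rank$bic), ]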
Both criteria rank ARIMA(0,1,1) first, so that is the model we should select. Why did auto.arima() not return the best model? By default it uses a fast stepwise search with approximations, which can miss the best model; the help file of auto.arima() (Details section) recommends, when analysing a single series, setting stepwise=FALSE and approximation=FALSE.
model_auto_full <- auto.arima(
m1,
stepwise = FALSE,
approximation = FALSE
)
summary(model_auto_full)
## Series: m1
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.5656
## s.e. 0.0637
##
## sigma^2 = 2.169: log likelihood = -294.09
## AIC=592.18 AICc=592.25 BIC=598.37
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.0008987197 1.463818 1.134902 -35.13617 61.64933 0.7408974
## ACF1
## Training set -0.04572789
With the full search, auto.arima() also selects ARIMA(0,1,1), the model with the lowest AIC and BIC.

Write a commented script that performs the analysis and documents the results.
Such a script can be run non-interactively from the command line with
R CMD BATCH --vanilla "exercise-3.R" "output-3.txt"
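A minimal sketch of how such a script could be organised (one possible structure; the file name is taken from the command above):

# exercise-3.R -- M1 in the US, 1954:Q1-1994:Q4
library(forecast)  # auto.arima()
library(urca)      # ur.kpss()
library(tseries)   # jarque.bera.test()
library(portes)    # LjungBox()

# 1. Data
data(MoneyUS, package = "Ecdat")
m1 <- MoneyUS[, "m"]

# 2. Order of integration: KPSS on levels and first differences
ur.kpss(m1, type = "mu") |> summary()
ur.kpss(diff(m1), type = "mu") |> summary()

# 3. Estimation and residual diagnostics
model <- arima(m1, order = c(0, 1, 1))
LjungBox(model, lags = 1:4)
jarque.bera.test(resid(model))

# 4. Model selection
AIC(model)
BIC(model)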