Stockholm University, VT2022

Computer Labs 5 and 6: Exercises

Exercise 4

In package Ecdat there is the data set Macrodat, used in exercise 2 above, where variable lhur is the unemployment rate and punew is a quarterly inflation index.

  1. Define a data set for the unemployment rate and the yearly inflation rate.
  2. Determine the order of integration for the two variables using an ADF-test.
  3. Determine whether the variables are cointegrated or not using the Engle-Granger ADF-test. Hint: Estimate S&W 16.24 (in 3. ed; 17.24 in 4. ed).
  4. Estimate a VAR model where the lag order is determined by AIC.
  5. Determine whether any of the variables Granger-causes the other. Hint: One does, the other not.
  6. Plot all four possible impulse response functions using the orthogonal innovations.
  7. Produce a (one) plot, a so-called fan chart, of the forecasts for ten years ahead for the variable that is Granger-caused. The title of the plot should read “Forecasts for the XXX rate.”, where XXX is replaced with the variable’s name and the label on the vertical axis should be a “%”.

Solution

a)

Define a data set for the unemployment rate and the yearly inflation rate.

  • we are pros at loading data by now
  • looking at ?Ecdat::Macrodat, punew is the CPI
    • we have to calculate the inflation rate from the index
  • there are different concepts of “yearly inflation rate”:
    • annualized quarter-on-quarter inflation rate
      • “annual inflation rate, if every quarter has this quarter’s inflation rate”
      • takes care of compounding
    • inflation rate over same quarter last year
      • simpler, but different interpretation
    • let us compare
  • but first, plot the CPI
library(TS)
library(dynlm)
library(vars)
library(forecast)

data(Macrodat, package = "Ecdat")

cpi <- Macrodat[, "punew"]

plot(cpi)

Quarter-on-quarter inflation
  • calculate inflation rate quarter-on-quarter (QoQ), i.e. percentage change to last quarter
infl_qoq <- diff(log(cpi))
Sidenote

Why diff(log())?

It’s a computationally efficient method to approximate percentage changes.

\[ \Delta \log(x_t) = \log(x_t) - \log(x_{t - 1}) = \log \left(\frac{x_t}{x_{t - 1}} \right) = \log \left( 1 + \frac{x_t - x_{t - 1}}{x_{t - 1}} \right) \approx \frac{x_t - x_{t - 1}}{x_{t - 1}} \]

The approximation is better, the smaller the percentage change.

Annualizing
  • note that infl_qoq is in decimal scale of percentage (\(1\% \widehat{=} 0.01\))
  • what is then today’s price level \(P_t\) given \(P_{t - 1}\) and \(\pi_t\)?

\[ P_t = P_{t - 1} + \pi_t P_{t - 1} = (1 + \pi_t) P_{t - 1} \]

  • where \(\pi_t\) has the same scale as infl_qoq
  • consider \(t\) as quarters, then what is the yearly inflation rate?

\[ P_t = (1 + \pi_t) P_{t - 1} = (1 + \pi_t) (1 + \pi_{t - 1}) P_{t - 2} = (1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) P_{t - 3} = (1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) (1 + \pi_{t - 3}) P_{t - 4} \]

  • we have compounding of the inflation rates
  • the annualized inflation rate answers: What is the yearly inflation rate, if we have the same rate every quarter?
  • so suppose here \(\pi_t = \pi_{t - 1} = \pi_{t - 2} = \pi_{t - 2} = \pi_{t - 3}\), then

\[ P_t =(1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) (1 + \pi_{t - 3}) P_{t - 4} = (1 + \pi_t)^4 P_{t - 4} \]

  • then the formula to annualize is
infl_ann <- (1 + infl_qoq)^4 - 1
  • we multiply by 100 to get percentage scale
infl_ann <- infl_ann * 100
infl_qoq <- infl_qoq * 100
Inflation over past year
  • alternative question: What is the inflation rate over this quarter last year?
  • takes care of seasonality
infl_y <- diff(log(cpi), lag = 4)

infl_y <- infl_y * 100
  • compare the rates
ts.plot(
    infl_qoq, infl_ann, infl_y,
    lty = 1:3,
    main = "Inflation rate comparison"
)
legend(
    "topright",
    legend = c("QoQ", "QoQ annualized", "Over last year"),
    lty = 1:3
)

  • QoQ is “too small”, it measures price changes over shorter time horizons
  • annualized and over last year similar
  • the default is to use annualized values
    • easy to compare magnitude with yearly inflation rates
  • now to main exercise, combining into data set
    • when calculating the inflation rate we lose one observation
    • ts.intersect() takes care of that, creating a multivariate time series, restricting the longer series
dat <- ts.intersect(infl = infl_ann, unemp = Macrodat[, "lhur"])

ts.plot(dat, lty = 1:2)
legend(
  "topright", 
  legend = c("Inflation", "Unemployment"), 
  lty = 1:2
  )

b)

Determine the order of integration for the two variables using an ADF-test.

  • we are pros at this after last session
  • remember the ADF-test
    • Null hypothesis is unit root (non-stationarity)
    • again load Michael’s TS library
Inflation
ADF.test(dat[, "infl"])
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.88   0.89   0.97 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -2.72 -3.99 -3.43 -3.13
## phi3      3.88  8.43  6.49  5.47
## tau2     -2.62 -3.46 -2.88 -2.57
## phi1      3.44  6.52  4.63  3.81
## tau1     -1.31 -2.58 -1.95 -1.62
  • test statistics smaller in absolute value than all critical values
    • except for tau2 at 10% significance
    • in model with drift we can reject unit root at 10% level, but not at 5%
  • now test first difference
ADF.test(diff(dat[, "infl"]))
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    -1.2   -0.62   -0.61 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -5.45 -3.99 -3.43 -3.13
## phi3     14.84  8.43  6.49  5.47
## tau2     -5.16 -3.46 -2.88 -2.57
## phi1     13.32  6.52  4.63  3.81
## tau1     -5.18 -2.58 -1.95 -1.62
  • we reject unit root in all specifications and levels of significance
  • no rejection in original series, rejection in first difference
    • inflation is first-difference stationary, i.e. order of integration is 1
Unemployment
ADF.test(dat[, "unemp"])
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.97   0.97   1 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -2.51 -3.99 -3.43 -3.13
## phi3      3.31  8.43  6.49  5.47
## tau2     -2.58 -3.46 -2.88 -2.57
## phi1      3.34  6.52  4.63  3.81
## tau1     -0.65 -2.58 -1.95 -1.62
  • exactly the same story as for inflation, tau2 rejects at the 10% level
  • first difference
ADF.test(diff(dat[, "unemp"]))
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.42   0.43   0.43 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -5.16 -3.99 -3.43 -3.13
## phi3     13.33  8.43  6.49  5.47
## tau2     -5.12 -3.46 -2.88 -2.57
## phi1     13.12  6.52  4.63  3.81
## tau1     -5.13 -2.58 -1.95 -1.62
  • reject the Null of unit root in all specifications at 1% significance
  • first-difference stationary, order of integration is 1
c)

Determine whether the variables are cointegrated or not using the Engle-Granger ADF-test. Hint: Estimate S&W 16.24.

Cointegration

  • cointegration (lecture 7, slide 7):
    • two variables \(x_t\) and \(y_t\) are cointegrated, if
      • both have the same order of integration
      • some linear combination of them is stationary
  • the residuals of a regression of \(y_t\) on \(x_t\) are a linear combination (lecture 7, slide 12):

\[ \hat{e}_t = y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t \]

  • so we estimate the regression, and test the residuals for their order of integration
    • this is the Engle-Granger test for cointegration
Engle-Granger test
  • first condition is satisfied: in b) we show that both are integrated of order 1
  • now for the second condition
    • note that we are using dynlm() to preserve the ts class for the residuals
    • this regression is not dynamic though (no lags), so normal lm() works correctly (remember lab 3)
model_eg <- dynlm(infl ~ unemp, data = dat)

summary(model_eg)
## 
## Time series regression with "ts" data:
## Start = 1959(2), End = 2000(4)
## 
## Call:
## dynlm(formula = infl ~ unemp, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6776 -1.9511 -0.9432  1.4511 11.9018 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   2.7489     1.0150   2.708  0.00748 **
## unemp         0.2778     0.1655   1.679  0.09509 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.195 on 165 degrees of freedom
## Multiple R-squared:  0.01679,    Adjusted R-squared:  0.01083 
## F-statistic: 2.818 on 1 and 165 DF,  p-value: 0.09509
  • which way around we regress does not matter, it changes the values of the residuals, but not their time series properties
res_eg <- resid(model_eg)

plot(res_eg)

ADF.test(res_eg)
## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.87   0.87   0.87 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #
##      statistic  1pct  5pct 10pct
## tau3     -3.01 -3.99 -3.43 -3.13
## phi3      4.64  8.43  6.49  5.47
## tau2     -2.90 -3.46 -2.88 -2.57
## phi1      4.20  6.52  4.63  3.81
## tau1     -2.89 -2.58 -1.95 -1.62
  • use tau2 here, the ADF specification with intercept (drift), but no trend
  • the EG-ADF-test again has different critical values, taken from the book

Table 17.1 from Stock and Watson (4. ed, p. 665)

10% 5% 1%
One regressor −3.12 −3.41 −3.96
Two regressors −3.52 −3.80 −4.36
Three regressors −3.84 −4.16 −4.73
Four regressors −4.20 −4.49 −5.07
  • we want to test at the 5% level, and have a single cointegration relationship, so the critical value is -3.41
  • test statistic is -2.90 > -3.41, we can not reject the Null of unit root
    • the residuals of the regression are non-stationary
    • the linear combination of inflation and unemployment that minimizes the residuals does not produce stationary residuals
    • this means inflation and unemployment are not cointegrated
Sidenote

You commented in the lecture that the EG-ADF test is kind of tedious to do. I have not found a good implementation of it to run as a single function. This test does however have some undesirable theoretical properties as well, which is why the Johansen procedure is typically used to test for cointegration. We cover this in the last exercise in a later lab.

d)

Estimate a VAR model where the lag order is determined by AIC.

  • \(\operatorname{VAR}(p)\) model is conceptually similar to \(\operatorname{AR}(p)\), but now with a vector of variables, that can all influence each other over time
  • implemented in the vars package
  • the default for the VAR() function is \(p = 1\)
  • VARselect() to determine optimal lag order
  • since the variables are not cointegrated, estimate the VAR in first differences, not levels!
ddat <- diff(dat)

VARselect(ddat, type = "const")
## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      4      2      4 
## 
## $criteria
##                1          2          3          4         5          6
## AIC(n) -1.883240 -2.0364174 -2.0474062 -2.1208223 -2.106279 -2.0618174
## HQ(n)  -1.835597 -1.9570123 -1.9362390 -1.9778931 -1.931587 -1.8553642
## SC(n)  -1.765938 -1.8409138 -1.7737011 -1.7689158 -1.676171 -1.5535081
## FPE(n)  0.152098  0.1305011  0.1290848  0.1199637  0.121747  0.1273212
##                 7          8          9         10
## AIC(n) -2.0736933 -2.0713295 -2.0391405 -2.0009913
## HQ(n)  -1.8354780 -1.8013522 -1.7374011 -1.6674899
## SC(n)  -1.4871825 -1.4066173 -1.2962268 -1.1798762
## FPE(n)  0.1258705  0.1262373  0.1304572  0.1356465
  • some variation in the criteria as always, we go with AIC, so \(p = 4\)
model_var <- VAR(ddat, p = 4, type = "const")

summary(model_var)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: infl, unemp 
## Deterministic variables: const 
## Sample size: 162 
## Log Likelihood: -275.682 
## Roots of the characteristic polynomial:
## 0.7735 0.7735 0.7724 0.7724 0.6014 0.6014 0.5258 0.5258
## Call:
## VAR(y = ddat, p = 4, type = "const")
## 
## 
## Estimation results for equation infl: 
## ===================================== 
## infl = infl.l1 + unemp.l1 + infl.l2 + unemp.l2 + infl.l3 + unemp.l3 + infl.l4 + unemp.l4 + const 
## 
##          Estimate Std. Error t value Pr(>|t|)    
## infl.l1  -0.35181    0.07839  -4.488 1.41e-05 ***
## unemp.l1 -2.78566    0.45227  -6.159 6.16e-09 ***
## infl.l2  -0.24230    0.08320  -2.912  0.00412 ** 
## unemp.l2  1.21188    0.57085   2.123  0.03537 *  
## infl.l3   0.05618    0.08063   0.697  0.48696    
## unemp.l3  0.53037    0.57256   0.926  0.35574    
## infl.l4  -0.07154    0.07573  -0.945  0.34629    
## unemp.l4 -1.58340    0.47966  -3.301  0.00120 ** 
## const    -0.00260    0.10988  -0.024  0.98115    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.398 on 153 degrees of freedom
## Multiple R-Squared: 0.3953,  Adjusted R-squared: 0.3637 
## F-statistic:  12.5 on 8 and 153 DF,  p-value: 1.025e-13 
## 
## 
## Estimation results for equation unemp: 
## ====================================== 
## unemp = infl.l1 + unemp.l1 + infl.l2 + unemp.l2 + infl.l3 + unemp.l3 + infl.l4 + unemp.l4 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## infl.l1   0.042168   0.013675   3.084  0.00243 ** 
## unemp.l1  0.728097   0.078896   9.229  < 2e-16 ***
## infl.l2   0.003257   0.014513   0.224  0.82274    
## unemp.l2  0.001224   0.099582   0.012  0.99021    
## infl.l3   0.017220   0.014065   1.224  0.22270    
## unemp.l3 -0.080847   0.099879  -0.809  0.41952    
## infl.l4   0.021240   0.013211   1.608  0.10994    
## unemp.l4 -0.028794   0.083675  -0.344  0.73123    
## const    -0.004894   0.019168  -0.255  0.79883    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.2438 on 153 degrees of freedom
## Multiple R-Squared: 0.4778,  Adjusted R-squared: 0.4505 
## F-statistic:  17.5 on 8 and 153 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##          infl    unemp
## infl   1.9539 -0.02480
## unemp -0.0248  0.05946
## 
## Correlation matrix of residuals:
##           infl    unemp
## infl   1.00000 -0.07276
## unemp -0.07276  1.00000
  • first, note the roots of the characteristic polynomial:
    • 8 roots: 4 lags, and a 2x2 matrix for every lag (each has 2 roots)
  • results presented in the two-linear-equations style, not the matrix representation
  • let’s plot the residuals
ts.plot(resid(model_var), main = "Residuals of VAR(4)", lty = 1:2)
legend(
  "bottomright",
  legend = c("Inflation", "Unemployment"),
  lty = 1:2
)

  • do not look too good!
Autocorrelation of the residuals
  • the vars package has some functions for diagnostics, e.g. the multivariate Ljung-Box test for autocorrelation in the residuals
serial.test(model_var)
## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 50.826, df = 48, p-value = 0.3629
  • can not reject the Null of no autocorrelation
Normality of the residuals
  • multivariate Jarque-Bera test for normality of residuals
normality.test(model_var)
## $JB
## 
##  JB-Test (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 34.938, df = 4, p-value = 4.784e-07
## 
## 
## $Skewness
## 
##  Skewness only (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 14.303, df = 2, p-value = 0.0007838
## 
## 
## $Kurtosis
## 
##  Kurtosis only (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 20.635, df = 2, p-value = 3.305e-05
  • tiny \(p\)-value: reject the Null of normality
e)

Determine whether any of the variables Granger-causes the other. Hint: One does, the other not.

  • Granger-causality is a very particular thing, not necessarily related to causality on a more philosophical level
  • the question is essentially: is \(x_t\) useful to forecast \(y_t\)? if so, \(x_t\) is said to Granger-cause \(y_t\)
  • implemented in vars package
    • note that the residuals did not show autocorrelation, but non-normality, so we use a robust variance estimator
# from inflation to unemployment
cause_i_u <- causality(
  model_var, 
  cause = "infl", 
  vcov. = sandwich::vcovHC(model_var)
  )

# from unemployment to inflation
cause_u_i <- causality(
  model_var, 
  cause = "unemp", 
  vcov. = sandwich::vcovHC(model_var)
  )
  • causality() runs two different tests, we only want the Granger-causality
cause_i_u$Granger
## 
##  Granger causality H0: infl do not Granger-cause unemp
## 
## data:  VAR object model_var
## F-Test = 1.8569, df1 = 4, df2 = 306, p-value = 0.1179
  • we can not reject the Null that inflation does not Granger-cause unemployment
  • assume Null: inflation does not Granger-cause unemployment
cause_u_i$Granger
## 
##  Granger causality H0: unemp do not Granger-cause infl
## 
## data:  VAR object model_var
## F-Test = 9.7371, df1 = 4, df2 = 306, p-value = 1.999e-07
  • we reject the Null that unemployment does not Granger-cause inflation
  • work with the alternative: unemployment Granger-causes inflation
f)

Plot all four possible impulse response functions using the orthogonal innovations.

  • impulse response functions are a pretty and commonly used result of VAR estimation
  • easy to interpret
  • the default for irf() is to compute all IRFs (all variables on all other, 2x2 in our case) with orthogonal shocks
  • plot() takes objects of class varirf and creates nice plots
var_irf <- irf(model_var, ortho = TRUE)

plot(var_irf)

  • sanity check of the Granger-causality results:
    • when inflation is shocked, essentially no effect of unemployment
    • when unemployment is shocked, inflation goes down
g)

Produce a (one) plot, a so-called fan chart, of the forecasts for ten years ahead for the variable that is Granger-caused. The title of the plot should read “Forecasts for the XXX rate.”, where XXX is replaced with the variable’s name and the label on the vertical axis should be a “%”.

  • we can use the VAR to make forecasts by iteratively computing 1-step forecasts
  • intuitively in a VAR(1):
    • use today’s data to make a forecast for tomorrow, then use tomorrow’s data to make a forecast for the day after, and so on
    • the \(n\)-step ahead forecast is then just the parameter matrix to the power of \(n\) multiplied with today’s values
    • the variance estimates of the residuals are then used to calculate confidence bands
  • easily programmed: predict() produces a forecast for n.ahead time steps
  • fanchart() from the vars package produces exactly the plot we want
hmax <- 10 * 4

var_forecast <- predict(model_var, n.ahead = hmax)

fanchart(
  var_forecast,
  names = "infl",
  main = "Forecast of the inflation rate",
  ylab = "%"
  )

  • alternative way using the forecast package
var_forecast2 <- forecast(
  model_var, 
  h = hmax,
  fan = TRUE
  )

plot(
  var_forecast2$forecast$infl,
  main = "Forecast of the inflation rate",
  ylab = "%"
  )


  1. ↩︎