Econometrics 3b

Computer Labs 5 and 6: Exercises

Exercise 4

In package Ecdat there is the data set Macrodat, used in exercise 2 above, where variable lhur is the unemployment rate and punew is a quarterly inflation index.

Define a data set for the unemployment rate and the yearly inflation rate.
Determine the order of integration for the two variables using an ADF-test.
Determine whether the variables are cointegrated or not using the Engle-Granger ADF-test. Hint: Estimate S&W 16.24 (in 3. ed; 17.24 in 4. ed).
Estimate a VAR model where the lag order is determined by AIC.
Determine whether any of the variables Granger-causes the other. Hint: One does, the other not.
Plot all four possible impulse response functions using the orthogonal innovations.
Produce a (one) plot, a so-called fan chart, of the forecasts for ten years ahead for the variable that is Granger-caused. The title of the plot should read “Forecasts for the XXX rate.”, where XXX is replaced with the variable’s name and the label on the vertical axis should be a “%”.

Solution

a)

Define a data set for the unemployment rate and the yearly inflation rate.

we are pros at loading data by now
looking at ?Ecdat::Macrodat, punew is the CPI
- we have to calculate the inflation rate from the index
there are different concepts of “yearly inflation rate”:
- annualized quarter-on-quarter inflation rate
  - “annual inflation rate, if every quarter has this quarter’s inflation rate”
  - takes care of compounding
- inflation rate over same quarter last year
  - simpler, but different interpretation
- let us compare
but first, plot the CPI

library(TS)
library(dynlm)
library(vars)
library(forecast)

data(Macrodat, package = "Ecdat")

cpi <- Macrodat[, "punew"]

plot(cpi)

Quarter-on-quarter inflation

calculate inflation rate quarter-on-quarter (QoQ), i.e. percentage change to last quarter

infl_qoq <- diff(log(cpi))

Sidenote

Why diff(log())?

It’s a computationally efficient method to approximate percentage changes.

\[ \Delta \log(x_t) = \log(x_t) - \log(x_{t - 1}) = \log \left(\frac{x_t}{x_{t - 1}} \right) = \log \left( 1 + \frac{x_t - x_{t - 1}}{x_{t - 1}} \right) \approx \frac{x_t - x_{t - 1}}{x_{t - 1}} \]

The approximation is better, the smaller the percentage change.

Annualizing

note that infl_qoq is in decimal scale of percentage (\(1\% \widehat{=} 0.01\))
what is then today’s price level \(P_t\) given \(P_{t - 1}\) and \(\pi_t\)?

\[ P_t = P_{t - 1} + \pi_t P_{t - 1} = (1 + \pi_t) P_{t - 1} \]

where \(\pi_t\) has the same scale as infl_qoq
consider \(t\) as quarters, then what is the yearly inflation rate?

\[ P_t = (1 + \pi_t) P_{t - 1} = (1 + \pi_t) (1 + \pi_{t - 1}) P_{t - 2} = (1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) P_{t - 3} = (1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) (1 + \pi_{t - 3}) P_{t - 4} \]

we have compounding of the inflation rates
the annualized inflation rate answers: What is the yearly inflation rate, if we have the same rate every quarter?
so suppose here \(\pi_t = \pi_{t - 1} = \pi_{t - 2} = \pi_{t - 2} = \pi_{t - 3}\), then

\[ P_t =(1 + \pi_t) (1 + \pi_{t - 1}) (1 + \pi_{t - 2}) (1 + \pi_{t - 3}) P_{t - 4} = (1 + \pi_t)^4 P_{t - 4} \]

then the formula to annualize is

infl_ann <- (1 + infl_qoq)^4 - 1

we multiply by 100 to get percentage scale

infl_ann <- infl_ann * 100
infl_qoq <- infl_qoq * 100

Inflation over past year

alternative question: What is the inflation rate over this quarter last year?
takes care of seasonality

infl_y <- diff(log(cpi), lag = 4)

infl_y <- infl_y * 100

compare the rates

ts.plot(
    infl_qoq, infl_ann, infl_y,
    lty = 1:3,
    main = "Inflation rate comparison"
)
legend(
    "topright",
    legend = c("QoQ", "QoQ annualized", "Over last year"),
    lty = 1:3
)

QoQ is “too small”, it measures price changes over shorter time horizons
annualized and over last year similar
the default is to use annualized values
- easy to compare magnitude with yearly inflation rates
now to main exercise, combining into data set
- when calculating the inflation rate we lose one observation
- ts.intersect() takes care of that, creating a multivariate time series, restricting the longer series

dat <- ts.intersect(infl = infl_ann, unemp = Macrodat[, "lhur"])

ts.plot(dat, lty = 1:2)
legend(
  "topright", 
  legend = c("Inflation", "Unemployment"), 
  lty = 1:2
  )

b)

Determine the order of integration for the two variables using an ADF-test.

we are pros at this after last session
remember the ADF-test
- Null hypothesis is unit root (non-stationarity)
- again load Michael’s TS library

Inflation

ADF.test(dat[, "infl"])

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.88   0.89   0.97 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -2.72 -3.99 -3.43 -3.13
## phi3      3.88  8.43  6.49  5.47
## tau2     -2.62 -3.46 -2.88 -2.57
## phi1      3.44  6.52  4.63  3.81
## tau1     -1.31 -2.58 -1.95 -1.62

test statistics smaller in absolute value than all critical values
- except for tau2 at 10% significance
- in model with drift we can reject unit root at 10% level, but not at 5%
now test first difference

ADF.test(diff(dat[, "infl"]))

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    -1.2   -0.62   -0.61 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -5.45 -3.99 -3.43 -3.13
## phi3     14.84  8.43  6.49  5.47
## tau2     -5.16 -3.46 -2.88 -2.57
## phi1     13.32  6.52  4.63  3.81
## tau1     -5.18 -2.58 -1.95 -1.62

we reject unit root in all specifications and levels of significance
no rejection in original series, rejection in first difference
- inflation is first-difference stationary, i.e. order of integration is 1

Unemployment

ADF.test(dat[, "unemp"])

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.97   0.97   1 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -2.51 -3.99 -3.43 -3.13
## phi3      3.31  8.43  6.49  5.47
## tau2     -2.58 -3.46 -2.88 -2.57
## phi1      3.34  6.52  4.63  3.81
## tau1     -0.65 -2.58 -1.95 -1.62

exactly the same story as for inflation, tau2 rejects at the 10% level
first difference

ADF.test(diff(dat[, "unemp"]))

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.42   0.43   0.43 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -5.16 -3.99 -3.43 -3.13
## phi3     13.33  8.43  6.49  5.47
## tau2     -5.12 -3.46 -2.88 -2.57
## phi1     13.12  6.52  4.63  3.81
## tau1     -5.13 -2.58 -1.95 -1.62

reject the Null of unit root in all specifications at 1% significance
first-difference stationary, order of integration is 1

c)

Determine whether the variables are cointegrated or not using the Engle-Granger ADF-test. Hint: Estimate S&W 16.24.

Cointegration

cointegration (lecture 7, slide 7):
- two variables \(x_t\) and \(y_t\) are cointegrated, if
  - both have the same order of integration
  - some linear combination of them is stationary
the residuals of a regression of \(y_t\) on \(x_t\) are a linear combination (lecture 7, slide 12):

\[ \hat{e}_t = y_t - \hat{\beta}_0 - \hat{\beta}_1 x_t \]

so we estimate the regression, and test the residuals for their order of integration
- this is the Engle-Granger test for cointegration

Engle-Granger test

first condition is satisfied: in b) we show that both are integrated of order 1
now for the second condition
- note that we are using dynlm() to preserve the ts class for the residuals
- this regression is not dynamic though (no lags), so normal lm() works correctly (remember lab 3)

model_eg <- dynlm(infl ~ unemp, data = dat)

summary(model_eg)

## 
## Time series regression with "ts" data:
## Start = 1959(2), End = 2000(4)
## 
## Call:
## dynlm(formula = infl ~ unemp, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6776 -1.9511 -0.9432  1.4511 11.9018 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   2.7489     1.0150   2.708  0.00748 **
## unemp         0.2778     0.1655   1.679  0.09509 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.195 on 165 degrees of freedom
## Multiple R-squared:  0.01679,    Adjusted R-squared:  0.01083 
## F-statistic: 2.818 on 1 and 165 DF,  p-value: 0.09509

which way around we regress does not matter, it changes the values of the residuals, but not their time series properties

res_eg <- resid(model_eg)

plot(res_eg)

ADF.test(res_eg)

## # # # # # # # # # # # # # # # # # # # # 
## # # # # # # # # # # # # # # # # # # # # 
## Augmented Dickey--Fuller test
## # # # # # # # # # # # # # # # # # # # # 
## type:   trend  drift  none
## AR1:    0.87   0.87   0.87 
## lags:   12     12     12 
## # # # # # # # # # # # # # # # # # # # #

##      statistic  1pct  5pct 10pct
## tau3     -3.01 -3.99 -3.43 -3.13
## phi3      4.64  8.43  6.49  5.47
## tau2     -2.90 -3.46 -2.88 -2.57
## phi1      4.20  6.52  4.63  3.81
## tau1     -2.89 -2.58 -1.95 -1.62

use tau2 here, the ADF specification with intercept (drift), but no trend
the EG-ADF-test again has different critical values, taken from the book

Table 17.1 from Stock and Watson (4. ed, p. 665)

	10%	5%	1%
One regressor	−3.12	−3.41	−3.96
Two regressors	−3.52	−3.80	−4.36
Three regressors	−3.84	−4.16	−4.73
Four regressors	−4.20	−4.49	−5.07

we want to test at the 5% level, and have a single cointegration relationship, so the critical value is -3.41
test statistic is -2.90 > -3.41, we can not reject the Null of unit root
- the residuals of the regression are non-stationary
- the linear combination of inflation and unemployment that minimizes the residuals does not produce stationary residuals
- this means inflation and unemployment are not cointegrated

Sidenote

You commented in the lecture that the EG-ADF test is kind of tedious to do. I have not found a good implementation of it to run as a single function. This test does however have some undesirable theoretical properties as well, which is why the Johansen procedure is typically used to test for cointegration. We cover this in the last exercise in a later lab.

d)

Estimate a VAR model where the lag order is determined by AIC.

\(\operatorname{VAR}(p)\) model is conceptually similar to \(\operatorname{AR}(p)\), but now with a vector of variables, that can all influence each other over time
implemented in the vars package
the default for the VAR() function is \(p = 1\)
VARselect() to determine optimal lag order
since the variables are not cointegrated, estimate the VAR in first differences, not levels!

ddat <- diff(dat)

VARselect(ddat, type = "const")

## $selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      4      2      4 
## 
## $criteria
##                1          2          3          4         5          6
## AIC(n) -1.883240 -2.0364174 -2.0474062 -2.1208223 -2.106279 -2.0618174
## HQ(n)  -1.835597 -1.9570123 -1.9362390 -1.9778931 -1.931587 -1.8553642
## SC(n)  -1.765938 -1.8409138 -1.7737011 -1.7689158 -1.676171 -1.5535081
## FPE(n)  0.152098  0.1305011  0.1290848  0.1199637  0.121747  0.1273212
##                 7          8          9         10
## AIC(n) -2.0736933 -2.0713295 -2.0391405 -2.0009913
## HQ(n)  -1.8354780 -1.8013522 -1.7374011 -1.6674899
## SC(n)  -1.4871825 -1.4066173 -1.2962268 -1.1798762
## FPE(n)  0.1258705  0.1262373  0.1304572  0.1356465

some variation in the criteria as always, we go with AIC, so \(p = 4\)

model_var <- VAR(ddat, p = 4, type = "const")

summary(model_var)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: infl, unemp 
## Deterministic variables: const 
## Sample size: 162 
## Log Likelihood: -275.682 
## Roots of the characteristic polynomial:
## 0.7735 0.7735 0.7724 0.7724 0.6014 0.6014 0.5258 0.5258
## Call:
## VAR(y = ddat, p = 4, type = "const")
## 
## 
## Estimation results for equation infl: 
## ===================================== 
## infl = infl.l1 + unemp.l1 + infl.l2 + unemp.l2 + infl.l3 + unemp.l3 + infl.l4 + unemp.l4 + const 
## 
##          Estimate Std. Error t value Pr(>|t|)    
## infl.l1  -0.35181    0.07839  -4.488 1.41e-05 ***
## unemp.l1 -2.78566    0.45227  -6.159 6.16e-09 ***
## infl.l2  -0.24230    0.08320  -2.912  0.00412 ** 
## unemp.l2  1.21188    0.57085   2.123  0.03537 *  
## infl.l3   0.05618    0.08063   0.697  0.48696    
## unemp.l3  0.53037    0.57256   0.926  0.35574    
## infl.l4  -0.07154    0.07573  -0.945  0.34629    
## unemp.l4 -1.58340    0.47966  -3.301  0.00120 ** 
## const    -0.00260    0.10988  -0.024  0.98115    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.398 on 153 degrees of freedom
## Multiple R-Squared: 0.3953,  Adjusted R-squared: 0.3637 
## F-statistic:  12.5 on 8 and 153 DF,  p-value: 1.025e-13 
## 
## 
## Estimation results for equation unemp: 
## ====================================== 
## unemp = infl.l1 + unemp.l1 + infl.l2 + unemp.l2 + infl.l3 + unemp.l3 + infl.l4 + unemp.l4 + const 
## 
##           Estimate Std. Error t value Pr(>|t|)    
## infl.l1   0.042168   0.013675   3.084  0.00243 ** 
## unemp.l1  0.728097   0.078896   9.229  < 2e-16 ***
## infl.l2   0.003257   0.014513   0.224  0.82274    
## unemp.l2  0.001224   0.099582   0.012  0.99021    
## infl.l3   0.017220   0.014065   1.224  0.22270    
## unemp.l3 -0.080847   0.099879  -0.809  0.41952    
## infl.l4   0.021240   0.013211   1.608  0.10994    
## unemp.l4 -0.028794   0.083675  -0.344  0.73123    
## const    -0.004894   0.019168  -0.255  0.79883    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.2438 on 153 degrees of freedom
## Multiple R-Squared: 0.4778,  Adjusted R-squared: 0.4505 
## F-statistic:  17.5 on 8 and 153 DF,  p-value: < 2.2e-16 
## 
## 
## 
## Covariance matrix of residuals:
##          infl    unemp
## infl   1.9539 -0.02480
## unemp -0.0248  0.05946
## 
## Correlation matrix of residuals:
##           infl    unemp
## infl   1.00000 -0.07276
## unemp -0.07276  1.00000

first, note the roots of the characteristic polynomial:
- 8 roots: 4 lags, and a 2x2 matrix for every lag (each has 2 roots)
results presented in the two-linear-equations style, not the matrix representation
let’s plot the residuals

ts.plot(resid(model_var), main = "Residuals of VAR(4)", lty = 1:2)
legend(
  "bottomright",
  legend = c("Inflation", "Unemployment"),
  lty = 1:2
)

do not look too good!

Autocorrelation of the residuals

the vars package has some functions for diagnostics, e.g. the multivariate Ljung-Box test for autocorrelation in the residuals

serial.test(model_var)

## 
##  Portmanteau Test (asymptotic)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 50.826, df = 48, p-value = 0.3629

can not reject the Null of no autocorrelation

Normality of the residuals

multivariate Jarque-Bera test for normality of residuals

normality.test(model_var)

## $JB
## 
##  JB-Test (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 34.938, df = 4, p-value = 4.784e-07
## 
## 
## $Skewness
## 
##  Skewness only (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 14.303, df = 2, p-value = 0.0007838
## 
## 
## $Kurtosis
## 
##  Kurtosis only (multivariate)
## 
## data:  Residuals of VAR object model_var
## Chi-squared = 20.635, df = 2, p-value = 3.305e-05

tiny \(p\)-value: reject the Null of normality

e)

Determine whether any of the variables Granger-causes the other. Hint: One does, the other not.

Granger-causality is a very particular thing, not necessarily related to causality on a more philosophical level
the question is essentially: is \(x_t\) useful to forecast \(y_t\)? if so, \(x_t\) is said to Granger-cause \(y_t\)
implemented in vars package
- note that the residuals did not show autocorrelation, but non-normality, so we use a robust variance estimator

# from inflation to unemployment
cause_i_u <- causality(
  model_var, 
  cause = "infl", 
  vcov. = sandwich::vcovHC(model_var)
  )

# from unemployment to inflation
cause_u_i <- causality(
  model_var, 
  cause = "unemp", 
  vcov. = sandwich::vcovHC(model_var)
  )

causality() runs two different tests, we only want the Granger-causality

cause_i_u$Granger

## 
##  Granger causality H0: infl do not Granger-cause unemp
## 
## data:  VAR object model_var
## F-Test = 1.8569, df1 = 4, df2 = 306, p-value = 0.1179

we can not reject the Null that inflation does not Granger-cause unemployment
assume Null: inflation does not Granger-cause unemployment

cause_u_i$Granger

## 
##  Granger causality H0: unemp do not Granger-cause infl
## 
## data:  VAR object model_var
## F-Test = 9.7371, df1 = 4, df2 = 306, p-value = 1.999e-07

we reject the Null that unemployment does not Granger-cause inflation
work with the alternative: unemployment Granger-causes inflation

f)

Plot all four possible impulse response functions using the orthogonal innovations.

impulse response functions are a pretty and commonly used result of VAR estimation
easy to interpret
the default for irf() is to compute all IRFs (all variables on all other, 2x2 in our case) with orthogonal shocks
plot() takes objects of class varirf and creates nice plots

var_irf <- irf(model_var, ortho = TRUE)

plot(var_irf)

sanity check of the Granger-causality results:
- when inflation is shocked, essentially no effect of unemployment
- when unemployment is shocked, inflation goes down

g)

Produce a (one) plot, a so-called fan chart, of the forecasts for ten years ahead for the variable that is Granger-caused. The title of the plot should read “Forecasts for the XXX rate.”, where XXX is replaced with the variable’s name and the label on the vertical axis should be a “%”.

we can use the VAR to make forecasts by iteratively computing 1-step forecasts
intuitively in a VAR(1):
- use today’s data to make a forecast for tomorrow, then use tomorrow’s data to make a forecast for the day after, and so on
- the \(n\)-step ahead forecast is then just the parameter matrix to the power of \(n\) multiplied with today’s values
- the variance estimates of the residuals are then used to calculate confidence bands
easily programmed: predict() produces a forecast for n.ahead time steps
fanchart() from the vars package produces exactly the plot we want

hmax <- 10 * 4

var_forecast <- predict(model_var, n.ahead = hmax)

fanchart(
  var_forecast,
  names = "infl",
  main = "Forecast of the inflation rate",
  ylab = "%"
  )

alternative way using the forecast package

var_forecast2 <- forecast(
  model_var, 
  h = hmax,
  fan = TRUE
  )

plot(
  var_forecast2$forecast$infl,
  main = "Forecast of the inflation rate",
  ylab = "%"
  )

Econometrics 3b

Thore Petersen¹

13.04.2022/21.04.2022

Computer Labs 5 and 6: Exercises

Exercise 4

Solution

a)

Quarter-on-quarter inflation

Sidenote

Annualizing

Inflation over past year

b)

Inflation

Unemployment

c)

Engle-Granger test

Sidenote

d)

Autocorrelation of the residuals

Normality of the residuals

e)

f)

g)

Econometrics 3b

Thore Petersen1

13.04.2022/21.04.2022

Computer Labs 5 and 6: Exercises

Exercise 4

Solution

a)

Quarter-on-quarter inflation

Sidenote

Annualizing

Inflation over past year

b)

Inflation

Unemployment

c)

Engle-Granger test

Sidenote

d)

Autocorrelation of the residuals

Normality of the residuals

e)

f)

g)

Thore Petersen¹