Stockholm University, VT2022
Write a commented script that does the following:
Let us start by creating a new project for this lab.
Create two normally distributed realizations of length 100, both with mean one, but one with variance one and the other with variance two.
two normally distributed realizations of length 100 = two samples of random draws from a Normal/Gaussian distribution
computers cannot be “truly” random; they only execute whatever we ask them to
we speak of “pseudorandomness” when generating “random” numbers on a computer
any sequence or sample of such numbers seems random (e.g. satisfies some statistical properties), but is in fact deterministic
random numbers are useful for lots of modelling, and you need a lot of them (millions, if not more, to solve some models)
R’s default algorithm (Mersenne Twister) has a period of \(2^{19937} - 1\) (that’s a lot of numbers)
a nice side effect of the numbers being deterministic: models are (theoretically) perfectly reproducible
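for instance, a minimal sketch showing that re-seeding reproduces the draws exactly (the seed 123 is arbitrary):
set.seed(123)   # fix the state of the generator
a <- rnorm(3)
set.seed(123)   # same seed again
b <- rnorm(3)
identical(a, b) # TRUE: the "random" stream is deterministic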
how it works: see ?Random in R
So let’s generate the samples.
set.seed(2)   # fix the seed for reproducibility
n <- 100      # sample length
mu <- 1       # common mean
sigx <- 1     # variance of x
sigy <- 2     # variance of y
x <- rnorm(n = n, mean = mu, sd = sqrt(sigx))
y <- rnorm(n, mu, sqrt(sigy))
we use rnorm() to generate/draw the samples
note the sqrt() function inside the call: it is evaluated first, and the result is used for the sd argument
Let’s have a look at the random samples.
plot(x)
plot(y)
Looks pretty random, but both centered around 1, as expected.
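As a quick numerical check (a sketch; mean() and var() give the sample moments):
mean(x); var(x) # should be close to 1 and 1
mean(y); var(y) # should be close to 1 and 2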
Look at histograms (we overlay the theoretical density):
probability = TRUE puts relative frequencies on the \(y\)-axis
curve() takes a function and overlays it on the previous plot if add = TRUE
hist(x, probability = TRUE)
curve(dnorm(x, mean = mu, sd = sqrt(sigx)), from = -2, to = 4, add = TRUE)
hist(y, probability = TRUE)
curve(dnorm(x, mean = mu, sd = sqrt(sigy)), from = -3, to = 5, add = TRUE)
Test whether the sample means are different from zero.
t.test(x, mu = 0)
##
## One Sample t-test
##
## data: x
## t = 8.3547, df = 99, p-value = 4.148e-13
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.7390951 1.1995086
## sample estimates:
## mean of x
## 0.9693018
t = 8.3547: value of the \(t\)-statistic (\(\frac{\bar{x}}{\hat{\sigma} / \sqrt{n}}\))
df = 99: degrees of freedom of the \(t\)-distribution
p-value = 4.148e-13: probability mass of the \(t_{\nu = 99}\) distribution beyond 8.3547
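As a sketch, we can reproduce the statistic and p-value by hand:
tstat <- mean(x) / (sd(x) / sqrt(n))               # t-statistic for mu = 0
tstat                                              # matches t = 8.3547
2 * pt(abs(tstat), df = n - 1, lower.tail = FALSE) # two-sided p-value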
Now the same test for y.
t.test(y, mu = 0)
##
## One Sample t-test
##
## data: y
## t = 7.4727, df = 99, p-value = 3.175e-11
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.764817 1.317814
## sample estimates:
## mean of x
## 1.041316
Test whether the sample means are the same or not.
in R we use the same function, but now with different arguments
with a single sample and mu = 0, R “understood” we wanted to test the mean of that single sample; passing two samples instead gives a two-sample test of equal means
t.test(x, y)
##
## Welch Two Sample t-test
##
## data: x and y
## t = -0.39716, df = 191.71, p-value = 0.6917
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4296611 0.2856333
## sample estimates:
## mean of x mean of y
## 0.9693018 1.0413158
p-value = 0.6917: we cannot reject the null hypothesis that the two means are equal (which is true by construction).
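As a sketch, the Welch statistic can be reproduced by hand from the two samples:
se_diff <- sqrt(var(x) / length(x) + var(y) / length(y)) # std. error of the difference
(mean(x) - mean(y)) / se_diff                            # matches t = -0.39716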
Run the script in batch mode, so that a result file is created.
# Econometrics 3b
# Exercise 1
# parameters
set.seed(2)
n <- 100
mu <- 1
sigx <- 1
sigy <- 2
# generate samples
x <- rnorm(n = n, mean = mu, sd = sqrt(sigx))
y <- rnorm(n, mu, sqrt(sigy))
# testing
t.test(x, mu = 0)
t.test(y, mu = 0)
t.test(x, y)
save the script as exercise-1.R
R CMD BATCH is more flexible than Rscript from last time
--vanilla means “plain vanilla”, i.e. the basic R configuration (no special user-defined options etc.)
"exercise-1.R" calls our script; then, after a space, "output.txt" is the file to which everything will be printed (you can omit the quotes if there are no spaces in the names)
R CMD BATCH --vanilla "exercise-1.R" "output.txt"
The results end up in output.txt.
Write a commented script that estimates \(\Delta \text{gdpjp}_t = \phi_0 + \phi_1 \Delta
\text{gdpjp}_{t - 1} + e_t\) with OLS, where \(\text{gdpjp}\) is found in the data set
Macrodat in package Ecdat. Test estimated
residuals for autocorrelation. Run the script in batch mode so that a
result file is created.
we will need the dynlm package
Macrodat is listed (check its class below)
dynlm defines functions like d() and L() for use inside formulae
library(dynlm)
library(portes)
library(lmtest)
library(sandwich)
data(Macrodat, package = "Ecdat")
Now we have Macrodat; let’s check it out.
head() prints the first elements of any object
head(Macrodat)
## lhur punew fyff fygm3 fygt1 exruk gdpjp
## 1959 Q1 5.833333 28.99333 2.80 2.80 3.61 281.271 14.3195
## 1959 Q2 5.100000 29.04333 3.39 3.21 4.07 281.254 14.8032
## 1959 Q3 5.266667 29.19333 3.76 4.04 5.00 280.370 15.5764
## 1959 Q4 5.600000 29.37000 3.99 4.49 5.14 279.845 15.4222
## 1960 Q1 5.133333 29.39667 3.84 3.31 4.02 280.586 16.5126
## 1960 Q2 5.233333 29.57333 3.32 2.46 3.36 280.268 16.5398
class(Macrodat)
## [1] "mts" "ts"
mts and ts: a multivariate timeseries, with ts objects as its columns
colnames(Macrodat)
## [1] "lhur" "punew" "fyff" "fygm3" "fygt1" "exruk" "gdpjp"
Now to specify the model.
formula1 <- d(gdpjp) ~ L(d(gdpjp))
model1 <- dynlm(formula1, data = Macrodat)
Remember lists? model1 is now a named list containing all the results of the regression.
summary(model1)
##
## Time series regression with "ts" data:
## Start = 1959(3), End = 1999(2)
##
## Call:
## dynlm(formula = formula1, data = Macrodat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8825 -0.3535 -0.0066 0.4351 2.3532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.64608 0.07645 8.451 1.79e-14 ***
## L(d(gdpjp)) -0.01733 0.07966 -0.218 0.828
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.723 on 158 degrees of freedom
## (0 observations deleted due to missingness)
## Multiple R-squared: 0.0002995, Adjusted R-squared: -0.006028
## F-statistic: 0.04733 on 1 and 158 DF, p-value: 0.8281
Having a look at residuals is always a good idea.
the class of residuals(model1) is ts, so plot() draws a timeseries plot
plot(residuals(model1))
Test for autocorrelation.
first visually, with acf()
acf(residuals(model1))
Then formally, with the Ljung-Box test from the portes library:
use the residuals() function to get the timeseries of residuals
lags is the lag order up to which we want to test
order is the number of parameters in the model (intercept + slope)
use the nobs() function (short for number of observations) to get \(T\)
q <- floor(0.75 * nobs(model1)^(1/3))
m <- length(coef(model1))
LjungBox(residuals(model1), lags = 1:q, order = m)
## lags statistic df p-value
## 1 0.001115598 0 NA
## 2 3.156573290 0 0.0000000000
## 3 10.280790212 1 0.0013442236
## 4 17.395024245 2 0.0001670008
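As a cross-check, base R’s Box.test() computes the same statistic at a single lag; its fitdf argument plays the role of order (a sketch, testing only at lag q):
Box.test(residuals(model1), lag = q, type = "Ljung-Box", fitdf = m)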
Repeat, but with the variable \(\Delta \text{r1}\) from the data set Irates, and now use both lm() and dynlm().
Load the data
data(Irates, package = "Ecdat")
First with lm():
d() and L() are functions internal to dynlm
their base R counterparts are diff() and lag() (naively, one might expect them to behave the same by default)
model2_lm <- lm(diff(r1) ~ lag(diff(r1)), data = Irates)
summary(model2_lm)
## Warning in summary.lm(model2_lm): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = diff(r1) ~ lag(diff(r1)), data = Irates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.311e-14 1.710e-17 2.400e-17 3.170e-17 5.717e-16
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.439e-18 2.491e-17 -3.390e-01 0.735
## lag(diff(r1)) 1.000e+00 4.106e-17 2.435e+16 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.733e-16 on 528 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 5.931e+32 on 1 and 528 DF, p-value: < 2.2e-16
Alarm bells must ring now!
R prints the warning message “essentially perfect fit”
hist(residuals(model2_lm))
plot(Irates[, "r1"])
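Why the perfect fit? Base R’s lag() only shifts the time stamps of a ts object, leaving the numbers unchanged, and lm() ignores those time attributes when building the model frame, so regressand and regressor coincide. A minimal sketch:
z <- ts(1:5)
lag(z)                                   # same values, time index shifted
all(as.numeric(lag(z)) == as.numeric(z)) # TRUE: identical as plain vectors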
Now with dynlm(), which is designed for timeseries (“dynamic”) regressions.
model2 <- dynlm(d(r1) ~ L(d(r1)), data = Irates)
summary(model2)
##
## Time series regression with "ts" data:
## Start = 1947(2), End = 1991(2)
##
## Call:
## dynlm(formula = d(r1) ~ L(d(r1)), data = Irates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7172 -0.1668 0.0169 0.2111 2.7171
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.009889 0.026439 0.374 0.709
## L(d(r1)) 0.021999 0.043559 0.505 0.614
##
## Residual standard error: 0.608 on 527 degrees of freedom
## Multiple R-squared: 0.0004838, Adjusted R-squared: -0.001413
## F-statistic: 0.2551 on 1 and 527 DF, p-value: 0.6137
Check the same diagnostics:
plot(residuals(model2))
acf(residuals(model2))
q <- floor(0.75 * nobs(model2)^(1/3))
m <- length(coef(model2))
LjungBox(residuals(model2), lags = 1:q, order = m)
## lags statistic df p-value
## 1 1.041116e-04 0 NA
## 2 1.644574e-01 0 0.000000000
## 3 7.238289e+00 1 0.007136495
## 4 1.017624e+01 2 0.006169598
## 5 1.027875e+01 3 0.016339371
## 6 1.269716e+01 4 0.012854346
Repeat, but estimate the covariance matrix with the Newey-West estimator.
lm() and dynlm() by default use covariance estimators that are not robust to heteroskedasticity and autocorrelation
robust estimators live in the sandwich library, and we use the lmtest library’s coeftest() to apply them to our model
First reproduce the default non-robust variance, then estimate the robust version.
coeftest(model2)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0098888 0.0264394 0.374 0.7085
## L(d(r1)) 0.0219992 0.0435593 0.505 0.6137
coeftest(model2, vcov. = NeweyWest(model2))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0098888 0.0253651 0.3899 0.6968
## L(d(r1)) 0.0219992 0.0910067 0.2417 0.8091
NeweyWest() comes from the sandwich library.
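By default NeweyWest() selects the truncation lag automatically and prewhitens; both choices can be overridden. A sketch, where lag = 4 is just an illustrative choice:
coeftest(model2, vcov. = NeweyWest(model2, lag = 4, prewhite = FALSE))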
Download the 12-month inflation rate according to KPIF from SCB using the package pxweb. Write a script that documents the source of the data and saves the data. Estimate a deterministic trend model and compare the results with the HP-filter and with Hamilton’s solution.
We are now going to use R interactively. The script should look something like this (and nothing more):
# Econometrics 3b
# Exercise 2c
# Download data
library(pxweb)
dl <- pxweb_interactive()
# sequence to get to desired data:
# 1 1 2 1 18 1 3 1 3 * y n y y n
kpif <- dl$data
# head(kpif, 20)
# first year of data is missing (NA), remove it
kpif <- na.exclude(kpif)
write.csv(kpif, file = "kpif.csv", row.names = FALSE)
# end of script
Execute this one directly in RStudio. It requires interaction, so there is no sense in running it in batch mode.
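If you want to rerun the download without the menus, pxweb also supports scripted queries via pxweb_get(). A sketch under assumptions: the query file name "kpif_query.json" is a placeholder (you can save the query from the interactive session), and the URL is SCB’s general API root, not the exact KPIF table path:
library(pxweb)
q <- pxweb_query("kpif_query.json") # load a previously saved query (placeholder name)
px <- pxweb_get("https://api.scb.se/OV0104/v1/doris/en/ssd", query = q) # placeholder URL
kpif2 <- as.data.frame(px)          # should resemble dl$data above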
Now for the non-interactive/“normal” part. Open a new script, and save it.
library(dynlm)
library(mFilter)
df <- read.csv("kpif.csv")
head(df)
## månad KPIF..12.månadsförändring..1987.100
## 1 1988M01 4.5
## 2 1988M02 4.9
## 3 1988M03 5.2
## 4 1988M04 5.9
## 5 1988M05 6.0
## 6 1988M06 6.4
The data come as a data.frame, but we want to do timeseries stuff, so transform it.
infl <- ts(df[, 2], start = c(1988, 1), frequency = 12)
plot(infl)
Estimate the deterministic trend model with dynlm.
model_dt <- dynlm(infl ~ trend(infl))
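trend(infl) is an affine function of calendar time, so a base-R sketch with time() gives the same fitted values (only the coefficients are rescaled):
t_idx <- as.numeric(time(infl))             # calendar time in years
model_dt_lm <- lm(as.numeric(infl) ~ t_idx)
all.equal(as.numeric(fitted(model_dt)),
          as.numeric(fitted(model_dt_lm)))  # should be TRUE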
ts.plot() plots multiple timeseries in the same figure
the fitted() function returns the fitted values of the model, i.e. the model’s predictions
the lty = c(1, 2) argument specifies the line types of the series: 1 solid, 2 dashed, … (see ?par)
ts.plot(infl, fitted(model_dt), lty = c(1, 2))
The HP-filter is in the mFilter package.
model_hp <- hpfilter(infl)
ts.plot(infl, fitted(model_hp), lty = c(1, 2))
legend(
"topright",
legend = c("KPIF", "HP-filtered trend"),
lty = c(1, 2)
)
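Besides fitted(), the hpfilter object stores the components directly in its trend and cycle elements, e.g.:
plot(model_hp$cycle)   # the cyclical component, i.e. deviations from the trend
abline(h = 0, lty = 3)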
model_hp_quarter <- hpfilter(infl, freq = 1600, type = "lambda")
ts.plot(infl, fitted(model_hp_quarter), lty = c(1, 2))
legend(
"topright",
# legend = c("KPIF", "HP-filtered trend, lambda=1600"),
legend = c("KPIF", expression(list(plain("HP-filtered trend"), lambda == 1600))),
lty = c(1, 2)
)
model_hp_year <- hpfilter(infl, freq = 6.25, type = "lambda")
ts.plot(infl, fitted(model_hp_year), lty = c(1, 2))
legend(
"topright",
# legend = c("KPIF", "HP-filtered trend, lambda=6.25"),
legend = c("KPIF", expression(list(plain("HP-filtered trend"), lambda == 6.25))),
lty = c(1, 2)
)
Hamilton’s solution replaces the HP-filter with a regression of \(x_{t+h}\) on the four most recent observations:
\[ x_{t + h} = \beta_0 + \beta_1 x_t + \beta_2 x_{t - 1} + \beta_3 x_{t - 2} + \beta_4 x_{t - 3} + \epsilon_{t + h} \]
With monthly data Hamilton suggests \(h = 24\), so shifting the time index the regression becomes
\[ x_t = \beta_0 + \beta_1 x_{t - 24} + \beta_2 x_{t - 25} + \beta_3 x_{t - 26} + \beta_4 x_{t - 27} + \epsilon_t \]
model_hamilton <- dynlm(infl ~ L(infl, 24:27))
# shorthand for:
# model_hamilton <- dynlm(infl ~ L(infl, 24) + L(infl, 25) + L(infl, 26) + L(infl, 27))
ts.plot(infl, fitted(model_hamilton), lty = c(1, 2))
legend(
"topright",
legend = c("KPIF", "Hamilton's trend"),
lty = c(1, 2)
)
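In Hamilton’s approach the cyclical component is just the regression residual, so we can plot it directly:
cycle_hamilton <- residuals(model_hamilton) # Hamilton's cycle component
plot(cycle_hamilton)
abline(h = 0, lty = 3)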
Compare all:
ts.plot(
infl, fitted(model_dt), fitted(model_hp), fitted(model_hamilton),
lty = 1:4,
col = 1:4,
lwd = c(1, 1, 2, 2)
)
legend(
"topright",
legend = c("KPIF", "Deterministic trend", "HP-filtered trend", "Hamilton's trend"),
lty = 1:4,
col = 1:4
)