# Number of observations
n <- 50
# How many times to loop
reps <- 400
# Storage for OLS results (2 entries per replication)
beta.store <- matrix(NA, nrow=reps, ncol=2)
# Create a directory to store the plots (suppress the warning if it already exists)
dir.create(paste(getwd(), "/pics/", sep=""), showWarnings = FALSE)
# Monte Carlo loop
for (i in 1:reps)
{
  X.t <- rbinom(n, 1, 0.3) # Generate X
  # Heteroskedastic errors: standard deviation 4 when X.t = 1, 1 when X.t = 0
  eps.t <- (rnorm(n, 0, 4))*(X.t == 1) + (rnorm(n, 0, 1))*(X.t == 0)
  Y.t <- 3 + 2*X.t + eps.t # Generate Y
  temp <- lm(Y.t ~ X.t)
  beta.store[i,] <- coef(temp)
  # Save a scatterplot with the fitted line for this replication
  filename <- paste(getwd(), "/pics/", i, ".png", sep="")
  png(filename)
  plot(X.t, Y.t)
  abline(temp)
  graphics.off()
}
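Each row of beta.store now holds one replication's intercept (column 1) and slope (column 2), matching the order of coef(). A minimal sketch of how you might summarize the estimates afterwards; the plotting choices here are only illustrative:
# Average and spread of the OLS estimates across replications
colMeans(beta.store)
apply(beta.store, 2, sd)
# Sampling distribution of the slope estimates
hist(beta.store[, 2], main = "Slope estimates across replications", xlab = "Estimate")
abline(v = 2, lwd = 2) # the true slope in the data-generating process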
Exercise Set 03
Technical exercise 1: a step towards understanding the regression slope as comparisons
Let \(X\) and \(Y\) be variables indexed by \(t=1,\ldots,n\). It can be shown that, in general, we have
\[ \sum_{t=1}^n \left(X_t-\overline{X}\right)\left(Y_t-\overline{Y}\right) = \frac{1}{2n}\sum_{i=1}^n\sum_{j=1}^n \left(X_j-X_i\right)\left(Y_j-Y_i\right) \tag{1}\] What you are going to do is prove this for \(n=2\) only so that you get a feel for how Equation 1 could come to be. Make sure to cite which of the properties you have used (labels for the properties are available). A quick numerical check in R appears after the list below.
- Can you distinguish among the following expressions: \[\sum_{t=3}^6 X_t \ \ , \ \ \sum_{j=3}^6 X_j \ \ \ , \ \ \sum_{j=3}^6 X_t\] Which are equal to each other? Which are very different from each other?
- Write out or flesh out the components of the expression \[\sum_{i=1}^3\sum_{j=2}^3 X_{ij}\]
- Set \(n=2\) in Equation 1 and prove the equality.¹
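As mentioned above, you can convince yourself numerically that Equation 1 holds before attempting the proof. A minimal sketch in R on simulated data; the seed and sample size are only illustrative:
# Numerical check of Equation 1
set.seed(20) # illustrative seed
m <- 5
X <- rnorm(m)
Y <- rnorm(m)
lhs <- sum((X - mean(X)) * (Y - mean(Y)))
# outer(X, X, "-")[i, j] is X[i] - X[j], so the elementwise product sums
# (X_i - X_j)(Y_i - Y_j) over all pairs, which equals (X_j - X_i)(Y_j - Y_i)
rhs <- sum(outer(X, X, "-") * outer(Y, Y, "-")) / (2 * m)
all.equal(lhs, rhs) # should be TRUE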
Technical exercise 2: Linear regression with linearly transformed variables
Suppose a linear regression of \(Y\) on \(X_1\) (with an intercept) was computed. As always, let \(X_{1t}\) and \(Y_t\) be the \(t\)th observations of the regressor and the regressand, respectively, for \(t=1, 2,\ldots, n\).
Suppose we linearly transformed the data. Let \(W_t=aX_{1t}+b\), where \(a\) and \(b\) are constants. In addition, let \(Z_t=cY_t+d\), where \(c\) and \(d\) are constants.
- Write down the formula or expression for the regression slope for the linear regression of \(Y\) on \(X_1\) in this context. There is no need to derive it.
- After transforming both \(X_1\) and \(Y\) to \(W\) and \(Z\), respectively, write down the formula or expression for the regression slope for the linear regression of \(Z\) on \(W\) in this context. There is no need to derive it.
- Focus on the formulas in items #1 and #2. Use the results of Technical Exercise 2 in Exercise Set 02 (with the appropriate modifications to the variables and constants) to determine how the regression slope in #2 is related to the regression slope in #1 after transforming both \(X_1\) and \(Y\) to \(W\) and \(Z\), respectively.² A numerical check in R follows below.
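For that numerical check of your answer, here is a minimal sketch in R; the data-generating process and the constants are only illustrative:
# Compare the slope before and after linearly transforming both variables
set.seed(8) # illustrative seed
X1 <- rnorm(30)
Y <- 1 + 0.5 * X1 + rnorm(30)
a <- 2; b <- -1; c <- 3; d <- 4 # arbitrary illustrative constants
W <- a * X1 + b
Z <- c * Y + d
coef(lm(Y ~ X1))[2] # regression slope from item #1
coef(lm(Z ~ W))[2]  # regression slope from item #2; compare the two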
Technical exercise 3: Another special case
You are going to be working out the details of regression with only one regressor and without an intercept. Let \(Y_t\) be the \(t\)th observation of the regressand. Recall that lm() is OLS and that we are minimizing a sum of squared residuals. A short sketch in R after the two items below shows how to fit such a model.
Since our regression line for this case is just \(\widehat{Y}_t=\widehat{\beta}_1X_{1t}\), where \(\widehat{\beta}_1\) is just some constant to be determined, you should be able to use what you learned in mathematical economics to minimize \[\sum_{t=1}^n \left(Y_t-\widehat{Y}_t\right)^2=\sum_{t=1}^n \left(Y_t-\widehat{\beta}_1X_{1t}\right)^2 \tag{2}\] with respect to \(\widehat{\beta}_1\).
- Find the optimal value of \(\widehat{\beta}_1\).
- What will be the average of the residuals? Do you think it is zero? Prove your finding.
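As mentioned above, here is a minimal sketch in R for experimenting with this case; the data-generating process is only illustrative. The "- 1" in the formula tells lm() to drop the intercept:
# Fit a one-regressor model without an intercept
set.seed(15) # illustrative seed
X1 <- runif(25, 1, 5)
Y <- 2 * X1 + rnorm(25)
fit <- lm(Y ~ X1 - 1)
coef(fit)        # compare with your answer to item #1
mean(resid(fit)) # compare with your answer to item #2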
What you will be expected to do
You will be submitting to my email a zip file (not rar, not 7z) with filename surname_exset03.zip containing:
- Scanned PDF solutions to the technical exercises (do be mindful of the size of the file; keep it under 15 MB if possible) with filename surname_tech03.pdf
- Your qmd file with filename surname_exset03.qmd
- The HTML file associated with your qmd file.
Replace surname with your actual surname in the filenames above.
Footnotes
1. Completely optional for this exercise set: Try \(n=4\). Pretty soon, you should be able to figure out what happens for general \(n\).
2. Completely optional for this exercise set, but for additional practice: There are many things to explore, such as what happens to the intercepts, what happens if only one of the variables were transformed, and what happens if you standardize. You could also explore what happens to R-squared and other quantities from the output of lm().
3. You might need to load files from previous lessons related to the pima dataset.
4. Alternatively, you could copy and paste the entire code onto the R console to give you a sense of what is happening, so that you can plan out how you would complete the exercises.
5. To find your working directory, type getwd() in the R console.