Exercise Set 08

Technical Exercises

Technical Exercise 1: CEF and BLP calculations

Let \(X\) and \(Y\) be random variables with the following joint distribution. Both \(X\) and \(Y\) take on three possible values \(\{0,1,2\}\). Each cell of the table below gives a joint probability; for example, \(\mathbb{P}\left(X=0, Y=2\right)=0\).

|       | Y = 2 | Y = 1 | Y = 0 |
|-------|-------|-------|-------|
| X = 2 | 0.2   | 0.1   | 0     |
| X = 1 | 0.1   | 0.2   | 0.1   |
| X = 0 | 0     | 0.1   | 0.2   |
  1. Compute the CEF of \(Y\) given \(X\).

  2. Compute the BLP of \(Y\) given \(X\).

  3. Derive the distribution of the CEF error \(\varepsilon=Y-\mathbb{E}\left(Y|X\right)\).

  4. Is it the case that \(\mathbb{E}\left(X^2\varepsilon\right)=0\)?[^1] Show why or why not.
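Not required for the exercise, but if you want to check your hand calculations, here is a minimal R sketch that computes the CEF and the BLP coefficients directly from the joint distribution table above; the layout of the matrix `p` mirrors the table.

```r
# Joint distribution from the table: rows are X = 2, 1, 0; columns are Y = 2, 1, 0.
p <- matrix(c(0.2, 0.1, 0,
              0.1, 0.2, 0.1,
              0,   0.1, 0.2),
            nrow = 3, byrow = TRUE,
            dimnames = list(X = c("2", "1", "0"), Y = c("2", "1", "0")))
x <- as.numeric(rownames(p))       # support of X
y <- as.numeric(colnames(p))       # support of Y

px  <- rowSums(p)                  # marginal P(X = x)
cef <- drop(p %*% y) / px          # E(Y | X = x) = sum_y y P(X = x, Y = y) / P(X = x)
cef

EX  <- sum(x * px)
EY  <- sum(y * colSums(p))
EXY <- sum(outer(x, y) * p)        # E(XY)
VX  <- sum(x^2 * px) - EX^2        # Var(X)
b1  <- (EXY - EX * EY) / VX        # BLP slope: Cov(X, Y) / Var(X)
b0  <- EY - b1 * EX                # BLP intercept
c(b0 = b0, b1 = b1)
```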

Technical Exercise 2: Past exam question on optimal prediction

Suppose a population has the following characteristics of interest to a researcher: \(\mathbb{E}\left(Y|X\right)=X^{2}\) and \(X\sim N\left(0,1\right)\). Note that \(\mathbb{E}\left(X\right)=0\), \(\mathbb{E}\left(X^{2}\right)=1\), \(\mathbb{E}\left(X^{3}\right)=0\), and \(\mathbb{E}\left(X^{4}\right)=3\). Let \(\varepsilon\) be the corresponding CEF error \(Y-\mathbb{E}\left(Y|X\right)\).

  1. Determine whether each of the following statements is true or false. If a statement is true, verify it; if it is false, correct the mistake.

    • The best prediction under mean squared error loss for a unit’s \(Y\) when \(X=2\) is \(4\).

    • The best prediction under mean squared error loss for a unit’s \(Y\) is \(4\).

    • The best prediction under mean squared error loss for a unit’s \(\varepsilon\) when \(X=2\) is \(2\).

  2. Find the best linear predictor \(\beta_{0}^{*}+\beta_{1}^{*}X\). Show your work.

  3. Suppose a researcher obtains a random sample from the described population. What does this researcher end up learning or recovering in large samples when running a regression of \(Y\) on \(X\) along with an intercept? Be specific in your response; you will need the given information along with your answer to Item 2.

  4. Let us change Item 3 a bit. What do you think this researcher will end up learning or recovering in large samples when running a regression of \(Y\) on \(X\) and \(X^2\) along with an intercept?
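One way to build intuition for Items 3 and 4 is a quick Monte Carlo check. The sketch below is only illustrative: it imposes \(\mathbb{E}\left(Y|X\right)=X^{2}\) by construction, and the choice of a standard normal CEF error is an assumption (any error with conditional mean zero would do).

```r
set.seed(8)
n <- 1e6
X <- rnorm(n)                  # X ~ N(0, 1), as given
Y <- X^2 + rnorm(n)            # imposes E(Y | X) = X^2; the N(0, 1) error is an assumption

coef(lm(Y ~ X))                # Item 3: compare with the BLP you found in Item 2
coef(lm(Y ~ X + I(X^2)))       # Item 4: regression on X and X^2 with an intercept
```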

Technical Exercise 3: Past exam question about CEFs

Suppose a researcher specifies the equation \(Y=\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}+v\), where \(v\) is some unobserved error term. Consider the following two assumptions:

A1 \(\mathbb{E}\left(v|X_{1},X_{2}\right)=0\)

A2 \(\mathbb{E}\left(v|X_{1},X_{2}\right)=\mathbb{E}\left(v|X_{2}\right)=\delta_{0}+\delta_{1}X_{2}\)

  1. Without imposing any assumptions like A1 or A2 on the equation, can she conclude that \(\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}\) is the CEF of \(Y\) given \(X_{1}\) and \(X_{2}\)?

  2. If the researcher imposes A1, can she conclude that \(\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}\) is the CEF of \(Y\) given \(X_{1}\) and \(X_{2}\)? Show your work.

  3. Under Item 2, the CEF of \(Y\) given \(X_{1}\) and \(X_{2}\) is actually the same as the BLP of \(Y\) given \(X_{1}\) and \(X_{2}\). Explain why without actually calculating the optimal coefficients of the BLP.[^2]

  4. Suppose the researcher finds A1 implausible and instead assumes A2. Can we still conclude that \(\beta_{0}+\beta_{1}X_{1}+\beta_{2}X_{2}\) is the CEF of \(Y\) given \(X_{1}\) and \(X_{2}\)? Show your work.

  5. Calculate the following predicted comparisons \[\begin{eqnarray*} \mathbb{E}\left(Y|X_{1}=x_{1}+1,X_{2}=x_{2}\right)-\mathbb{E}\left(Y|X_{1}=x_{1},X_{2}=x_{2}\right) \\ \mathbb{E}\left(Y|X_{1}=x_{1},X_{2}=x_{2}+1\right)-\mathbb{E}\left(Y|X_{1}=x_{1},X_{2}=x_{2}\right)\end{eqnarray*}\] under assumptions A1 and A2.

  6. Given what you find in Items 2, 4, and 5, what does this researcher end up learning or recovering in large samples when running a regression of \(Y\) on \(X_{1}\) and \(X_{2}\) along with an intercept? Answer this question for two cases: one where A1 alone is imposed and one where A2 alone is imposed.

  7. Comment on your findings in Items 5 and 6. What do you expect to learn when you execute `lm(Y ~ X1 + X2)` under A1? Under A2?
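Here is a hedged simulation sketch for Items 6 and 7. All parameter values (the \(\beta\)'s, the \(\delta\)'s, and the distributions of \(X_1\), \(X_2\), and the error) are assumptions chosen purely for illustration; compare the fitted coefficients with the values you derived.

```r
set.seed(8)
n  <- 1e6
X1 <- rnorm(n)
X2 <- rnorm(n)
b0 <- 1; b1 <- 2; b2 <- 3           # assumed "true" coefficients, for illustration only

# Case 1, under A1: E(v | X1, X2) = 0
v1 <- rnorm(n)
Y1 <- b0 + b1 * X1 + b2 * X2 + v1
coef(lm(Y1 ~ X1 + X2))

# Case 2, under A2: E(v | X1, X2) = delta0 + delta1 * X2 (here delta0 = 0.5, delta1 = -1)
v2 <- 0.5 - 1 * X2 + rnorm(n)
Y2 <- b0 + b1 * X1 + b2 * X2 + v2
coef(lm(Y2 ~ X1 + X2))              # compare these with (b0, b1, b2)
```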

Optional Exercises

Optional Technical Exercise 2: Conditional variances

The notion of unconditional variance extends to a conditional version. Recall that the unconditional variance of \(Y\) is \(\mathsf{Var}\left(Y\right)=\mathbb{E}\left[\left(Y-\mathbb{E}\left(Y\right)\right)^2\right]\). The conditional variance is defined as \[\mathsf{Var}\left(Y|X=x\right)=\mathbb{E}\left[\left(Y-\mathbb{E}\left(Y|X=x\right)\right)^2|X=x\right].\]

  1. Return to I SEE THE MOUSE. Let \(Y\) be the number of letters in the selected word. Let \(X\) denote the number of E’s in the word. Calculate \(\mathsf{Var}\left(Y|X=0\right)\), \(\mathsf{Var}\left(Y|X=1\right)\), and \(\mathsf{Var}\left(Y|X=2\right)\) by applying the definition; an R sketch for checking your work appears after this list. Do you find something curious about your results, especially in light of optimal prediction?
  2. Just like the conditional expectation, the conditional variance can be thought of as a random variable \(\mathsf{Var}\left(Y|X\right)\) having its own distribution. Obtain the distribution and compute \(\mathbb{E}\left(\mathsf{Var}\left(Y|X\right)\right)\).
  3. Verify the law of total variance, \(\mathsf{Var}\left(Y\right)=\mathbb{E}\left(\mathsf{Var}\left(Y|X\right)\right)+\mathsf{Var}\left(\mathbb{E}\left(Y|X\right)\right)\), for I SEE THE MOUSE.
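A minimal R sketch for checking Items 1 to 3, assuming (as in the class example) that each of the four words in I SEE THE MOUSE is selected with equal probability \(1/4\).

```r
words <- c("I", "SEE", "THE", "MOUSE")  # each selected with probability 1/4 (assumed)
Y <- nchar(words)                                                        # letters per word
X <- vapply(strsplit(words, ""), function(w) sum(w == "E"), integer(1))  # E's per word

# Item 1: Var(Y | X = x) by direct application of the definition
cond_var <- function(x0) { y <- Y[X == x0]; mean((y - mean(y))^2) }
vx <- sapply(sort(unique(X)), cond_var)
vx

# Item 2: E(Var(Y | X)) using the marginal distribution of X
px <- table(X) / length(X)          # P(X = x), in the same (ascending) order as vx
sum(px * vx)

# Item 3: law of total variance
cef <- sapply(sort(unique(X)), function(x0) mean(Y[X == x0]))   # E(Y | X = x)
sum(px * vx) + sum(px * (cef - sum(px * cef))^2)                # E(Var(Y|X)) + Var(E(Y|X))
mean((Y - mean(Y))^2)                                           # Var(Y) directly, for comparison
```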

Optional Technical Exercise 3: Marginal effects and turning points

Suppose for the sake of argument that \(\mathbb{E}\left(Y|X\right)=\beta_{0}+\beta_{1}X+\beta_{2}X^{2}\). You are going to work out predicted differences in \(Y\) when groups differ in their \(X\) by a very small amount.

  1. Find \(\dfrac{\partial\mathbb{E}\left(Y|X=x\right)}{\partial x}\). Compare and discuss what happens when \(\beta_{2}=0\) versus when \(\beta_{2}\neq0\).

  2. In some applications, practitioners are interested in turning points, i.e., points where the direction of the relationship between \(Y\) and \(X\) changes. When is \(\dfrac{\partial\mathbb{E}\left(Y|X=x\right)}{\partial x}>0\) (meaning that \(\mathbb{E}\left(Y|X=x\right)\) is an increasing function of \(x\)), and when is it negative? Use this information to derive the value of \(x\) at the turning point.
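If you want a quick numerical check of your derivation, the sketch below plots a quadratic CEF for assumed coefficient values (\(\beta_0 = 1\), \(\beta_1 = 2\), \(\beta_2 = -0.5\), purely illustrative) and locates the turning point numerically.

```r
# Assumed coefficient values, for illustration only
b0 <- 1; b1 <- 2; b2 <- -0.5
cef <- function(x) b0 + b1 * x + b2 * x^2

curve(cef, from = -2, to = 6, xlab = "x", ylab = "E(Y | X = x)")
# Since b2 < 0 here, the CEF has a maximum; compare the numerical turning
# point below with the expression you derived.
optimize(cef, interval = c(-2, 6), maximum = TRUE)$maximum
```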

Optional R Exercise 1: Applying CLT versus Monte Carlo simulation

Write R code that uses Monte Carlo simulation to approximate \(\mathbb{P}\left(\overline{X}\leq 0.4\right)\) using the details found here. Compare your result with the large-sample (CLT) approximation found here.
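Since the distribution of the \(X_i\)'s and the sample size come from the linked details, the skeleton below uses placeholder choices (i.i.d. Bernoulli(0.5) draws and \(n = 100\)) that you should replace with the actual setup.

```r
set.seed(8)
reps <- 1e5                         # number of Monte Carlo replications
n    <- 100                         # placeholder sample size: replace as needed
xbars <- replicate(reps, mean(rbinom(n, 1, 0.5)))  # placeholder distribution for the X's
mean(xbars <= 0.4)                  # Monte Carlo approximation of P(Xbar <= 0.4)

# CLT-based large-sample approximation under the same placeholder choices
mu <- 0.5; sigma <- sqrt(0.5 * (1 - 0.5))
pnorm((0.4 - mu) / (sigma / sqrt(n)))
```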

Optional R Exercise 2: Exploring the sampling distribution of \(S_n\)

This exercise has you explore the mean-squared convergence of \(S_n\) to \(\sigma\). Although we did not discuss it in class, you may have intuited a connection between mean-squared convergence and convergence in probability. In fact, mean-squared convergence implies convergence in probability.

Recall that \(S_n\) is the sample standard deviation of \(X_1,\ldots, X_n\). Write R code which draws random numbers from some distribution (you decide) and then computes \(S_n\). Do this repeatedly to produce the simulated sampling distribution of \(S_n\). Try different values of \(n\) and explore what happens to the center and the spread of the simulated sampling distribution of \(S_n\).

Does the Monte Carlo give you evidence that \(\mathbb{E}\left(S_n\right) \to \sigma\) and \(\mathsf{Var}\left(S_n\right) \to 0\)?
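A minimal sketch of one way to run this exploration, assuming Exponential(1) draws (so that \(\sigma = 1\)); both the distribution and the grid of sample sizes are choices you can vary.

```r
set.seed(8)
reps <- 5000
for (n in c(10, 100, 1000)) {
  s <- replicate(reps, sd(rexp(n, rate = 1)))        # S_n from Exponential(1) samples
  cat("n =", n,
      "| mean of S_n:", round(mean(s), 4),           # should approach sigma = 1
      "| variance of S_n:", round(var(s), 6), "\n")  # should approach 0
}
```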

What you will be expected to do

You will be submitting to my email a zip file (not rar, not 7z) with filename surname_exset08.zip, replacing surname with your actual surname. It should contain your scanned PDF solutions to the technical exercises in a file named surname_tech08.pdf. Be mindful of the size of the file; keep it under 15 MB if possible. You are not required to submit the optional technical exercises, but feel free to work on them.

[^1]: There are many ways to do this; try more than one approach.

[^2]: Note that I never gave the most general formula for deriving the optimal coefficients of the BLP.