Exercise Set 07

Preamble

This exercise set differs slightly in style from past exercise sets. Your submission requires you to write down the R code you have used. Why do I ask you to do this? I want you to get used to writing code for the exam, and I think there are situations (say, interviews) where you might be asked to write code on the fly, especially if you claim you know how to use R.

Technical Exercise 1: standard normal probabilities and Monte Carlo simulation of probabilities

Suppose that \(Y\) is distributed as \(N(1,4)\), denoted as \(Y\sim N(1,4)\). Your task is to find \(\mathbb{P}\left(-2 \leq Y \leq 2.5 \right)\). Let \(Z\) be the standardized form of \(Y\).

1. Express \(\mathbb{P}\left(-2 \leq Y \leq 2.5 \right)\) as a probability involving \(Z\). Make sure to show the details.
2. Write down the R code involved in calculating the desired probability.
3. What if you do not know how to standardize or use R to calculate this probability? You can use Monte Carlo simulation to estimate this probability very well. One starting point would be to draw a random sample of 10000 IID observations from the distribution of \(Y\), which is known to you. Write down the R code needed to obtain this estimate.
4. Run your code and report your estimate and its standard error.
5. The Monte Carlo simulation you will be implementing is justified by the theory here. Can you determine what plays the role of “IID random variables \(X_1,\ldots,X_n\)”? Can you also determine the actual values of \(\mu\) and \(\sigma^2\) in the context of estimating \(\mathbb{P}\left(-2 \leq Y \leq 2.5 \right)\)?
6. Can you check whether your R code in #3 is functioning correctly? You can use replicate() or write a loop if you wish.
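
As a warm-up on a different problem (so the exercise itself is still yours to do), here is a minimal sketch that compares an exact normal probability with a Monte Carlo estimate and then repeats the experiment with replicate(). The standard normal distribution, the interval from -1 to 1, the seed, the number of replications, and all variable names are illustrative assumptions and not part of the exercise.

```r
# Illustrative warm-up: P(-1 <= W <= 1) for W ~ N(0, 1).
# Distribution, interval, seed, and names are assumptions for illustration.
set.seed(123)

# Exact probability from the normal CDF
exact <- pnorm(1, mean = 0, sd = 1) - pnorm(-1, mean = 0, sd = 1)

# Monte Carlo: draw IID observations and record whether each lands in the interval
n <- 10000
w <- rnorm(n, mean = 0, sd = 1)
inside <- (w >= -1) & (w <= 1)

# The estimate is the sample proportion; its standard error uses the
# Bernoulli variance p(1 - p) with p replaced by the estimate
estimate <- mean(inside)
std_error <- sqrt(estimate * (1 - estimate) / n)
c(exact = exact, estimate = estimate, std_error = std_error)

# Repeat the whole experiment many times to see how the estimate varies
estimates <- replicate(1000, mean(abs(rnorm(n)) <= 1))
c(mean = mean(estimates), sd = sd(estimates))
```

If the code is working, the standard deviation of the replicated estimates should be close to the standard error computed from a single run. Keep in mind that pnorm() and rnorm() take the standard deviation, not the variance, in their sd argument.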

Technical Exercise 2

Let me change the main question of Technical Exercise 1. What if you know only that there is some random variable \(Y\) with some distribution? Your task is again to find \(\mathbb{P}\left(-2 \leq Y \leq 2.5 \right)\).

1. Are you able to accomplish this task? Can you use Monte Carlo simulation to estimate this probability? Explain why or why not.
2. Suppose you were able to obtain the following random sample from the distribution of \(Y\). What would be your estimate of \(\mathbb{P}\left(-2 \leq Y \leq 2.5 \right)\)? Report an appropriate standard error for your estimate.

 [1] -4.933  6.452  3.613 -5.303 11.764 11.756 -6.159  9.336  1.296  3.100
[11]  3.159 -3.744  7.731 -5.022 -0.084  9.778 12.481 -4.032  0.786 -7.350
[21]  5.562 -0.474  9.412 -5.689 -1.360  1.753 -5.717 -1.145 12.178 -6.088
[31] -8.771 -5.378  8.824 10.115  2.314  4.798  9.577 -2.733  5.679 -5.690

3. Comment on whether you think the distribution of \(Y\) is normal.
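
To illustrate the mechanics of estimating a probability from an observed sample, here is a minimal sketch on made-up numbers. The vector y, the interval from 0 to 2, and the variable names are all hypothetical and chosen only for illustration. They are not the data or the interval from this exercise.

```r
# Hypothetical sample (NOT the data from the exercise)
y <- c(-1.2, 0.4, 3.1, -2.7, 1.9, 0.0, 2.6, -0.8, 4.5, 1.1)

# Indicator of whether each observation falls in the (assumed) interval [0, 2]
inside <- (y >= 0) & (y <= 2)

# The sample proportion estimates the probability; the standard error
# again uses the Bernoulli variance with the estimate plugged in
p_hat <- mean(inside)
se <- sqrt(p_hat * (1 - p_hat) / length(y))
c(p_hat = p_hat, se = se)
```

Nothing in this calculation uses the form of the distribution of the data. It only needs the sample to be a random sample.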

Technical Exercise 3: proving the unbiasedness of \(\overline{X}\) for \(\mu=\mathbb{E}\left(X_t\right)\)

Note that a consequence of the axioms is that \[\mathbb{E}\left(\sum_{j=1}^n c_jX_j\right)=\sum_{j=1}^n c_j \mathbb{E}\left(X_j\right)\]

Recall our setup in class where \(X_1,\ldots,X_n\) are IID random variables with a common population mean \(\mu=\mathbb{E}\left(X_t\right)\) and common population variance \(\sigma^2=\mathsf{Var}\left(X_t\right)\).

1. Use this result to show that \(\mathbb{E}\left(\overline{X}\right)=\mu\). Indicate the assumptions or results you have applied in every step of your proof. What do you notice?
2. Is your proof valid for any value of \(n\)? Explain.
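
A simulation cannot replace the proof, but it can serve as a sanity check on the claim. The sketch below is one possible check under assumptions of my choosing: an exponential distribution with mean 2, a sample size of 5, 10000 replications, and a fixed seed. None of these choices come from the exercise.

```r
# Illustrative simulation check, not a proof.
# Assumed setup: X_1, ..., X_n IID exponential with mean mu = 2.
set.seed(2024)
mu <- 2
n <- 5
reps <- 10000

# Each replication draws a fresh sample of size n and returns its mean
xbars <- replicate(reps, mean(rexp(n, rate = 1 / mu)))

# If the sample mean is unbiased, the average of the simulated
# sample means should be close to mu even though n is small
c(average_of_xbars = mean(xbars), mu = mu)
```

You may want to change n and rerun the code to see whether the conclusion depends on the sample size.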

Technical Exercise 4: the problem with unbiasedness

Suppose we now want to learn \(\mu^2\), instead of \(\mu\) (like we did in class). You may think that a natural estimator for \(\mu^2\) is \(\left(\overline{X}\right)^2\). You are going to explore whether this is a good idea from an unbiasedness perspective.

1. Retain lines 1-5 in this slide. Modify lines 6-9 to reflect the fact that you are interested in \(\left(\overline{X}\right)^2\). Write down your modifications.
2. Write down R code which will help you evaluate whether or not \(\left(\overline{X}\right)^2\) is unbiased for \(\mu^2\). Describe your findings. What happens for a fixed value of \(n\)? Compare your findings across different values of \(n\).

Technical Exercise 5: back to problems with logarithms

This exercise is based on the Massachusetts test score dataset (mcas.csv). The description of the dataset may be found here. Below you will find regressions involving different transformations of per capita income and test scores.

mcas <- read.csv("mcas.csv")
lm(log(totsc4) ~ percap, data = mcas)


Call:
lm(formula = log(totsc4) ~ percap, data = mcas)

Coefficients:
(Intercept)       percap  
    6.52186      0.00229

lm(totsc4 ~ log(percap), data = mcas)


Call:
lm(formula = totsc4 ~ log(percap), data = mcas)

Coefficients:
(Intercept)  log(percap)  
      600.8         37.7

1. Use the results of the first regression. When we compare districts with per capita incomes that differ by 1000 dollars, what would be the difference in the average log test scores of these districts?
2. Use the results of the first regression. When we compare districts with per capita incomes that differ by 1000 dollars, what would be the difference in the average test scores of these districts? (Take note of the difference from the previous item.) Use this exercise to calculate both the exact and the approximate difference.
3. Use the results of the second regression. When we compare districts with log per capita incomes that differ by 0.01, what would be the difference in the average test scores of these districts?
4. Use the results of the second regression. When we compare districts with per capita incomes that differ by 1 percent, what would be the difference in the average test scores of these districts?
5. The answers to #3 and #4 are related and should be roughly the same. Explain why. How would your answers to #3 and #4 change if we compare districts with log per capita incomes that differ by 0.5?
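
As a hint on the arithmetic behind these items, the sketch below compares a difference in logs with the exact proportional difference it implies. The values in d and the names used are arbitrary illustrations of my own and are not taken from the regressions above.

```r
# Differences in logs versus proportional differences.
# The values of d are arbitrary illustrations.
d <- c(0.01, 0.1, 0.5)

# A difference of d in log(x) corresponds exactly to a proportional
# difference of exp(d) - 1 in x; d itself is only an approximation
comparison <- data.frame(
  log_difference = d,
  exact_percent = 100 * (exp(d) - 1),
  approx_percent = 100 * d
)
comparison

# Conversely, a 1 percent difference in x is a difference of
# log(1.01) in log(x), which is close to but not exactly 0.01
log(1.01)
```

Look at how the gap between the exact and approximate columns behaves as d grows.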

What you will be expected to do

Email me a zip file (not rar, not 7z) named surname_exset07.zip, replacing surname with your actual surname. The zip file should contain your scanned PDF solutions to the technical exercises, named surname_tech07.pdf. Be mindful of the file size and keep it under 15 MB if possible.