Exercise Set 02

Technical exercise 1

Let \(X\) and \(Y\) be variables indexed by \(t=1,\ldots,n\). Let \(a\) and \(b\) be constants. Then,

\[ \sum_{t=1}^n b = nb \tag{1}\]

\[ \sum_{t=1}^n aX_t = a\sum_{t=1}^n X_t \tag{2}\]

\[ \sum_{t=1}^n \left(X_t+Y_t\right) = \sum_{t=1}^n X_t + \sum_{t=1}^n Y_t \tag{3}\]
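Before writing the proofs, you can check the properties numerically in R, the language used in this course. The vectors `X` and `Y` and the constants `a` and `b` below are made-up values, and `all.equal()` compares numbers up to floating-point tolerance. A numerical check like this can catch an algebra slip, but it is not a proof.

```r
# Hypothetical example values; any numbers would do
X <- c(2, 5, 1, 7)
Y <- c(3, 0, 4, 6)
a <- 1.5
b <- -2
n <- length(X)

# Property (1): summing the constant b over n observations gives n * b
check1 <- isTRUE(all.equal(sum(rep(b, n)), n * b))
# Property (2): a constant factor can be pulled out of the sum
check2 <- isTRUE(all.equal(sum(a * X), a * sum(X)))
# Property (3): the sum of X_t + Y_t equals the sum of X_t plus the sum of Y_t
check3 <- isTRUE(all.equal(sum(X + Y), sum(X) + sum(Y)))

c(check1, check2, check3)
```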

In the following, cite which of the properties you have used, referring to them by their labels (1), (2), and (3).

  1. Prove that \[ \sum_{t=1}^n \left(aX_t+bY_t\right) = a\sum_{t=1}^n X_t + b\sum_{t=1}^n Y_t \]

  2. Let \(\overline{X}=\frac{1}{n}\sum_{t=1}^n X_t\) and \(\overline{Y}=\frac{1}{n}\sum_{t=1}^n Y_t\) be the sample means of the two variables. Prove that the following sum can be written in three equivalent ways: \[ \begin{eqnarray} \sum_{t=1}^n \left(X_t-\overline{X}\right)\left(Y_t-\overline{Y}\right) &=& \sum_{t=1}^n \left(X_t-\overline{X}\right)Y_t \\ &=& \sum_{t=1}^n X_t\left(Y_t-\overline{Y}\right) \\ &=& \sum_{t=1}^n X_tY_t -n\overline{X}\cdot\overline{Y} \end{eqnarray} \]
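The same kind of numerical check works for the two identities you are asked to prove. This sketch uses made-up vectors and constants; both sides of the identity in #1 should agree, and so should all four expressions for the sum of cross-products in #2.

```r
# Hypothetical example values
X <- c(2, 5, 1, 7)
Y <- c(3, 0, 4, 6)
a <- 1.5
b <- -2
n <- length(X)
Xbar <- mean(X)
Ybar <- mean(Y)

# Exercise 1, item 1: left- and right-hand sides
lhs11 <- sum(a * X + b * Y)
rhs11 <- a * sum(X) + b * sum(Y)

# Exercise 1, item 2: the four expressions for the sum of cross-products
s1 <- sum((X - Xbar) * (Y - Ybar))
s2 <- sum((X - Xbar) * Y)
s3 <- sum(X * (Y - Ybar))
s4 <- sum(X * Y) - n * Xbar * Ybar

c(lhs11 = lhs11, rhs11 = rhs11)
c(s1 = s1, s2 = s2, s3 = s3, s4 = s4)
```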

Technical exercise 2

Suppose we have a variable \(X\) whose \(t\)th observation is \(X_t\), for \(t=1, 2,\ldots, n\). Let \(Y_t=aX_t+b\), where \(a\) and \(b\) are constants. Let \(\overline{X}\) and \(\overline{Y}\) be the sample means of the \(X_t\)’s and the \(Y_t\)’s, respectively.

  1. Is \(Y_t\) a linear transformation of \(X_t\)? Explain.
  2. Show that, in general, we must have \[\begin{eqnarray}\overline{Y} &=& a\cdot \overline{X} + b \\ \frac{1}{n}\sum_{t=1}^n \left(Y_t-\overline{Y}\right)^2 &=& a^2\cdot \frac{1}{n} \sum_{t=1}^n \left(X_t-\overline{X}\right)^2\end{eqnarray}\]
  3. If \(Y_t\) is on a standardized scale, what would \(a\) and \(b\) be equal to?
  4. If \(Y_t\) is on a standardized scale, use some of your previous answers to find the mean and the standard deviation of the \(Y_t\)’s.
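Here is a quick numerical check of #2 in R, using hypothetical values for the observations and the constants. Note that R's built-in `var()` and `sd()` divide by \(n-1\), not by \(n\), so the sketch defines its own helper `var_n()` (a name made up here) that uses the \(1/n\) divisor from the exercise. Keep this difference in mind for #4.

```r
# Hypothetical observations and constants
X <- c(12, 15, 9, 20, 14)
a <- 3
b <- 10
Y <- a * X + b

# Variance with the 1/n divisor used in the exercise (var() divides by n - 1)
var_n <- function(v) mean((v - mean(v))^2)

c(mean_Y = mean(Y), a_meanX_plus_b = a * mean(X) + b)
c(varn_Y = var_n(Y), a2_varn_X = a^2 * var_n(X))
```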

Technical exercise 3

You will work out the details of regression with only an intercept. Let \(Y_t\) be the \(t\)th observation of the regressand. Recall that lm() computes the OLS estimates, that is, the coefficients that minimize the sum of squared residuals.

Since our regression line for this case is just \(\widehat{Y}_t=\widehat{\beta}_0\), where \(\widehat{\beta}_0\) is just some constant to be determined, you should be able to use what you learned in mathematical economics to minimize \[\sum_{t=1}^n \left(Y_t-\widehat{Y}_t\right)^2=\sum_{t=1}^n \left(Y_t-\widehat{\beta}_0\right)^2=\sum_{t=1}^n Y_t^2-2\widehat{\beta}_0\sum_{t=1}^n Y_t+n\widehat{\beta}_0^2 \tag{4}\] with respect to \(\widehat{\beta}_0\).
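Before deriving the expansion in Equation 4 by hand, you can confirm in R that the two forms of the sum of squared residuals agree. The regressand values below are hypothetical, and the candidate values for \(\widehat{\beta}_0\) are arbitrary.

```r
# Hypothetical regressand values
Y <- c(4.1, 3.6, 4.8, 3.9, 4.4)
n <- length(Y)

# Sum of squared residuals written two ways, as functions of a candidate intercept
ssr_direct   <- function(beta0) sum((Y - beta0)^2)
ssr_expanded <- function(beta0) sum(Y^2) - 2 * beta0 * sum(Y) + n * beta0^2

# Compare the two forms at several candidate values
candidates <- c(-1, 0, 2.5, 4, 10)
cbind(beta0    = candidates,
      direct   = sapply(candidates, ssr_direct),
      expanded = sapply(candidates, ssr_expanded))
```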

  1. Show the steps needed to obtain the expression after the second equality sign in Equation 4.
  2. Find the optimal value of \(\widehat{\beta}_0\). Does it agree with our example from the slides?
  3. Given #2, what is the minimized value of Equation 4? Is there a known name for this quantity?
  4. What will be the fitted values for each observation in this case?
  5. What will be the residuals for each observation in this case?
  6. What will be the average of the fitted values? Prove your finding.
  7. What will be the average of the residuals? Prove your finding.
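Once you have an analytic answer for #2, one way to check it is to minimize Equation 4 numerically with `optimize()` and compare the result with the intercept-only fit from `lm()`, whose formula `Y ~ 1` contains only a constant. The data are hypothetical. `fitted()` and `residuals()` return the quantities asked about in #4 and #5, so you can compare them with your own answers.

```r
# Hypothetical regressand values
Y <- c(4.1, 3.6, 4.8, 3.9, 4.4)

# Sum of squared residuals in Equation 4 as a function of the intercept
ssr <- function(beta0) sum((Y - beta0)^2)

# Numerical minimization; the search interval is the range of the data
opt <- optimize(ssr, interval = range(Y))

# Intercept-only OLS fit: the formula Y ~ 1 has only a constant on the right
fit <- lm(Y ~ 1)

c(optimize = opt$minimum, lm = unname(coef(fit)))
opt$objective     # minimized value of Equation 4
fitted(fit)       # fitted values
residuals(fit)    # residuals
```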

Learning to install packages in R and to work with Quarto

In this exercise, you will install your first R package. Note that I have not yet asked you to install any R package: so far, we have done everything in base R.¹ To learn more about R packages, see fasteR Lesson 25.

You will install the authoring system called Quarto. Quarto lets you create documents in HTML, PDF, and Word formats from a single plain-text file, and you can combine math and R output in the same document.

  1. Go to the Get Started with Quarto website, download the Quarto CLI for your operating system, and install it.
  2. Next, use the directions at the course website to install an R package called quarto.
  3. To test whether your installation works, download the template named template.qmd. Make sure you know the path to this file on your computer. You can open it in any text editor already available on your computer; there is no need to install anything else. Try it (on Windows, right-click the file, choose Open With, and then choose Notepad). Reach out if you have trouble opening the file in a text editor.
  4. Open R, load the package quarto, and run the command quarto_render("template.qmd"). Modify the path to suit your situation; the important thing is that it points to template.qmd.
  5. In the directory containing template.qmd, you will find an HTML file called template.html. Open it in any browser to see how Quarto transformed your text file into something that looks professional.
  6. Every time you edit the template, you have to run quarto_render() again to regenerate the HTML file.
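Steps 4 to 6 can be sketched in R as follows. The helper `render_if_possible()` is hypothetical (it is not part of the quarto package): it calls `quarto::quarto_render()` only when the file exists and the package is installed, so it fails gracefully if something is missing. The path `template.qmd` assumes that R's working directory is the folder holding the template; running `library(quarto)` and then `quarto_render()` directly works just as well.

```r
# Hypothetical location of the template; adjust it to where you saved the file
qmd_path <- "template.qmd"

# Hypothetical helper (not part of the quarto package): render only if possible
render_if_possible <- function(path) {
  if (!file.exists(path)) {
    message("Cannot find ", path, "; check the path.")
    return(invisible(FALSE))
  }
  if (!requireNamespace("quarto", quietly = TRUE)) {
    message("The quarto R package is not installed yet.")
    return(invisible(FALSE))
  }
  # Needs the Quarto CLI; the HTML file is written next to the .qmd file
  quarto::quarto_render(path)
  invisible(TRUE)
}

# Re-run this line after every edit to regenerate the HTML file
render_if_possible(qmd_path)
```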

Authoring your first Quarto document

You will create your first Quarto document containing answers to a short data analysis of teaching evaluations. The dataset is based on the paper by Hamermesh and Parker (2005), who study the relationship between instructional ratings and the “beauty” of the instructor. You do not need to read the paper for this exercise, but you will be asked to refer to its Table 1.

Things to do first

  1. The dataset is in Stata format (file extension .dta) and can be downloaded here.

  2. Install the R package called foreign. This package has a command called read.dta(). Consult its help file to see which Stata versions it supports, and find out what the current version of Stata is. Keep this in mind in case you encounter datasets saved by newer versions of Stata.²

  3. The description of the variables can be downloaded here.
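If you want to see read.dta() at work before downloading the real file, the sketch below writes a small made-up data frame to a temporary .dta file with `write.dta()` (also from foreign) and reads it back. foreign is one of R's recommended packages, so it may already be available in your installation. The column names `score` and `female` are hypothetical and are not taken from the actual dataset; the help page `?read.dta` lists the Stata versions the function supports.

```r
library(foreign)

# Small made-up data frame standing in for the real dataset
toy <- data.frame(score = c(4.3, 3.9, 4.7), female = c(0L, 1L, 1L))

# Write it to a temporary Stata file, then read it back with read.dta()
path <- tempfile(fileext = ".dta")
write.dta(toy, path)
back <- read.dta(path)

str(back)
```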

What your first Quarto document should contain

You will be using R, along with Quarto, for this exercise.

  1. Using a text editor, modify template.qmd to suit your situation and to include your answers to the questions below. Make sure to include code that loads the foreign package and uses read.dta() to load the dataset. You can always generate the readable HTML file by running quarto_render() and then opening the HTML file.
  2. Fit a least squares regression of course evaluations (the regressand) on beauty (the regressor). Adapt the code from the slides to produce a scatterplot and display the fitted regression line. Explain the meaning of the regression coefficients obtained.
  3. Now fit a least squares regression of course evaluations on the sex of the instructor. Adapt the code from the slides to produce a scatterplot and display the fitted regression line. Compute the average course evaluation for males and females separately. How do these connect with the interpretation of least squares regression coefficients? Discuss.
  4. Recall the compensation-sales relationship discussed in class. Why would it be harder to do something similar to #3? Discuss.
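The sketch below shows the general pattern for #2 and #3 with simulated data, since the real dataset is not reproduced here. It is not the code from the slides, and the variable names `beauty`, `female`, and `score` are hypothetical; check the variable description file for the actual names. `abline()` adds the fitted line from an `lm()` object to the current scatterplot, and `tapply()` computes the average evaluation within each group, which you can set against the coefficients of the second regression.

```r
set.seed(1)
n <- 100

# Simulated, hypothetical data; the names are made up, not those of the real file
beauty <- rnorm(n)
female <- rbinom(n, size = 1, prob = 0.4)
score  <- 4 + 0.15 * beauty - 0.2 * female + rnorm(n, sd = 0.5)

# Regression on a continuous regressor, scatterplot, and fitted line
fit_beauty <- lm(score ~ beauty)
coef(fit_beauty)
plot(beauty, score, xlab = "Beauty", ylab = "Course evaluation")
abline(fit_beauty)

# Regression on a 0/1 regressor, scatterplot, and fitted line
fit_female <- lm(score ~ female)
coef(fit_female)
plot(female, score, xlab = "Female (1 = yes, 0 = no)", ylab = "Course evaluation")
abline(fit_female)

# Average evaluation within each group (groups labeled "0" and "1")
group_means <- tapply(score, female, mean)
group_means
```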

What you will be expected to do

You will submit to me, by email, a zip file containing

  1. Scanned PDF solutions to the technical exercises (be mindful of the file size; keep it under 15 MB if possible)
  2. Your qmd file, renamed to surname_exset02.qmd, where surname is replaced by your actual surname.

Footnotes

  1. If you search for tutorials on the internet, you will see other ways of doing things, but they are sometimes more complicated to work with. For now, stick to base R unless otherwise instructed.

  2. Alternatively, you could load the dataset directly from its URL, just as in Lesson 10 of fasteR.