Demonstrating some of the math behind `lm()`

Solutions

Preamble

rm(list=ls())
library(foreign)
ratings <- read.dta("TeachingRatings.dta")

Item 1

lm(course_eval ~ beauty - 1, data = ratings)


Call:
lm(formula = course_eval ~ beauty - 1, data = ratings)

Coefficients:
beauty  
 0.133

mean(residuals(lm(course_eval ~ beauty - 1, data = ratings)))

[1] 3.998272

The residuals do not have mean zero.

Item 2

adj.course_eval <- residuals(lm(course_eval ~ beauty, data = ratings))
adj.female <- residuals(lm(female ~ beauty, data = ratings))
lm(adj.course_eval ~ adj.female - 1, data = ratings)


Call:
lm(formula = adj.course_eval ~ adj.female - 1, data = ratings)

Coefficients:
adj.female  
   -0.1978

lm(course_eval ~ beauty + female, data = ratings)


Call:
lm(formula = course_eval ~ beauty + female, data = ratings)

Coefficients:
(Intercept)       beauty       female  
     4.0816       0.1486      -0.1978

Thus, the slope of adj.female matches the slope obtained from the slope of female in the linear regression of course evaluations on beauty and female.

When we compare courses which have instructors with the same level of beauty rating, the average course evaluations of female instructors is 0.2 points lower than the average course evaluations of male instructors

Item 3

results <- lm(course_eval ~ beauty + female, data = ratings)
mean(fitted(results))

[1] 3.998272

mean(ratings$course_eval)

[1] 3.998272

mean(residuals(results))

[1] 9.819289e-18

cor(fitted(results), residuals(results))

[1] -1.611174e-15

cor(ratings$beauty, residuals(results))

[1] -6.408526e-17

cor(ratings$female, residuals(results))

[1] -1.168853e-16

Remarks for Item 3

These do not have to be part of the solution. You may choose to assign names to the objects generated by fitted() and residuals(). You can also make these fitted values and residuals part of the original data set. You can also use the with() to help you reduce typing ratings so many times. Code related to these remarks, which I left unevaluated, is presented below.

results <- lm(course_eval ~ beauty + female, data = ratings)
results.f <- fitted(results)
# or something like ratings$fitted <- fitted(results)
mean(results.f)
results.r <- residuals(results)
# or something like ratings$residuals <- residuals(results)
with(ratings, mean(course_eval))
mean(results.r)
cor(results.f, results.r)
# or something like with(ratings, cor(fitted, residuals))
cor(ratings$beauty, results.r)
cor(ratings$female, results.r)