Welcome to Applied Econometrics 1!
Someone once said, "There is nothing more dangerous than an undergrad armed with OLS." Unfortunately, this is also true of graduate students and even experienced researchers. In response, I revisit the key conceptual and mathematical ideas behind probability theory, statistical theory, and econometric theory. In this course, we try to answer research questions related to classic questions in economics with the help of economic theory, data, and econometric methods.
Information about using the slides
Lecture Materials for Applied Econometrics 1 (2022 version) by Andrew Adrian Pua is licensed under Attribution-ShareAlike 4.0 International
To cite these slides, please use
Pua, Andrew Adrian. 2022. "Lecture Materials for Applied Econometrics 1 (2022 Version)." https://applied-metrics.neocities.org/.
Individual paper details
Practice exercises here
Week 10:
For this week, I will talk about how linear regression, which is about making predictive comparisons, may provide causal effects and how instrumental variables could be a strategy in case linear regression does not work.
- Week 10 slides more suitable for your own annotations, first version 2022.12.01, current version 2022.12.03
- Week 10 slides, with class annotations up to 2022.12.02
- 2022.12.01: Working out some of the details of Mankiw, Romer, and Weil (1992) (paper and dataset)
- 2022.12.02: The difference between a statistical model and a structural / causal model, under what circumstances can OLS recover a causal effect of interest?, instrumental variable strategies, work out the core ideas and computational issues with Angrist and Krueger (1991) (paper and dataset)
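To make the contrast between OLS and an instrumental variable strategy concrete, here is a minimal simulated sketch in R. It does not use the Angrist and Krueger (1991) data. The data-generating process, the variable names (z, u, x, y), and the true slope of 1.5 are all assumptions made up for illustration.

```r
# Simulated example (all numbers are assumptions, not course data)
set.seed(20221202)
n <- 10000
beta <- 1.5                        # true causal slope in the simulation

z <- rnorm(n)                      # instrument: shifts x, unrelated to u
u <- rnorm(n)                      # unobserved confounder
x <- z + u + rnorm(n)              # regressor depends on z and on u
y <- 2 + beta * x + u + rnorm(n)   # outcome also depends on u

# OLS slope: a predictive comparison, contaminated by u here
ols_slope <- unname(coef(lm(y ~ x))["x"])

# Simple IV estimator with one instrument: cov(z, y) / cov(z, x)
iv_slope <- cov(z, y) / cov(z, x)

c(ols = ols_slope, iv = iv_slope)
```

In this setup the OLS slope should land noticeably above 1.5 because x and the omitted u move together, while the IV ratio should land close to 1.5.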
Week 9:
For this week, I will talk about expressing uncertainty and conducting inference in linear regression settings. I had also planned to introduce the bootstrap principle if time allowed, but we did not get to it.
- Week 9 slides more suitable for your own annotations, first version 2022.11.17
- Week 9 slides, with class annotations up to 2022.12.01
- 2022.11.17: Wrapping up conditional expectations, convergence in distribution, central limit theorem (CLT), how to apply the CLT, key idea behind confidence intervals
- 2022.11.18: Simulations of confidence intervals, key ideas behind hypothesis testing
- 2022.11.24: Class cancelled, postponed to 2022.12.02
- 2022.11.25: Working out the details of Exercise Set 8, the different flavors of the linear regression model
- 2022.12.01: More on the different flavors of the regression model, key asymptotic results, practice with Hamermesh and Parker (2005), wrap up
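The idea of simulating confidence intervals can be sketched as follows. This is not the code used in class. The exponential population, the sample size of 100, and the 2000 replications are assumptions chosen only to show how the CLT-based interval behaves over repeated samples.

```r
# Simulated coverage of a CLT-based 95% interval for a population mean
set.seed(20221118)
n <- 100          # sample size (assumed)
reps <- 2000      # number of simulated samples (assumed)
true_mean <- 1    # mean of the exponential population (assumed)

covered <- replicate(reps, {
  x <- rexp(n, rate = 1 / true_mean)
  se <- sd(x) / sqrt(n)                    # estimated standard error
  lower <- mean(x) - qnorm(0.975) * se
  upper <- mean(x) + qnorm(0.975) * se
  lower <= true_mean && true_mean <= upper # did the interval catch the mean?
})

coverage <- mean(covered)   # share of intervals containing the true mean
coverage
```

The proportion should be in the neighborhood of 0.95, which is a statement about the long-run behavior of the procedure rather than about any single interval.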
Weeks 5 to 8:
For these weeks, I will talk about what we could learn from running regressions, while introducing basic concepts of probability and statistical inference.
- How to work with the annotated version of the slides: a tutorial
- Week 5 slides more suitable for your own annotations, first version 2022.09.22, current version 2022.10.27
- Week 5 slides, with class annotations up to 2022.11.11 (sorry, I forgot to save the related annotations for 2022.11.17!), CEF/BLP calculation for Aronow and Miller example here
- 2022.10.20: Recap of weeks 1 to 4, short and long regression coefficients, Monte Carlo simulation, long-run behavior of procedures
- 2022.10.21: Recap, more on behavior of procedures, random variables, expected values, discrete vs continuous cases
- 2022.10.27: Recap, moments (expected value, variance, standard deviation of random variables), covariances, fake vs real coin sequences (more simulation), independence vs uncorrelatedness
- 2022.10.28: Class cancelled, postponed to 2022.12.01
- 2022.11.03: Recap, special distributions (Bernoulli, binomial, uniform, normal), learning a population mean
- 2022.11.04: Learning a population mean, learning parameters of a classic linear regression model
- 2022.11.10: Convergence concepts, best linear prediction, what does lm() help you learn in the long run?, "modern" version of the linear regression model
- 2022.11.11: The concepts behind conditional expectations, CEF versus best linear prediction, estimating CEFs vs estimating BLPs
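The relationship between short and long regression coefficients can be checked numerically. The sketch below uses simulated data. The variable names and coefficient values are assumptions for illustration and are not taken from the slides.

```r
# Short vs long regression on simulated data (all values assumed)
set.seed(20221020)
n <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)              # x2 is correlated with x1
y <- 1 + 2 * x1 - 1 * x2 + rnorm(n)

b_short <- coef(lm(y ~ x1))            # short regression: omits x2
b_long <- coef(lm(y ~ x1 + x2))        # long regression: includes x2
delta <- coef(lm(x2 ~ x1))             # auxiliary regression of x2 on x1

# Short slope = long slope on x1 + (long slope on x2) * (slope of x2 on x1)
implied_short <- b_long["x1"] + b_long["x2"] * delta["x1"]

all.equal(unname(b_short["x1"]), unname(implied_short))
```

The equality holds exactly in the sample, up to floating-point error, and not just on average, because it is an algebraic property of least squares.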
Weeks 1 to 4:
For these weeks, I talk about what the course is all about and move on to the concept of a distribution and the different ways of describing it. This also serves as a crash course in linear regression from a descriptive point of view.
- Before we meet on 2022.09.22: Install R (instructions below), finish surveys, work on fasteR Lessons 2 to 15, 21 to 23.
- Give yourself about two to three days to do all these. You may not necessarily understand everything on the first try.
- When you work on fasteR, my suggestion is to type the commands yourself as much as you can. This is for muscle memory, for building practical skills, and to expose yourself to syntax errors.
- Weeks 1 to 4 slides, first version 2022.09.13, current version 2022.10.07
- 2022.09.22: Housekeeping, getting to know each other's experiences, expectations for the class and the environment
- 2022.09.23 (Board notes): Describing the data you have, analyzing pre-course survey results (show some R commands in the process, cleaned survey data here, R code here), cleaning small datasets, executive compensation case study (histograms, boxplots, numerical summaries)
- 2022.09.29 (Board notes): recap, continue executive compensation study (numerical summaries, z-scores, transformations, review summation notation, logarithms, typical algebraic mistakes)
- 2022.09.30 (Board notes): recap, more on effects of transformations, move on to linear regressions (how to interpret, what does lm() do, terminology, and properties of objects from lm())
- 2022.10.06 (Board notes): recap, feedback on Exercise Set 02, correlation and the meaning of the word "regression", ANOVA, reading some more of the output from lm(), starting the mathematics behind lm() while reviewing some matrix algebra
- 2022.10.07 (Board notes): recap, finish most of the mathematics behind lm(), situations of perfect multicollinearity, more on linear regressions with dummy variables, different ways of writing the regression slope (covariance divided by variance, weighted average of the regressand, weighted average of slopes of point pairs), Frisch-Waugh-Lovell theorem (turn multiple regression to simple regression), started linear regressions with logarithmically transformed variables
- 2022.10.13 (Board notes): recap, finished linear regressions with logarithmically transformed variables (illustrated some issues with interpretation of coefficients), linear regressions with dummy variables and interaction terms
- 2022.10.14 (Board notes): recap, more on interaction terms, more illustrations of improving interpretation of regression coefficients, move on to the behavior of regression coefficients
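Two of the algebraic facts listed above, the slope as covariance divided by variance and the Frisch-Waugh-Lovell theorem, can be verified with a few lines of R. The sketch uses the built-in mtcars dataset purely for convenience. That dataset and the choice of mpg, wt, and hp are assumptions, not part of the course materials.

```r
# 1. Simple regression slope equals cov(x, y) / var(x)
fit_simple <- lm(mpg ~ wt, data = mtcars)
slope_cov <- cov(mtcars$wt, mtcars$mpg) / var(mtcars$wt)
all.equal(unname(coef(fit_simple)["wt"]), slope_cov)

# 2. Frisch-Waugh-Lovell: the coefficient on wt in the multiple regression
#    equals the slope from regressing mpg on the part of wt not explained by hp
fit_long <- lm(mpg ~ wt + hp, data = mtcars)
wt_resid <- resid(lm(wt ~ hp, data = mtcars))   # wt with hp partialled out
fit_fwl <- lm(mtcars$mpg ~ wt_resid)
all.equal(unname(coef(fit_long)["wt"]), unname(coef(fit_fwl)["wt_resid"]))
```

Both comparisons should return TRUE, since these are exact properties of least squares rather than approximations.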
Highlighted resources and links:
- Exercise Set 08, first version 2022.11.18, deadline 2022.11.24 1130 UTC+1 (pay attention to the time!!), solutions to technical exercises
- Datasets used in Weeks 5 to 8: CPS5, California test scores dataset (mcas), Pearson heights dataset
- Exercise Set 07, first version 2022.11.05, deadline 2022.11.10 1130 UTC+2, solutions to technical exercises, qmd file
- Exercise Set 06, first version 2022.10.29, deadline 2022.11.03 1130 UTC+2, solutions to technical exercises, possible solution to Technical Exercise 4 about interpretations, Excel file containing balls in a container worksheet
- Exercise Set 05, first version 2022.10.22, deadline 2022.10.31 1730 UTC+2, solutions to technical exercises
- Optional: Proof of FWL here
- Exercise Set 04, first version 2022.10.15, deadline 2022.10.19 1730 UTC+2, solutions to technical exercises, a possible solution to the R exercise, qmd file
- Exercise Set 03, first version 2022.10.08, current version 2022.10.11 (added required HTML submission, fixed typo), deadline 2022.10.13 1730 UTC+2, solutions to technical exercises, solutions to the first part of the R exercises, qmd file
- Exercise Set 02, first version 2022.10.01, current version 2022.10.04 (fixed typo), deadline 2022.10.05 1730 UTC+2, solutions to technical exercises, solutions to R exercises, qmd file
- Exercise Set 01, current version 2022.09.23, deadline 2022.09.28 1730 UTC+2
- CEO compensation datasets from Stine and Foster (2004): File 1, File 2
- Pre-course survey
- How to install R for the first time? Instructions here
- Survey to check whether you have successfully installed R
- fasteR: Fast lane to learning R!
- Typo in Lesson 3: Instead of 1951-1971, it should be 1951-1970.
- Typo in Lesson 4: Instead of "So the 4th, 8th, 9th etc. elements in Nile had the queried property. (Note that those were years 1875, 1879 and so on.)", the years should be computed as 1870 + which(Nile > 1200), which gives 1874, 1878, and so on. The fourth entry of the Nile series is the year 1874, as the data start in 1871.
- A problem with "Your turn" in Lesson 10: There are no missing years, at least based on the documentation and the Nile dataset. The gap means that we don't have any year that has an annual flow between 500 and 600.
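The fasteR corrections above can be checked directly in R with the built-in Nile series. This is a small sketch, and the object names years_high, years_alt, and flows_in_gap are made up for illustration.

```r
# Nile is a built-in annual time series that starts in 1871
start(Nile)

# Positions with flow above 1200, converted to calendar years
years_high <- 1870 + which(Nile > 1200)
head(years_high, 3)

# Alternative that avoids hard-coding 1870: use the series' own time index
years_alt <- time(Nile)[Nile > 1200]
all.equal(years_high, as.numeric(years_alt))

# Lesson 10: count the years with annual flow between 500 and 600
flows_in_gap <- sum(Nile >= 500 & Nile <= 600)
flows_in_gap
```

Using time(Nile) is slightly safer than adding 1870 by hand, because it keeps working if the series started in a different year.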
Study and time management resources