Individual paper

Changelog

2022-12-04: Details of the third and fourth milestones are released. The deadline for the third milestone is extended by one week.

2022-11-18: Updated article selections, added some questions raised by students.

2022-11-11: Details of the second milestone are released.

2022-11-04 to 2022-11-10: Some article selections were made.

2022-11-03: A first version of this page was created.

Some questions which were asked

Major tasks for the individual paper

The individual paper is mainly a report, written in Quarto, that documents and discusses your attempts to reproduce and extend the analyses conducted by the authors of an article chosen from the list below. You will be applying everything that you have learned in the course (econometrics, both theory and applications, perhaps some Monte Carlo simulations, coding in R, writing Quarto documents) and perhaps more, depending on your interests and learning goals. Because it is a formal report, references and citations are required.

You have recently done an exercise to reproduce the findings in Hamermesh and Parker (2005). But the exercise is based on a dataset which was already cleaned and made available to you. You will now be pursuing something more challenging.

There are two major aspects of the individual paper:

  1. You will be creating the dataset from scratch based on the descriptions given by the authors of the article. You are not allowed to contact the authors of the article. The major reason is so that you have the chance to really dig into data cleaning, learn about the dataset you are using, and perhaps learn some other tools depending on your circumstances. It is desirable to achieve a perfect reproduction of the results and your attempts should be in line with that goal. But because I want you to think of the individual paper as more of a learning experience, the grading does not depend on whether you have achieved a perfect reproduction of the results. This does not mean that you can do the reproduction haphazardly: a careless attempt will be apparent from your report and code.
  2. You will be extending the analyses made by the authors. The extensions could include but are not limited to: changing the sample considered, adding additional years to the dataset, changing to another country’s dataset, using a different dataset from the same country, considering a different set of regressands or regressors, changing techniques, or carrying out crucial checks which are suitable for the situation. It does not mean that all of these extensions could be done. In fact, it would be desirable to have an extension that is well-motivated and perhaps supported by other findings (whether based on theory or in light of developing facts about the situation) beyond that found in the article. Of course, you should be able to conduct your analyses based on your selected extensions.

Some of the papers have code and data available online in some repository. You may consult these materials, but all processing, cleaning, and analysis have to be done in R. This means that your Quarto document should start from loading the rawest data (meaning that I should be able to go to IPUMS and download the same raw data by following your documentation) and should contain explicit commands that trace the processing of the raw data into the data used for the reproduction and extension. Furthermore, you still have to construct the data from scratch based on the descriptions in the article.
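As a rough illustration of what "explicit commands which can trace the processing" could look like, here is a minimal sketch in base R. The toy data frame, the missing-value code, and the sample restriction are all assumptions made for illustration only; in a real report the raw extract would be read from the files you downloaded (for example with `ipumsr::read_ipums_ddi()` and `ipumsr::read_ipums_micro()`), and the restrictions would follow the article.

```r
# Hypothetical stand-in for a raw IPUMS extract (toy values, for illustration only).
# In a real report, this object would come from reading the downloaded extract.
raw <- data.frame(
  YEAR    = c(2000, 2000, 2000, 2000, 2000),
  AGE     = c(17, 25, 40, 66, 33),
  INCWAGE = c(0, 32000, 999999, 15000, 48000)
)

# Step 1: recode a placeholder missing-value code to NA.
# (999999 is an assumed code here; check the codebook of the actual variable.)
step1 <- transform(raw, INCWAGE = ifelse(INCWAGE == 999999, NA, INCWAGE))

# Step 2: apply an assumed sample restriction: ages 18 to 65 with positive wages.
analysis <- subset(step1, AGE >= 18 & AGE <= 65 & !is.na(INCWAGE) & INCWAGE > 0)

# Step 3: construct the variable used in the regressions.
analysis$log_wage <- log(analysis$INCWAGE)

# Keep a record of how many observations survive, so the report can document it.
sample_log <- data.frame(
  step = c("raw extract", "after restrictions"),
  n    = c(nrow(raw), nrow(analysis))
)
print(sample_log)
```

Keeping each step as a separate, commented object makes it easier for a reader to see where observations are dropped and to compare your sample sizes with those reported in the article.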

Where to get the data

Remember the account I asked you to make at IPUMS? All the articles feature data obtainable from IPUMS. It might be a good idea to confine your search to this big source. For the extension, please inform me if you decide to use data that is not readily available at IPUMS. I still want you to be able to finish writing the report.

Article list

You are to select a paper from the list below. Since there are more students than papers, some students are likely to share papers. Only a maximum of 3 students will be allowed to work on a specific article. Priority will be given to those who choose early. I will be updating this page regularly to indicate which articles are more in demand.

Every student still has to work individually and there should be no duplications in the code, the approach to achieve reproducibility, the extensions to be pursued, and of course, the final report.

Here is the list of articles:

Milestones

The individual paper is worth 30 percent of the final grade. There are four milestones. Each milestone is graded and contributes to the 30 percent.

  • The first milestone is about carefully selecting the paper you want to work on. You will send an email indicating your choice of paper. This will be due on 2022-11-10, 1730 UTC+1. This milestone is graded on an integer scale from 0 to 2.
  • The second milestone is to do a bit of a writeup about the paper and the planned extensions. The details about this second milestone will be available soon. This will be due on 2022-11-25, 1730 UTC+1. This milestone is graded on an integer scale from 0 to 3. The writeup here will eventually make its way into your first draft.
  • The third milestone is your first draft. By this point, you should already have reproduced most of the results of the paper you selected, documented the code properly, and perhaps have started working on your extensions. More details related to this first draft will be available on this webpage. This will be due on 2022-12-23 (extended from 2022-12-16), 1730 UTC+1. This milestone is graded on an integer scale from 0 to 10. While this third milestone is being assessed, you should continue working on the draft and catch up if you are falling behind. Feedback will be released a week later so that you have time to carry out the revisions and to continue your analysis.
  • The fourth milestone is your revised final version of the individual paper. More details related to this will be available on this webpage. This will be due on 2023-01-09, 1400 UTC+1. This milestone is graded on an integer scale from 0 to 15.

Fourth milestone

The fourth milestone is your final draft. At the minimum, your submission should be such that, if I render your qmd file (along with your bib and data) on my computer, I am able to reproduce your rendered HTML file with minimal adjustments.

You are also not graded on how closely you reproduce the results of the paper, as the article you have chosen may not give you complete information.

As for the tables or visualizations, you can use any R package (say, modelsummary, sjPlot, and others) to produce them. Your tables do NOT have to be literally the same as the tables produced by the authors of your chosen article.

Grading

You are graded for (in no particular order):

  1. Everything you have related to the second and third milestone, along with the revisions
  2. Documentation of how you have constructed the data used in your chosen extension
  3. Discussion of your findings related to the results of your chosen extension: For example, how are things similar or different compared to the findings of your chosen article?
  4. The compatibility of your documentation, findings, tables, and visualizations with your R code (ensure readability and make sure to have enough comments)
  5. At least one well-designed table or well-designed visualization of the most important results which convey the contrast among the original findings, the findings from trying to reproduce the original findings, and the findings from your chosen extension
  6. Cohesiveness and conciseness of your entire individual paper
  7. Completeness of the submission: Quarto document (qmd, bib), your rendered HTML file, and data in a compressed format (zip, gz, 7z, rar are all acceptable)
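For the table in item 5, one possible starting point is a small data frame that places the published estimate, your reproduced estimate, and the estimate from your extension side by side. The sketch below is only an illustration under stated assumptions: `mtcars` is a stand-in dataset shipped with R, the two regressions are placeholders for your own models, `coef_row()` is a hypothetical helper, and the published value is left as `NA` because it has to be typed in from your chosen article.

```r
# mtcars is only a stand-in dataset shipped with R; replace it with your own data.
reproduction <- lm(mpg ~ wt, data = mtcars)       # placeholder for the reproduced model
extension    <- lm(mpg ~ wt + hp, data = mtcars)  # placeholder for the extended model

# Hypothetical helper: one row with the estimate and standard error of one term.
coef_row <- function(model, term, label) {
  tab <- coef(summary(model))
  data.frame(
    source    = label,
    estimate  = tab[term, "Estimate"],
    std_error = tab[term, "Std. Error"]
  )
}

# The published numbers must be entered by hand from the article; NA is a placeholder.
comparison <- rbind(
  data.frame(source = "Original article", estimate = NA_real_, std_error = NA_real_),
  coef_row(reproduction, "wt", "Reproduction"),
  coef_row(extension, "wt", "Extension")
)
print(comparison)
```

A data frame like this can then be passed to whichever table or plotting package you prefer for formatting.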

You get 0 if you do not submit on time. If there are no citations and there are indications of plagiarism, you also automatically get 0 and you will be reported to the administration. Starting from a maximum total integer score of 15, every element that is:

  1. lacking will lead to a deduction of 1
  2. moderately lacking will lead to a deduction of 2
  3. extremely lacking will lead to a deduction of 3

Third milestone

The third milestone is your first draft. At the minimum, your submission should be such that, if I render your qmd file (along with your bib and data) on my computer, I am able to reproduce your rendered HTML file with only minimal adjustments.

You are also not graded on how closely you reproduce the results of the paper, as the article you have chosen may not give you complete information.

Submission

I set up a private folder in Tresorit for each student to upload their files. I will send a link to each student within the week. This private folder is only shared between each student and myself. It is unavoidable that an account will have to be created to work with this private folder. You will upload your files into that folder.

Note that you may also use it as your project cloud for the duration of your research on your individual paper. This is a good way to back up and sync your research files to the cloud. Of course, I am able to see the contents of what you upload. I will delete this folder one month after January 9, 2023.

Grading

You are graded for (in no particular order):

  1. Documentation of how you have reconstructed the data used in your chosen article
  2. Discussion of your findings related to reproducing the results of your chosen article: For example, how are things similar or different? What choices did you make when the details in the chosen article were not complete?
  3. The compatibility of your documentation and findings with your R code (ensure readability and make sure to have enough comments)
  4. Completeness of the submission: Quarto document (qmd, bib), your rendered HTML file, and data in a compressed format (zip, gz, 7z, rar are all acceptable)
  5. Cohesiveness and conciseness of your entire individual paper

You get 0 if you do not submit on time. If there are no citations and there are indications of plagiarism, you also automatically get 0 and you will be reported to the administration. Starting from a maximum total integer score of 10, every element that is:

  1. lacking will lead to a deduction of 1
  2. moderately lacking will lead to a deduction of 2
  3. extremely lacking will lead to a deduction of 3

Optional part of third milestone

If you have a similar writeup, covering aspects of the graded items listed earlier, for your chosen extension, feel free to make it part of your first draft. But you are NOT graded on this aspect. Therefore, it is optional to include aspects related to your chosen extension for now.

Second milestone

The second milestone is a brief writeup about the article you have chosen and the planned extensions. You are expected to hand in a Quarto document containing answers to the two sets of questions listed below. These answers have to be written as prose rather than as an enumeration. Furthermore, you will now be citing sources in your Quarto document. This Quarto document will be the main document on which your final report is built.

Here is another template which will allow for references.
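To give a rough idea of how references work in a Quarto document, the sketch below uses R to write a minimal qmd skeleton to a temporary folder. It is not the course template. The file names `paper.qmd` and `references.bib` and the citation key `hamermesh2005` are assumptions for illustration; the key has to match an entry that you add to your own bib file.

```r
# Lines of a minimal Quarto document with a bibliography (illustrative skeleton only).
qmd_lines <- c(
  "---",
  "title: \"Individual paper\"",
  "format: html",
  "bibliography: references.bib",  # assumed file name; must sit next to the qmd file
  "---",
  "",
  "The in-class exercise followed @hamermesh2005.",  # hypothetical citation key
  "",
  "## References"
)

# Write the skeleton to a temporary folder so that nothing in your project is overwritten.
qmd_path <- file.path(tempdir(), "paper.qmd")
writeLines(qmd_lines, qmd_path)
```

When the document is rendered, every key cited with `@` is looked up in the bib file and the reference list is placed at the end of the document.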

Guide questions about the article you have chosen

  1. What do you think were the questions being asked by the author of the article?
  2. Why is it important to find out the answers to these questions? Why should anyone care?
  3. What is/are the data source(s) used to answer the questions?
  4. What did the author choose to present in order to provide answers to the main question?
  5. Are there any details of measurement you want to take note of?
  6. How is the effect the authors care about estimated? Do the available data work for this purpose?
  7. Do you think the results are believable?

Guide questions about the planned extension

You may have to cite sources when you answer these questions.

  1. What kind of extension are you planning to do?
  2. What is the driving motivation behind your extension? What is it that you want to know?
  3. Why is your extension different? What have you done that previous authors did not do? Why should anyone care about your extension?
  4. What is/are the data source(s) used for the extension? Are there measurement issues you want to take note of?

Grading

  1. You get 0 if you do not submit on time.
  2. If there are no citations and there are indications of plagiarism, you also automatically get 0 and you will be reported to the administration.
  3. There are three elements that matter in this milestone: level of understanding with respect to your chosen article, the suitability and maturity of the proposed extension(s), and the overall writing. If we start from a maximum score of 3, then every element that is extremely lacking will lead to a deduction of 1.
  4. Feedback will be provided so that you can incorporate it in your next milestone.