14th April 2023
Updated: 6th August 2023
Note: In this blog post, I do not deal with developing research ideas using randomized controlled trials (RCTs) or lab experiments or using primary data. I restrict myself to using quasi-experimental methods and secondary/observational data only.
Embarking on a PhD in Economics can be an exhilarating yet challenging journey. As a senior-year PhD student, I've spent a significant amount of time refining my approach to developing viable research ideas in applied micro development economics using quasi-experimental methods and secondary data. One of the most important skills a PhD Economics student in applied micro needs to learn is how to reject bad research ideas. The logic is that you must spend the least amount of time thinking about a research idea that is not viable for any number of reasons. This matters because when you are starting out, every research idea seems novel to you. But the truth is that no matter how good a student you are, most of your initial research ideas are going to be crap! This happened to me and happens to everyone else too, simply because mastering the skill takes time and practice. In this blog post, I will share a workhorse method I've developed to help you create impactful causal research questions (or more importantly, reject bad research questions quicker), especially for those who are just starting out with their PhD in Economics.
This method focuses on eight essential components:
(i) selecting the developing country(s) for your research,
(ii) understanding several data sets,
(iii) leveraging exogenous shocks and identification strategies,
(iv) relating shocks to outcome variables using causal inference,
(v) making sure that your research question has not already been answered by existing papers,
(vi) motivating your research question,
(vii) discussing your idea with peers, and
(viii) testing the identifying assumptions and conducting falsification tests.
By following this approach, you will be well on your way to generating compelling research questions that contribute to the field of development economics.
This method is a little (actually, quite) different from the canonical method, which calls for first thinking of an interesting research question and then looking for exogenous shocks and data sets that fit that question. In my experience, the canonical method works well for developed economy contexts like the USA, but for developing country contexts, it often leads to disappointment due to data limitations, which wastes time and effort. Only after a lot of trial and error did I arrive at a method that started showing great results.
Please bear in mind that steps 2 through 6 delineated in the method below need to be worked out simultaneously. And if even one of them doesn't work out, then you must reject your current research idea and move on to the next one.
Following this approach will reduce the amount of time you take to reject a bad research idea, which is one of the most important skills a researcher in applied micro needs to develop. This is because when it comes to research, time is of the essence. Please also note that steps 2 and 3 can take some time, like up to 6 months or even more, before they start showing results. Therefore, patience and persistence are essential throughout this process, which is a general requirement for research anyway.
At this stage, I also wish to clarify that this blog post assumes that you have some knowledge of causal research. Causal inference aims to estimate the causal effect of a treatment (in our case, the exogenous shock) on an outcome variable. Some commonly used causal research methods include difference-in-differences (DID), instrumental variables (IV), and regression discontinuity designs (RDD). Familiarize yourself with these methods to ensure that your research question is grounded in rigorous causal analysis. You can watch videos from this playlist of a short course on Causal Inference and this video on regression discontinuity if you are not familiar with these methods.
Having said that, let's dive right into the method!
Selecting the Developing Country(s) for Your Research
Before diving into the research process, it's essential to be clear about the developing country(s) in the context of which you will be conducting your research. This decision will significantly impact not only the policy implications of your work but also, and more importantly, the data sets and exogenous shocks that you can exploit for your research. Consider factors such as your familiarity with the country, available data sources, and potential research gaps to guide your choice. This is a crucial step without which the steps that follow will be of no use.
Deep Knowledge of Data Sets
The next step in generating viable research ideas is to have a deep understanding of the different secondary data sets available for your chosen context or country. Start by creating a knowledge database that includes information about each data set, such as sample size, variables, data collection methods, and accessibility. This comprehensive database will serve as a valuable resource throughout your research journey, allowing you to make informed decisions about which data sets are best suited for which research question. The best way to get to know about existing data sets is to read the abstracts of several dev econ papers written in the context of your choice, skip ahead to their 'Data' section, look up the data set(s) used by those papers, and finally add information about them to your database. You can also supplement this process by using the new Bing chat, Google Bard, or even good old Google Search.
Read the documentation of these data sets and go over the various variables included in them, and keep doing this repeatedly so that you familiarize yourself with all the data sets. Doing this will help you identify a host of possible outcome variables which will be useful in the steps that follow (it is better to work with outcome variables that are available in existing data sets than fantasizing about outcome variables of your choice and coming up with research questions but then getting disheartened later due to their non-availability).
Another thing to keep in mind is that you want to know about as many different kinds of data sets as possible, ranging from socioeconomic to civic to geologic to geographic and geolocation to judicial to just about anything and everything. This is because you never know what kind of data set could come in handy at what point in time.
As a bonus, if you are conducting research in the context of India, you can use my database on Indian data sets to your benefit. And if your context is a country other than India, maybe you can use it as a possible template for your own database.
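To make the idea of a knowledge database concrete, here is a minimal sketch in Python. It is only one possible layout, not the structure of my own database: the field names, the helper `find_data_sets`, and the three entries are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataSetEntry:
    """One row of the knowledge database (all field names are illustrative)."""
    name: str
    country: str
    sample_size: int
    collection_method: str
    access: str  # e.g. "public", "on request", "paid"
    variables: set[str] = field(default_factory=set)


def find_data_sets(catalog, country, needed_variables):
    """Return names of data sets for `country` that contain every needed variable."""
    needed = set(needed_variables)
    return [
        entry.name
        for entry in catalog
        if entry.country == country and needed <= entry.variables
    ]


# Made-up entries, only to show the lookup; these are not real data sets.
catalog = [
    DataSetEntry("Household Survey A", "India", 40000, "household interviews",
                 "public", {"district", "consumption", "school_enrollment"}),
    DataSetEntry("Health Survey B", "India", 600000, "household interviews",
                 "on request", {"district", "child_height", "child_weight"}),
    DataSetEntry("Labor Survey C", "Kenya", 25000, "household interviews",
                 "public", {"county", "wages"}),
]

matches = find_data_sets(catalog, "India", ["district", "child_height"])
print(matches)  # ['Health Survey B']
```

A spreadsheet works just as well. The point is that once the variables of each data set are written down in one place, checking whether an outcome variable exists for your context becomes a quick lookup rather than a fresh search.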
Identifying Exogenous Shocks and Developing Identification Strategies
Next, focus on understanding various exogenous shocks, such as government policies or natural occurrences/disasters, that can serve as potential sources of variation in your study. This is essential to get rid of endogeneity to be able to pin down causality as opposed to simple correlation.
To help with this task, consider creating a database of existing research papers that discuss exogenous shocks and the identification strategies used in them, specific to your chosen country(s) or context. Sometimes you can even get to know about such exogenous shocks by watching YouTube videos, listening to podcasts, and watching movies. Thus, keep your eyes and ears open!
The way I go about it is that I maintain folders on my hard drive for different exogenous shocks. Within each of them, I create three different subfolders, namely
(i) About the Shock
(iii) ID Strategy
In the first subfolder, as the name suggests, I store documents that tell one about the shock or policy, i.e., the when, the where, and the how. In the second one, I store data sets that could help me identify the locations of the shock (for example, a list of districts where the policy was implemented, which could be used in a DID setup, or data on the scores assigned to each district based on which the policy was implemented in some districts, which could be used in an RDD setup). This kind of information/data may not always be readily available (you are lucky if it is), so you may have to generate such a data set by reading several government documents. I know this is a tedious process, but the returns are huge in the long term. In the third subfolder, I store either research papers that have already used the said exogenous shock or a text document that explains a possible identification strategy that I came up with.
As an illustration, this paper uses the Integrated Child Development Services or ICDS policy (a government policy in India) as an exogenous shock to find its effects on women's health and nutrition using propensity score matching (PSM). As another example, this paper uses the locust attack in northwestern India in early 2020 (a natural occurrence/disaster) to estimate its effect on agricultural outcomes using DID. Both these exogenous shocks (ICDS and the locust attacks), in conjunction with their respective identification strategies, could be reused to assess their effects on a host of other outcome variables.
Once you have a solid grasp of different exogenous shocks and identification strategies, you can begin to develop your own research question. Ensure that your question is framed in terms of the causal effect of the exogenous shock (X) on specific outcome variables (Y).
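As an example of what "the causal effect of X on Y" looks like once written down, here is a textbook two-way fixed effects DID specification. The notation is my own choice for illustration: d indexes districts, t indexes time periods, Treat_d marks districts exposed to the shock, and Post_t marks periods after the shock.

```latex
Y_{dt} = \alpha_d + \lambda_t + \beta \,(\text{Treat}_d \times \text{Post}_t) + \varepsilon_{dt}
```

Here \alpha_d and \lambda_t are district and time fixed effects, and \beta is the coefficient of interest. It can be read as the causal effect of the shock on Y only if treated and untreated districts would have followed parallel trends in the absence of the shock.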
Relating Exogenous Shocks to Outcome and other Variables using Causal Inference
With your knowledge of data sets and exogenous shocks in hand, you can now begin relating the chosen exogenous shock and its identification strategy to a group of outcome variables using the existing data sets.
Draw upon your databases to find potential links between the shock and the outcome variables, and ensure that the research question you develop is both relevant and innovative within the field. Apart from that, ensure that the variables required for identification are either available within the secondary data sets you have access to or can be generated as a separate data set based on information from the internet and other sources. Also think about other potential variables that could act as mediating channels between the exogenous shock and your chosen outcome variables. For instance, the effect of a computer-based teaching intervention (exogenous shock) on students' test scores (an outcome) could arise due to students' interest in academics (mechanism or mediating channel).
Keep in mind that working with the outcome variables you really care about may not always be possible because of data limitations specifically in the context of developing countries. Hence, be broad-minded about outcome variables outside of your liking too.
As soon as you think your research idea is taking shape at this point, immediately note it down somewhere with details such as the main research question, identification strategy(s), robustness checks, falsification tests (it is imperative to design them at this stage), mechanisms, and potential data sets to be used. Maintain this document, keep incorporating new ideas into it, and keep refining those ideas over time.
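If you prefer something more structured than a free-form document, the note could look like the sketch below. The class `ResearchIdea` and its field names are hypothetical, and treating the identification strategy, falsification tests, and data sets as the parts that must not be empty is my own simplification. The example idea reuses the computer-based teaching illustration from above, with a made-up identification strategy.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchIdea:
    """A structured note for one research idea (field names are illustrative)."""
    question: str
    identification_strategies: list[str] = field(default_factory=list)
    robustness_checks: list[str] = field(default_factory=list)
    falsification_tests: list[str] = field(default_factory=list)
    mechanisms: list[str] = field(default_factory=list)
    data_sets: list[str] = field(default_factory=list)

    def missing_parts(self):
        """List the components assumed essential here that are still empty."""
        required = {
            "identification strategy": self.identification_strategies,
            "falsification tests": self.falsification_tests,
            "data sets": self.data_sets,
        }
        return [name for name, items in required.items() if not items]


idea = ResearchIdea(
    question="Effect of a computer-based teaching intervention on test scores",
    identification_strategies=["DID across districts (hypothetical)"],
    mechanisms=["students' interest in academics"],
)

print(idea.missing_parts())  # ['falsification tests', 'data sets']
```

Anything returned by `missing_parts` is a reminder of what still has to be worked out before the idea can be taken further.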
Making Sure Your Research Question Has Not Already Been Answered
You need to make sure that the question you just came up with has not already been researched by other authors in the same context (or sub-context) as yours. If it has and you still embark on answering that question, then it amounts to plagiarism, which is a highly unethical practice in academia. It is better to give up the research idea at this point than to pursue it further.
However, there are some nuances. For instance, if the question has been answered in the context of your choice, let's say, India, at the state government level, but you wish to assess the question at the village council level, then it may not be considered plagiarism since there are several reasons to believe that there are huge institutional differences between the two subcontexts. That is why it is important to choose a context and gain mastery over its institutional setup. With respect to ascertaining if you are committing plagiarism by still pursuing the project, it is best to take advice from senior PhD students and professors working in the context of your choice.
Consider using services like Google Scholar, www.ideas.repec.org, the new AI-powered Bing chat, and others to help streamline your literature review process.
Motivating Your Research Question
Finally, you must provide a strong motivation for your research question. This step involves conducting a thorough literature review to identify gaps in existing knowledge and demonstrate the importance of your research. Consider using AI services like www.elicit.org and others to help streamline the literature review process.
Your motivation should answer two crucial questions: Why should we care about your research question, and what does society gain from answering it? A well-motivated research question will not only contribute to the academic field but also have real-world implications, helping to shape policy and improve lives.
Discussing Your Idea with Peers
Once you are sure that all the boxes above have been checked, it is time for the first litmus test. Identify senior PhD students and development economics professors in your network whom you trust not to steal your idea and discuss it with them.
Try to gauge the level of their excitement when you give them a brief of your research question along with its viability in terms of the identification strategy and data availability. If you can get most of them excited enough, then you know that you have finally done it. This is because senior PhD students and professors generally have a much broader idea of the literature in development economics than junior PhD students do and hence know what the returns would be to a truly novel and viable idea in the subfield.
Since you want to maximize your returns from investing time and effort in the idea, it becomes almost imperative to get peer feedback. You could try doing this at the weekly brownbag student seminars that might be taking place in your department. If you aren't aware of anything of that sort, try asking senior PhD students about where they get feedback from for their ideas and build your own network from there.
Identifying Assumptions and Falsification Tests
Once you have checked everything above, the last thing to do is to carry out some initial analysis on your data. This is the final litmus test for your idea to work. Get your data cleaned up, glance over the summary statistics to get an idea of the setup and see if all looks good, and then, without thinking too much, test whether the identifying assumptions of your chosen causal inference method are satisfied. For example, if you are using DID, then you need to test the parallel trends assumption, and if you are using RDD, then you need to pass the McCrary (2008) density test, which checks for manipulation of the running variable around the cutoff. One last thing to do is to conduct the falsification tests that you would have designed in step 4. It is useful to conduct them before running the main regressions on your outcome variables because if you first run the latter and then the falsification tests don't work out, then you would have wasted more time in rejecting the research idea than you should have.
Satisfying both the identifying assumptions and the falsification tests is necessary before delving into the meat of your research project. If either of these conditions is not met, then reject the research idea and move on to the next one. However, consult your advisors before completely chucking the idea because
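To illustrate one such falsification test for DID, here is a minimal sketch of a placebo-in-time check on made-up data. The function `did_2x2`, the years, and the numbers are all hypothetical, and the data has no noise so that the arithmetic is easy to follow. With real data you would run this as a regression with standard errors rather than compare raw means.

```python
from statistics import mean


def did_2x2(rows, post_start):
    """2x2 DID: (treated post - treated pre) - (control post - control pre).

    rows: list of (treated: bool, year: int, outcome: float).
    Years >= post_start count as the post period.
    """
    def cell(treated, post):
        return mean(
            y for t, yr, y in rows
            if t == treated and (yr >= post_start) == post
        )

    return (cell(True, True) - cell(True, False)) - (
        cell(False, True) - cell(False, False)
    )


# Made-up district-year outcomes; the hypothetical policy starts in 2015.
# Both groups share a common trend of +1 per year, and treated districts
# gain an extra 3 once the policy is in place.
rows = []
for year in range(2011, 2019):
    trend = float(year - 2011)
    rows.append((False, year, 10.0 + trend))
    rows.append((True, year, 12.0 + trend + (3.0 if year >= 2015 else 0.0)))

# Placebo: keep only pre-policy years and pretend the policy began in 2013.
pre_rows = [r for r in rows if r[1] < 2015]
placebo = did_2x2(pre_rows, post_start=2013)
actual = did_2x2(rows, post_start=2015)

print(placebo)  # 0.0
print(actual)   # 3.0
```

A placebo estimate close to zero is consistent with parallel pre-trends. A placebo estimate far from zero is the kind of early warning that should make you reconsider the idea before running the main regressions.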
(i) you would have already invested a small amount of time and energy into it and because of that you don't want to waste the opportunity to revive the project if it can be saved, and
(ii) sometimes there are ways around the identifying assumptions and falsification tests not working out which your advisors may be more aware of than you are.
Only after you pass this stage is your idea finally ready to take off.
Developing viable research ideas in applied micro development economics using quasi-experimental methods and secondary data may seem daunting, but by focusing on these eight key components, you can create a workhorse method that guides you through the process. Remember to select your developing country(s) of interest, invest time in understanding data sets and exogenous shocks, relate these shocks to outcome variables using causal inference, make sure your question is novel, motivate your research question, obtain peer feedback, and test the identifying assumptions of your chosen method and conduct falsification tests. Keep in mind that creating the necessary databases can take time and patience, but with persistence, you will be well-equipped to contribute to the field of development economics and make a lasting impact with your research.