Impact evaluation: Difference between revisions
Revision as of 08:21, 13 February 2011
An impact evaluation is a study designed to estimate the effects that can be attributed to a policy program or intervention. Impact evaluation is a useful tool for measuring a program’s effectiveness because it does not merely examine whether the program’s goals were met; it determines whether those goals would have been met in the absence of the program. Reliably quantifying impact is particularly important for increasing the effectiveness of aid delivery and public spending aimed at improving living standards in developing nations.
How is a policy’s impact measured?
There are multiple methods for conducting rigorous impact evaluations, yet all necessarily rely on simulating the counterfactual—in other words, estimating what would have happened to the scrutinized group in the absence of the intervention. Counterfactual analysis thus requires a ‘control’ group—people unaffected by the policy intervention—to compare to the program’s beneficiaries, who comprise the ‘treatment’ group of a population sample. The ability to draw causal inferences from the impact evaluation crucially depends on the two groups being statistically identical, meaning there are no systematic differences between them.

Systematic differences are best minimized through random assignment: a law of statistics guarantees that large enough sample sizes of people randomly assigned will generate statistically identical comparison groups. Thus, the control group mimics the counterfactual, and any differences that arise between the two groups after the program is implemented may be reliably attributed to the program.
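The logic above—random assignment producing statistically identical groups, so that the difference in group means estimates the program’s impact—can be sketched as a small simulation. All numbers here are hypothetical and chosen only for illustration:

```python
import random
import statistics

random.seed(0)

# Hypothetical population: each person's outcome in the absence of the
# program. The program is assumed to raise the outcome by a fixed amount.
TRUE_EFFECT = 2.0
population = [random.gauss(10, 3) for _ in range(10_000)]

# Draw a sample and randomly assign half to treatment, half to control.
sample = population[:2000]
random.shuffle(sample)
treatment, control = sample[:1000], sample[1000:]

# Treated outcomes include the program effect; the control group,
# untouched by the program, mimics the counterfactual.
treated_outcomes = [y + TRUE_EFFECT for y in treatment]
control_outcomes = control

# With random assignment, the difference in group means is an unbiased
# estimate of the program's impact.
estimate = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"estimated impact: {estimate:.2f} (true effect: {TRUE_EFFECT})")
```

With 1,000 people per group the estimate lands close to the true effect; shrinking the sample makes the two groups less comparable and the estimate noisier, which is why the “large enough sample sizes” caveat matters.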
While random assignment maximizes the impact evaluation’s internal validity (the confidence that observed differences between the groups were caused by the program rather than by other factors), there remain inherent limits to its external validity (the ability to generalize the study’s results to other populations and settings). Testing a policy program in multiple, disparate contexts helps determine whether its results are generally replicable and thus worth “scaling up,” and knowledge of how those contexts differ helps predict where replication is likely. For example, consider a policy intervention designed to increase school enrollment by informing parents about the positive correlation between additional schooling and increased wages: in a school system where parents tend to underestimate the effect of additional schooling on wages, the information is more likely to influence them than in a school system where parents generally overestimate that effect.
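The schooling example can be made concrete with a toy model of how the same information campaign plays out in two school systems. The prior beliefs, the true wage return, and the response rule below are all hypothetical, invented purely to illustrate why results may not replicate across contexts:

```python
import statistics

# Hypothetical true wage return to an extra year of schooling.
TRUE_RETURN = 0.08

# Hypothetical parental beliefs about that return in two school systems.
priors_system_a = [0.03, 0.04, 0.05, 0.04, 0.03]  # parents underestimate
priors_system_b = [0.12, 0.11, 0.13, 0.12, 0.14]  # parents overestimate

def enrollment_response(priors):
    """Toy response rule: informing parents moves their belief to the
    true return, and enrollment rises only when the update is upward."""
    return statistics.mean(max(TRUE_RETURN - p, 0) for p in priors)

# The identical intervention raises enrollment in system A but not in
# system B, so an evaluation run only in A would overstate the
# program's impact elsewhere.
effect_a = enrollment_response(priors_system_a)
effect_b = enrollment_response(priors_system_b)
print(f"system A: {effect_a:.3f}, system B: {effect_b:.3f}")
```

A single evaluation cannot reveal this heterogeneity; only replication across contexts—or direct measurement of the beliefs the intervention acts on—can.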