If we could observe, for each individual, <math>y_{1}(i)</math> and <math>y_{0}(i)</math> among a large representative sample of the population, we could estimate the ATE simply by taking the average value of <math>y_{1}(i)-y_{0}(i)</math> across the sample. However, we can not observe both <math>y_{1}(i)</math> and <math>y_{0}(i)</math> for each individual since an individual cannot be both treated and not treated. For example, in the drug example, we can only observe <math>y_{1}(i)</math> for individuals who have received the drug and <math>y_{0}(i)</math> for those who did not receive it. This is the main problem faced by scientists in the evaluation of treatment effects and has triggered a large body of estimation techniques. | If we could observe, for each individual, <math>y_{1}(i)</math> and <math>y_{0}(i)</math> among a large representative sample of the population, we could estimate the ATE simply by taking the average value of <math>y_{1}(i)-y_{0}(i)</math> across the sample. However, we can not observe both <math>y_{1}(i)</math> and <math>y_{0}(i)</math> for each individual since an individual cannot be both treated and not treated. For example, in the drug example, we can only observe <math>y_{1}(i)</math> for individuals who have received the drug and <math>y_{0}(i)</math> for those who did not receive it. This is the main problem faced by scientists in the evaluation of treatment effects and has triggered a large body of estimation techniques. |