第11行: |
第11行: |
| This idea is part of the [[Rubin Causal Model|Rubin Causal Inference Model]], developed by [[Donald Rubin]] in collaboration with [[Paul R. Rosenbaum|Paul Rosenbaum]] in the early 1970s. The exact definition differs between their articles from that period. In a 1978 article, Rubin discusses ''ignorable assignment mechanisms'',<ref name="rubin78">{{cite journal |last1=Rubin |first1=Donald |title=Bayesian Inference for Causal Effects: The Role of Randomization |journal=The Annals of Statistics |date=1978 |volume=6 |issue=1 |pages=34–58|doi=10.1214/aos/1176344064 |doi-access=free }}</ref> which can be understood to mean that the way individuals are assigned to treatment groups is irrelevant for the data analysis, given everything that is recorded about each individual. Later, in 1983,<ref>{{cite journal |last1=Rubin |first1=Donald B. |last2=Rosenbaum |first2=Paul R. |title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |date=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.2307/2335942 |jstor=2335942 |doi-access=free }}</ref> Rubin and Rosenbaum instead define ''strongly ignorable treatment assignment'', a stronger condition, mathematically formulated as <math>(r_1,r_0) \perp \!\!\!\perp z \mid v ,\quad 0<\operatorname{pr}(z=1 \mid v)<1 \quad \forall v</math>, where <math>r_t</math> is the potential outcome under treatment <math>t</math>, <math>v</math> is a vector of covariates and <math>z</math> is the actual treatment. | | This idea is part of the [[Rubin Causal Model|Rubin Causal Inference Model]], developed by [[Donald Rubin]] in collaboration with [[Paul R. Rosenbaum|Paul Rosenbaum]] in the early 1970s. The exact definition differs between their articles from that period. 
In a 1978 article, Rubin discusses ''ignorable assignment mechanisms'',<ref name="rubin78">{{cite journal |last1=Rubin |first1=Donald |title=Bayesian Inference for Causal Effects: The Role of Randomization |journal=The Annals of Statistics |date=1978 |volume=6 |issue=1 |pages=34–58|doi=10.1214/aos/1176344064 |doi-access=free }}</ref> which can be understood to mean that the way individuals are assigned to treatment groups is irrelevant for the data analysis, given everything that is recorded about each individual. Later, in 1983,<ref>{{cite journal |last1=Rubin |first1=Donald B. |last2=Rosenbaum |first2=Paul R. |title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |date=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.2307/2335942 |jstor=2335942 |doi-access=free }}</ref> Rubin and Rosenbaum instead define ''strongly ignorable treatment assignment'', a stronger condition, mathematically formulated as <math>(r_1,r_0) \perp \!\!\!\perp z \mid v ,\quad 0<\operatorname{pr}(z=1 \mid v)<1 \quad \forall v</math>, where <math>r_t</math> is the potential outcome under treatment <math>t</math>, <math>v</math> is a vector of covariates and <math>z</math> is the actual treatment. |
| | | |
− | 这个想法是20世纪70年代早期[[Donald Rubin]]和[[Paul R. Rosenbaum|Paul Rosenbaum]] 合作提出的[[鲁宾因果推理模型]]的一部分。但那时,他们文章中可忽略性的确切定义不同。1978年鲁宾在一篇文章中讨论了''可忽略的分配机制'',<ref name="rubin78">{{cite journal |last1=Rubin |first1=Donald |title=Bayesian Inference for Causal Effects: The Role of Randomization |journal=The Annals of Statistics |date=1978 |volume=6 |issue=1 |pages=34–58|doi=10.1214/aos/1176344064 |doi-access=free }}</ref> 其可理解为将个体分配到处理组的方式与数据分析无关,因为已经记录了有关该个体的所有信息。后来,在 1983 年,Rubin 和 Rosenbaum 更确切地定义了“处理分配的强可忽略性” <ref>{{cite journal |last1=Rubin |first1=Donald B. |last2=Rosenbaum |first2=Paul R. |title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |date=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.2307/2335942 |jstor=2335942 |doi-access=free }}</ref>,这是一个更强的假设条件,数学公式为<math>(r_1,r_0) \perp \!\!\!\perp z \mid v ,\quad 0<\operatorname{pr}(z=1)<1 \quad \forall v</math>,其中<math>r_t</math>是给定处理状态 <math>t</math>下的潜在结果,<math>v</math> 是协变量,<math>z</math> 是实际的处理状态。 | + | 这个想法是20世纪70年代早期[[Donald Rubin]]和[[Paul R. Rosenbaum|Paul Rosenbaum]] 合作提出的[[鲁宾因果推理模型]]的一部分。但那时,他们文章中可忽略性的确切定义不同。1978年鲁宾在一篇文章中讨论了''可忽略的分配机制''<ref name="rubin78">{{cite journal |last1=Rubin |first1=Donald |title=Bayesian Inference for Causal Effects: The Role of Randomization |journal=The Annals of Statistics |date=1978 |volume=6 |issue=1 |pages=34–58|doi=10.1214/aos/1176344064 |doi-access=free }}</ref> ,其可理解为将个体分配到处理组的方式与数据分析无关,因为已经记录了有关该个体的所有信息。后来,在 1983 年,Rubin 和 Rosenbaum 更确切地定义了“处理分配的强可忽略性”<ref>{{cite journal |last1=Rubin |first1=Donald B. |last2=Rosenbaum |first2=Paul R. 
|title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |date=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.2307/2335942 |jstor=2335942 |doi-access=free }}</ref>,这是一个更强的假设条件,数学公式为<math>(r_1,r_0) \perp \!\!\!\perp z \mid v ,\quad 0<\operatorname{pr}(z=1)<1 \quad \forall v</math>,其中<math>r_t</math>是给定处理状态 <math>t</math>下的潜在结果,<math>v</math> 是协变量,<math>z</math> 是实际的处理状态。 |
| | | |
| | | |
| Pearl [2000] devised a simple graphical criterion, called ''back-door'', that entails ignorability and identifies sets of covariates that achieve this condition. | | Pearl [2000] devised a simple graphical criterion, called ''back-door'', that entails ignorability and identifies sets of covariates that achieve this condition. |
| | | |
− | Pearl [2000]设计了一个简单的图形标准,称为“后门”(back-door) ,它需要可忽略性并确定达到这种条件的协变量集。 | + | Pearl [2000]设计了一个简单的图形准则,称为“后门”(back-door) ,它需要可忽略性并确定达到这种条件的协变量集。 |
| | | |
| | | |
| Ignorability (better called exogeneity) simply means that we can ignore how a person ended up in one group rather than the other (‘treated’, Tx = 1, or ‘control’, Tx = 0) when it comes to the potential outcome (say Y). It is also called unconfoundedness, selection on the observables, or no omitted variable bias.<ref>{{cite journal|last1=Yamamoto|first1=Teppei|title=Understanding the Past: Statistical Analysis of Causal Attribution|journal=Journal of Political Science|date=2012|volume=56|issue=1|pages=237–256|doi=10.1111/j.1540-5907.2011.00539.x|hdl=1721.1/85887}}</ref> | | Ignorability (better called exogeneity) simply means that we can ignore how a person ended up in one group rather than the other (‘treated’, Tx = 1, or ‘control’, Tx = 0) when it comes to the potential outcome (say Y). It is also called unconfoundedness, selection on the observables, or no omitted variable bias.<ref>{{cite journal|last1=Yamamoto|first1=Teppei|title=Understanding the Past: Statistical Analysis of Causal Attribution|journal=Journal of Political Science|date=2012|volume=56|issue=1|pages=237–256|doi=10.1111/j.1540-5907.2011.00539.x|hdl=1721.1/85887}}</ref> |
| | | |
− | 可忽略性(称为外生性更好)其简明含义是,当涉及到潜在结果(Y)时,一个人是怎样最终处于一个群体中而非另一个群体中(“处理组”Tx = 1,或“控制组”Tx = 0)我们是可忽略的。它也被称为非混淆性,基于可观测变量的选择选择的可观察的,或无遗漏变量偏差<ref>{{cite journal|last1=Yamamoto|first1=Teppei|title=Understanding the Past: Statistical Analysis of Causal Attribution|journal=Journal of Political Science|date=2012|volume=56|issue=1|pages=237–256|doi=10.1111/j.1540-5907.2011.00539.x|hdl=1721.1/85887}}</ref>。 | + | 可忽略性(称为外生性更好)其简明含义是,当涉及到潜在结果(Y)时,一个人是怎样最终处于一个群体中而非另一个群体中(“处理组”Tx = 1,或“控制组”Tx = 0)我们是可忽略的。它也被称为非混淆性、基于可观测变量的选择或无遗漏变量偏差<ref>{{cite journal|last1=Yamamoto|first1=Teppei|title=Understanding the Past: Statistical Analysis of Causal Attribution|journal=Journal of Political Science|date=2012|volume=56|issue=1|pages=237–256|doi=10.1111/j.1540-5907.2011.00539.x|hdl=1721.1/85887}}</ref>。 |
| | | |
| | | |
| Formally it has been written as [Y<sub>i</sub>1, Y<sub>i</sub>0] ⊥ Tx<sub>i</sub>; in words, the potential outcome Y of person ''i'', had they been treated or not, does not depend on whether they were actually (observably) treated. In other words, we can ignore how people ended up in one condition rather than the other, and treat their potential outcomes as exchangeable. While this seems dense, it becomes clear if we add subscripts for the ‘realized’ and superscripts for the ‘ideal’ (potential) worlds (notation suggested by [https://www.cambridge.org/core/books/statistical-models-and-causal-inference/7CE8D4957FF6E9615AAAC4128FA8246E David Freedman]; a visual can help here: [https://drive.google.com/open?id=1nLHHH0il225LIy33nRiH3ZfgoX1_-_V9 potential outcomes simplified]). | | Formally it has been written as [Y<sub>i</sub>1, Y<sub>i</sub>0] ⊥ Tx<sub>i</sub>; in words, the potential outcome Y of person ''i'', had they been treated or not, does not depend on whether they were actually (observably) treated. In other words, we can ignore how people ended up in one condition rather than the other, and treat their potential outcomes as exchangeable. While this seems dense, it becomes clear if we add subscripts for the ‘realized’ and superscripts for the ‘ideal’ (potential) worlds (notation suggested by [https://www.cambridge.org/core/books/statistical-models-and-causal-inference/7CE8D4957FF6E9615AAAC4128FA8246E David Freedman]; a visual can help here: [https://drive.google.com/open?id=1nLHHH0il225LIy33nRiH3ZfgoX1_-_V9 potential outcomes simplified]). |
| | | |
− | 其数学形式可记为[Y<sub>i</sub>1, Y<sub>i</sub>0] ⊥ Tx<sub>i</sub> ,或者用文字表述为,个体“i”是否接受处理的潜在结果Y并不取决于他们是否真的(可观测到的)接受处理。换句话说,个体最终是通过什么方式处于一种与另一种处理状态我们是可忽略的,并将其潜在结果视为等价可交换的。 虽然这看起来很复杂,但如果用下标表示“已实现”的真实处理状态,用上标表示“理想”(潜在)世界的处理状态,就会变得很清楚。(符号的提出可参考[https://www.cambridge.org/core/books/statistical-models-and-causal-inference/7CE8D4957FF6E9615AAAC4128FA8246E David Freedman]; 可视化帮助可参考:[https://drive.google.com/open?id=1nLHHH0il225LIy33nRiH3ZfgoX1_-_V9 potential outcomes simplified]).
| + | 其数学形式可记为:[Y<sub>i</sub>1, Y<sub>i</sub>0] ⊥ Tx<sub>i</sub> ;或者用文字表述为:个体“i”是否接受处理的潜在结果Y并不取决于他们是否真的(可观测到的)接受处理。换句话说,个体最终是通过什么方式处于一种与另一种处理状态我们是可忽略的,并将其潜在结果视为等价可交换的。 虽然这看起来很复杂,但如果用下标表示“已实现”的真实处理状态,用上标表示“理想”(潜在)世界的处理状态,就会变得很清楚。(符号的提出可参考[https://www.cambridge.org/core/books/statistical-models-and-causal-inference/7CE8D4957FF6E9615AAAC4128FA8246E David Freedman]; 可视化帮助文档可参考:[https://drive.google.com/open?id=1nLHHH0il225LIy33nRiH3ZfgoX1_-_V9 potential outcomes simplified]). |
| | | |
| So: Y<sub>1</sub><sup>1</sup>/*Y<sub>0</sub><sup>1</sup> are the potential outcomes Y had the person been treated (superscript <sup>1</sup>), when in reality they were treated (Y<sub>1</sub><sup>1</sup>, subscript <sub>1</sub>) or were not (*Y<sub>0</sub><sup>1</sup>: the * signals that this quantity can never be realized or observed, that is, it is ''fully'' contrary-to-fact or counterfactual, CF). | | So: Y<sub>1</sub><sup>1</sup>/*Y<sub>0</sub><sup>1</sup> are the potential outcomes Y had the person been treated (superscript <sup>1</sup>), when in reality they were treated (Y<sub>1</sub><sup>1</sup>, subscript <sub>1</sub>) or were not (*Y<sub>0</sub><sup>1</sup>: the * signals that this quantity can never be realized or observed, that is, it is ''fully'' contrary-to-fact or counterfactual, CF). |
| | | |
− | 所以:如果个体被处理(上角标为 <sup>1</sup>) ,其对应的潜在结果Y为Y<sub>1</sub><sup>1</sup>/*Y<sub>0</sub><sup>1</sup>,实际上它们可观测的结果是(Y<sub>1</sub><sup>1</sup>, 下角标也为 <sub>1</sub>) ,而不是*Y<sub>0</sub><sup>1</sup>。注意:* 表示这个值是无法实现或不可观测的,即''完全与事实相反''或称为反事实(counterfactual, CF)。
| + | 所以,如果个体被处理(上角标为 <sup>1</sup>),其对应的潜在结果Y为Y<sub>1</sub><sup>1</sup>/*Y<sub>0</sub><sup>1</sup>,实际上它们可观测的结果是(Y<sub>1</sub><sup>1</sup>, 下角标也为 <sub>1</sub>) ,而不是*Y<sub>0</sub><sup>1</sup>。注意:* 表示这个值是无法获取或不可观测的,即''完全与事实相反''或称为反事实(counterfactual, CF)。 |
| | | |
| | | |
第40行: |
第40行: |
| For a given assignment to condition, only one of each pair of potential outcomes (PO) can be realized; the other cannot. So when we try to estimate treatment effects, we need to replace the fully contrary-to-fact quantities with observables (or estimate them). When ignorability/exogeneity holds, as when people are randomized to be treated or not, we can ‘replace’ *''Y''<sub>0</sub><sup>1</sup> with its observable counterpart Y<sub>1</sub><sup>1</sup>, and *Y<sub>1</sub><sup>0</sup> with its observable counterpart ''Y''<sub>0</sub><sup>0</sup>, not at the level of the individual Y<sub>i</sub>’s, but at the level of averages such as E[''Y''<sub>''i''</sub><sup>1</sup> – ''Y''<sub>''i''</sub><sup>0</sup>], which is exactly the causal treatment effect (TE) one tries to recover. | | For a given assignment to condition, only one of each pair of potential outcomes (PO) can be realized; the other cannot. So when we try to estimate treatment effects, we need to replace the fully contrary-to-fact quantities with observables (or estimate them). When ignorability/exogeneity holds, as when people are randomized to be treated or not, we can ‘replace’ *''Y''<sub>0</sub><sup>1</sup> with its observable counterpart Y<sub>1</sub><sup>1</sup>, and *Y<sub>1</sub><sup>0</sup> with its observable counterpart ''Y''<sub>0</sub><sup>0</sup>, not at the level of the individual Y<sub>i</sub>’s, but at the level of averages such as E[''Y''<sub>''i''</sub><sup>1</sup> – ''Y''<sub>''i''</sub><sup>0</sup>], which is exactly the causal treatment effect (TE) one tries to recover. |
| | | |
− | | + | 对于相同的处理分配条件,每个潜在结果(PO)中只有一个是实际发生可观测的,而另一个不会发生也无法观测,所以当我们尝试估计处理效应时,需要用可观测值(或估计值)来替代无法观测的反事实结果。当可忽略性/外生性成立时,例如个体是否接受处理是随机的,此时可利用已观测的 Y<sub>1</sub><sup>1</sup>'替换'*''Y''<sub>0</sub><sup>1</sup>,利用已观测的 Y<sub>0</sub><sup>0</sup>'替换'*''Y''<sub>1</sub><sup>0</sup>,不是个人层面的Y<sub>i</sub>,而是从平均角度出发,如 E[''Y''<sub>''i''</sub><sup>1</sup> – ''Y''<sub>''i''</sub><sup>0 </sup>],这正是人们尝试获取的因果处理效应(TE)。 |
− | 对于相同的处理分配条件,每个潜在结果(PO)中只有一个是实际发生可观测的,而另一个不会发生也无法观测,所以当我们尝试估计处理效应时,需要用可观测值(或估计值)来替代无法观测的反事实结果。当可忽略性/外生性成立时,例如个体是否接受处理是随机的,此时可利用已观测的 Y<sub>1</sub><sup>1</sup>'替换'*''Y''<sub>0</sub><sup>1</sup>,利用已观测的 Y<sub>0</sub><sup>0</sup>'替换'*''Y''<sub>1</sub><sup>0</sup>,不是个人层面的Y<sub>i</sub>,而是从平均角度出发,如 E[''Y''<sub>''i''</sub><sup>1</sup> – ''Y''<sub>''i''</sub><sup>0 </sup>],这正是人们试图计算的因果处理效应(TE)。 | |
− | | |
| | | |
| | | |
第51行: |
第49行: |
| | | |
| Now, by simply adding and subtracting the same fully counterfactual quantity *Y<sub>1</sub><sup>0</sup> we get: | | Now, by simply adding and subtracting the same fully counterfactual quantity *Y<sub>1</sub><sup>0</sup> we get: |
− |
| |
| E[Y<sub>i1</sub><sup>1</sup> – Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup> +*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup>] + E[*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = ATT + {Selection Bias}, | | E[Y<sub>i1</sub><sup>1</sup> – Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup> +*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup>] + E[*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = ATT + {Selection Bias}, |
− |
| |
| where ATT is the average treatment effect on the treated<ref>{{cite journal|last1=Imai|first1=Kosuke|title=Misunderstandings between experimentalists and observationalists about causal inference|journal=Journal of the Royal Statistical Society, Series A (Statistics in Society)|date=2006|volume=171|issue=2|pages=481–502|doi=10.1111/j.1467-985X.2007.00527.x|url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:4142695}}</ref> and the second term is the bias introduced when people can choose whether to belong to the ‘treated’ or the ‘control’ group. | | where ATT is the average treatment effect on the treated<ref>{{cite journal|last1=Imai|first1=Kosuke|title=Misunderstandings between experimentalists and observationalists about causal inference|journal=Journal of the Royal Statistical Society, Series A (Statistics in Society)|date=2006|volume=171|issue=2|pages=481–502|doi=10.1111/j.1467-985X.2007.00527.x|url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:4142695}}</ref> and the second term is the bias introduced when people can choose whether to belong to the ‘treated’ or the ‘control’ group. |
| | | |
| 现在,我们通过简单的加减相同的完全反事实量 *Y<sub>1</sub><sup>0</sup> 得到: | | 现在,我们通过简单的加减相同的完全反事实量 *Y<sub>1</sub><sup>0</sup> 得到: |
| E[Y<sub>i1</sub><sup>1</sup> – Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup> +*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup>] + E[*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = ATT + {选择性偏差}, | | E[Y<sub>i1</sub><sup>1</sup> – Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup> +*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = E[Y<sub>i1</sub><sup>1</sup> –*Y<sub>1</sub><sup>0</sup>] + E[*Y<sub>1</sub><sup>0</sup> - Y<sub>i0</sub><sup>0</sup>] = ATT + {选择性偏差}, |
− | 其中。第一项 ATT = 处理组的平均处理效应<ref>{{cite journal|last1=Imai|first1=Kosuke|title=Misunderstandings between experimentalists and observationalists about causal inference|journal=Journal of the Royal Statistical Society, Series A (Statistics in Society)|date=2006|volume=171|issue=2|pages=481–502|doi=10.1111/j.1467-985X.2007.00527.x|url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:4142695}}</ref>,第二项是当个体可选择属于“处理”组或“控制”组而非完全随机分配时引入的偏差。
| + | 其中,第一项 ATT = 处理组的平均处理效应<ref>{{cite journal|last1=Imai|first1=Kosuke|title=Misunderstandings between experimentalists and observationalists about causal inference|journal=Journal of the Royal Statistical Society, Series A (Statistics in Society)|date=2006|volume=171|issue=2|pages=481–502|doi=10.1111/j.1467-985X.2007.00527.x|url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:4142695}}</ref>,第二项是当个体可选择属于“处理”组或“控制”组而非完全随机分配时引入的偏差。 |
| | | |
| | | |
| Ignorability, either plain or conditional on some other variables, implies that such selection bias can be ignored, so one can recover (or estimate) the causal effect. | | Ignorability, either plain or conditional on some other variables, implies that such selection bias can be ignored, so one can recover (or estimate) the causal effect. |
| | | |
− | 无论是普通的还是条件性的可忽略性,都意味着这种选择偏差可以被忽略,因此人们可以得到(或估计)因果效应。
| + | 无论是普通的还是条件性的可忽略性,都意味着这种选择偏差可以被忽略或消除,因此人们可以得到(或估计)因果效应。 |
| | | |
| | | |
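
The decomposition of the observed difference in means into ATT plus selection bias can be checked numerically. The following is a minimal sketch and is not taken from the cited sources. The function name <code>simulate</code>, the unobserved trait <code>u</code>, the constant treatment effect of 2 and the self-selection rule <code>u > 0</code> are all assumptions made for illustration. Because the data are simulated, both potential outcomes are known for every person, including the fully counterfactual *Y<sub>1</sub><sup>0</sup> that could never be observed in a real study.

```python
import random
from statistics import mean


def simulate(n, randomized, seed=0):
    """Return (naive_difference, att, selection_bias) for a simulated population.

    Hypothetical setup: every person has an unobserved trait u that raises the
    untreated outcome Y^0, and the treatment adds a constant effect of 2.
    """
    rng = random.Random(seed)
    y1_treated = []   # Y_1^1: treated outcome of the treated (observable)
    y0_treated = []   # *Y_1^0: untreated outcome of the treated (counterfactual)
    y0_control = []   # Y_0^0: untreated outcome of the controls (observable)
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        y0 = u + rng.gauss(0.0, 1.0)   # potential outcome without treatment
        y1 = y0 + 2.0                  # potential outcome with treatment
        if randomized:
            treated = rng.random() < 0.5   # assignment ignores u: ignorability holds
        else:
            treated = u > 0                # self-selection on u: ignorability fails
        if treated:
            y1_treated.append(y1)
            y0_treated.append(y0)
        else:
            y0_control.append(y0)
    naive = mean(y1_treated) - mean(y0_control)   # E[Y_1^1 - Y_0^0]
    att = mean(y1_treated) - mean(y0_treated)     # E[Y_1^1 - *Y_1^0]
    bias = mean(y0_treated) - mean(y0_control)    # E[*Y_1^0 - Y_0^0]
    return naive, att, bias


if __name__ == "__main__":
    for randomized in (False, True):
        naive, att, bias = simulate(20000, randomized)
        label = "randomized" if randomized else "self-selected"
        print(f"{label}: naive={naive:.3f} ATT={att:.3f} selection bias={bias:.3f}")
```

Under these assumptions the naive difference equals ATT plus selection bias in both runs. With self-selection the bias term is clearly positive, so the naive difference overstates the effect; with randomized assignment the bias term is close to zero and the naive difference approximates the treatment effect.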