Line 13: |
Line 13: |
| | | |
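| When researchers cannot run a controlled experiment but have observational data to model, the inverse probability weighted estimator can be used to demonstrate causality. Because treatment is not assumed to be randomly assigned, the goal is to estimate the counterfactual, or potential, outcome that would arise if all subjects in the population were assigned either treatment. | | When researchers cannot run a controlled experiment but have observational data to model, the inverse probability weighted estimator can be used to demonstrate causality. Because treatment is not assumed to be randomly assigned, the goal is to estimate the counterfactual, or potential, outcome that would arise if all subjects in the population were assigned either treatment. |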
− |
| |
− | Suppose observed data are <math>\{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1}</math> drawn [[Independent and identically distributed random variables|i.i.d.]] from an unknown distribution P, where
| |
− | * <math>X \in \mathbb{R}^{p}</math> covariates
| |
− | * <math>A \in \{0, 1\}</math> are the two possible treatments.
| |
− | * <math>Y \in \mathbb{R}</math> response
| |
− | * We do not assume treatment is randomly assigned.
| |
− | The goal is to estimate the potential outcome, <math>Y^{*}\bigl(a\bigr)</math>, that would be observed if the subject were assigned treatment <math>a</math>. We then compare the mean outcomes that would arise if all patients in the population were assigned either treatment, <math>\mu_{a} = \mathbb{E}Y^{*}(a)</math>. We want to estimate <math>\mu_a</math> using the observed data <math>\{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1}</math>.
| |
− |
| |
− | Suppose observed data are \{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1} drawn i.i.d (independent and identically distributed) from unknown distribution P, where
| |
− | * X \in \mathbb{R}^{p} covariates
| |
− | * A \in \{0, 1\} are the two possible treatments.
| |
− | * Y \in \mathbb{R} response
| |
− | * We do not assume treatment is randomly assigned.
| |
− | The goal is to estimate the potential outcome, Y^{*}\bigl(a\bigr), that would be observed if the subject were assigned treatment a. Then compare the mean outcome if all patients in the population were assigned either treatment: \mu_{a} = \mathbb{E}Y^{*}(a). We want to estimate \mu_a using observed data \{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1}.
| |
| | | |
| Suppose the observed data <math>\{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1}</math> are drawn i.i.d. ([[Independent and identically distributed random variables|independent and identically distributed]]) from an unknown distribution, where | | Suppose the observed data <math>\{\bigl(X_i,A_i,Y_i\bigr)\}^{n}_{i=1}</math> are drawn i.i.d. ([[Independent and identically distributed random variables|independent and identically distributed]]) from an unknown distribution, where |
Line 38: |
Line 24: |
| === Estimator Formula === | | === Estimator Formula === |
| <blockquote><math>\hat{\mu}^{IPWE}_{a,n} = \frac{1}{n}\sum^{n}_{i=1}Y_{i} \frac{\mathbf 1_{A_{i}=a}}{\hat{p}_{n}(A_{i}|X_{i})}</math></blockquote> | | <blockquote><math>\hat{\mu}^{IPWE}_{a,n} = \frac{1}{n}\sum^{n}_{i=1}Y_{i} \frac{\mathbf 1_{A_{i}=a}}{\hat{p}_{n}(A_{i}|X_{i})}</math></blockquote> |
− |
| |
− | \hat{\mu}^{IPWE}_{a,n} = \frac{1}{n}\sum^{n}_{i=1}Y_{i} \frac{\mathbf 1_{A_{i}=a}}{\hat{p}_{n}(A_{i}|X_{i})}
| |
| | | |
| ==== Constructing the IPWE ==== | | ==== Constructing the IPWE ==== |
− | # <math>\mu_{a} = \mathbb{E}\frac{\mathbf{1}_{A=a} Y}{p(A|X)}</math> where <math>p(a|x) = \frac{P(A=a,X=x)}{P(X=x)}</math> | + | # <math>\mu_{a} = \mathbb{E}\frac{\mathbf{1}_{A=a} Y}{p(A|X)}</math>, where <math>p(a|x) = \frac{P(A=a,X=x)}{P(X=x)}</math>; |
− | # construct <math>\hat{p}_{n}(a|x)</math> or <math>p(a|x)</math> using any propensity model (often a logistic regression model) | + | # construct <math>\hat{p}_{n}(a|x)</math> or <math>p(a|x)</math> using any propensity model (often a logistic regression model); |
− | # <math>\hat{\mu}^{IPWE}_{a,n} = \sum^{n}_{i=1}\frac{Y_{i} 1_{A_{i}=a}}{n\hat{p}_{n}(A_{i}|X_{i})}</math> | + | # <math>\hat{\mu}^{IPWE}_{a,n} = \sum^{n}_{i=1}\frac{Y_{i} \mathbf{1}_{A_{i}=a}}{n\hat{p}_{n}(A_{i}|X_{i})}</math>. |
− | With the mean of each treatment group computed, a statistical t-test or ANOVA test can be used to judge difference between group means and determine statistical significance of treatment effect.
| + | With the mean of each treatment group computed, a t-test or an analysis of variance (ANOVA) can be used to judge the difference between the group means and to determine the statistical significance of the treatment effect. |
− | | |
− | # \mu_{a} = \mathbb{E}\frac{\mathbf{1}_{A=a} Y}{p(A|X)} where p(a|x) = \frac{P(A=a,X=x)}{P(X=x)}
| |
− | # construct \hat{p}_{n}(a|x) or p(a|x) using any propensity model (often a logistic regression model)
| |
− | # \hat{\mu}^{IPWE}_{a,n} = \sum^{n}_{i=1}\frac{Y_{i} 1_{A_{i}=a}}{n\hat{p}_{n}(A_{i}|X_{i})}
| |
− | With the mean of each treatment group computed, a statistical t-test or ANOVA test can be used to judge difference between group means and determine statistical significance of treatment effect.
| |
− | | |
− | # <math>\mu_{a} = \mathbb{E}\frac{\mathbf{1}_{A=a} Y}{p(A|X)}</math>, where <math>p(a|x) = \frac{P(A=a,X=x)}{P(X=x)}</math>
− | # Construct <math>\hat{p}_{n}(a|x)</math> or <math>p(a|x)</math> using any model (often a logistic regression model)
− | # <math>\hat{\mu}^{IPWE}_{a,n} = \sum^{n}_{i=1}\frac{Y_{i} \mathbf{1}_{A_{i}=a}}{n\hat{p}_{n}(A_{i}|X_{i})}</math>
− | With the mean of each treatment group computed, an analysis of variance or a t-test can be used to judge the difference between treatment effects and to determine the statistical significance of the treatment effect.
| |
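The three construction steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the simulated data, the coefficient values, and the helper names `fit_logistic`, `propensity`, and `ipwe` are all hypothetical, and the logistic propensity model is fitted here with a plain Newton iteration in NumPy rather than with any particular statistics package.

```python
import numpy as np

def fit_logistic(X, A, n_iter=50, ridge=1e-8):
    """Fit a logistic model for P(A=1 | X) by Newton's method (intercept first)."""
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (A - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def propensity(beta, X, a):
    """Estimated p_hat(a | x) per row of X; `a` may be a scalar or an array."""
    Z = np.column_stack([np.ones(len(X)), X])
    p1 = 1.0 / (1.0 + np.exp(-Z @ beta))
    return np.where(a == 1, p1, 1.0 - p1)

def ipwe(X, A, Y, a, beta):
    """mu_hat_a = (1/n) * sum_i Y_i * 1{A_i = a} / p_hat(A_i | X_i)."""
    p_obs = propensity(beta, X, A)  # probability of the treatment actually received
    return np.mean(Y * (A == a) / p_obs)

# Hypothetical simulated data: X confounds both treatment and outcome.
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 1))
p_true = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
A = rng.binomial(1, p_true)
Y = 1.0 + 2.0 * A + X[:, 0] + rng.normal(size=n)  # true mu_1 = 3, mu_0 = 1

beta_hat = fit_logistic(X, A)          # step 2: propensity model
mu1_hat = ipwe(X, A, Y, 1, beta_hat)   # step 3 for a = 1
mu0_hat = ipwe(X, A, Y, 0, beta_hat)   # step 3 for a = 0
naive_diff = Y[A == 1].mean() - Y[A == 0].mean()  # ignores confounding by X
print(mu1_hat, mu0_hat, mu1_hat - mu0_hat, naive_diff)
```

In this simulated setting the weighted means should land near the true values of 3 and 1, while the unweighted difference in group means is pulled away from 2 because treated units tend to have larger covariate values.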
| | | |
| ==== Assumptions ==== | | ==== Assumptions ==== |
− | # Consistency: <math>Y = Y^{*}(A)</math>
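| + | Recall the joint probability model for the covariates, the action, and the response. When the covariates and the action are known to take given values, the distribution of the response is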
− | # No unmeasured confounders: <math>\{Y^{*}(0), Y^{*}(1)\} \perp A|X</math>
| |
− | #* Treatment assignment is based solely on covariate data and independent of potential outcomes.
| |
− | # Positivity: <math>P(A=a|X=x)>0 </math> for all <math>a</math> and <math>x</math>
| |
− | | |
− | # Consistency: Y = Y^{*}(A)
| |
− | # No unmeasured confounders: \{Y^{*}(0), Y^{*}(1)\} \perp A|X
| |
− | #* Treatment assignment is based solely on covariate data and independent of potential outcomes.
| |
− | # Positivity: P(A=a|X=x)>0 for all a and x
| |
− | | |
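− | # Consistency: <math>Y = Y^{*}(A)</math>
− | # No unmeasured confounders: <math>\{Y^{*}(0), Y^{*}(1)\} \perp A|X</math>
− | #* Treatment assignment is based solely on covariate data and is independent of potential outcomes.
− | # Positivity: <math>P(A=a|X=x)>0</math> for all <math>a</math> and <math>x</math>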
| |
| | | |
− | ==== Limitations ====
| |
− | The Inverse Probability Weighted Estimator (IPWE) can be unstable if estimated propensities are small. If the probability of either treatment assignment is small, then the logistic regression model can become unstable around the tails causing the IPWE to also be less stable.
| |
| | | |
− | The Inverse Probability Weighted Estimator (IPWE) can be unstable if estimated propensities are small. If the probability of either treatment assignment is small, then the logistic regression model can become unstable around the tails causing the IPWE to also be less stable.
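| + | We make the following assumptions: |
| + | # (A1) Consistency: <math>Y = Y^{*}(A)</math> |
| + | # (A2) No unmeasured confounders: <math>\{Y^{*}(0), Y^{*}(1)\} \perp A|X</math>. More formally, for every bounded and measurable function |
| + | #* This means that treatment assignment is based solely on covariate data and is independent of the potential outcomes. |
| + | # (A3) Positivity: <math>P(A=a|X=x)>0</math> for all <math>a</math> and <math>x</math>. |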
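As a worked check, the identity in step 1 of the construction follows from (A1) to (A3). The derivation below is a sketch that assumes the expectations exist and writes <math>p(a|X)</math> for <math>P(A=a|X)</math>; on the event <math>A=a</math>, <math>p(A|X)</math> and <math>p(a|X)</math> coincide.

```latex
\begin{aligned}
\mathbb{E}\!\left[\frac{\mathbf{1}_{A=a}\,Y}{p(A|X)}\right]
&= \mathbb{E}\!\left[\frac{\mathbf{1}_{A=a}\,Y^{*}(a)}{p(a|X)}\right]
   && \text{(A1)} \\
&= \mathbb{E}\!\left[\frac{\mathbb{E}\bigl[\mathbf{1}_{A=a}\,Y^{*}(a)\mid X\bigr]}{p(a|X)}\right]
   && \text{(tower property; (A3) keeps } p(a|X) > 0\text{)} \\
&= \mathbb{E}\!\left[\frac{P(A=a\mid X)\,\mathbb{E}\bigl[Y^{*}(a)\mid X\bigr]}{p(a|X)}\right]
   && \text{(A2)} \\
&= \mathbb{E}\bigl[\mathbb{E}[Y^{*}(a)\mid X]\bigr]
 = \mathbb{E}\,Y^{*}(a) = \mu_{a}.
\end{aligned}
```

Each assumption is used exactly once: consistency replaces the observed response with the potential outcome, positivity makes the division well defined, and the absence of unmeasured confounders lets the conditional expectation factor.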
| | | |
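− | ==== Limitations ==== The inverse probability weighted estimator (IPWE) can be unstable if estimated propensities are small. If the probability of either treatment assignment is small, then the logistic regression model can become unstable around the tails, causing the IPWE to also be less stable. | + | ==== Limitations ==== |
| + | The inverse probability weighted estimator (IPWE) can be unstable when the estimated propensities are small. If the probability of either treatment assignment is small, the logistic regression model can become unstable in the tails, which in turn makes the IPWE less stable. |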
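The effect of small propensities on the weights can be seen in a toy calculation. The numbers and the helper name `ipw_weights` below are made up for illustration; the sketch only shows how a single unit with a very small estimated probability of its received treatment comes to dominate the weighted sum.

```python
import numpy as np

def ipw_weights(A, p1):
    """Weights 1 / p_hat(A_i | X_i), where p1[i] is the estimated P(A=1 | X_i)."""
    A = np.asarray(A)
    p1 = np.asarray(p1, dtype=float)
    p_obs = np.where(A == 1, p1, 1.0 - p1)  # probability of the treatment received
    return 1.0 / p_obs

# Hypothetical estimated propensities; the third unit was treated
# despite an estimated treatment probability of only 0.001.
A = np.array([1, 1, 1, 0, 0])
p1 = np.array([0.5, 0.1, 0.001, 0.5, 0.9])

w = ipw_weights(A, p1)
share = w / w.sum()  # fraction of the total weight carried by each unit
print(w)
print(share)
```

Here the third unit receives a weight of about 1000 while the others receive weights between 2 and 10, so a small change in its response or in its estimated propensity moves the estimate far more than any other observation does.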
| | | |
| == Augmented inverse probability weighted estimator == | | == Augmented inverse probability weighted estimator == |
− | An alternative estimator is the augmented inverse probability weighted estimator (AIPWE), which combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a 'doubly robust' method, in that it requires only one of the propensity model or the outcome model to be correctly specified, not both. This method augments the IPWE to reduce variability and improve the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE).<ref>{{Cite journal|last1=Cao|first1=Weihua|last2=Tsiatis|first2=Anastasios A.|last3=Davidian|first3=Marie|author3-link= Marie Davidian |year=2009|title=Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data|journal=Biometrika|volume=96|issue=3|pages=723–734|doi=10.1093/biomet/asp033|issn=0006-3444|pmc=2798744|pmid=20161511}}</ref> | + | An alternative estimator is the augmented inverse probability weighted estimator (AIPWE), which combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a 'doubly robust' method, in that it requires only one of the propensity model or the outcome model to be correctly specified, not both. This method augments the IPWE to reduce variability and improve the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE). |
| | | |
| An alternative estimator is the augmented inverse probability weighted estimator (AIPWE), which combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a 'doubly robust' method, in that it requires only one of the propensity model or the outcome model to be correctly specified, not both. This method augments the IPWE to reduce variability and improve the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE). | | An alternative estimator is the augmented inverse probability weighted estimator (AIPWE), which combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a 'doubly robust' method, in that it requires only one of the propensity model or the outcome model to be correctly specified, not both. This method augments the IPWE to reduce variability and improve the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE). |
| | | |
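− | Another estimator is the augmented inverse probability weighted estimator (AIPWE), which combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a doubly robust method, because it requires only that the propensity model or the outcome model be correctly specified, not both. This method augments the IPWE, reducing variability and improving the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE).
| + | Another estimation method is the augmented inverse probability weighted estimator (AIPWE). It combines the properties of the regression-based estimator and the inverse probability weighted estimator. It is therefore a "doubly robust" method, because it requires only that the propensity model or the outcome model be correctly specified, not both. This method augments the inverse probability weighted estimator to reduce variability and improve the efficiency of the estimate. It relies on the same assumptions as the inverse probability weighted estimator (IPWE).<ref>{{Cite journal|last1=Cao|first1=Weihua|last2=Tsiatis|first2=Anastasios A.|last3=Davidian|first3=Marie|author3-link= Marie Davidian |year=2009|title=Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data|journal=Biometrika|volume=96|issue=3|pages=723–734|doi=10.1093/biomet/asp033|issn=0006-3444|pmc=2798744|pmid=20161511}}</ref> |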
| | | |
| === Estimator Formula === | | === Estimator Formula === |