Added 133 bytes, 23:52, 12 June 2021 (Sat)
Proofread up to (but not including) Formal definitions
In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique used to estimate the effect of a treatment, policy, or other intervention by accounting for the influence of covariates on whether a unit receives the treatment. PSM attempts to reduce the bias caused by confounding variables, which typically arises in evaluations that simply compare the outcomes of treated and control units. Paul R. Rosenbaum and Donald Rubin introduced the technique in 1983.
    
The possibility of bias arises because a difference in the treatment outcome (such as the [[average treatment effect]]) between treated and untreated groups may be caused by a factor that predicts treatment rather than by the treatment itself. In [[randomized experiment]]s, randomization enables unbiased estimation of treatment effects: for each covariate, randomization implies that the treatment groups will be balanced on average, by the [[law of large numbers]]. Unfortunately, in observational studies the assignment of treatments to research subjects is typically not random. [[Matching (statistics)|Matching]] attempts to reduce this treatment-assignment bias, and to mimic randomization, by creating a sample of treated units that is comparable on all observed covariates to a sample of untreated units.
    
For example, one may be interested in the [[Health_effects_of_tobacco#Early_observational_studies|consequences of smoking]]. An observational study is required because it is unethical to randomly assign people to the treatment 'smoking'. A treatment effect estimated by simply comparing smokers with non-smokers would be biased by any factors that predict smoking (e.g., gender and age). PSM attempts to control for these biases by making the treated and untreated groups comparable with respect to the control variables.
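To make the smoking example concrete, here is a minimal Python sketch on simulated, entirely hypothetical data: an "old" flag acts as a confounder that raises both the probability of smoking and the outcome. The naive smoker/non-smoker difference is biased, while comparing only units in the same age group recovers the true effect of 2.0 built into the simulation. The names `simulate`, `naive_difference` and `stratified_difference` are illustrative only.

```python
import random
from statistics import mean


def simulate(n=10_000, effect=2.0, seed=0):
    """Hypothetical data: an 'old' flag confounds smoking and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        old = rng.random() < 0.5                                # confounder
        smoke = rng.random() < (0.8 if old else 0.2)            # older units smoke more often
        y = effect * smoke + 5.0 * old + rng.gauss(0.0, 1.0)    # outcome
        rows.append((old, smoke, y))
    return rows


def naive_difference(rows):
    """Mean outcome of smokers minus mean outcome of non-smokers."""
    return mean(y for _, z, y in rows if z) - mean(y for _, z, y in rows if not z)


def stratified_difference(rows):
    """Compare smokers with non-smokers of the same age group, then weight
    the per-group differences by group size."""
    total = 0.0
    for level in (False, True):
        group = [(z, y) for old, z, y in rows if old == level]
        diff = mean(y for z, y in group if z) - mean(y for z, y in group if not z)
        total += len(group) / len(rows) * diff
    return total


rows = simulate()
print(f"naive:      {naive_difference(rows):.2f}")       # about 5 in expectation
print(f"stratified: {stratified_difference(rows):.2f}")  # close to the true 2.0
```

Here age is the only confounder and it is observed, so exact stratification on it suffices; PSM targets the case where there are many covariates and exact stratification becomes impractical.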

== Overview ==

PSM is intended for causal inference and simple selection bias in non-experimental settings in which (i) few units in the untreated comparison group are comparable to the treatment units, and (ii) selecting a subset of comparison units similar to the treatment units is difficult because the units must be compared across a high-dimensional set of covariates.
    
In conventional matching, individual characteristics that distinguish the treatment and control groups are matched in an attempt to make the groups more alike. But if the two groups do not overlap substantially, substantial [[Errors and residuals|error]] may be introduced. For example, if only the worst cases from the [[control group|untreated "comparison" group]] are compared with only the best cases from the [[treatment group]], the result may reflect [[regression toward the mean]], which can make the comparison group look better or worse than it really is.

== General procedure ==
    
1. Run [[logistic regression]]:
*Obtain an [[Estimator|estimator]] of the propensity score: the predicted probability (''p'') or log[''p''/(1&nbsp;−&nbsp;''p'')].
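Step 1 can be sketched in plain Python as follows. This is a minimal illustration, assuming a small hypothetical dataset and a hand-written logistic regression fitted by batch gradient ascent; the helper names `fit_logistic`, `linear_predictor` and `propensity_estimates` are hypothetical, and a real analysis would normally rely on an established statistics library. Both forms of the estimate are returned: the predicted probability p and its logit log[p/(1 − p)], which equals the fitted linear predictor.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def linear_predictor(coef, x):
    """coef[0] is the intercept; coef[1:] are the covariate weights."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def fit_logistic(X, z, lr=0.1, epochs=2000):
    """Fit P(Z = 1 | X) = sigmoid(b0 + w.x) by batch gradient ascent
    on the mean log-likelihood."""
    coef = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(coef)
        for xi, zi in zip(X, z):
            resid = zi - sigmoid(linear_predictor(coef, xi))
            grad[0] += resid
            for j, v in enumerate(xi):
                grad[j + 1] += resid * v
        coef = [c + lr * g / n for c, g in zip(coef, grad)]
    return coef


def propensity_estimates(coef, X):
    """Return (p, log[p / (1 - p)]) per unit; the logit of p is exactly
    the fitted linear predictor, so it is returned directly."""
    return [(sigmoid(linear_predictor(coef, xi)), linear_predictor(coef, xi)) for xi in X]


# Hypothetical data: one covariate; Z = 1 is more common for larger x.
X = [[-2.0], [-1.0], [-1.0], [0.0], [0.0], [1.0], [1.0], [2.0]]
z = [0, 0, 1, 0, 1, 0, 1, 1]
coef = fit_logistic(X, z)
for p, logit in propensity_estimates(coef, X):
    print(f"p = {p:.3f}, logit = {logit:.3f}")
```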

2. Stratify on the estimated propensity score, and check that the covariates are balanced between the treatment and comparison groups within each stratum.
*Use standardized differences or graphs to examine distributions
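A minimal sketch of the balance check in step 2, assuming a single covariate per call and equal-size strata formed by sorting on the estimated propensity score (five strata by default, which is an assumption). It uses the standardized mean difference (mean_t − mean_c) / sqrt((s_t² + s_c²) / 2), one common definition of the standardized difference; the function names are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_difference(treated, control):
    """(mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2) for one covariate."""
    gap = mean(treated) - mean(control)
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2.0)
    if pooled_sd == 0.0:
        # Both groups are constant: balanced only if the means coincide.
        return 0.0 if gap == 0 else math.copysign(math.inf, gap)
    return gap / pooled_sd


def balance_by_stratum(units, n_strata=5):
    """units: list of (estimated_score, z, x) with a single covariate x.
    Sort by estimated score, cut into equal-size strata, and return the
    standardized difference of x within each stratum (None when a stratum
    has fewer than two treated or two control units)."""
    if not units:
        return []
    ordered = sorted(units, key=lambda u: u[0])
    size = math.ceil(len(ordered) / n_strata)
    result = []
    for start in range(0, len(ordered), size):
        stratum = ordered[start:start + size]
        t = [x for _, z, x in stratum if z == 1]
        c = [x for _, z, x in stratum if z == 0]
        result.append(standardized_difference(t, c) if len(t) >= 2 and len(c) >= 2 else None)
    return result


# Hypothetical usage: made-up scores and covariate values.
units = [(0.2, 1, 3.0), (0.2, 0, 3.1), (0.3, 1, 2.9), (0.3, 0, 3.0),
         (0.7, 1, 5.0), (0.7, 0, 5.2), (0.8, 1, 5.1), (0.8, 0, 4.9)]
print(balance_by_stratum(units, n_strata=2))
```

Strata in which one group has fewer than two units are reported as `None`, since a sample variance cannot be computed there.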

3. Using the estimated propensity score, match each treated unit with one or more control units by one of the following methods:
*[[Nearest neighbor search|Nearest neighbor matching]]
*Caliper matching: comparison units within a certain width of the propensity score of the treated units get matched, where the width is generally a fraction of the standard deviation of the propensity score
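The two matching rules listed above can be combined in one short routine. This is a sketch under stated assumptions: 1:1 greedy nearest-neighbor matching without replacement on the estimated propensity score, processing treated units in ascending score order (an arbitrary choice; greedy results depend on the order), with an optional caliper given as a fraction `caliper_sd` of the population standard deviation of all scores. The function name and the dict-based data layout are hypothetical.

```python
import statistics


def greedy_match(treated, control, caliper_sd=None):
    """1:1 greedy nearest-neighbor matching without replacement.

    treated, control: dicts mapping unit id -> estimated propensity score.
    caliper_sd: if given, a pair is kept only when the score distance is at
    most caliper_sd * (population standard deviation of all scores).
    Returns a list of (treated_id, control_id) pairs.
    """
    width = None
    if caliper_sd is not None:
        width = caliper_sd * statistics.pstdev(list(treated.values()) + list(control.values()))
    available = dict(control)
    pairs = []
    # Greedy: treated units are handled one at a time in ascending score order.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if width is None or abs(available[c_id] - t_score) <= width:
            pairs.append((t_id, c_id))
            del available[c_id]  # without replacement
    return pairs


treated = {"t1": 0.30, "t2": 0.70}
control = {"c1": 0.28, "c2": 0.50, "c3": 0.95}
print(greedy_match(treated, control))                  # [('t1', 'c1'), ('t2', 'c2')]
print(greedy_match(treated, control, caliper_sd=0.2))  # [('t1', 'c1')]
```

With the caliper, `t2` stays unmatched because its nearest available control is farther away than the allowed width, which illustrates how a caliper trades sample size for closer matches.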
*Use analyses appropriate for non-independent matched samples if more than one nonparticipant is matched to each participant
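When several controls are matched to each treated unit, one way to respect the dependence among them is to treat each matched set, rather than each control, as the unit of analysis. The sketch below is a hypothetical helper, assuming each control appears in only one matched set and that there are at least two sets: it averages the set-level differences (treated outcome minus the mean of its own matched controls) and computes a standard error from those set-level differences. Because every set is anchored on a treated unit, the average targets the effect among the treated.

```python
import math
from statistics import mean, stdev


def matched_set_effect(matched_sets):
    """matched_sets: list of (treated_outcome, [matched control outcomes]).

    Each matched set is one unit of analysis: its difference is the treated
    outcome minus the mean of its own controls. Returns the mean difference
    and a standard error computed from the set-level differences.
    Assumes every control appears in only one set and len(matched_sets) >= 2.
    """
    diffs = [y_t - mean(y_cs) for y_t, y_cs in matched_sets]
    estimate = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(len(diffs))
    return estimate, std_error


# Hypothetical 1:2 matched sets.
sets = [(4.0, [2.0, 2.0]), (6.0, [2.0, 2.0])]
print(matched_set_effect(sets))  # (3.0, 1.0)
```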

== Formal definitions ==

===Basic settings===

The basic case<ref name="Rosenbaum 1983 41–55" /> is of two treatments (numbered 1 and 0), with ''N'' [[Independent and identically distributed random variables|i.i.d.]] subjects. Each subject ''i'' would respond to the treatment with <math>r_{1i}</math> and to the control with <math>r_{0i}</math>. The quantity to be estimated is the [[average treatment effect]]: <math>E[r_1]-E[r_0]</math>. The variable <math>Z_i</math> indicates whether subject ''i'' received treatment (''Z''&nbsp;=&nbsp;1) or control (''Z''&nbsp;=&nbsp;0). Let <math>X_i</math> be a vector of observed pretreatment measurements (or covariates) for the ''i''th subject. The observations of <math>X_i</math> are made prior to treatment assignment, but the features in <math>X_i</math> may not include all (or any) of the ones used to decide on the treatment assignment. The numbering of the units (i.e., ''i''&nbsp;=&nbsp;1,&nbsp;...,&nbsp;''N'') is assumed to contain no information beyond what is contained in <math>X_i</math>. The following sections omit the ''i'' index when discussing the stochastic behavior of a generic subject.
:<math>(r_0, r_1) \perp Z \mid X</math>

where <math>\perp</math> denotes [[statistical independence]].<ref name="Rosenbaum 1983 41–55" />

===Balancing score===

A '''balancing score''' ''b''(''X'') is a function of the observed covariates ''X'' such that the [[conditional probability|conditional distribution]] of ''X'' given ''b''(''X'') is the same for treated (''Z''&nbsp;=&nbsp;1) and control (''Z''&nbsp;=&nbsp;0) units:

:<math>X \perp Z \mid b(X).</math>

===Propensity score===
    
A '''propensity score''' is the [[probability]] of a unit (e.g., person, classroom, school) being assigned to a particular treatment given a set of observed covariates.  Propensity scores are used to reduce [[selection bias]] by equating groups based on these covariates.

===Main theorems===

The following were first presented, and proven, by Rosenbaum and Rubin in 1983:<ref name="Rosenbaum 1983 41–55" />
    
*The propensity score <math>e(x)</math> is a balancing score.
*Any score that is 'finer' than the propensity score is a balancing score (i.e., a score <math>b(X)</math> such that <math>e(X)=f(b(X))</math> for some function ''f''). The propensity score is the coarsest balancing score function, as it maps a (possibly) multidimensional object (''X''<sub>''i''</sub>) to a single dimension, while <math>b(X)=X</math> is the finest one (other balancing scores also exist).
*If treatment assignment is strongly ignorable given ''X'' then:
:*It is also strongly ignorable given any balancing score. Specifically, given the propensity score:
:::<math> (r_0, r_1) \perp Z \mid e(X).</math>
:*For any value of a balancing score, the difference between the treatment and control means of the samples at hand (i.e., <math>\bar{r}_1-\bar{r}_0</math>), based on subjects that have that value of the balancing score, can serve as an [[Bias of an estimator|unbiased estimator]] of the [[average treatment effect]] at that value, <math>E[r_1-r_0 \mid b(X)]</math>; averaging these differences over the distribution of the balancing score gives the average treatment effect <math>E[r_1]-E[r_0]</math>.
*Using sample estimates of balancing scores can produce sample balance on&nbsp;''X''
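The theorems above can be illustrated by a small simulation with a known propensity score (all values hypothetical). Here ''X'' takes four values but e(X) only two, so e(X) is coarser than ''X''; assignment depends on ''X'' only, so it is strongly ignorable given ''X''. Within each value of e(X), the treated-minus-control difference of means estimates E[r_1 − r_0 | e(X)], and weighting these differences by the share of units at each score value recovers E[r_1] − E[r_0] = 2.5 for this design, whereas the naive difference does not.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical known propensity scores e(x) = P(Z = 1 | X = x).
# X takes four values but e(X) only two, so e(X) is coarser than X.
E_OF_X = {0: 0.2, 1: 0.2, 2: 0.8, 3: 0.8}


def simulate_discrete(n=20_000, seed=1):
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.randrange(4)
        r0 = 2.0 * x + rng.gauss(0.0, 1.0)     # response under control
        r1 = r0 + 1.0 + x                      # response under treatment
        z = rng.random() < E_OF_X[x]           # assignment depends on X only
        units.append((x, z, r1 if z else r0))  # only one response is observed
    return units


def stratify_on_score(units):
    """Treated mean minus control mean within each value of e(X),
    weighted by the share of units having that score value."""
    groups = defaultdict(list)
    for x, z, r in units:
        groups[E_OF_X[x]].append((z, r))
    estimate = 0.0
    for members in groups.values():
        diff = mean(r for z, r in members if z) - mean(r for z, r in members if not z)
        estimate += len(members) / len(units) * diff
    return estimate


units = simulate_discrete()
naive = mean(r for _, z, r in units if z) - mean(r for _, z, r in units if not z)
print(f"naive difference:   {naive:.2f}")                    # about 5.5 in expectation
print(f"stratified on e(X): {stratify_on_score(units):.2f}")  # about 2.5 = E[r1] - E[r0]
```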
    
===Relationship to sufficiency===
    
If we regard the value of ''Z'' as a [[Statistical parameter|parameter]] of the population that affects the distribution of ''X'', then the balancing score serves as a [[Sufficient_statistic#Mathematical_definition|sufficient statistic]] for ''Z''. Furthermore, the above theorems indicate that, with ''Z'' regarded as a parameter of the distribution of ''X'', the propensity score is a [[Sufficient_statistic#Minimal_sufficiency|minimal sufficient statistic]]. Lastly, if treatment assignment ''Z'' is strongly ignorable given ''X'', then the propensity score is a [[Sufficient_statistic#Minimal_sufficiency|minimal sufficient statistic]] for the joint distribution of <math>(r_0, r_1)</math>.

Judea Pearl has shown that there is a simple graphical test, called the back-door criterion, that detects the presence of confounding variables. To estimate the effect of treatment, the background variables ''X'' must block all back-door paths in the graph. This blocking can be achieved either by adding the confounding variables as controls in a regression, or by matching on the confounding variables.

== Disadvantages ==
    
PSM has been shown to increase model "imbalance, inefficiency, model dependence, and bias," which is not the case with most other matching methods. The insights behind the use of matching still hold but should be applied with other matching methods; propensity scores also have other productive uses in weighting and doubly robust estimation.
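As one example of the weighting use mentioned above, the following sketch computes an inverse-probability-weighted (Horvitz-Thompson style) estimate of the average treatment effect: the mean of Z·Y/e(X) minus the mean of (1 − Z)·Y/(1 − e(X)). It assumes the true propensity scores are known, as in a simulation; in practice e(X) would be estimated (for example by logistic regression) and extreme weights would need attention. The data-generating design and function names are hypothetical, with a true effect of 2.5.

```python
import random
from statistics import mean

# Hypothetical known propensity scores e(x) = P(Z = 1 | X = x).
E_OF_X = {0: 0.2, 1: 0.2, 2: 0.8, 3: 0.8}


def simulate_units(n=50_000, seed=2):
    """Return (e(X), Z, observed response) triples; the true ATE is 2.5."""
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.randrange(4)
        r0 = 2.0 * x + rng.gauss(0.0, 1.0)
        r1 = r0 + 1.0 + x
        z = 1 if rng.random() < E_OF_X[x] else 0
        units.append((E_OF_X[x], z, r1 if z else r0))
    return units


def ipw_ate(units):
    """Inverse-probability-weighted ATE estimate:
    mean of Z*Y/e(X) minus mean of (1 - Z)*Y/(1 - e(X))."""
    treated_term = mean(z * y / e for e, z, y in units)
    control_term = mean((1 - z) * y / (1 - e) for e, z, y in units)
    return treated_term - control_term


print(f"IPW estimate: {ipw_ate(simulate_units()):.2f}")  # about 2.5 in expectation
```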
[[Category:待整理页面]]
<references />