Difference between revisions of "Propensity score matching"

Revision as of 15:19, 25 May 2021 (Tuesday)

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. Paul R. Rosenbaum and Donald Rubin introduced the technique in 1983.^[1]

The possibility of bias arises because a difference in the treatment outcome (such as the average treatment effect) between treated and untreated groups may be caused by a factor that predicts treatment rather than the treatment itself. In randomized experiments, the randomization enables unbiased estimation of treatment effects; for each covariate, randomization implies that treatment-groups will be balanced on average, by the law of large numbers. Unfortunately, for observational studies, the assignment of treatments to research subjects is typically not random. Matching attempts to reduce the treatment assignment bias, and mimic randomization, by creating a sample of units that received the treatment that is comparable on all observed covariates to a sample of units that did not receive the treatment.

For example, one may be interested to know the consequences of smoking. An observational study is required since it is unethical to randomly assign people to the treatment 'smoking.' The treatment effect estimated by simply comparing those who smoked to those who did not smoke would be biased by any factors that predict smoking (e.g., gender and age). PSM attempts to control for these biases by making the treated and untreated groups comparable with respect to the control variables.

Overview

PSM is for cases of causal inference and simple selection bias in non-experimental settings in which: (i) few units in the non-treatment comparison group are comparable to the treatment units; and (ii) selecting a subset of comparison units similar to the treatment unit is difficult because units must be compared across a high-dimensional set of pretreatment characteristics.

In normal matching, single characteristics that distinguish treatment and control groups are matched in an attempt to make the groups more alike. But if the two groups do not have substantial overlap, then substantial error may be introduced. For example, if only the worst cases from the untreated "comparison" group are compared to only the best cases from the treatment group, the result may be regression toward the mean, which may make the comparison group look better or worse than reality.

PSM uses a predicted probability of group membership (e.g., treatment versus control group), based on observed predictors and usually obtained from logistic regression, to create a counterfactual group. Propensity scores may be used for matching or as covariates, alone or with other matching variables or covariates.

General procedure

1. Run logistic regression:

Dependent variable: Z = 1, if unit participated (i.e. is member of the treatment group); Z = 0, if unit did not participate (i.e. is member of the control group).

Choose appropriate confounders (variables hypothesized to be associated with both treatment and outcome)

Obtain an estimate of the propensity score: the predicted probability (p) or its log odds, log[p/(1 − p)].

2. Check that covariates are balanced across treatment and comparison groups within strata of the propensity score.

Use standardized differences or graphs to examine distributions

3. Match each participant to one or more nonparticipants on propensity score, using one of these methods:

Nearest neighbor matching

Caliper matching: comparison units within a certain width of the propensity score of the treated units get matched, where the width is generally a fraction of the standard deviation of the propensity score

Mahalanobis metric matching in conjunction with PSM

Stratification matching

Difference-in-differences matching (kernel and local linear weights)

Exact matching

4. Verify that covariates are balanced across treatment and comparison groups in the matched or weighted sample

5. Multivariate analysis based on new sample

Use analyses appropriate for non-independent matched samples if more than one nonparticipant is matched to each participant

Note: When you have multiple matches for a single treated observation, it is essential to use Weighted Least Squares rather than Ordinary Least Squares.
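
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the data are simulated with a single hypothetical confounder, the logistic regression is fitted by plain gradient ascent rather than a statistics package, the matching is greedy 1:1 nearest-neighbor matching without replacement, and the caliper of 0.2 standard deviations of the estimated score is an arbitrary choice made for the example. All function names (`fit_logistic`, `caliper_match`, `std_diff`, `simulate`) are invented here.

```python
import math
import random
import statistics


def predict(w, x):
    """Logistic model: P(Z=1 | x) for weights w = [intercept, coef_1, ...]."""
    s = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    s = max(-30.0, min(30.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))


def fit_logistic(X, z, lr=0.5, epochs=300):
    """Step 1: fit logistic regression by gradient ascent on the log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, zi in zip(X, z):
            err = zi - predict(w, x)
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * x[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def std_diff(a, b):
    """Steps 2 and 4: standardized difference in means (pooled SD)."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def caliper_match(scores, z, caliper):
    """Step 3: greedy 1:1 nearest-neighbor matching without replacement."""
    available = [i for i, zi in enumerate(z) if zi == 0]
    pairs = []
    for t in (i for i, zi in enumerate(z) if zi == 1):
        if not available:
            break
        c = min(available, key=lambda j: abs(scores[j] - scores[t]))
        if abs(scores[c] - scores[t]) <= caliper:  # drop treated units with no close control
            pairs.append((t, c))
            available.remove(c)
    return pairs


def simulate(n=500, seed=0):
    """Hypothetical data: one confounder x drives both treatment and outcome."""
    rng = random.Random(seed)
    X, z, y = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        zi = 1 if rng.random() < 1 / (1 + math.exp(-x)) else 0
        X.append([x])
        z.append(zi)
        y.append(2.0 * zi + 1.5 * x + rng.gauss(0, 1))  # true effect is 2.0
    return X, z, y


X, z, y = simulate()
w = fit_logistic(X, z)
scores = [predict(w, x) for x in X]
pairs = caliper_match(scores, z, caliper=0.2 * statistics.stdev(scores))

x_t = [X[i][0] for i in range(len(z)) if z[i] == 1]
x_c = [X[i][0] for i in range(len(z)) if z[i] == 0]
m_t = [X[t][0] for t, _ in pairs]
m_c = [X[c][0] for _, c in pairs]
naive = (statistics.mean(y[i] for i in range(len(z)) if z[i] == 1)
         - statistics.mean(y[i] for i in range(len(z)) if z[i] == 0))
matched = statistics.mean(y[t] - y[c] for t, c in pairs)  # step 5 for 1:1 pairs

print("std. diff before:", round(std_diff(x_t, x_c), 3))
print("std. diff after: ", round(std_diff(m_t, m_c), 3))
print("naive difference:", round(naive, 2), "matched estimate:", round(matched, 2))
```

Because each treated unit gets at most one control here, a simple mean of paired differences is used in the last step; with several matches per treated unit, the weighting described in the note above would be needed instead.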

Formal definitions

Basic settings

The basic case^[1] is of two treatments (numbered 1 and 0), with N i.i.d. (independent and identically distributed) subjects. Each subject i would respond to the treatment with [math]\displaystyle{ r_{1i} }[/math] and to the control with [math]\displaystyle{ r_{0i} }[/math]. The quantity to be estimated is the average treatment effect: [math]\displaystyle{ E[r_1]-E[r_0] }[/math]. The variable [math]\displaystyle{ Z_i }[/math] indicates whether subject i got treatment (Z = 1) or control (Z = 0). Let [math]\displaystyle{ X_i }[/math] be a vector of observed pretreatment measurements (or covariates) for the ith subject. The observations of [math]\displaystyle{ X_i }[/math] are made prior to treatment assignment, but the features in [math]\displaystyle{ X_i }[/math] may not include all (or any) of the ones used to decide on the treatment assignment. The numbering of the units (i.e., i = 1, ..., N) is assumed not to contain any information beyond what is contained in [math]\displaystyle{ X_i }[/math]. The following sections omit the index i while still discussing the stochastic behavior of some subject.

Strongly ignorable treatment assignment

PSM has been shown to increase model "imbalance, inefficiency, model dependence, and bias," which is not the case with most other matching methods. The insights behind the use of matching still hold but should be applied with other matching methods; propensity scores also have other productive uses in weighting and doubly robust estimation.

Let some subject have a vector of covariates X and potential outcomes [math]\displaystyle{ r_0 }[/math] and [math]\displaystyle{ r_1 }[/math] under control and treatment, respectively. Treatment assignment is said to be strongly ignorable (i.e., conditionally unconfounded) if the potential outcomes are independent of treatment (Z) conditional on background variables X. This can be written compactly as

[math]\displaystyle{ r_0, r_1 \perp Z \mid X }[/math]

where [math]\displaystyle{ \perp }[/math] denotes statistical independence.^[1]

Like other matching procedures, PSM estimates an average treatment effect from observational data. The key advantage of PSM at the time of its introduction was that, by using a linear combination of covariates for a single score, it balances treatment and control groups on a large number of covariates without losing a large number of observations. If units in the treatment and control groups were balanced on a large number of covariates one at a time, large numbers of observations would be needed to overcome the "dimensionality problem" whereby the introduction of a new balancing covariate increases the minimum necessary number of observations in the sample geometrically.

One disadvantage of PSM is that it only accounts for observed (and observable) covariates and not latent characteristics. Factors that affect assignment to treatment and outcome but that cannot be observed cannot be accounted for in the matching procedure. As the procedure only controls for observed variables, any hidden bias due to latent variables may remain after matching. Another issue is that PSM requires large samples, with substantial overlap between treatment and control groups.

General concerns with matching have also been raised by Judea Pearl, who has argued that hidden bias may actually increase because matching on observed variables may unleash bias due to dormant unobserved confounders. Similarly, Pearl has argued that bias reduction can only be assured (asymptotically) by modelling the qualitative causal relationships between treatment, outcome, observed and unobserved covariates. Confounding occurs when the experimenter is unable to control for alternative, non-causal explanations for an observed relationship between independent and dependent variables. Such control should satisfy the "backdoor criterion" of Pearl. It can also easily be implemented manually.

Balancing score

A balancing score b(X) is a function of the observed covariates X such that the conditional distribution of X given b(X) is the same for treated (Z = 1) and control (Z = 0) units:

[math]\displaystyle{ Z \perp X \mid b(X). }[/math]

The most trivial function is [math]\displaystyle{ b(X) = X }[/math].

Propensity score

A propensity score is the probability of a unit (e.g., person, classroom, school) being assigned to a particular treatment given a set of observed covariates. Propensity scores are used to reduce selection bias by equating groups based on these covariates.

Suppose that we have a binary treatment indicator Z, a response variable r, and background observed covariates X. The propensity score is defined as the conditional probability of treatment given background variables:

[math]\displaystyle{ e(x) \ \stackrel{\mathrm{def}}{=}\ \Pr(Z=1 \mid X=x). }[/math]

In the context of causal inference and survey methodology, propensity scores are estimated from some set of covariates (via methods such as logistic regression, random forests, or others). The estimated propensity scores are then used to construct the weights for inverse probability weighting methods.
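
As a rough illustration of how propensity scores serve as weights, the sketch below computes a normalized inverse probability weighted estimate of [math]\displaystyle{ E[r_1]-E[r_0] }[/math]. It is an example under assumptions, not a prescribed method: the data are simulated, the function names `ipw_ate` and `simulate` are invented for this example, and the true propensity scores are plugged in directly, whereas in practice they would be estimated as described above.

```python
import math
import random


def ipw_ate(y, z, e):
    """Normalized inverse probability weighted estimate of E[r_1] - E[r_0]."""
    w1 = [zi / ei for zi, ei in zip(z, e)]              # treated units weighted by 1/e(x)
    w0 = [(1 - zi) / (1 - ei) for zi, ei in zip(z, e)]  # controls weighted by 1/(1 - e(x))
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


def simulate(n=5000, seed=1):
    """Hypothetical data: confounder x raises both treatment probability and outcome."""
    rng = random.Random(seed)
    y, z, e = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        ex = 1 / (1 + math.exp(-x))       # true propensity score e(x) = Pr(Z=1 | X=x)
        zi = 1 if rng.random() < ex else 0
        y.append(2.0 * zi + 1.5 * x + rng.gauss(0, 1))  # true effect is 2.0
        z.append(zi)
        e.append(ex)
    return y, z, e


y, z, e = simulate()
n1 = sum(z)
naive = (sum(yi for yi, zi in zip(y, z) if zi) / n1
         - sum(yi for yi, zi in zip(y, z) if not zi) / (len(z) - n1))
weighted = ipw_ate(y, z, e)
print("naive difference:", round(naive, 2))
print("IPW estimate:    ", round(weighted, 2))
```

The weights grow large when e(x) is close to 0 or 1, which is one reason substantial overlap between the groups matters for this kind of estimator.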

Main theorems

The following were first presented, and proven, by Rosenbaum and Rubin in 1983:^[1]

The propensity score [math]\displaystyle{ e(x) }[/math] is a balancing score.

Any score that is 'finer' than the propensity score is a balancing score (i.e., [math]\displaystyle{ e(X)=f(b(X)) }[/math] for some function f). The propensity score is the coarsest balancing score function, as it takes a (possibly) multidimensional object ([math]\displaystyle{ X_i }[/math]) and transforms it into one dimension (although others, obviously, also exist), while [math]\displaystyle{ b(X)=X }[/math] is the finest one.

If treatment assignment is strongly ignorable given X then:

It is also strongly ignorable given any balancing function. Specifically, given the propensity score:

[math]\displaystyle{ (r_0, r_1) \perp Z \mid e(X). }[/math]

For any value of a balancing score, the difference between the treatment and control means of the samples at hand (i.e., [math]\displaystyle{ \bar{r}_1-\bar{r}_0 }[/math]), based on subjects that have the same value of the balancing score, can serve as an unbiased estimator of the average treatment effect at that value of the balancing score; averaging over the distribution of the balancing score then gives the average treatment effect [math]\displaystyle{ E[r_1]-E[r_0] }[/math].

Using sample estimates of balancing scores can produce sample balance on X
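
One way to see how these results are used in practice is subclassification on the propensity score: units are grouped into strata with similar scores, the treatment and control means are compared within each stratum, and the within-stratum differences are averaged. The sketch below is illustrative only. It assumes simulated data, uses the true propensity score in place of an estimate, and picks five equal-sized strata arbitrarily; strata only approximate the "same value of the balancing score" condition, so some residual bias is expected. The names `stratified_ate` and `simulate` are invented for this example.

```python
import math
import random


def stratified_ate(y, z, e, n_strata=5):
    """Average of within-stratum mean differences, weighted by stratum size."""
    n = len(e)
    order = sorted(range(n), key=lambda i: e[i])  # sort units by propensity score
    total, used = 0.0, 0
    for k in range(n_strata):
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        y1 = [y[i] for i in idx if z[i] == 1]
        y0 = [y[i] for i in idx if z[i] == 0]
        if y1 and y0:  # skip strata lacking either group (no overlap)
            total += len(idx) * (sum(y1) / len(y1) - sum(y0) / len(y0))
            used += len(idx)
    if used == 0:
        raise ValueError("no stratum contains both treated and control units")
    return total / used


def simulate(n=5000, seed=2):
    """Hypothetical data: confounder x raises both treatment probability and outcome."""
    rng = random.Random(seed)
    y, z, e = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        ex = 1 / (1 + math.exp(-x))       # true propensity score
        zi = 1 if rng.random() < ex else 0
        y.append(2.0 * zi + 1.5 * x + rng.gauss(0, 1))  # true effect is 2.0
        z.append(zi)
        e.append(ex)
    return y, z, e


y, z, e = simulate()
n1 = sum(z)
naive = (sum(yi for yi, zi in zip(y, z) if zi) / n1
         - sum(yi for yi, zi in zip(y, z) if not zi) / (len(z) - n1))
stratified = stratified_ate(y, z, e)
print("naive difference:   ", round(naive, 2))
print("stratified estimate:", round(stratified, 2))
```

Using more strata narrows the range of scores within each one, at the cost of fewer units per stratum.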

This page was moved from wikipedia:en:Propensity score matching. Its edit history can be viewed at 倾向评分/edithistory

[1] Rosenbaum, Paul R.; Rubin, Donald B. (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects". Biometrika. 70 (1): 41–55. doi:10.1093/biomet/70.1.41.