Changes

Matching (view source)

Revision as of 11:03, 27 June 2021 (Sunday)

714 bytes added, 11:03, 27 June 2021 (Sunday)

No edit summary

Line 3: Line 3:

|description='Evaluates the effect of a treatment by comparing treated and untreated units in an observational study or quasi-experiment

}}

−

As a statistical technique, '''matching''' evaluates the effect of a treatment by comparing treated and untreated units in an '''observational study''' or '''quasi-experiment''' (that is, one in which the '''treatment''' is not randomly assigned). The goal of matching is to find, for every treated unit, one or more untreated units with similar observable characteristics against which the effect of the treatment can be assessed. By pairing treated units with similar untreated units, matching makes it possible to compare outcomes between the two groups and so estimate the treatment effect while reducing bias due to confounding.<ref>{{cite journal | doi=10.2307/2529684 | last=Rubin | first=Donald B. | title=Matching to Remove Bias in Observational Studies | journal=Biometrics | volume=29 | issue=1 | year=1973 | pages=159–183 | jstor=2529684}}</ref><ref>{{cite journal | title=On Stratification, Grouping and Matching | last=Anderson | first=Dallas W. |author2=Kish, Leslie |author3=Cornell, Richard G. | journal=Scandinavian Journal of Statistics | volume=7 | issue=2 | year=1980 | pages=61–66 | jstor=4615774}}</ref><ref>{{cite journal | doi=10.2307/2530417 | title=Matching in Epidemiologic Studies: Validity and Efficiency Considerations | last=Kupper | first=Lawrence L. |author2=Karon, John M. |author3=Kleinbaum, David G. |author4=Morgenstern, Hal |author5= Lewis, Donald K. | journal=Biometrics | volume=37 | issue=2 | year=1981 | pages=271–291 | jstor=2530417 | pmid=7272415| citeseerx=10.1.1.154.1197 }}</ref> '''Propensity score matching''', an early matching technique, was developed as part of the ''' Rubin causal model''',<ref name="Rosenbaum Rubin">{{cite journal |last1=Rosenbaum |first1=Paul R. |last2=Rubin |first2=Donald B. |title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |year=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.1093/biomet/70.1.41 |doi-access=free }}</ref> but it has been shown to increase model dependence, bias, and inefficiency and to reduce statistical '''power''', and it is no longer recommended compared with other matching methods.<ref>{{Cite journal|last1=King|first1=Gary|last2=Nielsen|first2=Richard|date=October 2019|title=Why Propensity Scores Should Not Be Used for Matching|url=https://www.cambridge.org/core/product/identifier/S1047198719000111/type/journal_article|journal=Political Analysis|language=en|volume=27|issue=4|pages=435–454|doi=10.1017/pan.2019.11|issn=1047-1987|doi-access=free}}</ref>

+

As a statistical technique, '''matching''' evaluates the effect of a treatment by comparing treated and untreated units in an '''observational study''' or '''quasi-experiment''' (that is, one in which the '''treatment''' is not randomly assigned). The goal of matching is to find, for every treated unit, one or more untreated units with similar observable characteristics against which the effect of the treatment can be assessed. By pairing treated units with similar untreated units, matching makes it possible to compare outcomes between the two groups and so estimate the treatment effect while reducing bias due to confounding.<ref>{{cite journal | doi=10.2307/2529684 | last=Rubin | first=Donald B. | title=Matching to Remove Bias in Observational Studies | journal=Biometrics | volume=29 | issue=1 | year=1973 | pages=159–183 | jstor=2529684}}</ref><ref>{{cite journal | title=On Stratification, Grouping and Matching | last=Anderson | first=Dallas W. |author2=Kish, Leslie |author3=Cornell, Richard G. | journal=Scandinavian Journal of Statistics | volume=7 | issue=2 | year=1980 | pages=61–66 | jstor=4615774}}</ref><ref>{{cite journal | doi=10.2307/2530417 | title=Matching in Epidemiologic Studies: Validity and Efficiency Considerations | last=Kupper | first=Lawrence L. |author2=Karon, John M. |author3=Kleinbaum, David G. |author4=Morgenstern, Hal |author5= Lewis, Donald K. | journal=Biometrics | volume=37 | issue=2 | year=1981 | pages=271–291 | jstor=2530417 | pmid=7272415| citeseerx=10.1.1.154.1197 }}</ref> '''Propensity score matching''', an early matching technique, was developed as part of the '''Rubin causal model''',<ref name="Rosenbaum Rubin">{{cite journal |last1=Rosenbaum |first1=Paul R. |last2=Rubin |first2=Donald B. |title=The Central Role of the Propensity Score in Observational Studies for Causal Effects |journal=Biometrika |year=1983 |volume=70 |issue=1 |pages=41–55 |doi=10.1093/biomet/70.1.41 |doi-access=free }}</ref> but it has been shown to increase model dependence, bias, and inefficiency and to reduce statistical '''power''', and it is no longer recommended compared with other matching methods.<ref>{{Cite journal|last1=King|first1=Gary|last2=Nielsen|first2=Richard|date=October 2019|title=Why Propensity Scores Should Not Be Used for Matching|url=https://www.cambridge.org/core/product/identifier/S1047198719000111/type/journal_article|journal=Political Analysis|language=en|volume=27|issue=4|pages=435–454|doi=10.1017/pan.2019.11|issn=1047-1987|doi-access=free}}</ref>
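
The idea of finding, for each treated unit, an untreated unit with similar observable characteristics can be illustrated with a minimal sketch. It assumes one-to-one nearest-neighbor matching with replacement on Euclidean covariate distance, which is only one of many possible matching rules; the function name, the dictionary keys, and the toy data are all hypothetical.

```python
import math


def nearest_neighbor_att(units):
    """Estimate the average treatment effect on the treated (ATT) by
    1-to-1 nearest-neighbor matching with replacement.

    `units` is a list of dicts with hypothetical keys:
      "treated" (bool), "x" (tuple of covariates), "y" (outcome).
    """
    treated = [u for u in units if u["treated"]]
    controls = [u for u in units if not u["treated"]]
    if not treated or not controls:
        raise ValueError("need at least one treated and one untreated unit")

    diffs = []
    for t in treated:
        # Closest untreated unit by Euclidean distance on the covariates.
        match = min(controls, key=lambda c: math.dist(t["x"], c["x"]))
        diffs.append(t["y"] - match["y"])
    # Average of the matched outcome differences.
    return sum(diffs) / len(diffs)


# Toy data (made up): outcome is 2*x + 1, plus 3 for treated units.
units = [
    {"treated": True,  "x": (1.0,), "y": 6.0},
    {"treated": True,  "x": (4.0,), "y": 12.0},
    {"treated": False, "x": (1.1,), "y": 3.2},
    {"treated": False, "x": (3.9,), "y": 8.8},
    {"treated": False, "x": (9.0,), "y": 19.0},
]
att = nearest_neighbor_att(units)
print(att)
```

The untreated unit at x = 9.0 is never used, because no treated unit resembles it. Discarding such units is how matching restricts the comparison to similar units.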

−

Matching has been promoted by '''Donald Rubin'''.<ref name="Rosenbaum Rubin" /> In economics it was prominently criticized by ~~'''LaLonde''' (1986)~~,<ref>{{cite journal | last = LaLonde | first = Robert J. | title = Evaluating the Econometric Evaluations of Training Programs with Experimental Data | journal = American Economic Review | volume = 76 | issue = 4 |year = 1986 | pages = 604–620 | jstor=1806062 }}</ref> who compared treatment effect estimates from an experiment with comparable estimates produced by matching methods and showed that the matching methods were biased. '''Dehejia and Wahba''' (1999) reevaluated LaLonde's critique and argued that matching is a good solution.<ref>{{cite journal | title = Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs |first1 = R. H. |last1=Dehejia |first2 = S. |last2=Wahba |journal=Journal of the American Statistical Association |year=1999 |volume=94 |issue=448 |pages=1053–1062 |doi=10.1080/01621459.1999.10473858 |url = http://www.nber.org/papers/w6586.pdf }}</ref> Similar critiques have been raised in political science<ref>{{cite journal |last1=Arceneaux |first1=Kevin |first2=Alan S. |last2=Gerber |first3=Donald P. |last3=Green |year=2006 |title=Comparing Experimental and Matching Methods Using a Large-Scale Field Experiment on Voter Mobilization |journal=Political Analysis |volume=14 |issue=1 |pages=37–62 |doi=10.1093/pan/mpj001 }}</ref> and sociology journals.<ref>{{cite journal |last1=Arceneaux |first1=Kevin |first2=Alan S. |last2=Gerber |first3=Donald P. |last3=Green |year=2010 |title=A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark |journal=Sociological Methods & Research |volume=39 |issue=2 |pages=256–282 |doi=10.1177/0049124110378098}}</ref>

+

Matching has been promoted by '''Donald Rubin'''.<ref name="Rosenbaum Rubin" /> In economics it was prominently criticized by LaLonde,<ref>{{cite journal | last = LaLonde | first = Robert J. | title = Evaluating the Econometric Evaluations of Training Programs with Experimental Data | journal = American Economic Review | volume = 76 | issue = 4 |year = 1986 | pages = 604–620 | jstor=1806062 }}</ref> who compared treatment effect estimates from an experiment with comparable estimates produced by matching methods and showed that the matching methods were biased. Dehejia and Wahba reevaluated LaLonde's critique and argued that matching is a good solution.<ref>{{cite journal | title = Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs |first1 = R. H. |last1=Dehejia |first2 = S. |last2=Wahba |journal=Journal of the American Statistical Association |year=1999 |volume=94 |issue=448 |pages=1053–1062 |doi=10.1080/01621459.1999.10473858 |url = http://www.nber.org/papers/w6586.pdf }}</ref> Similar critiques have been raised in political science<ref>{{cite journal |last1=Arceneaux |first1=Kevin |first2=Alan S. |last2=Gerber |first3=Donald P. |last3=Green |year=2006 |title=Comparing Experimental and Matching Methods Using a Large-Scale Field Experiment on Voter Mobilization |journal=Political Analysis |volume=14 |issue=1 |pages=37–62 |doi=10.1093/pan/mpj001 }}</ref> and sociology journals.<ref>{{cite journal |last1=Arceneaux |first1=Kevin |first2=Alan S. |last2=Gerber |first3=Donald P. |last3=Green |year=2010 |title=A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark |journal=Sociological Methods & Research |volume=39 |issue=2 |pages=256–282 |doi=10.1177/0049124110378098}}</ref>

== Analysis ==

−

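When the outcome of interest is binary, the most common tool for analyzing matched data is conditional logistic regression, because it handles '''~~strata~~ of arbitrary size and continuous or binary treatments (predictors)''' and can control for covariates. In particular cases, simpler tests such as the '''paired difference test''', McNemar's test, and the Cochran–Mantel–Haenszel test are available.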

+

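When the outcome of interest is binary, the most common tool for analyzing matched data is conditional logistic regression, because it handles '''strata of arbitrary size and continuous or binary treatments (predictors)''' and can control for covariates. In particular cases, simpler tests such as the '''paired difference test''', McNemar's test, and the Cochran–Mantel–Haenszel test are available.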
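
As an illustration of one of the simpler tests, the sketch below runs McNemar's test on 1:1 matched pairs with a binary outcome. It assumes the exact (binomial) two-sided form of the test, which uses only the discordant pairs; the function name and the pair counts are made up for the example.

```python
from math import comb


def mcnemar_exact(pairs):
    """Exact two-sided McNemar test for 1:1 matched pairs, binary outcome.

    `pairs` is a list of (treated_outcome, control_outcome) tuples,
    each entry 0 or 1. Returns (b, c, p_value), where b and c count
    the two kinds of discordant pairs.
    """
    b = sum(1 for t, u in pairs if t == 1 and u == 0)
    c = sum(1 for t, u in pairs if t == 0 and u == 1)
    n = b + c
    if n == 0:
        # No discordant pairs: the data carry no evidence either way.
        return b, c, 1.0
    # Under the null, b ~ Binomial(n, 1/2); double the smaller tail.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical data: 8 + 2 discordant pairs, 10 concordant pairs.
pairs = [(1, 0)] * 8 + [(0, 1)] * 2 + [(1, 1)] * 5 + [(0, 0)] * 5
b, c, p = mcnemar_exact(pairs)
print(b, c, p)
```

Concordant pairs drop out of the calculation entirely, which is why the test suits matched designs: only pairs whose members differ in outcome are informative about the treatment.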

−

When the outcome of interest is continuous, the '''~~average treatment effect~~''' ~~is estimated.~~

+

When the outcome of interest is continuous, the '''average treatment effect''' is estimated.
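
For reference, the quantity can be written in potential-outcomes notation. The symbols here are assumptions introduced for illustration and do not appear in the article: <math>Y_i(1)</math> and <math>Y_i(0)</math> are the outcomes of unit <math>i</math> with and without treatment, <math>T_i</math> is the treatment indicator, <math>N_1</math> is the number of treated units, and <math>M(i)</math> is the set of untreated units matched to treated unit <math>i</math>. A simple matching estimator averages the matched differences over the treated units, so it targets the effect on the treated; it coincides with the average treatment effect only under further assumptions.

```latex
\tau_{\mathrm{ATE}} = \mathbb{E}\bigl[\, Y_i(1) - Y_i(0) \,\bigr],
\qquad
\hat{\tau}_{\mathrm{ATT}} = \frac{1}{N_1} \sum_{i \,:\, T_i = 1}
  \Bigl( Y_i - \frac{1}{|M(i)|} \sum_{j \in M(i)} Y_j \Bigr)
```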

Line 21: Line 21:

== Overmatching ==

−

Overmatching is matching on an apparent mediator that is in fact a result of the exposure. If the mediator itself is stratified, an obscured relation between the exposure and the disease is very likely to be induced.<ref name=marsh/> Overmatching thus causes statistical bias.<ref name=marsh>{{cite journal |title=Removal of radiation dose response effects: an example of over-matching |last1=Marsh |first1=J. L. |last2=Hutton |first2=J. L. |last3=Binks |first3=K. |year=2002 |journal=British Medical Journal |volume=325 |issue=7359 |pages=327–330 |pmid=12169512 |doi=10.1136/bmj.325.7359.327 |pmc=1123834}}</ref>

+

Overmatching is matching on an apparent mediator that is in fact a result of the exposure. If the mediator itself is stratified, an obscured relation between the exposure and the disease is very likely to be induced.<ref name=marsh/> Overmatching thus causes statistical bias.<ref name=marsh>{{cite journal |title=Removal of radiation dose response effects: an example of over-matching |last1=Marsh |first1=J. L. |last2=Hutton |first2=J. L. |last3=Binks |first3=K. |year=2002 |journal=British Medical Journal |volume=325 |issue=7359 |pages=327–330 |pmid=12169512 |doi=10.1136/bmj.325.7359.327 |pmc=1123834}}</ref>

Line 32: Line 32:

== See also ==

* [[倾向得分匹配|Propensity score matching]]

−

==References ==

−

Line 47: Line 44:

==Editor's recommendations==

+

===Swarma courses===

+

====[https://campus.swarma.org/course/1858 Applications of econometric causal analysis tools at Kuaishou]====

+

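In product iteration and company decision-making, we often want to know how A affects B. When an A/B experiment is impractical, causal analysis methods combined with observational data can answer the question. The speaker introduces the econometric causal analysis methods commonly used at Kuaishou (including instrumental variables, matching, difference-in-differences, and synthetic control), frontier methods that combine causal analysis with machine learning (such as matrix completion and heterogeneous causal effect estimation based on decision trees and random forests), and how these methods are applied in business practice.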

+

====[]====

+

===Swarma articles===

+

*[https://swarma.org/?p=25679 Latest review in Physics Reports: stable matching in physics, biology, and economics]

+

*[]

+

*[]

+

===Related links===

+

薄荷

7,129

edits
