

Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed.[1][2] The science of why things occur is called etiology. Causal inference is an example of causal reasoning.



Definition

Inferring the cause of something has been described as:


  • "...reason[ing] to the conclusion that something is, or is likely to be, the cause of something else".[3]
  • "Identification of the cause or causes of a phenomenon, by establishing covariation of cause and effect, a time-order relationship with the cause preceding the effect, and the elimination of plausible alternative causes."[4]


Methods

Epidemiological studies employ different methods of collecting and measuring evidence of risk factors and effects, and different ways of measuring the association between the two. A hypothesis is formulated and then tested with statistical methods. Statistical inference helps decide whether an observed pattern in the data is due to chance (random variation) or reflects a real correlation and, if so, how strong it is. However, correlation does not imply causation, so further methods must be used to infer causation.[citation needed]
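The warning that correlation does not imply causation can be illustrated with a small simulation. The sketch below is not taken from the cited sources: it assumes a hypothetical model in which a common cause Z drives both X and Y while X has no effect on Y, and all names and coefficients are illustrative choices.

```python
import numpy as np


def simulate(n=50_000, seed=0, intervene_on_x=False):
    """Hypothetical model: Z -> X and Z -> Y, with no arrow from X to Y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # common cause (confounder)
    if intervene_on_x:
        x = rng.normal(size=n)             # X is set from outside and ignores Z
    else:
        x = z + rng.normal(size=n)         # X responds to Z
    y = 2.0 * z + rng.normal(size=n)       # Y responds to Z only, never to X
    return x, y


def correlation(x, y):
    """Sample Pearson correlation of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


observed = correlation(*simulate())
intervened = correlation(*simulate(intervene_on_x=True))
print(f"observational correlation:      {observed:.3f}")
print(f"correlation under intervention: {intervened:.3f}")
```

Under these assumptions the observational correlation is clearly positive (its theoretical value is 2/√10, roughly 0.63), yet it vanishes once X is set independently of Z, which is the response to a changed cause that causal inference is concerned with.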


Common frameworks for causal inference are structural equation modeling and the Rubin causal model.[citation needed]
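As an illustration of the Rubin causal model, the following sketch simulates potential outcomes. In that framework each unit has an outcome it would show if treated and one it would show if untreated, and only one of the two is ever observed. The data-generating process, the effect size of 1.5, and the function names are assumptions made for this example only.

```python
import numpy as np


def simulate_trial(n=20_000, true_effect=1.5, seed=0):
    """Simulate a randomized experiment with a constant treatment effect."""
    rng = np.random.default_rng(seed)
    baseline = rng.normal(size=n)
    y0 = baseline + rng.normal(scale=0.5, size=n)       # outcome if untreated
    y1 = y0 + true_effect                               # outcome if treated
    treated = rng.integers(0, 2, size=n).astype(bool)   # coin-flip assignment
    observed = np.where(treated, y1, y0)                # only one outcome is seen
    return treated, observed


def difference_in_means(treated, observed):
    """Estimate the average treatment effect from observed outcomes."""
    return float(observed[treated].mean() - observed[~treated].mean())


treated, observed = simulate_trial()
print(f"estimated average effect: {difference_in_means(treated, observed):.3f}")
```

Because assignment is random here, the difference in group means estimates the average of the unit-level effects. With non-random assignment the same estimator can be biased, which is why the framework makes the assignment mechanism explicit.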


In epidemiology

Epidemiology studies patterns of health and disease in defined populations of living beings in order to infer causes and effects. An association between exposure to a putative risk factor and a disease may be suggestive of causality but is not equivalent to it, because correlation does not imply causation. Historically, Koch's postulates have been used since the 19th century to decide whether a microorganism was the cause of a disease. In the 20th century, the Bradford Hill criteria, described in 1965,[5] have been used to assess causality of variables outside microbiology, although even these criteria are not exclusive ways to determine causality.


In molecular epidemiology the phenomena studied are on a molecular biology level, including genetics, where biomarkers are evidence of cause or effects.


A recent trend is to identify evidence for the influence of the exposure on molecular pathology within diseased tissue or cells, in the emerging interdisciplinary field of molecular pathological epidemiology (MPE). Linking the exposure to molecular pathologic signatures of the disease can help to assess causality. Considering the inherent heterogeneity of a given disease, the unique disease principle, disease phenotyping and subtyping are trends in biomedical and public health sciences, exemplified by personalized medicine and precision medicine.


In computer science

Determination of cause and effect from joint observational data for two time-independent variables, say X and Y, has been tackled using the asymmetry between the evidence for some model in the two directions, X → Y and Y → X. The primary approaches are based on algorithmic information theory models and noise models.[citation needed]


Algorithmic information models


Compare two programs, both of which output both X and Y.


  • Store Y and a compressed form of X in terms of uncompressed Y.
  • Store X and a compressed form of Y in terms of uncompressed X.

The shorter of the two programs implies that the uncompressed stored variable more likely causes the computed variable.[6][7]
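The comparison of the two programs can be written compactly in terms of Kolmogorov complexity K. The notation Δ below is introduced here for illustration only. Since K is not computable, practical methods replace it with an approximation such as a compressor or a minimum description length score.

```latex
\Delta_{X \to Y} = K(X) + K(Y \mid X),
\qquad
\Delta_{Y \to X} = K(Y) + K(X \mid Y)

\text{prefer } X \to Y \text{ if } \Delta_{X \to Y} < \Delta_{Y \to X},
\qquad
\text{prefer } Y \to X \text{ if } \Delta_{Y \to X} < \Delta_{X \to Y}
```

When the two lengths are equal or nearly equal, a method of this kind gives no clear indication of direction.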


Noise models


Incorporate an independent noise term in the model to compare the evidence for the two directions.


Here are some of the noise models for the hypothesis X → Y with the noise E:


  • Additive noise:[8] Y = F(X) + E
  • Linear noise:[9] Y = pX + qE
  • Post-non-linear:[10] Y = G(F(X) + E)
  • Heteroskedastic noise: Y = F(X) + E·G(X)
  • Functional noise:[11] Y = F(X, E)

The common assumptions in these models are:


  • There are no other causes of Y.
  • X and E have no common causes.
  • Distribution of cause is independent from causal mechanisms.
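The additive noise model Y = F(X) + E suggests a simple procedure: regress each variable on the other and prefer the direction in which the residuals look independent of the input. The sketch below is a simplified stand-in, not the procedure of the cited papers. It assumes polynomial regression in place of a flexible nonparametric fit, a biased HSIC estimate with a fixed Gaussian bandwidth as the dependence score, and synthetic data with a cubic mechanism. All function names are hypothetical.

```python
import numpy as np


def centered_gram(v, bandwidth=1.0):
    """Doubly centered Gaussian Gram matrix of a standardized 1-D sample."""
    v = (v - v.mean()) / v.std()
    gram = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * bandwidth**2))
    return gram - gram.mean(axis=0) - gram.mean(axis=1)[:, None] + gram.mean()


def hsic(a, b):
    """Biased HSIC estimate; close to zero when a and b are independent."""
    n = len(a)
    return float(np.sum(centered_gram(a) * centered_gram(b)) / n**2)


def residual_dependence(cause, effect, degree=3):
    """Fit effect = poly(cause) + residual and score residual/cause dependence."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def infer_direction(x, y, degree=3):
    """Prefer the direction whose residuals depend less on the input."""
    forward = residual_dependence(x, y, degree)    # hypothesis X -> Y
    backward = residual_dependence(y, x, degree)   # hypothesis Y -> X
    label = "X -> Y" if forward < backward else "Y -> X"
    return label, forward, backward


def simulate_cubic(n=1000, seed=0):
    """Synthetic data in which X causes Y through a cubic mechanism."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-2.0, 2.0, size=n)
    y = x**3 + rng.normal(size=n)                  # additive, independent noise
    return x, y


x, y = simulate_cubic()
label, forward, backward = infer_direction(x, y)
print(label, f"forward={forward:.5f}", f"backward={backward:.5f}")
```

A raw comparison of two scores is only a heuristic. A fuller treatment would use a proper independence test and could decline to decide when neither direction yields independent residuals.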

On an intuitive level, the idea is that the factorization of the joint distribution P(Cause, Effect) into P(Cause)*P(Effect | Cause) typically yields models of lower total complexity than the factorization into P(Effect)*P(Cause | Effect). Although the notion of “complexity” is intuitively appealing, it is not obvious how it should be precisely defined.[11] A different family of methods attempts to discover causal "footprints" from large amounts of labeled data, and allows the prediction of more flexible causal relations.[12]


In statistics and economics


In statistics and economics, causality is often tested via regression analysis. Several methods can be used to distinguish actual causality from spurious correlations. First, economists constructing regression models establish the direction of the causal relation based on economic theory (theory-driven econometrics). For example, if one studies the dependency between rainfall and the futures price of a commodity, then theory (broadly construed) indicates that rainfall can influence prices, but futures prices cannot change the amount of rain.[13] Second, the instrumental variables (IV) technique may be employed to remove any reverse causation by introducing a role for other variables (instruments) that are known to be unaffected by the dependent variable. Third, economists consider time precedence when choosing an appropriate model specification. Given that partial correlations are symmetrical, one cannot determine the direction of a causal relation from correlations alone. Following the probabilistic view of causality, economists assume that causes must precede their effects in time. This leads to using variables that represent earlier phenomena as independent variables and to developing econometric tests for causality (e.g., Granger-causality tests) applicable in time series analysis.[14] Fourth, other regressors are included to ensure that confounding variables are not causing a regressor to appear spuriously significant. However, in areas that suffer from multicollinearity, such as macroeconomics, it is in principle impossible to include all confounding factors, and econometric models are therefore susceptible to the common-cause fallacy.[15] Recently, the design-based econometrics movement has popularized the use of natural experiments and quasi-experimental research designs to address the problem of spurious correlations.[16]
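The time-precedence idea behind Granger-causality tests can be sketched as a comparison of two nested regressions: one predicting a series from its own past, and one that also includes the past of the candidate cause. The example below is a minimal NumPy sketch on simulated series. The autoregressive coefficients, the single lag, and the function names are assumptions for illustration. The resulting F statistic would normally be compared with an F distribution, a step omitted here.

```python
import numpy as np


def granger_f_stat(target, candidate, lags=1):
    """F statistic for H0: lagged `candidate` adds nothing to an AR model of `target`."""
    n = len(target)
    rows = n - lags
    y = target[lags:]
    own = np.column_stack([target[lags - k : n - k] for k in range(1, lags + 1)])
    other = np.column_stack([candidate[lags - k : n - k] for k in range(1, lags + 1)])
    ones = np.ones((rows, 1))
    restricted = np.hstack([ones, own])            # own past only
    unrestricted = np.hstack([ones, own, other])   # own past plus candidate's past

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = rows - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


def simulate_series(n=500, seed=0):
    """Simulated series in which past x feeds into y but not the reverse."""
    rng = np.random.default_rng(seed)
    ex, ey = rng.normal(size=n), rng.normal(size=n)
    x, y = np.zeros(n), np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + ex[t]
        y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + ey[t]
    return x, y


x, y = simulate_series()
print(f"F for 'x helps predict y': {granger_f_stat(y, x):.1f}")
print(f"F for 'y helps predict x': {granger_f_stat(x, y):.1f}")
```

A large statistic indicates only that the past of one series improves predictions of the other. It does not rule out a common cause acting on both series.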


In social science

The social sciences have moved increasingly toward a quantitative framework for assessing causality. Much of this has been described as a means of providing greater rigor to social science methodology. Political science was significantly influenced by the publication of Designing Social Inquiry, by Gary King, Robert Keohane, and Sidney Verba, in 1994. King, Keohane, and Verba (often abbreviated as KKV) recommended that researchers applying both quantitative and qualitative methods adopt the language of statistical inference to be clearer about their subjects of interest and units of analysis.[17][18] Proponents of quantitative methods have also increasingly adopted the potential outcomes framework, developed by Donald Rubin, as a standard for inferring causality.[citation needed]


Debates over the appropriate application of quantitative methods to infer causality resulted in increased attention to the reproducibility of studies. Critics of widely practiced methodologies argued that researchers have engaged in p-hacking to publish articles on the basis of spurious correlations.[19] To prevent this, some have advocated that researchers preregister their research designs prior to conducting their studies so that they do not inadvertently overemphasize a non-reproducible finding that was not the initial subject of inquiry but was found to be statistically significant during data analysis.[20] Internal debates about methodology and reproducibility within the social sciences have at times been acrimonious.[citation needed]
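The way spurious correlations arise when many relationships are tested can be seen in a short simulation. The sketch below is purely illustrative: it assumes an outcome and 200 candidate predictors that are all independent noise, and it uses the normal approximation 1.96 as the two-sided 5% cutoff for the t statistic. The function name and sizes are hypothetical.

```python
import numpy as np


def count_spurious_hits(n_samples=100, n_predictors=200, seed=0):
    """Count predictors that look 'significant' although none is related to the outcome."""
    rng = np.random.default_rng(seed)
    outcome = rng.normal(size=n_samples)
    predictors = rng.normal(size=(n_samples, n_predictors))   # pure noise

    # Pearson correlation of every predictor column with the outcome.
    oc = (outcome - outcome.mean()) / outcome.std()
    pc = (predictors - predictors.mean(axis=0)) / predictors.std(axis=0)
    r = pc.T @ oc / n_samples

    # t statistic for each correlation; 1.96 approximates the 5% two-sided cutoff.
    t = r * np.sqrt((n_samples - 2) / (1.0 - r**2))
    return int(np.sum(np.abs(t) > 1.96))


hits = count_spurious_hits()
print(f"{hits} of 200 unrelated predictors pass the 5% threshold")
```

Roughly one predictor in twenty is expected to pass by chance alone. Reporting only those results, without having fixed the hypothesis in advance, is the pattern that preregistration is meant to prevent.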


While much of the emphasis remains on statistical inference in the potential outcomes framework, social science methodologists have developed new tools to conduct causal inference with both qualitative and quantitative methods, sometimes called a “mixed methods” approach.[21][22] Advocates of diverse methodological approaches argue that different methodologies are better suited to different subjects of study. Sociologist Herbert Smith and political scientists James Mahoney and Gary Goertz have cited the observation of Paul Holland, a statistician and author of the 1986 article “Statistics and Causal Inference,” that statistical inference is most appropriate for assessing the “effects of causes” rather than the “causes of effects.”[23][24] Qualitative methodologists have argued that formalized models of causation, including process tracing and fuzzy set theory, provide opportunities to infer causation through the identification of critical factors within case studies or through a process of comparison among several case studies.[18] These methodologies are also valuable for subjects in which a limited number of potential observations or the presence of confounding variables would limit the applicability of statistical inference.[citation needed]


References




  1. Pearl, Judea (1 January 2009). "Causal inference in statistics: An overview" (PDF). Statistics Surveys. 3: 96–146. doi:10.1214/09-SS057.
  2. Morgan, Stephen; Winship, Chris (2007). Counterfactuals and Causal inference. Cambridge University Press. ISBN 978-0-521-67193-4. 
  3. "causal inference". Encyclopædia Britannica, Inc. Retrieved 24 August 2014.
  4. John Shaughnessy; Eugene Zechmeister; Jeanne Zechmeister (2000). Research Methods in Psychology. McGraw-Hill Humanities/Social Sciences/Languages. pp. Chapter 1 : Introduction. ISBN 978-0077825362. http://www.mhhe.com/socscience/psychology/shaugh/ch01_concepts.html. Retrieved 24 August 2014. 
  5. Hill, Austin Bradford (1965). "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300. doi:10.1177/003591576505800503. PMC 1898525. PMID 14283879.
  6. Kailash Budhathoki and Jilles Vreeken "Causal Inference by Compression" 2016 IEEE 16th International Conference on Data Mining (ICDM)
  7. Marx, Alexander; Vreeken, Jilles (2018). "Telling cause from effect by local and global regression". Knowledge and Information Systems. 60 (3): 1277–1305. doi:10.1007/s10115-018-1286-7.
  8. Hoyer, Patrik O., et al. "Nonlinear causal discovery with additive noise models." NIPS. Vol. 21. 2008.
  9. Shimizu, Shohei; et al. (2011). "DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model" (PDF). The Journal of Machine Learning Research. 12: 1225–1248.
  10. Zhang, Kun, and Aapo Hyvärinen. "On the identifiability of the post-nonlinear causal model." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.
  11. Mooij, Joris M., et al. "Probabilistic latent variable models for distinguishing between cause and effect." NIPS. 2010.
  12. Lopez-Paz, David, et al. "Towards a learning theory of cause-effect inference" ICML. 2015
  13. Simon, Herbert (1977). Models of Discovery. Dordrecht: Springer. p. 52. 
  14. Maziarz, Mariusz (2020). The Philosophy of Causality in Economics: Causal Inferences and Policy Proposals. New York: Routledge. 
  15. Henschen, Tobias (2018). "The in-principle inconclusiveness of causal evidence in macroeconomics". European Journal for Philosophy of Science. 8: 709–733.
  16. Angrist Joshua & Pischke Jörn-Steffen (2008). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press. 
  17. King, Gary (2012). Designing social inquiry : scientific inference in qualitative research. Princeton Univ. Press. ISBN 978-0691034713. OCLC 754613241. 
  18. Mahoney, James (January 2010). "After KKV". World Politics. 62 (1): 120–147. doi:10.1017/S0043887109990220. JSTOR 40646193.
  19. Dominus, Susan (18 October 2017). "When the Revolution Came for Amy Cuddy". The New York Times. ISSN 0362-4331. Retrieved 2 March 2019.
  20. "The Statistical Crisis in Science". American Scientist. 6 February 2017. Retrieved 18 April 2019.
  21. Creswell, John W.; Clark, Vicki L. Plano (2011). Designing and Conducting Mixed Methods Research. SAGE Publications. ISBN 9781412975179. https://books.google.com/books/about/Designing_and_Conducting_Mixed_Methods_R.html?id=YcdlPWPJRBcC.
  22. Seawright, Jason (September 2016). Multi-Method Social Science. doi:10.1017/CBO9781316160831. ISBN 9781316160831. https://www.cambridge.org/core/books/multimethod-social-science/286C2742878FBCC6225E2F10D6095A0C.
  23. Smith, Herbert L. (10 February 2014). "Effects of Causes and Causes of Effects: Some Remarks from the Sociological Side". Sociological Methods and Research. 43 (3): 406–415. doi:10.1177/0049124114521149. PMC 4251584. PMID 25477697.
  24. Goertz, Gary; Mahoney, James (2006). "A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research". Political Analysis. 14 (3): 227–249. doi:10.1093/pan/mpj017. ISSN 1047-1987.




This page was moved from wikipedia:en:Causal inference. Its edit history can be viewed at 因果推断/edithistory

This page was moved from mywiki:zh-cn:因果推断. Its edit history can be viewed at 词条生产:因果推断/edithistory