“因果推断”的版本间的差异

第54行: 第54行:
 
'''<font color = '#ff8000'>流行病学Epidemiology</font>'''研究特定生物群体的健康和疾病'''<font color = '#ff8000'>模式patterns</font>''',以推断原因和结果。'''<font color = '#ff8000'>暴露exposure </font>'''于一般认为的'''<font color = '#ff8000'>危险因素risk factor</font>'''和疾病之间的联系可能被提出,但不等同于确认因果关系,因为相关不意味着因果。从历史上看,'''<font color = '#ff8000'>科赫法则 Koch's postulates</font>'''从19世纪开始就被用来判断一种微生物是否是一种疾病的原因。在20世纪,'''<font color = '#ff8000'>布拉德福德·希尔准则Bradford Hill criteria</font>'''(参见Bradford Hill 1965年的文章<ref name="bh65"></ref>中)已经被用来评估微生物学之外的变量的因果关系,尽管即使是这些标准也不是确定因果关系的唯一方法。
 
  
 +
  --ZC(讨论)  【审校】“的联系可能被提出,但不等同于确认因果关系”一句改为,“可能存在关联,但不等于确定因果关系”
 +
  --ZC(讨论)  【审校】“从19世纪开始”将此短语提至“科赫法则”之前。
 +
  --ZC(讨论)  【审校】“疾病的原因”一词改为“疾病病因”
 +
  --ZC(讨论)  【审校】“尽管即使是这些标准也不是确定因果关系的唯一方法”一句中去掉“即使”
  
 
In [[molecular epidemiology]] the phenomena studied are on a [[molecular biology]] level, including genetics, where [[biomarkers]] are evidence of cause or effects.
 
第59行: 第63行:
 
'''<font color = '#ff8000'>分子流行病学molecular epidemiology</font>'''中研究的现象是在'''<font color = '#ff8000'>分子生物学</font>'''水平上的,也涵盖了遗传学。而遗传学中的'''<font color = '#ff8000'>生物标志物biomarkers</font>'''就是原因或结果的证据。
 
  
 +
  --ZC(讨论)  【审校】“分子生物学”一词改成“生物分子”
 +
  --ZC(讨论)  【审校】“水平上的,也涵盖了遗传学。而遗传学中的”一句改为“水平上也涵盖了遗传学,而其中”
  
 
A recent trend{{when|date=August 2014}} is to identify [[evidence]] for influence of the exposure on [[molecular pathology]] within diseased [[Tissue (biology)|tissue]] or cells, in the emerging interdisciplinary field of [[molecular pathological epidemiology]] (MPE).{{third-party-inline|date=August 2014}} Linking the exposure to molecular pathologic signatures of the disease can help to assess causality. {{third-party-inline|date=August 2014}} Considering the inherent nature of [[heterogeneity]] of a given disease, the unique disease principle, disease phenotyping and subtyping are trends in biomedical and [[public health]] sciences, exemplified as [[personalized medicine]] and [[precision medicine]].{{third-party-inline|date=August 2014}}
 
第64行: 第70行:
 
在新兴的'''<font color = '#ff8000'>分子病理流行病学molecular pathological epidemiology (MPE)</font>'''交叉学科领域,最近的一个趋势(截至2014年)是确定在病变组织或细胞内,'''<font color = '#ff8000'>暴露exposure </font>'''在'''<font color = '#ff8000'>分子病理学molecular pathology</font>'''上的影响。将暴露与疾病的分子病理特征联系起来可以帮助因果关系的评估。鉴于给定疾病的'''<font color = '#ff8000'>异质性</font>'''的固有特征、独特的疾病原理等,疾病'''<font color = '#ff8000'>表现型phenotyping</font>'''和'''<font color = '#ff8000'>亚型subtyping </font>'''现在是生物医学和'''<font color = '#ff8000'>公共卫生科学public health</font>'''的趋势,例证包括'''<font color = '#ff8000'>个体化医学personalized medicine</font>'''和'''<font color = '#ff8000'>精准医学precision medicine</font>'''等。
 
  
 
+
  --ZC(讨论)  【审校】“暴露”改为“暴露迹象”
 +
  --ZC(讨论)  【审校】“上的影响”改为“影响的证据”
 +
  --ZC(讨论)  【审校】“将暴露”改为“将暴露迹象”
 +
  --ZC(讨论)  【审校】“因果关系的评估”改为“评估因果关系”
 +
  --ZC(讨论)  【审校】“给定疾病的”改为“特定疾病”
 +
  --ZC(讨论)  【审校】“独特的疾病原理等”改为“罕见的病理”
 +
  --ZC(讨论)  【审校】“现在是生物医学和”改为“是生物医学和”
 +
  --ZC(讨论)  【审校】“个体化医学”改为“私人医疗”
 +
  --ZC(讨论)  【审校】“精准医学”改为“精密医学”
 
==In computer science在计算机科学领域==
 
  
第71行: 第85行:
  
 
确定两个时间独立变量:X 和 Y 的联合观测数据因果关系的问题已经被解决了,方法是利用 X → Y 和 Y → X 方向上某些模型的证据不对称性。主要的方法基于'''<font color = '#ff8000'>算法信息理论Algorithmic information theory</font>'''模型和'''<font color = '#ff8000'>噪声模型noise models</font>'''。
 
 +
 +
  --ZC(讨论)  【审校】“确定两个时间独立变量:X 和 Y 的联合观测数据因果关系的问题 已经被解决了,方法是利用 X → Y 和 Y → X 方向上某些模型的证据不对称性。”改为“由两个时间自变量,例如X,Y;联合观测数据来确定因果关系。已经被使用一些模型方向上证据的不对称性给解决了,例如:X → Y 和 Y → X 。”
  
  
第86行: 第102行:
  
 
* 用未压缩的 X 来存储 X 和压缩形式的 Y 。
 
 +
 +
  --ZC(讨论)  【审校】“用未压缩的 X 来存储 X 和压缩形式的 Y ”改为“根据未压缩的Y存储Y和X的压缩型,根据未压缩的X存储X和Y的压缩型”
  
 
The shortest such program implies the uncompressed stored variable more-likely causes the computed variable.<ref>Kailash Budhathoki and Jilles Vreeken "[http://eda.mmci.uni-saarland.de/pubs/2016/origo-budhathoki,vreeken.pdf Causal Inference by Compression]" 2016 IEEE 16th International Conference on Data Mining (ICDM)</ref><ref>{{Cite journal |doi = 10.1007/s10115-018-1286-7|title = Telling cause from effect by local and global regression|journal = Knowledge and Information Systems|year = 2018|last1 = Marx|first1 = Alexander|last2 = Vreeken|first2 = Jilles|volume=60|issue = 3|pages=1277–1305|doi-access = free}}</ref>
 
  
 
最短的这种程序意味着,更有可能是未压缩的'''<font color='#ff8000'>存储变量stored variable</font>'''导致了'''<font color='#ff8000'>计算出的变量computed variable</font>'''。
 
 +
 +
 +
  --ZC(讨论)  【审校】“最短的这种程序意味着”改为“最短的此类程序表明”
 +
  --ZC(讨论)  【审校】“计算出的变量”改为“计算变量”
  
 
===Noise models噪音模型===
 
第135行: 第157行:
  
 
在直观的层面上,这个想法是联合分布P(Cause, Effect) 到 P(Cause)*P(Effect | Cause)的因式分解通常产生的模型的总'''<font color='#ff8000'>复杂性complexity </font>'''低于到P(Effect)*P(Cause | Effect)的因式分解。尽管“复杂性”的概念在直觉上很吸引人,但是对于如何定义它却并不显而易见。另一种不同类族的方法尝试从大量标签过的数据中发现因果的“足迹”,并且允许预测更灵活的因果关系。
 
 +
  --ZC(讨论)  【审校】“这个想法是联合分布P(Cause, Effect) 到 P(Cause)*P(Effect | Cause)的因式分解通常产生的模型的总”改为“联合分布P(起因,结果)到P(起因)*P(结果|起因)拆分的主意通常产生模型的总”
 +
  --ZC(讨论)  【审校】“低于到P(Effect)*P(Cause | Effect)的因式分解”改为“低于将P(起因,结果)到P(结果)*P(起因|结果)的拆分”
 +
  --ZC(讨论)  【审校】“但是对于如何定义”改为“但对于应该如何精确定义”
 +
  --ZC(讨论)  【审校】“标签过”改为“被标记”
  
 
== In statistics and economics 在统计学和经济学领域==
 

2020年9月18日 (五) 03:50的版本



Causal inference is the process of drawing a conclusion about a causal connection based on the conditions of the occurrence of an effect. The main difference between causal inference and inference of association is that the former analyzes the response of the effect variable when the cause is changed.[1][2] The science of why things occur is called etiology. Causal inference is an example of causal reasoning.

因果推断Causal inference是根据某一效应发生的条件得出关于因果关系的结论的过程。因果推断与关联推理inference of association的主要区别在于前者分析在原因变化时结果变量的反应。研究事情为什么发生的科学叫做病因学etiology。因果推断是因果推理causal reasoning的一个例子。

 --ZC(讨论)  【审校】“是根据某一效应发生的条件得出关于因果关系的结论的过程。”一句中的“根据”改为“基于”,“效应”改为“事件”,“因果关系的结论”改为“因果联系结论”
 --ZC(讨论)  【审校】“的主要区别在于前者分析在原因变化时结果变量的反应。”一句中“在原因变化时”改为“了当原因改变时”,“反应”改为“响应”
 --ZC(讨论)  【审校】“研究事情为什么发生的科学叫做”一句改为“研究事件因何而起的科学被称为”

Definition定义

Inferring the cause of something has been described as:

对某事原因的推断过程已经被描述为是:

 --嘉树(讨论) inferring是ing形式,因此译为“推断的过程”,存疑
 --ZC(讨论)  【审校】“对某事原因的推断过程已经被描述为是:”一句改为“推断某事原因的过程被描述为:”
  • "...reason[ing] to the conclusion that something is, or is likely to be, the cause of something else".[3]
  • “推理得出某事是(或可能是)其他事情的原因这一结论的过程。”
 --ZC(讨论)  【审校】“推理得出某事是(或可能是)其他事情的原因这一结论的过程。”一句改为“某事是(或可能是)其他事情的原因这一结论。”
  • "Identification of the cause or causes of a phenomenon, by establishing covariation of cause and effect, a time-order relationship with the cause preceding the effect, and the elimination of plausible alternative causes."[4]
  • “通过建立因果的共变关系、建立原因先于结果的时间顺序关系,以及消除其他可能的替代原因的过程,从而对现象的一个或多个原因进行确定的过程。”
 --ZC(讨论)  【审校】“通过建立因果的共变关系、建立原因先于结果的时间顺序关系,以及消除其他可能的替代原因的过程,从而对现象的一个或多个原因进行确定的过程。”一句改为“通过建立因果的共变模型,前因后果的时间顺序联系,以及消除其他可能的替代原因,从而对现象的一个或多个原因进行确认。”

Methods 方法

Epidemiological studies employ different epidemiological methods of collecting and measuring evidence of risk factors and effect and different ways of measuring association between the two. A hypothesis is formulated, and then tested with statistical methods. It is statistical inference that helps decide if data are due to chance, also called random variation, or indeed correlated and if so how strongly. However, correlation does not imply causation, so further methods must be used to infer causation.[citation needed]

流行病学epidemiological收集和衡量危险因素和结果,以及衡量危险因素和结果之间关系的方法与其他学科不同。一个假设提出来以后用统计学假设检验Statistical hypothesis testing进行检验。这种统计学推断statistical inference 有助于判断数据是由偶然性引起的,也就是随机变异random variation,还是确实存在相关性,以及相关性有多强。然而,相关不意味着因果,因此必须进一步使用其他方法来推断因果关系。

 --ZC(讨论)  【审校】“收集和衡量危险因素和结果,以及衡量危险因素和结果之间关系的方法与其他学科不同。一个假设提出来以后用”一句改为“运用不同的流行病模型来收集和衡量危险因素和结果,以及判定两者之间的联系”
 --ZC(讨论)  【审校】“一个假设提出来以后”一句改为“一个假说被提出来后”
 --ZC(讨论)  【审校】“是由偶然性引起的,也就是”一句改为“是否由偶然性引起,也被称为”
 --ZC(讨论)  【审校】“还是确实存在相关性,以及相关性有多强”一句改为“还是确实相关以及相关性的强弱”
 --ZC(讨论)  【审校】“因此必须进一步使用其他方法来推断因果关系”一句改为“必须使用其他方法来推断其因果关系”

Common frameworks for causal inference are structural equation modeling and the Rubin causal model.[citation needed]

因果推断的常见框架有结构方程模型structural equation modeling和Rubin因果模型 Rubin causal model。

 --ZC(讨论)  【审校】“因果推断的常见框架”一句改为“常见的因果推断框架有”

In epidemiology 在流行病学中

Epidemiology studies patterns of health and disease in defined populations of living beings in order to infer causes and effects. An association between an exposure to a putative risk factor and a disease may be suggestive of, but is not equivalent to causality because correlation does not imply causation. Historically, Koch's postulates have been used since the 19th century to decide if a microorganism was the cause of a disease. In the 20th century the Bradford Hill criteria, described in 1965[5] have been used to assess causality of variables outside microbiology, although even these criteria are not exclusive ways to determine causality.

流行病学Epidemiology研究特定生物群体的健康和疾病模式patterns,以推断原因和结果。暴露exposure 于一般认为的危险因素risk factor和疾病之间的联系可能被提出,但不等同于确认因果关系,因为相关不意味着因果。从历史上看,科赫法则 Koch's postulates从19世纪开始就被用来判断一种微生物是否是一种疾病的原因。在20世纪,布拉德福德·希尔准则Bradford Hill criteria(参见Bradford Hill 1965年的文章[5]中)已经被用来评估微生物学之外的变量的因果关系,尽管即使是这些标准也不是确定因果关系的唯一方法。

 --ZC(讨论)  【审校】“的联系可能被提出,但不等同于确认因果关系”一句改为,“可能存在关联,但不等于确定因果关系”
 --ZC(讨论)  【审校】“从19世纪开始”将此短语提至“科赫法则”之前。
 --ZC(讨论)  【审校】“疾病的原因”一词改为“疾病病因”
 --ZC(讨论)  【审校】“尽管即使是这些标准也不是确定因果关系的唯一方法”一句中去掉“即使”

In molecular epidemiology the phenomena studied are on a molecular biology level, including genetics, where biomarkers are evidence of cause or effects.

分子流行病学molecular epidemiology中研究的现象是在分子生物学水平上的,也涵盖了遗传学。而遗传学中的生物标志物biomarkers就是原因或结果的证据。

 --ZC(讨论)  【审校】“分子生物学”一词改成“生物分子”
 --ZC(讨论)  【审校】“水平上的,也涵盖了遗传学。而遗传学中的”一句改为“水平上也涵盖了遗传学,而其中”

A recent trend is to identify evidence for influence of the exposure on molecular pathology within diseased tissue or cells, in the emerging interdisciplinary field of molecular pathological epidemiology (MPE). Linking the exposure to molecular pathologic signatures of the disease can help to assess causality. Considering the inherent nature of heterogeneity of a given disease, the unique disease principle, disease phenotyping and subtyping are trends in biomedical and public health sciences, exemplified as personalized medicine and precision medicine.

在新兴的分子病理流行病学molecular pathological epidemiology (MPE)交叉学科领域,最近的一个趋势(截至2014年)是确定在病变组织或细胞内,暴露exposure 在分子病理学molecular pathology上的影响。将暴露与疾病的分子病理特征联系起来可以帮助因果关系的评估。鉴于给定疾病的异质性的固有特征、独特的疾病原理等,疾病表现型phenotyping和亚型subtyping 现在是生物医学和公共卫生科学public health的趋势,例证包括个体化医学personalized medicine和精准医学precision medicine等。

 --ZC(讨论)  【审校】“暴露”改为“暴露迹象”
 --ZC(讨论)  【审校】“上的影响”改为“影响的证据”
 --ZC(讨论)  【审校】“将暴露”改为“将暴露迹象”
 --ZC(讨论)  【审校】“因果关系的评估”改为“评估因果关系”
 --ZC(讨论)  【审校】“给定疾病的”改为“特定疾病”
 --ZC(讨论)  【审校】“独特的疾病原理等”改为“罕见的病理”
 --ZC(讨论)  【审校】“现在是生物医学和”改为“是生物医学和”
 --ZC(讨论)  【审校】“个体化医学”改为“私人医疗”
 --ZC(讨论)  【审校】“精准医学”改为“精密医学”

In computer science在计算机科学领域

Determination of cause and effect from joint observational data for two time-independent variables, say X and Y, has been tackled using asymmetry between evidence for some model in the directions, X → Y and Y → X. The primary approaches are based on Algorithmic information theory models and noise models.[citation needed]

确定两个时间独立变量:X 和 Y 的联合观测数据因果关系的问题已经被解决了,方法是利用 X → Y 和 Y → X 方向上某些模型的证据不对称性。主要的方法基于算法信息理论Algorithmic information theory模型和噪声模型noise models

 --ZC(讨论)  【审校】“确定两个时间独立变量:X 和 Y 的联合观测数据因果关系的问题 已经被解决了,方法是利用 X → Y 和 Y → X 方向上某些模型的证据不对称性。”改为“由两个时间自变量,例如X,Y;联合观测数据来确定因果关系。已经被使用一些模型方向上证据的不对称性给解决了,例如:X → Y 和 Y → X 。”


Algorithmic information models算法信息模型

Compare two programs, both of which output both X and Y.

比较两个同时输出 X 和 Y 的程序。

  • Store Y and a compressed form of X in terms of uncompressed Y.
  • Store X and a compressed form of Y in terms of uncompressed X.
  • 用未压缩的 Y 来存储 Y 和压缩形式的 X 。
  • 用未压缩的 X 来存储 X 和压缩形式的 Y 。
 --ZC(讨论)  【审校】“用未压缩的 X 来存储 X 和压缩形式的 Y ”改为“根据未压缩的Y存储Y和X的压缩型,根据未压缩的X存储X和Y的压缩型”

The shortest such program implies the uncompressed stored variable more-likely causes the computed variable.[6][7]

最短的这种程序意味着,更有可能是未压缩的存储变量stored variable导致了计算出的变量computed variable


 --ZC(讨论)  【审校】“最短的这种程序意味着”改为“最短的此类程序表明”
 --ZC(讨论)  【审校】“计算出的变量”改为“计算变量”

Noise models噪音模型

Incorporate an independent noise term in the model to compare the evidences of the two directions.

在模型中引入一个独立的噪声项,以比较两个方向的证据。


Here are some of the noise models for the hypothesis X → Y with the noise E:

下面是一些针对 X → Y 假设且具有噪声 E 的噪声模型:

  • Additive noise:[8] <math>Y = F(X) + E</math>
  • 加法噪音Additive noise
  • Linear noise:[9] <math>Y = pX + qE</math>
  • 线性噪音Linear noise
  • Post-non-linear:[10] <math>Y = G(F(X) + E)</math>
  • 后非线性Post-non-linear(噪音)
  • Heteroskedastic noise: <math>Y = F(X) + E \cdot G(X)</math>
  • 异方差噪音Heteroskedastic noise
  • Functional noise:[11] <math>Y = F(X, E)</math>
  • 功能性噪音Functional noise


The common assumptions in these models are:

这些模型的共同假设是:

  • There are no other causes of Y.
  • Y 没有其他原因。
  • X and E have no common causes.
  • X 和 E 没有共同的原因。
  • Distribution of cause is independent from causal mechanisms.
  • 原因的分布独立于因果机制。
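
The linear noise model listed above, Y = pX + qE, can be illustrated with a small simulation. The sketch below is not DirectLiNGAM or any published algorithm. It only shows the underlying asymmetry: with non-Gaussian noise, the residual of a least-squares fit looks independent of the regressor in the causal direction but not in the reverse direction. The function names and the dependence proxy (absolute correlation between the squared regressor and the squared residual) are assumptions made for this illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(cause, effect):
    """Residuals of the least-squares line that predicts effect from cause."""
    mc, me = mean(cause), mean(effect)
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    return [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]


def dependence_score(cause, effect):
    """Crude dependence proxy (an assumption of this sketch): absolute
    correlation between the squared, centred regressor and the squared residual."""
    res = residuals(cause, effect)
    mc = mean(cause)
    return abs(correlation([(c - mc) ** 2 for c in cause], [r * r for r in res]))


def infer_direction(x, y):
    """Prefer the direction whose residual looks less dependent on the regressor."""
    return "X -> Y" if dependence_score(x, y) < dependence_score(y, x) else "Y -> X"


# Simulated data: X causes Y, with uniform (non-Gaussian) cause and noise.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
y = [xi + rng.uniform(-1.0, 1.0) for xi in x]  # Y = pX + qE with p = q = 1

print("forward score :", round(dependence_score(x, y), 3))
print("backward score:", round(dependence_score(y, x), 3))
print("inferred      :", infer_direction(x, y))
```

The non-Gaussian noise matters here: if both X and E were Gaussian, the two directions would be statistically indistinguishable and the scores would carry no information about direction.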

On an intuitive level, the idea is that the factorization of the joint distribution P(Cause, Effect) into P(Cause)*P(Effect | Cause) typically yields models of lower total complexity than the factorization into P(Effect)*P(Cause | Effect). Although the notion of “complexity” is intuitively appealing, it is not obvious how it should be precisely defined.[11] A different family of methods attempt to discover causal "footprints" from large amounts of labeled data, and allow the prediction of more flexible causal relations.[12]

在直观的层面上,这个想法是联合分布P(Cause, Effect) 到 P(Cause)*P(Effect | Cause)的因式分解通常产生的模型的总复杂性complexity 低于到P(Effect)*P(Cause | Effect)的因式分解。尽管“复杂性”的概念在直觉上很吸引人,但是对于如何定义它却并不显而易见。另一种不同类族的方法尝试从大量标签过的数据中发现因果的“足迹”,并且允许预测更灵活的因果关系。

 --ZC(讨论)  【审校】“这个想法是联合分布P(Cause, Effect) 到 P(Cause)*P(Effect | Cause)的因式分解通常产生的模型的总”改为“联合分布P(起因,结果)到P(起因)*P(结果|起因)拆分的主意通常产生模型的总”
 --ZC(讨论)  【审校】“低于到P(Effect)*P(Cause | Effect)的因式分解”改为“低于将P(起因,结果)到P(结果)*P(起因|结果)的拆分”
 --ZC(讨论)  【审校】“但是对于如何定义”改为“但对于应该如何精确定义”
 --ZC(讨论)  【审校】“标签过”改为“被标记”

In statistics and economics 在统计学和经济学领域

 --嘉树(讨论) 本段似有错漏,英文词条和本段内容有一些不同。因此,(1)先写本段的翻译,(2)再写英文词条中有不同的地方,(3)最后整合一下。【】括起来的是在两个版本中需要照应和修改的部分。

In statistics and economics, causality is often tested via regression analysis. Several methods can be used to distinguish actual causality from spurious correlations. First, economists constructing regression models establish the direction of causal relation based on economic theory (theory-driven econometrics). For example, if one studies the dependency between rainfall and the future price of a commodity, then theory (broadly construed) indicates that rainfall can influence prices, but futures prices cannot make changes to the amount of rain.[13] Second, the instrumental variables (IV) technique may be employed to remove any reverse causation by introducing a role for other variables (instruments) that are known to be unaffected by the dependent variable. Third, economists consider time precedence to choose appropriate model specification. Given that partial correlations are symmetrical, one cannot determine the direction of causal relation based on correlations only. Based on the probabilistic view of causality, economists assume that causes must be prior in time to their effects. This leads to using the variables representing phenomena happening earlier as independent variables and developing econometric tests for causality (e.g., Granger-causality tests) applicable in time series analysis.[14] Fourth, other regressors are included to ensure that confounding variables are not causing a regressor to spuriously appear significant. However, in areas suffering from the problem of multicollinearity, such as macroeconomics, it is in principle impossible to include all confounding factors, and therefore econometric models are susceptible to the common-cause fallacy.[15] Recently, the movement of design-based econometrics has popularized using natural experiments and quasi-experimental research designs to address the problem of spurious correlations.[16]

在统计学和经济学中,因果关系通常通过回归分析来检验。有几种方法可以用来区分真实的因果关系和虚假的相关性。第一,经济学家根据经济理论(理论驱动theory-driven的计量经济学)构建回归模型,从而确定因果关系的方向。例如,如果研究降雨与未来商品价格之间的依赖关系,那么一个广义上建构的理论表明,降雨可以影响价格,但未来价格不能改变降雨量。[17] 第二,工具变量instrumental variables(IV)技术可以通过引入其他已知不受因变量的影响的工具变量来消除任何反向因果关系。【第三,经济学家会考虑时间优先级来选择合适的具体模型。由于偏相关partial correlations是对称的,人们不能仅凭相关性确定因果关系的方向。经济学家基于因果关系的概率观点probabilistic view假设,原因必须在时间上先于其结果。这导致经济学家使用较早发生的现象作为自变量,并开发适用于时间序列分析的因果关系检验计量经济方法(例如,格兰杰因果检验)[18]。第四,回归中还会纳入其他回归项,以确保混淆变量confounding variables不会使某个回归项虚假地显得显著;但在受多重共线性multicollinearity问题困扰的领域,如宏观经济学,原则上不可能纳入所有混淆因素,因此计量经济模型容易出现共因谬误common-cause fallacy[19]。近年来,以研究设计为基础的计量经济学design-based econometrics运动已经推广使用自然实验和准实验研究设计来解决虚假相关spurious correlations的问题[20]。】

 --嘉树(讨论) 英文词条的第三和第四:

Third, the principle that effects cannot precede causes can be invoked, by including on the right side of the regression only variables that precede in time the dependent variable. Fourth, other regressors are included to ensure that confounding variables are not causing a regressor to spuriously appear to be significant. Correlation by coincidence, as opposed to correlation reflecting actual dependence of the underlying process, can be ruled out by using large samples and by performing cross validation to check that correlations are maintained on data that were not used in the regression.

【第三,通过在回归的右侧只包括时间上在因变量之前的变量,就可以使用结果不能优先于原因的原则。 第四,有些方法包括了其他回归因素,以确保混淆变量confounding variables不会导致一个回归项虚假地呈现显著。通过使用大样本和交叉验证检查在回归中未使用的数据上是否保持了相关性,可以排除巧合的,而不是反映实际依赖内在过程的相关性。】

 --嘉树(讨论)整合后的翻译:

在统计学和经济学中,因果关系通常通过回归分析来检验。有几种方法可以用来区分真实的因果关系和虚假的相关性。第一,经济学家根据经济理论(理论驱动theory-driven的计量经济学)构建回归模型,从而确定因果关系的方向。例如,如果研究降雨与未来商品价格之间的依赖关系,那么一个广义上建构的理论表明,降雨可以影响价格,但未来价格不能改变降雨量。[21] 第二,工具变量instrumental variables(IV)技术可以通过引入其他已知不受因变量的影响的工具变量来消除任何反向因果关系。第三,通过在回归的右侧只包括时间上在因变量之前的变量,就可以使用结果不能先于原因的原则。由于偏相关partial correlations是对称的,人们不能仅凭相关性确定因果关系的方向。经济学家基于因果关系的概率观点probabilistic view假设,原因必须在时间上先于其结果。这导致经济学家使用较早发生的现象作为自变量,并开发适用于时间序列分析的因果关系检验计量经济方法(例如,格兰杰因果检验)[22]。第四,回归中还会纳入其他回归项,以确保混淆变量confounding variables不会导致一个回归项虚假地呈现显著。通过使用大样本,并用交叉验证检查相关性在回归未使用的数据上是否仍然成立,可以排除出于巧合、而非反映内在过程实际依赖关系的相关性。但在受多重共线性multicollinearity问题困扰的领域,如宏观经济学,原则上不可能纳入所有混淆因素,因此计量经济模型容易出现共因谬误common-cause fallacy[23]。近年来,以研究设计为基础的计量经济学design-based econometrics运动已经推广使用自然实验和准实验研究设计来解决虚假相关spurious correlations的问题[24]。
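
The instrumental variables idea in the paragraph above can be sketched with simulated data. This is a minimal illustration under stated assumptions, not an econometrics library call: the variable names, the coefficients, and the single-instrument ratio estimator cov(Z, Y) / cov(Z, X) are choices made for this example. For simplicity the simulated problem is an unobserved common cause U rather than reverse causation; the instrument Z influences X but is unrelated to U.

```python
import random


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance of two equally long sequences."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope of the ordinary least-squares regression of y on x."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Single-instrument IV (ratio) estimate of the effect of x on y."""
    return covariance(z, y) / covariance(z, x)


rng = random.Random(1)
n = 5000
true_effect = 2.0  # hypothetical causal effect of X on Y

z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # instrument
u = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unobserved common cause
x = [zi + ui + rng.gauss(0.0, 0.5) for zi, ui in zip(z, u)]
y = [true_effect * xi + 3.0 * ui + rng.gauss(0.0, 0.5) for xi, ui in zip(x, u)]

ols_estimate = ols_slope(x, y)    # biased upward, because U drives both X and Y
iv_estimate = iv_slope(z, x, y)   # close to true_effect if Z is a valid instrument

print("OLS estimate:", round(ols_estimate, 2))
print("IV estimate :", round(iv_estimate, 2))
```

The IV estimate is only as good as the assumption that the instrument affects Y solely through X; in real data that assumption comes from theory and cannot be checked by the regression itself.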

In social science 在社会科学领域

The social sciences have moved increasingly toward a quantitative framework for assessing causality. Much of this has been described as a means of providing greater rigor to social science methodology. Political science was significantly influenced by the publication of Designing Social Inquiry, by Gary King, Robert Keohane, and Sidney Verba, in 1994. King, Keohane, and Verba (often abbreviated as KKV) recommended that researchers applying both quantitative and qualitative methods adopt the language of statistical inference to be clearer about their subjects of interest and units of analysis.[25][26] Proponents of quantitative methods have also increasingly adopted the potential outcomes framework, developed by Donald Rubin, as a standard for inferring causality.[citation needed]

社会科学越来越倾向于采用定量框架来评估因果关系。这在很大程度上被视为提高社会科学方法论social science methodology严谨性的一种手段。1994年,加里·金Gary King、罗伯特·基奥汉Robert Keohane和西德尼·维尔巴Sidney Verba合著的《社会科学中的研究设计Designing Social Inquiry》出版,对政治科学产生了重大影响。金、基奥汉和维尔巴(通常缩写为 KKV)建议,无论采用定量方法还是定性方法,研究人员都应使用统计推断的语言,以便更清楚地说明他们感兴趣的主题和分析单位。[27][26]定量方法的支持者也越来越多地采用唐纳德·鲁宾Donald Rubin开发的潜在结果框架potential outcomes framework作为推断因果关系的标准。

Debates over the appropriate application of quantitative methods to infer causality resulted in increased attention to the reproducibility of studies. Critics of widely-practiced methodologies argued that researchers have engaged in p-hacking to publish articles on the basis of spurious correlations.[28] To prevent this, some have advocated that researchers preregister their research designs prior to conducting their studies so that they do not inadvertently overemphasize a non-reproducible finding that was not the initial subject of inquiry but was found to be statistically significant during data analysis.[29] Internal debates about methodology and reproducibility within the social sciences have at times been acrimonious.[citation needed]

关于如何恰当地应用定量方法推断因果关系的争论,使研究的可重复性reproducibility受到更多关注。对广泛使用的方法持批评态度的人认为,研究人员借助 p 值操纵p-hacking(又称数据捕捞Data dredging),在虚假相关的基础上发表文章[30]。为了避免这种情况,一些人主张研究人员在开展研究之前预注册preregister其研究设计,以免无意中过分强调某项不可复制的发现:这项发现并非最初的研究对象,只是在数据分析中被发现具有统计显著性[31]。社会科学内部关于方法论和可重复性的争论有时十分激烈[citation needed]。
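
Why unplanned comparisons produce spurious but "significant" findings can be shown with a short simulation. The sketch below is an illustration under stated assumptions: each simulated study measures one predictor and 20 outcomes that are all pure noise, and reports a finding if any outcome correlates with the predictor at p < 0.05. The study sizes, the function names, and the Fisher z approximation used for the p-value are choices made for this example.

```python
import math
import random


def correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def p_value(r, n):
    """Approximate two-sided p-value for a correlation r from n samples,
    using the Fisher z transformation (an approximation, not an exact test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def study_finds_something(rng, n_subjects=40, n_outcomes=20, alpha=0.05):
    """One study with no real effects: True if any outcome looks significant."""
    predictor = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    for _ in range(n_outcomes):
        outcome = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        if p_value(correlation(predictor, outcome), n_subjects) < alpha:
            return True
    return False


rng = random.Random(2)
n_studies = 300
hits = sum(study_finds_something(rng) for _ in range(n_studies))
false_positive_rate = hits / n_studies

# With 20 independent tests at alpha = 0.05, theory gives 1 - 0.95**20, about 0.64.
print("share of null studies with a 'finding':", round(false_positive_rate, 2))
```

Preregistration addresses this by fixing the single outcome of interest before the data are seen, so that the nominal 5% error rate applies to the reported test.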


While much of the emphasis remains on statistical inference in the potential outcomes framework, social science methodologists have developed new tools to conduct causal inference with both qualitative and quantitative methods, sometimes called a “mixed methods” approach.[32][33] Advocates of diverse methodological approaches argue that different methodologies are better suited to different subjects of study. Sociologist Herbert Smith and Political Scientists James Mahoney and Gary Goertz have cited the observation of Paul Holland, a statistician and author of the 1986 article “Statistics and Causal Inference,” that statistical inference is most appropriate for assessing the “effects of causes” rather than the “causes of effects.”[34][35] Qualitative methodologists have argued that formalized models of causation, including process tracing and fuzzy set theory, provide opportunities to infer causation through the identification of critical factors within case studies or through a process of comparison among several case studies.[26] These methodologies are also valuable for subjects in which a limited number of potential observations or the presence of confounding variables would limit the applicability of statistical inference.[citation needed]

虽然大部分重点仍然放在潜在结果框架下的统计推断上,但社会科学方法学家已经开发出同时运用定性和定量方法进行因果推断的新工具,这种做法有时被称为“混合方法”mixed methods。[32][33]主张方法多元化的人认为,不同的方法适合不同的研究对象different subjects of study。

 --嘉树(讨论) 学科还是对象?

社会学家 Herbert Smith 和政治学家 James Mahoney、Gary Goertz 引用了统计学家 Paul Holland 的观点。Paul Holland 在1986年发表的《统计学和因果推断Statistics and Causal Inference》一文中认为,统计推断最适合评估“原因的结果”,而不是“结果的原因”。[34][35]定性方法学家认为,形式化的因果关系模型,包括过程追踪process tracing和模糊集理论fuzzy set theory,可以通过识别案例研究中的关键因素,或通过比较多个案例研究,为推断因果关系提供机会。[26]当潜在观察数量有限,或混淆变量的存在限制了统计推断的适用性时,这些方法同样有价值。

See also

参见


References

参考资料

  1. Pearl, Judea (1 January 2009). "Causal inference in statistics: An overview" (PDF). Statistics Surveys. 3: 96–146. doi:10.1214/09-SS057.
  2. Morgan, Stephen; Winship, Chris (2007). Counterfactuals and Causal inference. Cambridge University Press. ISBN 978-0-521-67193-4. 
  3. "causal inference". Encyclopædia Britannica, Inc. Retrieved 24 August 2014.
  4. John Shaughnessy; Eugene Zechmeister; Jeanne Zechmeister (2000). Research Methods in Psychology. McGraw-Hill Humanities/Social Sciences/Languages. pp. Chapter 1 : Introduction. ISBN 978-0077825362. http://www.mhhe.com/socscience/psychology/shaugh/ch01_concepts.html. Retrieved 24 August 2014. 
  5. 5.0 5.1 Hill, Austin Bradford (1965). "The Environment and Disease: Association or Causation?". Proceedings of the Royal Society of Medicine. 58 (5): 295–300. doi:10.1177/003591576505800503. PMC 1898525. PMID 14283879.
  6. Kailash Budhathoki and Jilles Vreeken "Causal Inference by Compression" 2016 IEEE 16th International Conference on Data Mining (ICDM)
  7. Marx, Alexander; Vreeken, Jilles (2018). "Telling cause from effect by local and global regression". Knowledge and Information Systems. 60 (3): 1277–1305. doi:10.1007/s10115-018-1286-7.
  8. Hoyer, Patrik O., et al. "Nonlinear causal discovery with additive noise models." NIPS. Vol. 21. 2008.
  9. Shimizu, Shohei; et al. (2011). "DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model" (PDF). The Journal of Machine Learning Research. 12: 1225–1248.
  10. Zhang, Kun, and Aapo Hyvärinen. "On the identifiability of the post-nonlinear causal model." Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 2009.
  11. 11.0 11.1 Mooij, Joris M., et al. "Probabilistic latent variable models for distinguishing between cause and effect." NIPS. 2010.
  12. Lopez-Paz, David, et al. "Towards a learning theory of cause-effect inference" ICML. 2015
  13. Simon, Herbert (1977). Models of Discovery. Dordrecht: Springer. p. 52. 
  14. Maziarz, Mariusz (2020). The Philosophy of Causality in Economics: Causal Inferences and Policy Proposals. New York: Routledge. 
  15. Henschen, Tobias (2018). "The in-principle inconclusiveness of causal evidence in macroeconomics". European Journal for Philosophy of Science. 8: 709–733.
  16. Angrist Joshua & Pischke Jörn-Steffen (2008). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press. 
  17. Simon, Herbert (1977). Models of Discovery. Dordrecht: Springer. p. 52. 
  18. Maziarz, Mariusz (2020). The Philosophy of Causality in Economics: Causal Inferences and Policy Proposals. New York: Routledge. 
  19. Henschen, Tobias (2018). "The in-principle inconclusiveness of causal evidence in macroeconomics". European Journal for Philosophy of Science. 8: 709–733.
  20. Angrist Joshua & Pischke Jörn-Steffen (2008). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press. 
  21. Simon, Herbert (1977). Models of Discovery. Dordrecht: Springer. p. 52. 
  22. Maziarz, Mariusz (2020). The Philosophy of Causality in Economics: Causal Inferences and Policy Proposals. New York: Routledge. 
  23. Henschen, Tobias (2018). "The in-principle inconclusiveness of causal evidence in macroeconomics". European Journal for Philosophy of Science. 8: 709–733.
  24. Angrist Joshua & Pischke Jörn-Steffen (2008). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton: Princeton University Press. 
  25. King, Gary (2012). Designing social inquiry : scientific inference in qualitative research. Princeton Univ. Press. ISBN 978-0691034713. OCLC 754613241. 
  26. 26.0 26.1 26.2 Mahoney, James (January 2010). "After KKV". World Politics. 62 (1): 120–147. doi:10.1017/S0043887109990220. JSTOR 40646193.
  27. King, Gary (2012). Designing social inquiry : scientific inference in qualitative research. Princeton Univ. Press. ISBN 978-0691034713. OCLC 754613241. 
  28. Dominus, Susan (18 October 2017). "When the Revolution Came for Amy Cuddy". The New York Times (in English). ISSN 0362-4331. Retrieved 2 March 2019.
  29. "The Statistical Crisis in Science". American Scientist (in English). 6 February 2017. Retrieved 18 April 2019.
  30. Dominus, Susan (18 October 2017). "When the Revolution Came for Amy Cuddy". The New York Times (in English). ISSN 0362-4331. Retrieved 2 March 2019.
  31. "The Statistical Crisis in Science". American Scientist (in English). 6 February 2017. Retrieved 18 April 2019.
  32. Creswell, John W.; Clark, Vicki L. Plano (2011) (in en). Designing and Conducting Mixed Methods Research. SAGE Publications. ISBN 9781412975179. https://books.google.com/books/about/Designing_and_Conducting_Mixed_Methods_R.html?id=YcdlPWPJRBcC. 
  33. Seawright, Jason (September 2016) (in en). Multi-Method Social Science by Jason Seawright. doi:10.1017/CBO9781316160831. ISBN 9781316160831. https://www.cambridge.org/core/books/multimethod-social-science/286C2742878FBCC6225E2F10D6095A0C. 
  34. Smith, Herbert L. (10 February 2014). "Effects of Causes and Causes of Effects: Some Remarks from the Sociological Side". Sociological Methods and Research. 43 (3): 406–415. doi:10.1177/0049124114521149. PMC 4251584. PMID 25477697.
  35. Goertz, Gary; Mahoney, James (2006). "A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research". Political Analysis (in English). 14 (3): 227–249. doi:10.1093/pan/mpj017. ISSN 1047-1987.


Bibliography

参考书目


External links

外部链接


编者推荐

集智课程推荐

社交网络中的因果推断 本课程中,将简要介绍一些基本的因果推断策略,并聚焦社会网络分析中的因果推断问题,涉及社会网络实验设计、固定样本、工具变量和敏感度分析等。

书籍推荐

为什么:关于因果关系的新科学 在本书中,人工智能领域的权威专家Judea Pearl及其同事领导的因果关系革命突破多年的迷雾,厘清了知识的本质,确立了因果关系研究在科学探索中的核心地位。

类别: 图形模型

类别: 回归分析

类别: 归纳推理

分类: 统计哲学


This page was moved from wikipedia:en:Causal inference. Its edit history can be viewed at 因果推断/edithistory

This page was moved from mywiki:zh-cn:因果推断. Its edit history can be viewed at 因果推断/edithistory