Changes

Stratified randomization (view source)

Revision as of 15:21, 9 June 2021 (Wed)

8,393 bytes added, 15:21, 9 June 2021 (Wed)

No edit summary

Line 1: Line 1: −

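The '''Average Treatment Effect (ATE)''' is a measure used to compare treatments or interventions in randomized experiments, evaluations of policy interventions, and medical trials. The ATE measures the difference in mean outcomes between units assigned to the treatment and units assigned to the control. In a [[随机试验|randomized trial]], the ATE can be estimated from a sample by comparing the mean outcomes of treated and untreated units. However, the ATE is generally understood as a causal parameter (that is, an estimate or property of a population) that a researcher wants to know, defined without reference to the study design or estimation procedure. Both observational studies and experimental designs with random assignment can be used to estimate the ATE in a variety of ways.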

−

~~== General definition ==~~

+

[[File:Graphic_breakdown_of_stratified_random_sampling.jpeg|thumb|220x220px|Graphic breakdown of stratified random sampling]]

−

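The term '''~~treatment~~''' originated in early statistical analysis in agriculture and medicine. It is now used more broadly in other fields of natural and social science, especially psychology, political science, and economics, for example in evaluating the impact of public policies. The specific nature of the treatment or '''~~outcome~~''' is relatively unimportant when estimating the ATE: estimation requires that a treatment be applied to some units and not to others, but the nature of that treatment (e.g., a drug, an incentive payment, a political advertisement) is irrelevant to the definition and estimation of the ATE.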

+

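In statistics, '''stratified randomization''' is a sampling method that first divides the whole study '''population''' into subgroups sharing the same attributes or characteristics, known as '''strata''', and then draws a simple random sample from each stratum, so that elements within the same subgroup are selected without bias, randomly and entirely by chance, at every stage of the sampling process.<ref name=":3" /><ref>{{Citation|title=Simple random sample|date=2020-03-18|url=https://en.wikipedia.org/w/index.php?title=Simple_random_sample&oldid=946144051|work=Wikipedia|language=en|access-date=2020-04-07}}</ref> Stratified randomization is considered a subdivision of '''stratified sampling'''. It should be adopted when shared attributes exist only partially and vary widely between subgroups of the population under study, so that they require special consideration or clear distinction during sampling.<ref>{{Citation|title=Stratified sampling|date=2020-02-09|url=https://en.wikipedia.org/w/index.php?title=Stratified_sampling&oldid=939938944|work=Wikipedia|language=en|access-date=2020-04-07}}</ref> This method should be distinguished from '''cluster sampling''', in which a simple random sample of entire clusters is selected to represent the whole population, and from stratified systematic sampling, in which '''systematic sampling''' is carried out after the stratification step. Stratified random sampling is sometimes also called '''quota random sampling'''.<ref name=":3">{{Cite web|url=https://www.investopedia.com/ask/answers/032615/what-are-some-examples-stratified-random-sampling.asp|title=How Stratified Random Sampling Works|last=Nickolas|first=Steven|date=July 14, 2019|website=Investopedia|language=en|access-date=2020-04-07}}</ref>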

−

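The expression "treatment effect" refers to the '''causal effect''' of a given treatment or intervention (for example, administering a drug) on an outcome variable (for example, the patient's recovery). In the Neyman-Rubin "[[潜在结果框架|potential outcomes framework]]" of causality, a treatment effect is defined for each individual unit in terms of two "potential outcomes": one outcome that would manifest if the unit received the treatment, and another that would manifest if it did not. The "treatment effect" is the difference between these two potential outcomes. This individual-level treatment effect is unobservable, however, because each unit can either receive the treatment or not, but never both. Random assignment ensures that, over a large number of iterations of the experiment, units assigned to the treatment and units assigned to the control are identically distributed; that is, both groups have the same distributions of covariates and potential outcomes. The average outcome among the treated units therefore serves as a counterfactual for the average outcome among the control units. The difference between these two averages is the ATE, which is an estimate of the central tendency of the unobservable individual-level treatment effects.<ref>{{cite journal |last=Holland |first=Paul W. |year=1986 |title=Statistics and Causal Inference |journal=Journal of the American Statistical Association|volume=81 |issue=396 |pages=945–960 |jstor=2289064 }}</ref> If the sample is drawn at random from a population, the '''Sample Average Treatment Effect (SATE)''' is also an estimate of the '''Population Average Treatment Effect (PATE)'''.<ref>{{cite journal |last=Imai |first=Kosuke |first2=Gary |last2=King |first3=Elizabeth A. |last3=Stuart |year=2008 |title=Misunderstandings Between Experimentalists and Observationalists About Causal Inference |journal=Journal of the Royal Statistical Society, Series |volume=171 |issue=2 |pages=481–502 |url=http://nrs.harvard.edu/urn-3:HUL.InstRepos:4142695 }}</ref>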

+

== Steps for stratified random sampling ==

−

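While an experiment ensures that potential outcomes and all covariates are equivalently distributed in the treatment and control groups, this is not the case in an observational study. In an observational study, units are not assigned to treatment and control at random, so assignment to treatment may depend on unobserved or unobservable factors. Observed factors can be controlled statistically (for example, through regression or matching), but any estimate of the ATE may still be confounded by unobservable factors that influenced which units received the treatment and which did not.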

+

Stratified randomization is very useful when the target population is '''heterogeneous''', because it effectively shows how the trends or characteristics under study differ between strata.<ref name=":3" /> When performing a stratified randomization, the following 8 steps should be taken:<ref name=":4">{{Cite web|url=https://www.statisticshowto.com/stratified-random-sample/|title=Stratified Random Sample: Definition, Examples|last=Stephanie|date=Dec 11, 2013|website=Statistics How To|language=en-US|access-date=2020-04-07}}</ref><ref name=":5">{{Cite web|url=https://www.questionpro.com/blog/stratified-random-sampling/|title=Stratified Random Sampling: Definition, Method and Examples|date=2018-03-13|website=QuestionPro|language=en|access-date=2020-04-07}}</ref>

−

== ~~Formal definition~~ ==

+

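#Define the target population.

+

#Define the stratification '''variables''' and decide how many strata to create. The criteria for choosing stratification variables include age, socioeconomic status, nationality, race, education level, and so on, and should be in line with the research objective. Ideally, 4-6 strata should be used, because every additional stratification variable raises the probability that some variables cancel out the effect of others.<ref name=":5" />

+

#Use a '''sampling frame''' to evaluate all the elements in the target population, then make changes based on '''coverage''' and grouping.

+

#List all the elements and consider the sampling result. The strata should be mutually exclusive and together cover every member of the population, and each member of the population should fall into exactly one stratum, alongside the members from which it differs least.<ref name=":4" />

+

#Decide on the selection criteria for random sampling. This can be done manually or with a purpose-built computer program.

+

#Assign a random, unique number to every element, then sort the elements by their assigned numbers.

+

#Review the size of each stratum and the '''numerical distribution''' of the elements in each stratum. Determine the type of sampling: proportional or disproportional stratified sampling.

+

#Carry out the selected random sampling as defined in step 5. At minimum, one element must be chosen from each stratum so that the final sample includes representatives of every stratum. If two or more elements are selected from each stratum, the '''error margins''' of the collected data can be calculated.<ref name=":5" />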
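
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a procedure taken from the cited sources: the function name <code>stratified_sample</code>, the <code>age_group</code> field, and the 10% sampling fraction are all hypothetical, and proportional allocation is assumed for step 7.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw a proportional stratified random sample (illustrative sketch).

    units      -- list of population elements (step 1)
    stratum_of -- function mapping an element to its stratum label (steps 2-4)
    fraction   -- sampling fraction applied within every stratum (step 7)
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        # Each unit is placed in exactly one stratum.
        strata[stratum_of(unit)].append(unit)

    sample = {}
    for label, members in strata.items():
        # Step 8: take at least one element from every stratum.
        k = max(1, round(fraction * len(members)))
        # Simple random sampling without replacement inside the stratum.
        sample[label] = rng.sample(members, k)
    return sample

# Hypothetical population: 150 units under 40 and 50 units aged 40 or over.
population = [
    {"id": i, "age_group": "under_40" if i % 4 else "40_plus"}
    for i in range(200)
]
sample = stratified_sample(population, lambda u: u["age_group"], fraction=0.1)
sizes = {label: len(chosen) for label, chosen in sample.items()}
```

With these inputs each stratum contributes 10% of its members, so the sample keeps the 3:1 ratio of the population.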

−

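To define the ATE formally, we define two potential outcomes: <math>y_{0}(i)</math> is the value of the outcome variable for individual <math>i</math> when untreated, and <math>y_{1}(i)</math> is the value of the outcome variable for individual <math>i</math> when treated. For example, <math>y_{0}(i)</math> is the health status of individual <math>i</math> if they are not given the drug under study, and <math>y_{1}(i)</math> is their health status if they are given the drug.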

−

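The treatment effect for individual <math>i</math> is defined as <math>y_{1}(i)-y_{0}(i) = \beta(i)</math>. In general, this effect is not the same across individuals. The average treatment effect <math>\text{ATE}</math> is defined as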

−

~~:<math>\text{ATE} = \frac{1}{N}\sum_i (y_{1}(i)-y_{0}(i))</math>~~

+

== Techniques ==

−

~~Here the treatment effects are averaged over all <math>N</math> individuals in the population.~~

+

[[File:Simple_random_sampling_after_stratification_step.png|thumb|Simple random sampling after stratification step]]

−

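If we could observe both <math>y_{1}(i)</math> and <math>y_{0}(i)</math> for every individual in a large representative sample, we could estimate the ATE simply by averaging <math>y_{1}(i)-y_{0}(i)</math> over the sample. However, we cannot observe both <math>y_{1}(i)</math> and <math>y_{0}(i)</math> for the same individual, because an individual cannot be both treated and untreated. In the drug example, we can only observe <math>y_{1}(i)</math> for individuals who received the drug and <math>y_{0}(i)</math> for those who did not. This is the main problem researchers face when evaluating treatment effects, and it has given rise to a large body of work on estimation methods.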
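
Under random assignment, the simplest estimator replaces the unobservable individual differences with a difference of group means. The sketch below assumes a tiny made-up data set; the function name <code>difference_in_means</code> and the numbers are illustrative only.

```python
from statistics import mean

def difference_in_means(outcomes, treated):
    """Estimate the ATE as mean(treated outcomes) - mean(control outcomes)."""
    y1 = [y for y, t in zip(outcomes, treated) if t]      # observed y_1(i)
    y0 = [y for y, t in zip(outcomes, treated) if not t]  # observed y_0(i)
    return mean(y1) - mean(y0)

# Hypothetical outcomes: the first three units were treated, the rest were not.
outcomes = [7, 5, 6, 4, 3, 5]
treated = [True, True, True, False, False, False]
ate_hat = difference_in_means(outcomes, treated)
```

Here the treated mean is 6 and the control mean is 4, so the estimate is 2. The estimate is only meaningful when assignment was random.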

−

== ~~Estimation~~ ==

+

Stratified randomization selects one or more prognostic factors so that the subgroups have, on average, similar entry characteristics. The relevant patient factors can be identified accurately by examining the outcomes of previous studies.<ref>{{Cite journal|last=Sylvester|first=Richard|date=December 1982|title=Fundamentals of clinical trials|journal=Controlled Clinical Trials|volume=3|issue=4|pages=385–386|doi=10.1016/0197-2456(82)90029-0|issn=0197-2456}}</ref>

−

~~Depending on the data and its underlying circumstances, many methods can be used to estimate the <math>\text{ATE}</math>. The most common ones are:~~

−

* [[自然实验|Natural experiment]]

−

* [[双重差分模型|Difference in differences]]

+

The number of subgroups can be calculated by multiplying the number of strata for each factor. Factors are measured before or at the time of randomization and experimental subjects are divided into several subgroups or strata according to the results of measurements.<ref name=":0">{{Cite book|last=Pocock, Stuart J.|title=Clinical trials : a practical approach|publisher=John Wiley & Sons Ltd|date=Jul 1, 2013|isbn=978-1-118-79391-6|location=Chichester|oclc=894581169}}</ref>
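
As a small illustration of the multiplication rule, the sketch below enumerates the strata for two hypothetical prognostic factors with two levels each. The factor names and level labels are assumptions made for the example, not values from the cited book.

```python
from itertools import product

# Hypothetical prognostic factors and levels, for illustration only.
factors = {
    "age": ["under_50", "50_or_over"],
    "nodal_status": ["negative", "positive"],
}

# Every combination of one level per factor is one stratum.
strata = list(product(*factors.values()))
n_strata = len(strata)  # 2 levels x 2 levels

def stratum_of(patient):
    """Return the stratum (tuple of levels) that a patient record falls into."""
    return tuple(patient[name] for name in factors)

patient = {"age": "under_50", "nodal_status": "positive"}
```

Two factors with two levels each give 2 × 2 = 4 strata, and adding a third two-level factor would double that, which is why the number of strata grows quickly with the number of factors.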

−

* [[断点回归设计|Regression discontinuity design]]

−

* [[倾向评分匹配|Propensity score matching]]

−

* [[工具变量估计|Instrumental variables estimation]]

−

~~== An example ==~~

−

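Consider a group of unemployed people, some of whom receive a policy intervention (the treatment group) while the rest do not (the control group). Suppose we want to measure the effect of a job-search monitoring policy (the intervention) on the length of an unemployment spell: on average, how much shorter would unemployment be if an individual were subject to job-search monitoring? In this case, the ATE is the difference in the expected (mean) length of unemployment between the treatment and control groups.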

+

Within each stratum, several randomization strategies can be applied, including [[Simple random sample|simple randomization]], [[Blocking (statistics)#Blocking used for nuisance factors that can be controlled|blocked randomization]], and [[Minimisation (clinical trials)|minimization]].

−

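In this example, a positive ATE would mean that the job policy lengthened unemployment, and a negative ATE would mean that it shortened unemployment. An ATE equal to zero would suggest that the policy brought neither an advantage nor a disadvantage in terms of the length of unemployment. Determining whether an ATE estimate is distinguishable from zero requires statistical inference.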

−

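Because the ATE is an estimate of the average effect of the treatment, a positive or negative ATE does not indicate that the treatment would benefit or harm any particular individual. The ATE thus ignores the distribution of the treatment effect. Even when the ATE is positive, some members of the population may be made worse off by the treatment or intervention.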

+

=== Simple randomization within strata ===

−

~~== Heterogeneous treatment effects ==~~

−

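~~Some researchers call a treatment effect "heterogeneous" when it depends on the individual. For example, the effect of the job-search monitoring policy described above may differ by gender (male or female) or by region.~~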

+

Simple randomization is considered the easiest method for allocating subjects within each stratum. Each subject is assigned to a group purely at random, independently of every other assignment. Although it is easy to carry out, simple randomization is usually applied only in strata that contain more than 100 samples, because a small sample size tends to produce unequal group sizes.<ref name=":0" />
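
The effect of stratum size on balance can be checked with a short simulation. This is a hedged sketch rather than part of the cited text: the arm labels, the helper names, and the choice of 500 simulated trials are assumptions made for illustration.

```python
import random

def simple_randomize(n_subjects, arms=("treatment", "control"), rng=None):
    """Assign each subject to an arm independently and with equal probability."""
    rng = rng or random.Random()
    return [rng.choice(arms) for _ in range(n_subjects)]

def mean_relative_imbalance(n_subjects, trials=500, seed=1):
    """Average |n_treatment - n_control| / n_subjects over simulated strata."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        alloc = simple_randomize(n_subjects, rng=rng)
        n_treat = alloc.count("treatment")
        total += abs(n_treat - (n_subjects - n_treat)) / n_subjects
    return total / trials

small = mean_relative_imbalance(10)     # a small stratum
large = mean_relative_imbalance(1000)   # a large stratum
```

The relative imbalance shrinks roughly with the square root of the stratum size, so <code>small</code> comes out well above <code>large</code>, which is the reason for preferring simple randomization only in larger strata.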

−

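One way to study heterogeneous treatment effects is to divide the study data into subgroups (for example, men and women, or by region) and compare the average treatment effects across the subgroups. The average treatment effect within each subgroup is called the '''Conditional Average Treatment Effect (CATE)''', that is, the average treatment effect conditional on membership in the subgroup.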

−

~~One problem with this approach is that each subgroup may contain far less data than the ungrouped study, leaving too little data for analysis.~~

+

=== Block randomization within strata ===

+

[[Randomized block design|Block randomization]], sometimes called permuted block randomization, uses blocks to allocate subjects from the same stratum equally to each group in the study. In block randomization, the allocation ratio (the ratio of the number of subjects in one specific group to that in the other groups) and the group sizes are specified. The block size must be a multiple of the number of treatments so that samples in each stratum can be assigned to treatment groups in the intended ratio.<ref name=":0" /> For instance, there should be 4 or 8 strata in a clinical trial concerning breast cancer where age and nodal status are two prognostic factors and each factor is split into two levels. Blocks can be assigned to samples in multiple ways, including a random list and a computer program.<ref>{{Cite web|url=https://www.sealedenvelope.com/help/redpill/latest/block/|title=Sealed Envelope {{!}} Random permuted blocks|date=Feb 25, 2020|website=www.sealedenvelope.com|access-date=2020-04-07}}</ref><ref>{{Citation|last1=Friedman|first1=Lawrence M.|title=Introduction to Clinical Trials|date=2010|work=Fundamentals of Clinical Trials|pages=1–18|publisher=Springer New York|isbn=978-1-4419-1585-6|last2=Furberg|first2=Curt D.|last3=DeMets|first3=David L.|doi=10.1007/978-1-4419-1586-3_1}}</ref>
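
A minimal sketch of permuted block randomization within strata follows. It assumes two arms in a 1:1 ratio and a fixed block size of 4; the function name, arm labels, and stratum sizes are hypothetical and chosen only to show the mechanism.

```python
import random

def permuted_block_allocation(n_subjects, arms=("A", "B"), block_size=4, seed=0):
    """Allocate subjects using randomly permuted blocks with equal arm counts."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * per_arm   # e.g. ["A", "B", "A", "B"]
        rng.shuffle(block)             # random order inside the block
        allocation.extend(block)
    return allocation[:n_subjects]

# Each stratum gets its own independent allocation list.
strata_sizes = {"stratum_1": 12, "stratum_2": 8}
schedule = {
    name: permuted_block_allocation(n, seed=i)
    for i, (name, n) in enumerate(strata_sizes.items())
}
```

Because every completed block contains each arm equally often, the groups in a stratum can differ by at most half a block at any point during recruitment.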

−

~~There is also related work on detecting heterogeneous treatment effects with random forests.<ref name="het-paper">https://arxiv.org/abs/1510.04342</ref><ref name="het-blog-post">https://www.markhw.com/blog/causalforestintro</ref>~~

−

~~== References ==~~

−

~~{{reflist}}~~

+

If the available data cannot represent the overall subgroup population, the subgroups are treated as equally important. In some applications, subgroup sample sizes are set by the amount of data available rather than scaled to the size of each subgroup, which can bias the estimated effects of the factors. When data must be stratified by variance and the subgroup variances differ significantly, it cannot be guaranteed that each subgroup's sample size is proportional to its share of the overall population.

−

== ~~Further reading~~ ==

+

Block randomization is commonly used in experiments with a relatively large sample size to avoid an imbalanced allocation of samples with important characteristics. In fields with strict randomization requirements, such as [[clinical trial]]s, the allocation becomes predictable when those conducting the trial are not blinded and the block size is limited. Permuted block randomization within strata can also cause an imbalance of samples among strata as the number of strata increases while the sample size stays limited. For instance, it is possible that no sample meets the characteristics of a certain stratum.<ref>{{Cite book|title=Fundamentals of clinical trials|others=Friedman, Lawrence M., 1942-, Furberg, Curt,, DeMets, David L., 1944-, Reboussin, David,, Granger, Christopher B.|date=27 August 2015|isbn=978-3-319-18539-2|edition=Fifth|location=New York|oclc=919463985}}</ref>

−

*{{cite book |last=Wooldridge |first=Jeffrey M. |chapter=Policy Analysis with Pooled Cross Sections |pages=438–443 |title=Introductory Econometrics: A Modern Approach |year=2013 |publisher=Thomson South-Western |location=Mason, OH |isbn=978-1-111-53104-1 }}

+

Stratified sampling cannot be applied if the population cannot be completely partitioned into strata. In that case the sample sizes would end up proportional to the samples available rather than to the overall subgroup populations.

−

~~== Editor's recommendations ==~~

−

~~=== Recommended books ===~~

−

~~* [https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB Causal Inference for Statistics, Social, and Biomedical Sciences]~~

+

Assigning samples to subgroups can produce overlap when subjects meet the inclusion criteria of more than one stratum, which could result in a misrepresentation of the population.

−

~~This book by Rubin and Imbens is a good introduction to the field of causal inference.~~

+

=== Minimization method ===

+

To keep the treatment groups similar, the "minimization" method can be used, which is more direct than random permuted blocks within strata. In the minimization method, samples in each stratum are assigned to treatment groups based on the sum of samples already in each treatment group, which keeps the number of subjects balanced among the groups.<ref name=":0" /> If the sums for multiple treatment groups are the same, simple randomization is used to assign the treatment. In practice, the minimization method requires a running record of treatment assignments by prognostic factor, which can be kept effectively with a set of index cards. The minimization method effectively avoids imbalance among groups but involves less randomness than block randomization, because a random process is used only when the treatment sums are equal. A feasible solution is to apply an additional random list that gives the treatment groups with a smaller sum of marginal totals a higher chance of assignment (e.g. ¾), while the other treatments have a lower chance (e.g. ¼).<ref name=":1">{{Cite journal|last=Pocock|first=S. J.|date=March 1979|title=Allocation of Patients to Treatment in Clinical Trials|journal=Biometrics|volume=35|issue=1|pages=183–197|doi=10.2307/2529944|jstor=2529944|pmid=497334|issn=0006-341X}}</ref>
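
The allocation rule described above can be sketched as follows. This is a simplified illustration, not the exact procedure from the cited paper: the class name <code>Minimizer</code>, the factor names, and the default probability of ¾ for the favoured arm are assumptions, and only the plain sum of marginal totals is used as the imbalance measure.

```python
import random
from collections import defaultdict

class Minimizer:
    """Simplified minimization with a biased coin for the favoured arm."""

    def __init__(self, arms=("A", "B"), p_favoured=0.75, seed=0):
        self.arms = list(arms)
        self.p_favoured = p_favoured
        self.rng = random.Random(seed)
        # counts[arm][(factor, level)] = patients with that level already in that arm
        self.counts = {arm: defaultdict(int) for arm in self.arms}

    def marginal_sum(self, arm, patient):
        """Sum of marginal totals in `arm` over the new patient's factor levels."""
        return sum(self.counts[arm][(f, lvl)] for f, lvl in patient.items())

    def assign(self, patient):
        sums = {arm: self.marginal_sum(arm, patient) for arm in self.arms}
        lowest = min(sums.values())
        favoured = [arm for arm in self.arms if sums[arm] == lowest]
        if len(favoured) == len(self.arms):
            # All sums are equal: fall back to simple randomization.
            arm = self.rng.choice(self.arms)
        elif self.rng.random() < self.p_favoured:
            # Usually pick an arm with the smallest marginal sum.
            arm = self.rng.choice(favoured)
        else:
            # Occasionally pick another arm to keep the allocation unpredictable.
            arm = self.rng.choice([a for a in self.arms if a not in favoured])
        for f, lvl in patient.items():
            self.counts[arm][(f, lvl)] += 1
        return arm

minimizer = Minimizer()
first = minimizer.assign({"age": "under_50", "nodal_status": "positive"})
second = minimizer.assign({"age": "under_50", "nodal_status": "negative"})
```

Setting <code>p_favoured=1.0</code> gives the deterministic variant, in which randomness is used only to break ties; the biased coin trades a little balance for less predictable assignments.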

−

*[[统计因果推理入门]], the Chinese edition of [[Causal Inference in Statistics: A Primer]]

+

== Application ==

−

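Causality is widely discussed, but many introductory textbooks only show readers without a statistics background how to apply statistical techniques to causal questions, without discussing causal models or causal parameters. This book aims to help teachers and students with basic statistical knowledge tackle the causal questions that arise in nearly all non-experimental research in the natural and social sciences. It focuses on simple and natural ways of defining causal parameters and explains which assumptions are necessary for estimating those parameters in observational studies. We also show that these assumptions can be expressed in a transparent mathematical form, and that simple mathematical tools can translate them into quantitative causal relationships, such as the effects of treatments and policy interventions, in order to determine their testable implications.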

+

[[File:Confounding_factors_are_important_to_consider_in_clinical_trials.png|thumb|219x219px|Confounding factors are important to consider in clinical trials]]

−

===~~Recommended courses~~===

+

Stratified random sampling is useful and productive in situations requiring different [[weighting]]s on specific strata. In this way, researchers can adjust the selection mechanism for each stratum to amplify or minimize the desired characteristics in the survey result.<ref>{{Cite web|url=https://www.thoughtco.com/stratified-sampling-3026731|title=Understanding Stratified Samples and How to Make Them|last=Crossman|first=Ashley|date=Jan 27, 2020|website=ThoughtCo|language=en|access-date=2020-04-07}}</ref>

−

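*[https://campus.swarma.org/course/2526 Two causal frameworks in depth: the potential outcomes model and the structural causal model]

−

::This video comes from a session in the second season of the [[集智俱乐部读书会]] reading group on Causal Science and Causal AI. Lu Chaochao (陆超超), a PhD researcher in the machine learning group at the University of Cambridge, explains the potential outcomes model and the structural causal model in detail and describes the rules for converting between the two frameworks.

−

~~::1. Covers the two main frameworks of causal inference, the potential outcomes model and the structural causal model, discusses their respective strengths and weaknesses and how they relate, and details the rules for converting between them.~~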

+

Stratified randomization is helpful when researchers intend to look for [[Association (statistics)|associations]] between two or more strata, as simple random sampling carries a larger chance of unequal representation of the target groups. It is also useful when researchers wish to eliminate [[Confounding|confounders]] in [[Observational study|observational studies]], as stratified random sampling allows adjustment of [[covariance]]s and [[P-value|''p''-values]] for more accurate results.<ref>{{Cite book|last=Hennekens, Charles H.|title=Epidemiology in medicine|date=1987|publisher=Little, Brown|others=Buring, Julie E., Mayrent, Sherry L.|isbn=0-316-35636-0|edition=1st|location=Boston, Massachusetts|oclc=16890223}}</ref>

−

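*[https://www.bilibili.com/video/BV1NJ411w7ms?from=search&seid=15960075946481426104 Average Effect of Treatment on the Treated (ATT) and matching methods]

−

~~::A video reposted on Bilibili from a Duke University social science research center, introducing the ATT, CATE, and ATE as they arise when using matching methods.~~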

+

Stratified random sampling also offers a higher level of [[Accuracy and precision|statistical accuracy]] than simple random sampling, due to the high [[relevance]] of the elements chosen to represent the population.<ref name=":5" /> The differences within strata are much smaller than the differences between strata. Hence, as the between-sample differences are minimized, the [[standard deviation]] is tightened, resulting in a higher degree of accuracy and a smaller error in the final results. This effectively reduces the [[Sample size determination|sample size]] needed and increases the [[Cost-effectiveness analysis|cost-effectiveness]] of sampling when research funding is tight.
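
The gain in precision can be made explicit under simplifying assumptions that the cited sources do not spell out: proportional allocation, strata large enough for finite-population corrections to be ignored, and notation introduced only for this sketch (stratum weights <math>W_h</math>, stratum means <math>\mu_h</math>, overall mean <math>\mu</math>, within-stratum variances <math>S_h^2</math>, total sample size <math>n</math>). The variances of the sample mean under simple random sampling and under stratified sampling are then approximately

:<math>\operatorname{Var}(\bar{y}_{\text{SRS}}) \approx \frac{1}{n}\left[\sum_h W_h S_h^2 + \sum_h W_h (\mu_h - \mu)^2\right], \qquad \operatorname{Var}(\bar{y}_{\text{strat}}) \approx \frac{1}{n}\sum_h W_h S_h^2 .</math>

The between-strata term drops out of the stratified variance, so the more the stratum means differ, the larger the reduction.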

−

~~=== Article summaries ===~~

−

~~* [https://mp.weixin.qq.com/s/f-rI5W6tc6qOzthbzK4oAw Cui Peng: Stable learning, finding the common ground between causal inference and machine learning]~~

+

In real life, stratified random sampling can be applied to results of election polling, investigations into income disparities among social groups, or measurements of education opportunities across nations.<ref name=":3" />

−

* An explanation of the [https://hanyuz1996.github.io/2017/08/30/Donald-Rubin/ Donald Rubin potential outcomes model], shared by Mesonychid on their personal homepage.

+

== Stratified randomization in clinical trials ==

−

* An introduction to causal inference shared by Yishi Lin on her personal homepage: [https://dango.rocks/blog/2019/01/08/Causal-Inference-Introduction1/ Notes on Causal Inference (1): Lifting the Veil on "Causal Inference"]

+

In [[clinical trial]]s, patients are stratified according to their social and individual backgrounds, or any factors that are relevant to the study, to match each of these groups within the entire patient population. The aim is to balance clinical and prognostic factors, as the trials would not produce valid results if the study design is not balanced.<ref>{{Cite book|last1=Polit|first1=DF|title=Nursing Research: Generating and Assessing Evidence for Nursing Practice, 9th ed.|last2=Beck|first2=CT|publisher=Lippincott Williams & Wilkins.|year=2012|location=Philadelphia, USA: Wolters Kluwer Health}}</ref> Stratified randomization is an extremely important step in ensuring that no bias, deliberate or accidental, affects the representative nature of the patient sample under study.<ref>{{Cite web|url=https://www.omixon.com/patient-stratification-in-clinical-trials/|title=Patient Stratification in Clinical Trials|date=2014-12-01|website=Omixon {{!}} NGS for HLA|language=en-US|access-date=2020-04-26}}</ref> It increases study power, especially in small clinical trials (n<400), as the known clinical traits used for stratification are thought to affect the outcomes of the interventions.<ref>{{Cite web|url=https://www.statisticshowto.com/stratified-randomization/|title=Stratified Randomization in Clinical Trials|last=Stephanie|date=2016-05-20|website=Statistics How To|language=en-US|access-date=2020-04-26}}</ref> It helps prevent [[Type I and type II errors|type I error]]s, which is highly valued in clinical studies.<ref name=":6">{{Cite journal|last=Kernan|first=W|date=Jan 1999|title=Stratified Randomization for Clinical Trials|journal=Journal of Clinical Epidemiology|volume=52|issue=1|pages=19–26|doi=10.1016/S0895-4356(98)00138-3|pmid=9973070}}</ref> It also has an important effect on sample size for active control equivalence trials and, in theory, facilitates [[subgroup analysis]] and [[interim analysis]].<ref name=":6" />

−

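* A series of articles on causal inference by Ding Peng (丁鹏), assistant professor in the Department of Statistics at UC Berkeley, published on Capital of Statistics (统计之都) and well suited to beginners.

−

~~:: [https://cosx.org/2012/03/causality1-simpson-paradox/ Introduction to Causal Inference, Part 1: Starting from the Yule-Simpson Paradox]~~

−

~~:: [https://cosx.org/2012/03/causality2-rcm Introduction to Causal Inference, Part 2: The Rubin Causal Model (RCM) and Randomized Experiments]~~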

+

Category:Sampling (statistics)

−

~~:: [https://cosx.org/2012/03/causality3-fisher-and-neyman/ Introduction to Causal Inference, Part 3: The Disagreement between R. A. Fisher and J. Neyman]~~

−

~~:: [https://cosx.org/2012/04/causality4-observational-study-ignorability-and-propensity-score/ Introduction to Causal Inference, Part 4: Observational Studies, Ignorability, and the Propensity Score]~~

+

== Advantage ==

−

~~:: [https://cosx.org/2012/10/causality5-causal-diagram Introduction to Causal Inference, Part 5: Causal Diagrams]~~

+

Category:Sampling techniques

−

~~:: [https://cosx.org/2013/08/causality6-instrumental-variable Introduction to Causal Inference, Part 6: Instrumental Variables]~~

−

~~:: [https://cosx.org/2013/09/causality7-lord-paradox Introduction to Causal Inference, Part 7: Lord's Paradox]~~

+

−

~~:: [https://cosx.org/2013/09/casuality8-smoke-and-lung-cancer Introduction to Causal Inference, Part 8: Does Smoking Cause Lung Cancer? Fisher versus Cornfield]~~

+

This page was moved from [[wikipedia:en:Stratified randomization]]. Its edit history can be viewed at [[分层随机试验/edithistory]]</noinclude>

−

~~=== Related learning paths ===~~

+

[[Category:待整理页面]]

−

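* [https://pattern.swarma.org/path?id=99 Required reading list for the Causal Science and Causal AI reading group]: a categorized and filtered list based on the papers discussed in the reading group, intended to help readers see the overall framework and content.

−

* [https://pattern.swarma.org/path?id=9 Overview of causal inference methods]: this path discusses causality from a philosophical perspective and analyzes applications of causality in machine learning.

−

* [https://pattern.swarma.org/path?id=90 Introductory path to Causal Science and Causal AI]: this path explains what causal science is and how it developed. It has three parts: the basic definitions of causal science and its philosophical foundations, causal inference in statistics, and causality in machine learning (Causal AI).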

−

~~----~~

−

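~~This Chinese entry was compiled by user [[用户: Janeway|Janeway]], reviewed by [[用户:LFZ|LFZ]], and edited by [[用户:思无涯咿呀咿呀|思无涯咿呀咿呀]]. Comments are welcome on the discussion page.~~

−

~~'''The content of this entry is drawn from Wikipedia and public sources and is released under the CC 3.0 license.'''~~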

孙钦贵

387 edits
