Changes

947 bytes removed, 22:27, 23 March 2022 (Wed)
No edit summary
Line 1: Line 1:
−
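'''Inverse probability weighting''' is a statistical technique for calculating statistics standardized to a [[pseudo-population]] different from the population in which the data were collected. Study designs in which the sampled population differs from the population of target inference (the target population) are common in application.<ref>Robins, JM; Rotnitzky, A; Zhao, LP (1994). "Estimation of regression coefficients when some regressors are not always observed". Journal of the American Statistical Association. 89 (427): 846–866. doi:10.1080/01621459.1994.10476818.</ref> Prohibitive factors such as cost, time, or ethical concerns may bar researchers from sampling the target population directly.<ref>Breslow, NE; Lumley, T; et al. (2009). "[https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768499 Using the Whole Cohort in the Analysis of Case-Cohort Data]". Am J Epidemiol. 169 (11): 1398–1405. [https://doi.org/10.1093%2Faje%2Fkwp055 doi:10.1093/aje/kwp055]. PMC [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2768499 2768499]. PMID [https://pubmed.ncbi.nlm.nih.gov/19357328 19357328]</ref> One solution to this problem is to use an alternative design strategy, such as [[stratified sampling]]. When correctly applied, weighting can potentially improve efficiency and reduce the bias of unweighted estimators.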
+
'''Inverse probability weighting''' is a statistical technique for calculating statistics standardized to a [[pseudo-population]] different from the population in which the data were collected. Study designs in which the sampled population differs from the population of target inference (the target population) are common in application.<ref>{{cite journal | last1 = Robins | first1 = JM | last2 = Rotnitzky | first2 = A | last3 = Zhao | first3 = LP | year = 1994 | title = Estimation of regression coefficients when some regressors are not always observed | journal = [[Journal of the American Statistical Association]] | volume = 89 | issue = 427 | pages = 846–866 | doi=10.1080/01621459.1994.10476818}}</ref> Prohibitive factors such as cost, time, or ethical concerns may bar researchers from sampling the target population directly.<ref>{{cite journal | last1 = Breslow | first1 = NE | last2 = Lumley | first2 = T | year = 2009 | pmc = 2768499 | title = Using the Whole Cohort in the Analysis of Case-Cohort Data | journal = Am J Epidemiol | volume = 169 | issue = 11 | pages = 1398–1405 | doi=10.1093/aje/kwp055 | pmid=19357328|display-authors=etal}}</ref> One solution to this problem is to use an alternative design strategy, such as [[stratified sampling]]. When correctly applied, weighting can potentially improve efficiency and reduce the bias of unweighted estimators.
    
One very early weighted estimator is the [[Horvitz–Thompson estimator]] of the mean.<ref>{{cite journal | first1 = D. G. |last1 = Horvitz | first2 = D. J. |last2 = Thompson | title = A generalization of sampling without replacement from a finite universe | journal = [[Journal of the American Statistical Association]] | volume = 47 |  pages = 663–685 | year = 1952 |issue = 260 | doi=10.1080/01621459.1952.10483446}}</ref> When the probability with which the sampled population is drawn from the target population is known, the inverse of this probability is used to weight the observations. This approach has been generalized to many aspects of statistics under various frameworks. In particular, there are [[likelihood function|weighted likelihoods]], [[generalized estimating equations|weighted estimating equations]], and [[probability density function|weighted probability densities]], from which a majority of statistics are derived. These applications codified the theory of other statistics and estimators, such as [[marginal structural models]], the [[standardized mortality ratio]], and the [[EM algorithm]] for coarsened or aggregate data.
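The weighting step described above can be illustrated with a minimal sketch, assuming each sampled unit comes with a known inclusion probability and that the size of the target population is known. The function names, parameters, and example numbers are hypothetical and are not taken from the cited sources.

```python
from typing import Sequence


def horvitz_thompson_total(values: Sequence[float],
                           inclusion_probs: Sequence[float]) -> float:
    """Estimate a population total by weighting each sampled value
    by the inverse of its (assumed known) inclusion probability."""
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, p in zip(values, inclusion_probs):
        if not 0.0 < p <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        total += y / p  # inverse probability weight is 1 / p
    return total


def horvitz_thompson_mean(values: Sequence[float],
                          inclusion_probs: Sequence[float],
                          population_size: int) -> float:
    """Estimate the population mean as the estimated total divided by
    the (assumed known) target population size."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return horvitz_thompson_total(values, inclusion_probs) / population_size


# Hypothetical sample of three units from a target population of ten.
sample_values = [2.0, 4.0, 6.0]
sample_probs = [0.5, 0.25, 0.5]
estimate = horvitz_thompson_mean(sample_values, sample_probs, population_size=10)
```

In this made-up sample, the unit with inclusion probability 0.25 stands in for four units of the target population, so it contributes twice as much weight as each unit sampled with probability 0.5.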
   −
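Inverse probability weighting is also used to account for missing data when subjects with missing data cannot be included in the primary analysis.<ref>Hernan, MA; Robins, JM (2006). "Estimating Causal Effects From Epidemiological Data". J Epidemiol Community Health. 60 (7): 578–596. CiteSeerX 10.1.1.157.9366. doi:10.1136/jech.2004.029496. PMC 2652882. PMID 16790829.</ref> With an estimate of the sampling probability, or of the probability that the factor would be measured in another measurement, inverse probability weighting can be used to inflate the weight of subjects who are under-represented because of a large degree of missing data.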
+
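Inverse probability weighting is also used to account for missing data when subjects with missing data cannot be included in the primary analysis.<ref>{{cite journal | last1 = Hernan | first1 = MA | last2 = Robins | first2 = JM | year = 2006 | title = Estimating Causal Effects From Epidemiological Data | citeseerx = 10.1.1.157.9366 | journal = J Epidemiol Community Health | volume = 60 | issue = 7 | pages = 578–596 | doi=10.1136/jech.2004.029496| pmc = 2652882 | pmid=16790829}}</ref> With an estimate of the sampling probability, or of the probability that the factor would be measured in another measurement, inverse probability weighting can be used to inflate the weight of subjects who are under-represented because of a large degree of missing data.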
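As a rough illustration of the missing-data use, the sketch below computes a weighted mean over complete cases only, with each observed subject weighted by the inverse of an estimated probability of being observed. The normalized (ratio) form of the weighted mean, the function name, and the example values are assumptions made for illustration; the text does not specify a particular estimator here.

```python
from typing import Optional, Sequence


def ipw_complete_case_mean(outcomes: Sequence[Optional[float]],
                           observed_probs: Sequence[float]) -> float:
    """Weighted mean over subjects with an observed outcome.

    outcomes: the outcome for each subject, or None when it is missing.
    observed_probs: estimated probability that each subject's outcome is
        observed (assumed to come from some separately fitted model).
    """
    if len(outcomes) != len(observed_probs):
        raise ValueError("outcomes and observed_probs must have equal length")
    weighted_sum = 0.0
    weight_total = 0.0
    for y, p in zip(outcomes, observed_probs):
        if y is None:
            continue  # subjects with missing outcomes get no direct contribution
        if not 0.0 < p <= 1.0:
            raise ValueError("observation probabilities must lie in (0, 1]")
        weight = 1.0 / p  # subjects unlikely to be observed are up-weighted
        weighted_sum += weight * y
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no observed outcomes to average")
    return weighted_sum / weight_total


# Hypothetical data: the second subject's outcome is missing.
example_outcomes = [1.0, None, 3.0, 5.0]
example_probs = [0.5, 0.5, 1.0, 0.25]
example_mean = ipw_complete_case_mean(example_outcomes, example_probs)
```

Dividing by the sum of the weights rather than by the number of subjects keeps the estimate within the range of the observed outcomes, which is one common design choice among several.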
    
== Inverse Probability Weighted Estimator (IPWE) ==
Line 87: Line 87:
= References =
  −
{{Reflist|refs=
  −
<ref name="refname1">
  −
{{cite journal | last1 = Hernan | first1 = MA | last2 = Robins | first2 = JM | year = 2006 | title = Estimating Causal Effects From Epidemiological Data | citeseerx = 10.1.1.157.9366 | journal = J Epidemiol Community Health | volume = 60 | issue = 7 | pages = 578–596 | doi=10.1136/jech.2004.029496| pmc = 2652882 | pmid=16790829}}
  −
</ref>
  −
<ref name="refname2">
  −
{{cite journal | last1 = Robins | first1 = JM | last2 = Rotnitzky | first2 = A | last3 = Zhao | first3 = LP | year = 1994 | title = Estimation of regression coefficients when some regressors are not always observed | journal = [[Journal of the American Statistical Association]] | volume = 89 | issue = 427 | pages = 846–866 | doi=10.1080/01621459.1994.10476818}}
  −
</ref>
  −
<ref name="refname3">
  −
{{cite journal | last1 = Breslow | first1 = NE | last2 = Lumley | first2 = T  | year = 2009 | pmc = 2768499 | title = Using the Whole Cohort in the Analysis of Case-Cohort Data | journal = Am J Epidemiol | volume = 169 | issue = 11 | pages = 1398–1405 | doi=10.1093/aje/kwp055 | pmid=19357328|display-authors=etal}}
  −
</ref>
  −
}}
 