
添加7,355字节 、 2021年1月26日 (二) 21:15
此词条由Jie翻译,已由Smile审校。
    
{{too technical|date=May 2020}}
 
In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution.  In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
 
在概率论中,<font color="#ff8000">重尾分布 Heavy-tailed distributions</font>是指尾部不以指数为界的概率分布<ref name="Asmussen">{{Cite book | doi = 10.1007/0-387-21525-5_10 | first = S. R. | last = Asmussen| chapter = Steady-State Properties of GI/G/1 | title = Applied Probability and Queues | series = Stochastic Modelling and Applied Probability | volume = 51 | pages = 266–301 | year = 2003 | isbn = 978-0-387-00211-8 | pmid =  | pmc = }}</ref>:也就是说,它们的尾部比<font color="#ff8000">指数分布 exponential distribution</font>的尾部“更重”。在许多应用中,人们关注的是分布的右尾,但分布也可能具有重的左尾,或者两侧尾部都很重。
There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions and the subexponential distributions.  In practice, all commonly used heavy-tailed distributions belong to the subexponential class.
 
重尾分布有三个重要的子类:<font color="#ff8000">胖尾分布 Fat-tailed distribution</font>、<font color="#ff8000">长尾分布 Long-tailed distribution</font>和<font color="#ff8000">次指数分布 Subexponential distributions</font>。在实践中,所有常用的重尾分布都属于<font color="#ff8000">次指数分布类 subexponential class</font>。
There is still some discrepancy over the use of the term heavy-tailed.  There are two other definitions in use.  Some authors use the term to refer to those distributions which do not have all their power moments finite; others use it for those distributions that do not have a finite variance.  The definition given in this article is the most general in use, and includes all distributions encompassed by the alternative definitions, as well as those distributions such as log-normal that possess all their power moments, yet which are generally considered to be heavy-tailed.  (Occasionally, heavy-tailed is used for any distribution that has heavier tails than the normal distribution.)
 
在使用<font color="#ff8000">“重尾” Heavy-tailed</font>一词时仍存在一些分歧。目前还有另外两种定义在使用:一些作者用该术语指代并非所有幂矩都有限的分布;另一些作者则用它指代方差不有限的分布。本文给出的定义是目前使用的最一般的定义,它涵盖了上述替代定义所包括的所有分布,也包括诸如<font color="#ff8000">对数正态分布 log-normal distribution</font>这样各阶幂矩都有限、但通常仍被视为重尾的分布。(有时,“重尾”也被用来指任何尾部比正态分布更重的分布。)
The distribution of a random variable ''X'' with distribution function ''F'' is said to have a heavy (right) tail if the moment generating function of ''X'', ''M<sub>X</sub>''(''t''), is infinite for all ''t''&nbsp;>&nbsp;0.
 
若''X''的矩生成函数''M<sub>X</sub>''(''t'')对所有''t''&nbsp;>&nbsp;0都是无穷大,则称具有分布函数''F''的随机变量''X''的分布具有重(右)尾。<ref name="ReferenceA">Rolski, Schmidli, Schmidt, Teugels, ''Stochastic Processes for Insurance and Finance'', 1999</ref>
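As a rough numerical illustration of this definition (not a proof), the sketch below tracks <math>\log\left(e^{tx}\Pr[X>x]\right)</math> on a few large values of ''x''. It assumes a standard log-normal distribution (μ = 0, σ = 1) as the heavy-tailed example and Exp(1) as the light-tailed one; the helper names, the choice ''t'' = 0.1 and the grid are illustrative only. The value keeps growing for the log-normal and falls for the exponential; a finite grid can only suggest, never prove, that ''M<sub>X</sub>''(''t'') is infinite.

```python
import math

# Illustrative sketch: compare log(e^{t x} * P(X > x)) for a heavy-tailed
# (log-normal) and a light-tailed (exponential) distribution.
# Working on the log scale avoids overflow of e^{t x} and underflow of P(X > x).


def log_tail_lognormal(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """log P(X > x) for X ~ LogNormal(mu, sigma), via the complementary error function."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return math.log(0.5 * math.erfc(z))


def log_tail_exponential(x: float, rate: float = 1.0) -> float:
    """log P(X > x) for X ~ Exp(rate)."""
    return -rate * x


def log_tilted_tail(log_tail, x: float, t: float) -> float:
    """log of e^{t x} * P(X > x)."""
    return t * x + log_tail(x)


T = 0.1  # any fixed t > 0; for Exp(1) the product vanishes only because t < 1
GRID = [50.0, 100.0, 200.0, 400.0, 800.0]

lognormal_values = [log_tilted_tail(log_tail_lognormal, x, T) for x in GRID]
exponential_values = [log_tilted_tail(log_tail_exponential, x, T) for x in GRID]

for x, ln_val, exp_val in zip(GRID, lognormal_values, exponential_values):
    print(f"x = {x:5.0f}   log-normal: {ln_val:8.2f}   exponential: {exp_val:8.2f}")
```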
This is also written in terms of the tail distribution function
 
也可以用<font color="#ff8000">尾分布函数 tail distribution function</font>来表示:
    
<math>\overline{F}(x) \equiv \Pr[X>x]</math>
The distribution of a random variable ''X'' with distribution function ''F'' is said to have a long right tail<ref name="Asmussen"/> if for all ''t''&nbsp;>&nbsp;0,
 
如果对所有''t''&nbsp;>&nbsp;0都有下式成立,则称具有分布函数''F''的随机变量''X''的分布具有长右尾:
    
:<math>\lim_{x \to \infty} \Pr[X>x+t\mid X>x] = 1.</math>
For a quantity with a long right tail, this has an intuitive interpretation: if the quantity exceeds some high level, the probability that it also exceeds any level a fixed amount higher approaches 1.
 
对于具有长右尾的量,这有一个直观的解释:如果该量超过了某个很高的水平,那么它再超过比该水平高出固定量的更高水平的概率趋于1。
All long-tailed distributions are heavy-tailed, but the converse is false, and it is possible to construct heavy-tailed distributions that are not long-tailed.
 
所有长尾分布都是重尾分布,但反之不成立:可以构造出不是长尾分布的重尾分布。
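The long-tail property can be checked directly on closed-form tails. This sketch assumes a Pareto distribution with ''α'' = 2 and scale 1, and an Exp(1) distribution; both choices and all helper names are illustrative. The conditional probability <math>\Pr[X>x+t\mid X>x]</math> approaches 1 for the Pareto tail, but for the exponential it stays at <math>e^{-t}</math> for every ''x'' (memorylessness), so the exponential distribution is not long-tailed.

```python
import math


def pareto_tail(x: float, alpha: float = 2.0, x_m: float = 1.0) -> float:
    """P(X > x) for X ~ Pareto(alpha, x_m)."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def exponential_tail(x: float, rate: float = 1.0) -> float:
    """P(X > x) for X ~ Exp(rate)."""
    return 1.0 if x < 0.0 else math.exp(-rate * x)


def conditional_exceedance(tail, x: float, t: float) -> float:
    """P(X > x + t | X > x) = P(X > x + t) / P(X > x)."""
    return tail(x + t) / tail(x)


t = 1.0
for x in (10.0, 100.0, 500.0):
    print(x, conditional_exceedance(pareto_tail, x, t), conditional_exceedance(exponential_tail, x, t))
```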
Subexponentiality is defined in terms of [[Convolution of probability distributions|convolutions of probability distributions]]. For two independent, identically distributed [[random variables]] <math> X_1,X_2</math> with common distribution function <math>F</math>, the convolution of <math>F</math> with itself, written <math>F^{*2}</math> and called the convolution square, is defined using [[Lebesgue–Stieltjes integration]] by:
 
次指数性是根据概率分布的<font color="#ff8000">卷积 Convolution</font>定义的。对于具有共同分布函数<math>F</math>的两个独立同分布随机变量<math> X_1,X_2</math>,<math>F</math>与自身的卷积记为<math>F^{*2}</math>,称为卷积平方,借助Lebesgue–Stieltjes积分定义如下:
and the ''n''-fold convolution <math>F^{*n}</math> is defined inductively by the rule:
 
''n''重卷积<math>F^{*n}</math>由以下规则归纳定义:
A distribution <math>F</math> on the positive half-line is subexponential<ref name="Asmussen"/><ref>{{Cite web|url=https://www.researchgate.net/publication/242637603_A_Theorem_on_Sums_of_Independent_Positive_Random_Variables_and_Its_Applications_to_Branching_Random_Processes|title=A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes|last=Chistyakov|first=V. P.|date=1964|website=ResearchGate|language=en|archive-url=|archive-date=|access-date=April 7, 2019}}</ref><ref>{{Cite web|url=https://projecteuclid.org/download/pdf_1/euclid.aop/1176996225|title=The Class of Subexponential Distributions|last=Teugels|first=Jozef L.|authorlink=|date=1975|website=|publisher=Annals of Probability|publication-place=[[KU Leuven|University of Louvain]]|archive-url=|archive-date=|access-date=April 7, 2019}}</ref> if
 
如果满足以下条件,则称正半轴上的分布<math>F</math>为次指数分布<ref name="Asmussen"/><ref>{{Cite web|url=https://www.researchgate.net/publication/242637603_A_Theorem_on_Sums_of_Independent_Positive_Random_Variables_and_Its_Applications_to_Branching_Random_Processes|title=A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes|last=Chistyakov|first=V. P.|date=1964|website=ResearchGate|language=en|archive-url=|archive-date=|access-date=April 7, 2019}}</ref><ref>{{Cite web|url=https://projecteuclid.org/download/pdf_1/euclid.aop/1176996225|title=The Class of Subexponential Distributions|last=Teugels|first=Jozef L.|authorlink=|date=1975|website=|publisher=Annals of Probability|publication-place=[[KU Leuven|University of Louvain]]|archive-url=|archive-date=|access-date=April 7, 2019}}</ref>:
This implies<ref name="Embrechts">{{cite book |author1=Embrechts P. |author2=Klueppelberg C. |author3=Mikosch T. |title=Modelling extremal events for insurance and finance |publisher=Springer | series = Stochastic Modelling and Applied Probability|location=Berlin |year=1997  | volume=33| doi = 10.1007/978-3-642-33483-2|isbn=978-3-642-08242-9 }}</ref> that, for any <math>n \geq 1</math>,
 
这意味着<ref name="Embrechts">{{cite book |author1=Embrechts P. |author2=Klueppelberg C. |author3=Mikosch T. |title=Modelling extremal events for insurance and finance |publisher=Springer | series = Stochastic Modelling and Applied Probability|location=Berlin |year=1997  | volume=33| doi = 10.1007/978-3-642-33483-2|isbn=978-3-642-08242-9 }}</ref>,对于任何<math>n \geq 1</math>,
The probabilistic interpretation<ref name="Embrechts"/> of this is that, for a sum of <math>n</math> [[statistical independence|independent]] [[random variables]] <math>X_1,\ldots,X_n</math> with common distribution <math>F</math>,
 
对此的概率解释<ref name="Embrechts"/>是,对于具有共同分布<math>F</math>的<math>n</math>个独立随机变量<math>X_1,\ldots,X_n</math>的总和
This is often known as the principle of the single big jump<ref>{{Cite journal | last1 = Foss | first1 = S. | last2 = Konstantopoulos | first2 = T. | last3 = Zachary | first3 = S. | doi = 10.1007/s10959-007-0081-2 | title = Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments | journal = Journal of Theoretical Probability| volume = 20 | issue = 3 | pages = 581 | year = 2007 | arxiv = math/0509605| pmid =  | url = http://www.math.nsc.ru/LBRT/v1/foss/fkz_revised.pdf| pmc = | citeseerx = 10.1.1.210.1699 }}</ref> or catastrophe principle.<ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 9, 2014 | accessdate = January 9, 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>
 
这通常被称为<font color="#ff8000">单一大跳原理 principle of the single big jump</font><ref>{{Cite journal | last1 = Foss | first1 = S. | last2 = Konstantopoulos | first2 = T. | last3 = Zachary | first3 = S. | doi = 10.1007/s10959-007-0081-2 | title = Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments | journal = Journal of Theoretical Probability| volume = 20 | issue = 3 | pages = 581 | year = 2007 | arxiv = math/0509605| pmid =  | url = http://www.math.nsc.ru/LBRT/v1/foss/fkz_revised.pdf| pmc = | citeseerx = 10.1.1.210.1699 }}</ref>或<font color="#ff8000">灾难原理 catastrophe principle</font><ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 9, 2014 | accessdate = January 9, 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>。
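The single-big-jump principle can be illustrated by simulation. This is a hedged sketch, not part of the cited results: it assumes i.i.d. Pareto variables with ''α'' = 1.5 (drawn with Python's <code>random.paretovariate</code>, which uses scale 1) and estimates <math>\Pr[X_1+\cdots+X_n>x]\,/\,\Pr[\max_i X_i>x]</math> by Monte Carlo. For comparison it computes the same ratio exactly for Exp(1), whose ''n''-fold sum is Erlang distributed. The ratio stays close to 1 for the Pareto case and is large for the exponential case; the parameters, seed and helper names are hypothetical.

```python
import math
import random


def mc_sum_over_max(sample, n: int, x: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X_1 + ... + X_n > x) / P(max_i X_i > x).

    For positive variables, max > x implies sum > x, so the ratio is at least 1.
    """
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(trials):
        xs = [sample(rng) for _ in range(n)]
        if sum(xs) > x:
            sum_hits += 1
        if max(xs) > x:
            max_hits += 1
    return sum_hits / max_hits


def exact_sum_over_max_exponential(n: int, x: float) -> float:
    """Exact ratio for i.i.d. Exp(1): the sum is Erlang(n, 1), the max has CDF (1 - e^{-x})^n."""
    sum_tail = math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))
    max_tail = 1.0 - (1.0 - math.exp(-x)) ** n
    return sum_tail / max_tail


pareto_ratio = mc_sum_over_max(lambda rng: rng.paretovariate(1.5), n=3, x=100.0, trials=200_000)
exponential_ratio = exact_sum_over_max_exponential(n=3, x=8.0)
print(f"Pareto(1.5): {pareto_ratio:.3f}")
print(f"Exp(1):      {exponential_ratio:.3f}")
```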
<math>F I([0,\infty))</math> is.<ref>{{cite journal | last = Willekens | first =  E. | title = Subexponentiality on the real line | journal = Technical Report | publisher = K.U. Leuven | year = 1986}}</ref> Here <math>I([0,\infty))</math> is the [[indicator function]]  of the positive half-line.  Alternatively, a random variable <math>X</math> supported on the real line is subexponential if and only if <math>X^+ = \max(0,X)</math> is subexponential.
 
若分布<math>F I([0,\infty))</math>是次指数的,则称整个实轴上的分布<math>F</math>是次指数的。<ref>{{cite journal | last = Willekens | first =  E. | title = Subexponentiality on the real line | journal = Technical Report | publisher = K.U. Leuven | year = 1986}}</ref>这里<math>I([0,\infty))</math>是正半轴的指示函数。等价地,支撑在实轴上的随机变量<math>X</math>是次指数的,当且仅当<math>X^+ = \max(0,X)</math>是次指数的。
All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.
 
所有次指数分布都是长尾分布,但可以构造出不是次指数分布的长尾分布。
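In its standard formulation, a distribution on the positive half-line is subexponential when <math>\Pr[X_1+X_2>x]\,/\,\Pr[X_1>x] \to 2</math> as <math>x \to \infty</math>. The hedged sketch below evaluates the tail of the convolution square by conditioning on <math>X_1</math> (a midpoint-rule approximation of the Lebesgue–Stieltjes integral), for an illustrative Pareto distribution with ''α'' = 1.5 and scale 1, and for Exp(1). The Pareto ratio comes out close to 2, while for Exp(1) the ratio equals <math>1+x</math> and grows without bound, so the exponential distribution is not subexponential. Function names and parameter values are hypothetical.

```python
import math


def convolution_square_tail(tail, density, x: float, lower: float, steps: int = 200_000) -> float:
    """P(X1 + X2 > x) for i.i.d. X1, X2 with a density supported on [lower, inf), lower >= 0.

    Conditioning on X1 = y gives P(X1 + X2 > x) = integral of P(X2 > x - y) dF(y).
    For y > x - lower the inner probability is 1, which contributes P(X1 > x - lower);
    the remaining range [lower, x - lower] is integrated with the midpoint rule.
    Requires x > 2 * lower.
    """
    a, b = lower, x - lower
    h = (b - a) / steps
    integral = 0.0
    for i in range(steps):
        y = a + (i + 0.5) * h
        integral += tail(x - y) * density(y)
    return tail(b) + integral * h


ALPHA = 1.5  # illustrative Pareto index, scale x_m = 1


def pareto_tail(z: float) -> float:
    return 1.0 if z < 1.0 else z ** -ALPHA


def pareto_density(y: float) -> float:
    return ALPHA * y ** (-ALPHA - 1.0) if y >= 1.0 else 0.0


def exp_tail(z: float) -> float:
    return 1.0 if z < 0.0 else math.exp(-z)


def exp_density(y: float) -> float:
    return math.exp(-y) if y >= 0.0 else 0.0


x_big = 1000.0
pareto_ratio = convolution_square_tail(pareto_tail, pareto_density, x_big, lower=1.0) / pareto_tail(x_big)
exp_ratio = convolution_square_tail(exp_tail, exp_density, 10.0, lower=0.0) / exp_tail(10.0)
print(f"Pareto(1.5), x = {x_big:.0f}: ratio = {pareto_ratio:.4f}")  # close to 2
print(f"Exp(1), x = 10: ratio = {exp_ratio:.4f}")                   # 1 + x in exact arithmetic
```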
    
== Common heavy-tailed distributions 常见的重尾分布 ==
 
All commonly used heavy-tailed distributions are subexponential.<ref name="Embrechts"/>
 
所有常用的重尾分布都是次指数的。<ref name="Embrechts"/>
    
Those that are one-tailed include:
 
单尾的包括:
 
* <font color="#ff8000">帕累托分布 Pareto distribution</font>;
* <font color="#ff8000">对数正态分布 Log-normal distribution</font>;
* <font color="#ff8000">莱维分布 Lévy distribution</font>;
* 形状参数大于0但小于1的<font color="#ff8000">韦布尔分布 Weibull distribution</font>;
* <font color="#ff8000">伯尔分布 Burr distribution</font>;
* <font color="#ff8000">对数逻辑分布 log-logistic distribution</font>;
* <font color="#ff8000">对数伽玛分布 log-gamma distribution</font>;
* <font color="#ff8000">弗雷歇分布 Fréchet distribution</font>;
* <font color="#ff8000">对数柯西分布 log-Cauchy distribution</font>,有时被描述为“超重尾”分布,因为它表现出对数衰减,从而产生比帕累托分布更重的尾。<ref>{{cite book|title=Laws of Small Numbers: Extremes and Rare Events|author=Falk, M., Hüsler, J. & Reiss, R.|page=80|year=2010|publisher=Springer|isbn=978-3-0348-0008-2}}</ref><ref>{{cite web|title=Statistical inference for heavy and super-heavy tailed distributions|url=http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|author=Alves, M.I.F., de Haan, L. & Neves, C.|date=March 10, 2006|access-date=November 1, 2011|archive-url=https://web.archive.org/web/20070623175435/http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|archive-date=June 23, 2007|url-status=dead}}</ref>
 
 
Those that are two-tailed include:
 
双尾的包括:
 
* <font color="#ff8000">柯西分布 Cauchy distribution</font>本身就是稳定分布和t分布的特例;
* <font color="#ff8000">稳定分布族 The family of stable distributions</font><ref>{{cite web |author=John P. Nolan | title=Stable Distributions: Models for Heavy Tailed Data| year=2009 | url=http://academic2.american.edu/~jpnolan/stable/chap1.pdf | accessdate=2009-02-21}}</ref>,但不包括该族中的正态分布这一特例。一些稳定分布是单侧的(即支撑在半轴上),例如莱维分布。另请参见具有长尾分布和波动率聚集的金融模型。
* t分布
* <font color="#ff8000">偏对数正态级联分布 The skew lognormal cascade distribution</font>。<ref>{{cite web | author=Stephen Lihn | title=Skew Lognormal Cascade Distribution | year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ | access-date=2009-06-12 | archive-url=https://web.archive.org/web/20140407075213/http://www.skew-lognormal-cascade-distribution.org/ | archive-date=2014-04-07 | url-status=dead }}</ref>
 
          
== Relationship to fat-tailed distributions 与胖尾分布的关系 ==
 
A [[fat-tailed distribution]] is a distribution for which the probability density function, for large x, goes to zero as a power <math>x^{-a}</math>.  Since such a power is always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed.  Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the [[log-normal distribution]] {{Contradict-inline|article=fat-tailed distribution|reason=Fat-tailed page says log-normals are in fact fat-tailed.|date=June 2019}}.  Many other heavy-tailed distributions such as the [[log-logistic distribution|log-logistic]] and [[Pareto distribution|Pareto]] distribution are, however, also fat-tailed.
胖尾分布是指这样的分布:当''x''较大时,其概率密度函数像幂函数<math>x^{-a}</math>一样趋于零。由于这样的幂函数总是以指数分布的概率密度函数为下界,因此胖尾分布总是重尾分布。但是,某些分布的尾部趋于零的速度比指数函数慢(即它们是重尾的),却比幂函数快(即它们不是胖尾的),对数正态分布就是一例{{Contradict-inline|article=fat-tailed distribution|reason=Fat-tailed page says log-normals are in fact fat-tailed.|date=June 2019}}。不过,许多其他重尾分布,例如对数逻辑分布和帕累托分布,也都是胖尾分布。
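Under the density-based definition used in this paragraph, the claim about the log-normal distribution can be checked on the log scale. This sketch assumes a standard log-normal density (μ = 0, σ = 1) and the illustrative exponents ''a'' = 5 and ''t'' = 0.01: <math>\log\left(x^a f(x)\right)</math> decreases without bound, so the density falls faster than the power <math>x^{-a}</math>, while <math>\log\left(e^{tx} f(x)\right)</math> increases, so it falls slower than <math>e^{-tx}</math>. The same holds for any fixed ''a'' and any ''t'' > 0 once ''x'' is large enough; the grid only samples a few points.

```python
import math


def log_lognormal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """log f(x) for the log-normal density with parameters mu, sigma."""
    lx = math.log(x)
    return -lx - math.log(sigma * math.sqrt(2.0 * math.pi)) - (lx - mu) ** 2 / (2.0 * sigma ** 2)


a, t = 5.0, 0.01  # illustrative power exponent and exponential rate
grid = [1e4, 1e6, 1e8]

# log(x^a f(x)): tends to -inf, i.e. f(x) decays faster than x^{-a}
vs_power = [a * math.log(x) + log_lognormal_pdf(x) for x in grid]
# log(e^{t x} f(x)): tends to +inf, i.e. f(x) decays slower than e^{-t x}
vs_exponential = [t * x + log_lognormal_pdf(x) for x in grid]

for x, p, e in zip(grid, vs_power, vs_exponential):
    print(f"x = {x:.0e}   log(x^a f): {p:9.2f}   log(e^(tx) f): {e:12.2f}")
```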
}}</ref>) approaches to the problem of the tail-index estimation.
 
对于尾指数估计问题,有参数方法(参见Embrechts等人<ref name="Embrechts"/>)和非参数方法(例如Novak<ref name="Novak2011">{{cite book
| author=Novak S.Y.
| title=Extreme value methods with applications to finance
| year=2011
| series=London: CRC
| isbn=978-1-43983-574-6
}}</ref>)两种。
To estimate the tail-index using the parametric approach, some authors employ the [[GEV distribution]] or the [[Pareto distribution]]; they may apply the maximum-likelihood estimator (MLE).
 
为了使用参数化方法估计尾指数,有些作者采用了GEV分布或帕累托分布;他们可能会运用极大似然估计方法(MLE)。
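As a minimal example of the parametric approach, assume the data are exactly Pareto with a known scale <math>x_m</math>. The maximum-likelihood estimator of the tail index then has the closed form <math>\hat\alpha = n \big/ \sum_{i=1}^{n} \ln(X_i/x_m)</math>. The sketch uses simulated data from Python's <code>random.paretovariate</code> (scale 1); fitting a GEV distribution instead would require numerical optimisation and is not shown. Names, seed and parameter values are illustrative.

```python
import math
import random


def pareto_alpha_mle(samples, x_m: float) -> float:
    """Closed-form MLE of alpha for Pareto(alpha, x_m) data when x_m is known."""
    log_ratios = [math.log(x / x_m) for x in samples]
    return len(log_ratios) / sum(log_ratios)


rng = random.Random(1)
data = [rng.paretovariate(2.0) for _ in range(100_000)]  # true alpha = 2, x_m = 1
alpha_hat = pareto_alpha_mle(data, x_m=1.0)
print(f"alpha_hat = {alpha_hat:.3f}, xi_hat = 1/alpha_hat = {1.0 / alpha_hat:.3f}")
```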
=== Pickands' tail-index estimator Pickands尾指数估计量 ===
    
Let <math>(X_n , n \geq 1)</math> be a sequence of independent random variables with common distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref> of the generalized extreme value distribution <math>H</math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty</math> and <math>\lim_{n\to\infty} \frac{k(n)}{n} = 0</math>, then the ''Pickands'' tail-index estimator is<ref name="Embrechts"/><ref name="Pickands"/>
 
设<math>(X_n , n \geq 1)</math>为独立同分布的随机变量序列,其共同分布函数<math>F \in D(H(\xi))</math>,即属于<font color="#ff8000">广义极值分布 the generalized extreme value distribution</font><math>H</math>的<font color="#ff8000">最大吸引域 the maximum domain of attraction</font><ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref>,其中<math>\xi \in \mathbb{R}</math>。如果<math>\lim_{n\to\infty} k(n) = \infty</math>且<math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>,则Pickands尾指数估计量为<ref name="Embrechts"/><ref name="Pickands"/>
=== Hill's tail-index estimator 希尔 Hill的尾指数估算器 ===
    
Let <math>(X_t , t \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the [[generalized extreme value distribution]] <math> H </math>, where <math>\xi \in \mathbb{R}</math>. The sample path is <math>\{X_t: 1 \leq t \leq n\}</math>, where <math>n</math> is the sample size. If
<math>\{k(n)\}</math> is an intermediate order sequence, i.e. <math>k(n) \in \{1,\ldots,n-1\}</math>, <math>k(n) \to \infty</math> and <math>k(n)/n \to 0</math>, then the Hill tail-index estimator is<ref>Hill B.M. (1975) A simple general approach to inference about  the tail of a distribution. Ann. Stat., v. 3, 1163–1174.</ref>
 
令<math>(X_t , t \geq 1)</math>为独立同分布的随机变量序列,其分布函数<math>F \in D(H(\xi))</math>属于广义极值分布<math> H </math>的最大吸引域,其中<math>\xi \in \mathbb{R}</math>。样本路径为<math>\{X_t: 1 \leq t \leq n\}</math>,其中<math>n</math>为样本容量。如果<math>\{k(n)\}</math>是中间阶序列,即<math>k(n) \in \{1,\ldots,n-1\}</math>,<math>k(n) \to \infty</math>且<math>k(n)/n \to 0</math>,则Hill尾指数估计量为<ref>Hill B.M. (1975) A simple general approach to inference about  the tail of a distribution. Ann. Stat., v. 3, 1163–1174.</ref>:
.<ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref> Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref> irrespective of whether <math>X_t</math> is observed, or a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with errors that are dependent.<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill’s estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill’s estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>
 
其中<math>X_{(i,n)}</math>是<math>X_1, \dots, X_n</math>的第<math>i</math>个次序统计量。该估计量依概率收敛于<math>\xi</math>;若依据高阶正则变化性质对<math>k(n) \to \infty</math>的增长速度加以限制,则它是渐近正态的<ref>Hall, P.(1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.</ref><ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref>。一致性和渐近正态性可推广到一大类相依、异质序列<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref>,无论<math>X_t</math>是直接观测到的,还是由一大类模型和估计量(包括设定错误的模型以及误差相依的模型)计算出的残差或滤波数据。<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill’s estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill’s estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>
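The Hill estimator, and the Pickands estimator from the previous subsection, can be sketched in a few lines. This is a hedged illustration using common textbook forms written with decreasing order statistics <math>X_{(1)} \geq X_{(2)} \geq \cdots</math>: Hill as <math>\tfrac{1}{k}\sum_{i=1}^{k} \ln X_{(i)} - \ln X_{(k+1)}</math> and Pickands as <math>\ln\!\left[(X_{(k)} - X_{(2k)})/(X_{(2k)} - X_{(4k)})\right] / \ln 2</math>; the indexing may differ slightly from the formulas in the cited sources. Both target <math>\xi</math>, which equals <math>1/\alpha = 0.5</math> for the simulated Pareto(''α'' = 2) data. The sample size, ''k'' and the seed are arbitrary choices.

```python
import math
import random


def hill_estimator(samples, k: int) -> float:
    """Hill estimate of xi from the k largest observations (threshold = (k+1)-th largest)."""
    xs = sorted(samples, reverse=True)   # X_(1) >= X_(2) >= ...
    log_threshold = math.log(xs[k])      # xs[k] is the (k+1)-th largest value
    return sum(math.log(x) for x in xs[:k]) / k - log_threshold


def pickands_estimator(samples, k: int) -> float:
    """Pickands estimate of xi from the k-th, 2k-th and 4k-th largest observations."""
    if 4 * k > len(samples):
        raise ValueError("need 4k <= n")
    xs = sorted(samples, reverse=True)
    x_k, x_2k, x_4k = xs[k - 1], xs[2 * k - 1], xs[4 * k - 1]
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)


rng = random.Random(7)
data = [rng.paretovariate(2.0) for _ in range(200_000)]  # Pareto(alpha = 2), so xi = 0.5
k = 5_000  # intermediate order: large, but small compared with n
xi_hill = hill_estimator(data, k)
xi_pickands = pickands_estimator(data, k)
print(f"Hill: {xi_hill:.3f}   Pickands: {xi_pickands:.3f}   (true xi = 0.5)")
```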
It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".
 
尾指数的比率估计器(RE估计器)由Goldie和Smith提出<ref>Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.</ref>。它的构造与Hill估计器类似,但使用了一个非随机的“调节参数”。
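The section does not reproduce the ratio estimator's formula. The Python sketch below assumes the form commonly cited for the Goldie–Smith ratio estimator (for example in Novak): the summed log-excesses over a fixed, non-random threshold x_n, which plays the role of the tuning parameter, divided by the number of exceedances, a_n = Σ ln(X_i/x_n)·1{X_i > x_n} / Σ 1{X_i > x_n}, which estimates ξ = 1/α. Unlike Hill's estimator, whose threshold is a random order statistic, the threshold here is fixed in advance. The function name `ratio_estimator`, the simulated data and the value x_n = 5 are hypothetical.

```python
import math
import random
from typing import Sequence


def ratio_estimator(sample: Sequence[float], x_n: float) -> float:
    """Ratio (RE-type) estimate of xi = 1/alpha with a fixed threshold x_n > 0.

    Assumed form: sum of ln(X_i / x_n) over X_i > x_n, divided by the
    number of observations exceeding x_n.
    """
    if x_n <= 0:
        raise ValueError("threshold x_n must be positive")
    excesses = [math.log(x / x_n) for x in sample if x > x_n]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)


# Hypothetical check: Pareto(alpha = 2) data, true xi = 0.5.
# P(X > 5) = 5 ** -2 = 0.04, so about 4000 of 100000 points exceed x_n = 5.
rng = random.Random(1)
data = [rng.paretovariate(2.0) for _ in range(100_000)]
xi_re = ratio_estimator(data, x_n=5.0)
print(f"xi_re = {xi_re:.3f}, alpha_re = {1.0 / xi_re:.3f}")
```

In practice x_n must be chosen so that the number of exceedances grows with n while x_n itself tends to infinity; the fixed value above only serves the illustration.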
A comparison of Hill-type and RE-type estimators can be found in Novak.<ref name="Novak2011"/>
 
Novak的著作对Hill型和RE型估计量进行了比较。<ref name="Novak2011"/>
    
=== Software 应用软件 ===
 
* [http://www.cs.bu.edu/~crovella/aest.html aest], [[C (programming language)|C]] tool for estimating the heavy-tail index.<ref>{{Cite journal | last1 = Crovella | first1 = M. E. | last2 = Taqqu | first2 = M. S. | title = Estimating the Heavy Tail Index from Scaling Properties| journal = Methodology and Computing in Applied Probability | volume = 1 | pages = 55–79 | year = 1999 | doi = 10.1023/A:1010012224103 | url = http://www.cs.bu.edu/~crovella/paper-archive/aest.ps| pmid =  | pmc = }}</ref>
 
* [http://www.cs.bu.edu/~crovella/aest.html aest]:用于估计重尾指数的C语言工具。<ref>{{Cite journal | last1 = Crovella | first1 = M. E. | last2 = Taqqu | first2 = M. S. | title = Estimating the Heavy Tail Index from Scaling Properties| journal = Methodology and Computing in Applied Probability | volume = 1 | pages = 55–79 | year = 1999 | doi = 10.1023/A:1010012224103 | url = http://www.cs.bu.edu/~crovella/paper-archive/aest.ps| pmid =  | pmc = }}</ref>
    
== Estimation of heavy-tailed density 重尾密度的估计 ==
 
}}</ref>
 
}}</ref>
Markovich中给出了估计重尾和超重尾概率密度函数的非参数方法。<ref name="Markovich2007">{{cite book
| author=Markovich N.M.
| title=Nonparametric Analysis of Univariate Heavy-Tailed data: Research and Practice
| year=2007
| series=Chichester: Wiley
| isbn=978-0-470-72359-3
}}</ref>这些方法包括:基于<font color="#ff8000">可变带宽 variable bandwidth</font>和<font color="#ff8000">长尾核估计器 long-tailed kernel estimators</font>的方法;先将原始数据变换为取值于有限或无限区间的新随机变量以便于估计,再对所得的密度估计进行逆变换的方法;以及“拼合方法”,它用某个参数模型描述密度的尾部,并用非参数模型近似众数附近的密度。非参数估计器需要适当选择调节(平滑)参数,例如核估计器的带宽和直方图的组距。常用的数据驱动选择方法有交叉验证及其变体,以及基于最小化均方误差(MSE)、其渐近表达式或其上界的方法。<ref name="WandJon1995">{{cite book
| author=Wand M.P., Jones M.C.
| title=Kernel smoothing
| year=1995
| series=New York: Chapman and Hall
| isbn=978-0412552700
}}</ref>Markovich中还给出了一种差异法:它把Kolmogorov-Smirnov、von Mises和Anderson-Darling等著名非参数统计量用作分布函数空间中的度量,并把这些统计量的分位数用作已知的不确定性或差异值。<ref name="Markovich2007"/><font color="#ff8000">自助法 Bootstrap</font>是另一种工具,它通过不同的重抽样方案来近似未知的MSE,从而找到平滑参数。<ref name="Hall1992">{{cite book
| author=Hall P.
| title=The Bootstrap and Edgeworth Expansion
| year=1992
| series=Springer
| isbn=9780387945088
}}</ref>
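To illustrate the transform-then-back-transform idea together with cross-validated bandwidth selection, the Python sketch below uses the logarithm as the transform (an assumed, deliberately simple choice; Markovich discusses other transformations), a Gaussian kernel on the log scale, and least-squares cross-validation over a small candidate grid. For Pareto data, ln X is exponentially distributed, i.e. light-tailed, which is what makes the transformed sample easier to smooth. The back-transformed estimate is f̂_X(x) = f̂_Y(ln x)/x, where 1/x is the Jacobian of the inverse transform. It is neither a variable-bandwidth nor a long-tailed-kernel estimator, and the names `lscv`, `log_kde`, the grid and the sample size are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

INV_SQRT_2PI = 1.0 / math.sqrt(2.0 * math.pi)


def gauss(u: float) -> float:
    """Standard normal density, used as the kernel."""
    return INV_SQRT_2PI * math.exp(-0.5 * u * u)


def lscv(ys: Sequence[float], h: float) -> float:
    """Least-squares cross-validation score of a Gaussian KDE (lower is better).

    LSCV(h) = integral of fhat_h^2 - (2/n) * sum_i fhat_{h,-i}(y_i).
    """
    n = len(ys)
    h2 = h * math.sqrt(2.0)  # two N(0, h^2) kernels convolve to N(0, 2h^2)
    integral_sq = 0.0
    leave_one_out = 0.0
    for i in range(n):
        for j in range(n):
            d = ys[i] - ys[j]
            integral_sq += gauss(d / h2) / h2
            if i != j:
                leave_one_out += gauss(d / h) / h
    return integral_sq / (n * n) - 2.0 * leave_one_out / (n * (n - 1))


def log_kde(
    sample: Sequence[float], bandwidths: Sequence[float]
) -> Tuple[float, Callable[[float], float]]:
    """Fit a Gaussian KDE to ln(X) and return (chosen bandwidth, density of X)."""
    ys: List[float] = [math.log(x) for x in sample]  # requires x > 0
    h = min(bandwidths, key=lambda b: lscv(ys, b))
    n = len(ys)

    def density(x: float) -> float:
        if x <= 0.0:
            return 0.0
        y = math.log(x)
        f_y = sum(gauss((y - yi) / h) for yi in ys) / (n * h)
        return f_y / x  # change of variables: f_X(x) = f_Y(ln x) * d(ln x)/dx

    return h, density


# Hypothetical check: Pareto(alpha = 1.5) data; ln X is then Exp(1.5).
rng = random.Random(2)
data = [rng.paretovariate(1.5) for _ in range(300)]
grid = [0.05 * m for m in range(2, 11)]  # candidate bandwidths 0.10 .. 0.50
h, f_hat = log_kde(data, grid)
true_at_2 = 1.5 * 2.0 ** -2.5  # Pareto density alpha * x^(-alpha - 1) at x = 2
print(f"h = {h:.2f}, f_hat(2) = {f_hat(2.0):.3f}, true f(2) = {true_at_2:.3f}")
```

The LSCV loop costs O(n²) per candidate bandwidth, which is fine for a few hundred points but would call for binning or a faster kernel sum on large samples.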
* <font color="#ff8000">尖峭态分布 Leptokurtic distribution</font>
* <font color="#ff8000">广义极值分布 Generalized extreme value distribution</font>
* <font color="#ff8000">离群值 Outlier</font>
* <font color="#ff8000">长尾 Long tail</font>
* <font color="#ff8000">幂律 Power law</font>
* <font color="#ff8000">随机的七个状态 Seven states of randomness</font>
* <font color="#ff8000">胖尾分布 Fat-tailed distribution</font>
**<font color="#ff8000">塔勒布分布 Taleb distribution</font>和<font color="#ff8000">圣杯分布 Holy grail distribution</font>
     