Changes

915 bytes removed, 16:03, 15 March 2022 (Tuesday)

No edit summary

Line 5: Line 5: −

In probability theory, ~~~~heavy-tailed distributions~~~~ are probability distributions whose tails are not exponentially bounded<ref name="Asmussen">{{Cite book | doi = 10.1007/0-387-21525-5_10 | first = S. R. | last = Asmussen| chapter = Steady-State Properties of GI/G/1 | title = Applied Probability and Queues | series = Stochastic Modelling and Applied Probability | volume = 51 | pages = 266–301 | year = 2003 | isbn = 978-0-387-00211-8 | pmid = | pmc = }}</ref>: that is, they have heavier tails than the ~~~~exponential distribution~~~~. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.

+

In probability theory, '''heavy-tailed distributions''' are probability distributions whose tails are not exponentially bounded<ref name="Asmussen">{{Cite book | doi = 10.1007/0-387-21525-5_10 | first = S. R. | last = Asmussen| chapter = Steady-State Properties of GI/G/1 | title = Applied Probability and Queues | series = Stochastic Modelling and Applied Probability | volume = 51 | pages = 266–301 | year = 2003 | isbn = 978-0-387-00211-8 | pmid = | pmc = }}</ref>: that is, they have heavier tails than the '''exponential distribution'''. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.

+

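There are three important subclasses of heavy-tailed distributions: the '''fat-tailed distributions''', the '''long-tailed distributions''' and the '''subexponential distributions'''. In practice, all commonly used heavy-tailed distributions belong to the '''subexponential class'''.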

−

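There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.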

−

+

There is still some discrepancy over the use of the term "heavy-tailed". Two other definitions are in use.

−

~~There is still some discrepancy over the use of the term "heavy-tailed". Two other definitions are in use.~~

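Some authors use the term for distributions that do not have all their power moments finite, and others for distributions that do not have a finite variance.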

−

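The definition given here is the most general in use. It includes all distributions covered by the alternative definitions, as well as distributions such as the ~~~~log-normal distribution~~~~ that possess all their power moments yet are generally considered heavy-tailed. (Occasionally, "heavy-tailed" is used for any distribution with heavier tails than the normal distribution.)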

+

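The definition given here is the most general in use. It includes all distributions covered by the alternative definitions, as well as distributions such as the '''log-normal distribution''' that possess all their power moments yet are generally considered heavy-tailed. (Occasionally, "heavy-tailed" is used for any distribution with heavier tails than the normal distribution.)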

== Definitions ==

−

=== Definition of heavy-tailed distribution ===

−

The distribution of a random variable <math>X</math> with distribution function <math>F</math> is said to have a heavy (right) tail if the moment generating function of <math>X</math>, <math>M_X(t)</math>, is infinite for all <math>t > 0</math>.<ref name="ReferenceA">Rolski, Schmidli, Schmidt, Teugels, ''Stochastic Processes for Insurance and Finance'', 1999</ref>

Line 44: Line 38: −

This can also be written in terms of the ~~~~tail distribution function~~~~:

+

This can also be written in terms of the '''tail distribution function''':

<math>

Line 51: Line 45:

</math>

−

Line 61: Line 54:

</math>
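As a worked illustration of the moment-generating-function definition, consider a Pareto distribution with scale 1 and shape <math>\alpha > 0</math>. This particular example is an assumption chosen for illustration and is not part of the definition itself. Its density is <math>f(x) = \alpha x^{-\alpha-1}</math> for <math>x \geq 1</math>, so for every <math>t > 0</math>

<math>
M_X(t) = \int_1^\infty e^{tx}\, \alpha x^{-\alpha-1}\, dx = \infty ,
</math>

because the integrand <math>e^{tx}\alpha x^{-\alpha-1}</math> tends to infinity as <math>x \to \infty</math>. By contrast, an exponential distribution with rate <math>\lambda</math> has <math>M_X(t) = \lambda/(\lambda - t) < \infty</math> for <math>0 < t < \lambda</math>, so it is not heavy-tailed.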

+

=== Definition of long-tailed distribution ===

−

The distribution of a random variable <math>X</math> with distribution function <math>F</math> is said to have a long right tail if, for all <math>t>0</math>,

Line 87: Line 79:

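All long-tailed distributions are heavy-tailed, but the converse is false: it is possible to construct heavy-tailed distributions that are not long-tailed.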

+

=== Subexponential distributions ===

−

+

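Subexponentiality is defined in terms of '''convolutions''' of probability distributions. For two independent, identically distributed random variables <math> X_1,X_2</math> with common distribution function <math>F</math>, the convolution of <math>F</math> with itself, <math>F^{*2}</math>, is the two-fold convolution, defined using Lebesgue–Stieltjes integration by: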

−

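Subexponentiality is defined in terms of ~~~~convolutions~~~~ of probability distributions. For two independent, identically distributed random variables <math> X_1,X_2</math> with common distribution function <math>F</math>, the convolution of <math>F</math> with itself, <math>F^{*2}</math>, is the two-fold convolution, defined using Lebesgue–Stieltjes integration by: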

Line 107: Line 98:

The tail distribution function <math>\overline{F}</math> is defined as <math>\overline{F}(x) = 1-F(x)</math>.

−

Line 135: Line 124: −

This is often known as the principle of the ~~~~single big jump~~~~<ref>{{Cite journal | last1 = Foss | first1 = S. | last2 = Konstantopoulos | first2 = T. | last3 = Zachary | first3 = S. | doi = 10.1007/s10959-007-0081-2 | title = Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments | journal = Journal of Theoretical Probability| volume = 20 | issue = 3 | pages = 581 | year = 2007 | arxiv = math/0509605| pmid = | url = http://www.math.nsc.ru/LBRT/v1/foss/fkz_revised.pdf| pmc = | citeseerx = 10.1.1.210.1699 }}</ref> or the ~~~~catastrophe principle~~~~.<ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 9, 2014 | accessdate = January 9, 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>

+

This is often known as the principle of the '''single big jump'''<ref>{{Cite journal | last1 = Foss | first1 = S. | last2 = Konstantopoulos | first2 = T. | last3 = Zachary | first3 = S. | doi = 10.1007/s10959-007-0081-2 | title = Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments | journal = Journal of Theoretical Probability| volume = 20 | issue = 3 | pages = 581 | year = 2007 | arxiv = math/0509605| pmid = | url = http://www.math.nsc.ru/LBRT/v1/foss/fkz_revised.pdf| pmc = | citeseerx = 10.1.1.210.1699 }}</ref> or the '''catastrophe principle'''.<ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 9, 2014 | accessdate = January 9, 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>

Line 142: Line 131:

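All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.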
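The single-big-jump heuristic can be checked numerically. The sketch below is illustrative only: it assumes the usual statement of the heuristic (for a subexponential distribution, a sum exceeds a high threshold essentially when its largest term does) and uses a Pareto distribution with minimum value 1 as the test case. The function names `pareto_sample` and `big_jump_ratio` are hypothetical and do not come from the article.

```python
import random


def pareto_sample(alpha, rng):
    """Draw one Pareto(alpha) variate with minimum value 1 by inverse transform."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so the power below is finite
    return u ** (-1.0 / alpha)


def big_jump_ratio(alpha, x, trials, seed=0):
    """Monte Carlo estimate of P(X1 + X2 > x) / P(max(X1, X2) > x)."""
    rng = random.Random(seed)
    sum_exceeds = 0
    max_exceeds = 0
    for _ in range(trials):
        x1 = pareto_sample(alpha, rng)
        x2 = pareto_sample(alpha, rng)
        if x1 + x2 > x:
            sum_exceeds += 1
        if max(x1, x2) > x:
            max_exceeds += 1
    if max_exceeds == 0:
        raise ValueError("no exceedances observed; lower x or raise trials")
    return sum_exceeds / max_exceeds


if __name__ == "__main__":
    # Assumed demo values: shape 1.5 and three increasing thresholds.
    for threshold in (10.0, 50.0, 200.0):
        print(threshold, big_jump_ratio(alpha=1.5, x=threshold, trials=200_000))
```

Because both variables are positive, the ratio is never below 1. Under the heuristic it should move towards 1 as the threshold grows, up to Monte Carlo noise.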

+

== Common heavy-tailed distributions ==

All commonly used heavy-tailed distributions are subexponential.<ref name="Embrechts"/>

+

Those that are one-tailed include:

−

* ~~~~Pareto distribution~~~~;

+

* '''Pareto distribution''';

−

* ~~~~Log-normal distribution~~~~;

+

* '''Log-normal distribution''';

−

* ~~~~Lévy distribution~~~~;

+

* '''Lévy distribution''';

−

* the ~~~~Weibull distribution~~~~ with shape parameter greater than 0 but less than 1;

+

* the '''Weibull distribution''' with shape parameter greater than 0 but less than 1;

−

* ~~~~Burr distribution~~~~;

+

* '''Burr distribution''';

−

* ~~~~Log-logistic distribution~~~~;

+

* '''Log-logistic distribution''';

−

* ~~~~Log-gamma distribution~~~~;

+

* '''Log-gamma distribution''';

−

* ~~~~Fréchet distribution~~~~;

+

* '''Fréchet distribution''';

−

* the ~~~~log-Cauchy distribution~~~~, sometimes described as having a "super-heavy tail" because it exhibits logarithmic decay, producing a heavier tail than the Pareto distribution.<ref>{{cite book|title=Laws of Small Numbers: Extremes and Rare Events|author=Falk, M., Hüsler, J. & Reiss, R.|page=80|year=2010|publisher=Springer|isbn=978-3-0348-0008-2}}</ref><ref>{{cite web|title=Statistical inference for heavy and super-heavy tailed distributions|url=http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|author=Alves, M.I.F., de Haan, L. & Neves, C.|date=March 10, 2006|access-date=November 1, 2011|archive-url=https://web.archive.org/web/20070623175435/http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|archive-date=June 23, 2007|url-status=dead}}</ref>

+

* the '''log-Cauchy distribution''', sometimes described as having a "super-heavy tail" because it exhibits logarithmic decay, producing a heavier tail than the Pareto distribution.<ref>{{cite book|title=Laws of Small Numbers: Extremes and Rare Events|author=Falk, M., Hüsler, J. & Reiss, R.|page=80|year=2010|publisher=Springer|isbn=978-3-0348-0008-2}}</ref><ref>{{cite web|title=Statistical inference for heavy and super-heavy tailed distributions|url=http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|author=Alves, M.I.F., de Haan, L. & Neves, C.|date=March 10, 2006|access-date=November 1, 2011|archive-url=https://web.archive.org/web/20070623175435/http://docentes.deio.fc.ul.pt/fragaalves/SuperHeavy.pdf|archive-date=June 23, 2007|url-status=dead}}</ref>

−

*

+

Those that are two-tailed include:

−

* The ~~~~Cauchy distribution~~~~, itself a special case of both the stable distribution and the t-distribution;

+

* The '''Cauchy distribution''', itself a special case of both the stable distribution and the t-distribution;

−

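* ~~~~The family of stable distributions~~~~<ref>{{cite web |author=John P. Nolan | title=Stable Distributions: Models for Heavy Tailed Data| year=2009 | url=http://academic2.american.edu/~jpnolan/stable/chap1.pdf | accessdate=2009-02-21}}</ref>, excepting the special case of the normal distribution within that family. Some stable distributions are one-sided (that is, supported on a half-line), for example the Lévy distribution. See also financial models with long-tailed distributions and volatility clustering.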

+

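* '''The family of stable distributions'''<ref>{{cite web |author=John P. Nolan | title=Stable Distributions: Models for Heavy Tailed Data| year=2009 | url=http://academic2.american.edu/~jpnolan/stable/chap1.pdf | accessdate=2009-02-21}}</ref>, excepting the special case of the normal distribution within that family. Some stable distributions are one-sided (that is, supported on a half-line), for example the Lévy distribution. See also financial models with long-tailed distributions and volatility clustering.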

* The t-distribution.

−

*~~~~The skew lognormal cascade distribution~~~~.<ref>{{cite web | author=Stephen Lihn | title=Skew Lognormal Cascade Distribution | year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ | access-date=2009-06-12 | archive-url=https://web.archive.org/web/20140407075213/http://www.skew-lognormal-cascade-distribution.org/ | archive-date=2014-04-07 | url-status=dead }}</ref>

+

*'''The skew lognormal cascade distribution'''.<ref>{{cite web | author=Stephen Lihn | title=Skew Lognormal Cascade Distribution | year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ | access-date=2009-06-12 | archive-url=https://web.archive.org/web/20140407075213/http://www.skew-lognormal-cascade-distribution.org/ | archive-date=2014-04-07 | url-status=dead }}</ref>

+

== Relationship to fat-tailed distributions ==

Line 169: Line 162:

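However, some distributions have a tail that goes to zero more slowly than an exponential function (meaning they are heavy-tailed) but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution.<ref>Stephen Lihn (2009). "Skew Lognormal Cascade Distribution". Archived from the original on 2014-04-07. Retrieved 2009-06-12.</ref> Many other heavy-tailed distributions, such as the log-logistic and Pareto distributions, are also fat-tailed.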

+

== Estimating the tail index ==

−

There are parametric (see Embrechts et al.<ref name="Embrechts"/>) and non-parametric (see, e.g., Novak<ref name="Novak2011">{{cite book

| author=Novak S.Y.

Line 180: Line 172:

| isbn=978-1-43983-574-6

}}</ref>) approaches to the problem of tail-index estimation.

−

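To estimate the tail index with the parametric approach, some authors employ the GEV distribution or the Pareto distribution; they may apply the maximum-likelihood estimator (MLE).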

−

=== Pickands' tail-index estimator ===

−

+

Consider a random sequence <math>(X_n , n \geq 1)</math> of independent random variables with common distribution function <math>F \in D(H(\xi))</math>, the '''maximum domain of attraction'''<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref> of the '''generalized extreme value distribution''' <math>H</math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty </math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the Pickands tail-index estimator is<ref name="Embrechts"/><ref name="Pickands"/>

−

Consider a random sequence <math>(X_n , n \geq 1)</math> of independent random variables with common distribution function <math>F \in D(H(\xi))</math>, the ~~~~maximum domain of attraction~~~~<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref> of the ~~~~generalized extreme value distribution~~~~ <math>H</math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty </math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the Pickands tail-index estimator is<ref name="Embrechts"/><ref name="Pickands"/>

Line 198: Line 186: +

where <math>X_{(n-k(n)+1,n)}=\max \left(X_{n-k(n)+1},\ldots ,X_{n}\right)</math>. This estimator converges in probability to <math>\xi</math>.

−

~~where <math>X_{(n-k(n)+1,n)}=\max \left(X_{n-k(n)+1},\ldots ,X_{n}\right)</math>. This estimator converges in probability to <math>\xi</math>.~~
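The estimator formula itself falls outside the diff context shown here. As a minimal sketch, the code below assumes the usual textbook form of the Pickands estimator, <math>\tfrac{1}{\ln 2}\ln\frac{X_{(n-k+1,n)}-X_{(n-2k+1,n)}}{X_{(n-2k+1,n)}-X_{(n-4k+1,n)}}</math>, and reads <math>X_{(i,n)}</math> as the <math>i</math>-th order statistic in ascending order. The function name `pickands_estimator` and the demo parameters are hypothetical.

```python
import math
import random


def pickands_estimator(sample, k):
    """Pickands estimate of xi from the k-th, 2k-th and 4k-th largest values."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4*k <= len(sample)")
    desc = sorted(sample, reverse=True)  # desc[j - 1] is the j-th largest value
    upper = desc[k - 1] - desc[2 * k - 1]
    lower = desc[2 * k - 1] - desc[4 * k - 1]
    if upper <= 0 or lower <= 0:
        raise ValueError("ties among the order statistics; choose another k")
    return math.log(upper / lower) / math.log(2.0)


if __name__ == "__main__":
    # Assumed demo: Pareto data with shape 2, for which xi = 1/2.
    rng = random.Random(0)
    data = [rng.paretovariate(2.0) for _ in range(20_000)]
    print(pickands_estimator(data, k=200))
```

The estimate depends noticeably on the choice of `k`, so in practice it is usually inspected over a range of values instead of at a single one.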

=== Hill's tail-index estimator ===

−

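Let <math>(X_t , t \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the generalized extreme value distribution <math> H </math>, where <math>\xi \in \mathbb{R}</math>. The sample path is <math>\{X_t: 1 \leq t \leq n\}</math>, where <math>n</math> is the sample size. If <math>\{k(n)\}</math> is an intermediate order sequence, i.e. <math>k(n) \in \{1,\ldots,n-1\}</math>, <math>k(n) \to \infty</math> and <math>k(n)/n \to 0</math>, then the Hill tail-index estimator is<ref>Hill B.M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., v. 3, 1163–1174.</ref>: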

Line 212: Line 198: +

where <math>X_{(i,n)}</math> is the <math>i</math>-th order statistic of <math>X_1, \dots, X_n</math>. This estimator converges in probability to <math>\xi</math> as <math>k(n) \to \infty </math>, and is asymptotically normal under restrictions based on a higher-order regular variation property.<ref>Hall, P.(1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.</ref><ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref> Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref> irrespective of whether <math>X_t</math> is observed directly or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with dependent errors.<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill’s estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill’s estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>

−

where <math>X_{(i,n)}</math> is the <math>i</math>-th order statistic of <math>X_1, \dots, X_n</math>. This estimator converges in probability to <math>\xi</math> as <math>k(n) \to \infty </math>, and is asymptotically normal under restrictions based on a higher-order regular variation property.<ref>Hall, P.(1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.</ref><ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref> Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref> irrespective of whether <math>X_t</math> is observed directly or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with dependent errors.<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill’s estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill’s estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>
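The Hill estimator can be sketched in a few lines. The version below is one common variant and is an assumption, since the formula itself lies outside the diff context shown here: it averages the log-excesses of the <math>k</math> largest observations over the <math>(k+1)</math>-th largest, which estimates <math>\xi</math>. Some presentations take the reciprocal to estimate the tail index <math>\alpha = 1/\xi</math>. The function name `hill_estimator` and the demo parameters are hypothetical.

```python
import math
import random


def hill_estimator(sample, k):
    """Mean log-excess of the k largest values over the (k+1)-th largest."""
    n = len(sample)
    if k < 1 or k > n - 1:
        raise ValueError("need 1 <= k <= len(sample) - 1")
    desc = sorted(sample, reverse=True)
    threshold = desc[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the threshold must be positive to take logarithms")
    return sum(math.log(v / threshold) for v in desc[:k]) / k


if __name__ == "__main__":
    # Assumed demo: Pareto data with shape 2, for which xi = 1/2.
    rng = random.Random(0)
    data = [rng.paretovariate(2.0) for _ in range(20_000)]
    for k in (100, 500, 2000):
        print(k, hill_estimator(data, k))
```

Printing the estimate for several values of `k` mimics a "Hill plot", a common way to judge how sensitive the estimate is to the number of upper order statistics used.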

=== Ratio estimator of the tail index ===

−

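The ratio estimator (RE-estimator) of the tail index was introduced by Goldie and Smith.<ref>Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.</ref> It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".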

+

A comparison of Hill-type and RE-type estimators can be found in Novak.<ref name="Novak2011"/>

−

~~A comparison of Hill-type and RE-type estimators can be found in Novak.<ref name="Novak2011"/>~~

=== Software ===

* [http://www.cs.bu.edu/~crovella/aest.html aest], a C tool for estimating the heavy-tail index.<ref>{{Cite journal | last1 = Crovella | first1 = M. E. | last2 = Taqqu | first2 = M. S. | title = Estimating the Heavy Tail Index from Scaling Properties| journal = Methodology and Computing in Applied Probability | volume = 1 | pages = 55–79 | year = 1999 | doi = 10.1023/A:1010012224103 | url = http://www.cs.bu.edu/~crovella/paper-archive/aest.ps| pmid = | pmc = }}</ref>

+

== Estimation of heavy-tailed density ==

−

Nonparametric approaches to estimating heavy-tailed and super-heavy-tailed probability density functions are given in Markovich.<ref name="Markovich2007">{{cite book

| author=Markovich N.M.

Line 243: Line 224: −

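These are approaches based on ~~~~variable bandwidth~~~~ and ~~~~long-tailed kernel estimators~~~~; on a preliminary transformation of the data to a new random variable on a finite or infinite interval, which is more convenient for estimation, followed by an inverse transformation of the resulting density estimate; and the "piecing-together approach", which provides a certain parametric model for the tail of the density and a non-parametric model to approximate the mode of the density. Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of kernel estimators and the bin width of the histogram. The well-known data-driven methods of such selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotics and their upper bounds.<ref name="WandJon1995">{{cite book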

+

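These are approaches based on '''variable bandwidth''' and '''long-tailed kernel estimators'''; on a preliminary transformation of the data to a new random variable on a finite or infinite interval, which is more convenient for estimation, followed by an inverse transformation of the resulting density estimate; and the "piecing-together approach", which provides a certain parametric model for the tail of the density and a non-parametric model to approximate the mode of the density. Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of kernel estimators and the bin width of the histogram. The well-known data-driven methods of such selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotics and their upper bounds.<ref name="WandJon1995">{{cite book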

| author=Wand M.P., Jones M.C.

| title=Kernel smoothing

Line 252: Line 233: −

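A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value, can be found in Markovich.<ref name="Markovich2007"/> The ~~~~bootstrap~~~~ is another tool for finding smoothing parameters, using approximations of the unknown MSE obtained by different resampling schemes.<ref name="Hall1992">{{cite book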

+

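A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value, can be found in Markovich.<ref name="Markovich2007"/> The '''bootstrap''' is another tool for finding smoothing parameters, using approximations of the unknown MSE obtained by different resampling schemes.<ref name="Hall1992">{{cite book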

| author=Hall P.

| title=The Bootstrap and Edgeworth Expansion

Line 259: Line 240:

| isbn=9780387945088

}}</ref>

+

== See also ==

+

* '''Leptokurtic distribution'''

+

* '''Generalized extreme value distribution'''

+

* '''Outlier'''

+

* '''Long tail'''

+

* '''[[幂律|Power law]]'''

+

* '''Seven states of randomness'''

+

* '''Fat-tailed distribution'''

+

**'''Taleb distribution''' and '''Holy grail distribution'''

−

+

== References ==

−

* Leptokurtic distribution

−

* Generalized extreme value distribution

−

* Outlier

−

* Long tail

−

* [[幂律|Power law]]

−

* Seven states of randomness

−

* Fat-tailed distribution

−

**Taleb distribution and Holy grail distribution

−

~~== References ==~~

Line 279: Line 259:

== Editor's recommendations ==

−

=== Recommended courses ===

薄荷 (7,129 edits)


Heavy-tailed distribution (view source)

Revision as of 16:03, 15 March 2022 (Tuesday)