Heavy-tailed distribution

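In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded^[1]: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.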

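There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class.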

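There is still some discrepancy over the use of the term "heavy-tailed", and two other definitions are in use.

Some authors use the term for distributions that do not have all their power moments finite; others use it for distributions that do not have a finite variance.

The definition given here is the most general in use. It includes all distributions covered by the alternative definitions, as well as distributions such as the log-normal that possess all their power moments yet are generally considered heavy-tailed. (Occasionally, "heavy-tailed" is used for any distribution that has heavier tails than the normal distribution.)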

Definitions

Definition of heavy-tailed distribution

The distribution of a random variable [math]\displaystyle{ X }[/math] with distribution function [math]\displaystyle{ F }[/math] is said to have a heavy (right) tail if the moment generating function of [math]\displaystyle{ X }[/math], [math]\displaystyle{ M_X(t) }[/math], is infinite for all [math]\displaystyle{ t \gt 0 }[/math].^[2]

That is: [math]\displaystyle{ \int_{-\infty}^\infty e^{t x} \,dF(x) = \infty \quad \mbox{for all } t\gt 0. }[/math]

An implication of this is that: [math]\displaystyle{ \lim_{x \to \infty} e^{t x}\Pr[X\gt x] = \infty \quad \mbox{for all } t\gt 0.\, }[/math]

This is also written in terms of the tail distribution function

[math]\displaystyle{ \overline{F}(x) \equiv \Pr[X\gt x] }[/math]

as

[math]\displaystyle{ \lim_{x \to \infty} e^{t x}\overline{F}(x) = \infty \quad \mbox{for all } t \gt 0.\, }[/math]
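As a rough numerical illustration of the limit condition, the sketch below evaluates the logarithm of the product, t·x + log of the tail function, for a Pareto tail and for an exponential tail. The helper names, the Pareto parameters (alpha = 2, x_m = 1), the exponential rate 1 and the choice t = 0.5 are illustrative assumptions, not values taken from the text. Working in log space avoids overflow for large x.

```python
import math

def pareto_log_sf(x, alpha=2.0, x_m=1.0):
    """Log of the Pareto tail function (x_m / x)^alpha for x >= x_m (assumed parameters)."""
    return alpha * (math.log(x_m) - math.log(x)) if x >= x_m else 0.0

def exponential_log_sf(x, rate=1.0):
    """Log of the exponential tail function exp(-rate * x) for x >= 0 (assumed rate)."""
    return -rate * x if x >= 0 else 0.0

def log_scaled_tail(log_sf, t, x):
    """Return log(e^{t x} * tail(x)) = t * x + log tail(x)."""
    return t * x + log_sf(x)

t = 0.5
for x in (10.0, 100.0, 1000.0):
    print(x, log_scaled_tail(pareto_log_sf, t, x), log_scaled_tail(exponential_log_sf, t, x))
```

For the Pareto tail the logged quantity keeps growing with x, which is consistent with a heavy tail. For the exponential tail with rate 1 and t = 0.5 it decreases without bound, so the product tends to 0 rather than to infinity.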

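Definition of long-tailed distribution

The distribution of a random variable X with distribution function F is said to have a long right tail if, for all t > 0,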

[math]\displaystyle{ \lim_{x \to \infty} \Pr[X\gt x+t\mid X\gt x] =1, \, }[/math]

or equivalently

[math]\displaystyle{ \overline{F}(x+t) \sim \overline{F}(x) \quad \mbox{as } x \to \infty. \, }[/math]

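For a right-tailed long-tailed quantity this has an intuitive interpretation: if the quantity exceeds some high level, the probability approaches 1 that it will also exceed any other, higher level.

All long-tailed distributions are heavy-tailed, but the converse is false: it is possible to construct heavy-tailed distributions that are not long-tailed.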
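The conditional-probability form of the long-tail definition can be checked numerically. The sketch below is a minimal illustration, assuming a Pareto tail (alpha = 2, x_m = 1) as the long-tailed example and an exponential tail (rate 1) as the contrast. These distributions, parameters and function names are assumptions chosen for the example.

```python
import math

def pareto_sf(x, alpha=2.0, x_m=1.0):
    """Pareto tail function Pr[X > x] (assumed parameters)."""
    return (x_m / x) ** alpha if x >= x_m else 1.0

def exponential_sf(x, rate=1.0):
    """Exponential tail function Pr[X > x] (assumed rate)."""
    return math.exp(-rate * x) if x >= 0 else 1.0

def conditional_exceedance(sf, x, t):
    """Pr[X > x + t | X > x], computed as tail(x + t) / tail(x)."""
    return sf(x + t) / sf(x)

t = 5.0
for x in (10.0, 100.0, 500.0):
    print(x, conditional_exceedance(pareto_sf, x, t), conditional_exceedance(exponential_sf, x, t))
```

For the Pareto tail the ratio climbs towards 1 as x grows. For the exponential tail it stays at exp(-t) regardless of x, so the exponential distribution is not long-tailed.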

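Subexponential distributions

Subexponentiality is defined in terms of convolutions of probability distributions. For two independent, identically distributed random variables [math]\displaystyle{ X_1,X_2 }[/math] with common distribution function [math]\displaystyle{ F }[/math], the convolution of [math]\displaystyle{ F }[/math] with itself, [math]\displaystyle{ F^{*2} }[/math], is the convolution square, defined using Lebesgue–Stieltjes integration by: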

[math]\displaystyle{ \Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_{0}^x F(x-y)\,dF(y), }[/math]

and the n-fold convolution [math]\displaystyle{ F^{*n} }[/math] is defined inductively by:

[math]\displaystyle{ F^{*n}(x) = \int_{0}^x F(x-y)\,dF^{*n-1}(y). }[/math]

The tail distribution function [math]\displaystyle{ \overline{F} }[/math] is defined as [math]\displaystyle{ \overline{F}(x) = 1-F(x) }[/math].

A distribution [math]\displaystyle{ F }[/math] on the positive half-line is subexponential^[1]^[3]^[4] if

[math]\displaystyle{ \overline{F^{*2}}(x) \sim 2\overline{F}(x) \quad \mbox{as } x \to \infty. }[/math]

This implies^[5] that, for any [math]\displaystyle{ n \geq 1 }[/math],

[math]\displaystyle{ \overline{F^{*n}}(x) \sim n\overline{F}(x) \quad \mbox{as } x \to \infty. }[/math]

The probabilistic interpretation^[5] of this is that, for a sum of [math]\displaystyle{ n }[/math] independent random variables [math]\displaystyle{ X_1,\ldots,X_n }[/math] with common distribution [math]\displaystyle{ F }[/math],

[math]\displaystyle{ \Pr[X_1+ \cdots +X_n\gt x] \sim \Pr[\max(X_1, \ldots,X_n)\gt x] \quad \text{as } x \to \infty. }[/math]

This is often known as the principle of the single big jump^[6] or the catastrophe principle.^[7]
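The comparison between the tail of the sum and the tail of the maximum can be explored by simulation. The following is a minimal Monte Carlo sketch, assuming Pareto-distributed terms (alpha = 1.5, x_m = 1), n = 2 terms, a level x = 50 and a fixed seed. All of these choices, and the function name, are illustrative assumptions.

```python
import random

def single_big_jump_ratio(n_terms=2, alpha=1.5, x=50.0, n_samples=200_000, seed=0):
    """Estimate Pr[X_1 + ... + X_n > x] / Pr[max(X_1, ..., X_n) > x] for Pareto(alpha) terms."""
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(n_samples):
        draws = [rng.paretovariate(alpha) for _ in range(n_terms)]
        sum_hits += sum(draws) > x   # the sum exceeds the level
        max_hits += max(draws) > x   # a single term already exceeds the level
    return sum_hits / max_hits

print(single_big_jump_ratio())
```

Because the terms are positive, the sum exceeds x whenever the maximum does, so the estimated ratio is at least 1. For a subexponential distribution it should approach 1 as x increases, although at any finite level it remains somewhat above 1.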

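A distribution [math]\displaystyle{ F }[/math] on the whole real line is subexponential if the distribution [math]\displaystyle{ F I([0,\infty)) }[/math] is.^[8] Here [math]\displaystyle{ I([0,\infty)) }[/math] is the indicator function of the positive half-line. Alternatively, a random variable [math]\displaystyle{ X }[/math] supported on the real line is subexponential if and only if [math]\displaystyle{ X^+ = \max(0,X) }[/math] is subexponential.

All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.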

Common heavy-tailed distributions

All commonly used heavy-tailed distributions are subexponential.^[5]

Those that are one-tailed include:

the Pareto distribution;
the log-normal distribution;
the Lévy distribution;
the Weibull distribution with shape parameter greater than 0 but less than 1;
the Burr distribution;
the log-logistic distribution;
the log-gamma distribution;
the Fréchet distribution;
the log-Cauchy distribution, sometimes described as having a "super-heavy tail" because it exhibits logarithmic decay, producing a heavier tail than the Pareto distribution.^[9]^[10]

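Those that are two-tailed include:

the Cauchy distribution, itself a special case of both the stable distribution and the t-distribution;
the family of stable distributions^[11], excepting the special case of the normal distribution within that family. Some stable distributions are one-sided (or supported by a half-line), for example the Lévy distribution. See also financial models with long-tailed distributions and volatility clustering;
the t-distribution;
the skew lognormal cascade distribution.^[12]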

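Relationship to fat-tailed distributions

A fat-tailed distribution is a distribution for which the probability density function, for large x, goes to zero as a power [math]\displaystyle{ x^{-a} }[/math]. Since such a power is always bounded below, for sufficiently large x, by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed) but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution^[13]. Many other heavy-tailed distributions, such as the log-logistic and Pareto distributions, are also fat-tailed.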
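The ordering "slower than an exponential, faster than a power" can be seen by comparing log-densities at large x. The sketch below is only an illustration, assuming a standard log-normal density (mu = 0, sigma = 1), an exponential density with rate 1 and the power x^{-3}. The exponent 3 and all function names are assumptions for the example.

```python
import math

def log_pdf_exponential(x, rate=1.0):
    """Log-density of the exponential distribution (assumed rate)."""
    return math.log(rate) - rate * x

def log_pdf_lognormal(x, mu=0.0, sigma=1.0):
    """Log-density of the log-normal distribution (assumed parameters)."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def log_power_law(x, a=3.0):
    """Log of the power x^(-a) (assumed exponent)."""
    return -a * math.log(x)

for x in (1e2, 1e4, 1e6):
    print(x, log_pdf_exponential(x), log_pdf_lognormal(x), log_power_law(x))
```

At each of these points the log-normal value lies between the exponential value (far more negative) and the power-law value (less negative), and the gaps widen as x grows.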

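Estimating the tail index

There are parametric (see Embrechts et al.^[5]) and non-parametric (see, e.g., Novak^[14]) approaches to the problem of tail-index estimation.

To estimate the tail index using the parametric approach, some authors employ the GEV distribution or the Pareto distribution; they may apply the maximum-likelihood estimator (MLE).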
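As one concrete instance of the parametric route, the sketch below fits a Pareto model by maximum likelihood. It assumes the tail function Pr[X > x] = (x_m / x)^alpha with the scale x_m known, in which case the MLE of alpha is the number of observations divided by the sum of ln(x_i / x_m). The function name, the simulated data and the seed are illustrative assumptions.

```python
import math
import random

def pareto_mle_alpha(sample, x_m):
    """MLE of alpha for Pr[X > x] = (x_m / x)^alpha, with x_m treated as known."""
    logs = [math.log(x / x_m) for x in sample if x >= x_m]
    return len(logs) / sum(logs)

# Simulated Pareto data with alpha = 2.5 and x_m = 1 (assumed for the demonstration).
rng = random.Random(0)
sample = [rng.paretovariate(2.5) for _ in range(20_000)]
print(pareto_mle_alpha(sample, x_m=1.0))
```

On real data the scale x_m is usually unknown and the Pareto form holds at best above some threshold, so choosing that threshold is part of the estimation problem.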

Pickands' tail-index estimator

Let [math]\displaystyle{ (X_n , n \geq 1) }[/math] be a sequence of independent random variables with common distribution function [math]\displaystyle{ F \in D(H(\xi)) }[/math], the maximum domain of attraction^[15] of the generalized extreme value distribution [math]\displaystyle{ H }[/math], where [math]\displaystyle{ \xi \in \mathbb{R} }[/math]. If [math]\displaystyle{ \lim_{n\to\infty} k(n) = \infty }[/math] and [math]\displaystyle{ \lim_{n\to\infty} \frac{k(n)}{n}= 0 }[/math], then the Pickands tail-index estimator is^[5]^[15]

[math]\displaystyle{ \xi^\text{Pickands}_{(k(n),n)} =\frac{1}{\ln 2} \ln \left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right) }[/math]

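where [math]\displaystyle{ X_{(i,n)} }[/math] denotes the [math]\displaystyle{ i }[/math]-th order statistic of [math]\displaystyle{ X_1, \dots, X_n }[/math], so that [math]\displaystyle{ X_{(n-k(n)+1,n)} }[/math] is the [math]\displaystyle{ k(n) }[/math]-th largest observation. This estimator converges in probability to [math]\displaystyle{ \xi }[/math].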
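The Pickands formula translates directly into code once the sample is sorted. The following is a minimal sketch, assuming that X_{(i,n)} is the i-th smallest observation and that 4k(n) does not exceed n. The function name, the simulated Pareto data (alpha = 2, for which the extreme value index is 1/alpha = 0.5), the choice k = 2000 and the seed are illustrative assumptions.

```python
import math
import random

def pickands_estimator(sample, k):
    """Pickands estimate of xi from the k-th, 2k-th and 4k-th largest observations."""
    n = len(sample)
    if k < 1 or 4 * k > n:
        raise ValueError("need 1 <= k and 4 * k <= n")
    ordered = sorted(sample)       # ordered[i - 1] is the order statistic X_(i,n)
    x_k = ordered[n - k]           # X_(n-k+1,n)
    x_2k = ordered[n - 2 * k]      # X_(n-2k+1,n)
    x_4k = ordered[n - 4 * k]      # X_(n-4k+1,n)
    return math.log((x_k - x_2k) / (x_2k - x_4k)) / math.log(2.0)

rng = random.Random(1)
sample = [rng.paretovariate(2.0) for _ in range(50_000)]
print(pickands_estimator(sample, k=2_000))
```

The estimate is sensitive to k: small values give a noisy result, while large values use observations that are no longer in the tail.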

Hill's tail-index estimator

Let [math]\displaystyle{ (X_t , t \geq 1) }[/math] be a sequence of independent and identically distributed random variables with distribution function [math]\displaystyle{ F \in D(H(\xi)) }[/math], the maximum domain of attraction of the generalized extreme value distribution [math]\displaystyle{ H }[/math], where [math]\displaystyle{ \xi \in \mathbb{R} }[/math]. The sample path is [math]\displaystyle{ \{X_t: 1 \leq t \leq n\} }[/math], where [math]\displaystyle{ n }[/math] is the sample size. If [math]\displaystyle{ \{k(n)\} }[/math] is an intermediate order sequence, i.e. [math]\displaystyle{ k(n) \in \{1,\ldots,n-1\} }[/math], [math]\displaystyle{ k(n) \to \infty }[/math] and [math]\displaystyle{ k(n)/n \to 0 }[/math], then the Hill tail-index estimator is^[16]:

[math]\displaystyle{ \xi^\text{Hill}_{(k(n),n)} = \left(\frac 1 {k(n)} \sum_{i=n-k(n)+1}^n \ln(X_{(i,n)}) - \ln (X_{(n-k(n)+1,n)})\right)^{-1}, }[/math]

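where [math]\displaystyle{ X_{(i,n)} }[/math] is the [math]\displaystyle{ i }[/math]-th order statistic of [math]\displaystyle{ X_1, \dots, X_n }[/math]. This estimator converges in probability to [math]\displaystyle{ \xi }[/math], and is asymptotically normal provided [math]\displaystyle{ k(n) \to \infty }[/math] is restricted based on a higher-order regular variation property.^[17]^[18] Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences^[19]^[20], irrespective of whether [math]\displaystyle{ X_t }[/math] is observed, or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with dependent errors.^[21]^[22]^[23]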
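The Hill expression can be evaluated in a few lines. The sketch below follows the formula as written, including the outer inverse, and assumes strictly positive data and 2 <= k(n) <= n - 1. For a Pareto sample with Pr[X > x] = x^{-alpha}, this expression returns a value close to alpha, and its reciprocal is close to 1/alpha. The function name, the simulated data (alpha = 2), k = 2000 and the seed are illustrative assumptions.

```python
import math
import random

def hill_estimator(sample, k):
    """Evaluate the Hill expression, including the outer inverse, on the top k order statistics."""
    n = len(sample)
    if not 2 <= k <= n - 1:
        raise ValueError("need 2 <= k <= n - 1")
    ordered = sorted(sample)                            # ordered[i - 1] is X_(i,n)
    log_threshold = math.log(ordered[n - k])            # ln X_(n-k+1,n)
    top_logs = [math.log(v) for v in ordered[n - k:]]   # i = n-k+1, ..., n
    mean_log_excess = sum(top_logs) / k - log_threshold
    return 1.0 / mean_log_excess

rng = random.Random(1)
sample = [rng.paretovariate(2.0) for _ in range(50_000)]
print(hill_estimator(sample, k=2_000))
```

In practice the estimate is often plotted against k (a "Hill plot") to look for a stable region, since no single k is right for every sample.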

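Ratio estimator of the tail index

The ratio estimator (RE-estimator) of the tail index was introduced by Goldie and Smith.^[24] It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".

A comparison of Hill-type and RE-type estimators can be found in Novak.^[14]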
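The text does not state the formula for the ratio estimator. One form that appears in the literature, assumed here rather than taken from this article, is the average of ln(X_i / x_n) over the observations that exceed a fixed, non-random threshold x_n. This average estimates the reciprocal of the tail exponent. The sketch below implements that assumed form. The function name, the threshold 5, the simulated Pareto data (alpha = 2) and the seed are illustrative.

```python
import math
import random

def ratio_estimator(sample, threshold):
    """Mean log-excess over a fixed (non-random) threshold; an assumed form of the RE-estimator."""
    excesses = [math.log(x / threshold) for x in sample if x > threshold]
    if not excesses:
        raise ValueError("no observations exceed the threshold")
    return sum(excesses) / len(excesses)

rng = random.Random(2)
sample = [rng.paretovariate(2.0) for _ in range(50_000)]
print(ratio_estimator(sample, threshold=5.0))   # estimates 1/alpha under the Pareto assumption
```

The contrast with the Hill expression is that the threshold is fixed in advance, so the number of observations used is random, whereas Hill's estimator fixes the number k(n) and lets the threshold be an order statistic.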

Software

aest, a C tool for estimating the heavy-tail index.^[25]

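Estimation of heavy-tailed density

Nonparametric approaches to estimating heavy-tailed and super-heavy-tailed probability density functions are given in Markovich.^[26] They include approaches based on variable-bandwidth and long-tailed kernel estimators; approaches that first transform the data to a new random variable on a finite or infinite interval, which is more convenient for estimation, and then apply the inverse transform to the resulting density estimate; and the "piecing-together approach", which uses a parametric model for the tail of the density and a nonparametric model to approximate the mode of the density. Nonparametric estimators require an appropriate choice of tuning (smoothing) parameters, such as the bandwidth of kernel estimators and the bin width of the histogram. Well-known data-driven methods for this choice are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, or their upper bounds.^[27] A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of those statistics as a known uncertainty or discrepancy value, can be found in Markovich.^[26] The bootstrap is another tool for finding smoothing parameters, using approximations of the unknown MSE obtained from different resampling schemes.^[28]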
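The transform-then-back-transform idea can be sketched with a logarithmic transform. The example below estimates the density of Y = ln X with a Gaussian kernel and maps it back through f_X(x) = f_Y(ln x) / x. The log transform, the Gaussian kernel, the rule-of-thumb bandwidth and all names are assumptions made for illustration. This is not the specific procedure of Markovich, and the text recommends data-driven bandwidth selection instead of a fixed rule.

```python
import math
import random
import statistics

def log_transform_kde(sample, bandwidth=None):
    """Return a density estimate for positive data via a Gaussian KDE on Y = ln X."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption, used only to keep the sketch short).
        bandwidth = 1.06 * statistics.stdev(logs) * n ** (-0.2)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        y = math.log(x)
        f_y = norm * sum(math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in logs)
        return f_y / x   # change of variables: f_X(x) = f_Y(ln x) / x

    return density

rng = random.Random(3)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(5_000)]
density = log_transform_kde(sample)
print(density(1.0), 1.0 / math.sqrt(2.0 * math.pi))   # estimate vs. true log-normal density at x = 1
```

Smoothing on the log scale means a single bandwidth corresponds to wider kernels far out in the tail on the original scale, which is the practical motivation for transforming before estimating.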

See also

Leptokurtic distribution
Generalized extreme value distribution
Outlier
Long tail
Power law
Seven states of randomness
Fat-tailed distribution
- Taleb distribution and Holy grail distribution

References

[1] Asmussen, S. R. (2003). "Steady-State Properties of GI/G/1". Applied Probability and Queues. Stochastic Modelling and Applied Probability. 51. pp. 266–301. doi:10.1007/0-387-21525-5_10. ISBN 978-0-387-00211-8.
[2] Rolski, Schmidli, Schmidt, Teugels, Stochastic Processes for Insurance and Finance, 1999.
[3] Chistyakov, V. P. (1964). "A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes". ResearchGate (in English). Retrieved April 7, 2019.
[4] Teugels, Jozef L. (1975). "The Class of Subexponential Distributions". University of Louvain: Annals of Probability. Retrieved April 7, 2019.
[5] Embrechts P.; Klueppelberg C.; Mikosch T. (1997). Modelling extremal events for insurance and finance. Stochastic Modelling and Applied Probability. 33. Berlin: Springer. doi:10.1007/978-3-642-33483-2. ISBN 978-3-642-08242-9.
[6] Foss, S.; Konstantopoulos, T.; Zachary, S. (2007). "Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments" (PDF). Journal of Theoretical Probability. 20 (3): 581. arXiv:math/0509605. CiteSeerX 10.1.1.210.1699. doi:10.1007/s10959-007-0081-2.
[7] Wierman, Adam (January 9, 2014). "Catastrophes, Conspiracies, and Subexponential Distributions (Part III)". Rigor + Relevance blog. RSRG, Caltech. Retrieved January 9, 2014.
[8] Willekens, E. (1986). "Subexponentiality on the real line". Technical Report. K.U. Leuven.
[9] Falk, M., Hüsler, J. & Reiss, R. (2010). Laws of Small Numbers: Extremes and Rare Events. Springer. p. 80. ISBN 978-3-0348-0008-2.
[10] Alves, M.I.F., de Haan, L. & Neves, C. (March 10, 2006). "Statistical inference for heavy and super-heavy tailed distributions" (PDF). Archived from the original (PDF) on June 23, 2007. Retrieved November 1, 2011.
[11] John P. Nolan (2009). "Stable Distributions: Models for Heavy Tailed Data" (PDF). Retrieved 2009-02-21.
[12] Stephen Lihn (2009). "Skew Lognormal Cascade Distribution". Archived from the original on 2014-04-07. Retrieved 2009-06-12.
[13] Template:Contradict-inline
[14] Novak S.Y. (2011). Extreme value methods with applications to finance. London: CRC. ISBN 978-1-43983-574-6.
[15] Pickands III, James (Jan 1975). "Statistical Inference Using Extreme Order Statistics". The Annals of Statistics. 3 (1): 119–131. doi:10.1214/aos/1176343003. JSTOR 2958083.
[16] Hill B.M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., v. 3, 1163–1174.
[17] Hall, P. (1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.
[18] Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.
[19] Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.
[20] Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.
[21] Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill's estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.
[22] Ling, S. and Peng, L. (2004). Hill's estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.
[23] Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.
[24] Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.
[25] Crovella, M. E.; Taqqu, M. S. (1999). "Estimating the Heavy Tail Index from Scaling Properties". Methodology and Computing in Applied Probability. 1: 55–79. doi:10.1023/A:1010012224103.
[26] Markovich N.M. (2007). Nonparametric Analysis of Univariate Heavy-Tailed data: Research and Practice. Chichester: Wiley. ISBN 978-0-470-72359-3.
[27] Wand M.P., Jones M.C. (1995). Kernel smoothing. New York: Chapman and Hall. ISBN 978-0412552700.
[28] Hall P. (1992). The Bootstrap and Edgeworth Expansion. Springer. ISBN 9780387945088.

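The Chinese entry was compiled by Jie, reviewed by Smile, and edited by 思无涯咿呀咿呀.

The content of this entry is drawn from Wikipedia and public sources, under the CC3.0 license.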
