*The [[Student's t-distribution|t-distribution]].
*The skew lognormal cascade distribution.<ref>{{cite web | author=Stephen Lihn | title=Skew Lognormal Cascade Distribution | year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ | access-date=2009-06-12 | archive-url=https://web.archive.org/web/20140407075213/http://www.skew-lognormal-cascade-distribution.org/ | archive-date=2014-04-07 | url-status=dead }}</ref>

== Relationship to fat-tailed distributions ==
A [[fat-tailed distribution]] is a distribution for which the probability density function, for large <math>x</math>, goes to zero as a power <math>x^{-a}</math>. Since such a power always eventually exceeds the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero more slowly than an exponential function (meaning they are heavy-tailed) but faster than any power (meaning they are not fat-tailed). An example is the [[log-normal distribution]]{{Contradict-inline|article=fat-tailed distribution|reason=Fat-tailed page says log-normals are in fact fat-tailed.|date=June 2019}}. Many other heavy-tailed distributions, such as the [[log-logistic distribution|log-logistic]] and [[Pareto distribution|Pareto]] distributions, are, however, also fat-tailed.
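This ordering of tails can be checked numerically. The following minimal Python sketch (the parameter choices are illustrative and not from any cited source) compares the survival functions of the three tail types at large arguments:

<syntaxhighlight lang="python">
# Illustrative comparison (parameter choices are arbitrary): survival
# functions P(X > x) of an exponential, a log-normal and a Pareto
# (power-law) distribution at increasingly large arguments.
import numpy as np
from scipy import stats

for x in [20.0, 50.0, 100.0]:
    exp_tail = stats.expon.sf(x)                # decays like e^{-x}
    lognorm_tail = stats.lognorm.sf(x, s=1.0)   # heavier than any exponential
    pareto_tail = stats.pareto.sf(x, b=2.0)     # decays like x^{-2} (fat tail)
    print(f"x={x:6.1f}  exponential={exp_tail:.3e}  "
          f"log-normal={lognorm_tail:.3e}  Pareto={pareto_tail:.3e}")
</syntaxhighlight>

At these arguments the exponential tail is by far the smallest and the Pareto tail the largest, with the log-normal in between, matching the classification above.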
== Estimating the tail-index{{definition|date=January 2018}} ==
There are parametric (see Embrechts et al.<ref name="Embrechts"/>) and non-parametric (see, e.g., Novak<ref name="Novak2011">{{cite book | author=Novak S.Y. | title=Extreme Value Methods with Applications to Finance | year=2011 | location=London | publisher=CRC | isbn=978-1-43983-574-6}}</ref>) approaches to the problem of tail-index estimation.
To estimate the tail-index using the parametric approach, some authors employ the [[GEV distribution]] or the [[Pareto distribution]]; they may apply the maximum-likelihood estimator (MLE).

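As one illustration of the parametric route, a minimal Python sketch (the threshold and all parameter choices below are our own assumptions, not taken from the cited sources) fits a [[generalized Pareto distribution]] to threshold excesses by maximum likelihood:

<syntaxhighlight lang="python">
# Sketch of the Pareto-type parametric route: fit a generalized Pareto
# distribution (GPD) by maximum likelihood to the excesses over a high
# threshold; the fitted shape parameter is the tail-index estimate.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=100_000) + 1.0         # standard Pareto, xi = 0.5

u = np.quantile(sample, 0.95)                        # high threshold (our choice)
excesses = sample[sample > u] - u
shape, loc, scale = genpareto.fit(excesses, floc=0)  # MLE, location fixed at 0
print(shape)                                         # should be close to 0.5
</syntaxhighlight>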
=== Pickands' tail-index estimator ===
Let <math>(X_n , n \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref> of the [[generalized extreme value distribution]] <math> H </math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty</math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the ''Pickands'' tail-index estimator is<ref name="Embrechts"/><ref name="Pickands"/>
:<math>
\xi^\text{Pickands}_{(k(n),n)} = \frac{1}{\ln 2} \ln \left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right)
</math>
where <math>X_{(i,n)}</math> denotes the <math>i</math>-th [[order statistic]] of <math>X_1,\ldots,X_n</math>. This estimator converges in probability to <math>\xi</math>.
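A minimal Python sketch of this estimator (the function name and the choice of <math>k</math> are ours, not from the cited sources) might look like:

<syntaxhighlight lang="python">
# Minimal sketch of the Pickands estimator given above.
import numpy as np

def pickands_estimator(x, k):
    """Pickands estimate of the extreme-value index xi (needs 4*k <= n)."""
    x = np.sort(np.asarray(x))        # ascending order statistics
    n = len(x)
    if 4 * k > n:
        raise ValueError("need 4*k <= n")
    # X_{(n-k+1,n)} is the k-th largest value, i.e. x[n-k] in 0-based indexing.
    a, b, c = x[n - k], x[n - 2 * k], x[n - 4 * k]
    return np.log((a - b) / (b - c)) / np.log(2)

rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=100_000) + 1.0   # standard Pareto, xi = 0.5
print(pickands_estimator(sample, k=500))       # should be close to 0.5
</syntaxhighlight>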
=== Hill's tail-index estimator ===
Let <math>(X_t , t \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the [[generalized extreme value distribution]] <math> H </math>, where <math>\xi \in \mathbb{R}</math>. The sample path is <math>\{X_t : 1 \leq t \leq n\}</math>, where <math>n</math> is the sample size. If <math>\{k(n)\}</math> is an intermediate order sequence, i.e. <math>k(n) \in \{1,\ldots,n-1\}</math>, <math>k(n) \to \infty</math> and <math>k(n)/n \to 0</math>, then the Hill tail-index estimator is<ref>Hill B.M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., v. 3, 1163–1174.</ref>
: <math>
\xi^\text{Hill}_{(k(n),n)} = \frac 1 {k(n)} \sum_{i=n-k(n)+1}^n \left( \ln(X_{(i,n)}) - \ln (X_{(n-k(n),n)}) \right),
</math>

where <math>X_{(i,n)}</math> is the <math>i</math>-th [[order statistic]] of <math>X_1, \dots, X_n</math>.
This estimator converges in probability to <math>\xi</math>, and is asymptotically normal provided <math>k(n) \to \infty</math> is restricted based on a higher-order regular variation property.<ref>Hall, P. (1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.</ref><ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref> Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref> irrespective of whether <math>X_t</math> is observed, or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with dependent errors.<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill's estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill's estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>
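A minimal Python sketch of this estimator (the function name and the choice of <math>k</math> are ours, not from the cited sources) might look like:

<syntaxhighlight lang="python">
# Minimal sketch of the Hill estimator given above.
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of the extreme-value index xi (valid for xi > 0)."""
    x = np.sort(np.asarray(x))                 # ascending order statistics
    n = len(x)
    if not 1 <= k < n:
        raise ValueError("need 1 <= k < n")
    top = x[n - k:]                            # the k largest observations
    return np.mean(np.log(top) - np.log(x[n - k - 1]))  # threshold X_{(n-k,n)}

rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=100_000) + 1.0   # standard Pareto, xi = 0.5
print(hill_estimator(sample, k=2000))          # should be close to 0.5
</syntaxhighlight>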
=== Ratio estimator of the tail-index ===
The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith.<ref>Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.</ref> It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".

A comparison of Hill-type and RE-type estimators can be found in Novak.<ref name="Novak2011"/>
===Software===
* [http://www.cs.bu.edu/~crovella/aest.html aest], a [[C (programming language)|C]] tool for estimating the heavy-tail index.<ref>{{Cite journal | last1 = Crovella | first1 = M. E. | last2 = Taqqu | first2 = M. S. | title = Estimating the Heavy Tail Index from Scaling Properties | journal = Methodology and Computing in Applied Probability | volume = 1 | pages = 55–79 | year = 1999 | doi = 10.1023/A:1010012224103 | url = http://www.cs.bu.edu/~crovella/paper-archive/aest.ps }}</ref>
==Estimation of heavy-tailed density==
Nonparametric approaches to estimating heavy- and superheavy-tailed probability density functions were given by Markovich.<ref name="Markovich2007">{{cite book | author=Markovich N.M. | title=Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice | year=2007 | location=Chichester | publisher=Wiley | isbn=978-0-470-72359-3}}</ref> These include approaches based on a variable bandwidth and long-tailed kernel estimators; on a preliminary transform of the data to a new random variable at finite or infinite intervals, which is more convenient for the estimation, followed by an inverse transform of the obtained density estimate; and a "piecing-together approach", which provides a parametric model for the tail of the density and a non-parametric model to approximate the mode of the density.

Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of kernel estimators or the bin width of a histogram. Well-known data-driven methods for such selection are cross-validation and its modifications, and methods based on the minimization of the mean squared error (MSE), its asymptotic form, and their upper bounds.<ref name="WandJon1995">{{cite book | author=Wand M.P., Jones M.C. | title=Kernel Smoothing | year=1995 | location=New York | publisher=Chapman and Hall | isbn=978-0412552700}}</ref> A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises and Anderson–Darling statistics as a metric in the space of distribution functions (dfs), and quantiles of the latter statistics as a known uncertainty or discrepancy value, can be found in Markovich.<ref name="Markovich2007"/> The bootstrap is another tool to find smoothing parameters, using approximations of the unknown MSE by different schemes of re-sample selection; see, e.g., Hall.<ref name="Hall1992">{{cite book | author=Hall P. | title=The Bootstrap and Edgeworth Expansion | year=1992 | publisher=Springer | isbn=9780387945088}}</ref>
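As one illustration of the transform/retransform idea, a minimal Python sketch (a construction of our own, not Markovich's algorithm) estimates a heavy-tailed density by applying a kernel density estimate to the log-transformed data and mapping the result back:

<syntaxhighlight lang="python">
# Sketch of the transform/retransform idea: estimate a heavy-tailed density
# on (0, inf) by a Gaussian kernel estimate of y = log(x), then map the
# estimate back via the change-of-variables formula f_X(x) = f_Y(log x) / x.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.pareto(2.0, size=10_000) + 1.0   # standard Pareto, alpha = 2

kde_log = gaussian_kde(np.log(sample))        # bandwidth via Scott's rule

def density_estimate(x):
    x = np.asarray(x, dtype=float)
    return kde_log(np.log(x)) / x

xs = np.array([1.5, 5.0, 20.0])
print(density_estimate(xs))                   # estimated density
print(2.0 * xs ** -3.0)                       # true density alpha * x^-(alpha+1)
</syntaxhighlight>

The log transform maps the heavy right tail onto a much lighter-tailed variable for which a fixed-bandwidth kernel estimator behaves well.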
==See also==
*[[Leptokurtic distribution]]
*[[Generalized extreme value distribution]]
*[[Outlier]]
*[[Long tail]]
*[[Power law]]
*[[Seven states of randomness]]
*[[Fat-tailed distribution]]
**[[Taleb distribution]] and [[Holy grail distribution]]