*The [[Student's t-distribution|t-distribution]].
*The skew lognormal cascade distribution.<ref>{{cite web | author=Stephen Lihn | title=Skew Lognormal Cascade Distribution | year=2009 | url=http://www.skew-lognormal-cascade-distribution.org/ | access-date=2009-06-12 | archive-url=https://web.archive.org/web/20140407075213/http://www.skew-lognormal-cascade-distribution.org/ | archive-date=2014-04-07 | url-status=dead }}</ref>

== Relationship to fat-tailed distributions ==
A [[fat-tailed distribution]] is a distribution for which the probability density function, for large <math>x</math>, goes to zero as a power <math>x^{-a}</math>. Since such a power is always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed) but faster than a power (meaning they are not fat-tailed). An example is the [[log-normal distribution]].{{Contradict-inline|article=fat-tailed distribution|reason=Fat-tailed page says log-normals are in fact fat-tailed.|date=June 2019}} Many other heavy-tailed distributions, such as the [[log-logistic distribution|log-logistic]] and [[Pareto distribution|Pareto]] distributions, are also fat-tailed.

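This ordering of tails can be checked numerically. The sketch below (illustrative only; distributions and parameters are chosen for the example, not taken from the cited sources) compares survival functions <math>S(x) = P(X > x)</math>: the lognormal tail eventually dominates the exponential tail but is dominated by any power tail.

```python
import math

# Survival functions S(x) = P(X > x); parameters chosen purely for illustration.
def sf_exponential(x):            # Exp(1): S(x) = e^{-x}
    return math.exp(-x)

def sf_lognormal(x):              # standard lognormal (sigma = 1)
    return 0.5 * math.erfc(math.log(x) / math.sqrt(2))

def sf_pareto(x, a=2.0):          # power tail: S(x) = x^{-a} for x >= 1
    return x ** (-a)

xs = [5.0, 10.0, 20.0, 40.0]
# lognormal/exponential ratio grows without bound: heavier than the exponential tail.
# lognormal/Pareto ratio shrinks toward zero: lighter than the power tail.
for x in xs:
    print(x, sf_lognormal(x) / sf_exponential(x), sf_lognormal(x) / sf_pareto(x))
```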
== Estimating the tail-index{{definition|date=January 2018}} ==

There are parametric (see Embrechts et al.<ref name="Embrechts"/>) and non-parametric (see, e.g., Novak<ref name="Novak2011">{{cite book
| author=Novak S.Y.
| title=Extreme value methods with applications to finance
| year=2011
| series=London: CRC
| isbn=978-1-43983-574-6
}}</ref>) approaches to the problem of tail-index estimation.

In the parametric approach, some authors fit a [[GEV distribution]] or a [[Pareto distribution]] to the data and estimate the tail-index with the maximum-likelihood estimator (MLE).

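As an illustration of the parametric route, the MLE of the Pareto shape parameter <math>\alpha</math> (whose reciprocal is the tail-index, <math>\xi = 1/\alpha</math>) has a closed form when the scale <math>x_m</math> is known. A minimal sketch, with sample size and parameter values chosen purely for illustration:

```python
import math
import random

def pareto_mle_alpha(sample, x_m):
    # Closed-form MLE of the Pareto shape alpha, with the scale x_m known:
    # alpha_hat = n / sum(ln(x_i / x_m)); the tail-index is xi = 1/alpha.
    n = len(sample)
    return n / sum(math.log(x / x_m) for x in sample)

# Inverse-transform sampling from Pareto(alpha=2.5, x_m=1):
# if U ~ Uniform(0,1], then x_m * U^{-1/alpha} is Pareto distributed.
random.seed(0)
alpha_true, x_m = 2.5, 1.0
sample = [x_m * (1.0 - random.random()) ** (-1.0 / alpha_true)
          for _ in range(50_000)]
print(pareto_mle_alpha(sample, x_m))   # close to 2.5
```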
=== Pickands' tail-index estimator ===

Let <math>(X_n , n \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction<ref name=Pickands>{{cite journal|last=Pickands III|first=James|title=Statistical Inference Using Extreme Order Statistics|journal=The Annals of Statistics|date=Jan 1975|volume=3|issue=1|pages=119–131|jstor=2958083|doi=10.1214/aos/1176343003|doi-access=free}}</ref> of the [[generalized extreme value distribution]] <math> H </math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty </math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the ''Pickands'' tail-index estimator is<ref name="Embrechts"/><ref name="Pickands"/>
:<math>
\xi^\text{Pickands}_{(k(n),n)} =\frac{1}{\ln 2} \ln \left( \frac{X_{(n-k(n)+1,n)} - X_{(n-2k(n)+1,n)}}{X_{(n-2k(n)+1,n)} - X_{(n-4k(n)+1,n)}}\right)
</math>
where <math>X_{(i,n)}</math> denotes the <math>i</math>-th [[order statistic]] of <math>X_1, \dots, X_n</math>. This estimator converges in probability to <math>\xi</math>.

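A minimal sketch of the estimator, using the order-statistic convention above. The sanity check applies it to exact Pareto quantiles with <math>\xi = 1</math> (a deterministic stand-in for a large sample, chosen purely for illustration):

```python
import math

def pickands_estimator(sample, k):
    # Pickands tail-index estimate from the order statistics
    # X_(n-k+1,n), X_(n-2k+1,n), X_(n-4k+1,n); requires 4k <= n.
    x = sorted(sample)            # ascending order statistics
    n = len(x)
    a = x[n - k]                  # X_(n-k+1, n)   (1-based -> 0-based)
    b = x[n - 2 * k]              # X_(n-2k+1, n)
    c = x[n - 4 * k]              # X_(n-4k+1, n)
    return math.log((a - b) / (b - c)) / math.log(2)

# Sanity check on exact Pareto(alpha=1) quantiles, so that xi = 1/alpha = 1:
n, k = 100_000, 500
x = [1.0 / (1.0 - (i + 0.5) / n) for i in range(n)]
print(pickands_estimator(x, k))   # ~= 1.0
```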
=== Hill's tail-index estimator ===

Let <math>(X_t , t \geq 1)</math> be a sequence of independent and identically distributed random variables with distribution function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the [[generalized extreme value distribution]] <math> H </math>, where <math>\xi \in \mathbb{R}</math>. The sample path is <math>\{X_t: 1 \leq t \leq n\}</math>, where <math>n</math> is the sample size. If <math>\{k(n)\}</math> is an intermediate order sequence, i.e. <math>k(n) \in \{1,\ldots,n-1\}</math>, <math>k(n) \to \infty</math> and <math>k(n)/n \to 0</math>, then the Hill tail-index estimator is<ref>Hill B.M. (1975) A simple general approach to inference about the tail of a distribution. Ann. Stat., v. 3, 1163–1174.</ref>

: <math>
\xi^\text{Hill}_{(k(n),n)} = \left(\frac 1 {k(n)} \sum_{i=n-k(n)+1}^n \ln(X_{(i,n)}) - \ln (X_{(n-k(n)+1,n)})\right)^{-1},
</math>

where <math>X_{(i,n)}</math> is the <math>i</math>-th [[order statistic]] of <math>X_1, \dots, X_n</math>. This estimator converges in probability to <math>\xi</math>, and is asymptotically normal provided the growth of <math>k(n)</math> is restricted by a higher-order regular variation property.<ref>Hall, P. (1982) On some estimates of an exponent of regular variation. J. R. Stat. Soc. Ser. B., v. 44, 37–42.</ref><ref>Haeusler, E. and J. L. Teugels (1985) On asymptotic normality of Hill's estimator for the exponent of regular variation. Ann. Stat., v. 13, 743–756.</ref> Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences,<ref>Hsing, T. (1991) On tail index estimation using dependent data. Ann. Stat., v. 19, 1547–1569.</ref><ref>Hill, J. (2010) On tail index estimation for dependent, heterogeneous data. Econometric Th., v. 26, 1398–1436.</ref> irrespective of whether <math>X_t</math> is observed directly or is a computed residual or filtered series from a large class of models and estimators, including mis-specified models and models with dependent errors.<ref>Resnick, S. and Starica, C. (1997). Asymptotic behavior of Hill's estimator for autoregressive data. Comm. Statist. Stochastic Models 13, 703–721.</ref><ref>Ling, S. and Peng, L. (2004). Hill's estimator for the tail index of an ARMA model. J. Statist. Plann. Inference 123, 279–293.</ref><ref>Hill, J. B. (2015). Tail index estimation for a filtered dependent time series. Stat. Sin. 25, 609–630.</ref>

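A minimal sketch of the Hill formula. Note that conventions differ across the literature on whether the inverse of the mean log-excess estimates <math>\xi</math> or the index <math>\alpha = 1/\xi</math>; the deterministic check below (on exact Pareto quantiles, chosen purely for illustration) uses <math>\alpha = \xi = 1</math>, where the two coincide.

```python
import math

def hill_estimator(sample, k):
    # Inverse of the mean log-excess of the k upper order statistics
    # over X_(n-k+1,n), as in the formula above.
    x = sorted(sample)                  # ascending order statistics
    n = len(x)
    top = x[n - k:]                     # X_(n-k+1,n), ..., X_(n,n)
    mean_log_excess = (sum(math.log(v) for v in top) / k
                       - math.log(x[n - k]))
    return 1.0 / mean_log_excess

# Sanity check on exact Pareto quantiles with alpha = xi = 1:
n, k = 100_000, 500
x = [1.0 / (1.0 - (i + 0.5) / n) for i in range(n)]
print(hill_estimator(x, k))   # ~= 1.0
```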
=== Ratio estimator of the tail-index ===

The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith.<ref>Goldie C.M., Smith R.L. (1987) Slow variation with remainder: theory and applications. Quart. J. Math. Oxford, v. 38, 45–71.</ref> It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".

A comparison of Hill-type and RE-type estimators can be found in Novak.<ref name="Novak2011"/>

===Software===
* [http://www.cs.bu.edu/~crovella/aest.html aest], a [[C (programming language)|C]] tool for estimating the heavy-tail index.<ref>{{Cite journal | last1 = Crovella | first1 = M. E. | last2 = Taqqu | first2 = M. S. | title = Estimating the Heavy Tail Index from Scaling Properties| journal = Methodology and Computing in Applied Probability | volume = 1 | pages = 55–79 | year = 1999 | doi = 10.1023/A:1010012224103 | url = http://www.cs.bu.edu/~crovella/paper-archive/aest.ps}}</ref>

==Estimation of heavy-tailed density==

Nonparametric approaches to estimating heavy- and superheavy-tailed probability density functions were given by Markovich.<ref name="Markovich2007">{{cite book
| author=Markovich N.M.
| title=Nonparametric Analysis of Univariate Heavy-Tailed Data: Research and Practice
| year=2007
| series=Chichester: Wiley
| isbn=978-0-470-72359-3
}}</ref> These include approaches based on variable-bandwidth and long-tailed kernel estimators; on a preliminary transformation of the data to a new random variable on a finite or infinite interval that is more convenient for estimation, followed by the inverse transformation of the resulting density estimate; and on a "piecing-together approach", which uses a parametric model for the tail of the density and a nonparametric model to approximate its mode. Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of a kernel estimator or the bin width of a histogram. Well-known data-driven methods for this selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, or their upper bounds.<ref name="WandJon1995">{{cite book
| author=Wand M.P., Jones M.C.
| title=Kernel smoothing
| year=1995
| series=New York: Chapman and Hall
| isbn=978-0412552700
}}</ref> A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov–Smirnov, von Mises, and Anderson–Darling statistics as a metric on the space of distribution functions and quantiles of the latter statistics as a known uncertainty or discrepancy value, can be found in Markovich.<ref name="Markovich2007"/> The bootstrap is another tool for selecting smoothing parameters, using approximations of the unknown MSE under different resampling schemes; see, e.g., Hall.<ref name="Hall1992">{{cite book
| author=Hall P.
| title=The Bootstrap and Edgeworth Expansion
| year=1992
| series=Springer
| isbn=9780387945088
}}</ref>

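A minimal sketch of one such data-driven choice: leave-one-out likelihood cross-validation for a fixed-bandwidth Gaussian kernel estimator, applied after a logarithmic transform of the heavy-tailed data (the "preliminary transform" idea described above). All names, the bandwidth grid, and the sample are illustrative assumptions, not taken from the cited sources.

```python
import math
import random

def kde_loo_loglik(sample, h):
    # Leave-one-out log-likelihood of a Gaussian kernel density
    # estimate with bandwidth h (a standard cross-validation criterion).
    n = len(sample)
    c = 1.0 / (h * math.sqrt(2 * math.pi))
    total = 0.0
    for i, xi in enumerate(sample):
        s = sum(c * math.exp(-0.5 * ((xi - xj) / h) ** 2)
                for j, xj in enumerate(sample) if j != i)
        total += math.log(s / (n - 1))
    return total

def select_bandwidth(sample, grid):
    # Pick the bandwidth on the grid maximizing the LOO likelihood.
    return max(grid, key=lambda h: kde_loo_loglik(sample, h))

# Heavy-tailed sample: work with ln X instead of X itself
# (ln of a Pareto(1) variate is Exp(1) distributed).
random.seed(1)
data = [-math.log(1.0 - random.random()) for _ in range(200)]
grid = [0.05 * m for m in range(1, 21)]
h = select_bandwidth(data, grid)
print(h)
```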
==See also==
*[[Leptokurtic distribution]]
*[[Generalized extreme value distribution]]
*[[Outlier]]
*[[Long tail]]
*[[Power law]]
*[[Seven states of randomness]]
*[[Fat-tailed distribution]]
**[[Taleb distribution]] and [[Holy grail distribution]]