Changes

Heavy-tailed distribution (view source)

Revision as of 13:12, 17 October 2020 (Sat)

6,211 bytes removed, 13:12, 17 October 2020 (Sat)

→ Subexponential distributions

Line 127: Line 127:

=== Subexponential distributions ===

−

A fat-tailed distribution is a distribution for which the probability density function, for large <math>x</math>, goes to zero as a power <math>x^{-a}</math>. Since such a power is, for sufficiently large <math>x</math>, always bounded below by the probability density function of an exponential distribution, fat-tailed distributions are always heavy-tailed. Some distributions, however, have a tail which goes to zero slower than an exponential function (meaning they are heavy-tailed), but faster than a power (meaning they are not fat-tailed). An example is the log-normal distribution. Many other heavy-tailed distributions, such as the log-logistic and Pareto distributions, are however also fat-tailed.
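The ordering of tails described above can be illustrated numerically by comparing log-densities far out in the tail. The following is only a sketch: the parameter choices (a rate-1 exponential, a standard log-normal, and a Pareto density <math>2x^{-3}</math> on <math>x \geq 1</math>) and all function names are assumptions made for this example, not part of the article.

```python
import math

def log_pdf_exponential(x, rate=1.0):
    """Log-density of an exponential distribution with the given rate."""
    return math.log(rate) - rate * x

def log_pdf_lognormal(x, mu=0.0, sigma=1.0):
    """Log-density of a log-normal distribution with parameters mu and sigma."""
    z = (math.log(x) - mu) / sigma
    return -math.log(x * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z

def log_pdf_pareto(x, a=2.0, x_min=1.0):
    """Log-density of a Pareto distribution: a * x_min**a / x**(a + 1), x >= x_min."""
    return math.log(a) + a * math.log(x_min) - (a + 1.0) * math.log(x)

def tail_ordering(x):
    """Return the log-densities at x as (exponential, log-normal, Pareto)."""
    return log_pdf_exponential(x), log_pdf_lognormal(x), log_pdf_pareto(x)

# Far in the tail the exponential density is smallest and the power law largest.
for x in (1e3, 1e6):
    e, ln_, p = tail_ordering(x)
    print(f"x={x:g}: exponential={e:.1f}, log-normal={ln_:.1f}, Pareto={p:.1f}")
```

The ordering is asymptotic only. With these assumed parameters the log-normal density is still larger than the Pareto density at a moderate value such as <math>x = 10</math>.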

−

~~There are parametric (see Embrechts et al.) approaches to the problem of tail-index estimation.~~

−

~~To estimate the tail-index using the parametric approach, some authors employ the GEV distribution or the Pareto distribution; they may apply the maximum-likelihood estimator (MLE).~~

−

Subexponentiality is defined in terms of [[Convolution of probability distributions|convolutions of probability distributions]]. For two independent, identically distributed [[random variables]] <math> X_1,X_2</math> with common distribution function <math>F</math>, the convolution of <math>F</math> with itself, written <math>F^{*2}</math> and called the convolution square, is defined using [[Lebesgue–Stieltjes integration]] by:

−

:<math>
\Pr[X_1+X_2 \leq x] = F^{*2}(x) = \int_{0}^x F(x-y)\,dF(y),
</math>

−

and the ''n''-fold convolution <math>F^{*n}</math> is defined inductively by the rule:

−

:<math>
F^{*n}(x) = \int_{0}^x F(x-y)\,dF^{*(n-1)}(y).
</math>
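The inductive rule above can be sketched numerically by replacing the Lebesgue–Stieltjes integral with a midpoint sum on a grid. This is a minimal illustration, not a reference implementation: the function names, the grid, and the rate-1 exponential test case are assumptions, and the sketch assumes the grid starts at 0 and that the distribution has no atom at 0.

```python
import math

def convolve_cdf(F, G_vals, grid):
    """Approximate H(x) = int_0^x F(x - y) dG(y) at every grid point.

    F is a distribution function; G_vals holds the values of G on the grid.
    Each cell contributes F(x - midpoint) times the increment of G over the cell.
    """
    out = [0.0] * len(grid)
    for k in range(1, len(grid)):
        total = 0.0
        for j in range(k):
            mid = 0.5 * (grid[j] + grid[j + 1])
            total += F(grid[k] - mid) * (G_vals[j + 1] - G_vals[j])
        out[k] = total
    return out

def n_fold_cdf(F, n, grid):
    """Build F^{*n} on the grid by applying the inductive rule n - 1 times."""
    G_vals = [F(x) for x in grid]
    for _ in range(n - 1):
        G_vals = convolve_cdf(F, G_vals, grid)
    return G_vals

def exp_cdf(x):
    """Distribution function of a rate-1 exponential (assumed test case)."""
    return 1.0 - math.exp(-x) if x > 0.0 else 0.0

def erlang3_cdf(x):
    """Closed form of the 3-fold convolution of rate-1 exponentials."""
    return 1.0 - math.exp(-x) * (1.0 + x + 0.5 * x * x)

grid = [0.02 * i for i in range(501)]  # uniform grid on [0, 10]
approx = n_fold_cdf(exp_cdf, 3, grid)
max_error = max(abs(a - erlang3_cdf(x)) for a, x in zip(approx, grid))
print(max_error)
```

The exponential distribution is used here only because its convolutions have a closed form to compare against; it is light-tailed and not itself subexponential.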

−

Let <math>(X_n, n \geq 1)</math> be a sequence of independent random variables with common density function <math>F \in D(H(\xi))</math>, the maximum domain of attraction of the generalized extreme value density <math>H</math>, where <math>\xi \in \mathbb{R}</math>. If <math>\lim_{n\to\infty} k(n) = \infty</math> and <math>\lim_{n\to\infty} \frac{k(n)}{n}= 0</math>, then the Pickands tail-index estimator is

−

~~<math>\xi^\text{Hill}_{(k(n),n)} = \left(\frac 1 {k(n)} \sum_{i=n-k(n)+1}^n \ln(X_{(i,n)}) - \ln (X_{(n-k(n)+1,n)})\right)^{-1},</math>~~

−

~~where <math>X_{(i,n)}</math> is the <math>i</math>-th order statistic of <math>X_1, \dots, X_n</math>.~~

−

~~This estimator converges in probability to <math>\xi</math>, and is asymptotically normal provided <math>k(n) \to \infty</math> is restricted based on a higher order regular variation property~~. Consistency and asymptotic normality extend to a large class of dependent and heterogeneous sequences, irrespective of whether <math>X_t</math> is observed directly or is a computed residual or filtered data from a large class of models and estimators, including mis-specified models and models with dependent errors.

−

~~The ratio estimator (RE-estimator) of the tail-index was introduced by Goldie and Smith. It is constructed similarly to Hill's estimator but uses a non-random "tuning parameter".~~

−

~~A comparison of Hill-type and RE-type estimators can be found in Novak.~~

−

~~Nonparametric approaches to estimate heavy- and superheavy-tailed probability density functions were given in~~ Markovich. These are approaches based on variable bandwidth and long-tailed kernel estimators; on a preliminary transform of the data to a new random variable on a finite or infinite interval, which is more convenient for estimation, followed by an inverse transform of the obtained density estimate; and the "piecing-together approach", which provides a parametric model for the tail of the density and a non-parametric model to approximate the mode of the density. Nonparametric estimators require an appropriate selection of tuning (smoothing) parameters, such as the bandwidth of kernel estimators and the bin width of the histogram. The well-known data-driven methods of such selection are cross-validation and its modifications, and methods based on minimizing the mean squared error (MSE), its asymptotic form, and their upper bounds. A discrepancy method, which uses well-known nonparametric statistics such as the Kolmogorov-Smirnov, von Mises and Anderson-Darling statistics as a metric in the space of distribution functions (dfs) and quantiles of these statistics as a known uncertainty or discrepancy value, can be found in.

−

The tail distribution function <math>\overline{F}</math> is defined as <math>\overline{F}(x) = 1-F(x)</math>.

A distribution <math>F</math> on the positive half-line is subexponential<ref name="Asmussen"/><ref>{{Cite web|url=https://www.researchgate.net/publication/242637603_A_Theorem_on_Sums_of_Independent_Positive_Random_Variables_and_Its_Applications_to_Branching_Random_Processes|title=A Theorem on Sums of Independent Positive Random Variables and Its Applications to Branching Random Processes|last=Chistyakov|first=V. P.|date=1964|website=ResearchGate|language=en|archive-url=|archive-date=|access-date=April 7, 2019}}</ref><ref>{{Cite web|url=https://projecteuclid.org/download/pdf_1/euclid.aop/1176996225|title=The Class of Subexponential Distributions|last=Teugels|first=Jozef L.|authorlink=|date=1975|website=|publisher=Annals of Probability|publication-place=[[KU Leuven|University of Louvain]]|archive-url=|archive-date=|access-date=April 7, 2019}}</ref> if

−

:<math>
\overline{F^{*2}}(x) \sim 2\overline{F}(x) \quad \mbox{as } x \to \infty.
</math>

−

This implies<ref name="Embrechts">{{cite book |author1=Embrechts P. |author2=Klueppelberg C. |author3=Mikosch T. |title=Modelling extremal events for insurance and finance |publisher=Springer | series = Stochastic Modelling and Applied Probability|location=Berlin |year=1997 | volume=33| doi = 10.1007/978-3-642-33483-2|isbn=978-3-642-08242-9 }}</ref> that, for any <math>n \geq 1</math>,

−

:<math>
\overline{F^{*n}}(x) \sim n\overline{F}(x) \quad \mbox{as } x \to \infty.
</math>

−

The probabilistic interpretation<ref name="Embrechts"/> of this is that, for a sum of <math>n</math> [[statistical independence|independent]] [[random variables]] <math>X_1,\ldots,X_n</math> with common distribution <math>F</math>,

−

:<math>
\Pr[X_1+ \cdots +X_n>x] \sim \Pr[\max(X_1, \ldots,X_n)>x] \quad \text{as } x \to \infty.
</math>

This is often known as the principle of the single big jump<ref>{{Cite journal | last1 = Foss | first1 = S. | last2 = Konstantopoulos | first2 = T. | last3 = Zachary | first3 = S. | doi = 10.1007/s10959-007-0081-2 | title = Discrete and Continuous Time Modulated Random Walks with Heavy-Tailed Increments | journal = Journal of Theoretical Probability| volume = 20 | issue = 3 | pages = 581 | year = 2007 | arxiv = math/0509605| pmid = | url = http://www.math.nsc.ru/LBRT/v1/foss/fkz_revised.pdf| pmc = | citeseerx = 10.1.1.210.1699 }}</ref> or catastrophe principle.<ref>{{cite web| url = http://rigorandrelevance.wordpress.com/2014/01/09/catastrophes-conspiracies-and-subexponential-distributions-part-iii/ | title = Catastrophes, Conspiracies, and Subexponential Distributions (Part III) | first = Adam | last = Wierman | authorlink = Adam Wierman | date = January 9, 2014 | accessdate = January 9, 2014 | website = Rigor + Relevance blog | publisher = RSRG, Caltech}}</ref>
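The single-big-jump relation can be checked roughly by simulation. The sketch below is an illustration under stated assumptions: it uses a Pareto distribution with tail <math>\Pr[X>t] = t^{-\alpha}</math> for <math>t \geq 1</math> as a convenient subexponential example, and the function name, parameter values and sample size are hypothetical choices, not taken from the cited sources.

```python
import random

def single_big_jump_check(alpha=1.5, n=5, x=100.0, samples=200_000, seed=0):
    """Monte Carlo estimates of P(sum > x), P(max > x) and n * P(X_1 > x).

    The X_i are i.i.d. Pareto with tail P(X > t) = t**(-alpha) for t >= 1.
    """
    rng = random.Random(seed)
    sum_hits = 0
    max_hits = 0
    for _ in range(samples):
        # Inverse-transform sampling: U in (0, 1] gives U**(-1/alpha) >= 1.
        draws = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]
        if sum(draws) > x:
            sum_hits += 1
        if max(draws) > x:
            max_hits += 1
    return sum_hits / samples, max_hits / samples, n * x ** (-alpha)

p_sum, p_max, n_tail = single_big_jump_check()
print(p_sum, p_max, n_tail)
```

Because the draws are positive, every sample in which the maximum exceeds <math>x</math> also has a sum exceeding <math>x</math>, so the first estimate is never smaller than the second. The equivalence holds only in the limit; at a finite threshold the sum's tail probability is somewhat larger, and the gap narrows as <math>x</math> grows.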

−

A distribution <math>F</math> on the whole real line is subexponential if the distribution <math>F I([0,\infty))</math> is.<ref>{{cite journal | last = Willekens | first = E. | title = Subexponentiality on the real line | journal = Technical Report | publisher = K.U. Leuven | year = 1986}}</ref> Here <math>I([0,\infty))</math> is the [[indicator function]] of the positive half-line. Alternatively, a random variable <math>X</math> supported on the real line is subexponential if and only if <math>X^+ = \max(0,X)</math> is subexponential.

−

All subexponential distributions are long-tailed, but examples can be constructed of long-tailed distributions that are not subexponential.

Jie

961 edits