Changes

7,874 bytes removed, 17:20, 5 February 2023 (Sunday)
No edit summary
Line 27: Line 27:
A time series is one type of panel data, which is the broader category. Panel data form a multidimensional data set, whereas a time series data set is a one-dimensional panel (as is a cross-sectional data set). A data set may exhibit characteristics of both panel data and time series data. One way to tell them apart is to ask what makes one data record unique from the other records. If the answer is the time data field, the data set is a time series candidate. If identifying a unique record requires a time data field and an additional identifier that is unrelated to time (for example, student ID, stock symbol, country code), it is a panel data candidate. If the differentiation lies in the non-time identifier, the data set is a cross-sectional candidate.
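The uniqueness test described above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_dataset`, the field names `year` and `country`, and the sample records are all hypothetical and do not come from the article.

```python
def classify_dataset(records, time_field, id_field):
    """Guess the data set type from which fields make each record unique."""
    n = len(records)
    times = {r[time_field] for r in records}
    ids = {r[id_field] for r in records}
    pairs = {(r[time_field], r[id_field]) for r in records}
    # The time field alone distinguishes every record.
    if len(times) == n:
        return "time series candidate"
    # The non-time identifier alone distinguishes every record.
    if len(ids) == n:
        return "cross-sectional candidate"
    # Both fields together are needed to distinguish records.
    if len(pairs) == n:
        return "panel data candidate"
    return "undetermined"


# Hypothetical records: two countries observed in two years.
panel = [
    {"year": 2020, "country": "FR"},
    {"year": 2021, "country": "FR"},
    {"year": 2020, "country": "DE"},
    {"year": 2021, "country": "DE"},
]
series = panel[:2]               # one country, two years
cross = [panel[0], panel[2]]     # one year, two countries

print(classify_dataset(panel, "year", "country"))
print(classify_dataset(series, "year", "country"))
print(classify_dataset(cross, "year", "country"))
```

The check is a heuristic: when the time field and the identifier are each unique on their own, the sketch reports a time series candidate, and a real data set would need a closer look.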
   −
==Analysis分析==
+
==Analysis==
Line 42: Line 42:
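Plotting a line chart is a straightforward way to examine a regular time series. The example chart at right, made with a spreadsheet program, shows tuberculosis incidence in the United States. The number of cases was standardized to a rate per 100,000 people, and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that tuberculosis incidence was decreasing in most years, but the percent change in this rate varied by as much as +/- 10%, with "surges" in 1975 and around the early 1990s. Using two vertical axes allows two time series to be compared in one chart.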
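The two derived series in the chart (rate per 100,000 and its year-over-year percent change) follow from simple arithmetic. A minimal sketch, assuming made-up case counts and populations; none of the numbers below are the actual tuberculosis data, and the function names are illustrative:

```python
def rate_per_100k(cases, population):
    """Standardize a case count to a rate per 100,000 people."""
    return cases * 100_000 / population


def percent_changes(rates):
    """Year-over-year percent change; one value fewer than the input."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(rates, rates[1:])]


# Hypothetical yearly case counts and populations (not real data).
cases = [500, 450, 495]
population = [1_000_000, 1_000_000, 1_000_000]

rates = [rate_per_100k(c, p) for c, p in zip(cases, population)]
changes = percent_changes(rates)
print(rates)    # rates per 100,000 for each year
print(changes)  # percent change between consecutive years
```

Plotting `rates` against one vertical axis and `changes` against a second one would reproduce the layout the paragraph describes.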
  −
            
A study of corporate data analysts found two challenges in exploratory time series analysis: discovering new patterns, and finding explanations for those patterns.<ref name=":6">{{Cite journal|last=Sarkar|first=Advait|last2=Spott|first2=Martin|last3=Blackwell|first3=Alan F.|last4=Jamnik|first4=Mateja|date=2016|title=Visual discovery and model-driven explanation of time series patterns|url=https://doi.org/10.1109/VLHCC.2016.7739668|journal=2016 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)|publisher=IEEE|doi=10.1109/vlhcc.2016.7739668}}</ref> Tools that visualize time series data as heat map matrices can help explain these patterns.
  −
  −
         
Other techniques include:
   
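* Autocorrelation analysis to examine serial dependence;
* Spectral analysis to examine cyclic behavior that need not be related to seasonality. For example, sunspot activity varies over 11-year cycles. Other common examples of cyclic behavior include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity;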
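As a rough illustration of the first technique in the list, the sample autocorrelation at a given lag can be computed directly from its definition. The sketch below uses only the Python standard library; the function name `autocorrelation` and the periodic test series are assumptions made for the example.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a non-negative integer lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Total variation around the mean (denominator of the estimator).
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Co-variation between the series and a copy of itself shifted by `lag`.
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


# Hypothetical series that repeats every 4 observations.
values = [1.0, 0.0, -1.0, 0.0] * 6

for lag in (0, 2, 4):
    print(lag, round(autocorrelation(values, lag), 4))
```

For this series the coefficient is strongly positive at the lag that matches the period and strongly negative at half the period, which is the kind of signature serial dependence leaves in a correlogram.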
Line 60: Line 54:
===Curve fitting===
{{main|Curve fitting}}
+
Curve fitting<ref name=":7">Sandra Lach Arlinghaus, PHB Practical Handbook of Curve Fitting. CRC Press, 1994.</ref><ref name=":8">William M. Kolb. Curve Fitting for Programmable Calculators. Syntec, Incorporated, 1984.</ref> is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points,<ref name=":9">S.S. Halli, K.V. Rao. 1992. Advanced Techniques of Population Analysis. {{isbn|0306439972}} Page 165 (''cf''. ... functions are fulfilled if we have a good to moderate fit for the observed data.)</ref> possibly subject to constraints.<ref name=":10">[https://archive.org/details/signalnoisewhymo00silv]''[[The Signal and the Noise]]'': Why So Many Predictions Fail-but Some Don't.'' By Nate Silver''</ref><ref name=":11">[https://books.google.com/books?id=hhdVr9F-JfAC Data Preparation for Data Mining]: Text. By Dorian Pyle.</ref> Curve fitting can involve either interpolation,<ref name=":12">Numerical Methods in Engineering with MATLAB®. By [[Jaan Kiusalaas]]. Page 24.</ref><ref name=":13">[https://books.google.com/books?id=YlkgAwAAQBAJ&printsec=frontcover#v=onepage&q=%22curve%20fitting%22&f=false Numerical Methods in Engineering with Python 3]. By Jaan Kiusalaas. Page 21.</ref> where an exact fit to the data is required, or smoothing,<ref name=":14">[https://books.google.com/books?id=UjnB0FIWv_AC&printsec=frontcover#v=onepage&q&f=false Numerical Methods of Curve Fitting]. By P. G. Guest, Philip George Guest. Page 349.</ref><ref name=":15">See also: [[Mollifier]]</ref> in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis,<ref name=":16">[http://www.facm.ucl.ac.be/intranet/books/statistics/Prism-Regression-Book.unlocked.pdf Fitting Models to Biological Data Using Linear and Nonlinear Regression]. By Harvey Motulsky, Arthur Christopoulos.</ref><ref name=":17">Regression Analysis By Rudolf J. Freund, William J. Wilson, Ping Sa. Page 269.</ref> which focuses more on questions of statistical inference, such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization,<ref name=":18">Visual Informatics. Edited by Halimah Badioze Zaman, Peter Robinson, Maria Petrou, Patrick Olivier, Heiko Schröder. Page 689.</ref><ref name=":19">[https://books.google.com/books?id=rdJvXG1k3HsC&printsec=frontcover#v=onepage&q&f=false Numerical Methods for Nonlinear Engineering Models]. By John R. Hauser. Page 227.</ref> to infer values of a function where no data are available,<ref name=":20">Methods of Experimental Physics: Spectroscopy, Volume 13, Part 1. By Claire Marton. Page 150.</ref> and to summarize the relationships among two or more variables.<ref name=":21">Encyclopedia of Research Design, Volume 1. Edited by Neil J. Salkind. Page 266.</ref> Extrapolation refers to the use of a fitted curve beyond the range of the observed data,<ref name=":22">[https://books.google.com/books?id=ba0hAQAAQBAJ&printsec=frontcover#v=onepage&q&f=false Community Analysis and Planning Techniques]. By Richard E. Klosterman. Page 1.</ref> and is subject to a degree of uncertainty<ref name=":23">An Introduction to Risk and Uncertainty in the Evaluation of Environmental Investments. DIANE Publishing. [https://books.google.com/books?id=rJ23LWaZAqsC&pg=PA69 Pg 69]</ref> since it may reflect the method used to construct the curve as much as it reflects the observed data.
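One simple instance of curve fitting in the smoothing sense (an approximate rather than exact fit) is an ordinary least-squares straight line. The sketch below is illustrative only: the function name `fit_line` and the sample points are hypothetical, and a straight line is just one of many curve families that could be fitted.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means all x values coincide.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Joint variation of x and y around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noisy observations scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.1, 4.9, 7.0]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

The fitted line does not pass through every point, which is what separates this kind of fit from interpolation.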
 
  −
Curve fitting<ref name=":7">Sandra Lach Arlinghaus, PHB Practical Handbook of Curve Fitting. CRC Press, 1994.</ref><ref name=":8">William M. Kolb. Curve Fitting for Programmable Calculators. Syntec, Incorporated, 1984.</ref> is the process of constructing a [[curve]], or [[function (mathematics)|mathematical function]], that has the best fit to a series of [[data]] points,<ref name=":9">S.S. Halli, K.V. Rao. 1992. Advanced Techniques of Population Analysis. {{isbn|0306439972}} Page 165 (''cf''. ... functions are fulfilled if we have a good to moderate fit for the observed data.)</ref> possibly subject to constraints.<ref name=":10">[https://archive.org/details/signalnoisewhymo00silv]''[[The Signal and the Noise]]'': Why So Many Predictions Fail-but Some Don't.'' By Nate Silver''</ref><ref name=":11">[https://books.google.com/books?id=hhdVr9F-JfAC Data Preparation for Data Mining]: Text. By Dorian Pyle.</ref> Curve fitting can involve either [[interpolation]],<ref name=":12">Numerical Methods in Engineering with MATLAB®. By [[Jaan Kiusalaas]]. Page 24.</ref><ref name=":13">[https://books.google.com/books?id=YlkgAwAAQBAJ&printsec=frontcover#v=onepage&q=%22curve%20fitting%22&f=false Numerical Methods in Engineering with Python 3]. By Jaan Kiusalaas. Page 21.</ref> where an exact fit to the data is required, or [[smoothing]],<ref name=":14">[https://books.google.com/books?id=UjnB0FIWv_AC&printsec=frontcover#v=onepage&q&f=false Numerical Methods of Curve Fitting]. By P. G. Guest, Philip George Guest. Page 349.</ref><ref name=":15">See also: [[Mollifier]]</ref> in which a "smooth" function is constructed that approximately fits the data.  A related topic is [[regression analysis]],<ref name=":16">[http://www.facm.ucl.ac.be/intranet/books/statistics/Prism-Regression-Book.unlocked.pdf Fitting Models to Biological Data Using Linear and Nonlinear Regression]. By Harvey Motulsky, Arthur Christopoulos.</ref><ref name=":17">Regression Analysis By Rudolf J. Freund, William J. 
Wilson, Ping Sa. Page 269.</ref> which focuses more on questions of [[statistical inference]] such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization,<ref name=":18">Visual Informatics. Edited by Halimah Badioze Zaman, Peter Robinson, Maria Petrou, Patrick Olivier, Heiko Schröder. Page 689.</ref><ref name=":19">[https://books.google.com/books?id=rdJvXG1k3HsC&printsec=frontcover#v=onepage&q&f=false Numerical Methods for Nonlinear Engineering Models]. By John R. Hauser. Page 227.</ref> to infer values of a function where no data are available,<ref name=":20">Methods of Experimental Physics: Spectroscopy, Volume 13, Part 1. By Claire Marton. Page 150.</ref> and to summarize the relationships among two or more variables.<ref name=":21">Encyclopedia of Research Design, Volume 1. Edited by Neil J. Salkind. Page 266.</ref> [[Extrapolation]] refers to the use of a fitted curve beyond the [[range (statistics)|range]] of the observed data,<ref name=":22">[https://books.google.com/books?id=ba0hAQAAQBAJ&printsec=frontcover#v=onepage&q&f=false Community Analysis and Planning Techniques]. By Richard E. Klosterman. Page 1.</ref> and is subject to a [[Uncertainty|degree of uncertainty]]<ref name=":23">An Introduction to Risk and Uncertainty in the Evaluation of Environmental Investments. DIANE Publishing. [https://books.google.com/books?id=rJ23LWaZAqsC&pg=PA69 Pg 69]</ref> since it may reflect the method used to construct the curve as much as it reflects the observed data.
  −
 
  −
Main article: Curve fitting
  −
 
  −
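Curve fitting<ref name=":7" /><ref name=":8" /> is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points,<ref name=":9" /> possibly subject to constraints.<ref name=":10" /><ref name=":11" /> Curve fitting can involve either interpolation,<ref name=":12" /><ref name=":13" /> where an exact fit to the data is required, or smoothing,<ref name=":14" /><ref name=":15" /> in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis,<ref name=":16" /><ref name=":17" /> which focuses more on questions of statistical inference, such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization,<ref name=":18" /><ref name=":19" /> to infer values of a function where no data are available,<ref name=":20" /> and to summarize the relationships among two or more variables.<ref name=":21" /> Extrapolation refers to the use of a fitted curve beyond the range of the observed data,<ref name=":22" /> and is subject to a degree of uncertainty<ref name=":23" /> since it may reflect the method used to construct the curve as much as it reflects the observed data.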
  −
 
     −
The construction of economic time series involves the estimation of some components for some dates by [[interpolation]] between values ("benchmarks") for earlier and later dates. Interpolation is estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines").<ref name=":24">Hamming, Richard. Numerical methods for scientists and engineers. Courier Corporation, 2012.</ref> Interpolation is useful where the data surrounding the missing data is available and its trend, seasonality, and longer-term cycles are known. This is often done by using a related series known for all relevant dates.<ref name=":25">Friedman, Milton. "[http://www.nber.org/chapters/c2062.pdf The interpolation of time series by related series]." Journal of the American Statistical Association 57.300 (1962): 729–757.</ref> Alternatively [[polynomial interpolation]] or [[spline interpolation]] is used where piecewise [[polynomial]] functions are fit into time intervals such that they fit smoothly together. A different problem which is closely related to interpolation is the approximation of a complicated function by a simple function (also called [[Polynomial regression|regression]]).The main difference between regression and interpolation is that polynomial regression gives a single polynomial that models the entire data set.  Spline interpolation, however, yield a piecewise continuous function composed of many polynomials to model the data set.
+
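The construction of economic time series involves the estimation of some components for some dates by interpolation between benchmark values for earlier and later dates. Interpolation is the estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines").<ref name=":24">Hamming, Richard. Numerical methods for scientists and engineers. Courier Corporation, 2012.</ref> Interpolation is useful where the data surrounding the missing data are available and their trend, seasonality, and longer-term cycles are known. This is often done by using a related series known for all relevant dates.<ref name=":25">Friedman, Milton. "[http://www.nber.org/chapters/c2062.pdf The interpolation of time series by related series]." Journal of the American Statistical Association 57.300 (1962): 729–757.</ref> Alternatively, polynomial interpolation or spline interpolation is used, in which piecewise polynomial functions are fitted to time intervals so that they join smoothly. A different problem that is closely related to interpolation is the approximation of a complicated function by a simple function (also called regression). The main difference between regression and interpolation is that polynomial regression gives a single polynomial that models the entire data set, whereas spline interpolation yields a piecewise continuous function composed of many polynomials to model the data set.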
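The idea of estimating a value between two benchmarks can be sketched with piecewise-linear interpolation, which is the simplest (degree-one) case of the piecewise polynomial approach mentioned above. The function name `interpolate_linear` and the benchmark years and values are hypothetical; a smoother result would need higher-degree splines, which this sketch does not implement.

```python
from bisect import bisect_right


def interpolate_linear(xs, ys, x):
    """Piecewise-linear interpolation at x; xs must be strictly increasing."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two benchmarks of equal length")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the benchmarks; that is extrapolation")
    # Index of the interval [xs[i], xs[i + 1]] that contains x.
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    weight = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + weight * (ys[i + 1] - ys[i])


# Hypothetical benchmark years and benchmark values.
years = [2000, 2005, 2010]
levels = [100.0, 120.0, 110.0]

print(interpolate_linear(years, levels, 2002))    # between the first two
print(interpolate_linear(years, levels, 2005))    # exactly on a benchmark
print(interpolate_linear(years, levels, 2007.5))  # between the last two
```

Unlike a single regression polynomial, the interpolant reproduces every benchmark exactly and uses a different linear piece on each interval.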
   −
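The construction of economic time series involves the estimation of some components for some dates by interpolation between values ("benchmarks") for earlier and later dates. Interpolation is the estimation of an unknown quantity between two known quantities (historical data), or drawing conclusions about missing information from the available information ("reading between the lines").<ref name=":24" /> Interpolation is useful where the data surrounding the missing data are available and their trend, seasonality, and longer-term cycles are known. This is often done by using a related series known for all relevant dates.<ref name=":25" /> Alternatively, polynomial interpolation or spline interpolation is used, in which piecewise polynomial functions are fitted to time intervals so that they join smoothly. A different problem that is closely related to interpolation is the approximation of a complicated function by a simple function (also called regression). The main difference between regression and interpolation is that polynomial regression gives a single polynomial that models the entire data set, whereas spline interpolation yields a piecewise continuous function composed of many polynomials to model the data set.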
     −
[[Extrapolation]] is the process of estimating, beyond the original observation range, the value of a variable on the basis of its relationship with another variable. It is similar to [[interpolation]], which produces estimates between known observations, but extrapolation is subject to greater [[uncertainty]] and a higher risk of producing meaningless results.
+
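Extrapolation is the process of estimating, beyond the original observation range, the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but extrapolation is subject to greater uncertainty and a higher risk of producing meaningless results.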
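The extra uncertainty of extrapolation can be seen by extending two different curves that agree on the observed data. In the sketch below, both helper functions (`lagrange_eval` and `linear_extend`) and the observations are hypothetical choices made for illustration; they are not methods named in the article.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial through all (xs, ys) points at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def linear_extend(xs, ys, x):
    """Extend the straight line through the last two points to x."""
    slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
    return ys[-1] + slope * (x - xs[-1])


# Hypothetical observations (generated here from y = x ** 2).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

# Both curves reproduce the last observation exactly ...
print(lagrange_eval(xs, ys, 3.0), linear_extend(xs, ys, 3.0))
# ... but give different answers outside the observed range.
print(lagrange_eval(xs, ys, 6.0), linear_extend(xs, ys, 6.0))
```

The gap between the two extrapolated values comes entirely from the choice of curve, not from the data, which is why estimates beyond the observed range deserve more caution than interpolated ones.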
   −
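Extrapolation is the process of estimating, beyond the original observation range, the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but extrapolation is subject to greater uncertainty and a higher risk of producing meaningless results.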
+
===Function approximation problems===
 
  −
===Function approximation===
   
{{main|Function approximation}}
 
In general, a function approximation problem asks us to select a [[function (mathematics)|function]] among a well-defined class that closely matches ("approximates") a target function in a task-specific way.
  −
One can distinguish two major classes of function approximation problems: First, for known target functions, [[approximation theory]]  is the branch of [[numerical analysis]] that investigates how certain known functions (for example, [[special function]]s) can be approximated by a specific class of functions (for example, [[polynomial]]s or [[rational function]]s) that often have desirable properties (inexpensive computation, continuity, integral and limit values, etc.).
  −
  −
  −
In general, a function approximation problem asks us to select a function among a well-defined class that closely matches ("approximates") a target function in a task-specific way.
  −
One can distinguish two major classes of function approximation problems: First, for known target functions, approximation theory  is the branch of numerical analysis that investigates how certain known functions (for example, special functions) can be approximated by a specific class of functions (for example, polynomials or rational functions) that often have desirable properties (inexpensive computation, continuity, integral and limit values, etc.).
  −
  −
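In general, a function approximation problem asks us to select a function from a well-defined class that closely matches ("approximates") a target function in a task-specific way. One can distinguish two major classes of function approximation problems. First, for known target functions, approximation theory is the branch of numerical analysis that investigates how certain known functions (for example, special functions) can be approximated by a specific class of functions (for example, polynomials or rational functions) that often have desirable properties (inexpensive computation, continuity, integral and limit values, etc.).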
     −
Second, the target function, call it ''g'', may be unknown; instead of an explicit formula, only a set of points (a time series) of the form (''x'', ''g''(''x'')) is provided.  Depending on the structure of the [[domain of a function|domain]] and [[codomain]] of ''g'', several techniques for approximating ''g'' may be applicable.  For example, if ''g'' is an operation on the [[real number]]s, techniques of [[interpolation]], [[extrapolation]], [[regression analysis]], and [[curve fitting]] can be used.  If the [[codomain]] (range or target set) of ''g'' is a finite set, one is dealing with a [[statistical classification|classification]] problem instead. A related problem of ''online'' time series approximation<ref>Gandhi, Sorabh, Luca Foschini, and Subhash Suri. "[https://ieeexplore.ieee.org/abstract/document/5447930/ Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order]." Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010.</ref> is to summarize the data in one-pass and construct an approximate representation that can support a variety of time series queries with bounds on worst-case error.
     −
Second, the target function, call it g, may be unknown; instead of an explicit formula, only a set of points (a time series) of the form (x, g(x)) is provided.  Depending on the structure of the domain and codomain of g, several techniques for approximating g may be applicable.  For example, if g is an operation on the real numbers, techniques of interpolation, extrapolation, regression analysis, and curve fitting can be used.  If the codomain (range or target set) of g is a finite set, one is dealing with a classification problem instead. A related problem of online time series approximationGandhi, Sorabh, Luca Foschini, and Subhash Suri. "Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order." Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010. is to summarize the data in one-pass and construct an approximate representation that can support a variety of time series queries with bounds on worst-case error.
+
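In general, a function approximation problem asks us to select a function from a well-defined class that closely matches a target function in a task-specific way. For known target functions, approximation theory is the branch of numerical analysis that investigates how certain known functions can be approximated by a specific class of functions (for example, polynomials or rational functions) that often have desirable properties (continuity, integral and limit values, etc.).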
   −
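Second, the target function, call it g, may be unknown; instead of an explicit formula, only a set of points (a time series) of the form (x, g(x)) is provided. Depending on the structure of the domain and codomain of g, several techniques for approximating g may be applicable. For example, if g is an operation on the real numbers, techniques of interpolation, extrapolation, regression analysis, and curve fitting can be used. If the codomain (range or target set) of g is a finite set, one is dealing with a classification problem instead. A related problem of online time series approximation (Gandhi, Sorabh, Luca Foschini, and Subhash Suri. "Space-efficient online approximation of time series data: Streams, amnesia, and out-of-order." Data Engineering (ICDE), 2010 IEEE 26th International Conference on. IEEE, 2010.) is to summarize the data in one pass and construct an approximate representation that can support a variety of time series queries with bounds on worst-case error.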
     −
To some extent, the different problems ([[regression analysis|regression]], [[Statistical classification|classification]], [[fitness approximation]]) have received a unified treatment in [[statistical learning theory]], where they are viewed as [[supervised learning]] problems.
+
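In time series analysis, the target function, call it g, may be unknown. Depending on the structure of the domain and codomain of g, several techniques for approximating g may be applicable. For example, if g is an operation on the real numbers, techniques of interpolation, extrapolation, regression analysis, and curve fitting can be used. If the codomain (range or target set) of g is a finite set, one is dealing with a classification problem instead.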
   −
To some extent, the different problems (regression, classification, fitness approximation) have received a unified treatment in statistical learning theory, where they are viewed as supervised learning problems.
      
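To some extent, the different problems (regression, classification, fitness approximation) have received a unified treatment in statistical learning theory, where they are viewed as supervised learning problems.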
   −
===Prediction and forecasting===
+
===Prediction with time series analysis===
 
In [[statistics]], [[prediction]] is a part of [[statistical inference]]. One particular approach to such inference is known as [[predictive inference]], but the prediction can be undertaken within any of the several approaches to statistical inference. Indeed, one description of statistics is that it provides a means of transferring knowledge about a sample of a population to the whole population, and to other related populations, which is not necessarily the same as prediction over time. When information is transferred across time, often to specific points in time, the process is known as [[forecasting]].
 
 
* Fully formed statistical models for [[stochastic simulation]] purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future
 