Symbolic regression

From 集智百科 (Swarma Wiki): complex systems | artificial intelligence | complexity science | complex networks | self-organization

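This entry was machine-translated by Caiyun Xiaoyi (彩云小译), 712 words in total, and has not been manually edited or reviewed; we apologize for any inconvenience to readers.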

[Figure: Genetic Program Tree.png] An expression tree of the kind used in symbolic regression to represent a function.
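
An expression tree like the one pictured can be held in code as a small recursive data type. The sketch below is a minimal illustration, not the representation of any particular tool. It assumes a hypothetical primitive set (`+`, `-`, `*`, `sin`, `cos`) and hypothetical class names `Const`, `Var`, and `Op`. It encodes the function f(x) = x*x + sin(x) and evaluates it.

```python
import math
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Const:
    value: float          # numeric constant leaf

@dataclass(frozen=True)
class Var:
    name: str             # state-variable leaf, e.g. "x"

@dataclass(frozen=True)
class Op:
    symbol: str           # operator or function name, e.g. "+" or "sin"
    children: tuple       # operand subtrees

Node = Union[Const, Var, Op]

# Hypothetical primitive set: three binary operators and two unary functions.
FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "sin": math.sin,
    "cos": math.cos,
}

def evaluate(node: Node, env: dict) -> float:
    """Recursively evaluate the tree, looking up variables in env."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    args = [evaluate(child, env) for child in node.children]
    return FUNCS[node.symbol](*args)

def to_string(node: Node) -> str:
    """Render the tree as a fully parenthesized infix expression."""
    if isinstance(node, Const):
        return repr(node.value)
    if isinstance(node, Var):
        return node.name
    if len(node.children) == 1:
        return f"{node.symbol}({to_string(node.children[0])})"
    left, right = node.children
    return f"({to_string(left)} {node.symbol} {to_string(right)})"

# The tree for f(x) = x*x + sin(x): "+" at the root, "*" and "sin" below it.
tree = Op("+", (Op("*", (Var("x"), Var("x"))), Op("sin", (Var("x"),))))
print(to_string(tree))
print(evaluate(tree, {"x": 2.0}))
```

Because every node is either a leaf or an operator applied to subtrees, search operators such as mutation and crossover can work directly on this structure.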

Symbolic regression (SR) is a type of regression analysis that searches the space of mathematical expressions to find the model that best fits a given dataset, both in terms of accuracy and simplicity.

No particular model is provided as a starting point to the algorithm. Instead, initial expressions are formed by randomly combining mathematical building blocks such as mathematical operators, analytic functions, constants, and state variables. Usually the user specifies a subset of these primitives, but the technique does not require it. The symbolic regression problem has been tackled with a variety of methods. The most common is genetic programming, which evolves and recombines candidate equations.[1] More recent approaches use Bayesian methods[2] and physics-inspired AI.[3] Universal Functions Originator (UFO) is a non-classical alternative that differs from these in its mechanism, search space, and expression-building strategy.[4] Other methods, such as Exact Learning, transform the fitting problem into a moments problem in a natural function space, usually built around generalizations of the Meijer G-function.[5]
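
The random initialization and genetic-programming recombination described above can be sketched as follows. This is a minimal sketch, not the method of any particular tool, and it rests on several assumptions:

- A single variable `x`.
- A hypothetical primitive set of `+`, `-`, `*`, and random constants.
- "Grow" initialization, which ends a branch early with some probability.
- Standard subtree crossover, which replaces a randomly chosen subtree of one parent with a randomly chosen subtree of the other.

Trees are nested tuples of the form `(op, left, right)`.

```python
import random

# Hypothetical primitive set (an assumption for illustration); in practice the
# user often chooses or restricts these building blocks.
BINARY_OPS = ["+", "-", "*"]
VARIABLE = "x"  # single state variable

def random_tree(rng, max_depth, p_leaf=0.3):
    """'Grow' initialization: internal nodes are (op, left, right) tuples,
    leaves are the variable name or a random constant."""
    if max_depth == 0 or rng.random() < p_leaf:
        if rng.random() < 0.5:
            return VARIABLE
        return round(rng.uniform(-5.0, 5.0), 2)  # random constant
    op = rng.choice(BINARY_OPS)
    return (op, random_tree(rng, max_depth - 1, p_leaf),
            random_tree(rng, max_depth - 1, p_leaf))

def evaluate(tree, x):
    """Evaluate a tree at a single value of x."""
    if isinstance(tree, str):
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        a, b = evaluate(left, x), evaluate(right, x)
        if op == "+":
            return a + b
        if op == "-":
            return a - b
        return a * b
    return tree  # numeric constant

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path lists child indices from the root."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(parent_a, parent_b, rng):
    """Subtree crossover: graft a random subtree of parent_b into parent_a."""
    path, _ = rng.choice(list(subtrees(parent_a)))
    _, donor = rng.choice(list(subtrees(parent_b)))
    return replace_at(parent_a, path, donor)

rng = random.Random(42)
parent_a = random_tree(rng, max_depth=3)
parent_b = random_tree(rng, max_depth=3)
child = crossover(parent_a, parent_b, rng)
print("parent A:", parent_a)
print("parent B:", parent_b)
print("child:   ", child, "-> child(1.5) =", evaluate(child, 1.5))
```

Crossover as written can grow trees without bound. Practical systems usually add a depth or size limit and combine crossover with mutation and selection.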

Because it does not require a model to be specified a priori, symbolic regression is less affected by human bias or by unknown gaps in domain knowledge. It attempts to uncover the intrinsic relationships in the dataset by letting the patterns in the data itself reveal appropriate models. This avoids imposing a model structure chosen only because it is mathematically tractable from a human perspective. The fitness function that drives the evolution of the models accounts for two things. Error metrics ensure the models predict the data accurately. Complexity measures[6] ensure that the resulting models reveal the data's underlying structure in a form humans can understand. This facilitates reasoning and improves the odds of gaining insight into the data-generating system.
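
A fitness function that combines an error metric with a complexity measure might look like the following minimal sketch. The specific choices are assumptions, not a standard:

- Mean squared error as the error metric.
- Node count as the complexity measure.
- A hypothetical additive `parsimony` weight of 0.01, which is a simple form of parsimony pressure.

Lower scores are better.

```python
# Expression trees as nested tuples: (op, left, right), the variable "x", or a constant.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def evaluate(tree, x):
    if isinstance(tree, str):
        return x
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return tree  # numeric constant

def size(tree):
    """Complexity measure: number of nodes in the tree."""
    if isinstance(tree, tuple):
        return 1 + size(tree[1]) + size(tree[2])
    return 1

def mse(tree, xs, ys):
    """Error metric: mean squared error over the data set."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fitness(tree, xs, ys, parsimony=0.01):
    """Lower is better: prediction error plus a penalty proportional to size."""
    return mse(tree, xs, ys) + parsimony * size(tree)

# Synthetic data generated by y = x*x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x + 1 for x in xs]

candidates = {
    "x*x + 1": ("+", ("*", "x", "x"), 1.0),                    # exact and compact
    "x*x + (2 - 1)": ("+", ("*", "x", "x"), ("-", 2.0, 1.0)),  # exact but bloated
    "x*x": ("*", "x", "x"),                                    # compact but inaccurate
}
for name, tree in candidates.items():
    print(f"{name:<14} mse={mse(tree, xs, ys):.3f} "
          f"size={size(tree)} fitness={fitness(tree, xs, ys):.3f}")
```

With this weighting, the exact and compact model scores best. The bloated equivalent is penalized only for its extra nodes. The weight is a tuning choice: if it is too large, the search favors trivially small but inaccurate models.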

Difference from classical regression

While conventional regression techniques seek to optimize the parameters for a pre-specified model structure, symbolic regression avoids imposing prior assumptions, and instead infers the model from the data. In other words, it attempts to discover both model structures and model parameters.
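
The contrast can be illustrated with a deliberately tiny example. Classical regression fixes the structure y = a*x + b and fits only a and b. The "structure search" below instead tries several candidate structures of the form y = a*g(x) + b and fits the parameters of each. This is a simplification made for brevity: real symbolic regression builds structures from primitives rather than choosing from a hand-written menu. The data are synthetic, generated from y = 3*log(x) + 2.

```python
import math

def fit_linear(features, ys):
    """Ordinary least squares for y ≈ a*f + b with one feature (closed form).
    Returns (a, b, sum of squared errors)."""
    n = len(ys)
    mf, my = sum(features) / n, sum(ys) / n
    sxx = sum((f - mf) ** 2 for f in features)
    sxy = sum((f - mf) * (y - my) for f, y in zip(features, ys))
    a = sxy / sxx
    b = my - a * mf
    sse = sum((a * f + b - y) ** 2 for f, y in zip(features, ys))
    return a, b, sse

xs = [0.5 * i for i in range(1, 11)]        # 0.5, 1.0, ..., 5.0
ys = [3.0 * math.log(x) + 2.0 for x in xs]  # synthetic "hidden law"

# Classical regression: structure fixed in advance (linear in x); only a, b are fitted.
a, b, sse_linear = fit_linear(xs, ys)

# Toy structure search: fit parameters for each candidate structure, keep the best.
CANDIDATES = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "log(x)": math.log,
    "sqrt(x)": math.sqrt,
    "exp(x)": math.exp,
}
results = {name: fit_linear([g(x) for x in xs], ys) for name, g in CANDIDATES.items()}
best = min(results, key=lambda name: results[name][2])
a_best, b_best, sse_best = results[best]

print(f"fixed linear model: y = {a:.3g}*x + {b:.3g} (SSE {sse_linear:.3g})")
print(f"best structure:     y = {a_best:.3g}*{best} + {b_best:.3g} (SSE {sse_best:.2g})")
```

The fixed linear model leaves a systematic residual, however well its two parameters are tuned. Searching over structures recovers both the log form and its parameters.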

This approach has the disadvantage of a much larger search space. The space of expressions in symbolic regression is infinite, and there are also infinitely many models that fit a finite data set perfectly (unless model complexity is artificially limited). As a result, a symbolic regression algorithm may take longer than traditional regression techniques to find an appropriate model and parametrization. The problem can be reduced by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data. Ultimately, though, the decision to use symbolic regression must be weighed against how much is already known about the underlying system.
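
The claim that infinitely many models fit a finite data set perfectly has a simple constructive argument. Suppose f fits the points (x_1, y_1), ..., (x_n, y_n). Then so does g(x) = f(x) + c(x - x_1)(x - x_2)...(x - x_n) for every constant c, because the added term is zero at each x_i. The sketch below checks this numerically on four hypothetical points generated from f(x) = x^2 + 1. The variants agree exactly on the data yet disagree away from it.

```python
from math import prod

# Hypothetical data generated by f(x) = x**2 + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 5.0, 10.0]

def f(x):
    return x ** 2 + 1

def make_variant(c):
    """g(x) = f(x) + c * (x - x_1)(x - x_2)...(x - x_n); the extra term is zero at every data point."""
    return lambda x: f(x) + c * prod(x - xi for xi in xs)

for c in (0.0, 1.0, -7.5, 100.0):
    g = make_variant(c)
    fits_exactly = all(g(x) == y for x, y in zip(xs, ys))
    print(f"c={c:>6}: fits all data points: {fits_exactly}, prediction at x=4: {g(4.0)}")
```

Every variant has zero error on the data, so error alone cannot choose among them. This is one reason symbolic regression also weighs model complexity.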

Nevertheless, this characteristic of symbolic regression also has advantages. The evolutionary algorithm needs diversity to explore the search space effectively, so its output is likely to be a set of high-scoring models (with their corresponding parameters) rather than a single model. Examining this collection can give better insight into the underlying process. It also lets the user pick the approximation that best matches their needs in terms of accuracy and simplicity.
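
One common way to present such a collection is as a Pareto front over complexity and error. A model is kept only if no other model is at least as simple and at least as accurate while being strictly better in one of the two. The sketch below applies this filter to a hypothetical final population. The expressions, node counts, and error values are invented for illustration and are not output from any tool.

```python
def pareto_front(models):
    """Keep models not dominated in (complexity, error): no other model is at least
    as good in both and strictly better in at least one. Sorted by complexity."""
    front = []
    for name, cx, err in models:
        dominated = any(
            c2 <= cx and e2 <= err and (c2 < cx or e2 < err)
            for _, c2, e2 in models
        )
        if not dominated:
            front.append((name, cx, err))
    return sorted(front, key=lambda m: m[1])

# Hypothetical final population: (expression, node count, mean squared error).
population = [
    ("x", 1, 4.10),
    ("x*x", 3, 0.92),
    ("2*x", 3, 3.70),
    ("x*x + 1", 5, 0.05),
    ("x*x + x - x + 1", 9, 0.05),
    ("x*x + 0.98 + 0.01*sin(x)", 10, 0.049),
]
for name, cx, err in pareto_front(population):
    print(f"{cx:>3}  {err:<6}  {name}")
```

Reading the front from simplest to most accurate shows where extra complexity stops buying meaningful accuracy. In this made-up population, the step from `x*x + 1` to the 10-node model barely reduces the error.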

Software

End-user software

  • HeuristicLab, a software environment for heuristic and evolutionary algorithms, including symbolic regression (free, open source)
  • GeneXProTools, an implementation of the gene expression programming technique for various problems, including symbolic regression (commercial)
  • Multi Expression Programming X, an implementation of multi expression programming for symbolic regression and classification (free, open source)
  • Eureqa, evolutionary symbolic regression software (commercial), and software library
  • PySR, symbolic regression environment written in Python and Julia, using regularized evolution, simulated annealing, and gradient-free optimization (free, open source)

See also

  • Closed-form expression § Conversion from numerical forms
  • Genetic programming
  • Gene expression programming
  • Kolmogorov complexity
  • Linear genetic programming
  • Mathematical optimization
  • Multi expression programming
  • Regression analysis
  • Reverse mathematics
  • Discovery system (AI research)

References

Further reading

  • Mark J. Willis; Hugo G. Hiden; Ben McKay; Gary A. Montague; Peter Marenbach (1997). "Genetic programming: An introduction and survey of applications" (PDF). IEE Conference Publications. IEE. pp. 314–319.
  • Wouter Minnebo; Sean Stijven (2011). "Chapter 4: Symbolic Regression" (PDF). Empowering Knowledge Computing with Variable Selection (M.Sc. thesis). University of Antwerp.
  • John R. Koza; Martin A. Keane; James P. Rice (1993). "Performance improvement of machine learning via automatic discovery of facilitating functions as applied to a problem of symbolic system identification" (PDF). IEEE International Conference on Neural Networks. San Francisco: IEEE. pp. 191–198.

External links

  • (Java applet): approximates a function by evolving combinations of simple arithmetic operators, using algorithms developed by John Koza.

Category:Regression analysis Category:Genetic programming Category:Computer algebra

This page was moved from wikipedia:en:Symbolic regression. Its edit history can be viewed at 符号回归/edithistory