The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity, and increasing power of computer technology have dramatically increased the ability to collect, store, and manipulate data. As data sets have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets. It bridges the gap between applied statistics and artificial intelligence (which usually provide the mathematical background) and database management, exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently and thereby allowing such methods to be applied to ever-larger data sets.