Changes

Data mining (view source)

Revision as of 16:52, 29 August 2020 (Sat)

1,460 bytes added, 16:52, 29 August 2020 (Sat)

No edit summary

Line 17: Line 17:

The term "data mining" is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It is also a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as to any application of computer decision support systems, including artificial intelligence (e.g., machine learning) and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons. Often the more general terms (large scale) data analysis and analytics – or, when referring to actual methods, artificial intelligence and machine learning – are more appropriate.

−

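The label "data mining" is not really apt, because our goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself. It is also a buzzword, often used for any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), as well as for any application of computer decision support systems, including artificial intelligence (such as machine learning) and business intelligence. Data Mining: Practical Machine Learning Tools and Techniques with Java (which mainly covers machine learning material) was originally to be named Practical Machine Learning, and the term data mining was added only for marketing reasons. Often the more general terms (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.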

+

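The label "data mining" is not really apt, because our goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself. It is a buzzword, often used in any setting that involves large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), as well as in any application of '''computer decision support systems (DSS)''', including artificial intelligence (for example, machine learning) and business intelligence. The book "Data Mining: Practical Machine Learning Tools and Techniques with Java" (which mainly covers machine learning material) was originally to be named "Practical Machine Learning", and the term data mining was added only for marketing reasons. Often more general terms such as (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.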

−

Line 25: Line 24:

The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step, but they do belong to the overall knowledge discovery in databases (KDD) process as additional steps.

−

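The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (data clustering), groups of unusual records (anomaly detection), and dependencies (association rule mining, sequence mining). This usually involves using database techniques such as spatial indices. These patterns can be seen as a kind of summary of the input data and can be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, and that step can then be used to obtain more accurate prediction results through a decision support system. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they belong to the overall KDD process as additional steps.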

+

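In practice, the data mining task is the semi-automatic or fully automatic analysis of large quantities of data to extract previously unknown and interesting patterns, such as groups of data records (data clustering), groups of unusual records (anomaly detection), and dependencies (association rule mining, sequence mining). This usually involves using database techniques such as spatial indices. These patterns can be seen as a kind of summary of the input data and can be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which a decision support system can then use to obtain more accurate prediction results. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they belong to the overall KDD process as additional steps.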

−

+

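such as groups of data records ('''cluster analysis'''), unusual records ('''anomaly detection'''), and dependencies ('''association rule mining''', '''sequential pattern mining'''). This usually involves using database techniques such as spatial indices. These patterns can be seen as a summary of the regularities in the input data and can be used in further analysis or, for example, in machine learning and predictive analytics. For example, data mining can identify multiple groups in the data, and a decision support system can then use these groups to obtain more accurate prediction results. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they are additional steps of the overall KDD process.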
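As an illustration of the "groups of data records" idea (cluster analysis), the following is a minimal sketch of one-dimensional k-means (Lloyd's algorithm) in plain Python. The function name `kmeans_1d`, the starting-centroid rule, and the `order_totals` sample data are all assumptions made for this example; they do not come from the article, and real data mining tools use more robust implementations.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=20):
    """Group 1-D values into k clusters with plain k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start (an assumption of this sketch): spread the
    # initial centroids evenly across the sorted data.
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    return centroids, clusters


# Hypothetical sample: order totals that fall into two obvious groups.
order_totals = [12, 15, 14, 13, 98, 102, 95, 101]
centroids, clusters = kmeans_1d(order_totals, k=2)
print(centroids)
print(clusters)
```

The discovered groups are the kind of pattern that a later step, such as a predictive model or a decision support system, could take as input.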

The difference between [[data analysis]] and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.<ref>Olson, D. L. (2007). Data mining in business services. ''Service Business'', ''1''(3), 181-193. {{doi|10.1007/s11628-006-0014-7}}</ref>

Line 33: Line 32:

The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.

−

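The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on a dataset, for example analyzing the effectiveness of a marketing campaign regardless of the amount of data; by comparison, data mining uses machine learning and statistical models to uncover secret or hidden patterns in large volumes of data.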

+

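'''Data analysis''' differs from data mining in that data analysis is used to test models and hypotheses on a dataset, for example analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover secret or hidden patterns in "large" volumes of data.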

Line 43: Line 42:

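The related terms data mining, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used to create new hypotheses to test against the larger data populations.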

+

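The related terms '''data dredging''', "data fishing", and "data snooping" refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used to create new hypotheses to test against the larger data populations.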
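The risk described above can be illustrated with a small simulation. The sketch below, under assumptions of my own (pure Gaussian noise, 200 candidate variables, and the hypothetical helper names `pearson` and `strongest_spurious_correlation`), searches many unrelated variables for the one that correlates best with a target. On a tiny sample the search almost always turns up a strong-looking "pattern"; on a large sample the same search finds essentially nothing.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_spurious_correlation(sample_size, n_candidates, seed=0):
    """Return the largest |r| between a noise target and many noise candidates.

    Every variable is independent random noise, so any correlation
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(sample_size)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(sample_size)]
        best = max(best, abs(pearson(target, candidate)))
    return best


small = strongest_spurious_correlation(sample_size=5, n_candidates=200)
large = strongest_spurious_correlation(sample_size=500, n_candidates=200)
print(f"best |r| with 5 records:   {small:.2f}")
print(f"best |r| with 500 records: {large:.2f}")
```

A pattern found this way is best treated as a hypothesis to be tested on fresh data from the larger population, not as a finding in itself.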

Yillia Jing

463 edits
