Changes

Data mining

Revision as of 13:28, 11 September 2020 (Fri)

420 bytes added, 13:28, 11 September 2020 (Fri)

No edit summary

Line 20: Line 20:

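The term "data mining" is something of a misnomer, because the goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself. It is also a buzzword, frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as to any application of a '''<font color="#ff8000">computer decision support system (DSS)</font>''', including artificial intelligence (for example, machine learning) and business intelligence. The book ''Data Mining: Practical Machine Learning Tools and Techniques with Java'' (which covers mostly machine learning material) was originally to be named just ''Practical Machine Learning'', and the term "data mining" was added only for marketing reasons. Often the more general terms (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.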

+

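--[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] In the Chinese translation, change the sentence “经常更一般的术语例如（大规模）数据分析和分析——或当提到实际的方法时使用人工智能和机器学习这样的词语更加合适” to “经常来说，更一般的术语如（大规模）数据分析，或实际的方法如人工智能和机器学习，是更合适的表达方式” (roughly: "Often, more general terms such as (large-scale) data analysis, or actual methods such as artificial intelligence and machine learning, are the more appropriate expressions").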

The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records ([[cluster analysis]]), unusual records ([[anomaly detection]]), and dependencies ([[association rule mining]], [[sequential pattern mining]]). This usually involves using database techniques such as [[spatial index|spatial indices]]. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and [[predictive analytics]]. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a [[decision support system]]. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step, although they do belong to the overall knowledge discovery in databases (KDD) process as additional steps.
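The idea of identifying groups of data records can be illustrated with a small sketch. The example below uses k-means (Lloyd's algorithm) on one-dimensional values, which is only one of many cluster analysis methods and is chosen here as an assumption for illustration. The function name `kmeans_1d`, the sample records, and the starting centers are all hypothetical.

```python
from statistics import mean


def kmeans_1d(values, centers, iterations=10):
    """Group numeric records around the nearest center (Lloyd's algorithm)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each record joins the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, groups


# Hypothetical records with two visibly separate groups.
records = [1.0, 1.2, 0.8, 9.7, 10.1, 10.3]
centers, groups = kmeans_1d(records, centers=[0.0, 5.0])
print(groups)
```

The resulting groups act as a summary of the input data. A later step, such as a predictive model, could treat the group of each record as an extra feature.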

Zengsihang, 12 edits