Changes

11,354 bytes removed, 09:43, 22 September 2020 (Tue)
No edit summary
Line 12: Line 12:       −
The term "data mining" is a [[misnomer]], because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (''mining'') of data itself.<ref name="han-kamber">{{cite book|title=Data mining: concepts and techniques|last1=Han|first1=Jiawei|last2=Kamber|first2=Micheline|date=2001|publisher=[[Morgan Kaufmann]]|isbn=978-1-55860-489-6|page=5|quote=Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long|authorlink1=Jiawei Han}}</ref> It also is a [[buzzword]]<ref>[http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf OKAIRP 2005 Fall Conference, Arizona State University] {{Webarchive|url=https://web.archive.org/web/20140201170452/http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf|date=2014-02-01}}</ref> and is frequently applied to any form of large-scale data or [[information processing]] ([[Data collection|collection]], [[information extraction|extraction]], [[Data warehouse|warehousing]], analysis, and statistics) as well as any application of [[Decision support system|computer decision support system]], including [[artificial intelligence]] (e.g., machine learning) and [[business intelligence]]. The book ''Data mining: Practical machine learning tools and techniques with Java''<ref name="witten">{{cite book|title=Data Mining: Practical Machine Learning Tools and Techniques|last1=Witten|first1=Ian H.|last2=Frank|first2=Eibe|last3=Hall|first3=Mark A.|date=30 January 2011|publisher=Elsevier|isbn=978-0-12-374856-0|edition=3|authorlink1=Ian H. Witten}}</ref> (which covers mostly machine learning material) was originally to be named just ''Practical machine learning'', and the term ''data mining'' was only added for marketing reasons.<ref>{{Cite journal|author1=Bouckaert, Remco R.|author2=Frank, Eibe|author3=Hall, Mark A.|author4=Holmes, Geoffrey|author5=Pfahringer, Bernhard|author6=Reutemann, Peter|author7=Witten, Ian H.|authorlink7=Ian H. Witten|year=2010|title=WEKA Experiences with a Java open-source project|journal=Journal of Machine Learning Research|volume=11|pages=2533–2541|quote=the original title, "Practical machine learning", was changed&nbsp;... The term "data mining" was [added] primarily for marketing reasons.|postscript={{inconsistent citations}}}}</ref> Often the more general terms (''large scale'') ''[[data analysis]]'' and ''[[analytics]]'' – or, when referring to actual methods, ''artificial intelligence'' and ''machine learning'' – are more appropriate.
+
The term "data mining" is a [[misnomer]], because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (''mining'') of data itself. It also is a [[buzzword]] and is frequently applied to any form of large-scale data or [[information processing]] ([[Data collection|collection]], [[information extraction|extraction]], [[Data warehouse|warehousing]], analysis, and statistics) as well as any application of [[Decision support system|computer decision support system]], including [[artificial intelligence]] (e.g., machine learning) and [[business intelligence]]. The book ''Data mining: Practical machine learning tools and techniques with Java'' (which covers mostly machine learning material) was originally to be named just ''Practical machine learning'', and the term ''data mining'' was only added for marketing reasons. Often the more general terms (''large scale'') ''[[data analysis]]'' and ''[[analytics]]'' – or, when referring to actual methods, ''artificial intelligence'' and ''machine learning'' – are more appropriate.
    
The term "data mining" is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction (mining) of data itself. It also is a buzzword and is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support system, including artificial intelligence (e.g., machine learning) and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons. Often the more general terms (large scale) data analysis and analytics – or, when referring to actual methods, artificial intelligence and machine learning – are more appropriate.
 
   −
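The label "data mining" is actually not entirely apt, because our goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself. It is a buzzword, often used in any setting that involves large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics), as well as in any application of a '''<font color="#ff8000">decision support system (DSS)</font>''', including artificial intelligence (e.g., machine learning) and business intelligence. The book ''Data mining: Practical machine learning tools and techniques with Java'' (which mostly covers machine learning material) was originally titled ''Practical machine learning'', and the term "data mining" was added only for marketing reasons. Often more general terms such as (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate.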
+
The label "data mining" is actually not '''very''' apt, because our goal is to extract patterns and knowledge from large amounts of data, not to extract (mine) the data itself.<ref name="han-kamber">{{cite book|title=Data mining: concepts and techniques|last1=Han|first1=Jiawei|last2=Kamber|first2=Micheline|date=2001|publisher=[[Morgan Kaufmann]]|isbn=978-1-55860-489-6|page=5|quote=Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long|authorlink1=Jiawei Han}}</ref> It is a buzzword that is often applied to all kinds of large-scale data or information processing (collection, extraction, storage, analysis, and statistics), as well as to '''<font color="#ff8000">decision support systems (DSS)</font>''', including artificial intelligence (e.g., machine learning) and business intelligence.<ref>[http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf OKAIRP 2005 Fall Conference, Arizona State University] {{Webarchive|url=https://web.archive.org/web/20140201170452/http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf|date=2014-02-01}}</ref> The book ''Data mining: Practical machine learning tools and techniques with Java''<ref name="witten">{{cite book|title=Data Mining: Practical Machine Learning Tools and Techniques|last1=Witten|first1=Ian H.|last2=Frank|first2=Eibe|last3=Hall|first3=Mark A.|date=30 January 2011|publisher=Elsevier|isbn=978-0-12-374856-0|edition=3|authorlink1=Ian H. Witten}}</ref> (which mostly provides machine learning material) was originally titled ''Practical machine learning'', and the term "data mining" was added only to boost sales.<ref>{{Cite journal|author1=Bouckaert, Remco R.|author2=Frank, Eibe|author3=Hall, Mark A.|author4=Holmes, Geoffrey|author5=Pfahringer, Bernhard|author6=Reutemann, Peter|author7=Witten, Ian H.|authorlink7=Ian H. Witten|year=2010|title=WEKA Experiences with a Java open-source project|journal=Journal of Machine Learning Research|volume=11|pages=2533–2541|quote=the original title, "Practical machine learning", was changed&nbsp;... The term "data mining" was [added] primarily for marketing reasons.|postscript={{inconsistent citations}}}}</ref> Often, more general terms such as (large-scale) data analysis, or the names of actual methods such as artificial intelligence and machine learning, are the more appropriate expressions.
    
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] Changed the sentence "Often more general terms such as (large-scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate" to "Often, more general terms such as (large-scale) data analysis, or the names of actual methods such as artificial intelligence and machine learning, are the more appropriate expressions".
Line 32: Line 32:
The actual data mining task is the semi-automatic or automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining, sequential pattern mining). This usually involves using database techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data, and may be used in further analysis or, for example, in machine learning and predictive analytics. For example, the data mining step might identify multiple groups in the data, which can then be used to obtain more accurate prediction results by a decision support system. Neither the data collection, data preparation, nor result interpretation and reporting is part of the data mining step, but do belong to the overall KDD process as additional steps.
 
   −
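The actual data mining task is the semi-automatic or fully automatic analysis of large quantities of data to extract previously unknown and interesting patterns, such as groups of data records (data clustering), groups of unusual records (anomaly detection), and dependencies (association rule mining, sequence mining). This usually involves database techniques such as spatial indices. These patterns can be regarded as a kind of summary of the input data and can be used for further analysis, for example in machine learning and predictive analytics. For example, the data mining step can identify multiple groups in the data, which a decision support system can then use to obtain more accurate predictions. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they belong to the overall KDD process as additional steps.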
+
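The actual data mining task is the semi-automatic or fully automatic analysis of large quantities of data to uncover previously unknown and novel patterns, such as groups of data records (data clustering), groups of unusual records (anomaly detection), and dependencies (association rule mining, sequence mining). This usually involves database techniques such as spatial indices. These patterns can be regarded as a kind of summary of the input data and can be used for further analysis, for example in machine learning and predictive analytics. For example, in the course of data mining the data can be divided into multiple groups, which a decision support system can then use to obtain more accurate predictions. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they belong to the overall KDD process as additional steps.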
    
such as groups of data records ('''<font color="#ff8000">cluster analysis</font>'''), unusual records ('''<font color="#ff8000">anomaly detection</font>'''), and dependencies ('''<font color="#ff8000">association rule mining</font>''', '''<font color="#ff8000">sequential pattern mining</font>'''). This usually involves database techniques such as spatial indices. These patterns can be regarded as a summary of the regularities in the input data and can be used in further analysis or, for example, in machine learning and predictive analytics. For example, data mining can identify multiple groups in the data, and a decision support system can then use these groups to obtain more accurate predictions. Data collection, data preparation, and result interpretation and reporting are not part of the data mining step; they are additional steps of the overall KDD process.
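As a rough illustration of one pattern type named above, unusual records (anomaly detection), the following is a minimal sketch rather than a reference method. The function name `find_anomalies`, the z-score rule, the threshold of 2.0, and the sample readings are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`.

    Hypothetical helper: a z-score rule is only one simple way to flag
    unusual records and is not prescribed by the article.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Invented sample data: seven similar readings and one outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(find_anomalies(readings))
```

A rule like this only summarizes the input; interpreting why a record is unusual remains part of the later KDD steps.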
Line 39: Line 39:
   --[[用户:Thingamabob|Thingamabob]] ([[用户讨论:Thingamabob|talk]]) [Review] Changed "For example, the data mining step can identify multiple groups in the data" to "For example, in the course of data mining the data can be divided into multiple groups".
  −
The difference between [[data analysis]] and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.<ref>Olson, D. L. (2007). Data mining in business services. ''Service Business'', ''1''(3), 181-193. {{doi|10.1007/s11628-006-0014-7}}</ref>
  −
  −
The difference between data analysis and data mining is that data analysis is used to test models and hypotheses on the dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover clandestine or hidden patterns in a large volume of data.
      
'''<font color="#ff8000">Data analysis</font>''' differs from data mining in that data analysis is used to test models and hypotheses on a dataset, e.g., analyzing the effectiveness of a marketing campaign, regardless of the amount of data; in contrast, data mining uses machine learning and statistical models to uncover secret or hidden patterns in a "large volume" of data.
  −
      
The related terms ''[[data dredging]]'', ''data fishing'', and ''data snooping'' refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.
 
Line 52: Line 46:
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences to be made about the validity of any patterns discovered. These methods can, however, be used in creating new hypotheses to test against the larger data populations.
 
   −
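The related terms '''<font color="#ff8000">data dredging</font>''', "data fishing", and "data snooping" refer to the use of data mining methods to sample part of a larger demographic data set; these samples may be too small to allow reliable statistical inference about the validity of any patterns discovered. However, these methods can be used to generate new hypotheses to test against larger data populations.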
+
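The related terms '''<font color="#ff8000">data dredging</font>''', "data fishing", and "data snooping" refer to the use of data mining methods to sample part of a larger-scale data set; these samples may be too small to allow reliable statistical inference about the validity of the patterns discovered. However, these methods can be used to generate new hypotheses to test against larger data populations.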
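The risk described above can be sketched with a small simulation. This is an illustrative assumption, not a procedure from the article: the helper names `pearson` and `dredge`, the sample sizes, and the seed are invented. Many purely random features are compared with a random target on a tiny sample, and the best-looking one is then re-checked on the full data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def dredge(n_features=200, n_small=10, n_large=1000, seed=0):
    """Pick the random feature that best matches a random target on a
    small sample, then report its correlation on the small sample and on
    the full data. All inputs are noise, so any apparent pattern is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_large)]
    features = [[rng.gauss(0, 1) for _ in range(n_large)]
                for _ in range(n_features)]
    best = max(features,
               key=lambda f: abs(pearson(f[:n_small], target[:n_small])))
    small_r = abs(pearson(best[:n_small], target[:n_small]))
    full_r = abs(pearson(best, target))
    return small_r, full_r


small_r, full_r = dredge()
print(f"best |r| on the small sample: {small_r:.2f}")
print(f"same feature on the full data: {full_r:.2f}")
```

With settings like these, the correlation found on the small sample would typically look much stronger than the one measured on the full data, which is why a dredged pattern is better treated as a hypothesis to test than as a finding.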
    
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] In "use data mining methods to sample part of a larger demographic data set", changed "larger demographic data set" to "larger-scale data set".
Line 61: Line 55:
==Etymology==
   −
In the 1960s, statisticians and economists used terms like ''data fishing'' or ''data dredging'' to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis. The term "data mining" was used in a similarly critical way by economist [[Michael Lovell]] in an article published in ''[[The Review of Economics and Statistics]]'' in 1983.<ref>{{Cite journal|last=Lovell|first=Michael C.|date=1983|title=Data Mining|journal=The Review of Economics and Statistics|volume=65|issue=1|pages=1–12|doi=10.2307/1924403|jstor=1924403}}</ref><ref>{{cite book |first=Wojciech W. |last=Charemza |first2=Derek F. |last2=Deadman |title=New Directions in Econometric Practice |location=Aldershot |publisher=Edward Elgar |year=1992 |chapter=Data Mining |pages=14–31 |isbn=1-85278-461-X }}</ref> Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)".
+
In the 1960s, statisticians and economists used terms such as "data fishing" or "data dredging" to refer to what they considered the bad practice of analyzing data without an a priori hypothesis. The economist Michael Lovell used the term "data mining" in a similarly critical way in an article published in ''The Review of Economics and Statistics'' in 1983.<ref>{{Cite journal|last=Lovell|first=Michael C.|date=1983|title=Data Mining|journal=The Review of Economics and Statistics|volume=65|issue=1|pages=1–12|doi=10.2307/1924403|jstor=1924403}}</ref><ref>{{cite book |first=Wojciech W. |last=Charemza |first2=Derek F. |last2=Deadman |title=New Directions in Econometric Practice |location=Aldershot |publisher=Edward Elgar |year=1992 |chapter=Data Mining |pages=14–31 |isbn=1-85278-461-X }}</ref> Lovell points out that the practice goes by many aliases, such as the positive "experimentation" and the negative "fishing" and "snooping".
 
  −
In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis. The term "data mining" was used in a similarly critical way by economist Michael Lovell in an article published in The Review of Economics and Statistics in 1983. Lovell indicates that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)".
  −
 
  −
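In the 1960s, statisticians and economists used terms such as "data fishing" or "data dredging" to refer to what they considered the bad practice of analyzing data without an a priori hypothesis. The economist Michael Lovell used the term "data mining" in a similarly critical way in an article published in ''The Review of Economics and Statistics'' in 1983. Lovell points out that the practice "masquerades under a variety of aliases, ranging from 'experimentation' (positive) to 'fishing' or 'snooping' (negative)".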
      
   --[[用户:Thingamabob|Thingamabob]] ([[用户讨论:Thingamabob|talk]]) [Review] Changed "the practice 'masquerades under a variety of aliases, ranging from experimentation (positive) to fishing or snooping (negative)'" to "the practice goes by many aliases, such as the positive 'experimentation' and the negative 'fishing' and 'snooping'".
   −
The term ''data mining'' appeared around 1990 in the database community, generally with positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but because it had been trademarked by HNC, a San Diego-based company, to pitch their Database Mining Workstation,<ref name="Mena">{{cite book |last=Mena |first=Jesús |year=2011 |title=Machine Learning Forensics for Law Enforcement, Security, and Intelligence |location=Boca Raton, FL |publisher=CRC Press (Taylor & Francis Group) |isbn=978-1-4398-6069-4 }}</ref> researchers turned to ''data mining''. Other terms used include ''data archaeology'', ''information harvesting'', ''information discovery'', ''knowledge extraction'', etc. [[Gregory I. Piatetsky-Shapiro|Gregory Piatetsky-Shapiro]] coined the term "knowledge discovery in databases" for the first workshop on the same topic [http://www.kdnuggets.com/meetings/kdd89/ (KDD-1989)], and this term became more popular in the [[Artificial intelligence|AI]] and [[machine learning]] communities. However, the term data mining became more popular in the business and press communities.<ref>{{cite web |last1=Piatetsky-Shapiro |first1=Gregory |authorlink1=Gregory Piatetsky-Shapiro |last2=Parker |first2=Gary |url=http://www.kdnuggets.com/data_mining_course/x1-intro-to-data-mining-notes.html |title=Lesson: Data Mining, and Knowledge Discovery: An Introduction |publisher=KD Nuggets |year=2011 |work=Introduction to Data Mining |accessdate=30 August 2012 }}</ref> Currently, the terms ''data mining'' and ''knowledge discovery'' are used interchangeably.
+
The term "data mining" appeared in the database community around 1990, generally with positive connotations. For a short time in the 1980s the phrase "database mining" was used, but because it had been registered as a trademark by HNC, a San Diego company, researchers switched to the term "data mining".<ref name="Mena">{{cite book |last=Mena |first=Jesús |year=2011 |title=Machine Learning Forensics for Law Enforcement, Security, and Intelligence |location=Boca Raton, FL |publisher=CRC Press (Taylor & Francis Group) |isbn=978-1-4398-6069-4 }}</ref> Other terms that have been used include data archaeology, information harvesting, information discovery, and knowledge extraction. Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" (KDD) for the first workshop on the topic [http://www.kdnuggets.com/meetings/kdd89/ (KDD-1989)]. The term subsequently became more popular in the artificial intelligence and machine learning communities, whereas the term "data mining" became more popular in the business and press communities.<ref>{{cite web |last1=Piatetsky-Shapiro |first1=Gregory |authorlink1=Gregory Piatetsky-Shapiro |last2=Parker |first2=Gary |url=http://www.kdnuggets.com/data_mining_course/x1-intro-to-data-mining-notes.html |title=Lesson: Data Mining, and Knowledge Discovery: An Introduction |publisher=KD Nuggets |year=2011 |work=Introduction to Data Mining |accessdate=30 August 2012 }}</ref> Currently, the terms "data mining" and "knowledge discovery" are used interchangeably.
 
  −
The term data mining appeared around 1990 in the database community, generally with positive connotations. For a short time in the 1980s, the phrase "database mining"™ was used, but because it had been trademarked by HNC, a San Diego-based company, to pitch their Database Mining Workstation, researchers turned to data mining. Other terms used include data archaeology, information harvesting, information discovery, knowledge extraction, etc. Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" for the first workshop on the same topic (KDD-1989), and this term became more popular in the AI and machine learning communities. However, the term data mining became more popular in the business and press communities. Currently, the terms data mining and knowledge discovery are used interchangeably.
  −
 
  −
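The term "data mining" appeared in the database community around 1990, generally with positive connotations. For a short time in the 1980s the phrase "database mining" was used, but because it had been registered as a trademark by HNC, a company headquartered in San Diego, researchers turned to data mining. Other terms that have been used include data archaeology, information harvesting, information discovery, and knowledge extraction. Gregory Piatetsky-Shapiro coined the term "knowledge discovery in databases" (KDD) for the first workshop on the topic [http://www.kdnuggets.com/meetings/kdd89/ (KDD-1989)]. The term subsequently became more popular in the artificial intelligence and machine learning fields, whereas the term "data mining" became more popular in the business and press communities. Currently, the terms "data mining" and "knowledge discovery" are used interchangeably.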
      
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] In "but because it had been registered as a trademark by HNC, a company headquartered in San Diego", changed "HNC, a company headquartered in San Diego" to "HNC, a San Diego company".
Line 83: Line 69:
   --[[用户:Thingamabob|Thingamabob]] ([[用户讨论:Thingamabob|talk]]) [Review] Changed "researchers turned to data mining" to "researchers switched to the term data mining".
   −
In the academic community, the major forums for research started in 1995 when the First International Conference on Data Mining and Knowledge Discovery ([[KDD-95]]) was held in Montreal under [[AAAI]] sponsorship. It was co-chaired by [[Usama Fayyad]] and Ramasamy Uthurusamy. A year later, in 1996, Usama Fayyad launched the Kluwer journal [[Data Mining and Knowledge Discovery]] as its founding editor-in-chief. Later he started the [[SIGKDD]] newsletter ''SIGKDD Explorations''.<ref name=SIGKDD-explorations>{{cite journal|last1=Fayyad|first1=Usama|title=First Editorial by Editor-in-Chief|journal=SIGKDD Explorations|date=15 June 1999|volume=13|issue=1|pages=102|doi=10.1145/2207243.2207269|url=http://www.kdd.org/explorations/view/june-1999-volume-1-issue-1|accessdate=27 December 2010|ref=SIGKDD-explorations}}</ref> The KDD international conference became the leading, highest-quality conference in data mining, with an acceptance rate for research paper submissions below 18%. The journal ''Data Mining and Knowledge Discovery'' is the primary research journal of the field.
+
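In the academic community, the major research forums began in 1995, when the First International Conference on Data Mining and Knowledge Discovery (KDD-95) was held in Montreal under AAAI sponsorship. The conference was co-chaired by Usama Fayyad and Ramasamy Uthurusamy. A year later, in 1996, Usama Fayyad founded the journal ''Data Mining and Knowledge Discovery'' and served as its founding editor-in-chief. Later he started the SIGKDD newsletter ''SIGKDD Explorations''.<ref name=SIGKDD-explorations>{{cite journal|last1=Fayyad|first1=Usama|title=First Editorial by Editor-in-Chief|journal=SIGKDD Explorations|date=15 June 1999|volume=13|issue=1|pages=102|doi=10.1145/2207243.2207269|url=http://www.kdd.org/explorations/view/june-1999-volume-1-issue-1|accessdate=27 December 2010|ref=SIGKDD-explorations}}</ref> The KDD international conference became the leading, highest-quality conference in data mining, with an acceptance rate for research paper submissions below 18%, and ''Data Mining and Knowledge Discovery'' became the primary research journal of the field.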
 
  −
 
  −
In the academic community, the major forums for research started in 1995 when the First International Conference on Data Mining and Knowledge Discovery (KDD-95) was held in Montreal under AAAI sponsorship. It was co-chaired by Usama Fayyad and Ramasamy Uthurusamy. A year later, in 1996, Usama Fayyad launched the Kluwer journal Data Mining and Knowledge Discovery as its founding editor-in-chief. Later he started the SIGKDD newsletter SIGKDD Explorations. The KDD international conference became the leading, highest-quality conference in data mining, with an acceptance rate for research paper submissions below 18%. The journal Data Mining and Knowledge Discovery is the primary research journal of the field.
  −
 
  −
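In the academic community, the major research forums began in 1995, when the First International Conference on Data Mining and Knowledge Discovery (KDD-95) was held in Montreal under AAAI sponsorship. The conference was co-chaired by Usama Fayyad and Ramasamy Uthurusamy. A year later, in 1996, Usama Fayyad founded the journal ''Data Mining and Knowledge Discovery'' and served as its founding editor-in-chief. Later he started the SIGKDD newsletter ''SIGKDD Explorations''. The KDD international conference became the leading, highest-quality conference in data mining, with an acceptance rate for research paper submissions below 18%, and ''Data Mining and Knowledge Discovery'' became the primary research journal of the field.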
Line 94: Line 75:
==Background==
   −
The manual extraction of patterns from [[data]] has occurred for centuries. Early methods of identifying patterns in data include [[Bayes' theorem]] (1700s) and [[regression analysis]] (1800s). The proliferation, ubiquity and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability. As [[data set]]s have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as [[neural networks]], [[cluster analysis]], [[genetic algorithms]] (1950s), [[decision tree learning|decision trees]] and [[decision rules]] (1960s), and [[support vector machines]] (1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns<ref name="Kantardzic">{{cite book |last=Kantardzic |first=Mehmed |title=Data Mining: Concepts, Models, Methods, and Algorithms |year=2003 |publisher=John Wiley & Sons |isbn=978-0-471-22852-3 |oclc=50055336 |url-access=registration |url=https://archive.org/details/dataminingconcep0000kant }}</ref> in large data sets. It bridges the gap from [[applied statistics]] and artificial intelligence (which usually provide the mathematical background) to [[database management]] by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever-larger data sets.
     −
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity and increasing power of computer technology have dramatically increased data collection, storage, and manipulation ability. As data sets have grown in size and complexity, direct "hands-on" data analysis has increasingly been augmented with indirect, automated data processing, aided by other discoveries in computer science, especially in the field of machine learning, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and decision rules (1960s), and support vector machines (1990s). Data mining is the process of applying these methods with the intention of uncovering hidden patterns in large data sets. It bridges the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management by exploiting the way data is stored and indexed in databases to execute the actual learning and discovery algorithms more efficiently, allowing such methods to be applied to ever-larger data sets.
+
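The manual extraction of patterns from data has gone on for centuries. Early methods of identifying patterns in data include '''<font color="#ff8000">Bayes' theorem</font>''' (1700s) and '''<font color="#ff8000">regression analysis</font>''' (1800s). The widespread use and growing power of computer technology have greatly increased the ability to collect, store, and manipulate data. As data sets have grown in size and complexity, direct, manual data analysis has increasingly been augmented by more powerful indirect, automated data processing, thanks to advances elsewhere in computer science, especially in machine learning, such as '''<font color="#ff8000">neural networks</font>''', '''<font color="#ff8000">cluster analysis</font>''', '''<font color="#ff8000">genetic algorithms</font>''' (1950s), '''<font color="#ff8000">decision trees</font>''' and '''<font color="#ff8000">decision rules</font>''' (1960s), and '''<font color="#ff8000">support vector machines</font>''' (1990s). Data mining is the process of applying these methods to uncover hidden patterns<ref name="Kantardzic">{{cite book |last=Kantardzic |first=Mehmed |title=Data Mining: Concepts, Models, Methods, and Algorithms |year=2003 |publisher=John Wiley & Sons |isbn=978-0-471-22852-3 |oclc=50055336 |url-access=registration |url=https://archive.org/details/dataminingconcep0000kant }}</ref> in large data sets. By exploiting the way data is stored and indexed in databases, it executes the actual learning and discovery algorithms more efficiently, bridging the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management and allowing these methods to be applied to ever-larger data sets.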
 
  −
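The manual extraction of patterns from data has gone on for centuries. Early methods of identifying patterns in data include '''<font color="#ff8000">Bayes' theorem</font>''' (1700s) and '''<font color="#ff8000">regression analysis</font>''' (1800s). The proliferation, ubiquity, and increasing power of computer technology have greatly increased the ability to collect, store, and manipulate data. As data sets have grown in size and complexity, manual data analysis has increasingly been augmented by more powerful indirect, automated data processing, thanks to advances elsewhere in computer science, especially in machine learning, such as '''<font color="#ff8000">neural networks</font>''', '''<font color="#ff8000">cluster analysis</font>''', '''<font color="#ff8000">genetic algorithms</font>''' (1950s), '''<font color="#ff8000">decision trees</font>''' and '''<font color="#ff8000">decision rules</font>''' (1960s), and '''<font color="#ff8000">support vector machines</font>''' (1990s). Data mining is the process of applying these methods to uncover hidden patterns in large data sets. By exploiting the way data is stored and indexed in databases, it executes the actual learning and discovery algorithms more efficiently, bridging the gap from applied statistics and artificial intelligence (which usually provide the mathematical background) to database management and allowing these methods to be applied to ever-larger data sets.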
      
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] In "manual data analysis has increasingly been augmented by more powerful indirect, automated data processing", changed "manual data analysis" to "direct, manual data analysis".
Line 105: Line 83:
==Process==
   −
The ''knowledge discovery in databases (KDD) process'' is commonly defined with the stages:
  −
  −
The knowledge discovery in databases (KDD) process is commonly defined with the stages:
      
The knowledge discovery in databases (KDD) process is commonly defined by the following stages:
       +
# Selection
   −
# Selection
+
# Pre-processing
   −
Selection
+
# Transformation
   −
Selection
+
# Data mining
   −
# Pre-processing
+
# Interpretation/evaluation.
 
  −
Pre-processing
  −
 
  −
Pre-processing
  −
 
  −
# Transformation
  −
 
  −
Transformation
  −
 
  −
Transformation
  −
 
  −
# ''Data mining''
  −
 
  −
Data mining
  −
 
  −
Data mining
  −
 
  −
# Interpretation/evaluation.<ref name="Fayyad" />
  −
 
  −
Interpretation/evaluation.
  −
 
  −
Interpretation/evaluation.
  −
 
  −
 
  −
 
  −
It exists, however, in many variations on this theme, such as the [[Cross-industry standard process for data mining]] (CRISP-DM) which defines six phases:
  −
 
  −
It exists, however, in many variations on this theme, such as the Cross-industry standard process for data mining (CRISP-DM) which defines six phases:
      
Many variations on this theme exist, however, such as the '''<font color="#ff8000">Cross-industry standard process for data mining (CRISP-DM)</font>''', which defines six phases:
    +
# Business understanding
   −
# Business understanding
+
# Data understanding
 
  −
Business understanding
  −
 
  −
Business understanding
  −
 
  −
# Data understanding
  −
 
  −
Data understanding
  −
 
  −
Data understanding
  −
 
  −
# Data preparation
  −
 
  −
Data preparation
  −
 
  −
Data preparation
     −
# Modeling
+
# Data preparation
   −
Modeling
+
# Modeling
   −
Modeling
+
# Evaluation
   −
# Evaluation
+
# Deployment
 
  −
Evaluation
  −
 
  −
Evaluation
  −
 
  −
# Deployment
  −
 
  −
Deployment
  −
 
  −
Deployment
  −
 
  −
 
  −
 
  −
or a simplified process such as (1) Pre-processing, (2) Data Mining, and (3) Results Validation.
  −
 
  −
or a simplified process such as (1) Pre-processing, (2) Data Mining, and (3) Results Validation.
      
or a simplified process such as (1) pre-processing, (2) data mining, and (3) results validation.
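The simplified three-step process can be sketched in a few lines. This is a hedged illustration only: the function names `preprocess`, `mine_frequent_pairs`, and `validate`, the choice of frequent item pairs as the mining step, the 0.5 support threshold, and the basket data are all assumptions made for the example, not part of any standard named above.

```python
from collections import Counter
from itertools import combinations


def preprocess(baskets):
    """(1) Pre-processing: drop empty records and normalize item names."""
    cleaned = []
    for basket in baskets:
        items = {item.strip().lower() for item in basket
                 if item and item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def mine_frequent_pairs(baskets, min_support):
    """(2) Data mining: item pairs found in at least `min_support`
    (a fraction between 0 and 1) of the baskets."""
    counts = Counter()
    for items in baskets:
        counts.update(combinations(sorted(items), 2))
    n = len(baskets)
    return {pair for pair, c in counts.items() if c / n >= min_support}


def validate(pairs, holdout, min_support):
    """(3) Results validation: keep only pairs that are also frequent
    in held-out data that the mining step never saw."""
    return pairs & mine_frequent_pairs(holdout, min_support)


# Invented raw records, including messy names, an empty basket and a None.
raw = [
    ["Bread", "milk "], ["bread", "milk", "eggs"], [], ["milk", "bread"],
    ["eggs", None], ["bread", "milk"], ["eggs", "bread"],
    ["milk", "bread", "eggs"],
]
clean = preprocess(raw)
train, holdout = clean[:4], clean[4:]
found = mine_frequent_pairs(train, 0.5)
confirmed = validate(found, holdout, 0.5)
print(found, confirmed)
```

Keeping validation on separate data is the design point here: a pair that is frequent only in the records used for mining would be dropped at step (3).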
   −
 
+
Polls conducted in 2002, 2004, 2007, and 2014 show that CRISP-DM is the methodology most commonly used by data miners.<ref>[[Gregory Piatetsky-Shapiro]] (2002) [http://www.kdnuggets.com/polls/2002/methodology.htm ''KDnuggets Methodology Poll''], [[Gregory Piatetsky-Shapiro]] (2004) [http://www.kdnuggets.com/polls/2004/data_mining_methodology.htm ''KDnuggets Methodology Poll''], [[Gregory Piatetsky-Shapiro]] (2007) [http://www.kdnuggets.com/polls/2007/data_mining_methodology.htm ''KDnuggets Methodology Poll''], [[Gregory Piatetsky-Shapiro]] (2014) [http://www.kdnuggets.com/polls/2014/analytics-data-mining-data-science-methodology.html ''KDnuggets Methodology Poll'']</ref> The only other data mining standard named in these polls was SEMMA; however, 3–4 times as many people reported using CRISP-DM. Several teams of researchers have published reviews of data mining process models,<ref name="kurgan">Lukasz Kurgan and Petr Musilek (2006); [http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=451120 ''A survey of Knowledge Discovery and Data Mining process models'']. The Knowledge Engineering Review. Volume 21 Issue 1, March 2006, pp&nbsp;1–24, Cambridge University Press, New York, NY, USA {{DOI|10.1017/S0269888906000737}}</ref> and Azevedo and Santos compared the CRISP-DM and SEMMA process standards in 2008.<ref name="AzevedoSantos">Azevedo, A. and Santos, M. F. [http://www.iadis.net/dl/final_uploads/200812P033.pdf KDD, SEMMA and CRISP-DM: a parallel overview] {{webarchive|url=https://web.archive.org/web/20130109114939/http://www.iadis.net/dl/final_uploads/200812P033.pdf |date=2013-01-09 }}. In Proceedings of the IADIS European Conference on Data Mining 2008, pp&nbsp;182–185.</ref>
 
      
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] Added at the beginning: "Polls conducted in 2002, 2004, 2007 and 2014 show that the CRISP-DM methodology is the leading methodology used by data miners."
    
===Pre-processing===
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a [[data mart]] or [[data warehouse]]. Pre-processing is essential for analyzing [[Multivariate statistics|multivariate]] data sets before data mining. The target set is then cleaned: data cleaning removes the observations containing [[statistical noise|noise]] and those with [[missing data]].
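The cleaning step described above can be illustrated with a minimal sketch. It makes several assumptions that do not come from the article: records are tuples of numbers, a missing value is represented by <code>None</code>, and "noise" is approximated by a modified z-score based on the median absolute deviation, with a hypothetical cutoff of 3.5. The function name <code>clean</code> and its parameter are illustrative only; real projects choose their own definition of noise.

```python
from statistics import median


def clean(records, noise_threshold=3.5):
    """Drop rows with missing values, then rows that look noisy in any column.

    Assumption: "noise" is approximated by a modified z-score computed from
    the median absolute deviation (MAD). This is one of many possible rules.
    """
    # Step 1: remove observations with missing data (None in any field).
    complete = [r for r in records if all(v is not None for v in r)]
    if not complete:
        return []

    # Step 2: flag observations whose value in some column is far from
    # that column's median, relative to the column's MAD.
    keep = [True] * len(complete)
    for c in range(len(complete[0])):
        col = [r[c] for r in complete]
        med = median(col)
        mad = median(abs(v - med) for v in col)
        if mad == 0:
            continue  # column has no measurable spread; nothing to flag
        for i, v in enumerate(col):
            if 0.6745 * abs(v - med) / mad > noise_threshold:
                keep[i] = False

    return [r for r, k in zip(complete, keep) if k]


records = [
    (1.0, 10.0),
    (1.1, 11.0),
    (None, 10.5),   # missing value: removed in step 1
    (0.9, 9.5),
    (1.0, 500.0),   # extreme second field: removed in step 2
    (1.2, 10.2),
]
cleaned = clean(records)
```

Dropping rows is the behaviour the paragraph describes; whether imputing missing values would serve a given data set better is a separate design decision that this sketch does not address.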
===Data mining===
Data mining involves six common classes of tasks:<ref name="Fayyad">{{cite web |last1=Fayyad |first1=Usama |authorlink1=Usama Fayyad |last2=Piatetsky-Shapiro |first2=Gregory|authorlink2=Gregory Piatetsky-Shapiro |last3=Smyth |first3=Padhraic |title=From Data Mining to Knowledge Discovery in Databases |year=1996 |url=http://www.kdnuggets.com/gpspubs/aimag-kdd-overview-1996-Fayyad.pdf |accessdate = 17 December 2008 }}</ref>
* [[Anomaly detection]] (outlier/change/deviation detection) – The identification of unusual data records, which may be interesting in themselves or may be data errors that require further investigation.
      
   --[[用户:Zengsihang|Zengsihang]] ([[用户讨论:Zengsihang|talk]]) [Review] Changed "finds data errors that may be interesting or require further investigation" to "these may be interesting information or data errors that require further investigation".
 
* [[Association rule learning]] (dependency modeling) – Searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
      
* [[Cluster analysis|Clustering]] – The task of discovering groups and structures in the data that are in some way "similar", without using known structures in the data.