Changes

8 bytes added, 20:29, 24 August 2020 (Mon)
Line 181: Line 181:       −
===Pre-processing===
+
===预处理 Pre-processing===
    
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a [[data mart]] or [[data warehouse]]. Pre-processing is essential to analyze the [[Multivariate statistics|multivariate]] data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing [[statistical noise|noise]] and those with [[missing data]].
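The cleaning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not define what counts as noise, so the sketch assumes a value is noisy when it lies more than `max_dev` median absolute deviations from its column median. The function name `clean_observations`, the `max_dev` parameter, and the sample rows are hypothetical.

```python
import math
import statistics


def clean_observations(rows, max_dev=3.0):
    """Drop rows with missing values, then rows holding an extreme value.

    Assumption: "noise" means a value farther than max_dev median absolute
    deviations (MAD) from its column median. Other definitions are possible.
    """
    # Step 1: remove observations with missing data (None or NaN).
    complete = [
        row for row in rows
        if all(
            v is not None and not (isinstance(v, float) and math.isnan(v))
            for v in row
        )
    ]
    if len(complete) < 2:
        return complete

    # Step 2: compute a robust centre and spread for every column.
    columns = list(zip(*complete))
    medians = [statistics.median(col) for col in columns]
    mads = [
        statistics.median([abs(v - med) for v in col])
        for col, med in zip(columns, medians)
    ]

    def is_noisy(row):
        # A column with zero spread cannot be judged, so it is skipped.
        return any(
            mad > 0 and abs(v - med) / mad > max_dev
            for v, med, mad in zip(row, medians, mads)
        )

    return [row for row in complete if not is_noisy(row)]


# Hypothetical target data set: one row has a missing value, one an outlier.
rows = [
    (1.0, 10.0),
    (1.2, 11.0),
    (None, 12.0),
    (0.9, 9.5),
    (1.1, 500.0),
    (1.0, 10.5),
]
cleaned = clean_observations(rows)
```

With the sample rows, the row containing `None` is dropped in the first step and the row containing `500.0` in the second. Dropping whole observations mirrors the description above; imputing missing values instead would be a different design choice.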
Line 188: Line 188:
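Before data mining algorithms can be applied, a target data set must first be assembled. Because data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain those patterns while remaining concise enough to be mined within an acceptable time limit. A common source of data is a data mart or data warehouse. Pre-processing the multivariate data sets is essential before data mining. The target set is then cleaned. Data cleaning removes observations that contain noise and observations with missing data.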
  −
      
===Data mining===