Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a [[data mart]] or [[data warehouse]]. Pre-processing is essential to analyze the [[Multivariate statistics|multivariate]] data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing [[statistical noise|noise]] and those with [[missing data]]. | Before data mining algorithms can be used, a target data set must be assembled. As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a [[data mart]] or [[data warehouse]]. Pre-processing is essential to analyze the [[Multivariate statistics|multivariate]] data sets before data mining. The target set is then cleaned. Data cleaning removes the observations containing [[statistical noise|noise]] and those with [[missing data]]. |