Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompasses diverse techniques under a variety of names, and is used in business, science, and social science domains. In today's business world, data analysis helps make decisions more scientific and helps businesses operate more effectively.

Data mining is a particular data analysis technique that focuses on statistical modeling and knowledge discovery for predictive rather than purely descriptive purposes, while business intelligence covers data analysis that relies heavily on aggregation, focusing mainly on business information. In statistical applications, data analysis can be divided into descriptive statistics, exploratory data analysis (EDA), and confirmatory data analysis (CDA). EDA focuses on discovering new features in the data, while CDA focuses on confirming or falsifying existing hypotheses. Predictive analytics focuses on applying statistical models for forecasting or classification, while text analytics applies statistical, linguistic, and structural techniques to extract and classify information from textual sources, a form of unstructured data. All of the above are varieties of data analysis.
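To make the distinction between descriptive statistics and confirmatory data analysis concrete, here is a minimal Python sketch on two small, made-up samples. The names `control`, `treated`, `describe`, and `permutation_test` are hypothetical. `describe` only summarizes each sample, as descriptive statistics does. `permutation_test` then checks a hypothesis fixed in advance (that both samples come from the same distribution), which is the confirmatory step. An exact permutation test is only one possible choice here; a t-test or another test could play the same role.

```python
"""Descriptive statistics vs. a confirmatory test on two small samples.

The numbers below are made-up illustrative data, not from any real study.
"""
from itertools import combinations
from statistics import mean, median, stdev


def describe(values):
    """Descriptive statistics: summarize the data without testing anything."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def permutation_test(a, b):
    """Confirmatory step: exact two-sided permutation test of the hypothesis
    'a and b come from the same distribution', using the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    # Enumerate every way to relabel the pooled data into groups of the original sizes.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(mean(group_a) - mean(group_b))
        if diff >= observed - 1e-12:  # small tolerance for float round-off
            extreme += 1
        total += 1
    # p-value: share of relabellings at least as extreme as the observed one.
    return extreme / total


# Hypothetical measurements for two groups.
control = [12.1, 11.8, 12.5, 12.0, 11.9]
treated = [13.0, 12.8, 13.4, 12.9, 13.1]

print(describe(control))
print(describe(treated))
print("p-value:", permutation_test(control, treated))
```

With these numbers every control value lies below every treated value, so only the observed labelling and its mirror image reach the observed difference, giving p = 2/252 (about 0.008).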
Data integration, which combines data from multiple sources into a unified view, is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination.
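The following minimal Python sketch illustrates data integration as a step before analysis. It assumes two hypothetical sources: a CRM export and a billing table whose field names (`cust`, `amount`) and id format differ from the CRM's. The `integrate` function, also hypothetical, converts the billing ids to integers, sums billing per customer, and left-joins the totals onto the customer records. It reports billing ids that match no customer instead of silently dropping them.

```python
"""Data integration sketch: combine two hypothetical sources into one table."""

# Source 1: hypothetical CRM export, keyed by integer customer id.
crm = [
    {"customer_id": 1, "name": "Ada", "region": "North"},
    {"customer_id": 2, "name": "Bo", "region": "South"},
    {"customer_id": 3, "name": "Cy", "region": "North"},
]

# Source 2: hypothetical billing system; uses a different key name and string ids.
billing = [
    {"cust": "1", "amount": 120.0},
    {"cust": "1", "amount": 30.0},
    {"cust": "3", "amount": 75.5},
    {"cust": "9", "amount": 10.0},  # no matching customer in the CRM
]


def integrate(crm_rows, billing_rows):
    """Harmonize keys, aggregate billing per customer, and left-join on customer id.

    Returns the unified rows plus the billing ids that matched no customer.
    """
    totals = {}
    for row in billing_rows:
        key = int(row["cust"])  # harmonize: string ids -> int ids
        totals[key] = totals.get(key, 0.0) + row["amount"]

    known = {row["customer_id"] for row in crm_rows}
    # Left join: every CRM customer is kept; customers with no bills get 0.0.
    unified = [
        {**row, "total_billed": totals.get(row["customer_id"], 0.0)}
        for row in crm_rows
    ]
    orphans = sorted(set(totals) - known)
    return unified, orphans


unified, orphans = integrate(crm, billing)
for row in unified:
    print(row)
print("unmatched billing ids:", orphans)
```

Using a left join keeps customers who have no billing records; whether unmatched records should be kept, dropped, or flagged for review depends on the analysis that follows.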
==The process of data analysis==

[[File:Data visualization process v1.png|right|350px|thumb|Data science process flowchart from ''Doing Data Science'', by Schutt & O'Neil (2013)]]