Line 15: |
Line 15: |
| Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level): | | Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level): |
| | | |
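− | Knowledge discovery refers to continually identifying, analyzing, and visualizing the inherent patterns in data; it is an ongoing iterative and interactive process. Network analysis, link analysis, and social network analysis are all methods of knowledge discovery, but they all belong to '''<font color="#ff8000"> prior methods</font>'''. Most knowledge discovery methods follow these steps ('''<font color="#32CD32">at the highest level</font>'''):
| + | Knowledge discovery refers to continually identifying, analyzing, and visualizing the inherent patterns in data; it is an ongoing iterative and interactive process. Network analysis, link analysis, and social network analysis are all methods of knowledge discovery, and they all belong to '''<font color="#ff8000"> prior methods</font>'''. Most knowledge discovery methods follow these steps ('''<font color="#32CD32">at the highest level</font>'''): |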
| | | |
| | | |
Line 49: |
Line 49: |
| Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it will need to be transformed into a format that can be effectively used by both human and computer analyzers. Manual or computer-generated visualization tools may be mapped from the data, including network charts. Several algorithms exist to help with analysis of data – Dijkstra’s algorithm, breadth-first search, and depth-first search. | | Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it will need to be transformed into a format that can be effectively used by both human and computer analyzers. Manual or computer-generated visualization tools may be mapped from the data, including network charts. Several algorithms exist to help with analysis of data – Dijkstra’s algorithm, breadth-first search, and depth-first search. |
| | | |
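− | After the data is obtained, it must be collected and processed, but this process has some inherent problems, including information overload and data errors. Once the data is collected, it is converted into a format that both humans and computer analysis programs can use effectively. Then, based on the data, computer-generated or manually operated visualization tools produce charts such as network charts. Several algorithms currently exist to help people analyze data: Dijkstra's algorithm, breadth-first search, and depth-first search.
| + | Collecting and processing the data is the first step, but this process has some inherent problems, including information overload and data errors. Once the data is collected, it is converted into a format that both humans and computer analysis programs can use effectively. Then, based on the data, computer-generated or manually operated visualization tools can be used to produce charts (such as network charts). Several algorithms currently exist to help people analyze data: Dijkstra's algorithm, breadth-first search, and depth-first search. |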
| | | |
| | | |
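As a rough illustration of one of the algorithms named in this hunk, the following is a minimal sketch of breadth-first search over a small link network. The `links` graph, its node names, and the `bfs_hops` helper are hypothetical and are not taken from the article; the sketch only shows how hop distances from one entity to the others could be computed.

```python
from collections import deque

def bfs_hops(graph, start):
    """Return the number of hops from start to every reachable node."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:  # first visit is the shortest hop count
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical undirected link network (adjacency lists).
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
    "F": [],  # isolated entity, unreachable from "A"
}

hops = bfs_hops(links, "A")
print(hops)
```

Breadth-first search suits unweighted links like these; if the links carried weights (for example, strength of association), Dijkstra's algorithm would be the usual substitute.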
Line 177: |
Line 177: |
| In addition to the association matrix, the activities matrix can be used to produce actionable information, which has practical value and use to law enforcement. The activities matrix, as the term might imply, centers on the actions and activities of people with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between these two types of matrices, while minor, is nonetheless significant in terms of the output of the analysis completed or rendered. | | In addition to the association matrix, the activities matrix can be used to produce actionable information, which has practical value and use to law enforcement. The activities matrix, as the term might imply, centers on the actions and activities of people with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between these two types of matrices, while minor, is nonetheless significant in terms of the output of the analysis completed or rendered. |
| | | |
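− | In addition to the association matrix, the activities matrix can also be used to generate actionable information that has practical value and use for law enforcement. As the term might imply, the activities matrix focuses on people's actions and activities with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The difference between these two types of matrices, although small, is still significant in terms of the data that has been completed or analyzed.
| + | In addition to the association matrix, the activities matrix can also be used to generate actionable information that has practical value and use for law-enforcement activities. As the term might imply, the activities matrix focuses on people's actions and activities with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The difference between these two types of matrices, although small, is still significant in terms of the data that has been completed or analyzed. |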
| | | |
| | | |
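To make the distinction between the two matrices concrete, here is a small sketch under stated assumptions. The people, locations, and matrix values are invented for illustration, and deriving an association matrix from shared locations is only one possible way to relate the two; the article does not describe this step.

```python
# Hypothetical entities; none of these names come from the article.
people = ["P1", "P2", "P3"]
locations = ["Warehouse", "Cafe"]

# Activities matrix: rows are people, columns are locations.
# A 1 means the person was observed at that location.
activities = [
    [1, 0],  # P1
    [1, 1],  # P2
    [0, 1],  # P3
]

def derive_association(activity_rows):
    """Count shared locations for each pair of people (assumed linkage rule)."""
    n = len(activity_rows)
    assoc = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                assoc[i][j] = sum(
                    a * b for a, b in zip(activity_rows[i], activity_rows[j])
                )
    return assoc

# Association matrix: rows and columns are both people.
association = derive_association(activities)
for name, row in zip(people, association):
    print(name, row)
```

The activities matrix is rectangular (people by locations), while the association matrix is square and symmetric (people by people), which is why the two lead to different kinds of output.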
Line 232: |
Line 232: |
| With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories – (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks). | | With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories – (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks). |
| | | |
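− | Because vast amounts of data and information are stored electronically, users may face the problem of having multiple unrelated sources of information without knowing how to analyze them. Data analysis techniques need to be used in order to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories: (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).
| + | Because vast amounts of data and information are stored electronically, users may face the problem of having multiple unrelated sources of information without knowing how to analyze them. Using data analysis techniques can help make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories: (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks). |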
| | | |
| | | |
Line 240: |
Line 240: |
| Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require that rules are defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios that can be handled as this method requires that training rules are established based on previous patterns. Unsupervised learning methods can detect broader issues but may result in a higher false-positive ratio if the behavioral norm is not well established or understood. | | Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require that rules are defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios that can be handled as this method requires that training rules are established based on previous patterns. Unsupervised learning methods can detect broader issues but may result in a higher false-positive ratio if the behavioral norm is not well established or understood. |
| | | |
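− | Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require explicit rules in the system that indicate what is expected behavior and what is unexpected behavior. Unsupervised learning methods examine data by comparing it with the norm in order to detect statistical outliers. Supervised learning methods can handle only a limited range of scenarios, because they require training rules established from previous patterns. Unsupervised learning methods can detect a broader range of issues. However, if the behavioral norm of the data is not well established or not understood by the machine, the result may be a higher false-positive rate (a value that is not itself normal is identified as normal, meaning the algorithm made a "correct" or "present" prediction that turned out to be wrong).
| + | Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require explicit rules in the system that indicate what is expected behavior and what is unexpected behavior. Unsupervised learning methods examine data by comparing it with the norm in order to detect statistical outliers. Supervised learning methods can handle only a limited range of scenarios, because they require training rules established from previous patterns. Unsupervised learning methods can take the offensive against a broader range of issues. However, if the behavioral norm of the data is not well established or not understood by the machine, the result may be a higher false-positive rate (a value that is not itself normal is identified as normal, meaning the algorithm made a "correct" or "present" prediction that turned out to be wrong). |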
| | | |
| | | |
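The unsupervised approach described in this hunk, comparing data to the norm and flagging statistical outliers, can be sketched with a simple z-score rule. This is only an illustrative stand-in, not the method Bolton & Hand describe; the `amounts` data, the `zscore_outliers` helper, and the threshold of 2.0 are all assumptions.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard
    deviations from the mean (a simple, assumed definition of 'the norm')."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts with one unusual value.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 400]

flagged = zscore_outliers(amounts)
print(flagged)
```

The threshold stands in for how well the behavioral norm is understood: setting it too low flags ordinary values as anomalies, which is the higher false-positive ratio the text warns about.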
Line 290: |
Line 290: |
| Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.” Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective to prevent crime or illegal activity that has not yet occurred. | | Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.” Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective to prevent crime or illegal activity that has not yet occurred. |
| | | |
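− | Alternatively, Picarelli argued that link analysis techniques could be used to identify and potentially prevent the illicit activities of Aum Shinrikyo. "We must be careful of 'guilt by association'. Being linked to a terrorist does not prove guilt, but it does call for investigation." When reviewing sensitive data in order to prevent crime or illegal activity that has not yet occurred, it becomes difficult to avoid violating legal concepts such as probable cause, the right to privacy, and freedom of association all at once.
| + | Alternatively, Picarelli argued that link analysis techniques could be used to identify and potentially prevent the illicit activities of Aum Shinrikyo. "We must be careful of 'guilt by association'. Being linked to a terrorist does not prove guilt, but it does call for investigation." When reviewing sensitive data in order to prevent crime or illegal activity that has not yet occurred, it becomes very difficult to avoid violating legal concepts such as probable cause, the right to privacy, and freedom of association all at once. |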
| | | |
| | | |