Changes

Link analysis (view source)

Revision as of 21:43, 14 August 2020 (Fri)

814 bytes added, 21:43, 14 August 2020 (Fri)

No edit summary

Line 5: Line 5:

In network theory, link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity (fraud detection, counterterrorism, and intelligence), computer security analysis, search engine optimization, market research, medical research, and art.

−

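In network theory, link analysis is a data-analysis technique used to evaluate the relationships (connections) between nodes. The technique can identify relationships among various types of nodes (objects), including organizations, groups of people and the parties to market transactions. Link analysis has been applied in many fields, such as combating criminal activity (for example fraud detection, counterterrorism and intelligence), computer security analysis, search engine optimization, market research, medical research and art.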

+

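In '''Network Theory''', '''Link Analysis''' is a '''Data Analysis''' technique used to evaluate the relationships (connections) between nodes. The technique can identify relationships among various types of nodes (objects), including organizations, groups of people and the parties to market transactions. Link analysis has been applied in many fields, such as combating criminal activity (for example fraud detection, counterterrorism and intelligence), computer security analysis, search engine optimization, market research, medical research and art.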

Line 15: Line 15:

Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level):

−

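Knowledge discovery is the continual identification, analysis and visualization of underlying patterns in data; it is an ongoing iterative and interactive process. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, and all of them are '''a priori methods'''. Most knowledge discovery methods follow these steps ('''at the highest level'''):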

+

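'''Knowledge Discovery''' is the continual identification, analysis and visualization of underlying patterns in data; it is an ongoing iterative and interactive process. Network analysis, link analysis and '''Social Network Analysis''' are all methods of knowledge discovery, and all of them are '''a priori methods'''. Most knowledge discovery methods follow these steps ('''at the highest level'''):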

Line 49: Line 49:

Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it will need to be transformed into a format that can be effectively used by both human and computer analyzers. Manual or computer-generated visualization tools may be mapped from the data, including network charts. Several algorithms exist to help with analysis of data: Dijkstra’s algorithm, breadth-first search, and depth-first search.

−

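Data gathering and processing comes first, but it has some inherent problems, including information overload and data errors. Once the data has been collected, it is converted into a format that both humans and computer analysis programs can use effectively. Visualization tools, computer-generated or manually operated, can then be used to draw charts from the data (for example network charts). Several algorithms currently exist to help people analyze data: Dijkstra's algorithm, breadth-first search and depth-first search.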

+

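Data gathering and processing comes first, but it has some inherent problems, including '''Information Overload''' and data errors. Once the data has been collected, it is converted into a format that both humans and computer analysis programs can use effectively. Visualization tools, computer-generated or manually operated, can then be used to draw charts from the data (for example network charts). Several algorithms currently exist to help people analyze data: '''Dijkstra's algorithm''', '''Breadth-First Search''' and '''Depth-First Search'''.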
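
As a minimal sketch of one of the algorithms named above, breadth-first search can find the fewest-hop chain of links between two entities. The entity names, the sample links and the function name `shortest_link_path` are hypothetical, invented purely for illustration; links are assumed to be undirected and unweighted.

```python
from collections import deque


def shortest_link_path(links, start, goal):
    """Return the fewest-hop path from start to goal, or None if unreachable."""
    # Build an undirected adjacency map from (a, b) link pairs.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # parent records how each node was first reached; it doubles as the visited set.
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent pointers back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        # sorted() only makes the visiting order deterministic.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical sample data: a person, an organization, a transaction, two more people.
links = [
    ("Alice", "Acme Ltd"),
    ("Acme Ltd", "Txn-17"),
    ("Txn-17", "Bob"),
    ("Alice", "Carol"),
]
print(shortest_link_path(links, "Alice", "Bob"))
```

Breadth-first search is enough here because every link counts as one hop; if links carried weights (for example, a confidence score), Dijkstra's algorithm would be the more natural choice.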

Line 240: Line 240:

Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require that rules are defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios that can be handled, as this method requires that training rules are established based on previous patterns. Unsupervised learning methods can provide detection of broader issues, but may result in a higher false-positive ratio if the behavioral norm is not well established or understood.

−

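Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require explicit rules in the system that state what is expected behavior and what is unexpected behavior. Unsupervised learning methods examine the data and find statistical outliers by comparing the data with normal values. The scenarios that supervised learning methods can handle are limited, because this approach requires training rules built on previous patterns. Unsupervised learning methods can take on a broader range of problems. However, if the behavioral norm of the data is not well established or not understood by the machine, the result may be a higher false-positive rate (a value that is not itself normal is identified as normal, meaning the algorithm predicted "correct" or "present" but judged wrongly).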

+

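Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. '''Supervised Learning Methods''' require explicit rules in the system that state what is expected behavior and what is unexpected behavior. '''Unsupervised Learning Methods''' examine the data and find statistical outliers by comparing the data with normal values. The scenarios that supervised learning methods can handle are limited, because this approach requires training rules built on previous patterns. Unsupervised learning methods can take on a broader range of problems. However, if the behavioral norm of the data is not well established or not understood by the machine --[[用户:Ryan|Ryan]] ([[用户讨论:Ryan|talk]]) this sentence is questionable, the result may be a higher false-positive rate (a value that is not itself normal is identified as normal, meaning the algorithm predicted "correct" or "present" but judged wrongly).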
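
The unsupervised idea of comparing data with the norm to find statistical outliers can be sketched with a simple z-score test. This is only one possible illustration, not the method Bolton & Hand describe; the function name `flag_outliers`, the threshold of 2.0 and the sample amounts are all assumptions made for the example.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the norm.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; only the last one is far from the rest.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(flag_outliers(amounts))
```

The threshold is where the false-positive trade-off shows up: lowering it flags more values, including ones that merely sit at the edge of a poorly characterized norm.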

Line 248: Line 248:

Data itself has inherent issues including integrity (or lack thereof) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”. Sparrow highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.

−

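Data itself has inherent problems, including integrity (or the lack of it) and continual change. Data may contain "erroneous omissions and delegation caused by faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions". Sparrow highlights three main problems in data analysis: incompleteness (the inevitability of missing data or links), fuzzy boundaries (the subjectivity of deciding boundaries) and dynamic changes (the continually changing nature of data).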

+

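Data itself has inherent problems, including integrity (or the lack of it) and continual change. Data may contain "erroneous omissions and delegation caused by faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions" --[[用户:Ryan|Ryan]] ([[用户讨论:Ryan|talk]]) this sentence is questionable. Sparrow highlights three main problems in data analysis: incompleteness (the inevitability of missing data or links), fuzzy boundaries (the subjectivity of deciding boundaries) and dynamic changes (the continually changing nature of data).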

Line 264: Line 264:

The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly method of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”. Even using domain experts may result in differing conclusions as analysis may be subjective.

−

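At present, the main way to resolve these problems in data analysis is to rely on experts' domain knowledge. Conducting link analysis this way is very time-consuming and expensive, and it cannot rule out inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user's "perceptions of the existence of groups in networks". Even experts in the field may reach differing conclusions, because the analysis can be highly subjective.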

+

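At present, the main way to resolve these problems in data analysis is to rely on experts' '''Domain Knowledge'''. Conducting link analysis this way is very time-consuming and expensive, and it cannot rule out inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user's "perceptions of the existence of groups in networks". Even experts in the field may reach differing conclusions, because the analysis can be highly subjective.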

Line 334: Line 334:

Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.

−

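Heuristic-based tools apply decision rules distilled from expert knowledge to structured data. Template-based tools use Natural Language Processing (NLP) to extract details from unstructured data that match pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.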

+

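Heuristic-based tools apply decision rules distilled from expert knowledge to structured data. Template-based tools use '''Natural Language Processing''' to extract details from unstructured data that match pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.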
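
The weighted scoring used by similarity-based approaches might look like the sketch below. The field names, weights, sample records and the function name `link_score` are hypothetical, chosen only to show the shape of the computation; real tools may compare attributes with fuzzy matching rather than exact equality.

```python
def link_score(record_a, record_b, weights):
    """Weighted fraction of attributes on which two records agree (0.0 to 1.0)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        weight
        for field, weight in weights.items()
        # A field counts only if it is present in record_a and equal in both records.
        if record_a.get(field) is not None and record_a.get(field) == record_b.get(field)
    )
    return matched / total


# Hypothetical weights: a shared phone number is treated as stronger evidence
# of a link than a shared surname.
weights = {"phone": 3, "address": 2, "surname": 1}
record_a = {"phone": "555-0101", "address": "1 Main St", "surname": "Smith"}
record_b = {"phone": "555-0101", "address": "9 Oak Ave", "surname": "Smith"}

score = link_score(record_a, record_b, weights)
print(f"link score: {score:.2f}")
```

Pairs whose score exceeds some cut-off would then be treated as potential links for an analyst to review; where that cut-off sits is a judgment call that depends on the data.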

Ryan (75 edits)