Link analysis

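From 集智百科 (complex systems | artificial intelligence | complexity science | complex networks | self-organization)
Revision by Moonscar (talk | contribs) as of 19:09, 12 May 2020 (Tuesday) (Moved page from wikipedia:en:Link analysis (history))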

In network theory, link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity (fraud detection, counterterrorism, and intelligence), computer security analysis, search engine optimization, market research, medical research, and art.

Knowledge discovery

Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data.[1] Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level):[2]

  1. Data processing
  2. Transformation
  3. Analysis
  4. Visualization

Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it needs to be transformed into a format that both human and computer analyzers can use effectively. Visualizations, including network charts, can then be produced from the data either manually or by computer. Several algorithms exist to help with analysis of the data, including Dijkstra’s algorithm, breadth-first search, and depth-first search.
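
As an illustration of how one of the algorithms named above can be applied to link data, the following is a minimal Python sketch of breadth-first search over an undirected graph of links. The function name `shortest_link_chain` and the sample names are hypothetical and chosen only for this example; the sketch assumes unweighted links stored as pairs.

```python
from collections import deque

def shortest_link_chain(links, source, target):
    """Return the shortest chain of nodes from source to target, or None."""
    # Build an undirected adjacency map from the (a, b) link pairs.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if source not in neighbors or target not in neighbors:
        return None
    # Breadth-first search: the first time the target is dequeued,
    # the recorded parents describe a chain with the fewest links.
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in sorted(neighbors[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical link data for illustration only.
links = [("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"), ("Alice", "Eve")]
chain = shortest_link_chain(links, "Alice", "Dave")
print(chain)
```

For weighted links (for example, links scored by strength), Dijkstra’s algorithm would take the place of the plain queue.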

Link analysis focuses on analysis of relationships among nodes through visualization methods (network charts, association matrix). Here is an example of the relationships that may be mapped for crime investigations:[3]

{| class="wikitable"
! Relationship/Network !! Data Sources
|-
| 1. Trust || Prior contacts in family, neighborhood, school, military, club or organization. Public and court records. Data may only be available in suspect's native country.
|-
| 2. Task || Logs and records of phone calls, electronic mail, chat rooms, instant messages, Web site visits. Travel records. Human intelligence: observation of meetings and attendance at common events.
|-
| 3. Money & Resources || Bank account and money transfer records. Pattern and location of credit card use. Prior court records. Human intelligence: observation of visits to alternate banking resources such as Hawala.
|-
| 4. Strategy & Goals || Web sites. Videos and encrypted disks delivered by courier. Travel records. Human intelligence: observation of meetings and attendance at common events.
|}
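
Relationships like those in the table can be recorded as pairwise links and tabulated in an association matrix. The sketch below shows one simple way to do this in Python; the function name `association_matrix` and the entity names are hypothetical, and the matrix only records whether a link exists, not its type or strength.

```python
def association_matrix(entities, links):
    """Return a symmetric 0/1 matrix; cell [i][j] is 1 if entities i and j are linked."""
    index = {name: i for i, name in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for a, b in links:
        i, j = index[a], index[b]
        # Links are treated as undirected, so both cells are set.
        matrix[i][j] = matrix[j][i] = 1
    return matrix

# Hypothetical entities and links for illustration only.
entities = ["Person A", "Person B", "Bank X", "Club Y"]
links = [("Person A", "Person B"), ("Person A", "Bank X"), ("Person B", "Club Y")]
matrix = association_matrix(entities, links)
for name, row in zip(entities, matrix):
    print(f"{name:<9}", row)
```

A link chart is then a drawing of the same data, with one node per row and one edge per non-zero cell.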


Link analysis is used for 3 primary purposes:[4]

  1. Find matches in data for known patterns of interest;
  2. Find anomalies where known patterns are violated;
  3. Discover new patterns of interest (social network analysis, data mining).

History

Klerks categorized link analysis tools into three generations.[5] The first generation was introduced in 1975 as the Anacapa Chart of Harper and Harris.[6] This method requires that a domain expert review data files, identify associations by constructing an association matrix, create a link chart for visualization, and finally analyze the network chart to identify patterns of interest. It requires extensive domain knowledge and is extremely time-consuming when reviewing vast amounts of data.

In addition to the association matrix, the activities matrix can be used to produce actionable information of practical value to law enforcement. The activities matrix, as the term implies, centers on the actions and activities of people with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between these two types of matrices, while minor, is nonetheless significant for the output of the analysis.[7][8][9][10]
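
To make the difference between the two matrices concrete, here is a small Python sketch. It builds a person-by-location activities matrix from hypothetical sightings and then, as an illustrative assumption rather than a method described in the sources, counts shared locations for each pair of people (the product of the matrix with its transpose), which yields a person-by-person matrix of the association kind.

```python
def activities_matrix(people, locations, sightings):
    """Person-by-location matrix; cell is 1 if the person was seen at the location."""
    row_of = {p: i for i, p in enumerate(people)}
    col_of = {loc: j for j, loc in enumerate(locations)}
    matrix = [[0] * len(locations) for _ in people]
    for person, location in sightings:
        matrix[row_of[person]][col_of[location]] = 1
    return matrix

def shared_locations(matrix):
    """Person-by-person counts of shared locations (matrix times its transpose)."""
    n = len(matrix)
    return [
        [sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical data for illustration only.
people = ["P1", "P2", "P3"]
locations = ["Warehouse", "Cafe"]
sightings = [("P1", "Warehouse"), ("P2", "Warehouse"), ("P2", "Cafe"), ("P3", "Cafe")]

activity = activities_matrix(people, locations, sightings)
shared = shared_locations(activity)
print(activity)
print(shared)
```

In the derived matrix the diagonal counts each person's own locations, and an off-diagonal zero (here P1 and P3) means the two people were never recorded at the same place.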

Second-generation tools consist of automatic graphics-based analysis tools such as IBM i2 Analyst’s Notebook, Netmap, ClueMaker and Watson. These tools can automate the construction and updating of the link chart once an association matrix has been created manually; however, analysis of the resulting charts and graphs still requires an expert with extensive domain knowledge.

Third-generation link-analysis tools such as DataWalk allow the automatic visualization of linkages between elements in a data set, which can then serve as the canvas for further exploration or manual updates.

Applications

  • Iowa State Sex Crimes Analysis System
  • Minnesota State Sex Crimes Analysis System (MIN/SCAP)
  • Washington State Homicide Investigation Tracking System (HITS)[11]
  • New York State Homicide Investigation & Lead Tracking (HALT)
  • New Jersey Homicide Evaluation & Assessment Tracking (HEAT)[12]
  • Pennsylvania State ATAC Program.
  • Violent Crime Linkage Analysis System (ViCLAS)[13]


Issues with link analysis

Information overload

With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories: statistical techniques (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).[14]

Bolton & Hand define statistical data analysis as either supervised or unsupervised methods.[15] Supervised learning methods require that rules be defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios they can handle, because training rules must be established from previous patterns. Unsupervised learning methods can detect broader issues but may result in a higher false-positive ratio if the behavioral norm is not well established or understood.
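
The unsupervised idea of comparing data to the norm and flagging statistical outliers can be sketched in a few lines of Python. This is only an illustration under simple assumptions (a single numeric attribute and a z-score cut-off); the function name `flag_outliers`, the threshold of 2.0 and the sample amounts are hypothetical and not taken from Bolton & Hand.

```python
from statistics import mean, pstdev

def flag_outliers(values, threshold=2.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing deviates from the norm.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts; one is far from the rest.
amounts = [120, 95, 110, 105, 98, 102, 5000, 115]
flagged = flag_outliers(amounts)
print(flagged)
```

Lowering the threshold flags more values, which mirrors the trade-off described above: a poorly established norm or a loose cut-off raises the false-positive ratio.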

Data itself has inherent issues, including integrity (or lack thereof) and continuous change. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”.[4] Sparrow[16] highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.[3]

Once data is transformed into a usable format, open texture and cross-referencing issues may arise. Open texture was defined by Waismann as the unavoidable uncertainty in meaning when empirical terms are used in different contexts.[17] Uncertainty in the meaning of terms presents problems when attempting to search and cross-reference data from multiple sources.[18]

The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly method of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”.[19] Even using domain experts may result in differing conclusions as analysis may be subjective.

Prosecution vs. crime prevention

Link analysis techniques have primarily been used for prosecution, as it is far easier to review historical data for patterns than it is to attempt to predict future actions.

Krebs demonstrated the use of an association matrix and link chart of the terrorist network associated with the 19 hijackers responsible for the September 11th attacks by mapping details made publicly available following the attacks.[3] Even with the advantages of hindsight and publicly available information on people, places and transactions, it is clear that there is missing data.

Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network.[20] Krebs cautions: “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.”[3] Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective of preventing crime or illegal activity that has not yet occurred.

Proposed solutions

There are four categories of proposed link analysis solutions:[21]

  1. Heuristic-based
  2. Template-based
  3. Similarity-based
  4. Statistical

Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.
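
As a rough illustration of the similarity-based approach, the Python sketch below compares two records attribute by attribute and sums the weights of the attributes that agree. The attribute names, the weights and the function name `similarity_score` are hypothetical; real tools would use domain-specific attributes and fuzzier comparisons than exact equality.

```python
def similarity_score(record_a, record_b, weights):
    """Weighted fraction of attributes on which the two records agree (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        weight
        for attribute, weight in weights.items()
        # A missing value never counts as a match.
        if record_a.get(attribute) is not None
        and record_a.get(attribute) == record_b.get(attribute)
    )
    return matched / total

# Hypothetical weights and records for illustration only.
weights = {"surname": 5, "birth_year": 3, "city": 2}
a = {"surname": "Smith", "birth_year": 1980, "city": "Leeds"}
b = {"surname": "Smith", "birth_year": 1980, "city": "York"}

score = similarity_score(a, b, weights)
print(score)
```

A pair of records whose score exceeds some chosen cut-off would then be proposed as a potential link for an analyst to review.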

CrimeNet Explorer

J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer.[22] This framework includes the following elements:

  • Network Creation through a concept space approach that uses “co-occurrence weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely it will be that they are related”.[22]
  • Network Partition using “hierarchical clustering to partition a network into subgroups based on relational strength”.[22]
  • Structural Analysis through “three centrality measures (degree, betweenness, and closeness) to identify central members in a given subgroup”.[22] CrimeNet Explorer employed Dijkstra’s shortest-path algorithm to calculate the betweenness and closeness from a single node to all other nodes in the subgroup.
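
The elements above can be sketched in miniature. The following Python example is not CrimeNet Explorer's implementation; it is a simplified illustration that counts co-occurrences of names across documents, turns the counts into edge distances (the `1 / count` conversion is an assumption made here so that frequent co-occurrence means a shorter distance), and computes degree and closeness with Dijkstra’s algorithm. Network partition and betweenness are omitted for brevity, and all names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations

def cooccurrence_weights(documents):
    """Count how many documents mention each pair of names together."""
    weights = Counter()
    for names in documents:
        for a, b in combinations(sorted(set(names)), 2):
            weights[(a, b)] += 1
    return weights

def build_graph(weights):
    """Undirected graph; more frequent co-occurrence gives a shorter edge."""
    graph = {}
    for (a, b), count in weights.items():
        distance = 1.0 / count  # assumption: distance is the inverse of the count
        graph.setdefault(a, {})[b] = distance
        graph.setdefault(b, {})[a] = distance
    return graph

def dijkstra(graph, source):
    """Shortest distance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist

def degree(graph, node):
    """Number of direct links of a node."""
    return len(graph[node])

def closeness(graph, node):
    """Reachable-node count divided by the sum of shortest distances to them."""
    dist = dijkstra(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical documents, each listing the names it mentions.
documents = [["Ann", "Ben"], ["Ann", "Ben", "Cy"], ["Ben", "Dee"]]
weights = cooccurrence_weights(documents)
graph = build_graph(weights)
most_central = max(graph, key=lambda n: closeness(graph, n))
print(most_central, degree(graph, most_central))
```

In this toy network the member with the highest closeness is also the one with the most direct links, but the two measures can disagree in larger networks.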


References

  1. Inc., The Tor Project. "Tor Project: Overview".
  2. Ahonen, H., Features of Knowledge Discovery Systems.
  3. Krebs, V. E. 2001, Mapping networks of terrorist cells (archived at the Internet Archive, archive date 2011-07-20), Connections 24, 43–52.
  5. Klerks, P. (2001). "The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands". Connections. 24: 53–65. CiteSeerX 10.1.1.129.4720.
  6. Harper and Harris, The Analysis of Criminal Intelligence, Human Factors and Ergonomics Society Annual Meeting Proceedings, 19(2), 1975, pp. 232-238.
  7. Pike, John. "FMI 3-07.22 Appendix F Intelligence Analysis Tools and Indicators".
  8. Social Network Analysis and Other Analytical Tools (archived at the Internet Archive, archive date 2014-03-08).
  9. MSFC, Rebecca Whitaker (10 July 2009). "Aeronautics Educator Guide - Activity Matrices".
  10. Personality/Activity Matrix (archived at the Internet Archive, archive date 2014-03-08).
  11. "Archived copy". Archived from the original on 2010-10-21. Retrieved 2010-10-31.
  12. "Archived copy". Archived from the original on 2009-03-25. Retrieved 2010-10-31.
  13. "Archived copy". Archived from the original on 2010-12-02. Retrieved 2010-10-31.
  14. Palshikar, G. K., The Hidden Truth, Intelligent Enterprise, May 2002.
  15. Bolton, R. J. & Hand, D. J., Statistical Fraud Detection: A Review, Statistical Science, 2002, 17(3), pp. 235-255.
  16. Sparrow, M. K. 1991. Network Vulnerabilities and Strategic Intelligence in Law Enforcement, International Journal of Intelligence and Counterintelligence Vol. 5 #3.
  17. Friedrich Waismann, Verifiability (1945), p.2.
  18. Lyons, D., Open Texture and the Possibility of Legal Interpretation (2000).
  19. McGrath, C., Blythe, J., Krackhardt, D., Seeing Groups in Graph Layouts.
  20. Picarelli, J. T., Transnational Threat Indications and Warning: The Utility of Network Analysis, Military and Intelligence Analysis Group.
  21. Schroeder et al., Automated Criminal Link Analysis Based on Domain Knowledge, Journal of the American Society for Information Science and Technology, 58:6 (842), 2007.
  22. Xu, J.J. & Chen, H., CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), April 2005, pp. 201-226.


External links

  • Bartolini, I.; Ciaccia, P. Imagination: Accurate Image Annotation Using Link-Analysis Techniques.

Category:Network theory


This page was moved from wikipedia:en:Link analysis. Its edit history can be viewed at 链路分析/edithistory