# Link analysis

In network theory, link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity (fraud detection, counterterrorism, and intelligence), computer security analysis, search engine optimization, market research, medical research, and art.

## Knowledge discovery

Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data.[1] Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level):[2]

1. Data processing
2. Transformation
3. Analysis
4. Visualization

Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it needs to be transformed into a format that both human and computer analyzers can use effectively. Visualization tools, including network charts, may then be mapped from the data, either manually or by computer. Several algorithms exist to help with the analysis of data, including Dijkstra’s algorithm, breadth-first search, and depth-first search.
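
As a minimal sketch of two of the algorithms named above, the following Python example runs breadth-first search and Dijkstra's algorithm over a small link graph. The node names, edge weights, and helper functions (`build_graph`, `bfs_hops`, `dijkstra`) are hypothetical and chosen only for illustration; they do not come from any particular link analysis tool.

```python
import heapq
from collections import deque

# Hypothetical link data: each edge is (node, node, weight). A lower weight
# is assumed to stand for a stronger or more direct connection.
EDGES = [
    ("A", "B", 1.0),
    ("B", "C", 2.0),
    ("A", "C", 5.0),
    ("C", "D", 1.0),
]


def build_graph(edges):
    """Build an undirected adjacency map {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def bfs_hops(graph, source):
    """Breadth-first search: fewest links from source to each reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


def dijkstra(graph, source):
    """Dijkstra's algorithm: lowest total weight from source to each node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


graph = build_graph(EDGES)
hops = bfs_hops(graph, "A")   # number of links separating A from each node
dist = dijkstra(graph, "A")   # cheapest weighted route from A to each node
```

In this toy graph the two measures disagree about node C: it is one link away from A, yet the cheapest weighted route runs through B. Which measure is appropriate depends on what the weights are taken to mean.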

Link analysis focuses on analysis of relationships among nodes through visualization methods (network charts, association matrix). Here is an example of the relationships that may be mapped for crime investigations:[5]

| Relationship/Network | Data Sources |
|---|---|
| 1. Trust | Prior contacts in family, neighborhood, school, military, club or organization. Public and court records. Data may only be available in the suspect's native country. |
| 2. Task | Logs and records of phone calls, electronic mail, chat rooms, instant messages, Web site visits. Travel records. Human intelligence: observation of meetings and attendance at common events. |
| 3. Money & Resources | Bank account and money transfer records. Pattern and location of credit card use. Prior court records. Human intelligence: observation of visits to alternate banking resources such as Hawala. |
| 4. Strategy & Goals | Web sites. Videos and encrypted disks delivered by courier. Travel records. Human intelligence: observation of meetings and attendance at common events. |

Link analysis is used for 3 primary purposes:[6]

1. Find matches in data for known patterns of interest;
2. Find anomalies where known patterns are violated;
3. Discover new patterns of interest (social network analysis, data mining).

## History

Klerks categorized link analysis tools into 3 generations.[7] The first generation was introduced in 1975 as the Anacapa Chart of Harper and Harris.[8] This method requires that a domain expert review data files, identify associations by constructing an association matrix, create a link chart for visualization, and finally analyze the network chart to identify patterns of interest. It requires extensive domain knowledge and is extremely time-consuming when reviewing vast amounts of data.
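
The step from association matrix to link chart can be sketched in a few lines of Python. The entity labels, the matrix values, and the helper names (`link_chart_edges`, `degree`) below are hypothetical, and the example assumes a symmetric 0/1 matrix; it is an illustration of the idea, not a reproduction of the original charting procedure.

```python
# Hypothetical entities and association matrix (1 = known association).
# The matrix is assumed to be symmetric with a zero diagonal.
NAMES = ["P1", "P2", "P3", "P4"]
ASSOCIATION = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]


def link_chart_edges(names, matrix):
    """Read the upper triangle of the matrix as a list of links (edges)."""
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if matrix[i][j]:
                edges.append((names[i], names[j]))
    return edges


def degree(names, matrix):
    """Count how many associations each entity has (row sums)."""
    return {name: sum(row) for name, row in zip(names, matrix)}


edges = link_chart_edges(NAMES, ASSOCIATION)
degrees = degree(NAMES, ASSOCIATION)
```

The edge list is what a link chart draws as lines between entities, and the row sums give a first indication of which entity is most connected.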

In addition to the association matrix, the activities matrix can be used to produce actionable information, which has practical value and use to law enforcement. The activities matrix, as the term might imply, centers on the actions and activities of people with respect to locations, whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between these two types of matrices, while minor, is nonetheless significant in terms of the output of the analysis.[11][12][13][14]
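
To make the difference between the two matrices concrete, the Python sketch below stores a small activities matrix (people by locations) and derives person-to-person co-location counts from it. The people, locations, and observations are invented, and deriving associations from shared locations is an assumption made for illustration rather than a step described by the sources cited here.

```python
# Hypothetical activities matrix: rows are people, columns are locations.
# ACTIVITY[i][k] = 1 if person i was observed at location k.
PEOPLE = ["P1", "P2", "P3"]
LOCATIONS = ["warehouse", "cafe"]
ACTIVITY = [
    [1, 1],  # P1 seen at both locations
    [1, 0],  # P2 seen at the warehouse only
    [0, 1],  # P3 seen at the cafe only
]


def co_location_counts(activity):
    """Count, for each pair of people, the locations both were observed at.

    The result has the shape of an association matrix (people by people),
    with the diagonal left at zero.
    """
    n = len(activity)
    counts = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                counts[i][j] = sum(a * b for a, b in zip(activity[i], activity[j]))
    return counts


association = co_location_counts(ACTIVITY)
```

Here P1 shares one location with each of P2 and P3, while P2 and P3 share none, so the derived matrix links P1 to both but leaves P2 and P3 unconnected.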

Second-generation tools consist of automatic graphics-based analysis tools such as IBM i2 Analyst’s Notebook, Netmap, ClueMaker and Watson. These tools can automate the construction and updating of the link chart once an association matrix has been manually created; however, analysis of the resulting charts and graphs still requires an expert with extensive domain knowledge.

Third-generation link-analysis tools such as DataWalk allow the automatic visualization of linkages between elements in a data set, which can then serve as the canvas for further exploration or manual updates.

## Applications

• Iowa State Sex Crimes Analysis System

• Minnesota State Sex Crimes Analysis System (MIN/SCAP)

• Washington State Homicide Investigation Tracking System (HITS)[19]

• New York State Homicide Investigation & Lead Tracking (HALT)

• New Jersey Homicide Evaluation & Assessment Tracking (HEAT)[21]

• Pennsylvania State ATAC Program.

• Violent Crime Linkage Analysis System (ViCLAS)[23]

## Issues with link analysis

### Information overload

With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories: statistical techniques (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).[25]

Bolton & Hand define statistical data analysis as either supervised or unsupervised methods.[27] Supervised learning methods require that rules be defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios they can handle, because training rules must be established from previous patterns. Unsupervised learning methods can detect broader issues, but may produce a higher false-positive ratio if the behavioral norm is not well established or understood.
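
The contrast can be illustrated with a small Python sketch. A fixed threshold stands in for a predefined rule, and a z-score test stands in for comparison against the norm; the transaction amounts, the limit of 50, and the z-score threshold of 2 are all hypothetical values picked for the example, not parameters taken from Bolton & Hand.

```python
from statistics import mean, pstdev

# Hypothetical transaction amounts; the last one is unusually large.
AMOUNTS = [20.0, 22.0, 19.0, 21.0, 18.0, 20.0, 95.0]


def rule_flags(values, limit):
    """Predefined rule: flag any value above a fixed, expert-chosen limit."""
    return [v > limit for v in values]


def zscore_flags(values, threshold=2.0):
    """Outlier test: flag values far from the mean of the data itself.

    Uses the population standard deviation; no rule is supplied in advance.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)  # all values identical, nothing stands out
    return [abs(v - mu) / sigma > threshold for v in values]


by_rule = rule_flags(AMOUNTS, limit=50.0)
by_outlier = zscore_flags(AMOUNTS)
```

Both approaches flag the same transaction here, but for different reasons: the rule encodes prior knowledge of what is unexpected, whereas the outlier test depends entirely on how well the remaining data represents normal behavior.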

Data itself has inherent issues, including integrity (or lack thereof) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”.[6] Sparrow[29] highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.[5]

Once data is transformed into a usable format, open texture and cross-referencing issues may arise. Open texture was defined by Waismann as the unavoidable uncertainty in meaning when empirical terms are used in different contexts.[30] Uncertainty in the meaning of terms presents problems when attempting to search and cross-reference data from multiple sources.[31]

The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly method of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”.[34] Even domain experts may reach differing conclusions, as analysis may be subjective.

### Prosecution vs. crime prevention

Link analysis techniques have primarily been used for prosecution, as it is far easier to review historical data for patterns than it is to attempt to predict future actions.

Krebs demonstrated the use of an association matrix and link chart of the terrorist network associated with the 19 hijackers responsible for the September 11th attacks by mapping details that became publicly available following the attacks.[5] Even with the advantages of hindsight and publicly available information on people, places and transactions, it is clear that there is missing data.

Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network.[36] “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.”[5] Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective of preventing crime or illegal activity that has not yet occurred.

## Proposed solutions

There are four categories of proposed link analysis solutions:[38]

1. Heuristic-based
2. Template-based
3. Similarity-based
4. Statistical

Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.
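
As an illustration of the similarity-based category, the Python sketch below scores two records by summing the weights of the attributes on which they agree. The attribute names, weights, sample records, and the 0.6 cut-off are hypothetical; real tools would typically use fuzzier comparisons than exact equality.

```python
# Hypothetical attribute weights, chosen so that they sum to 1.
WEIGHTS = {"name": 0.5, "phone": 0.3, "city": 0.2}


def similarity(record_a, record_b, weights):
    """Weighted score: add the weight of every attribute both records share.

    An attribute counts only if it is present in both records and equal.
    """
    score = 0.0
    for attribute, weight in weights.items():
        value = record_a.get(attribute)
        if value is not None and value == record_b.get(attribute):
            score += weight
    return score


# Two invented records that agree on name and city but not on phone.
record_1 = {"name": "j. doe", "phone": "555-0100", "city": "Springfield"}
record_2 = {"name": "j. doe", "phone": "555-0199", "city": "Springfield"}

score = similarity(record_1, record_2, WEIGHTS)
is_potential_link = score >= 0.6  # hypothetical cut-off for suggesting a link
```

The weights encode how strongly each attribute is believed to indicate a real connection, so choosing them is itself a matter of domain judgment.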

### CrimeNet Explorer

J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer.[40] This framework includes the following elements:

• Network Creation through a concept space approach that uses “co-occurrence weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely it will be that they are related”.[40]

• Network Partition using “hierarchical clustering to partition a network into subgroups based on relational strength”.[40]

• Structural Analysis through “three centrality measures (degree, betweenness, and closeness) to identify central members in a given subgroup”.[40] CrimeNet Explorer employed Dijkstra’s shortest-path algorithm to calculate the betweenness and closeness from a single node to all other nodes in the subgroup.
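
The first and third elements can be sketched together in Python. This is not the CrimeNet Explorer implementation: raw co-occurrence counts stand in for its co-occurrence weight, closeness is computed from hop counts with breadth-first search rather than Dijkstra's algorithm, and the documents, names, and helper functions are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical documents, each reduced to the set of names it mentions.
DOCS = [
    {"P1", "P2"},
    {"P1", "P2", "P3"},
    {"P3", "P4"},
]


def co_occurrence(docs):
    """Count in how many documents each pair of names appears together."""
    counts = Counter()
    for names in docs:
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts


def adjacency(counts):
    """Link every pair that co-occurs at least once (unweighted graph)."""
    graph = {}
    for a, b in counts:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness(graph, source):
    """Closeness centrality: reachable nodes divided by total hop distance."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    total = sum(hops.values())
    return (len(hops) - 1) / total if total else 0.0


counts = co_occurrence(DOCS)
graph = adjacency(counts)
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
most_connected = max(degrees, key=degrees.get)
```

In this toy data P1 and P2 co-occur most often, while P3 has both the highest degree and the highest closeness because it is the only name connecting P4 to the rest.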

## References

1. Inc., The Tor Project. "Tor Project: Overview".
2. Ahonen, H., Features of Knowledge Discovery Systems.
3. Inc., The Tor Project. "Tor Project: Overview".
4. Ahonen, H., Features of Knowledge Discovery Systems.
5. Krebs, V. E. 2001, Mapping networks of terrorist cells (archived at the Internet Archive, 2011-07-20), Connections 24, 43–52.
6. Klerks, P. (2001). "The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands". Connections. 24: 53–65. CiteSeerX 10.1.1.129.4720.
7. Harper and Harris, The Analysis of Criminal Intelligence, Human Factors and Ergonomics Society Annual Meeting Proceedings, 19(2), 1975, pp. 232-238.
8. Klerks, P. (2001). "The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands". Connections. 24: 53–65. CiteSeerX 10.1.1.129.4720.
9. Harper and Harris, The Analysis of Criminal Intelligence, Human Factors and Ergonomics Society Annual Meeting Proceedings, 19(2), 1975, pp. 232-238.
10. MSFC, Rebecca Whitaker (10 July 2009). "Aeronautics Educator Guide - Activity Matrices".
11. MSFC, Rebecca Whitaker (10 July 2009). "Aeronautics Educator Guide - Activity Matrices".
12. "Archived copy". Archived from the original on 2010-10-21. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
13. "Archived copy". Archived from the original on 2010-10-21. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
14. "Archived copy". Archived from the original on 2009-03-25. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
15. "Archived copy". Archived from the original on 2009-03-25. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
16. "Archived copy". Archived from the original on 2010-12-02. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
17. "Archived copy". Archived from the original on 2010-12-02. Retrieved 2010-10-31.CS1 maint: archived copy as title (link)
18. Palshikar, G. K., The Hidden Truth, Intelligent Enterprise, May 2002.
19. Palshikar, G. K., The Hidden Truth, Intelligent Enterprise, May 2002.
20. Bolton, R. J. & Hand, D. J., Statistical Fraud Detection: A Review, Statistical Science, 2002, 17(3), pp. 235-255.
21. Bolton, R. J. & Hand, D. J., Statistical Fraud Detection: A Review, Statistical Science, 2002, 17(3), pp. 235-255.
22. Sparrow M.K. 1991. Network Vulnerabilities and Strategic Intelligence in Law Enforcement, International Journal of Intelligence and Counterintelligence Vol. 5 #3.
23. Friedrich Waismann, Verifiability (1945), p.2.
24. Friedrich Waismann, Verifiability (1945), p.2.
25. McGrath, C., Blythe, J., Krackhardt, D., Seeing Groups in Graph Layouts.
26. McGrath, C., Blythe, J., Krackhardt, D., Seeing Groups in Graph Layouts.
27. Schroeder et al., Automated Criminal Link Analysis Based on Domain Knowledge, Journal of the American Society for Information Science and Technology, 58:6 (842), 2007.
28. Schroeder et al., Automated Criminal Link Analysis Based on Domain Knowledge, Journal of the American Society for Information Science and Technology, 58:6 (842), 2007.
29. Xu, J.J. & Chen, H., CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), April 2005, pp. 201-226.