更改

链路分析 (查看源代码)

2020年12月20日 (日) 23:32的版本

添加4,977字节、 2020年12月20日 (日) 23:32

无编辑摘要

第1行：第1行： −

~~本词条由Ryan初步翻译~~

+

本词条由Ryan初步翻译，已由WildBoar审校

In [[network theory]], '''link analysis''' is a [[data-analysis]] technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including [[organization]]s, [[people]] and [[Financial transaction|transactions]]. Link analysis has been used for investigation of criminal activity ([[fraud detection]], [[counterterrorism]], and [[Intelligence (information gathering)|intelligence]]), [[Computer security|computer security analysis]], [[search engine optimization]], [[market research]], [[medical research]], and art.

第5行：第5行：

In network theory, link analysis is a data-analysis technique used to evaluate relationships (connections) between nodes. Relationships may be identified among various types of nodes (objects), including organizations, people and transactions. Link analysis has been used for investigation of criminal activity (fraud detection, counterterrorism, and intelligence), computer security analysis, search engine optimization, market research, medical research, and art.

−

在''' 网络理论 Network Theory'''中，''' 链路分析 Link Analysis'''~~是一种用于评估节点之间关系(连接)的~~''' 数据分析 Data Analysis'''~~技术。该技术可以鉴别各种类型节点(对象)之间的关系，包括组织、人群和市场交易双方。链路分析已被应用于诸多领域，如打击犯罪活动~~(如欺诈侦查、反恐和情报)、计算机安全分析、搜索引擎优化、市场调查、医学研究和艺术。

+

在''' 网络理论 Network Theory'''中，''' 链路分析 Link Analysis'''是一种用于评估节点之间关系（连接）的''' 数据分析 Data Analysis'''技术。该技术可以鉴别各种类型节点（对象）之间的关系，包括组织、人和金融交易。链路分析已被应用于诸多领域，如打击犯罪活动(如欺诈侦查、反恐和情报)、计算机安全分析、搜索引擎优化、市场调查、医学研究和艺术。

==Knowledge discovery==

−

知识发现

+

==知识发现==

+

[[Knowledge discovery]] is an [[Iteration|iterative]] and [[interactive]] process used to [[Identification (information)|identify]], analyze and visualize patterns in data.<ref>{{cite web|url=https://www.torproject.org/about/overview.html.en|title=Tor Project: Overview|first=The Tor Project|last=Inc.|publisher=}}</ref> Network analysis, link analysis and [[social network analysis]] are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level):<ref>Ahonen, H., [http://www.cs.helsinki.fi/u/hahonen/features.txt Features of Knowledge Discovery Systems].</ref>

第16行：第18行：

Knowledge discovery is an iterative and interactive process used to identify, analyze and visualize patterns in data. Network analysis, link analysis and social network analysis are all methods of knowledge discovery, each a corresponding subset of the prior method. Most knowledge discovery methods follow these steps (at the highest level):

−

''' ~~知识的发现~~ Knowledge Discovery'''~~，是指不断地识别、分析和可视化数据中的内在模式，这是一个持续迭代和交互的过程。网络分析、链路分析和~~'''社会网络分析 Social Network Analysis'''都是知识发现的方法，它们都是属于''' 先验方法 Prior Method'''。大多数知识发现方法遵循以下几个步骤('''在最高级别''' at the highest level) :

+

''' 知识发现 Knowledge Discovery'''是一个用于识别、分析和可视化数据中模式的迭代式、交互式过程。<ref>{{cite web|url=https://www.torproject.org/about/overview.html.en|title=Tor Project: Overview|first=The Tor Project|last=Inc.|publisher=}}</ref>网络分析、链路分析和'''社会网络分析 Social Network Analysis'''都是知识发现的方法，其中每一种方法都是前一种方法的子集。大多数知识发现方法（在最高层面上）遵循以下几个步骤：<ref>Ahonen, H., [http://www.cs.helsinki.fi/u/hahonen/features.txt Features of Knowledge Discovery Systems].</ref>

--[[用户:趣木木|趣木木]]（[[用户讨论:趣木木|讨论]]）遇到拿不准的专业名词翻译可从cnki翻译助手检索一下该名词，下为链接http://dict.cnki.net/dict_result.aspx 选取引用量较高的释义进行翻译譬如Knowledge Discovery 可以暂时定为知识发现后续专家审校进行反馈再核实其准不准确

第30行：第32行：

Transformation

−

~~数据转变~~

+

数据转换

# [[Data analysis|Analysis]]

第50行：第52行：

Data gathering and processing requires access to data and has several inherent issues, including information overload and data errors. Once data is collected, it will need to be transformed into a format that can be effectively used by both human and computer analyzers. Manual or computer-generated visualizations tools may be mapped from the data, including network charts. Several algorithms exist to help with analysis of data – Dijkstra’s algorithm, breadth-first search, and depth-first search.

−

~~数据的收集和处理是首先进行的过程，但此过程存在一些固有的问题，包括~~'''信息超载 Information Overload'''和数据错误等。在数据被收集后，它将转换成一种人和计算机分析程序都能有效使用的格式。之后基于数据，可使用计算机生成的或人工操作的可视化工具进行作图（如网络图）。目前有几种算法可以帮助人类进行数据分析-'''~~Dijkstra算法~~'''，'''广度优先搜索 Breadth-First Search'''和''' 深度优先搜索 Depth-First Search'''。

+

数据的收集和处理需要访问数据，且存在一些固有的问题，包括'''信息过载 Information Overload'''和数据错误等。数据收集完成后，需要将其转换成人和计算机分析程序都能有效使用的格式。之后可以基于数据，以人工或计算机生成的方式绘制可视化图表（如网络图）。有几种算法可以帮助进行数据分析：'''迪杰斯特拉算法 Dijkstra’s algorithm'''、'''广度优先搜索 Breadth-First Search'''和''' 深度优先搜索 Depth-First Search'''。
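As an illustration of how breadth-first search can support this kind of analysis, the following minimal sketch finds a chain with the fewest links between two entities. The graph `links`, the entity labels and the function name `bfs_path` are hypothetical and are not taken from the source article.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return one path with the fewest links from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}          # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk back through the parents to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected link data, stored as adjacency lists.
links = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

print(bfs_path(links, "A", "E"))
```

Breadth-first search treats every link as equally long. When links carry weights, Dijkstra's algorithm is the usual replacement.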

第58行：第60行：

Link analysis focuses on analysis of relationships among nodes through visualization methods (network charts, association matrix). Here is an example of the relationships that may be mapped for crime investigations:

−

链路分析主要通过可视化方法(网络图、关联矩阵)~~分析节点之间的关系。这里有一个基于罪犯和社会各部分关系绘制网图的例子~~:

+

链路分析主要通过可视化方法(网络图、关联矩阵)分析节点之间的关系。这里有一个基于犯罪侦查绘制网图的例子:<ref name=Krebs>Krebs, V. E. 2001, [http://vlado.fmf.uni-lj.si/pub/networks/doc/Seminar/Krebs.pdf Mapping networks of terrorist cells] {{webarchive|url=https://web.archive.org/web/20110720000539/http://vlado.fmf.uni-lj.si/pub/networks/doc/Seminar/Krebs.pdf |date=2011-07-20 }}, Connections 24, 43–52.</ref>

第78行：第80行：

! Relationship/Network !! Data Sources

−

!关系 / 网络! ~~！数据来源~~

+

!关系/网络 !! 数据来源

|-

第90行：第92行：

| 1. Trust || Prior contacts in family, neighborhood, school, military, club or organization. Public and court records. Data may only be available in suspect's native country.

−

| 1.信任 | ~~嫌疑人在家庭、社区、学校、军队、俱乐部或组织中已有的联系。公开信息及法庭纪录。以及只能在嫌疑人本国获得的数据。~~

+

| 1.信任 || 嫌疑人在家庭、社区、学校、军队、俱乐部或组织中已有的关系；公开信息及法庭纪录；部分数据可能只能在嫌疑人的本国获得。

|-

第102行：第104行：

| 2. Task || Logs and records of phone calls, electronic mail, chat rooms, instant messages, Web site visits. Travel records. Human intelligence: observation of meetings and attendance at common events.

−

| 2.任务 | | ~~电话、电子邮件、聊天室、即时消息、网站访问的日志和记录。出入境纪录。'''人类智能~~: ~~会议评论和公共活动的出席。'''Human intelligence: observation of meetings and attendance at common events.~~

+

| 2.任务 || 电话、电子邮件、聊天室、即时消息、网站访问的日志和记录；出入境纪录；人力情报：对会面的观察以及对共同参加的活动的观察。

|-

第114行：第116行：

| 3. Money & Resources || Bank account and money transfer records. Pattern and location of credit card use. Prior court records. Human intelligence: observation of visits to alternate banking resources such as Hawala.

−

| 3.资金和资源 | ~~银行账户和汇款记录。信用卡使用地点及使用习惯。以前的法庭记录。~~'''~~人类智能~~: 访问其他银行资源的观察，如 Hawala。'''Human intelligence: observation of visits to alternate banking resources such as Hawala.

+

| 3.资金和资源 || 银行账户和汇款记录；信用卡的使用模式和地点；以往的法庭记录；人力情报：对其使用哈瓦拉（Hawala）等替代性银行渠道的观察。

第127行：第129行：

| 4. Strategy & Goals || Web sites. Videos and encrypted disks delivered by courier. Travel records. Human intelligence: observation of meetings and attendance at common events.

−

| 4.策略与目标 | ~~网站。由快递公司递送的视频和加密光盘。出入境纪录。'''人类智能~~: ~~会议评论和公共活动的出席。'''Human intelligence: observation of meetings and attendance at common events.~~

+

| 4.策略与目标 || 网站；由快递公司递送的视频和加密光盘；出入境纪录；人力情报：对会面的观察以及对共同参加的活动的观察。

|}
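The association matrix mentioned before the table can be sketched as a symmetric matrix of link counts between entities. This is only an illustrative sketch: the entity labels, the observed links and the function name `association_matrix` are hypothetical, and real tools may record link types or strengths rather than plain counts.

```python
def association_matrix(entities, observed_links):
    """Build a symmetric matrix counting observed links between entities."""
    index = {name: i for i, name in enumerate(entities)}
    matrix = [[0] * len(entities) for _ in entities]
    for a, b in observed_links:
        i, j = index[a], index[b]
        matrix[i][j] += 1
        matrix[j][i] += 1   # relationships are treated as undirected
    return matrix


# Hypothetical entities and links (e.g. two phone calls, one shared meeting).
entities = ["P1", "P2", "P3"]
observed_links = [("P1", "P2"), ("P1", "P2"), ("P2", "P3")]

matrix = association_matrix(entities, observed_links)
for name, row in zip(entities, matrix):
    print(name, row)
```

Each row of the table above (trust, task, money and resources, strategy and goals) could be given its own matrix, so that different relationship types are kept apart.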

第141行：第143行：

Link analysis is used for 3 primary purposes:

−

~~链接分析主要有3个作用~~:

+

链路分析主要有3个作用:<ref name="Link Analysis Workbench">[https://docs.google.com/viewer?a=v&q=cache:R4_7k-udkxMJ:www.fas.org/irp/eprint/law.pdf+%22link+analysis%22&hl=en&gl=uk&pid=bl&srcid=ADGEESjUVRJFP2O2eMGzSuW4WtGgujUHVeuUYDQ9wqOkJUeVMHOwpagPa65ypDM5Fgma2AlwDyFBblZwXNMoYDCHwCQuTQvK-HnmqbW-z5A-MOwMKLy8nk6_uHLa0CiUAql-kAilAYcd&sig=AHIEtbRgGVNE4PMZ0a2vipzlDTd6OLX4fA Link Analysis Workbench], Air Force Research Laboratory Information Directorate, Rome Research Site, Rome, New York, September 2004.</ref>

+

第149行：第152行：

Find matches in data for known patterns of interest;

−

~~在数据中寻找有意义的已知模式~~;

+

在数据中寻找与已知的值得关注的模式相匹配的内容;

# Find anomalies where known patterns are violated;

第161行：第164行：

Discover new patterns of interest (social network analysis, data mining).

−

~~发现有意义的新模式~~(社会网络分析、数据挖掘)。

+

发现新的值得关注的模式(社会网络分析、数据挖掘)。

==History==

+

==历史==

+

Klerks categorized link analysis tools into 3 generations.<ref>{{cite journal | last = Klerks | first = P. | year = 2001 | title = The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands | citeseerx = 10.1.1.129.4720 | journal = Connections | volume = 24 | pages = 53–65 }}</ref> The first generation was introduced in 1975 as the Anacapa Chart of Harper and Harris.<ref>Harper and Harris, The Analysis of Criminal Intelligence, Human Factors and Ergonomics Society Annual Meeting Proceedings, 19(2), 1975, pp. 232-238.</ref> This method requires that a domain expert review data files, identify associations by constructing an association matrix, create a link chart for visualization and finally analyze the network chart to identify patterns of interest. This method requires extensive domain knowledge and is extremely time-consuming when reviewing vast amounts of data.

第173行：第179行：

Klerks categorized link analysis tools into 3 generations. The first generation was introduced in 1975 as the Anacapa Chart of Harper and Harris. This method requires that a domain expert review data files, identify associations by constructing an association matrix, create a link chart for visualization and finally analyze the network chart to identify patterns of interest. This method requires extensive domain knowledge and is extremely time-consuming when reviewing vast amounts of data.

−

Klerks把链接分析工具分为三代。第一代是由哈珀和哈里斯在1975年引入的，阿纳卡帕图。这种方法需要一个领域内的专家来审查数据文件，通过构造一个关联矩阵来识别关联，然后创建一个可视化的链路图，最后通过分析网络图来识别有意义的模式。这种方法需要广泛的领域知识，且因要人工审查大量数据，所以非常耗时。

+

Klerks把链路分析工具分为三代。<ref>{{cite journal | last = Klerks | first = P. | year = 2001 | title = The network paradigm applied to criminal organizations: Theoretical nitpicking or a relevant doctrine for investigators? Recent developments in the Netherlands | citeseerx = 10.1.1.129.4720 | journal = Connections | volume = 24 | pages = 53–65 }}</ref>第一代是由哈珀 Harper和哈里斯 Harris在1975年引入的''' 阿纳卡帕图 Anacapa Chart'''。<ref>Harper and Harris, The Analysis of Criminal Intelligence, Human Factors and Ergonomics Society Annual Meeting Proceedings, 19(2), 1975, pp. 232-238.</ref>这种方法需要由领域专家查看数据文件，通过构造关联矩阵来识别关联，然后创建用于可视化的链路图，最后通过分析网络图来识别值得关注的模式。这种方法需要广泛的领域知识，并且在审查大量数据时极其耗时。

第181行：第187行：

In addition to the association matrix, the activities matrix can be used to produce actionable information, which has practical value and use to law-enforcement. The activities matrix, as the term might imply, centers on the actions and activities of people with respect to locations. Whereas the association matrix focuses on the relationships between people, organizations, and/or properties. The distinction between these two types of matrices, while minor, is nonetheless significant in terms of the output of the analysis completed or rendered.

−

除了关联矩阵外，活动矩阵也可用于生成对执法活动具有实用和使用价值的可操作的信息。正如这个术语可能暗示的那样，活动矩阵关注的是人们基于地点的行动和活动。而关联矩阵关注的是人，组织和/~~或属性之间的关系。这两类矩阵之间的区别虽然很小，但就已完成或经过分析的数据来看，区别还是很重要的。~~

+

除了关联矩阵外，活动矩阵也可用于生成可操作的信息，这类信息对执法工作具有实际价值和用途。顾名思义，活动矩阵关注的是人们与地点相关的行动和活动，而关联矩阵关注的是人、组织和/或属性之间的关系。这两类矩阵之间的区别虽然很小，但就所完成的分析的输出结果而言，这一区别仍然很重要。<ref>{{cite web|url=http://www.globalsecurity.org/military/library/policy/army/fm/3-07-22/app-f.htm|title=FMI 3-07.22 Appendix F Intelligence Analysis Tools and Indicators|first=John|last=Pike|publisher=}}</ref><ref>[https://rdl.train.army.mil/catalog/view/100.ATSC/41449AB4-E8E0-46C4-8443-E4276B6F0481-1274576841878/3-24/appb.htm Social Network Analysis and Other Analytical Tools] {{webarchive|url=https://web.archive.org/web/20140308233614/https://rdl.train.army.mil/catalog/view/100.ATSC/41449AB4-E8E0-46C4-8443-E4276B6F0481-1274576841878/3-24/appb.htm |date=2014-03-08 }}</ref><ref>{{cite web|url=http://www.nasa.gov/audience/foreducators/topnav/materials/listbytype/Aeronautics_Activity_Matrices.html|title=Aeronautics Educator Guide - Activity Matrices|first=Rebecca Whitaker|last=MSFC|date=10 July 2009|publisher=}}</ref><ref>[https://rdl.train.army.mil/catalog/view/100.ATSC/0EF89CA1-2680-4782-B103-D2F5DC941188-1274309335668/7-98-1/chap2l6.htm Personality/Activity Matrix] {{webarchive|url=https://web.archive.org/web/20140308234135/https://rdl.train.army.mil/catalog/view/100.ATSC/0EF89CA1-2680-4782-B103-D2F5DC941188-1274309335668/7-98-1/chap2l6.htm |date=2014-03-08 }}</ref>

第189行：第195行：

Second generation tools consist of automatic graphics-based analysis tools such as IBM i2 Analyst’s Notebook, Netmap, ClueMaker and Watson. These tools offer the ability to automate the construction and updates of the link chart once an association matrix is manually created, however, analysis of the resulting charts and graphs still requires an expert with extensive domain knowledge.

−

第二代工具包括基于图形的自动分析工具，如 IBM i 2 Analyst’ s Notebook、 Netmap、 ClueMaker 和 Watson。在手动创建关联矩阵的情况下，这些工具提供了自动构建和更新链接图表的能力。然而，对结果图表的分析仍然需要在该领域具有广泛知识的专家。

+

第二代工具包括基于图形的自动分析工具，如 IBM i2 Analyst’s Notebook、Netmap、 ClueMaker和Watson。在手动创建关联矩阵的情况下，这些工具提供了自动构建和更新链路图表的能力。然而，对结果图表的分析仍然需要在该领域具有广泛知识的专家。

第197行：第203行：

The third generation of link-analysis tools like DataWalk allow the automatic visualization of linkages between elements in a data set, that can then serve as the canvas for further exploration or manual updates.

−

~~像 DataWalk 这样的第三代链接分析工具能自动可视化数据集中元素之间的链接，其结果可允许进一步的人工探索与改进。~~

+

第三代链路分析工具（例如DataWalk）可以自动将数据集中元素之间的链路可视化，其结果可以作为进一步探索或手动更新的基础。

−

==Applications==

+

==Applications ==

+

==应用 ==

+

* [[ViCAP|FBI Violent Criminal Apprehension Program (ViCAP)]]

第213行：第222行：

* Washington State Homicide Investigation Tracking System (HITS)<ref>{{cite web|url=http://www.atg.wa.gov/HITS.aspx |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20101021005202/http://atg.wa.gov/HITS.aspx |archivedate=2010-10-21 }}</ref>

−

华盛顿州凶杀案调查追踪系统

+

华盛顿州凶杀案调查追踪系统<ref>{{cite web|url=http://www.atg.wa.gov/HITS.aspx |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20101021005202/http://atg.wa.gov/HITS.aspx |archivedate=2010-10-21 }}</ref>

+

* New York State Homicide Investigation & Lead Tracking (HALT)

第219行：第229行：

* New Jersey Homicide Evaluation & Assessment Tracking (HEAT)<ref>{{cite web|url=http://www.state.nj.us/njsp/divorg/invest/invest.html |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20090325004722/http://www.state.nj.us/njsp/divorg/invest/invest.html |archivedate=2009-03-25 }}</ref>

−

新泽西州凶杀案评估与测评跟踪系统

+

新泽西州凶杀案评估与测评跟踪系统<ref>{{cite web|url=http://www.state.nj.us/njsp/divorg/invest/invest.html |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20090325004722/http://www.state.nj.us/njsp/divorg/invest/invest.html |archivedate=2009-03-25 }}</ref>

+

* Pennsylvania State ATAC Program.

第225行：第236行：

* Violent Crime Linkage Analysis System (ViCLAS)<ref>{{cite web|url=http://www.rcmp-grc.gc.ca/tops-opst/bs-sc/viclas-salvac-eng.htm |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20101202144141/http://www.rcmp-grc.gc.ca/tops-opst/bs-sc/viclas-salvac-eng.htm |archivedate=2010-12-02 }}</ref>

−

暴力犯罪联系分析系统

+

暴力犯罪联系分析系统<ref>{{cite web|url=http://www.rcmp-grc.gc.ca/tops-opst/bs-sc/viclas-salvac-eng.htm |title=Archived copy |accessdate=2010-10-31 |url-status=dead |archiveurl=https://web.archive.org/web/20101202144141/http://www.rcmp-grc.gc.ca/tops-opst/bs-sc/viclas-salvac-eng.htm |archivedate=2010-12-02 }}</ref>

==Issues with link analysis==

+

==链路分析的问题==

+

===Information overload===

+

===信息过载===

+

With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories – ([[Statistics|statistical]] [[Models of computation|models]], [[Time series analysis|time-series analysis]], [[Clustering coefficient|clustering]] and [[Statistical classification|classification]], matching algorithms to detect anomalies) and [[Artificial intelligence|artificial intelligence (AI)]] techniques (data mining, [[expert systems]], [[pattern recognition]], [[Machine learning|machine learning techniques]], [[neural network]]s).<ref>Palshikar, G. K., [http://www.intelligententerprise.com//020528/509feat3_1.jhtml The Hidden Truth], Intelligent Enterprise, May 2002.</ref>

第236行：第253行：

With the vast amounts of data and information that are stored electronically, users are confronted with multiple unrelated sources of information available for analysis. Data analysis techniques are required to make effective and efficient use of the data. Palshikar classifies data analysis techniques into two categories – (statistical models, time-series analysis, clustering and classification, matching algorithms to detect anomalies) and artificial intelligence (AI) techniques (data mining, expert systems, pattern recognition, machine learning techniques, neural networks).

−

由于大量数据和信息以电子形式存储，用户可能会面临拥有多种不相关的信息来源却不知如何分析的难题。数据分析技术的使用可以帮助有效和高效地利用数据。Palshikar 将数据分析技术分为两大类(统计模型、时间序列分析、聚类分类、异常检测匹配算法)和人工智能(AI)技术(数据挖掘、专家系统、模式识别、机器学习技术、神经网络)。

+

由于大量数据和信息以电子形式存储，用户可能会面临拥有多种不相关的信息来源却不知如何分析的难题。数据分析技术的使用可以帮助有效和高效地利用数据。帕尔希卡尔 Palshikar将数据分析技术分为两大类(统计模型、时间序列分析、聚类分类、异常检测匹配算法)和人工智能(AI)技术(数据挖掘、专家系统、模式识别、机器学习技术、神经网络)。<ref>Palshikar, G. K., [http://www.intelligententerprise.com//020528/509feat3_1.jhtml The Hidden Truth], Intelligent Enterprise, May 2002.</ref>

+

第244行：第262行：

Bolton & Hand define statistical data analysis as either supervised or unsupervised methods. Supervised learning methods require that rules are defined within the system to establish what is expected or unexpected behavior. Unsupervised learning methods review data in comparison to the norm and detect statistical outliers. Supervised learning methods are limited in the scenarios that can be handled as this method requires that training rules are established based on previous patterns. Unsupervised learning methods can provide detection of broader issues, however, may result in a higher false-positive ratio if the behavioral norm is not well established or understood.

−

Bolton & Hand 将统计数据分析定义为有监督或无监督的方法。'''监督式学习方法 Supervised Learning Methods'''要求在系统中有明确的规则来指出什么是预期行为，什么是意外行为。'''非监督式学习方法 Unsupervised Learning Methods'''在审视数据时，通过将数据与正常值的比较，来发现统计异常值。监督式学习方法能处理的场景是有限的，因为这种方法需要基于以前的模式建立训练规则。非监督式学习方法可以对更广泛的问题采取进攻。但是，如果数据的行为规范没有很好的建立或被机器理解 --[[用户:Ryan|Ryan]]（[[用户讨论:Ryan|讨论]]）该句存疑，其结果可能会导致较高的假阳性率（本身不是正常值，但识别为正常值，说明算法预测了“正确”或“有”的判断，但却判断错误了）。

+

博尔顿 Bolton &汉德 Hand 将统计数据分析定义为有监督或无监督的方法。<ref>Bolton, R. J. & Hand, D. J., Statistical Fraud Detection: A Review, Statistical Science, 2002, 17(3), pp. 235-255.</ref>'''监督式学习方法 Supervised Learning Methods'''要求在系统中定义规则，以确定什么是预期行为，什么是意外行为。'''非监督式学习方法 Unsupervised Learning Methods'''将数据与常态进行比较，从而发现统计异常值。监督式学习方法能处理的场景是有限的，因为这种方法需要基于以往的模式建立训练规则。非监督式学习方法可以检测更广泛的问题。但是，如果行为常态没有很好地建立或被理解 --[[用户:Ryan|Ryan]]（[[用户讨论:Ryan|讨论]]）该句存疑，其结果可能会导致较高的假阳性率（即误报率：把实际正常的行为错误地判定为异常）。

−

+

--[[用户:WildBoar|WildBoar]]（[[用户讨论:WildBoar|讨论]]）可能会导致较高的误报率
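To make the unsupervised idea concrete, here is a minimal sketch that compares each value with the norm of the data set and flags statistical outliers by z-score. The sample amounts, the threshold of 2.0 and the function name `zscore_outliers` are assumptions chosen for illustration and do not come from Bolton and Hand.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds threshold standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []   # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one is far from the rest.
amounts = [100, 102, 98, 101, 99, 500]

print(zscore_outliers(amounts))
```

The caveat from the text applies directly: if the "norm" is estimated from too little or unrepresentative data, or the threshold is set too low, ordinary behavior gets flagged and the false-positive rate rises.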

Data itself has inherent issues including integrity (or lack of) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”.<ref name="Link Analysis Workbench"/> Sparrow<ref>Sparrow M.K. 1991. Network Vulnerabilities and Strategic Intelligence in Law Enforcement’, [[International Journal of Intelligence and Counterintelligence]] Vol. 5 #3.</ref> highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.<ref name=Krebs/>

−

Data itself has inherent issues including integrity (or lack of) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”. Sparrow highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.

+

Data itself has inherent issues including integrity (or lack of) and continuous changes. Data may contain '''“errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”.''' Sparrow highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.

−

数据本身存在固有的问题，包括完整性(或缺失性)和持续的改变。数据可能包含'''~~“由于错误的收集或处理，以及当实体积极试图欺骗和~~ / ~~或隐瞒其行为，而造成的错误的遗漏和委托”~~''' ~~“errors of omission~~ and ~~commission because~~ of ~~faulty collection or handling, and when entities are actively attempting to deceive~~ and/~~or conceal their actions”。Sparro强调了数据分析中三个主要的问题，不完整性~~(~~数据或链接缺失的必然性~~)、模糊边界(边界确定的主观性)和动态变化(数据的持续变化性)。

+

数据本身存在固有的问题，包括完整性（或完整性的缺失）和持续的变化。数据可能包含'''“由于收集或处理不当，以及实体主动试图欺骗和/或隐瞒其行为，而造成的遗漏错误和误录错误（errors of omission and commission）”。''' <ref name="Link Analysis Workbench"/> 斯帕罗 Sparrow<ref>Sparrow M.K. 1991. Network Vulnerabilities and Strategic Intelligence in Law Enforcement’, [[International Journal of Intelligence and Counterintelligence]] Vol. 5 #3.</ref> 强调了数据分析中的三个主要问题：不完整性(数据或链路缺失的必然性)、模糊边界(决定纳入哪些内容时的主观性)和动态变化(数据始终在变化)。<ref name=Krebs/>

−

~~--[[用户:趣木木|趣木木]]（[[用户讨论:趣木木|讨论]]）存疑的地方也可以用绿色语法进行标注~~

Once data is transformed into a usable format, open texture and cross referencing issues may arise. [[Open texture]] was defined by [[Friedrich Waismann|Waismann]] as the unavoidable uncertainty in meaning when empirical terms are used in different contexts.<ref>Friedrich Waismann, Verifiability (1945), p.2.</ref> Uncertainty in meaning of terms presents problems when attempting to search and cross reference data from multiple sources.<ref>Lyons, D., [http://ssrn.com/abstract=212328 Open Texture and the Possibility of Legal Interpretation (2000)].</ref>

第260行：第277行：

Once data is transformed into a usable format, open texture and cross referencing issues may arise. Open texture was defined by Waismann as the unavoidable uncertainty in meaning when empirical terms are used in different contexts. Uncertainty in meaning of terms presents problems when attempting to search and cross reference data from multiple sources.

−

~~一旦数据转换成可用的格式，开放纹理和交叉引用问题就会出现。Waismann将~~''' ~~开放纹理~~ Open Texture'''~~定义为在不同语境中使用经验词汇时不可避免的语义不确定性。当试图从多个数据源搜索和交叉引用数据时，术语含义的不确定性带来了问题。~~

+

一旦数据转换成可用的格式，就会出现开放结构和交叉引用问题。魏斯曼 Waismann将''' 开放结构 Open Texture'''定义为在不同语境中使用经验词汇时不可避免的语义不确定性。<ref>Friedrich Waismann, Verifiability (1945), p.2.</ref>当试图从多个数据源搜索和交叉引用数据时，术语含义的不确定性带来了问题。<ref>Lyons, D., [http://ssrn.com/abstract=212328 Open Texture and the Possibility of Legal Interpretation (2000)].</ref>

第268行：第285行：

The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly method of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”. Even using domain experts may result in differing conclusions as analysis may be subjective.

−

~~目前，解决数据分析中这些问题的主要方法是依赖专家的'''~~<~~font color="#ff8000"~~>~~领域知识 Domain Knowledge~~</~~font~~>'''。如此进行链路分析是非常耗时和昂贵的，并且无法排除其自身固有的问题。麦格拉斯等人得出结论，网络图的分布和表示方式对用户的“对存在在网络中群体的感知”有重大影响。即使是领域内的专家也可能导致不同的结论，因为分析可能是很主观的。

+

目前，解决数据分析中这些问题的主要方法是依赖专家的领域知识。这种进行链路分析的方式非常耗时且成本高昂，并且有其自身固有的问题。麦格拉斯 McGrath 等人得出结论：网络图的布局和呈现方式对用户“对网络中存在群体的感知”有重大影响。<ref>McGrath, C., Blythe, J., Krackhardt, D., [http://www.andrew.cmu.edu/user/cm3t/groups.html Seeing Groups in Graph Layouts].</ref> 即使由领域专家来分析，也可能得出不同的结论，因为分析可能带有主观性。

+

===Prosecution vs. crime prevention===

第276行：第294行：

Link analysis techniques have primarily been used for prosecution, as it is far easier to review historical data for patterns than it is to attempt to predict future actions.

−

~~目前，链接分析技术主要用于起诉，因为回顾历史数据以期从中获得模式，要比预测未来的行动容易得多。~~

+

目前，链路分析技术主要用于起诉，因为回顾历史数据以从中寻找模式，要比尝试预测未来的行动容易得多。

第284行：第302行：

Krebs demonstrated the use of an association matrix and link chart of the terrorist network associated with the 19 hijackers responsible for the September 11th attacks by mapping publicly available details made available following the attacks. Even with the advantages of hindsight and publicly available information on people, places and transactions, it is clear that there is missing data.

−

~~Krebs基于袭击后的详细公开资料进行绘图，演示了与9月11日袭击事件的19名劫机者有关的恐怖分子关系网的关联矩阵和链接图。即使有事后诸葛亮的优势~~--[[用户:Ryan|Ryan]]（[[用户讨论:Ryan|讨论]]），以及关于人员、地点和交易的公开可用信息，做出的结果图很明显仍然缺少数据。

+

克雷布斯 Krebs基于袭击后的详细公开资料进行绘图，展示了与9月11日袭击事件的19名劫机者有关的恐怖分子关系网的关联矩阵和链路图。<ref name=Krebs/>即使有事后诸葛亮的优势--[[用户:Ryan|Ryan]]（[[用户讨论:Ryan|讨论]]），以及关于人员、地点和交易的公开可用信息，做出的结果图很明显仍然缺少数据。

+

--[[用户:趣木木|趣木木]]（[[用户讨论:趣木木|讨论]]）即使有事后诸葛亮的优势如果进行了意译应标注讨论出来

+

--[[用户:WildBoar|WildBoar]]（[[用户讨论:WildBoar|讨论]]）或许可以翻译成后见之明？

第292行：第312行：

Alternatively, Picarelli argued that use of link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.” Balancing the legal concepts of probable cause, right to privacy and freedom of association become challenging when reviewing potentially sensitive data with the objective to prevent crime or illegal activity that has not yet occurred.

−

另外，Picarelli认为，使用链路分析技术可以用来查明并有可能防止奥姆真理教的非法活动。“我们必须小心‘牵连犯罪’。与恐怖分子有联系并不能证明有罪——但确实得进行调查。” 在审查较为敏感的数据以防止尚未发生的犯罪或非法活动时，如何同时不违背合理依据、隐私权和结社自由等法律概念将变得很困难。

+

另外，皮卡雷利 Picarelli认为，使用链路分析技术可以用来查明并有可能防止奥姆真理教的非法活动。<ref>Picarelli, J. T., [http://kdl.cs.umass.edu/events/aila1998/picarelli.ps Transnational Threat Indications and Warning: The Utility of Network Analysis, Military and Intelligence Analysis Group].</ref>“我们必须小心‘牵连犯罪’。与恐怖分子有联系并不能证明有罪——但确实应该进行调查。”<ref name=Krebs/>如何在审查潜在敏感数据以防止尚未发生的犯罪或非法活动时不违背合理依据、隐私权和结社自由等法律概念将会变得很困难。

+

==Proposed solutions==

第300行：第321行：

There are four categories of proposed link analysis solutions:

−

有四类拟议的链路分析解决方案:

+

有四类拟议的链路分析解决方案:<ref>Schroeder et al., Automated Criminal Link Analysis Based on Domain Knowledge, Journal of the American Society for Information Science and Technology, 58:6 (842), 2007.</ref>

第334行：第355行：

Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.

−

~~基于启发式的工具运用从专家知识中提取出来的决策规则对结构化数据进行操作。基于模板的工具使用~~'''自然语言处理 Natural Language Processing'''从非结构化数据中提取与预定义模板匹配的细节。基于相似度的方法使用加权评分来比较属性和识别潜在的链接。统计方法基于词汇统计识别潜在的链接。

+

基于启发式的工具利用结构化数据从专家知识中提取决策规则。基于模板的工具使用'''自然语言处理 Natural Language Processing'''从非结构化数据中提取与预定义模板匹配的具体信息。基于相似度的方法使用加权评分来比较属性和识别潜在的链路。统计方法基于词汇统计识别潜在的链路。
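The similarity-based approach described above can be sketched as a weighted comparison of record attributes. The field names, weights, sample records and the function name `similarity` are hypothetical; the source does not say how any particular tool assigns its weights.

```python
def similarity(record_a, record_b, weights):
    """Return the weighted share of attributes on which two records agree (0.0 to 1.0)."""
    total = sum(weights.values())
    matched = sum(
        weight
        for field, weight in weights.items()
        if record_a.get(field) is not None and record_a.get(field) == record_b.get(field)
    )
    return matched / total


# Hypothetical weights: a matching name counts for more than a matching city.
weights = {"name": 5, "birth_year": 3, "city": 2}

record_a = {"name": "J. Smith", "birth_year": 1980, "city": "Rome"}
record_b = {"name": "J. Smith", "birth_year": 1980, "city": "Utica"}

score = similarity(record_a, record_b, weights)
print(score)   # a potential link could be proposed when the score passes a chosen cut-off
```

Exact equality is the simplest possible comparison. A practical system would more likely use fuzzy matching per field (for example for misspelled names), which this sketch leaves out.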

===CrimeNet explorer===

+

J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer.<ref name=Xu>Xu, J.J. & Chen, H., CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), April 2005, pp. 201-226.</ref> This framework includes the following elements:

第344行：第366行：

J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer. This framework includes the following elements:

−

J.J.Xu和H.Chen 提出了一个自动化网络分析和可视化的框架，叫做 CrimeNet ~~Explorer。这一框架包括以下内容~~:

+

J.J.Xu和H.Chen 提出了一个自动化网络分析和可视化的框架，叫做 CrimeNet Explorer。<ref name=Xu>Xu, J.J. & Chen, H., CrimeNet Explorer: A Framework for Criminal Network Knowledge Discovery, ACM Transactions on Information Systems, 23(2), April 2005, pp. 201-226.</ref> 这一框架包括以下内容:

* Network Creation through a concept space approach that uses “[[Co-occurrence networks|co-occurrence]] weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely it will be that they are related”.<ref name=Xu/>

−

通过概念空间方法创建网络，该方法使用“共现网络”来衡量两个单词或短语在同一文档中出现的频率。两个单词或短语在一起出现的频率越高，它们关联的可能性就越大。

+

通过概念空间方法创建网络，该方法使用“共现权重来衡量两个单词或短语在同一文档中出现的频率。两个单词或短语一起出现的频率越高，它们相关的可能性就越大”。<ref name=Xu/>

* Network Partition using “hierarchical clustering to partition a network into subgroups based on relational strength”.<ref name=Xu/>

−

网络分区通过“根据关系强度的分层聚类，将网络划分为子组”而实现。

+

网络分区通过“根据关系强度的分层聚类，将网络划分为子组”而实现。<ref name=Xu/>

* Structural Analysis through “three centrality measures (degree, betweenness, and closeness) to identify central members in a given subgroup.<ref name=Xu/> CrimeNet Explorer employed [[Dijkstra's algorithm|Dijkstra’s shortest-path algorithm]] to calculate the betweenness and closeness from a single node to all other nodes in the subgroup.

−

通过“三种中心性度量（度，中间性和紧密性）来识别给定子集中的中心成员”进行结构分析。CrimeNet Explorer使用Dijkstra的最短路径算法来计算从单个节点到子组中所有其他节点的间隔度和紧密度。

+

通过“三种中心性度量（度中心性、中介中心性和接近中心性）来识别给定子组中的中心成员”进行结构分析。<ref name=Xu/> CrimeNet Explorer使用''' 迪杰斯特拉最短路径算法 Dijkstra’s shortest-path algorithm'''来计算从单个节点到子组中所有其他节点的中介中心性和接近中心性。

* Network Visualization using Torgerson’s metric [[Multidimensional scaling|multidimensional scaling (MDS)]] algorithm.

−

~~使用Torgerson的度量多维标度（MDS）算法进行网络可视化。~~

+

使用托格森的度量多维标度（MDS）算法进行网络可视化。
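Two of the elements listed above, co-occurrence weighting and closeness centrality computed with Dijkstra's shortest-path algorithm, can be sketched together as follows. This is not CrimeNet Explorer's implementation. The sample documents, the choice of `1 / count` as link length and all function names are assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import combinations


def cooccurrence_weights(documents):
    """Count, for each unordered pair of terms, how many documents contain both."""
    counts = Counter()
    for terms in documents:
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def build_graph(weights):
    """Assumption: pairs that co-occur more often get a shorter link (1 / count)."""
    graph = {}
    for (a, b), count in weights.items():
        graph.setdefault(a, {})[b] = 1.0 / count
        graph.setdefault(b, {})[a] = 1.0 / count
    return graph


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue   # stale heap entry
        for neighbor, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


def closeness(graph, node):
    """Closeness centrality: reachable nodes divided by the sum of their distances."""
    dist = dijkstra(graph, node)
    total = sum(d for other, d in dist.items() if other != node)
    return (len(dist) - 1) / total if total > 0 else 0.0


# Hypothetical documents, each reduced to the names it mentions.
documents = [["Ann", "Bob"], ["Ann", "Bob", "Cy"], ["Cy", "Dee"]]

weights = cooccurrence_weights(documents)
graph = build_graph(weights)
most_central = max(graph, key=lambda name: closeness(graph, name))
print(most_central)
```

Betweenness centrality and hierarchical clustering, also named in the list, need more machinery and are left out to keep the sketch short.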

==References==

+

==参考==

+

第372行：第400行：

==External links==

+

==外部链接==

+

* {{cite book | last1 = Bartolini | first1 = I | last2 = Ciaccia | first2 = P. | title = Imagination: Accurate Image Annotation Using Link-Analysis Techniques | citeseerx = 10.1.1.63.2453 }}

WildBoar

10

个编辑
