Changes

Added 808 bytes, 14 August 2020 (Fri) 09:11
No edit summary
Line 246: Line 246:
Data itself has inherent issues, including integrity (or lack thereof) and continuous change. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”.<ref name="Link Analysis Workbench"/> Sparrow<ref>Sparrow M.K. 1991. ‘Network Vulnerabilities and Strategic Intelligence in Law Enforcement’, [[International Journal of Intelligence and Counterintelligence]] Vol. 5 #3.</ref> highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.<ref name=Krebs/>
 
   −
Data itself has inherent issues including integrity (or lack of) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”. highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.
+
Data itself has inherent issues including integrity (or lack of) and continuous changes. Data may contain “errors of omission and commission because of faulty collection or handling, and when entities are actively attempting to deceive and/or conceal their actions”. Sparrow highlights incompleteness (inevitability of missing data or links), fuzzy boundaries (subjectivity in deciding what to include) and dynamic changes (recognition that data is ever-changing) as the three primary problems with data analysis.
   −
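Data itself has inherent problems, including integrity (or the lack of it) and continuous change. Data may contain “errors of omission and commission caused by faulty collection or handling, and when entities actively attempt to deceive and/or conceal their actions”. Highlights that the three main problems of data analysis are incompleteness (the inevitability of missing data or links), fuzzy boundaries (the subjectivity of deciding what to include) and dynamic change (the recognition that data is constantly changing).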
+
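Data itself has inherent problems, including integrity (or its absence) and continuous change. Data may contain “errors of omission and commission caused by faulty collection or handling, and by entities actively attempting to deceive and/or conceal their actions”. Sparrow highlights three main problems in data analysis: incompleteness (the inevitability of missing data or links), fuzzy boundaries (the subjectivity of setting boundaries) and dynamic change (the ever-changing nature of data).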
Line 256: Line 256:
Once data is transformed into a usable format, open texture and cross-referencing issues may arise. Open texture was defined by Waismann as the unavoidable uncertainty in meaning when empirical terms are used in different contexts. Uncertainty in the meaning of terms presents problems when attempting to search and cross-reference data from multiple sources.
 
   −
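Once data is converted into a usable format, open texture and cross-referencing problems arise. Waismann defined open structure as the unavoidable uncertainty of meaning when empirical terms are used in different contexts. When trying to search and cross-reference data from multiple sources, uncertainty in the meaning of terms causes problems.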
+
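Once data is converted into a usable format, open texture and cross-referencing problems arise. Waismann defined '''<font color="#ff8000"> open texture</font>''' as the unavoidable semantic uncertainty that arises when empirical terms are used in different contexts. When trying to search and cross-reference data from multiple data sources, uncertainty in the meaning of terms causes problems.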
Line 264: Line 264:
The primary method for resolving data analysis issues is reliance on domain knowledge from an expert. This is a very time-consuming and costly way of conducting link analysis and has inherent problems of its own. McGrath et al. conclude that the layout and presentation of a network diagram have a significant impact on the user’s “perceptions of the existence of groups in networks”. Even domain experts may reach differing conclusions, as analysis can be subjective.
 
   −
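The main method for solving data analysis problems is to rely on experts' domain knowledge. This is a very time-consuming and expensive method of conducting link analysis, and it has inherent problems of its own. McGrath et al. concluded that the layout and presentation of network diagrams have a significant impact on users' “perception of the existence of groups in the network”. Even using domain experts may lead to different conclusions, because analysis may be subjective.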
+
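At present, the main method for resolving these problems in data analysis is to rely on experts' domain knowledge. Conducting link analysis this way is very time-consuming and expensive, and it cannot rule out inherent problems of its own. McGrath et al. concluded that the layout and presentation of a network diagram have a significant impact on users' “perception of the groups that exist in the network”. Even experts within the domain may reach different conclusions, because analysis can be quite subjective.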
Line 274: Line 274:
Link analysis techniques have primarily been used for prosecution, as it is far easier to review historical data for patterns than it is to attempt to predict future actions.
 
   −
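Link analysis techniques are mainly used for prosecution, because reviewing historical data for patterns is much easier than predicting future actions.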
+
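At present, link analysis techniques are mainly used for prosecution, because reviewing historical data in the hope of extracting patterns from it is much easier than predicting future actions.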
Line 282: Line 282:
Krebs demonstrated the use of an association matrix and link chart of the terrorist network associated with the 19 hijackers responsible for the September 11th attacks by mapping details made public following the attacks. Even with the advantages of hindsight and publicly available information on people, places and transactions, it is clear that data is missing.
 
   −
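By mapping details published after the attacks, Krebs demonstrated the association matrix and link chart of the terrorist network related to the 19 hijackers of the September 11 attacks. Even with the advantage of hindsight and publicly available information on people, places and transactions, it is clear that data is still missing.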
+
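Working from detailed information made public after the attacks, Krebs produced charts demonstrating the association matrix and link chart of the terrorist network connected to the 19 hijackers of the September 11 attacks. Even with the advantage of hindsight and publicly available information on people, places and transactions, the resulting charts are clearly still missing data.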
Line 290: Line 290:
Alternatively, Picarelli argued that link analysis techniques could have been used to identify and potentially prevent illicit activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does invite investigation.” Balancing the legal concepts of probable cause, right to privacy and freedom of association becomes challenging when reviewing potentially sensitive data with the objective of preventing crime or illegal activity that has not yet occurred.
 
   −
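In addition, Picarelli argued that link analysis techniques could be used to identify and possibly prevent illegal activities within the Aum Shinrikyo network. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does merit investigation.” When reviewing potentially sensitive data to prevent crimes or illegal activities that have not yet occurred, balancing legal concepts such as possible causes, the right to privacy and freedom of association becomes difficult.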
+
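In addition, Picarelli argued that link analysis techniques could be used to identify and possibly prevent the illegal activities of Aum Shinrikyo. “We must be careful of ‘guilt by association’. Being linked to a terrorist does not prove guilt – but it does call for investigation.” When reviewing fairly sensitive data to prevent crimes or illegal activities that have not yet occurred, doing so without violating legal concepts such as probable cause, the right to privacy and freedom of association becomes difficult.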
Line 300: Line 300:
There are four categories of proposed link analysis solutions:
 
   −
There are four categories of proposed link analysis solutions:
+
There are four categories of proposed link analysis solutions:
Line 326: Line 326:
  Statistical
 
   −
Statistical data
+
Statistical methods
Line 334: Line 334:
Heuristic-based tools utilize decision rules that are distilled from expert knowledge using structured data. Template-based tools employ Natural Language Processing (NLP) to extract details from unstructured data that are matched to pre-defined templates. Similarity-based approaches use weighted scoring to compare attributes and identify potential links. Statistical approaches identify potential links based on lexical statistics.
 
   −
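Heuristic-based tools use decision rules extracted from expert knowledge using structured data. Template-based tools use Natural Language Processing (NLP) to extract details from unstructured data that match predefined templates. Similarity-based methods use weighted scoring to compare attributes and identify potential links. Statistical methods identify potential links based on lexical statistics.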
+
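Heuristic-based tools apply decision rules distilled from expert knowledge to structured data. Template-based tools use Natural Language Processing (NLP) to extract details from unstructured data that match predefined templates. Similarity-based methods use weighted scoring to compare attributes and identify potential links. Statistical methods identify potential links based on lexical statistics.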
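The similarity-based approach can be illustrated with a minimal Python sketch. It assumes each record is a dictionary of attributes. A pair of records is scored by the weighted share of their commonly recorded attributes on which they agree exactly. The attribute names, the weights and the 0.6 threshold are all hypothetical, and real tools usually compare values more loosely than by exact equality.

```python
# Hypothetical attribute weights; a real tool would tune these per data source.
WEIGHTS = {"surname": 0.4, "date_of_birth": 0.3, "address": 0.2, "phone": 0.1}


def similarity(a, b, weights=WEIGHTS):
    """Weighted share of commonly recorded attributes on which a and b agree.

    Attributes missing from either record are skipped and do not count
    towards the normalizing total.
    """
    score = total = 0.0
    for attr, w in weights.items():
        if attr in a and attr in b:
            total += w
            if a[attr] == b[attr]:
                score += w
    return score / total if total else 0.0


def potential_links(records, threshold=0.6):
    """Return (i, j, score) for every record pair scoring at least threshold."""
    links = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            s = similarity(records[i], records[j])
            if s >= threshold:
                links.append((i, j, round(s, 3)))
    return links


records = [
    {"surname": "Smith", "date_of_birth": "1970-01-01", "address": "1 Main St", "phone": "555-0100"},
    {"surname": "Smith", "date_of_birth": "1970-01-01", "address": "9 Oak Ave"},
    {"surname": "Jones", "phone": "555-0100"},
]
print(potential_links(records))  # [(0, 1, 0.778)]
```

Normalizing by the weight of attributes that both records actually contain keeps sparse records from being penalized simply for missing fields. Whether that is desirable depends on the data source.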
Line 344: Line 344:
J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer. This framework includes the following elements:
 
   −
and h. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer. This framework includes the following:
+
J.J. Xu and H. Chen propose a framework for automated network analysis and visualization called CrimeNet Explorer. This framework includes the following:
    
* Network Creation through a concept space approach that uses “[[Co-occurrence networks|co-occurrence]] weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely it will be that they are related”.<ref name=Xu/>
 
 +
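Network creation through a concept space approach, which uses co-occurrence weight to measure the frequency with which two words or phrases appear in the same document. The more frequently two words or phrases appear together, the more likely they are to be related.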
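Below is a minimal sketch of co-occurrence-based network creation. It assumes each document has already been reduced to the entity names it mentions. For illustration it uses raw co-occurrence counts as edge weights. This is a simplification, not the concept space weighting formula used by CrimeNet Explorer, and the entity names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents):
    """Weighted undirected edges from entity co-occurrence.

    Each document is an iterable of entity names. The weight of the edge
    between a and b is the number of documents mentioning both.
    """
    weights = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair has a single canonical key.
        for a, b in combinations(sorted(set(doc)), 2):
            weights[(a, b)] += 1
    return dict(weights)


reports = [
    ["Alice", "Bob", "Carol"],
    ["Alice", "Bob"],
    ["Bob", "Dave"],
    ["Bob", "Alice", "Alice"],
]
print(cooccurrence_network(reports))
# {('Alice', 'Bob'): 3, ('Alice', 'Carol'): 1, ('Bob', 'Carol'): 1, ('Bob', 'Dave'): 1}
```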
    
* Network Partition using “hierarchical clustering to partition a network into subgroups based on relational strength”.<ref name=Xu/>
 
 +
Network partition is achieved through “hierarchical clustering to partition a network into subgroups based on relational strength”.
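The partitioning step can be sketched as follows. The text only specifies hierarchical clustering based on relational strength, so the single-link criterion used here is an assumption. Edge weights such as co-occurrence counts serve as relational strength. Cutting a single-link hierarchy at a strength threshold yields the same subgroups as keeping only the edges at or above that threshold and taking connected components, and a union-find structure computes those components directly. The names and weights are hypothetical.

```python
def partition(nodes, weights, threshold):
    """Subgroups from a single-link clustering cut at `threshold`.

    Equivalent to the connected components that remain after dropping
    every edge whose relational strength is below the threshold.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), strength in weights.items():
        if strength >= threshold:
            parent[find(a)] = find(b)  # merge the two subgroups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


nodes = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank"]
weights = {("Alice", "Bob"): 3, ("Alice", "Carol"): 1, ("Bob", "Carol"): 1,
           ("Bob", "Dave"): 1, ("Eve", "Frank"): 2}
for t in (3, 2, 1):
    print(t, partition(nodes, weights, t))
```

Sweeping the threshold from high to low traces the hierarchy. The most strongly related pairs group first, and the groups merge into larger subgroups as weaker links are admitted.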
    
* Structural Analysis through “three centrality measures (degree, betweenness, and closeness) to identify central members in a given subgroup”.<ref name=Xu/> CrimeNet Explorer employed [[Dijkstra's algorithm|Dijkstra’s shortest-path algorithm]] to calculate the betweenness and closeness from a single node to all other nodes in the subgroup.
 
 +
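Structural analysis through “three centrality measures (degree, betweenness and closeness) to identify central members in a given subgroup”. CrimeNet Explorer used Dijkstra's shortest-path algorithm to calculate the betweenness and closeness from a single node to all other nodes in the subgroup.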
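The three centrality measures can be sketched in Python as below. Betweenness uses Brandes' dependency accumulation on top of Dijkstra's algorithm, which also yields the shortest distances that closeness needs. The sketch assumes an undirected graph given as an adjacency dictionary with positive edge lengths, where a shorter length means a closer relationship. Relational strengths would first need converting to lengths, for example by taking reciprocals, which is an assumption. The closeness normalization shown is one common choice and not necessarily the one CrimeNet Explorer uses.

```python
import heapq


def shortest_paths(graph, source):
    """Dijkstra's algorithm with shortest-path counting.

    Returns (dist, sigma, preds, order): distance from `source` to each
    reachable node, the number of shortest paths to it, its predecessors
    on those paths, and nodes in the order they were settled.
    """
    dist = {source: 0.0}
    sigma = {v: 0 for v in graph}
    sigma[source] = 1
    preds = {v: [] for v in graph}
    order = []
    settled = set()
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue  # stale heap entry
        settled.add(v)
        order.append(v)
        for w, length in graph[v].items():
            nd = d + length
            if w not in dist or nd < dist[w]:
                dist[w] = nd          # strictly shorter path found
                sigma[w] = sigma[v]
                preds[w] = [v]
                heapq.heappush(heap, (nd, w))
            elif nd == dist[w]:
                sigma[w] += sigma[v]  # another path of equal length
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(graph):
    """Degree, closeness and betweenness for an undirected weighted graph."""
    degree = {v: len(graph[v]) for v in graph}
    closeness = {}
    betweenness = {v: 0.0 for v in graph}
    for s in graph:
        dist, sigma, preds, order = shortest_paths(graph, s)
        total = sum(dist.values())
        # Closeness: other reachable nodes divided by their total distance from s.
        closeness[s] = (len(dist) - 1) / total if total > 0 else 0.0
        # Brandes: accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Undirected: every pair was counted once from each endpoint.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, closeness, betweenness


graph = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 3.0, "B": 1.0},
}
degree, closeness, betweenness = centralities(graph)
print(degree)       # {'A': 2, 'B': 2, 'C': 2}
print(betweenness)  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

In the example every member has the same degree. However, the shortest route between A and C runs through B (length 2 versus 3 for the direct edge), so only B scores any betweenness.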
    
* Network Visualization using Torgerson’s metric [[Multidimensional scaling|multidimensional scaling (MDS)]] algorithm.
 
 
+
Network visualization using Torgerson's metric multidimensional scaling (MDS) algorithm.
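Torgerson's classical metric MDS converts a matrix of pairwise distances into low-dimensional coordinates for drawing. For a network, these might be shortest-path lengths between members. Below is a minimal NumPy sketch. It assumes the distance matrix is complete and finite, so disconnected members would need some substitute distance. The three-point example is hypothetical.

```python
import numpy as np


def classical_mds(D, k=2):
    """Torgerson's classical (metric) MDS.

    D: (n, n) symmetric matrix of pairwise distances.
    Returns an (n, k) array of coordinates whose Euclidean distances
    approximate D (exactly, if D is Euclidean in k dimensions).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)      # eigenpairs of symmetric B
    idx = np.argsort(eigvals)[::-1][:k]       # keep the k largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[idx], 0, None))
    return eigvecs[:, idx] * scale            # scale each eigenvector column


# Distances of a 3-4-5 right triangle.
D = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
X = classical_mds(D)
recovered = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.round(recovered, 6))
```

MDS recovers positions only up to rotation, reflection and translation. For that reason the check compares the recovered pairwise distances against the input rather than comparing coordinates.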
     