更改

删除6,697字节 、 2020年5月9日 (六) 23:56
第108行: 第108行:       −
{|class="wikitable" align="center"
+
{|class="wikitable" style="width: 75%;margin:0 auto"
 
|-
 
|-
 
! Mavisto
 
! Mavisto
第158行: 第158行:     
===ESU (FANMOD)算法及对应的软件===
 
===ESU (FANMOD)算法及对应的软件===
The sampling bias of Kashtan ''et al.'' <ref name="kas1" /> provided great impetus for designing better algorithms for the NM discovery problem. Although Kashtan ''et al.'' tried to settle this drawback by means of a weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated implementation. This tool is one of the most useful ones, as it supports visual options and also is an efficient algorithm with respect to time. But, it has a limitation on motif size as it does not allow searching for motifs of size 9 or higher because of the way the tool is implemented.
     −
由于Kashtan等学者发现的采样偏差问题,所以针对NM discovery problem需要设计更好的算法。虽然Kashtan等人尝试用加权法来解决这个弊端,但这个方法在运行上,消耗了过多的运行时间,且实现起来也变得更加复杂。但这个工具还是最好用的工具之一,因为它支持可视化选项,同时也『是个节约时间的算法』。但是,它在所支持的模体的规模大小还是有局限性。由于该工具在具体实施中,不允许搜索规模大小为9或者更大的模体。
+
由于Kashtan等学者发现的采样偏差问题, <ref name="kas1" />所以针对NM发现问题需要设计更好的算法。虽然Kashtan等人尝试用加权法来解决这个弊端,但这个方法在运行上,消耗了过多的运行时间,且实现起来也变得更加复杂。但这个工具还是最好用的工具之一,因为它支持可视化选项,同时也是个节约时间的算法。但是,它在所支持的模体的规模大小还是有局限性。由于该工具在具体实施中,不允许搜索规模大小为9或者更大的模体。
   −
Wernicke <ref name="wer1" /> introduced an algorithm named ''RAND-ESU'' that provides a significant improvement over ''mfinder''.<ref name="kas1" /> This algorithm, which is based on the exact enumeration algorithm ''ESU'', has been implemented as an application called ''FANMOD''.<ref name="wer1" /> ''RAND-ESU'' is a NM discovery algorithm applicable for both directed and undirected networks, effectively exploits an unbiased node sampling throughout the network, and prevents overcounting sub-graphs more than once. Furthermore, ''RAND-ESU'' uses a novel analytical approach called ''DIRECT'' for determining sub-graph significance instead of using an ensemble of random networks as a Null-model. The ''DIRECT'' method estimates the sub-graph concentration without explicitly generating random networks.<ref name="wer1" /> Empirically, the DIRECT method is more efficient in comparison with the random network ensemble in case of sub-graphs with a very low concentration; however, the classical Null-model is faster than the ''DIRECT'' method for highly concentrated sub-graphs.<ref name="mil1" /><ref name="wer1" /> In the following, we detail the ''ESU'' algorithm and then we show how this exact algorithm can be modified efficiently to ''RAND-ESU'' that estimates sub-graphs concentrations.
     −
Weinicke引入了一种叫RAND-ESU的算法,这个新引入的算法比Mfinder软件有着更显著的提升。RAND-ESU基于精准的ESU算法,已有对应的软件FANMOD。RAND-ESU是一种[NM算法],可应用于定向的或者不定向的网络中,能够有效的在网络中利用无偏差节点进行采样,以及保证了一个子图仅仅被搜索一次,且不会产生无意义的子图。并且,RAND-ESU采用了一个叫做DIRECT的全新的分析方式,从而来确定子图的重要性,而不是用随机网络的组合来建立『Null模型』。DIRECT方法可以不用大量生成随机网络就能估计子图的浓度。实际上,相较于用随机网络组合分析比较低集中度的子图来说,DIRECT这个方法更加的高效。但是,传统的『Null模型』又比DIRECT这个算法能更加快速地解决高度集中的子图。接下来,我们将详细讲述ESU算法和展示如何把这种精确的算法调整为RAND-ESU算法去估计子图的浓度。
+
Weinicke<ref name="wer1" /> 引入了一种叫RAND-ESU的算法,这个新引入的算法比Mfinder软件有着更显著的提升。.<ref name="kas1" /> RAND-ESU基于精准的ESU算法,已有对应的软件FANMOD。<ref name="wer1" />RAND-ESU是一种NM发现算法,可应用于定向的或者不定向的网络中,能够有效的在网络中利用无偏差节点进行采样,以及保证了一个子图仅仅被搜索一次,且不会产生无意义的子图。并且,RAND-ESU采用了一个叫做DIRECT的全新的分析方式,从而来确定子图的重要性,而不是用随机网络的组合来建立Null模型。DIRECT方法可以不用大量生成随机网络就能估计子图的浓度。实际上,相较于用随机网络组合分析比较低集中度的子图来说,DIRECT这个方法更加的高效。但是,传统的Null模型比DIRECT这个算法能更加快速地解决高度集中的子图。接下来,我们将详细讲述ESU算法和展示如何把这种精确的算法调整为RAND-ESU算法去估计子图的集中度。
   −
The algorithms ''ESU'' and ''RAND-ESU'' are fairly simple, and hence easy to implement. ''ESU'' first finds the set of all induced sub-graphs of size {{math|k}}, let {{math|S<sub>k</sub>}} be this set. ''ESU'' can be implemented as a recursive function; the running of this function can be displayed as a tree-like structure of depth {{math|k}}, called the ESU-Tree (see figure). Each of the ESU-Tree nodes indicate the status of the recursive function that entails two consecutive sets SUB and EXT. SUB refers to nodes in the target network that are adjacent and establish a partial sub-graph of size {{math|{{!}}SUB{{!}} ≤ k}}. If {{math|{{!}}SUB{{!}} {{=}} k}}, the algorithm has found an induced complete sub-graph, so {{math|S<sub>k</sub> {{=}} SUB ∪ S<sub>k</sub>}}. However, if  {{math|{{!}}SUB{{!}} < k}}, the algorithm must expand SUB to achieve cardinality {{math|k}}. This is done by the EXT set that contains all the nodes that satisfy two conditions: First, each of the nodes in EXT must be adjacent to at least one of the nodes in SUB; second, their numerical labels must be larger than the label of first element in SUB. The first condition makes sure that the expansion of SUB nodes yields a connected graph and the second condition causes ESU-Tree leaves (see figure) to be distinct; as a result, it prevents overcounting. Note that, the EXT set is not a static set, so in each step it may expand by some new nodes that do not breach the two conditions. The next step of ESU involves classification of sub-graphs placed in the ESU-Tree leaves into non-isomorphic size-{{math|k}} graph classes; consequently, ESU determines sub-graphs frequencies and concentrations. This stage has been implemented simply by employing McKay's ''nauty'' algorithm,<ref name="mck1">{{cite journal |author=McKay BD |title=Practical graph isomorphism |journal=Congressus Numerantium |year=1981 |volume=30 |pages=45–87|bibcode=2013arXiv1301.1493M |arxiv=1301.1493 }}</ref><ref name="mck2">{{cite journal |author=McKay BD |title=Isomorph-free exhaustive generation |journal=Journal of Algorithms |year=1998 |volume=26 |issue=2 |pages=306–324 |doi=10.1006/jagm.1997.0898}}</ref> which classifies each sub-graph by performing a graph isomorphism test. Therefore, ESU finds the set of all induced {{math|k}}-size sub-graphs in a target graph by a recursive algorithm and then determines their frequency using an efficient tool.
     −
ESU和RAND-ESU两种算法都比较简捷,所以实现起来都很容易。『ESU首先找到大小为k的所有诱导子图的集合』,并命名这个集合为Sk。因为EUS以递归函数的形式实现,该函数的运行可以演示为『k级』的树状结构,称为ESU-Tree(见图)。每一个在ESU-Tree上的节点都表示递归函数的状态,这个递归函数需要两个连续集合的SUB和EXT。『SUB指的是在目标网络的相邻节点上,并且是一部分的层级绝对值大小小于等于k的子图集合。』如果SUB集合层级的绝对值等于k,那么这个算法可以找到一个『完整的诱导子图』,所以在此情况下Sk等于SUB与Sk的并集。相反,如果它的绝对值小于k,那么这个算法必须把SUB扩大,才能实现基数为k。『EXT这个集合包含了所有的满足以下两个情况的节点。第一,每个在EXT的节点必须至少与在SUB的一个节点相邻。第二,他们的下标必须比在SUB的第一个元素大。』???第一个条件保证了『SUB节点的展开产生相关的图』,第二个条件能使ESU-Tree树状图上的分支变得离散。所以,这个方法可以避免过度计算。注意,EXT集合不是一个固定的集合。所以每一步都有可能扩展满足于以上两个条件的新节点。下一步包含了在ESU-Tree分支上的子图的分类,『将它们分为非同构的大小为k的图类』。因此,ESU决定了子图的『频率以及浓度』。这一阶段的实施仅通过运用McKay的nauty算法,这一算法可以通过图的同构测试来把每个子图进行分类。所以,ESU能够在目标图中通过递归算法,找到所有规模大小为k的诱导子图集合,且使用高效的工具来确定他们的『频率』。
+
ESU和RAND-ESU两种算法都比较简捷,所以实现起来都很容易。『ESU首先找到大小为{{math|k}}的所有诱导子图的集合』,并命名这个集合为Sk。因为EUS以递归函数的形式实现,该函数的运行可以演示为深度为{{math|k}}的树状结构,称为ESU-Tree(见图)。每一个在ESU-Tree上的节点都表示递归函数的状态,这个递归函数需要两个连续集合的SUB和EXT。SUB指的是在目标网络的相邻节点上{{math|{{!}}SUB{{!}} ≤ k}}。如果{{math|{{!}}SUB{{!}} {{=}} k}},那么这个算法可以找到一个完整的诱导子图,所以在此情况下{{math|S<sub>k</sub> {{=}} SUB ∪ S<sub>k</sub>}}。相反,如果{{math|{{!}}SUB{{!}} < k}},那么这个算法必须把SUB扩大,才能实现基数为{{math|k}}。EXT这个集合包含了所有的满足以下两个情况的节点。第一,每个在EXT的节点必须至少与在SUB的一个节点相邻。第二,他们的下标必须比在SUB的第一个元素大。第一个条件保证了SUB节点的展开产生相关的图,第二个条件能使ESU-Tree树状图上的分支变得离散。所以,这个方法可以避免过度计算。注意,EXT集合不是一个固定的集合。所以每一步都有可能扩展满足于以上两个条件的新节点。下一步包含了在ESU-Tree分支上的子图的分类,将它们分为非同构的大小为{{math|k}}的图类。因此,ESU决定了子图的『频率以及浓度』。这一阶段的实施仅通过运用McKay的nauty算法,<ref name="mck1">{{cite journal |author=McKay BD |title=Practical graph isomorphism |journal=Congressus Numerantium |year=1981 |volume=30 |pages=45–87|bibcode=2013arXiv1301.1493M |arxiv=1301.1493 }}</ref><ref name="mck2">{{cite journal |author=McKay BD |title=Isomorph-free exhaustive generation |journal=Journal of Algorithms |year=1998 |volume=26 |issue=2 |pages=306–324 |doi=10.1006/jagm.1997.0898}}</ref>这一算法可以通过图的同构测试来把每个子图进行分类。所以,ESU能够在目标图中通过递归算法,找到所有规模大小为{{math|k}}的诱导子图集合,且使用高效的工具来确定他们的频率。
   −
The procedure of implementing ''RAND-ESU'' is quite straightforward and is one of the main advantages of ''FANMOD''. One can change the ''ESU'' algorithm to explore just a portion of the ESU-Tree leaves by applying a probability value {{math|0 ≤ p<sub>d</sub> ≤ 1}} for each level of the ESU-Tree and oblige ''ESU'' to traverse each child node of a node in level {{math|d-1}} with probability {{math|p<sub>d</sub>}}. This new algorithm is called ''RAND-ESU''. Evidently, when {{math|p<sub>d</sub> {{=}} 1}} for all levels, ''RAND-ESU'' acts like ''ESU''. For {{math|p<sub>d</sub> {{=}} 0}} the algorithm finds nothing. Note that, this procedure ensures that the chances of visiting each leaf of the ESU-Tree are the same, resulting in ''unbiased'' sampling of sub-graphs through the network. The probability of visiting each leaf is {{math|∏<sub>d</sub>p<sub>d</sub>}} and this is identical for all of the ESU-Tree leaves; therefore, this method guarantees unbiased sampling of sub-graphs from the network. Nonetheless, determining the value of {{math|p<sub>d</sub>}} for {{math|1 ≤ d ≤ k}} is another issue that must be determined manually by an expert to get precise results of sub-graph concentrations.<ref name="cir1" /> While there is no lucid prescript for this matter, the Wernicke provides some general observations that may help in determining p_d values. In summary, ''RAND-ESU'' is a very fast algorithm for NM discovery in the case of induced sub-graphs supporting unbiased sampling method. Although, the main ''ESU'' algorithm and so the ''FANMOD'' tool is known for discovering induced sub-graphs, there is trivial modification to ''ESU'' which makes it possible for finding non-induced sub-graphs, too. The pseudo code of ''ESU (FANMOD)'' is shown below:
  −
运用RAND-ESU的过程十分的简单,这也是FANMOD的一个主要的优点。可以通过对ESU-Tree『树状图』的每个级别应用概率{{math|0 ≤ p<sub>d</sub> ≤ 1}}并强制ESU以概率{{math|p<sub>d</sub>}}遍历{{math|d-1}}级别中节点的每个子节点,来更改ESU算法使其仅搜索ESU-Tree分支的一部分。 这种新的演算方式叫RAND-ESU。显然,当所有阶段{{math|p<sub>d</sub> {{=}} 1}}时,RAND-ESU等同于ESU。当{{math|p<sub>d</sub> {{=}} 0}}时,在这个算法下没有任何意义。注意,这个过程只是确保了可以找到ESU-Tree上的每一分支的机会都是相同的,从而使网络中的子图采样无偏差。访问每个分支的概率为{{math|∏<sub>d</sub>p<sub>d</sub>}},这对于所有ESU-Tree中的分支都是相同的; 因此,该方法可确保从网络中对子图进行无偏采样。但是,设置{{math|1 ≤ d ≤ k}}的{{math|p<sub>d</sub>}}参数是另一个问题,必须由专家人工确定才能获得子图『浓度』的精确结果。尽管对此没有明确的规定,但是Wrenucke提出了一些一般性的观察结论,这些结论有可能可以帮助我们确定p_d值。总的来说,在诱导子图支持无偏采样方法的情况下,RAND-ESU是一个能快速解决『NM discovery problem』的算法。 尽管,ESU算法的主要部分和FANMOD工具是以用来寻找诱导子图而著称的,但只需对ESU进行细小的改动,就可以用来寻找诱导子图。ESU(FANMOD)的伪代码如下:
  −
[[File:ESU-Tree.jpg|thumb|(a) ''A target graph of size 5'', (b) ''the ESU-tree of depth k that is associated to the extraction of sub-graphs of size 3 in the target graph''. Leaves correspond to set S3 or all of the size-3 induced sub-graphs of the target graph (a). Nodes in the ESU-tree include two adjoining sets, the first set contains adjacent nodes called SUB and the second set named EXT holds all nodes that are adjacent to at least one of the SUB nodes and where their numerical labels are larger than the SUB nodes labels. The EXT set is utilized by the algorithm to expand a SUB set until it reaches a desired sub-graph size that are placed at the lowest level of ESU-Tree (or its leaves).]]
     −
{| class="wikitable"
+
运用RAND-ESU的过程十分的简单,这也是FANMOD的一个主要的优点。可以通过对ESU-Tree树状图的每个级别应用概率{{math|0 ≤ p<sub>d</sub> ≤ 1}}并强制ESU以概率{{math|p<sub>d</sub>}}遍历{{math|d-1}}级别中节点的每个子节点,来更改ESU算法使其仅搜索ESU-Tree分支的一部分。 这种新的演算方式叫RAND-ESU。显然,当所有阶段{{math|p<sub>d</sub> {{=}} 1}}时,RAND-ESU等同于ESU。当{{math|p<sub>d</sub> {{=}} 0}}时,在这个算法下没有任何意义。注意,这个过程只是确保了可以找到ESU-Tree上的每一分支的机会都是相同的,从而使网络中的子图采样无偏差。访问每个分支的概率为{{math|∏<sub>d</sub>p<sub>d</sub>}},这对于所有ESU-Tree中的分支都是相同的; 因此,该方法可确保从网络中对子图进行无偏采样。但是,设置{{math|1 ≤ d ≤ k}}{{math|p<sub>d</sub>}}参数是另一个问题,必须由专家人工确定才能获得子图集中度的精确结果。<ref name="cir1" /> 尽管对此没有明确的规定,但是Wrenucke提出了一些一般性的观察结论,这些结论有可能可以帮助我们确定p_d值。总的来说,在诱导子图支持无偏采样方法的情况下,RAND-ESU是一个能快速解决NM发现问题的算法。 尽管,ESU算法的主要部分和FANMOD工具是以用来寻找诱导子图而著称的,但只需对ESU进行细小的改动,就可以用来寻找诱导子图。ESU(FANMOD)的伪代码如下:
|-
  −
! Enumeration of ESU (FANMOD)
  −
|-
  −
|'''''EnumerateSubgraphs(G,k)'''''
  −
 
  −
'''Input:''' A graph {{math|G {{=}} (V, E)}} and an integer {{math|1 ≤ k ≤ {{!}}V{{!}}}}.
  −
 
  −
'''Output:''' All size-{{math|k}} subgraphs in {{math|G}}.
  −
 
  −
'''for each''' vertex {{math|v ∈ V}} '''do'''
  −
 
  −
{{pad|2em}}{{math|VExtension ← {u ∈ N({v}) {{!}} u > v}}}
     −
{{pad|2em}}'''call''' {{math|''ExtendSubgraph''({v}, VExtension, v)}}
+
[[File:ESU-Tree.jpg|thumb|图(a)大小为5的目标图,图(b)深度为k的ESU树,与目标图中大小为3的子图的提取相关。叶子对应于目标图(a)的集合S3或所有大小3诱导子图。ESU树中的节点包括两个相邻的集合,第一个集合包含称为SUB的相邻节点,第二个名为EXT的集合保存与至少一个SUB节点相邻且其数字标签大于SUB节点的所有节点标签。该算法利用EXT集扩展SUB集,直到达到所需的子图大小为止,该子图大小位于ESU-Tree(或其叶)的最低级别。]]
   −
'''endfor'''
+
{| class="wikitable" style="width: 75%;margin:0 auto"
|-
  −
|'''''ExtendSubgraph(VSubgraph, VExtension, v)'''''
  −
 
  −
'''if''' {{math|{{!}}VSubgraph{{!}} {{=}} k}} '''then''' output {{math|G[VSubgraph]}} and '''return'''
  −
 
  −
'''while''' {{math|VExtension ≠ ∅}} '''do'''
  −
 
  −
{{pad|2em}}Remove an arbitrarily chosen vertex {{math|w}} from {{math|VExtension}}
  −
 
  −
{{pad|2em}}{{math|VExtension&prime; ← VExtension ∪ {u ∈ N<sub>excl</sub>(w, VSubgraph) {{!}} u > v}}}
  −
 
  −
{{pad|2em}}'''call''' {{math|''ExtendSubgraph''(VSubgraph ∪ {w}, VExtension&prime;, v)}}
  −
 
  −
'''return'''
  −
|}
  −
 
  −
{| class="wikitable"
   
|-
 
|-
 
! ESU子图搜索算法(FANMOD软件实现)
 
! ESU子图搜索算法(FANMOD软件实现)
第239行: 第204行:  
'''返回'''
 
'''返回'''
 
|}
 
|}
 +
    
===NeMoFinder===
 
===NeMoFinder===
7,129

个编辑