第23行: |
第23行: |
| | | |
| 与穷举搜索形成鲜明对比的是,该算法的计算时间竟然与网络大小渐近无关。对算法时间复杂度的分析表明,对于网络中大小为n的子图的每个样本,它的时间复杂度为<math>O(n^n)</math>。另一方面,<font color="red">并没有对已采样子图的每一个子图样本判断图同构问题的分类时间进行分析</font>[9]。另外,子图权重计算将额外增加该算法的计算负担。但是不得不指出的是,该算法可能会多次采样相同的子图——花费时间而不收集任何有用信息。[10]总之,通过利用采样的优势,该算法的性能比穷举搜索算法更有效;但是,它只能大致确定子图的<font color="red">集中度 concentrations </font>。由于该算法的实现方式,使得它可以找到最大为6的模体,并且它会给出的最重要的模体,而不是其他所有模体。另外,有必要提到此工具没有可视化的呈现。采样算法简要显示如下: | | 与穷举搜索形成鲜明对比的是,该算法的计算时间竟然与网络大小渐近无关。对算法时间复杂度的分析表明,对于网络中大小为n的子图的每个样本,它的时间复杂度为<math>O(n^n)</math>。另一方面,<font color="red">并没有对已采样子图的每一个子图样本判断图同构问题的分类时间进行分析</font>[9]。另外,子图权重计算将额外增加该算法的计算负担。但是不得不指出的是,该算法可能会多次采样相同的子图——花费时间而不收集任何有用信息。[10]总之,通过利用采样的优势,该算法的性能比穷举搜索算法更有效;但是,它只能大致确定子图的<font color="red">集中度 concentrations </font>。由于该算法的实现方式,使得它可以找到最大为6的模体,并且它会给出的最重要的模体,而不是其他所有模体。另外,有必要提到此工具没有可视化的呈现。采样算法简要显示如下: |
| + | |
| | | |
| {|class="wikitable" | | {|class="wikitable" |
第35行: |
第36行: |
| 4. Repeat steps 2-3 until completing an n-node subgraph (until |Vs| = n). <br> | | 4. Repeat steps 2-3 until completing an n-node subgraph (until |Vs| = n). <br> |
| 5. Calculate the probability to sample the picked n-node subgraph. <br> | | 5. Calculate the probability to sample the picked n-node subgraph. <br> |
− | |- | + | |} |
− | }
| + | |
| | | |
| {|class="wikitable" | | {|class="wikitable" |
第49行: |
第50行: |
| 4. 重复步骤2-3,直到完成包含n个节点的子图 (<math>\left | V_{s} \right | = n</math>)。<br> | | 4. 重复步骤2-3,直到完成包含n个节点的子图 (<math>\left | V_{s} \right | = n</math>)。<br> |
| 5. 计算对选取的n节点子图进行采样的概率。<br> | | 5. 计算对选取的n节点子图进行采样的概率。<br> |
− | |- | + | |} |
− | } | |
| | | |
| | | |
第57行: |
第57行: |
| Schreiber and Schwöbbermeyer [12] proposed an algorithm named flexible pattern finder (FPF) for extracting frequent sub-graphs of an input network and implemented it in a system named Mavisto.[27] Their algorithm exploits the downward closure property which is applicable for frequency concepts F2 and F3. The downward closure property asserts that the frequency for sub-graphs decrease monotonically by increasing the size of sub-graphs; however, this property does not hold necessarily for frequency concept F1. FPF is based on a pattern tree (see figure) consisting of nodes that represents different graphs (or patterns), where the parent of each node is a sub-graph of its children nodes; in other words, the corresponding graph of each pattern tree's node is expanded by adding a new edge to the graph of its parent node. | | Schreiber and Schwöbbermeyer [12] proposed an algorithm named flexible pattern finder (FPF) for extracting frequent sub-graphs of an input network and implemented it in a system named Mavisto.[27] Their algorithm exploits the downward closure property which is applicable for frequency concepts F2 and F3. The downward closure property asserts that the frequency for sub-graphs decrease monotonically by increasing the size of sub-graphs; however, this property does not hold necessarily for frequency concept F1. FPF is based on a pattern tree (see figure) consisting of nodes that represents different graphs (or patterns), where the parent of each node is a sub-graph of its children nodes; in other words, the corresponding graph of each pattern tree's node is expanded by adding a new edge to the graph of its parent node. |
| | | |
− | Schreiber和Schwöbbermeyer [12]提出了一种称为灵活模式查找器(FPF)的算法,用于提取输入网络的频繁子图,并将其在名为Mavisto的系统中加以实现。[27]他们的算法利用了向下闭包特性,该特性适用于频率概念<math>F_{2}</math>和<math>F_{3}</math>。向下闭包性质表明,子图的频率随着子图的大小而单调下降;但这一性质并不一定适用于频率概念 F1。FPF算法基于模式树(见右图),由代表不同图形(或模式)的节点组成,其中每个节点的父节点是其子节点的子图;换句话说,每个模式树节点的对应图通过向其父节点图添加新边来扩展。 | + | Schreiber和Schwöbbermeyer [12]提出了一种称为灵活模式查找器(FPF)的算法,用于提取输入网络的频繁子图,并将其在名为Mavisto的系统中加以实现。[27]他们的算法利用了向下闭包特性,该特性适用于频率概念<math>F_{2}</math>和<math>F_{3}</math>。向下闭包性质表明,子图的频率随着子图的大小而单调下降;但这一性质并不一定适用于频率概念<math>F_{1}</math>。FPF算法基于模式树(见右图),由代表不同图形(或模式)的节点组成,其中每个节点的父节点是其子节点的子图;换句话说,每个模式树节点的对应图通过向其父节点图添加新边来扩展。 |
| | | |
| [[Image:The pattern tree in FPF algorithm.jpg|right|thumb|''Illustration of the pattern tree in FPF algorithm.''.]] | | [[Image:The pattern tree in FPF algorithm.jpg|right|thumb|''Illustration of the pattern tree in FPF algorithm.''.]] |
第67行: |
第67行: |
| The advantage of the algorithm is that it does not consider infrequent sub-graphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. However, FPF is most useful for frequency concepts F2 and F3, because downward closure is not applicable to F1. Nevertheless, the pattern tree is still practical for F1 if the algorithm runs in parallel. Another advantage of the algorithm is that the implementation of this algorithm has no limitation on motif size, which makes it more amenable to improvements. The pseudocode of FPF (Mavisto) is shown below: | | The advantage of the algorithm is that it does not consider infrequent sub-graphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. However, FPF is most useful for frequency concepts F2 and F3, because downward closure is not applicable to F1. Nevertheless, the pattern tree is still practical for F1 if the algorithm runs in parallel. Another advantage of the algorithm is that the implementation of this algorithm has no limitation on motif size, which makes it more amenable to improvements. The pseudocode of FPF (Mavisto) is shown below: |
| | | |
− | 该算法的优点是它不会考虑不频繁的子图,并尝试尽快完成枚举过程;因此,它只花时间在模式树中用于有前途的节点,而丢弃所有其他节点。另外,模式树概念允许以并行方式实现和执行FPF,因为可以独立遍历模式树的每个路径。但是,FPF对于频率概念F 2和F 3最为有用,因为向下闭合不适用于F 1。尽管如此,模式树对于F 1仍然是实用的该算法是否并行运行。该算法的另一个优点是该算法的实现对主题大小没有限制,这使其更易于改进。FPF(Mavisto)的伪代码如下所示:
| + | 该算法的优点是它不会考虑不频繁的子图,并尝试尽快完成枚举过程;因此,它只花时间在模式树中用于有希望的节点上,而放弃所有其他节点。还有一点额外的好处,模式树概念允许 FPF 以并行方式实现和执行,因为它可以独立地遍历模式树的每个路径。但是,FPF对于频率概念<math>F_{2}</math>和<math>F_{3}</math>最为有用,因为向下闭包不适用于<math>F_{1}</math>。尽管如此,如果算法并行运行,那么模式树对于<math>F_{1}</math>仍然是可行的。该算法的另一个优点是它的实现对模体大小没有限制,这使其更易于改进。FPF(Mavisto)的伪代码如下所示: |
| + | |
| | | |
| | | |
| + | {|class="wikitable" |
| + | |+ mfinder |
| + | |- |
| + | !rowspan="1"|Definitions: Esis the set of picked edges. Vs is the set of all nodes that are touched by the edges in E. |
| + | |- |
| + | |rowspan="5"|Init Vs and Es to be empty sets.<br> |
| + | 1. Pick a random edge e1 = (vi, vj). Update Es = {e1}, Vs = {vi, vj} <br> |
| + | 2. Make a list L of all neighbor edges of Es. Omit from L all edges between members of Vs. <br> |
| + | 3. Pick a random edge e = {vk,vl} from L. Update Es = Es ⋃ {e}, Vs = Vs ⋃ {vk, vl}. <br> |
| + | 4. Repeat steps 2-3 until completing an n-node subgraph (until |Vs| = n). <br> |
| + | 5. Calculate the probability to sample the picked n-node subgraph. <br> |
| + | |} |
| + | |
| + | |
| + | {|class="wikitable" |
| + | |+ mfinder |
| + | |- |
| + | !rowspan="1"|定义:<math>E_{s}</math>是采集的边集合。<math>V_{s}</math>是<math>E</math>中所有边的顶点的集合。 |
| + | |- |
| + | |rowspan="5"|初始化<math>V_{s}</math>和<math>E_{s}</math>为空集。<br> |
| + | 1. 随机选择一条边<math> e_{1} = (v_{i}, v_{j}) </math>,更新 <math>E_{s} = \{e_{1}\}, V{s} = \{v_{i}, v_{j}\}</math><br> |
| + | 2. 列出<math>E{s}</math>的所有邻边列表<math> L </math>,从<math> L </math>中删除<math>V{s}</math>中所有元素组成的边。<br> |
| + | 3. 从<math> L </math>中随机选择一条边<math> e = \{v_{k},v_{l}\} </math>, 更新<math>E_{s} = E_{s} \cup \{e\} , V_{s} = V_{s} \cup \{v_{k}, v_{l}\}</math>。<br> |
| + | 4. 重复步骤2-3,直到完成包含n个节点的子图 (<math>\left | V_{s} \right | = n</math>)。<br> |
| + | 5. 计算对选取的n节点子图进行采样的概率。<br> |
| + | |} |
| | | |
| | | |