更改

跳到导航 跳到搜索
添加5,123字节 、 2020年5月6日 (三) 22:24
无编辑摘要
第11行: 第11行:     
===mfinder 算法===
 
===mfinder 算法===
 +
 +
mfinder, the first motif-mining tool, implements two kinds of motif finding algorithms: a full enumeration and a sampling method. Until 2004, the only exact counting method for NM (network motif) detection was the brute-force one proposed by Milo et al..[3] This algorithm was successful for discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible. Hence, a new approach to this problem was needed.
 +
 +
'''mfinder'''是第一个模体挖掘工具,它主要有两种模体查找算法:完全枚举 full enumeration 和采样方法 sampling method。直到2004年,用于NM('''网络模体 networkmotif''')检测的唯一精确计数方法是'''Milo'''等人提出的暴力穷举方法。[3]该算法成功地发现了小规模的模体,但是这种方法甚至对于发现规模为5个或6个的模体在计算上都不可行的。因此,需要一种解决该问题的新方法。
 +
 +
Kashtan et al. [9] presented the first sampling NM discovery algorithm, which was based on edge sampling throughout the network. This algorithm estimates concentrations of induced sub-graphs and can be utilized for motif discovery in directed or undirected networks. The sampling procedure of the algorithm starts from an arbitrary edge of the network that leads to a sub-graph of size two, and then expands the sub-graph by choosing a random edge that is incident to the current sub-graph. After that, it continues choosing random neighboring edges until a sub-graph of size n is obtained. Finally, the sampled sub-graph is expanded to include all of the edges that exist in the network between these n nodes. When an algorithm uses a sampling approach, taking unbiased samples is the most important issue that the algorithm might address. The sampling procedure, however, does not take samples uniformly and therefore Kashtan et al. proposed a weighting scheme that assigns different weights to the different sub-graphs within network.[9] The underlying principle of weight allocation is exploiting the information of the sampling probability for each sub-graph, i.e. the probable sub-graphs will obtain comparatively less weights in comparison to the improbable sub-graphs; hence, the algorithm must calculate the sampling probability of each sub-graph that has been sampled. This weighting technique assists mfinder to determine sub-graph concentrations impartially.
 +
 +
'''Kashtan''' 等人[9]首次提出了一种基于边缘采样的网络模体(NM)采样发现算法。该算法估计了诱导子图的<font color="red">集中程度 concentrations </font>,可用于有向或无向网络中的模体发现。该算法的采样过程从网络的任意一条边开始,该边连向大小为2的子图,然后选择一条与当前子图相关的随机边对子图进行扩展。之后,它将继续选择随机的相邻边,直到获得大小为n的子图为止。最后,采样得到的子图扩展为包括这n个节点在内的网络中存在的所有边。当使用采样方法时,获取无偏样本是这类算法可能面临的最重要问题。而且,采样过程并不能保证采到所有的样本(也就是不能保证得到所有的子图,译者注),因此,Kashtan 等人又提出了一种加权方案,为网络中的不同子图分配不同的权重。[9]权重分配的基本原理是利用每个子图的抽样概率信息,即,与不可能的子图相比,可能的子图将获得相对较少的权重;因此,该算法必须计算已采样的每个子图的采样概率。这种加权技术有助于mfinder公正地确定子图的浓度。
 +
 +
在扩展到与穷举搜索形成鲜明对比的过程中,算法的计算时间令人惊讶地渐近地与网络大小无关。对算法的计算时间的分析表明,对于网络中大小为n的子图的每个样本,它的取值为O(n n)。另一方面,在[9]中没有对需要解决图形同构性的采样子图的分类时间进行分析。每个子图样本的问题。另外,通过子图权重计算将额外的计算工作强加于该算法。但是不可避免的是,该算法可能会多次采样相同的子图–花时间而不收集任何信息。[10]总之,通过利用采样的优势,该算法的性能比穷举搜索算法更有效;但是,它只能大致确定子图的浓度。由于该算法的主要实现方式,它可以找到最大为6的主题,因此它给出的是最重要的主题,而其他所有主题都不是。另外,有必要提到此工具没有视觉呈现的选项。采样算法简要显示:
 +
 +
 +
 +
 +
[3]Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002). "Network motifs: simple building blocks of complex networks". Science. 298 (5594): 824–827. Bibcode:2002Sci...298..824M. CiteSeerX 10.1.1.225.8750. doi:10.1126/science.298.5594.824. PMID 12399590.
 +
[9]Kashtan N, Itzkovitz S, Milo R, Alon U (2004). "Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs". Bioinformatics. 20 (11): 1746–1758. doi:10.1093/bioinformatics/bth163. PMID 15001476.
106

个编辑

导航菜单