更改

网络模体 (查看源代码)

2020年5月9日 (六) 23:09的版本

删除25,372字节、 2020年5月9日 (六) 23:09

无编辑摘要

第1行：第1行： −

大家好，我们的公众号计划要推送一篇关于网络模体的综述文章，我们希望可以配套建议该重要概念：网络模体。现在希望可以大家一起协作完成这个词条。

+

所有网络，包括生物网络 biological networks、社交网络 social networks、技术网络（例如计算机网络和电路）等，都可以用图的形式来表示，这些图中会包括各种各样的子图 subgraphs。网络的一个重要的局部性质是所谓的网络模体，即重复且具有统计意义的子图或模式 patterns。

−

~~翻译任务主要分为以下5个内容：~~

−

* 网络定义和历史 ---许菁

−

* 网络模体的发现算法 mfinder和FPF算法---李鹏

−

* 网络模体的发现算法 ESU和对应的软件FANMOD---Imp

−

* 网络模体的发现算法 G-Trie、算法对比和算法分类——Ricky（中英对照[[用户讨论:Qige96|初稿在这里]]）

−

* 已有网络模体及其函数表示 --周佳欣

−

* 活动模体+批判 --- 孙宇

−

* 代码实现

−

~~大家可以在对应感兴趣的部分下面，写上姓名。我们的协作方式是石墨文档上翻译，最后再编辑成文。~~

+

'''网络模体 Network motifs'''是指在特定网络或各种网络中重复出现的相同的子图。这些子图由顶点之间特定的交互模式定义，一个子图便可以反映一个框架，这个框架可以有效地实现某个特定的功能。事实上，之所以说模体是一个重要的特性，正是因为它们可能反映出对应网络功能的这一性质。近年来这一概念作为揭示复杂网络结构设计原理的一个有用概念而受到了广泛的关注。<ref name="mas1">{{cite journal |vauthors=Masoudi-Nejad A, Schreiber F, Razaghi MK Z |title=Building Blocks of Biological Networks: A Review on Major Network Motif Discovery Algorithms |journal=IET Systems Biology |volume=6 |issue=5 |pages=164–74 |year=2012|doi=10.1049/iet-syb.2011.0011 |pmid=23101871 }}</ref> 但是，虽然通过研究网络模体可以深入了解网络的功能，但是对于模体的检测在计算上是具有挑战性的。

−

~~对应的词条链接：https~~:/~~/en~~.~~wikipedia~~.~~org~~/~~wiki/Network_motif#Well-established_motifs_and_their_functions~~

−

~~截止时间：今晚12：00~~

+

==定义==

+

设{{math|G {{=}} (V, E)}} 和 {{math|G′ {{=}} (V′, E′)}} 是两个图。若{{math|V′ ⊆ V}}且满足{{math|E′ ⊆ E ∩ (V′ × V′)}}）（即图{{math|G′ ⊆ G}}的所有边和点都属于图{{math|G}}）则称图{{math|G′ ⊆ G}}是图{{math|G}}的一个子图。若{{math|G′ ⊆ G}}，且对于顶点{{math|u}}、{{math|v}}及其连边，只要{{math|u}}和{{math|v}}存在于{{math|G′}}（即若{{math|U}}, {{math|V′ ⊆ V}}），那么{{math|G′ ⊆ G}}中就应该包含原图{{math|G}}中的所有{{math|u}}和{{math|V}}的对应连边（即{{math|&lang;u, v&rang; ∈ E}}），则称此时图{{math|G′}}就是图{{math|G}}的导出子图。如果存在一个双射（一对一）{{math|f:V′ → V}}，且对所有{{math|u, v ∈ V′}}都有{{math|&lang;u, v&rang; ∈ E′ ⇔ &lang;f(u), f(v)&rang; ∈ E}} ，则称{{math|G&prime }}是{{math|G}}的同构图（记作：{{math|G′ → G}}），映射f则称为{{math|G}}与{{math|G′}}之间的同构 isomorphism。<ref name="die1">{{cite journal |author=Diestel R |title=Graph Theory (Graduate Texts in Mathematics) |volume=173 |year=2005|publisher=New York: Springer-Verlag Heidelberg}}</ref>

−

~~All networks, including [[biological network]]s, social networks, technological networks (e.g., computer networks and electrical circuits) and more, can be represented as [[complex network~~|~~graphs]], which include a wide variety of subgraphs. One important local property of networks are so-called '''network motifs''', which are defined as recurrent and [[statistically significant]]~~ sub-~~graphs or patterns.~~

+

当{{math|G″ ⊂ G}}，且{{math|G″}}与图{{math|G′}}之间存在同构时，该映射表示{{math|G′}}对于{{math|G}}存在。图{{math|G′}}在{{math|G}}的出现次数称为{{math|G′}}出现在{{math|G}}的频率{{math|FG}}。当一个子图的频率{{math|FG}}高于预定的阈值或截止值时，则称{{math|G′}}是{{math|G}}中的递归（或频繁）子图。在接下来的内容中，我们交替使用术语“模式 motifs”和“频繁子图 frequent sub-graph”。

−

所有网络，包括生物网络(biological networks)、社交网络(social networks)、技术网络（例如计算机网络和电路）等，都可以用图的形式来表示，这些图中会包括各种各样的子图(subgraphs)。网络的一个重要的局部性质是所谓的网络模体，即重复且具有统计意义的子图或模式(patterns)。

−

Network motifs are sub-graphs that repeat themselves in a specific network or even among various networks. Each of these sub-graphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks.<ref name="mas1">{{~~cite journal~~ |~~vauthors=Masoudi~~-~~Nejad A, Schreiber F, Razaghi MK Z~~ |~~title=Building Blocks of Biological Networks: A Review on Major Network Motif Discovery Algorithms~~ |~~journal=IET Systems Biology~~ |~~volume=6 |issue=5 |pages=164–74 |year=2012~~|~~doi=10.1049/iet-syb.2011.0011 |pmid=23101871~~ }}<~~/ref~~> ~~Although network motifs may provide a deep insight into the network's functional abilities, their detection is computationally challenging.~~

+

设从与{{math|G}}相关联的零模型 the null-model获得的随机图集合为{{math|Ω(G)}}，我们从{{math|Ω(G)}}中均匀地选择N个随机图，并计算其特定频繁子图的频率。如果{{math|G′}}出现在{{math|G}}的频率高于N个随机图{{math|Ri}}的算术平均频率，其中{{math|1 ≤ i ≤ N}}，我们称此递归模式是有意义的，因此可以将{{math|G′}}视为{{math|G}}的网络模体。对于一个小图{{math|G′}}，网络{{math|G}}和一组随机网络{{math|R(G) ⊆ Ω(R)}}，当{{math|1=R(G) {{=}} N}}时，由其Z分数 Z-score的定义如下式：

−

网络模体(Network motifs)是指在特定网络或各种网络中重复出现的相同的子图。这些子图由顶点之间特定的交互模式定义，一个子图便可以反映一个框架，这个框架可以有效地实现某个特定的功能。事实上，之所以说模体是一个重要的特性，正是因为它们可能反映出对应网络功能的这一性质。近年来这一概念作为揭示复杂网络结构设计原理的一个有用概念而受到了广泛的关注。<~~ref name="mas1"~~>{{~~cite journal~~ |~~vauthors=Masoudi-Nejad A, Schreiber F, Razaghi MK Z~~ |~~title=Building Blocks of Biological Networks: A Review on Major Network Motif Discovery Algorithms~~ |~~journal=IET Systems Biology~~ |~~volume=6~~ |~~issue=5~~ |~~pages=164–74~~ |~~year~~=~~2012|doi~~=~~10.1049/iet-syb.2011.0011 |pmid=23101871~~ }}~~</ref> 但是，虽然通过研究网络模体可以深入了解网络的功能，但是对于模体的检测在计算上是具有挑战性的。~~

−

~~==Definition==~~

−

Let {{math|G {{=}} (V, E)}} and {{math|G′ {{=}} (V′, E′)}} be two graphs. Graph {{math|G′}} is a ''sub-graph'' of graph {{math|G}} (written as {{math|G′ ⊆ G}}) if {{math|V′ ⊆ V}} and {{math|E′ ⊆ E ∩ (V′ × V′)}}. If {{math|G′ ⊆ G}} and {{math|G′}} contains all of the edges {{math|&lang;u, v&rang; ∈ E}} with {{math|u, v ∈ V′}}, then {{math|G′}} is an ''induced sub-graph'' of {{math|G}}. We call {{math|G′}} and {{math|G}} isomorphic (written as {{math|G′ ↔ G}}), if there exists a bijection (one-to-one) {{math|f:V′ → V}} with {{math|&lang;u, v&rang; ∈ E′ ⇔ &lang;f(u), f(v)&rang; ∈ E}} for all {{math|u, v ∈ V′}}. The mapping {{math|f}} is called an isomorphism between {{math|G}} and {{math|G′}}.<ref name="die1">{{cite journal |author=Diestel R |title=Graph Theory (Graduate Texts in Mathematics) |volume=173 |year=2005|publisher=New York: Springer-Verlag Heidelberg}}</ref>

−

~~设{{~~math~~|G {{=}}~~ (~~V, E)}} 和 {{math|~~G&prime~~; {~~{~~=}}~~ (~~V′, E&~~prime;)~~}} 是两个图。若{{math|V′ ⊆ V}}且满足{{math|E′ ⊆ E ∩~~ (~~V′ × V&~~prime;)}~~}）（即图~~{~~{math|G′ ⊆ G}的所有边和点都属于图{{math|G}}）则称图{{math|~~G&prime~~; ⊆ G}是图{{math|G}}的一个子图<ref name="die1">{{cite journal |author=Diestel R |title=Graph Theory (Graduate Texts in Mathematics~~) ~~|volume=173 |year=2005|publisher=New York: Springer-Verlag Heidelberg}~~}</~~ref~~>

+

:<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>

−

若{{math|G′ ~~⊆ G~~}}~~，且对于顶点~~{{math|u}}、{{math|v}}~~及其连边，只要~~{{~~math~~|u}}和{{~~math~~|v}}~~存在于~~{{~~math~~|~~G′~~}}~~（即若~~{{~~math~~|~~U}}~~, ~~{{math~~|V&~~prime; ⊆ V~~}}~~），那么~~{{~~math~~|~~G′ ⊆ G~~}}~~中就应该包含原图~~{{~~math~~|G~~}}中的所有{{math~~|~~u}}和{{math~~|V}}~~的对应连边（即~~{{math|&~~lang;u, v&rang~~; ~~∈ E~~}}~~），则称此时图~~{{math|G′}}~~就是图{{math|G}}的导出子图。~~

+

式中，{{math|μR(G′)}} 和 {{math|σR(G′)}}分别代表集合{{math|R(G)}}中的平均和标准偏差频率。<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> {{math|Z(G′)}}越大，子图{{math|G′}}作为模体的意义也就越大。

−

如果存在一个双射（一对一）{{math|f:V′ → V}}，且对所有{{math|u, v ∈ V′}}都有{{math|&lang;u, v&rang; ∈ E′ ⇔ &lang;f(u), f(v)&rang; ∈ E}} ，则称{{math|G&prime }}是{{math|G}}的同构图（记作：{{math|G′ → G}}），映射f则称为{{math|G}}与{{math|G′}}之间的同构（isomorphism）。[2]

−

~~When {{math|G″ ⊂ G}} and there exists an isomorphism between the sub~~-graph {{math|G″}} and a graph {{math|G′}}, this mapping represents an ''appearance'' of {{math|G′}} in {{math|G}}. The number of appearances of graph {{math|G′}} in {{math|G}} is called the frequency {{math|FG~~}} of {{math|~~G′~~}} in {{math|G}}. A graph is called ''recurrent'' (or ''frequent''~~) ~~in {{math|G}}, when its ''frequency'' {{math|~~FG(G′)}} is above a predefined threshold or cut-off value. We use terms ''pattern'' and ''frequent sub-graph'' in this review interchangeably. There is an [[Statistical ensemble (mathematical physics)|ensemble]] {{math|Ω(G)}} of random graphs corresponding to the [[Null model|null-~~model]] associated to~~ {{math|G}}. We should choose {{math|N}} random graphs uniformly from {{math|Ω(G)}} and calculate the frequency for a particular frequent sub-graph {{math|G′}} in {{math|G}}. If the frequency of {{math|G′}} in {{math|G}} is higher than its arithmetic mean frequency in {{math|N}} random graphs {{math|Ri~~}}, where {{math|1 ≤ i ≤ N}}, we call this recurrent pattern ''significant'' and hence treat {{math|~~G′}} ~~as a ''network motif'' for {{math|G}}. For a small graph~~ {{math|G′}}~~, the network {{math|G}} and a set of randomized networks {{math|R(G) ⊆ Ω(R)}}, where {{math|1~~=~~R(G) {{=}} N}}, the ''Z-Score'' that has been defined by the following formula:~~

+

此外还可以使用统计假设检验中的另一个测量方法，可以作为模体检测中的一种方法，即P值 P-value，以 {{math|FR(G′) ≥ FG(G′)}}的概率给出（作为其零假设 null-hypothesis），其中{{math|FR(G′)}}表示随机网络中{{math|G′}}的频率。<ref name="sch1" /> 当P值小于阈值（通常为0.01或0.05）时，该子图可以被称为显著模式。该P值定义为：

−

~~<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>~~

−

~~当{{~~math|G~~″ ⊂ G}~~}~~，且{~~{~~math|G″~~}~~}与图{~~{~~math|G′~~}~~}之间存在同构时，该映射表示{{math|G&prime~~;~~}}对于{{math|G}}存在。图{{math|~~G&prime~~;}}在{{math|~~G~~}}的出现次数称为{{math|G&~~prime~~;}}出现在{{math|G}}的频率{{math|FG~~</~~sub>}}。当一个子图的频率{{~~math~~|FG</sub~~>~~}}高于预定的阈值或截止值时，则称{{math|G′}}是{{math|G}}中的递归（或频繁）子图。~~

+

:<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>

−

~~在接下来的内容中，我们交替使用术语“模式（motifs）”和“频繁子图（frequent~~ sub-~~graph）”。~~

+

[[File:Different occurrences of a sub-graph in a graph.jpg|thumb|图1 ''图中子图的不同出现''. (M1 – M4) 是图（a）中子图（b）的不同出现。对于频率概念{{math|F1}},集合M1，M2，M3，M4表示所有匹配项，因此{{math|F1 {{=}} 4}}。对于{{math|F2}}，两个集合M1，M4或M2，M3之一是可能的匹配，{{math|F2 {{=}} 2}}。最后，对于频率概念{{math|F3}}仅允许匹配项之一（M1至M4），因此{{math|F3 {{=}} 1}}。随着网元的使用受到限制，这三个频率概念的频率降低。]]

−

设从与{{math|G}}相关联的零模型（the null-model）获得的随机图集合为{{math|Ω(G)}}，我们从{{math|Ω(G)}}中均匀地选择N个随机图，并计算其特定频繁子图的频率。如果{{math|G′}}出现在{{math|G}}的频率高于N个随机图Ri的算术平均频率，其中{{math|1 ≤ i ≤ N}}，我们称此递归模式是有意义的，因此可以将{{math|G′}}视为{{math|G}}的网络模体。

−

~~对于一个小图~~{{math|~~G′~~}}~~，网络~~{{math|G}}~~和一组随机网络~~{{math|R(G) ~~⊆ Ω~~(R)}}，当{{math|1=~~R(G)~~ {{=}} N}}~~时，由其Z分数（Z-score）的定义如下式：~~

+

其中{{math|N}}表示随机网络的数目，{{math|i}}定义在随机网络的集合上，若条件{{math|c(i)}}成立，则Kroneckerδ函数{{math|δ(c(i))}}是1。在网络{{math|G}}中，一个特定的n维子图{{math|N′}}的集中度<ref name="kas1">{{cite journal |vauthors=Kashtan N, Itzkovitz S, Milo R, Alon U |title=Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs |journal=Bioinformatics |year=2004 |volume=20 |issue=11 |pages=1746–1758 |doi=10.1093/bioinformatics/bth163|pmid=15001476 |doi-access=free }}</ref><ref name="wer1">{{cite journal |author=Wernicke S |title=Efficient detection of network motifs |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |year=2006 |volume=3 |issue=4 |pages=347–359 |doi=10.1109/tcbb.2006.51|pmid=17085844 |citeseerx=10.1.1.304.2576 }}</ref>是指子图在网络中出现频率与n维非同构子图的总频率之比，其计算公式如下：

−

~~<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>~~

−

where {{math|μR(G′)}} and {{math|σR(G′)}} stand for mean and standard deviation frequency in set {{math|R(G)}}, respectively.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> The larger the {{math|Z(G′)}}, the more significant is the sub-graph {{math|G′}} as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the P-Value, given as the probability of {{math|FR(G′) ≥ FG(G′)}} (as its null-hypothesis), where {{math|FR(G′)}} indicates the frequency of G' in a randomized network.<ref name="sch1" /> A sub-graph with P-value less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern. The P-value is defined as

+

:<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>

−

~~<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>~~

−

式中，{{math|μR(G′)}} 和 {{math|σR(G′)}}分别代表集合{{math|R(G)}}中的平均和标准偏差频率。.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> {{math|Z(G′)}}越大，子图{{math|G′}}作为模体的意义也就越大。

−

此外还可以使用统计假设检验中的另一个测量方法，可以作为模体检测中的一种方法，即P值（P-value），以 {{math|FR(G′) ≥ FG(G′)}}的概率给出（作为其零假设null-hypothesis），其中{{math|FR(G′)}}表示随机网络中{{math|G′}}的频率。<ref name="sch1" /> 当P值小于阈值（通常为0.01或0.05）时，该子图可以被称为显著模式。该P值定义为：

−

~~<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>~~

−

[[File:Different occurrences of a sub-graph in a graph.jpg|thumb|''Different occurrences of a sub-graph in a graph''. (M1 – M4) are different occurrences of sub-graph (b) in graph (a). For frequency concept {{math|F1}}, the set M1, M2, M3, M4 represent all matches, so {{math|F1 {{=}} 4}}. For {{math|F2}}, one of the two set M1, M4 or M2, M3 are possible matches, {{math|F2 {{=}} 2}}. Finally, for frequency concept {{math|F3}}, merely one of the matches (M1 to M4) is allowed, therefore {{math|F3 {{=}} 1}}. The frequency of these three frequency concepts decrease as the usage of network elements are restricted.]]

−

Where {{math|N}} indicates number of randomized networks, {{math|i}} is defined over an ensemble of randomized networks and the Kronecker delta function {{math|δ(c(i))}} is one if the condition {{math|c(i)}} holds. The concentration <ref name="kas1">{{cite journal |vauthors=Kashtan N, Itzkovitz S, Milo R, Alon U |title=Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs |journal=Bioinformatics |year=2004 |volume=20 |issue=11 |pages=1746–1758 |doi=10.1093/bioinformatics/bth163|pmid=15001476 |doi-access=free }}</ref><ref name="wer1">{{cite journal |author=Wernicke S |title=Efficient detection of network motifs |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |year=2006 |volume=3 |issue=4 |pages=347–359 |doi=10.1109/tcbb.2006.51|pmid=17085844 |citeseerx=10.1.1.304.2576 }}</ref> of a particular n-size sub-graph {{math|G′}} in network {{math|G}} refers to the ratio of the sub-graph appearance in the network to the total ''n''-size non-isomorphic sub-graphs’ frequencies, which is formulated by

−

<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>

where index {{math|i}} is defined over the set of all non-isomorphic n-size graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard ''et al.'' in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>

−

其中索引 i 定义在所有非同构 n ~~大小图的集合上。~~ 另一种统计测量是用来评估网络主题的，但在已知的算法中很少使用。这种测量方法是由 Picard ~~等人在2008年提出的，使用的是泊松分佈分布，而不是上面隐含使用的高斯正态分布。~~<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>其中{{math|N}}表示随机网络的数目，{{math|i}}定义在随机网络的集合上，若条件{{math|c(i)}}成立，则Kroneckerδ函数{{math|δ(c(i))}}是1。在网络{{math|G}}中，一个特定的n维子图{{math|N′}}的集中度是指子图在网络中出现频率与n维非同构子图的总频率之比，其计算公式如下：

+

其中索引 {{math|i}} 定义在所有非同构 n 尺寸图的集合上。另一种统计测量是用来评估网络主题的，但在已知的算法中很少使用。这种测量方法是由 Picard 等人在2008年提出的，使用的是泊松分布，而不是上面隐含使用的高斯正态分布。<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>

−

~~<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>~~

−

In addition, three specific concepts of sub-graph frequency have been proposed.<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> As the figure illustrates, the first frequency concept {{math|F1}} considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept {{math|F2}} is defined as the maximum number of edge-disjoint instances of a given graph in original network. And finally, the frequency concept {{math|F3}} entails matches with disjoint edges and nodes. Therefore, the two concepts {{math|F2}} and {{math|F3}} restrict the usage of elements of the graph, and as can be inferred, the frequency of a sub-graph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate sub-graphs if we insist on frequency concepts {{math|F2}} and {{math|F3}}.

−

此外，他们还提出了子图频率的三个具体概念。<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> 如图所示，第一频率概念 {{math|F1}}考虑原始网络中图的所有匹配，这与我们前面介绍过的类似。第二个概念{{math|F2}}定义为原始网络中给定图的最大不相交边的数量。最后，频率概念{{math|F3}}包含与不相交边（disjoint edges）和节点的匹配。因此，两个概念F2和F3限制了图元素的使用，并且可以看出，通过对网络元素的使用施加限制，子图的频率下降。因此，如果我们坚持使用频率概念{{math|F2}}和{{math|F3}}，网络模体检测算法将可以筛选出更多的候选子图。

−

~~==History==~~

+

:<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>

−

The study of network motifs was pioneered by Holland and Leinhardt<ref>Holland, P. W., & Leinhardt, S. (1974). The statistical analysis of local structure in social networks. Working Paper No. 44, National Bureau of Economic Research.</ref><ref>Hollandi, P., & Leinhardt, S. (1975). The Statistical Analysis of Local. Structure in Social Networks. Sociological Methodology, David Heise, ed. San Francisco: ~~Josey-Bass.</ref>~~<~~ref~~> ~~Holland, P. W., & Leinhardt, S.~~ (~~1976~~). Local structure in social networks. Sociological methodology, 7, 1-45.</ref><ref>Holland, P. W., & Leinhardt, S. (1977). A method for detecting structure in sociometric data. In Social Networks (pp. 411-432). Academic Press.</ref> who introduced the concept of a triad census of networks. They introduced methods to enumerate various types of subgraph configurations, and test whether the subgraph counts are statistically different from those expected in random networks.

−

霍兰（Holland）和莱因哈特（Leinhardt）率先提出了'''网络三合会普查'''（a triad census of networks）的概念，开创了网络模体研究的先河。<ref>Holland, P. W., & Leinhardt, S. (1974). The statistical analysis of local structure in social networks. Working Paper No. 44, National Bureau of Economic Research.</ref><ref>Hollandi, P., & Leinhardt, S. (1975). The Statistical Analysis of Local. Structure in Social Networks. Sociological Methodology, David Heise, ed. San Francisco: Josey-Bass.</ref><ref> Holland, P. W., & Leinhardt, S. (1976). Local structure in social networks. Sociological methodology, 7, 1-45.</ref><ref>Holland, P. W., & Leinhardt, S. (~~1977~~)~~. A method for detecting structure in sociometric data. In Social Networks~~ (~~pp. 411-432~~)~~. Academic Press.~~</~~ref~~> ~~他们介绍了枚举各种子图配置的方法，并测试子图计数是否与随机网络中的期望值存在统计学上的差异。~~

−

~~这里对于'''网络三合会普查'''（a triad census~~ of ~~networks）这一概念的翻译存疑~~

+

此外，他们还提出了子图频率的三个具体概念。<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> 如图所示，第一频率概念 {{math|F1}}考虑原始网络中图的所有匹配，这与我们前面介绍过的类似。第二个概念{{math|F2}}定义为原始网络中给定图的最大不相交边的数量。最后，频率概念{{math|F3}}包含与不相交边 disjoint edges和节点的匹配。因此，两个概念{{math|F2}}和{{math|F3}}限制了图元素的使用，并且可以看出，通过对网络元素的使用施加限制，子图的频率下降。因此，如果坚持使用频率概念{{math|F2}}和{{math|F3}}，网络模体检测算法将可以筛选出更多的候选子图。

−

~~This idea was further generalized in 2002 by [[Uri Alon]] and his group <ref name~~=~~"she1">{{cite journal |vauthors~~=~~Shen-Orr SS, Milo R, Mangan S, Alon U |title~~=~~Network motifs in the transcriptional regulation network of ''Escherichia coli'' |journal~~=Nat. Genet. |volume=31 |issue=1 |pages=64–8 |date=May 2002 |pmid=11967538 |doi=10.1038/ng881}}</ref> when network motifs were discovered in the gene regulation (transcription) network of the bacteria ''[[Escherichia coli|E. coli]]'' and then in a large set of natural networks. Since then, a considerable number of studies have been conducted on the subject. Some of these studies focus on the biological applications, while others focus on the computational theory of network motifs.

+

==历史==

−

2002年，Uri Alon和他的团队[17]在大肠杆菌的基因调控（gene regulation network）（转录 transcription）网络中发现了网络模体，随后在大量的自然网络中也发现了网络模体，从而进一步推广了这一观点。自那时起，许多科学家都对这一问题进行了大量的研究。其中一些研究集中在生物学应用上，而另一些则集中在网络模体的计算理论上。<ref ~~name="she1"~~>~~{{cite journal |vauthors=Shen-Orr SS~~, ~~Milo R~~, ~~Mangan~~ S, ~~Alon U |title=Network motifs~~ in ~~the transcriptional regulation network of ''Escherichia coli'' |journal=Nat~~. ~~Genet~~. ~~|volume=31 |issue=~~1 ~~|pages=64–8 |date=May 2002 |pmid=11967538 |doi=10~~.~~1038~~/~~ng881}}~~</ref>

+

霍兰 Holland和莱因哈特 Leinhardt率先提出了'''网络三合会普查 a triad census of networks'''的概念，开创了网络模体研究的先河。<ref>Holland, P. W., & Leinhardt, S. (1974). The statistical analysis of local structure in social networks. Working Paper No. 44, National Bureau of Economic Research.</ref><ref>Hollandi, P., & Leinhardt, S. (1975). The Statistical Analysis of Local. Structure in Social Networks. Sociological Methodology, David Heise, ed. San Francisco: Josey-Bass.</ref><ref> Holland, P. W., & Leinhardt, S. (1976). Local structure in social networks. Sociological methodology, 7, 1-45.</ref><ref>Holland, P. W., & Leinhardt, S. (1977). A method for detecting structure in sociometric data. In Social Networks (pp. 411-432). Academic Press.</ref> 他们介绍了枚举各种子图配置的方法，并测试子图计数是否与随机网络中的期望值存在统计学上的差异。

−

~~The biological studies endeavor to interpret the motifs detected for biological networks. For example, in work following,~~<ref name="she1~~" /> the network motifs found in ''[[Escherichia coli|E. coli]]'' were discovered in the transcription networks of other bacteria<ref name="eic1~~">{{cite journal |vauthors=~~Eichenberger P~~, Fujita M, Jensen ST, etal |title=The program of gene transcription for a single differentiating cell type during sporulation in ''Bacillus subtilis'' |journal=PLOS Biology |volume=2 |issue=10 |pages=e328 |date=October 2004 |pmid=15383836 |pmc=517825 |doi=10.1371/journal.pbio.0020328 }} </ref> as well as yeast<ref name="mil3">{{cite journal |vauthors=Milo R, ~~Shen-Orr~~ S~~, Itzkovitz S, Kashtan N, Chklovskii D~~, Alon U |title=Network motifs: simple building blocks of complex networks |journal=Science |volume=298 |issue=5594 |pages=824–7 |date=October 2002 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="lee1">{{cite journal |vauthors=Lee TI, Rinaldi NJ, Robert F, etal |title=Transcriptional regulatory networks in Saccharomyces cerevisiae |journal=Science |volume=298 |issue=5594 |pages=799–804 |date=October 2002 |pmid=12399584 |doi=10.1126/science.1075090 |bibcode=2002Sci...298..799L }}</ref> and higher organisms.<ref name="odo1">{{cite journal |vauthors=Odom DT, Zizlsperger N, Gordon DB, etal |title=Control of pancreas and liver gene expression by HNF transcription factors |journal=Science |volume=303 |issue=5662 |pages=1378–81 |date=February 2004 |pmid=14988562 |pmc=3012624 |doi=10.1126/science.1089769 |bibcode=2004Sci...303.1378O }}</ref><ref name="boy1">{{cite journal |vauthors=Boyer LA, Lee TI, Cole MF, etal |title=Core transcriptional regulatory circuitry in human embryonic stem cells |journal=Cell |volume=122 |issue=6 |pages=947–56 |date=September 2005 |pmid=16153702 |pmc=3006442 |doi=10.1016/j.cell.2005.08.020 }}</ref><ref name="ira1">{{cite journal |vauthors=Iranfar N, Fuller D, Loomis WF |title=Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC |journal=Dev. Biol. |volume=290 |issue=2 |pages=460–9 |date=February 2006 |pmid=16386729 |doi=10.1016/j.ydbio.2005.11.035 |doi-access=free }}</ref> A distinct set of network ~~motifs were identified in other types~~ of ~~biological networks such as neuronal networks and protein interaction networks.<ref name="mil2" /><ref name="maa1">{{cite journal |vauthors=Ma~~'~~ayan A, Jenkins SL, Neves S, etal |title=Formation of regulatory patterns during signal propagation in a Mammalian cellular network~~ |journal=~~Science |volume=309 |issue=5737 |pages=1078–83 |date=August 2005 |pmid=16099987 |pmc=3032439 |doi=10.1126/science.1108876 |bibcode=2005Sci..~~.~~309~~.~~1078M }}</ref><ref name="pta1">{{cite journal |vauthors=Ptacek J, Devgan G, Michaud G, etal |title=Global analysis of protein phosphorylation in yeast |journal=Nature~~ |volume=~~438~~ |issue=~~7068~~ |pages=~~679–84~~ |date=~~December 2005~~ |pmid=~~16319894~~ |doi=10.1038/~~nature04187|bibcode=2005Natur.438..679P |url=https://authors.library.caltech.edu/56271/2/Tables.pdf |type=Submitted manuscript~~ }}</ref>

+

2002年，Uri Alon和他的团队<ref name="she1">{{cite journal |vauthors=Shen-Orr SS, Milo R, Mangan S, Alon U |title=Network motifs in the transcriptional regulation network of ''Escherichia coli'' |journal=Nat. Genet. |volume=31 |issue=1 |pages=64–8 |date=May 2002 |pmid=11967538 |doi=10.1038/ng881}}</ref>在大肠杆菌的基因调控 gene regulation network （转录 transcription）网络中发现了网络模体，随后在大量的自然网络中也发现了网络模体，从而进一步推广了这一观点。自那时起，许多科学家都对这一问题进行了大量的研究。其中一些研究集中在生物学应用上，而另一些则集中在网络模体的计算理论上。

−

生物学研究试图解释为生物网络检测到的模体。例如，在接下来的工作中，文献[17]在大肠杆菌中发现的网络模体存在于其他细菌<ref name="eic1">{{cite journal |vauthors=Eichenberger P, Fujita M, Jensen ST, etal |title=The program of gene transcription for a single differentiating cell type during sporulation in ''Bacillus subtilis'' |journal=PLOS Biology |volume=2 |issue=10 |pages=e328 |date=October 2004 |pmid=15383836 |pmc=517825 |doi=10.1371/journal.pbio.0020328 }} </ref>以及酵母<ref name="mil3">{{cite journal |vauthors=Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |journal=Science |volume=298 |issue=5594 |pages=824–7 |date=October 2002 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="lee1">{{cite journal |vauthors=Lee TI, Rinaldi NJ, Robert F, etal |title=Transcriptional regulatory networks in Saccharomyces cerevisiae |journal=Science |volume=298 |issue=5594 |pages=799–804 |date=October 2002 |pmid=12399584 |doi=10.1126/science.1075090 |bibcode=2002Sci...298..799L }}</ref>和高等生物的转录网络中。文献<ref name="odo1">{{cite journal |vauthors=Odom DT, Zizlsperger N, Gordon DB, etal |title=Control of pancreas and liver gene expression by HNF transcription factors |journal=Science |volume=303 |issue=5662 |pages=1378–81 |date=February 2004 |pmid=14988562 |pmc=3012624 |doi=10.1126/science.1089769 |bibcode=2004Sci...303.1378O }}</ref><ref name="boy1">{{cite journal |vauthors=Boyer LA, Lee TI, Cole MF, etal |title=Core transcriptional regulatory circuitry in human embryonic stem cells |journal=Cell |volume=122 |issue=6 |pages=947–56 |date=September 2005 |pmid=16153702 |pmc=3006442 |doi=10.1016/j.cell.2005.08.020 }}</ref><ref name="ira1">{{cite journal |vauthors=Iranfar N, Fuller D, Loomis WF |title=Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC |journal=Dev. Biol. |volume=290 |issue=2 |pages=460–9 |date=February 2006 |pmid=16386729 |doi=10.1016/j.ydbio.2005.11.035 |doi-access=free }}</ref>在其他类型的生物网络中发现了一组不同的网络模体，如神经元网络和蛋白质相互作用网络。<ref name="mil2" /><ref name="maa1">{{cite journal |vauthors=Ma'ayan A, Jenkins SL, Neves S, etal |title=Formation of regulatory patterns during signal propagation in a Mammalian cellular network |journal=Science |volume=309 |issue=5737 |pages=1078–83 |date=August 2005 |pmid=16099987 |pmc=3032439 |doi=10.1126/science.1108876 |bibcode=2005Sci...309.1078M }}</ref><ref name="pta1">{{cite journal |vauthors=Ptacek J, Devgan G, Michaud G, etal |title=Global analysis of protein phosphorylation in yeast |journal=Nature |volume=438 |issue=7068 |pages=679–84 |date=December 2005 |pmid=16319894 |doi=10.1038/nature04187|bibcode=2005Natur.438..679P |url=https://authors.library.caltech.edu/56271/2/Tables.pdf |type=Submitted manuscript }}</ref>

+

生物学研究试图解释为生物网络检测到的模体。例如，在接下来的工作中，<ref name="she1" />在大肠杆菌中发现的网络模体存在于其他细菌<ref name="eic1">{{cite journal |vauthors=Eichenberger P, Fujita M, Jensen ST, etal |title=The program of gene transcription for a single differentiating cell type during sporulation in ''Bacillus subtilis'' |journal=PLOS Biology |volume=2 |issue=10 |pages=e328 |date=October 2004 |pmid=15383836 |pmc=517825 |doi=10.1371/journal.pbio.0020328 }} </ref>以及酵母<ref name="mil3">{{cite journal |vauthors=Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |journal=Science |volume=298 |issue=5594 |pages=824–7 |date=October 2002 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="lee1">{{cite journal |vauthors=Lee TI, Rinaldi NJ, Robert F, etal |title=Transcriptional regulatory networks in Saccharomyces cerevisiae |journal=Science |volume=298 |issue=5594 |pages=799–804 |date=October 2002 |pmid=12399584 |doi=10.1126/science.1075090 |bibcode=2002Sci...298..799L }}</ref>和高等生物的转录网络中。<ref name="odo1">{{cite journal |vauthors=Odom DT, Zizlsperger N, Gordon DB, etal |title=Control of pancreas and liver gene expression by HNF transcription factors |journal=Science |volume=303 |issue=5662 |pages=1378–81 |date=February 2004 |pmid=14988562 |pmc=3012624 |doi=10.1126/science.1089769 |bibcode=2004Sci...303.1378O }}</ref><ref name="boy1">{{cite journal |vauthors=Boyer LA, Lee TI, Cole MF, etal |title=Core transcriptional regulatory circuitry in human embryonic stem cells |journal=Cell |volume=122 |issue=6 |pages=947–56 |date=September 2005 |pmid=16153702 |pmc=3006442 |doi=10.1016/j.cell.2005.08.020 }}</ref><ref name="ira1">{{cite journal |vauthors=Iranfar N, Fuller D, Loomis WF |title=Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC |journal=Dev. Biol. |volume=290 |issue=2 |pages=460–9 |date=February 2006 |pmid=16386729 |doi=10.1016/j.ydbio.2005.11.035 |doi-access=free }}</ref>在其他类型的生物网络中发现了一组不同的网络模体，如神经元网络和蛋白质相互作用网络。<ref name="mil2" /><ref name="maa1">{{cite journal |vauthors=Ma'ayan A, Jenkins SL, Neves S, etal |title=Formation of regulatory patterns during signal propagation in a Mammalian cellular network |journal=Science |volume=309 |issue=5737 |pages=1078–83 |date=August 2005 |pmid=16099987 |pmc=3032439 |doi=10.1126/science.1108876 |bibcode=2005Sci...309.1078M }}</ref><ref name="pta1">{{cite journal |vauthors=Ptacek J, Devgan G, Michaud G, etal |title=Global analysis of protein phosphorylation in yeast |journal=Nature |volume=438 |issue=7068 |pages=679–84 |date=December 2005 |pmid=16319894 |doi=10.1038/nature04187|bibcode=2005Natur.438..679P |url=https://authors.library.caltech.edu/56271/2/Tables.pdf |type=Submitted manuscript }}</ref>

−

The computational research has focused on improving existing motif detection tools to assist the biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section in chronological order.

另一方面，对于计算研究的重点则是改进现有的模体检测工具，以协助生物学研究，并允许对更大的网络进行分析。到目前为止，已经提供了几种不同的算法，这些算法将在下一节按时间顺序进行阐述。

−

~~Most recently, the acc-MOTIF tool to detect network motifs was released.<ref>{{Cite web | url=http://www.ft.unicamp.br/docentes/meira/accmotifs/ |title = Acc-Motif: Accelerated Motif Detection}}</ref>~~

最近，还发布了用于检测网络基序的acc基序工具。<ref>{{Cite web | url=http://www.ft.unicamp.br/docentes/meira/accmotifs/ |title = Acc-Motif: Accelerated Motif Detection}}</ref>

−

==模体发现算法 ~~Motif discovery algorithms~~==

+

==模体发现算法==

−

Various solutions have been proposed for the challenging problem of motif discovery. These algorithms can be classified under various paradigms such as exact counting methods, sampling methods, pattern growth methods and so on. However, motif discovery problem comprises two main steps: first, calculating the number of occurrences of a sub-graph and then, evaluating the sub-graph significance. The recurrence is significant if it is detectably far more than expected. Roughly speaking, the expected number of appearances of a sub-graph can be determined by a Null-model, which is defined by an ensemble of random networks with some of the same properties as the original network.

针对模体发现这一问题存在多种解决方案。这些算法可以归纳为不同的范式：例如精确计数方法，采样方法，模式增长方法等。但模体发现问题包括两个主要步骤：首先，计算子图的出现次数，然后评估子图的重要性。如果检测到的重现性远超过预期，那么这种重现性是很显著的。粗略地讲，子图的预期出现次数可以由'''零模型 Null-model''' 确定，该模型定义为具有与原始网络某些属性相同的随机网络的集合。

−

~~Here, a review on computational aspects of major algorithms is given and their related benefits and drawbacks from an algorithmic perspective are discussed.~~

接下来，对下列算法的计算原理进行简要回顾，并从算法的角度讨论了它们的优缺点。

===mfinder 算法===

−

''mfinder'', the first motif-mining tool, implements two kinds of motif finding algorithms: a full enumeration and a sampling method. Until 2004, the only exact counting method for NM (network motif) detection was the brute-force one proposed by Milo ''et al.''.<ref name="mil1" /> This algorithm was successful for discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible. Hence, a new approach to this problem was needed.

'''mfinder'''是第一个模体挖掘工具，它主要有两种模体查找算法：完全枚举 full enumeration 和采样方法 sampling method。直到2004年，用于NM（'''网络模体 networkmotif'''）检测的唯一精确计数方法是'''Milo'''等人提出的暴力穷举方法。<ref name="mil1" />该算法成功地发现了小规模的模体，但是这种方法甚至对于发现规模为5个或6个的模体在计算上都不可行的。因此，需要一种解决该问题的新方法。

−

~~Kashtan ''et al.''~~ <ref name="kas1" /> presented the first sampling NM discovery algorithm, which was based on ''edge sampling'' throughout the network. This algorithm estimates concentrations of induced sub-graphs and can be utilized for motif discovery in directed or undirected networks. The sampling procedure of the algorithm starts from an arbitrary edge of the network that leads to a sub-graph of size two, and then expands the sub-graph by choosing a random edge that is incident to the current sub-graph. After that, it continues choosing random neighboring edges until a sub-graph of size n is obtained. Finally, the sampled sub-graph is expanded to include all of the edges that exist in the network between these n nodes. When an algorithm uses a sampling approach, taking unbiased samples is the most important issue that the algorithm might address. The sampling procedure, however, does not take samples uniformly and therefore Kashtan ''et al.'' proposed a weighting scheme that assigns different weights to the different sub-graphs within network.<ref name="kas1" /> The underlying principle of weight allocation is exploiting the information of the [[sampling probability]] for each sub-graph, i.e. the probable sub-graphs will obtain comparatively less weights in comparison to the improbable sub-graphs; hence, the algorithm must calculate the sampling probability of each sub-graph that has been sampled. This weighting technique assists ''mfinder'' to determine sub-graph concentrations impartially.

+

Kashtan等人<ref name="kas1" />首次提出了一种基于边缘采样的网络模体采样发现算法。该算法估计了所含子图的集中度，可用于有向或无向网络中的模体发现。该算法的采样过程从网络的任意一条边开始，该边连向大小为2的子图，然后选择一条与当前子图相关的随机边对子图进行扩展。之后，它将继续选择随机的相邻边，直到获得大小为n的子图为止。最后，采样得到的子图扩展为包括这n个节点在内的网络中存在的所有边。当使用采样方法时，获取无偏样本是这类算法可能面临的最重要问题。而且，采样过程并不能保证采到所有的样本（也就是不能保证得到所有的子图），因此，Kashtan 等人又提出了一种加权方案，为网络中的不同子图分配不同的权重。<ref name="kas1" /> 权重分配的基本原理是利用每个子图的抽样概率信息，即，与不可能的子图相比，可能的子图将获得相对较少的权重；因此，该算法必须计算已采样的每个子图的采样概率。这种加权技术有助于mfinder公正地确定子图的集中度。

−

'''Kashtan''' 等人<ref name="kas1" />首次提出了一种基于边缘采样的网络模体（NM）采样发现算法。该算法估计了所含子图 induced sub-graphs 的集中度 concentrations ，可用于有向或无向网络中的模体发现。该算法的采样过程从网络的任意一条边开始，该边连向大小为2的子图，然后选择一条与当前子图相关的随机边对子图进行扩展。之后，它将继续选择随机的相邻边，直到获得大小为n的子图为止。最后，采样得到的子图扩展为包括这n个节点在内的网络中存在的所有边。当使用采样方法时，获取无偏样本是这类算法可能面临的最重要问题。而且，采样过程并不能保证采到所有的样本（也就是不能保证得到所有的子图，译者注），因此，Kashtan 等人又提出了一种加权方案，为网络中的不同子图分配不同的权重。<ref name="kas1" /> 权重分配的基本原理是利用每个子图的抽样概率信息，即，与不可能的子图相比，可能的子图将获得相对较少的权重；因此，该算法必须计算已采样的每个子图的采样概率。这种加权技术有助于mfinder公正地确定子图的集中度 concentrations 。

−

In expanded to include sharp contrast to exhaustive search, the computational time of the algorithm surprisingly is asymptotically independent of the network size. An analysis of the computational time of the algorithm has shown that it takes {{math|O(nn)}} for each sample of a sub-graph of size {{math|n}} from the network. On the other hand, there is no analysis in <ref name="kas1" /> on the classification time of sampled sub-graphs that requires solving the ''graph isomorphism'' problem for each sub-graph sample. Additionally, an extra computational effort is imposed on the algorithm by the sub-graph weight calculation. But it is unavoidable to say that the algorithm may sample the same sub-graph multiple times – spending time without gathering any information.<ref name="wer1" /> In conclusion, by taking the advantages of sampling, the algorithm performs more efficiently than an exhaustive search algorithm; however, it only determines sub-graphs concentrations approximately. This algorithm can find motifs up to size 6 because of its main implementation, and as result it gives the most significant motif, not all the others too. Also, it is necessary to mention that this tool has no option of visual presentation. The sampling algorithm is shown briefly:

−

与穷举搜索形成鲜明对比的是，该算法的计算时间竟然与网络大小渐近无关。对算法时间复杂度的分析表明，对于网络中大小为n的子图的每个样本，它的时间复杂度为<math>O(n^n)</math>。另一方面，并没有对已采样子图的每一个子图样本判断图同构问题的分类时间进行分析<ref name="kas1" />。另外，子图权重计算将额外增加该算法的计算负担。但是不得不指出的是，该算法可能会多次采样相同的子图——花费时间而不收集任何有用信息。<ref name="wer1" />总之，通过利用采样的优势，该算法的性能比穷举搜索算法更有效；但是，它只能大致确定子图的集中度 concentrations 。由于该算法的实现方式，使得它可以找到最大为6的模体，并且它会给出的最重要的模体，而不是其他所有模体。另外，有必要提到此工具没有可视化的呈现。采样算法简要显示如下：

−

~~{| class="wikitable"~~

−

|-

−

~~! mfinder~~

−

|-

−

~~| '''Definitions:''' {{math|Es}}is the set of picked edges. {{math|Vs}} is the set of all nodes that are touched by the edges in {{math|E}}.~~

−

|-

−

~~| Init {{math|Vs}} and {{math|Es}} to be empty sets.~~

−

~~1. Pick a random edge {{math|e1 {{=}} (vi, vj)}}. Update {{math|Es {{=}} {e1}}}, {{math|Vs {{=}} {vi, vj}}}~~

−

~~2. Make a list {{math|L}} of all neighbor edges of {{math|Es}}. Omit from {{math|L}} all edges between members of {{math|Vs}}.~~

−

3. Pick a random edge {{math|e {{=}} {vk,vl}}} from {{math|L}}. Update {{math|Es {{=}} Es ⋃ {e}}}, {{math|Vs {{=}} Vs ⋃ {vk, vl}}}.

−

~~4. Repeat steps 2-3 until completing an ''n''-node subgraph (until {{math|{{!}}Vs{{!}} {{=}} n}}).~~

−

~~5. Calculate the probability to sample the picked ''~~n~~''-node subgraph.~~

+

与穷举搜索形成鲜明对比的是，该算法的计算时间竟然与网络大小渐近无关。对算法时间复杂度的分析表明，对于网络中大小为n的子图的每个样本，它的时间复杂度为<math>O(n^n)</math>。另一方面，并没有对已采样子图的每一个子图样本判断图同构问题的分类时间进行分析<ref name="kas1" />。另外，子图权重计算将额外增加该算法的计算负担。但是不得不指出的是，该算法可能会多次采样相同的子图——花费时间而不收集任何有用信息。<ref name="wer1" />总之，通过利用采样的优势，该算法的性能比穷举搜索算法更有效；但是，它只能大致确定子图的集中度。由于该算法的实现方式，使得它可以找到最大为6的模体，并且它会给出的最重要的模体，而不是其他所有模体。另外，有必要提到此工具没有可视化的呈现。采样算法简要显示如下：

−

|}

−

{|class="wikitable"

+

{|class="wikitable" align="center"

−

|+ mfinder

+

|-

+

!mfinder

|-

−

~~!rowspan="1"~~|定义：<math>E_{s}</math>是采集的边集合。<math>V_{s}</math>是<math>E</math>中所有边的顶点的集合。

+

|定义：<math>E_{s}</math>是采集的边集合。<math>V_{s}</math>是<math>E</math>中所有边的顶点的集合。

|-

−

~~|rowspan="5"~~|初始化<math>V_{s}</math>和<math>E_{s}</math>为空集。

+

|初始化<math>V_{s}</math>和<math>E_{s}</math>为空集。

1. 随机选择一条边<math> e_{1} = (v_{i}, v_{j}) </math>，更新 <math>E_{s} = \{e_{1}\}, V{s} = \{v_{i}, v_{j}\}</math>

第164行：第96行：

===FPF (Mavisto)算法===

−

Schreiber and Schwöbbermeyer <ref name="schr1" /> proposed an algorithm named ''flexible pattern finder (FPF)'' for extracting frequent sub-graphs of an input network and implemented it in a system named ''Mavisto''.<ref name="schr2">{{cite journal |vauthors=Schreiber F, Schwobbermeyer H |title=MAVisto: a tool for the exploration of network motifs |journal=Bioinformatics |volume=21 |issue=17|pages=3572–3574 |year=2005 |doi=10.1093/bioinformatics/bti556|pmid=16020473 |doi-access=free }}</ref> Their algorithm exploits the ''downward closure'' property which is applicable for frequency concepts {{math|F2}} and {{math|F3}}. The downward closure property asserts that the frequency for sub-graphs decrease monotonically by increasing the size of sub-graphs; however, this property does not hold necessarily for frequency concept {{math|F1}}. FPF is based on a ''pattern tree'' (see figure) consisting of nodes that represents different graphs (or patterns), where the parent of each node is a sub-graph of its children nodes; in other words, the corresponding graph of each pattern tree's node is expanded by adding a new edge to the graph of its parent node.

Schreiber和Schwöbbermeyer <ref name="schr1" />提出了一种称为灵活模式查找器（FPF）的算法，用于提取输入网络的频繁子图，并将其在名为Mavisto的系统中加以实现。<ref name="schr2">{{cite journal |vauthors=Schreiber F, Schwobbermeyer H |title=MAVisto: a tool for the exploration of network motifs |journal=Bioinformatics |volume=21 |issue=17|pages=3572–3574 |year=2005 |doi=10.1093/bioinformatics/bti556|pmid=16020473 |doi-access=free }}</ref> 他们的算法利用了向下闭包特性，该特性适用于频率概念<math>F_{2}</math>和<math>F_{3}</math>。向下闭包性质表明，子图的频率随着子图的大小而单调下降；但这一性质并不一定适用于频率概念<math>F_{1}</math>。FPF算法基于模式树（见右图），由代表不同图形（或模式）的节点组成，其中每个节点的父节点是其子节点的子图；换句话说，每个模式树节点的对应图通过向其父节点图添加新边来扩展。

−

+

[[Image:The_pattern_tree_in_FPF_algorithm.jpg|right|thumb|''FPF算法中的模式树展示''.<ref name="schr1" />]]

−

[[Image:~~The pattern tree in FPF algorithm~~.jpg|right|thumb|''FPF算法中的模式树展示''.<ref name="schr1" />]]

−

At first, the FPF algorithm enumerates and maintains the information of all matches of a sub-graph located at the root of the pattern tree. Then, one-by-one it builds child nodes of the previous node in the pattern tree by adding one edge supported by a matching edge in the target graph, and tries to expand all of the previous information about matches to the new sub-graph (child node). In next step, it decides whether the frequency of the current pattern is lower than a predefined threshold or not. If it is lower and if downward closure holds, FPF can abandon that path and not traverse further in this part of the tree; as a result, unnecessary computation is avoided. This procedure is continued until there is no remaining path to traverse.

首先，FPF算法枚举并维护位于模式树根部的子图的所有匹配信息。然后，它通过在目标图中添加匹配边缘支持的一条边缘，在模式树中一一建立前一节点的子节点，然后通过在目标图中添加匹配边支持的一条边，逐个构建模式树中前一个节点的子节点，并尝试将以前关于匹配的所有信息拓展到新的子图(子节点)中。下一步，它判断当前模式的频率是否低于预定义的阈值。如果它低于阈值且保持向下闭包，则FPF算法会放弃该路径，而不会在树的此部分进一步遍历；这样就避免了不必要的计算。重复此过程，直到没有剩余可遍历的路径为止。

−

The advantage of the algorithm is that it does not consider infrequent sub-graphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. However, FPF is most useful for frequency concepts {{math|F2}} and {{math|F3}}, because downward closure is not applicable to {{math|F1}}. Nevertheless, the pattern tree is still practical for {{math|F1}} if the algorithm runs in parallel. Another advantage of the algorithm is that the implementation of this algorithm has no limitation on motif size, which makes it more amenable to improvements. The pseudocode of FPF (Mavisto) is shown below:

该算法的优点是它不会考虑不频繁的子图，并尝试尽快完成枚举过程；因此，它只花时间在模式树中用于有希望的节点上，而放弃所有其他节点。还有一点额外的好处，模式树概念允许 FPF 以并行方式实现和执行，因为它可以独立地遍历模式树的每个路径。但是，FPF对于频率概念<math>F_{2}</math>和<math>F_{3}</math>最为有用，因为向下闭包不适用于<math>F_{1}</math>。尽管如此，如果算法并行运行，那么模式树对于<math>F_{1}</math>仍然是可行的。该算法的另一个优点是它的实现对模体大小没有限制，这使其更易于改进。FPF（Mavisto）的伪代码如下所示：

−

{| class="wikitable"

+

{|class="wikitable" align="center"

|-

! Mavisto

−

|-

−

~~| '''Data:''' Graph {{math|G}}, target pattern size {{math|t}}, frequency concept {{math|F}}~~

−

~~'''Result:''' Set {{math|R}} of patterns of size {{math|t}} with maximum frequency.~~

−

|-

−

~~| {{math|R ← φ}}, {{math|fmax ← 0}}~~

−

~~{{math|P ←}}start pattern {{math|p1}} of size 1~~

−

~~{{math|Mp1 ←}}all matches of {{math|p1}} in {{math|G}}~~

−

~~'''While''' {{math|P ≠ φ}} '''do'''~~

−

~~{{pad|1em}}{{math|Pmax ←}}select all patterns from {{math|P}} with maximum size.~~

−

~~{{pad|1em}}{{math|P ←}} select pattern with maximum frequency from {{math|Pmax}}~~

−

~~{{pad|1em}}{{math|Ε {{=}} ''ExtensionLoop''(G, p, Mp)}}~~

−

~~{{pad|1em}}'''Foreach''' pattern {{math|p ∈ E}}~~

−

~~{{pad|2em}}'''If''' {{math|F {{=}} F1}} '''then''' {{math|f ← ''size''(Mp)}}~~

−

~~{{pad|2em}}'''Else''' {{math|f ←}} ''Maximum Independent set''{{math|(F, Mp)}}~~

−

~~{{pad|2em}}'''End'''~~

−

~~{{pad|2em}}'''If''' {{math|''size''(p) {{=}} t}} '''then'''~~

−

~~{{pad|3em}}'''If''' {{math|f {{=}} fmax}} '''then''' {{math|R ← R ⋃ {p}}}~~

−

~~{{pad|3em}}'''Else if''' {{math|f > fmax}} '''then''' {{math|R ← {p}}}; {{math|fmax ← f}}~~

−

~~{{pad|3em}}'''End'''~~

−

~~{{pad|2em}}'''Else'''~~

−

~~{{pad|3em}}'''If''' {{math|F {{=}} F1}} '''or''' {{math|f ≥ fmax}} '''then''' {{math|P ← P ⋃ {p}}}~~

−

~~{{pad|3em}}'''End'''~~

−

~~{{pad|2em}}'''End'''~~

−

~~{{pad|1em}}'''End'''~~

−

~~'''End'''~~

−

|}

−

~~{|class="wikitable"~~

−

~~|+ Mavisto~~

|-

!rowspan="1"|数据: 图 <math>G</math>, 目标模式规模 <math>t</math>, 频率概念 <math>F</math>。

第248行：第121行：

当 <math>P \neq \Phi </math> 时，执行：

−

~~    ~~<math>P_{max} \leftarrow</math> 从 <math>P</math> 中选择最大规模的所有模式

+

{{pad|1em}}<math>P_{max} \leftarrow</math> 从 <math>P</math> 中选择最大规模的所有模式

−

~~    ~~<math>P\leftarrow</math> 从 <math>P_{max}</math> 中选择最大频率的模式

+

{{pad|1em}}<math>P\leftarrow</math> 从 <math>P_{max}</math> 中选择最大频率的模式

−

~~    ~~<math>E = ExtensionLoop(G, p, M_{p})</math>

+

{{pad|1em}}<math>E = ExtensionLoop(G, p, M_{p})</math>

−

~~    ~~对于 <math>p \in E </math> 的所有模式：

+

{{pad|1em}}对于 <math>p \in E </math> 的所有模式：

−

~~        ~~如果 <math>F = F_{1}</math> ，那么 <math>f \leftarrow size(M_{p})</math>

+

{{pad|2em}}如果 <math>F = F_{1}</math> ，那么 <math>f \leftarrow size(M_{p})</math>

−

~~        ~~其他<math>f \leftarrow</math> 最大独立集 <math>(F, M_{p})</math>

+

{{pad|2em}}其他<math>f \leftarrow</math> 最大独立集 <math>(F, M_{p})</math>

−

~~        ~~结束

+

{{pad|2em}}结束

−

~~        ~~如果 <math>size(p) = t</math> ，那么

+

{{pad|2em}}如果 <math>size(p) = t</math> ，那么

−

~~            ~~如果 <math>f = f_{max}</math> ，那么 <math>R \leftarrow R \cup \{p\}</math>

+

{{pad|3em}}如果 <math>f = f_{max}</math> ，那么 <math>R \leftarrow R \cup \{p\}</math>

−

~~            ~~其他如果 <math>f > f_{max}</math> ，那么 <math>R \leftarrow \{p\}</math>； <math>f_{max} \leftarrow f</math>

+

{{pad|3em}}其他如果 <math>f > f_{max}</math> ，那么 <math>R \leftarrow \{p\}</math>； <math>f_{max} \leftarrow f</math>

−

~~            ~~结束

+

{{pad|3em}}结束

−

~~        ~~其他

+

{{pad|2em}}其他

−

~~            ~~如果 <math>F = F_{1} or f \geq f_{max}</math> ，那么 <math> P \leftarrow P \cup \{p\}</math>

+

{{pad|3em}}如果 <math>F = F_{1} or f \geq f_{max}</math> ，那么 <math> P \leftarrow P \cup \{p\}</math>

−

~~            ~~结束

+

{{pad|3em}}结束

−

~~        ~~结束

+

{{pad|2em}}结束

−

~~    ~~结束

+

{{pad|1em}}结束

结束

薄荷

7,129

个编辑

更改

网络模体 (查看源代码)

2020年5月9日 (六) 23:09的版本

导航菜单

搜索