更改

添加5,569字节 、 2020年5月6日 (三) 19:17
第30行: 第30行:     
When {{math|G&Prime; ⊂ G}} and there exists an isomorphism between the sub-graph {{math|G&Prime;}} and a graph {{math|G&prime;}}, this mapping represents an ''appearance'' of {{math|G&prime;}} in {{math|G}}. The number of appearances of graph {{math|G&prime;}} in {{math|G}}  is called the frequency {{math|F<sub>G</sub>}} of {{math|G&prime;}} in {{math|G}}. A graph is called ''recurrent'' (or ''frequent'') in {{math|G}}, when its ''frequency'' {{math|F<sub>G</sub>(G&prime;)}} is above a predefined threshold or cut-off value. We use terms ''pattern'' and ''frequent sub-graph'' in this review interchangeably. There is an [[Statistical ensemble (mathematical physics)|ensemble]] {{math|Ω(G)}} of random graphs corresponding to the [[Null model|null-model]] associated to {{math|G}}. We should choose {{math|N}} random graphs uniformly from {{math|Ω(G)}}  and calculate the frequency for a particular frequent sub-graph {{math|G&prime;}} in {{math|G}}. If the frequency of {{math|G&prime;}} in {{math|G}} is higher than its arithmetic mean frequency in {{math|N}} random graphs {{math|R<sub>i</sub>}}, where {{math|1 ≤ i ≤ N}}, we call this recurrent pattern ''significant'' and hence treat {{math|G&prime;}} as a ''network motif'' for {{math|G}}. For a small graph {{math|G&prime;}}, the network {{math|G}} and a set of randomized networks {{math|R(G) ⊆ Ω(R)}}, where {{math|1=R(G) {{=}} N}}, the ''Z-Score'' that has been defined by the following formula:
 
When {{math|G&Prime; ⊂ G}} and there exists an isomorphism between the sub-graph {{math|G&Prime;}} and a graph {{math|G&prime;}}, this mapping represents an ''appearance'' of {{math|G&prime;}} in {{math|G}}. The number of appearances of graph {{math|G&prime;}} in {{math|G}}  is called the frequency {{math|F<sub>G</sub>}} of {{math|G&prime;}} in {{math|G}}. A graph is called ''recurrent'' (or ''frequent'') in {{math|G}}, when its ''frequency'' {{math|F<sub>G</sub>(G&prime;)}} is above a predefined threshold or cut-off value. We use terms ''pattern'' and ''frequent sub-graph'' in this review interchangeably. There is an [[Statistical ensemble (mathematical physics)|ensemble]] {{math|Ω(G)}} of random graphs corresponding to the [[Null model|null-model]] associated to {{math|G}}. We should choose {{math|N}} random graphs uniformly from {{math|Ω(G)}}  and calculate the frequency for a particular frequent sub-graph {{math|G&prime;}} in {{math|G}}. If the frequency of {{math|G&prime;}} in {{math|G}} is higher than its arithmetic mean frequency in {{math|N}} random graphs {{math|R<sub>i</sub>}}, where {{math|1 ≤ i ≤ N}}, we call this recurrent pattern ''significant'' and hence treat {{math|G&prime;}} as a ''network motif'' for {{math|G}}. For a small graph {{math|G&prime;}}, the network {{math|G}} and a set of randomized networks {{math|R(G) ⊆ Ω(R)}}, where {{math|1=R(G) {{=}} N}}, the ''Z-Score'' that has been defined by the following formula:
 +
 +
<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>
 +
 +
当{{math|G&Prime; ⊂ G}},且{{math|G&Prime;}}与图{{math|G&prime;}}之间存在同构时,该映射表示{{math|G&prime;}}对于{{math|G}}存在。图{{math|G&prime;}}在{{math|G}}的出现次数称为{{math|G&prime;}}出现在{{math|G}}的频率{{math|F<sub>G</sub>}}。当一个子图的频率{{math|F<sub>G</sub>}}高于预定的阈值或截止值时,则称{{math|G&prime;}}是{{math|G}}中的递归(或频繁)子图。
 +
 +
在接下来的内容中,我们交替使用术语“模式(motifs)”和“频繁子图(frequent sub-graph)”。
 +
 +
设从与{{math|G}}相关联的零模型(the null-model)获得的随机图集合为{{math|Ω(G)}},我们从{{math|Ω(G)}}中均匀地选择N个随机图,并计算其特定频繁子图的频率。如果{{math|G&prime;}}出现在{{math|G}}的频率高于N个随机图Ri的算术平均频率,其中{{math|1 ≤ i ≤ N}},我们称此递归模式是有意义的,因此可以将{{math|G&prime;}}视为{{math|G}}的网络模体。
 +
 +
对于一个小图{{math|G&prime;}},网络{{math|G}}和一组随机网络{{math|R(G) ⊆ Ω(R)}},当{{math|1=R(G) {{=}} N}}时,由其Z分数(Z-score)的定义如下式:
    
<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>
 
<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>
    
where {{math|μ<sub>R</sub>(G&prime;)}} and {{math|σ<sub>R</sub>(G&prime;)}} stand for mean and standard deviation frequency in set {{math|R(G)}}, respectively.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> The larger the {{math|Z(G&prime;)}}, the more significant is the sub-graph {{math|G&prime;}} as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the P-Value, given as the probability of  {{math|F<sub>R</sub>(G&prime;) ≥ F<sub>G</sub>(G&prime;)}} (as its null-hypothesis), where  {{math|F<sub>R</sub>(G&prime;)}} indicates the frequency of G' in a randomized network.<ref name="sch1" /> A sub-graph with P-value less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern.  The P-value is defined as
 
where {{math|μ<sub>R</sub>(G&prime;)}} and {{math|σ<sub>R</sub>(G&prime;)}} stand for mean and standard deviation frequency in set {{math|R(G)}}, respectively.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> The larger the {{math|Z(G&prime;)}}, the more significant is the sub-graph {{math|G&prime;}} as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the P-Value, given as the probability of  {{math|F<sub>R</sub>(G&prime;) ≥ F<sub>G</sub>(G&prime;)}} (as its null-hypothesis), where  {{math|F<sub>R</sub>(G&prime;)}} indicates the frequency of G' in a randomized network.<ref name="sch1" /> A sub-graph with P-value less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern.  The P-value is defined as
 +
 +
<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>
 +
 +
式中,{{math|μ<sub>R</sub>(G&prime;)}} 和 {{math|σ<sub>R</sub>(G&prime;)}}分别代表集合{{math|R(G)}}中的平均和标准偏差频率。.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> {{math|Z(G&prime;)}}越大,子图{{math|G&prime;}}作为模体的意义也就越大。
 +
 +
此外还可以使用统计假设检验中的另一个测量方法,可以作为模体检测中的一种方法,即P值(P-value),以 {{math|F<sub>R</sub>(G&prime;) ≥ F<sub>G</sub>(G&prime;)}}的概率给出(作为其零假设null-hypothesis),其中{{math|F<sub>R</sub>(G&prime;)}}表示随机网络中{{math|G&prime;}}的频率。<ref name="sch1" /> 当P值小于阈值(通常为0.01或0.05)时,该子图可以被称为显著模式。该P值定义为:
    
<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>
 
<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>
第44行: 第60行:     
where index {{math|i}} is defined over the set of all non-isomorphic n-size graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard ''et al.'' in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>
 
where index {{math|i}} is defined over the set of all non-isomorphic n-size graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard ''et al.'' in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>
 +
 +
其中{{math|N}}表示随机网络的数目,{{math|i}}定义在随机网络的集合上,若条件{{math|c(i)}}成立,则Kroneckerδ函数{{math|δ(c(i))}}是1。在网络{{math|G}}中,一个特定的n维子图{{math|N&prime;}}的集中度是指子图在网络中出现频率与n维非同构子图的总频率之比,其计算公式如下:
 +
 +
<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>
 +
    
In addition, three specific concepts of sub-graph frequency have been proposed.<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> As the figure illustrates, the first frequency concept {{math|F<sub>1</sub>}} considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept {{math|F<sub>2</sub>}} is defined as the maximum number of edge-disjoint instances of a given graph in original network. And finally, the frequency concept {{math|F<sub>3</sub>}} entails matches with disjoint edges and nodes. Therefore, the two concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}} restrict the usage of elements of the graph, and as can be inferred, the frequency of a sub-graph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate sub-graphs if we insist on frequency concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}}.
 
In addition, three specific concepts of sub-graph frequency have been proposed.<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> As the figure illustrates, the first frequency concept {{math|F<sub>1</sub>}} considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept {{math|F<sub>2</sub>}} is defined as the maximum number of edge-disjoint instances of a given graph in original network. And finally, the frequency concept {{math|F<sub>3</sub>}} entails matches with disjoint edges and nodes. Therefore, the two concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}} restrict the usage of elements of the graph, and as can be inferred, the frequency of a sub-graph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate sub-graphs if we insist on frequency concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}}.
 +
 +
此外,他们还提出了子图频率的三个具体概念。<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> 如图所示,第一频率概念 {{math|F<sub>1</sub>}}考虑原始网络中图的所有匹配,这与我们前面介绍过的类似。第二个概念{{math|F<sub>2</sub>}}定义为原始网络中给定图的最大不相交边的数量。最后,频率概念{{math|F<sub>3</sub>}}包含与不相交边(disjoint edges)和节点的匹配。因此,两个概念F2和F3限制了图元素的使用,并且可以看出,通过对网络元素的使用施加限制,子图的频率下降。因此,如果我们坚持使用频率概念{{math|F<sub>2</sub>}}和{{math|F<sub>3</sub>}},网络模体检测算法将可以筛选出更多的候选子图。
    
==History==
 
==History==
377

个编辑