添加90,777字节
、 2020年5月3日 (日) 11:19
{{Network Science}}
All networks, including [[biological network]]s, social networks, technological networks (e.g., computer networks and electrical circuits) and more, can be represented as [[complex network|graphs]], which include a wide variety of subgraphs. One important local property of networks are so-called '''network motifs''', which are defined as recurrent and [[statistically significant]] sub-graphs or patterns.
Network motifs are sub-graphs that repeat themselves in a specific network or even among various networks. Each of these sub-graphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks.<ref name="mas1">{{cite journal |vauthors=Masoudi-Nejad A, Schreiber F, Razaghi MK Z |title=Building Blocks of Biological Networks: A Review on Major Network Motif Discovery Algorithms |journal=IET Systems Biology |volume=6 |issue=5 |pages=164–74 |year=2012|doi=10.1049/iet-syb.2011.0011 |pmid=23101871 }}</ref> Although network motifs may provide a deep insight into the network's functional abilities, their detection is computationally challenging.
==Definition==
Let {{math|G {{=}} (V, E)}} and {{math|G′ {{=}} (V′, E′)}} be two graphs. Graph {{math|G′}} is a ''sub-graph'' of graph {{math|G}} (written as {{math|G′ ⊆ G}}) if {{math|V′ ⊆ V}} and {{math|E′ ⊆ E ∩ (V′ × V′)}}. If {{math|G′ ⊆ G}} and {{math|G′}} contains all of the edges {{math|⟨u, v⟩ ∈ E}} with {{math|u, v ∈ V′}}, then {{math|G′}} is an ''induced sub-graph'' of {{math|G}}. We call {{math|G′}} and {{math|G}} isomorphic (written as {{math|G′ ↔ G}}), if there exists a bijection (one-to-one) {{math|f:V′ → V}} with {{math|⟨u, v⟩ ∈ E′ ⇔ ⟨f(u), f(v)⟩ ∈ E}} for all {{math|u, v ∈ V′}}. The mapping {{math|f}} is called an isomorphism between {{math|G}} and {{math|G′}}.<ref name="die1">{{cite journal |author=Diestel R |title=Graph Theory (Graduate Texts in Mathematics) |volume=173 |year=2005|publisher=New York: Springer-Verlag Heidelberg}}</ref>
When {{math|G″ ⊂ G}} and there exists an isomorphism between the sub-graph {{math|G″}} and a graph {{math|G′}}, this mapping represents an ''appearance'' of {{math|G′}} in {{math|G}}. The number of appearances of graph {{math|G′}} in {{math|G}} is called the frequency {{math|F<sub>G</sub>}} of {{math|G′}} in {{math|G}}. A graph is called ''recurrent'' (or ''frequent'') in {{math|G}}, when its ''frequency'' {{math|F<sub>G</sub>(G′)}} is above a predefined threshold or cut-off value. We use terms ''pattern'' and ''frequent sub-graph'' in this review interchangeably. There is an [[Statistical ensemble (mathematical physics)|ensemble]] {{math|Ω(G)}} of random graphs corresponding to the [[Null model|null-model]] associated to {{math|G}}. We should choose {{math|N}} random graphs uniformly from {{math|Ω(G)}} and calculate the frequency for a particular frequent sub-graph {{math|G′}} in {{math|G}}. If the frequency of {{math|G′}} in {{math|G}} is higher than its arithmetic mean frequency in {{math|N}} random graphs {{math|R<sub>i</sub>}}, where {{math|1 ≤ i ≤ N}}, we call this recurrent pattern ''significant'' and hence treat {{math|G′}} as a ''network motif'' for {{math|G}}. For a small graph {{math|G′}}, the network {{math|G}} and a set of randomized networks {{math|R(G) ⊆ Ω(R)}}, where {{math|1=R(G) {{=}} N}}, the ''Z-Score'' that has been defined by the following formula:
<math>Z(G^\prime) = \frac{F_G(G^\prime) - \mu_R(G^\prime)}{\sigma_R(G^\prime)}</math>
where {{math|μ<sub>R</sub>(G′)}} and {{math|σ<sub>R</sub>(G′)}} stand for mean and standard deviation frequency in set {{math|R(G)}}, respectively.<ref name="mil1">{{cite journal |vauthors=Milo R, Shen-Orr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |volume=298 |year=2002|journal=Science|issue=5594|pages=824–827 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="alb1">{{cite journal |vauthors=Albert R, Barabási AL |title=Statistical mechanics of complex networks |journal=Reviews of Modern Physics |volume=74 |issue=1 |year=2002 |pages=47–49 |doi=10.1103/RevModPhys.74.47 |bibcode=2002RvMP...74...47A|arxiv=cond-mat/0106096 |citeseerx=10.1.1.242.4753 }}</ref><ref name="mil2">{{cite journal |vauthors=Milo R, Itzkovitz S, Kashtan N, Levitt R, Shen-Orr S, Ayzenshtat I, Sheffer M, Alon U |title=Superfamilies of designed and evolved networks |journal=Science |volume=303 |issue=5663 |year=2004 |pages=1538–1542 |pmid=15001784 |doi=10.1126/science.1089167 |bibcode=2004Sci...303.1538M |url=https://semanticscholar.org/paper/87ad24efd5329046d37cd704f970323e878710ca }}</ref><ref name="sch1">{{cite encyclopedia|last=Schwöbbermeyer |first=H |title=Network Motifs |encyclopedia=Analysis of Biological Networks |editor=Junker BH, Schreiber F |publisher=Hoboken, New Jersey: John Wiley & Sons |year=2008 |pages=85–108}}</ref><ref name="bor1">{{cite encyclopedia |last1=Bornholdt |first1=S |last2=Schuster |first2=HG |title=Handbook of graphs and networks : from the genome to the Internet |journal=Handbook of Graphs and Networks: From the Genome to the Internet |pages=417 |year=2003|bibcode=2003hgnf.book.....B }}</ref><ref name="cir1">{{cite journal |vauthors=Ciriello G, Guerra C |title=A review on models and algorithms for motif discovery in protein-protein interaction networks |journal=Briefings in Functional Genomics and Proteomics |year=2008 |volume=7 |issue=2 |pages=147–156 |doi=10.1093/bfgp/eln015|pmid=18443014 |doi-access=free }}</ref> The larger the {{math|Z(G′)}}, the more significant is the sub-graph {{math|G′}} as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the P-Value, given as the probability of {{math|F<sub>R</sub>(G′) ≥ F<sub>G</sub>(G′)}} (as its null-hypothesis), where {{math|F<sub>R</sub>(G′)}} indicates the frequency of G' in a randomized network.<ref name="sch1" /> A sub-graph with P-value less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern. The P-value is defined as
<math>P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime)</math>
[[File:Different occurrences of a sub-graph in a graph.jpg|thumb|''Different occurrences of a sub-graph in a graph''. (M1 – M4) are different occurrences of sub-graph (b) in graph (a). For frequency concept {{math|F<sub>1</sub>}}, the set M1, M2, M3, M4 represent all matches, so {{math|F<sub>1</sub> {{=}} 4}}. For {{math|F<sub>2</sub>}}, one of the two set M1, M4 or M2, M3 are possible matches, {{math|F<sub>2</sub> {{=}} 2}}. Finally, for frequency concept {{math|F<sub>3</sub>}}, merely one of the matches (M1 to M4) is allowed, therefore {{math|F<sub>3</sub> {{=}} 1}}. The frequency of these three frequency concepts decrease as the usage of network elements are restricted.]]
Where {{math|N}} indicates number of randomized networks, {{math|i}} is defined over an ensemble of randomized networks and the Kronecker delta function {{math|δ(c(i))}} is one if the condition {{math|c(i)}} holds. The concentration <ref name="kas1">{{cite journal |vauthors=Kashtan N, Itzkovitz S, Milo R, Alon U |title=Efficient sampling algorithm for estimating sub-graph concentrations and detecting network motifs |journal=Bioinformatics |year=2004 |volume=20 |issue=11 |pages=1746–1758 |doi=10.1093/bioinformatics/bth163|pmid=15001476 |doi-access=free }}</ref><ref name="wer1">{{cite journal |author=Wernicke S |title=Efficient detection of network motifs |journal=IEEE/ACM Transactions on Computational Biology and Bioinformatics |year=2006 |volume=3 |issue=4 |pages=347–359 |doi=10.1109/tcbb.2006.51|pmid=17085844 |citeseerx=10.1.1.304.2576 }}</ref> of a particular n-size sub-graph {{math|G′}} in network {{math|G}} refers to the ratio of the sub-graph appearance in the network to the total ''n''-size non-isomorphic sub-graphs’ frequencies, which is formulated by
<math>C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)}</math>
where index {{math|i}} is defined over the set of all non-isomorphic n-size graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard ''et al.'' in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.<ref name="pic1">{{cite journal |vauthors=Picard F, Daudin JJ, Schbath S, Robin S |title=Assessing the Exceptionality of Network Motifs |journal=J. Comp. Bio. |year=2005 |volume=15 |issue=1 |pages=1–20|doi=10.1089/cmb.2007.0137 |pmid=18257674 |citeseerx=10.1.1.475.4300 }}</ref>
In addition, three specific concepts of sub-graph frequency have been proposed.<ref name="schr1">{{cite book |vauthors=Schreiber F, Schwöbbermeyer H |title=Frequency concepts and pattern detection for the analysis of motifs in networks |journal=Transactions on Computational Systems Biology III |volume=3737 |year=2005 |pages=89–104|doi=10.1007/11599128_7 |citeseerx=10.1.1.73.1130 |series=Lecture Notes in Computer Science |isbn=978-3-540-30883-6 }}</ref> As the figure illustrates, the first frequency concept {{math|F<sub>1</sub>}} considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept {{math|F<sub>2</sub>}} is defined as the maximum number of edge-disjoint instances of a given graph in original network. And finally, the frequency concept {{math|F<sub>3</sub>}} entails matches with disjoint edges and nodes. Therefore, the two concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}} restrict the usage of elements of the graph, and as can be inferred, the frequency of a sub-graph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate sub-graphs if we insist on frequency concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}}.
==History==
The study of network motifs was pioneered by Holland and Leinhardt<ref>Holland, P. W., & Leinhardt, S. (1974). The statistical analysis of local structure in social networks. Working Paper No. 44, National Bureau of Economic Research.</ref><ref>Hollandi, P., & Leinhardt, S. (1975). The Statistical Analysis of Local. Structure in Social Networks. Sociological Methodology, David Heise, ed. San Francisco: Josey-Bass.</ref><ref> Holland, P. W., & Leinhardt, S. (1976). Local structure in social networks. Sociological methodology, 7, 1-45.</ref><ref>Holland, P. W., & Leinhardt, S. (1977). A method for detecting structure in sociometric data. In Social Networks (pp. 411-432). Academic Press.</ref> who introduced the concept of a triad census of networks. They introduced methods to enumerate various types of subgraph configurations, and test whether the subgraph counts are statistically different from those expected in random networks.
This idea was further generalized in 2002 by [[Uri Alon]] and his group <ref name="she1">{{cite journal |vauthors=Shen-Orr SS, Milo R, Mangan S, Alon U |title=Network motifs in the transcriptional regulation network of ''Escherichia coli'' |journal=Nat. Genet. |volume=31 |issue=1 |pages=64–8 |date=May 2002 |pmid=11967538 |doi=10.1038/ng881}}</ref> when network motifs were discovered in the gene regulation (transcription) network of the bacteria ''[[Escherichia coli|E. coli]]'' and then in a large set of natural networks. Since then, a considerable number of studies have been conducted on the subject. Some of these studies focus on the biological applications, while others focus on the computational theory of network motifs.
The biological studies endeavor to interpret the motifs detected for biological networks. For example, in work following,<ref name="she1" /> the network motifs found in ''[[Escherichia coli|E. coli]]'' were discovered in the transcription networks of other bacteria<ref name="eic1">{{cite journal |vauthors=Eichenberger P, Fujita M, Jensen ST, etal |title=The program of gene transcription for a single differentiating cell type during sporulation in ''Bacillus subtilis'' |journal=PLOS Biology |volume=2 |issue=10 |pages=e328 |date=October 2004 |pmid=15383836 |pmc=517825 |doi=10.1371/journal.pbio.0020328 }} {{open access}}</ref> as well as yeast<ref name="mil3">{{cite journal |vauthors=Milo R, Shen-Orr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U |title=Network motifs: simple building blocks of complex networks |journal=Science |volume=298 |issue=5594 |pages=824–7 |date=October 2002 |doi=10.1126/science.298.5594.824 |pmid=12399590|bibcode=2002Sci...298..824M |citeseerx=10.1.1.225.8750 }}</ref><ref name="lee1">{{cite journal |vauthors=Lee TI, Rinaldi NJ, Robert F, etal |title=Transcriptional regulatory networks in Saccharomyces cerevisiae |journal=Science |volume=298 |issue=5594 |pages=799–804 |date=October 2002 |pmid=12399584 |doi=10.1126/science.1075090 |bibcode=2002Sci...298..799L }}</ref> and higher organisms.<ref name="odo1">{{cite journal |vauthors=Odom DT, Zizlsperger N, Gordon DB, etal |title=Control of pancreas and liver gene expression by HNF transcription factors |journal=Science |volume=303 |issue=5662 |pages=1378–81 |date=February 2004 |pmid=14988562 |pmc=3012624 |doi=10.1126/science.1089769 |bibcode=2004Sci...303.1378O }}</ref><ref name="boy1">{{cite journal |vauthors=Boyer LA, Lee TI, Cole MF, etal |title=Core transcriptional regulatory circuitry in human embryonic stem cells |journal=Cell |volume=122 |issue=6 |pages=947–56 |date=September 2005 |pmid=16153702 |pmc=3006442 |doi=10.1016/j.cell.2005.08.020 }}</ref><ref name="ira1">{{cite journal |vauthors=Iranfar N, Fuller D, Loomis WF |title=Transcriptional regulation of post-aggregation genes in Dictyostelium by a feed-forward loop involving GBF and LagC |journal=Dev. Biol. |volume=290 |issue=2 |pages=460–9 |date=February 2006 |pmid=16386729 |doi=10.1016/j.ydbio.2005.11.035 |doi-access=free }}</ref> A distinct set of network motifs were identified in other types of biological networks such as neuronal networks and protein interaction networks.<ref name="mil2" /><ref name="maa1">{{cite journal |vauthors=Ma'ayan A, Jenkins SL, Neves S, etal |title=Formation of regulatory patterns during signal propagation in a Mammalian cellular network |journal=Science |volume=309 |issue=5737 |pages=1078–83 |date=August 2005 |pmid=16099987 |pmc=3032439 |doi=10.1126/science.1108876 |bibcode=2005Sci...309.1078M }}</ref><ref name="pta1">{{cite journal |vauthors=Ptacek J, Devgan G, Michaud G, etal |title=Global analysis of protein phosphorylation in yeast |journal=Nature |volume=438 |issue=7068 |pages=679–84 |date=December 2005 |pmid=16319894 |doi=10.1038/nature04187|bibcode=2005Natur.438..679P |url=https://authors.library.caltech.edu/56271/2/Tables.pdf |type=Submitted manuscript }}</ref>
The computational research has focused on improving existing motif detection tools to assist the biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section in chronological order.
Most recently, the acc-MOTIF tool to detect network motifs was released.<ref>{{Cite web | url=http://www.ft.unicamp.br/docentes/meira/accmotifs/ |title = Acc-Motif: Accelerated Motif Detection}}</ref>
==Motif discovery algorithms==
Various solutions have been proposed for the challenging problem of motif discovery. These algorithms can be classified under various paradigms such as exact counting methods, sampling methods, pattern growth methods and so on. However, motif discovery problem comprises two main steps: first, calculating the number of occurrences of a sub-graph and then, evaluating the sub-graph significance. The recurrence is significant if it is detectably far more than expected. Roughly speaking, the expected number of appearances of a sub-graph can be determined by a Null-model, which is defined by an ensemble of random networks with some of the same properties as the original network.
Here, a review on computational aspects of major algorithms is given and their related benefits and drawbacks from an algorithmic perspective are discussed.
===mfinder===
''mfinder'', the first motif-mining tool, implements two kinds of motif finding algorithms: a full enumeration and a sampling method. Until 2004, the only exact counting method for NM (network motif) detection was the brute-force one proposed by Milo ''et al.''.<ref name="mil1" /> This algorithm was successful for discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible. Hence, a new approach to this problem was needed.
Kashtan ''et al.'' <ref name="kas1" /> presented the first sampling NM discovery algorithm, which was based on ''edge sampling'' throughout the network. This algorithm estimates concentrations of induced sub-graphs and can be utilized for motif discovery in directed or undirected networks. The sampling procedure of the algorithm starts from an arbitrary edge of the network that leads to a sub-graph of size two, and then expands the sub-graph by choosing a random edge that is incident to the current sub-graph. After that, it continues choosing random neighboring edges until a sub-graph of size n is obtained. Finally, the sampled sub-graph is expanded to include all of the edges that exist in the network between these n nodes. When an algorithm uses a sampling approach, taking unbiased samples is the most important issue that the algorithm might address. The sampling procedure, however, does not take samples uniformly and therefore Kashtan ''et al.'' proposed a weighting scheme that assigns different weights to the different sub-graphs within network.<ref name="kas1" /> The underlying principle of weight allocation is exploiting the information of the [[sampling probability]] for each sub-graph, i.e. the probable sub-graphs will obtain comparatively less weights in comparison to the improbable sub-graphs; hence, the algorithm must calculate the sampling probability of each sub-graph that has been sampled. This weighting technique assists ''mfinder'' to determine sub-graph concentrations impartially.
In expanded to include sharp contrast to exhaustive search, the computational time of the algorithm surprisingly is asymptotically independent of the network size. An analysis of the computational time of the algorithm has shown that it takes {{math|O(n<sup>n</sup>)}} for each sample of a sub-graph of size {{math|n}} from the network. On the other hand, there is no analysis in <ref name="kas1" /> on the classification time of sampled sub-graphs that requires solving the ''graph isomorphism'' problem for each sub-graph sample. Additionally, an extra computational effort is imposed on the algorithm by the sub-graph weight calculation. But it is unavoidable to say that the algorithm may sample the same sub-graph multiple times – spending time without gathering any information.<ref name="wer1" /> In conclusion, by taking the advantages of sampling, the algorithm performs more efficiently than an exhaustive search algorithm; however, it only determines sub-graphs concentrations approximately. This algorithm can find motifs up to size 6 because of its main implementation, and as result it gives the most significant motif, not all the others too. Also, it is necessary to mention that this tool has no option of visual presentation. The sampling algorithm is shown briefly:
{| class="wikitable"
|-
! mfinder
|-
| '''Definitions:''' {{math|E<sub>s</sub>}}is the set of picked edges. {{math|V<sub>s</sub>}} is the set of all nodes that are touched by the edges in {{math|E}}.
|-
| Init {{math|V<sub>s</sub>}} and {{math|E<sub>s</sub>}} to be empty sets.
1. Pick a random edge {{math|e<sub>1</sub> {{=}} (v<sub>i</sub>, v<sub>j</sub>)}}. Update {{math|E<sub>s</sub> {{=}} {e<sub>1</sub>}}}, {{math|V<sub>s</sub> {{=}} {v<sub>i</sub>, v<sub>j</sub>}}}
2. Make a list {{math|L}} of all neighbor edges of {{math|E<sub>s</sub>}}. Omit from {{math|L}} all edges between members of {{math|V<sub>s</sub>}}.
3. Pick a random edge {{math|e {{=}} {v<sub>k</sub>,v<sub>l</sub>}}} from {{math|L}}. Update {{math|E<sub>s</sub> {{=}} E<sub>s</sub> ⋃ {e}}}, {{math|V<sub>s</sub> {{=}} V<sub>s</sub> ⋃ {v<sub>k</sub>, v<sub>l</sub>}}}.
4. Repeat steps 2-3 until completing an ''n''-node subgraph (until {{math|{{!}}V<sub>s</sub>{{!}} {{=}} n}}).
5. Calculate the probability to sample the picked ''n''-node subgraph.
|}
===FPF (Mavisto)===
Schreiber and Schwöbbermeyer <ref name="schr1" /> proposed an algorithm named ''flexible pattern finder (FPF)'' for extracting frequent sub-graphs of an input network and implemented it in a system named ''Mavisto''.<ref name="schr2">{{cite journal |vauthors=Schreiber F, Schwobbermeyer H |title=MAVisto: a tool for the exploration of network motifs |journal=Bioinformatics |volume=21 |issue=17|pages=3572–3574 |year=2005 |doi=10.1093/bioinformatics/bti556|pmid=16020473 |doi-access=free }}</ref> Their algorithm exploits the ''downward closure'' property which is applicable for frequency concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}}. The downward closure property asserts that the frequency for sub-graphs decrease monotonically by increasing the size of sub-graphs; however, this property does not hold necessarily for frequency concept {{math|F<sub>1</sub>}}. FPF is based on a ''pattern tree'' (see figure) consisting of nodes that represents different graphs (or patterns), where the parent of each node is a sub-graph of its children nodes; in other words, the corresponding graph of each pattern tree's node is expanded by adding a new edge to the graph of its parent node.
[[File:The pattern tree in FPF algorithm.jpg|thumb|''Illustration of the pattern tree in FPF algorithm''.<ref name="schr1" />]]
At first, the FPF algorithm enumerates and maintains the information of all matches of a sub-graph located at the root of the pattern tree. Then, one-by-one it builds child nodes of the previous node in the pattern tree by adding one edge supported by a matching edge in the target graph, and tries to expand all of the previous information about matches to the new sub-graph (child node). In next step, it decides whether the frequency of the current pattern is lower than a predefined threshold or not. If it is lower and if downward closure holds, FPF can abandon that path and not traverse further in this part of the tree; as a result, unnecessary computation is avoided. This procedure is continued until there is no remaining path to traverse.
The advantage of the algorithm is that it does not consider infrequent sub-graphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. However, FPF is most useful for frequency concepts {{math|F<sub>2</sub>}} and {{math|F<sub>3</sub>}}, because downward closure is not applicable to {{math|F<sub>1</sub>}}. Nevertheless, the pattern tree is still practical for {{math|F<sub>1</sub>}} if the algorithm runs in parallel. Another advantage of the algorithm is that the implementation of this algorithm has no limitation on motif size, which makes it more amenable to improvements. The pseudocode of FPF (Mavisto) is shown below:
{| class="wikitable"
|-
! Mavisto
|-
| '''Data:''' Graph {{math|G}}, target pattern size {{math|t}}, frequency concept {{math|F}}
'''Result:''' Set {{math|R}} of patterns of size {{math|t}} with maximum frequency.
|-
| {{math|R ← φ}}, {{math|f<sub>max</sub> ← 0}}
{{math|P ←}}start pattern {{math|p1}} of size 1
{{math|M<sub>p<sub>1</sub></sub> ←}}all matches of {{math|p<sub>1</sub>}} in {{math|G}}
'''While''' {{math|P ≠ φ}} '''do'''
{{pad|1em}}{{math|P<sub>max</sub> ←}}select all patterns from {{math|P}} with maximum size.
{{pad|1em}}{{math|P ←}} select pattern with maximum frequency from {{math|P<sub>max</sub>}}
{{pad|1em}}{{math|Ε {{=}} ''ExtensionLoop''(G, p, M<sub>p</sub>)}}
{{pad|1em}}'''Foreach''' pattern {{math|p ∈ E}}
{{pad|2em}}'''If''' {{math|F {{=}} F<sub>1</sub>}} '''then''' {{math|f ← ''size''(M<sub>p</sub>)}}
{{pad|2em}}'''Else''' {{math|f ←}} ''Maximum Independent set''{{math|(F, M<sub>p</sub>)}}
{{pad|2em}}'''End'''
{{pad|2em}}'''If''' {{math|''size''(p) {{=}} t}} '''then'''
{{pad|3em}}'''If''' {{math|f {{=}} f<sub>max</sub>}} '''then''' {{math|R ← R ⋃ {p}}}
{{pad|3em}}'''Else if''' {{math|f > f<sub>max</sub>}} '''then''' {{math|R ← {p}}}; {{math|f<sub>max</sub> ← f}}
{{pad|3em}}'''End'''
{{pad|2em}}'''Else'''
{{pad|3em}}'''If''' {{math|F {{=}} F<sub>1</sub>}} '''or''' {{math|f ≥ f<sub>max</sub>}} '''then''' {{math|P ← P ⋃ {p}}}
{{pad|3em}}'''End'''
{{pad|2em}}'''End'''
{{pad|1em}}'''End'''
'''End'''
|}
===ESU (FANMOD)===
The sampling bias of Kashtan ''et al.'' <ref name="kas1" /> provided great impetus for designing better algorithms for the NM discovery problem. Although Kashtan ''et al.'' tried to settle this drawback by means of a weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated implementation. This tool is one of the most useful ones, as it supports visual options and also is an efficient algorithm with respect to time. But, it has a limitation on motif size as it does not allow searching for motifs of size 9 or higher because of the way the tool is implemented.
Wernicke <ref name="wer1" /> introduced an algorithm named ''RAND-ESU'' that provides a significant improvement over ''mfinder''.<ref name="kas1" /> This algorithm, which is based on the exact enumeration algorithm ''ESU'', has been implemented as an application called ''FANMOD''.<ref name="wer1" /> ''RAND-ESU'' is a NM discovery algorithm applicable for both directed and undirected networks, effectively exploits an unbiased node sampling throughout the network, and prevents overcounting sub-graphs more than once. Furthermore, ''RAND-ESU'' uses a novel analytical approach called ''DIRECT'' for determining sub-graph significance instead of using an ensemble of random networks as a Null-model. The ''DIRECT'' method estimates the sub-graph concentration without explicitly generating random networks.<ref name="wer1" /> Empirically, the DIRECT method is more efficient in comparison with the random network ensemble in case of sub-graphs with a very low concentration; however, the classical Null-model is faster than the ''DIRECT'' method for highly concentrated sub-graphs.<ref name="mil1" /><ref name="wer1" /> In the following, we detail the ''ESU'' algorithm and then we show how this exact algorithm can be modified efficiently to ''RAND-ESU'' that estimates sub-graphs concentrations.
The algorithms ''ESU'' and ''RAND-ESU'' are fairly simple, and hence easy to implement. ''ESU'' first finds the set of all induced sub-graphs of size {{math|k}}, let {{math|S<sub>k</sub>}} be this set. ''ESU'' can be implemented as a recursive function; the running of this function can be displayed as a tree-like structure of depth {{math|k}}, called the ESU-Tree (see figure). Each of the ESU-Tree nodes indicate the status of the recursive function that entails two consecutive sets SUB and EXT. SUB refers to nodes in the target network that are adjacent and establish a partial sub-graph of size {{math|{{!}}SUB{{!}} ≤ k}}. If {{math|{{!}}SUB{{!}} {{=}} k}}, the algorithm has found an induced complete sub-graph, so {{math|S<sub>k</sub> {{=}} SUB ∪ S<sub>k</sub>}}. However, if {{math|{{!}}SUB{{!}} < k}}, the algorithm must expand SUB to achieve cardinality {{math|k}}. This is done by the EXT set that contains all the nodes that satisfy two conditions: First, each of the nodes in EXT must be adjacent to at least one of the nodes in SUB; second, their numerical labels must be larger than the label of first element in SUB. The first condition makes sure that the expansion of SUB nodes yields a connected graph and the second condition causes ESU-Tree leaves (see figure) to be distinct; as a result, it prevents overcounting. Note that, the EXT set is not a static set, so in each step it may expand by some new nodes that do not breach the two conditions. The next step of ESU involves classification of sub-graphs placed in the ESU-Tree leaves into non-isomorphic size-{{math|k}} graph classes; consequently, ESU determines sub-graphs frequencies and concentrations. This stage has been implemented simply by employing McKay's ''nauty'' algorithm,<ref name="mck1">{{cite journal |author=McKay BD |title=Practical graph isomorphism |journal=Congressus Numerantium |year=1981 |volume=30 |pages=45–87|bibcode=2013arXiv1301.1493M |arxiv=1301.1493 }}</ref><ref name="mck2">{{cite journal |author=McKay BD |title=Isomorph-free exhaustive generation |journal=Journal of Algorithms |year=1998 |volume=26 |issue=2 |pages=306–324 |doi=10.1006/jagm.1997.0898}}</ref> which classifies each sub-graph by performing a graph isomorphism test. Therefore, ESU finds the set of all induced {{math|k}}-size sub-graphs in a target graph by a recursive algorithm and then determines their frequency using an efficient tool.
The procedure of implementing ''RAND-ESU'' is quite straightforward and is one of the main advantages of ''FANMOD''. One can change the ''ESU'' algorithm to explore just a portion of the ESU-Tree leaves by applying a probability value {{math|0 ≤ p<sub>d</sub> ≤ 1}} for each level of the ESU-Tree and oblige ''ESU'' to traverse each child node of a node in level {{math|d-1}} with probability {{math|p<sub>d</sub>}}. This new algorithm is called ''RAND-ESU''. Evidently, when {{math|p<sub>d</sub> {{=}} 1}} for all levels, ''RAND-ESU'' acts like ''ESU''. For {{math|p<sub>d</sub> {{=}} 0}} the algorithm finds nothing. Note that, this procedure ensures that the chances of visiting each leaf of the ESU-Tree are the same, resulting in ''unbiased'' sampling of sub-graphs through the network. The probability of visiting each leaf is {{math|∏<sub>d</sub>p<sub>d</sub>}} and this is identical for all of the ESU-Tree leaves; therefore, this method guarantees unbiased sampling of sub-graphs from the network. Nonetheless, determining the value of {{math|p<sub>d</sub>}} for {{math|1 ≤ d ≤ k}} is another issue that must be determined manually by an expert to get precise results of sub-graph concentrations.<ref name="cir1" /> While there is no lucid prescript for this matter, the Wernicke provides some general observations that may help in determining p_d values. In summary, ''RAND-ESU'' is a very fast algorithm for NM discovery in the case of induced sub-graphs supporting unbiased sampling method. Although, the main ''ESU'' algorithm and so the ''FANMOD'' tool is known for discovering induced sub-graphs, there is trivial modification to ''ESU'' which makes it possible for finding non-induced sub-graphs, too. The pseudo code of ''ESU (FANMOD)'' is shown below:
[[File:ESU-Tree.jpg|thumb|(a) ''A target graph of size 5'', (b) ''the ESU-tree of depth k that is associated to the extraction of sub-graphs of size 3 in the target graph''. Leaves correspond to set S3 or all of the size-3 induced sub-graphs of the target graph (a). Nodes in the ESU-tree include two adjoining sets, the first set contains adjacent nodes called SUB and the second set named EXT holds all nodes that are adjacent to at least one of the SUB nodes and where their numerical labels are larger than the SUB nodes labels. The EXT set is utilized by the algorithm to expand a SUB set until it reaches a desired sub-graph size that are placed at the lowest level of ESU-Tree (or its leaves).]]
{| class="wikitable"
|-
! Enumeration of ESU (FANMOD)
|-
|'''''EnumerateSubgraphs(G,k)'''''
'''Input:''' A graph {{math|G {{=}} (V, E)}} and an integer {{math|1 ≤ k ≤ {{!}}V{{!}}}}.
'''Output:''' All size-{{math|k}} subgraphs in {{math|G}}.
'''for each''' vertex {{math|v ∈ V}} '''do'''
{{pad|2em}}{{math|VExtension ← {u ∈ N({v}) {{!}} u > v}}}
{{pad|2em}}'''call''' {{math|''ExtendSubgraph''({v}, VExtension, v)}}
'''endfor'''
|-
|'''''ExtendSubgraph(VSubgraph, VExtension, v)'''''
'''if''' {{math|{{!}}VSubgraph{{!}} {{=}} k}} '''then''' output {{math|G[VSubgraph]}} and '''return'''
'''while''' {{math|VExtension ≠ ∅}} '''do'''
{{pad|2em}}Remove an arbitrarily chosen vertex {{math|w}} from {{math|VExtension}}
{{pad|2em}}{{math|VExtension′ ← VExtension ∪ {u ∈ N<sub>excl</sub>(w, VSubgraph) {{!}} u > v}}}
{{pad|2em}}'''call''' {{math|''ExtendSubgraph''(VSubgraph ∪ {w}, VExtension′, v)}}
'''return'''
|}
===NeMoFinder===
Chen ''et al.'' <ref name="che1">{{cite conference |vauthors=Chen J, Hsu W, Li Lee M, etal |title=NeMoFinder: dissecting genome-wide protein-protein interactions with meso-scale network motifs |conference=the 12th ACM SIGKDD international conference on Knowledge discovery and data mining |year=2006 |location=Philadelphia, Pennsylvania, USA |pages=106–115}}</ref> introduced a new NM discovery algorithm called ''NeMoFinder'', which adapts the idea in ''SPIN'' <ref name="hua1">{{cite conference |vauthors=Huan J, Wang W, Prins J, etal |title=SPIN: mining maximal frequent sub-graphs from graph databases |conference=the 10th ACM SIGKDD international conference on Knowledge discovery and data mining |year=2004 |pages=581–586}}</ref> to extract frequent trees and after that expands them into non-isomorphic graphs.<ref name="cir1" /> ''NeMoFinder'' utilizes frequent size-n trees to partition the input network into a collection of size-{{math|n}} graphs, afterward finding frequent size-n sub-graphs by expansion of frequent trees edge-by-edge until getting a complete size-{{math|n}} graph {{math|K<sub>n</sub>}}. The algorithm finds NMs in undirected networks and is not limited to extracting only induced sub-graphs. Furthermore, ''NeMoFinder'' is an exact enumeration algorithm and is not based on a sampling method. As Chen ''et al.'' claim, ''NeMoFinder'' is applicable for detecting relatively large NMs, for instance, finding NMs up to size-12 from the whole ''S. cerevisiae'' (yeast) PPI network as the authors claimed.<ref name="uet1">{{cite journal |vauthors=Uetz P, Giot L, Cagney G, etal |title=A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae |journal=Nature |year=2000 |volume=403 |issue=6770 |pages=623–627 |doi=10.1038/35001009 |pmid=10688190|bibcode=2000Natur.403..623U }}</ref>
''NeMoFinder'' consists of three main steps. First, finding frequent size-{{math|n}} trees, then utilizing repeated size-n trees to divide the entire network into a collection of size-{{math|n}} graphs, finally, performing sub-graph join operations to find frequent size-n sub-graphs.<ref name="che1" /> In the first step, the algorithm detects all non-isomorphic size-{{math|n}} trees and mappings from a tree to the network. In the second step, the ranges of these mappings are employed to partition the network into size-n graphs. Up to this step, there is no distinction between ''NeMoFinder'' and an exact enumeration method. However, a large portion of non-isomorphic size-n graphs still remain. ''NeMoFinder'' exploits a heuristic to enumerate non-tree size-n graphs by the obtained information from the preceding steps. The main advantage of the algorithm is in the third step, which generates candidate sub-graphs from previously enumerated sub-graphs. This generation of new size-{{math|n}} sub-graphs is done by joining each previous sub-graph with derivative sub-graphs from itself called ''cousin sub-graphs''. These new sub-graphs contain one additional edge in comparison to the previous sub-graphs. However, there exist some problems in generating new sub-graphs: There is no clear method to derive cousins from a graph, joining a sub-graph with its cousins leads to redundancy in generating particular sub-graph more than once, and cousin determination is done by a canonical representation of the adjacency matrix which is not closed under join operation. ''NeMoFinder'' is an efficient network motif finding algorithm for motifs up to size 12 only for protein-protein interaction networks, which are presented as undirected graphs. And it is not able to work on directed networks which are so important in the field of complex and biological networks. The pseudocode of ''NeMoFinder'' is shown below:
{| class="wikitable"
|-
! NeMoFinder
|-
|'''Input:'''
{{math|G}} - PPI network;
{{math|N}} - Number of randomized networks;
{{math|K}} - Maximal network motif size;
{{math|F}} - Frequency threshold;
{{math|S}} - Uniqueness threshold;
'''Output:'''
{{math|U}} - Repeated and unique network motif set;
{{math|D ← ∅}};
'''for''' motif-size {{math|k}} '''from''' 3 '''to''' {{math|K}} '''do'''
{{pad|1em}}{{math|T ← ''FindRepeatedTrees''(k)}};
{{pad|1em}}{{math|GD<sub>k</sub> ← ''GraphPartition''(G, T)}}
{{pad|1em}}{{math|D ← D ∪ T}};
{{pad|1em}}{{math|D′ ← T}};
{{pad|1em}}{{math|i ← k}};
{{pad|1em}}'''while''' {{math|D′ ≠ ∅}} '''and''' {{math|i ≤ k × (k - 1) / 2}} '''do'''
{{pad|2em}}{{math|D′ ← ''FindRepeatedGraphs''(k, i, D′)}};
{{pad|2em}}{{math|D ← D ∪ D′}};
{{pad|2em}}{{math|i ← i + 1}};
{{pad|1em}}'''end while'''
'''end for'''
'''for''' counter {{math|i}} '''from''' 1 '''to''' {{math|N}} '''do'''
{{pad|1em}}{{math|G<sub>rand</sub> ← ''RandomizedNetworkGeneration''()}};
{{pad|1em}}'''for each''' {{math|g ∈ D}} '''do'''
{{pad|2em}}{{math|''GetRandFrequency''(g, G<sub>rand</sub>)}};
{{pad|1em}}'''end for'''
'''end for'''
{{math|U ← ∅}};
'''for each''' {{math|g ∈ D}} '''do'''
{{pad|1em}}{{math|s ← ''GetUniqunessValue''(g)}};
{{pad|1em}}'''if''' {{math|s ≥ S}} '''then'''
{{pad|2em}}{{math|U ← U ∪ {g}}};
{{pad|1em}}'''end if'''
'''end for'''
'''return''' {{math|U}};
|}
===Grochow–Kellis===
Grochow and Kellis <ref name="gro1">{{cite conference|vauthors=Grochow JA, Kellis M |title=Network Motif Discovery Using Sub-graph Enumeration and Symmetry-Breaking |conference=RECOMB |year=2007 |pages=92–106| doi=10.1007/978-3-540-71681-5_7| url=http://www.cs.colorado.edu/~jgrochow/Grochow_Kellis_RECOMB_07_Network_Motifs.pdf}}</ref> proposed an ''exact'' algorithm for enumerating sub-graph appearances. The algorithm is based on a ''motif-centric'' approach, which means that the frequency of a given sub-graph,called the ''query graph'', is exhaustively determined by searching for all possible mappings from the query graph into the larger network. It is claimed <ref name="gro1" /> that a ''motif-centric'' method in comparison to ''network-centric'' methods has some beneficial features. First of all it avoids the increased complexity of sub-graph enumeration. Also, by using mapping instead of enumerating, it enables an improvement in the isomorphism test. To improve the performance of the algorithm, since it is an inefficient exact enumeration algorithm, the authors introduced a fast method which is called ''symmetry-breaking conditions''. During straightforward sub-graph isomorphism tests, a sub-graph may be mapped to the same sub-graph of the query graph multiple times. In the Grochow–Kellis (GK) algorithm symmetry-breaking is used to avoid such multiple mappings. Here we introduce the GK algorithm and the symmetry-breaking condition which eliminates redundant isomorphism tests.
[[File:Automorphisms of a subgraph.jpg|thumb|(a) ''graph G'', (b) ''illustration of all automorphisms of G that is showed in (a)''. From set AutG we can obtain a set of symmetry-breaking conditions of G given by SymG in (c). Only the first mapping in AutG satisfies the SynG conditions; as a result, by applying SymG in the Isomorphism Extension module the algorithm only enumerate each match-able sub-graph in the network to G once. Note that SynG is not necessarily a unique set for an arbitrary graph G.]]
The GK algorithm discovers the whole set of mappings of a given query graph to the network in two major steps. It starts with the computation of symmetry-breaking conditions of the query graph. Next, by means of a branch-and-bound method, the algorithm tries to find every possible mapping from the query graph to the network that meets the associated symmetry-breaking conditions. An example of the usage of symmetry-breaking conditions in GK algorithm is demonstrated in figure.
As it is mentioned above, the symmetry-breaking technique is a simple mechanism that precludes spending time finding a sub-graph more than once due to its symmetries.<ref name="gro1" /><ref name="gro2">{{cite conference|author=Grochow JA |title=On the structure and evolution of protein interaction networks |conference=Thesis M. Eng., Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science|year=2006| url=http://www.cs.toronto.edu/~jgrochow/Grochow_MIT_Masters_06_PPI_Networks.pdf}}</ref> Note that, computing symmetry-breaking conditions requires finding all automorphisms of a given query graph. Even though, there is no efficient (or polynomial time) algorithm for the graph automorphism problem, this problem can be tackled efficiently in practice by McKay's tools.<ref name="mck1" /><ref name="mck2" /> As it is claimed, using symmetry-breaking conditions in NM detection lead to save a great deal of running time. Moreover, it can be inferred from the results in <ref name="gro1" /><ref name="gro2" /> that using the symmetry-breaking conditions results in high efficiency particularly for directed networks in comparison to undirected networks. The symmetry-breaking conditions used in the GK algorithm are similar to the restriction which ''ESU'' algorithm applies to the labels in EXT and SUB sets. In conclusion, the GK algorithm computes the exact number of appearance of a given query graph in a large complex network and exploiting symmetry-breaking conditions improves the algorithm performance. Also, GK algorithm is one of the known algorithms having no limitation for motif size in implementation and potentially it can find motifs of any size.
===Color-coding approach===
Most algorithms in the field of NM discovery are used to find induced sub-graphs of a network. In 2008, Noga Alon ''et al.'' <ref name="alo1">{{cite journal|author1=Alon N |author2=Dao P |author3=Hajirasouliha I |author4=Hormozdiari F |author5=Sahinalp S.C |title=Biomolecular network motif counting and discovery by color coding |journal=Bioinformatics |year=2008 |volume=24 |issue=13 |pages=i241–i249 |doi=10.1093/bioinformatics/btn163|pmid=18586721 |pmc=2718641 }}</ref> introduced an approach for finding non-induced sub-graphs too. Their technique works on undirected networks such as PPI ones. Also, it counts non-induced trees and bounded treewidth sub-graphs. This method is applied for sub-graphs of size up to 10.
This algorithm counts the number of non-induced occurrences of a tree {{math|T}} with {{math|k {{=}} O(logn)}} vertices in a network {{math|G}} with {{math|n}} vertices as follows:
# '''Color coding.''' Color each vertex of input network G independently and uniformly at random with one of the {{math|k}} colors.
# '''Counting.''' Apply a dynamic programming routine to count the number of non-induced occurrences of {{math|T}} in which each vertex has a unique color. For more details on this step, see.<ref name="alo1" />
# Repeat the above two steps {{math|O(e<sup>k</sup>)}} times and add up the number of occurrences of {{math|T}} to get an estimate on the number of its occurrences in {{math|G}}.
As available PPI networks are far from complete and error free, this approach is suitable for NM discovery for such networks. As Grochow–Kellis Algorithm and this one are the ones popular for non-induced sub-graphs, it is worth to mention that the algorithm introduced by Alon ''et al.'' is less time-consuming than the Grochow–Kellis Algorithm.<ref name="alo1" />
===MODA===
Omidi ''et al.'' <ref name="omi1">{{cite journal|vauthors=Omidi S, Schreiber F, Masoudi-Nejad A |title=MODA: an efficient algorithm for network motif discovery in biological networks |journal=Genes Genet Syst |year=2009 |volume=84 |issue=5 |pages=385–395 |doi=10.1266/ggs.84.385|pmid=20154426 |doi-access=free }}</ref> introduced a new algorithm for motif detection named ''MODA'' which is applicable for induced and non-induced NM discovery in undirected networks. It is based on the motif-centric approach discussed in the Grochow–Kellis algorithm section. It is very important to distinguish motif-centric algorithms such as MODA and GK algorithm because of their ability to work as query-finding algorithms. This feature allows such algorithms to be able to find a single motif query or a small number of motif queries (not all possible sub-graphs of a given size) with larger sizes. As the number of possible non-isomorphic sub-graphs increases exponentially with sub-graph size, for large size motifs (even larger than 10), the network-centric algorithms, those looking for all possible sub-graphs, face a problem. Although motif-centric algorithms also have problems in discovering all possible large size sub-graphs, but their ability to find small numbers of them is sometimes a significant property.
Using a hierarchical structure called an ''expansion tree'', the ''MODA'' algorithm is able to extract NMs of a given size systematically and similar to ''FPF'' that avoids enumerating unpromising sub-graphs; ''MODA'' takes into consideration potential queries (or candidate sub-graphs) that would result in frequent sub-graphs. Despite the fact that ''MODA'' resembles ''FPF'' in using a tree like structure, the expansion tree is applicable merely for computing frequency concept {{math|F<sub>1</sub>}}. As we will discuss next, the advantage of this algorithm is that it does not carry out the sub-graph isomorphism test for ''non-tree'' query graphs. Additionally, it utilizes a sampling method in order to speed up the running time of the algorithm.
Here is the main idea: by a simple criterion one can generalize a mapping of a k-size graph into the network to its same size supergraphs. For example, suppose there is mapping {{math|f(G)}} of graph {{math|G}} with {{math|k}} nodes into the network and we have a same size graph {{math|G′}} with one more edge {{math|&langu, v⟩}}; {{math|f<sub>G</sub>}} will map {{math|G′}} into the network, if there is an edge {{math|⟨f<sub>G</sub>(u), f<sub>G</sub>(v)⟩}} in the network. As a result, we can exploit the mapping set of a graph to determine the frequencies of its same order supergraphs simply in {{math|O(1)}} time without carrying out sub-graph isomorphism testing. The algorithm starts ingeniously with minimally connected query graphs of size k and finds their mappings in the network via sub-graph isomorphism. After that, with conservation of the graph size, it expands previously considered query graphs edge-by-edge and computes the frequency of these expanded graphs as mentioned above. The expansion process continues until reaching a complete graph {{math|K<sub>k</sub>}} (fully connected with {{math|{{frac|k(k-1)|2}}}} edge).
As discussed above, the algorithm starts by computing sub-tree frequencies in the network and then expands sub-trees edge by edge. One way to implement this idea is called the expansion tree {{math|T<sub>k</sub>}} for each {{math|k}}. Figure shows the expansion tree for size-4 sub-graphs. {{math|T<sub>k</sub>}} organizes the running process and provides query graphs in a hierarchical manner. Strictly speaking, the expansion tree {{math|T<sub>k</sub>}} is simply a [[directed acyclic graph]] or DAG, with its root number {{math|k}} indicating the graph size existing in the expansion tree and each of its other nodes containing the adjacency matrix of a distinct {{math|k}}-size query graph. Nodes in the first level of {{math|T<sub>k</sub>}} are all distinct {{math|k}}-size trees and by traversing {{math|T<sub>k</sub>}} in depth query graphs expand with one edge at each level. A query graph in a node is a sub-graph of the query graph in a node's child with one edge difference. The longest path in {{math|T<sub>k</sub>}} consists of {{math|(k<sup>2</sup>-3k+4)/2}} edges and is the path from the root to the leaf node holding the complete graph. Generating expansion trees can be done by a simple routine which is explained in.<ref name="omi1" />
''MODA'' traverses {{math|T<sub>k</sub>}} and when it extracts query trees from the first level of {{math|T<sub>k</sub>}} it computes their mapping sets and saves these mappings for the next step. For non-tree queries from {{math|T<sub>k</sub>}}, the algorithm extracts the mappings associated with the parent node in {{math|T<sub>k</sub>}} and determines which of these mappings can support the current query graphs. The process will continue until the algorithm gets the complete query graph. The query tree mappings are extracted using the Grochow–Kellis algorithm. For computing the frequency of non-tree query graphs, the algorithm employs a simple routine that takes {{math|O(1)}} steps. In addition, ''MODA'' exploits a sampling method where the sampling of each node in the network is linearly proportional to the node degree, the probability distribution is exactly similar to the well-known Barabási-Albert preferential attachment model in the field of complex networks.<ref name="bar1">{{cite journal|vauthors=Barabasi AL, Albert R |title=Emergence of scaling in random networks |journal=Science |year=1999 |volume=286 |issue=5439 |pages=509–512 |doi=10.1126/science.286.5439.509 |pmid=10521342|bibcode=1999Sci...286..509B |arxiv=cond-mat/9910332 }}</ref> This approach generates approximations; however, the results are almost stable in different executions since sub-graphs aggregate around highly connected nodes.<ref name="vaz1">{{cite journal |vauthors=Vázquez A, Dobrin R, Sergi D, etal |title=The topological relationship between the large-scale attributes and local interaction patterns of complex networks |journal=PNAS |year=2004 |volume=101 |issue=52 |pages=17940–17945 |doi=10.1073/pnas.0406024101|pmid=15598746 |pmc=539752 |bibcode=2004PNAS..10117940V |arxiv=cond-mat/0408431 }}</ref> The pseudocode of ''MODA'' is shown below:
[[File:Expansion Tree.jpg|thumb|''Illustration of the expansion tree T4 for 4-node query graphs''. At the first level, there are non-isomorphic k-size trees and at each level, an edge is added to the parent graph to form a child graph. In the second level, there is a graph with two alternative edges that is shown by a dashed red edge. In fact, this node represents two expanded graphs that are isomorphic.<ref name="omi1" />]]
{| class="wikitable"
|-
! MODA
|-
|'''Input:''' {{math|G}}: Input graph, {{math|k}}: sub-graph size, {{math|Δ}}: threshold value
'''Output:''' Frequent Subgraph List: List of all frequent {{math|k}}-size sub-graphs
'''Note:''' {{math|F<sub>G</sub>}}: set of mappings from {{math|G}} in the input graph {{math|G}}
'''fetch''' {{math|T<sub>k</sub>}}
'''do'''
{{pad|1em}}{{math|G′ {{=}} ''Get-Next-BFS''(T<sub>k</sub>)}} // {{math|G′}} is a query graph
{{pad|1em}}if {{math|{{!}}E(G′){{!}} {{=}} (k – 1)}}
{{pad|1em}}'''call''' {{math|''Mapping-Module''(G′, G)}}
{{pad|1em}}'''else'''
{{pad|2em}}'''call''' {{math|''Enumerating-Module''(G′, G, T<sub>k</sub>)}}
{{pad|1em}}'''end if'''
{{pad|1em}}'''save''' {{math|F<sub>2</sub>}}
{{pad|1em}}'''if''' {{math|{{!}}F<sub>G</sub>{{!}} > Δ}} '''then'''
{{pad|2em}}add {{math|G′}} into Frequent Subgraph List
{{pad|1em}}'''end if'''
'''Until''' {{math|{{!}}E(G'){{!}} {{=}} (k – 1)/2}}
'''return''' Frequent Subgraph List
|}
===Kavosh===
A recently introduced algorithm named ''Kavosh'' <ref name="kash1">{{cite journal|vauthors=Kashani ZR, Ahrabian H, Elahi E, Nowzari-Dalini A, Ansari ES, Asadi S, Mohammadi S, Schreiber F, Masoudi-Nejad A |title=Kavosh: a new algorithm for finding network motifs |journal=BMC Bioinformatics |year=2009 |volume=10 |issue=318|pages=318 |doi=10.1186/1471-2105-10-318 |pmid=19799800 |pmc=2765973}} {{open access}}</ref> aims at improved main memory usage. ''Kavosh'' is usable to detect NM in both directed and undirected networks. The main idea of the enumeration is similar to the ''GK'' and ''MODA'' algorithms, which first find all {{math|k}}-size sub-graphs that a particular node participated in, then remove the node, and subsequently repeat this process for the remaining nodes.<ref name="kash1" />
For counting the sub-graphs of size {{math|k}} that include a particular node, trees with maximum depth of k, rooted at this node and based on neighborhood relationship are implicitly built. Children of each node include both incoming and outgoing adjacent nodes. To descend the tree, a child is chosen at each level with the restriction that a particular child can be included only if it has not been included at any upper level. After having descended to the lowest level possible, the tree is again ascended and the process is repeated with the stipulation that nodes visited in earlier paths of a descendant are now considered unvisited nodes. A final restriction in building trees is that all children in a particular tree must have numerical labels larger than the label of the root of the tree. The restrictions on the labels of the children are similar to the conditions which ''GK'' and ''ESU'' algorithm use to avoid overcounting sub-graphs.
The protocol for extracting sub-graphs makes use of the compositions of an integer. For the extraction of sub-graphs of size {{math|k}}, all possible compositions of the integer {{math|k-1}} must be considered. The compositions of {{math|k-1}} consist of all possible manners of expressing {{math|k-1}} as a sum of positive integers. Summations in which the order of the summands differs are considered distinct. A composition can be expressed as {{math|k<sub>2</sub>,k<sub>3</sub>,…,k<sub>m</sub>}} where {{math|k<sub>2</sub> + k<sub>3</sub> + … + k<sub>m</sub> {{=}} k-1}}. To count sub-graphs based on the composition, {{math|k<sub>i</sub>}} nodes are selected from the {{math|i}}-th level of the tree to be nodes of the sub-graphs ({{math|i {{=}} 2,3,…,m}}). The {{math|k-1}} selected nodes along with the node at the root define a sub-graph within the network. After discovering a sub-graph involved as a match in the target network, in order to be able to evaluate the size of each class according to the target network, ''Kavosh'' employs the ''nauty'' algorithm <ref name="mck1" /><ref name="mck2" /> in the same way as ''FANMOD''. The enumeration part of Kavosh algorithm is shown below:
{| class="wikitable"
|-
! Enumeration of Kavosh
|-
|'''''Enumerate_Vertex(G, u, S, Remainder, i)'''''
'''Input:''' {{math|G}}: Input graph<br>
{{pad|3em}}{{math|u}}: Root vertex<br>
{{pad|3em}}{{math|S}}: selection ({{math|S {{=}} { S<sub>0</sub>,S<sub>1</sub>,...,S<sub>k-1</sub>}}} is an array of the set of all {{math|S<sub>i</sub>}})<br>
{{pad|3em}}{{math|Remainder}}: number of remaining vertices to be selected<br>
{{pad|3em}}{{math|i}}: Current depth of the tree.<br>
'''Output:''' all {{math|k}}-size sub-graphs of graph {{math|G}} rooted in {{math|u}}.
'''if''' {{math|Remainder {{=}} 0}} '''then'''<br>
{{pad|1em}}'''return'''<br>
'''else'''<br>
{{pad|1em}}{{math|ValList ← ''Validate''(G, S<sub>i-1</sub>, u)}}<br>
{{pad|1em}}{{math|n<sub>i</sub> ← ''Min''({{!}}ValList{{!}}, Remainder)}}<br>
{{pad|1em}}'''for''' {{math|k<sub>i</sub> {{=}} 1}} '''to''' {{math|n<sub>i</sub>}} '''do'''<br>
{{pad|2em}}{{math|C ← ''Initial_Comb''(ValList, k<sub>i</sub>)}}<br>
{{pad|2em}}(Make the first vertex combination selection according)<br>
{{pad|2em}}'''repeat'''<br>
{{pad|3em}}{{math|S<sub>i</sub> ← C}}<br>
{{pad|3em}}{{math|''Enumerate_Vertex''(G, u, S, Remainder-k<sub>i</sub>, i+1)}}<br>
{{pad|3em}}{{math|''Next_Comb''(ValList, k<sub>i</sub>)}}<br>
{{pad|3em}}(Make the next vertex combination selection according)<br>
{{pad|2em}}'''until''' {{math|C {{=}} NILL}}<br>
{{pad|2em}}'''end for'''<br>
{{pad|1em}}'''for each''' {{math|v ∈ ValList}} '''do'''<br>
{{pad|2em}}{{math|Visited[v] ← false}}<br>
{{pad|1em}}'''end for'''<br>
'''end if'''
|-
|'''''Validate(G, Parents, u)'''''<br>
'''Input:''' {{math|G}}: input graph, {{math|Parents}}: selected vertices of last layer, {{math|u}}: Root vertex.<br>
'''Output:''' Valid vertices of the current level.
{{math|ValList ← NILL}}<br>
'''for each''' {{math|v ∈ Parents}} '''do'''<br>
{{pad|1em}}'''for each''' {{math|w ∈ Neighbor[u]}} '''do'''<br>
{{pad|2em}}'''if''' {{math|label[u] < label[w]}} '''AND NOT''' {{math|Visited[w]}} '''then'''<br>
{{pad|3em}}{{math|Visited[w] ← true}}<br>
{{pad|3em}}{{math|ValList {{=}} ValList + w}}<br>
{{pad|2em}}'''end if'''<br>
{{pad|1em}}'''end for'''<br>
'''end for'''<br>
'''return''' {{math|ValList}}<br>
|}
Recently a ''Cytoscape'' plugin called ''CytoKavosh'' <ref name="mas2">{{cite journal|author1=Ali Masoudi-Nejad |author2=Mitra Anasariola |author3=Ali Salehzadeh-Yazdi |author4=Sahand Khakabimamaghani |title=CytoKavosh: a Cytoscape Plug-in for Finding Network Motifs in Large Biological Networks |journal=PLoS ONE |volume=7 |issue=8 |pages=e43287 |year=2012 |doi=10.1371/journal.pone.0043287|pmid=22952659 |pmc=3430699 |bibcode=2012PLoSO...743287M }} {{open access}}</ref> is developed for this software. It is available via ''Cytoscape'' web page [http://apps.cytoscape.org/apps/cytokavosh].
===G-Tries===
In 2010, Pedro Ribeiro and Fernando Silva proposed a novel data structure for storing a collection of sub-graphs, called a ''g-trie''.<ref name="rib1">{{cite conference|vauthors=Ribeiro P, Silva F |title=G-Tries: an efficient data structure for discovering network motifs |conference=ACM 25th Symposium On Applied Computing - Bioinformatics Track |location=Sierre, Switzerland |year=2010 |pages=1559–1566 |url=http://www.nrcbioinformatics.ca/acmsac2010/}}</ref> This data structure, which is conceptually akin to a prefix tree, stores sub-graphs according to their structures and finds occurrences of each of these sub-graphs in a larger graph. One of the noticeable aspects of this data structure is that coming to the network motif discovery, the sub-graphs in the main network are needed to be evaluated. So, there is no need to find the ones in random network which are not in the main network. This can be one of the time-consuming parts in the algorithms in which all sub-graphs in random networks are derived.
A ''g-trie'' is a multiway tree that can store a collection of graphs. Each tree node contains information about a single graph vertex and its corresponding edges to ancestor nodes. A path from the root to a leaf corresponds to one single graph. Descendants of a g-trie node share a common sub-graph. Constructing a ''g-trie'' is well described in.<ref name="rib1" /> After constructing a ''g-trie'', the counting part takes place. The main idea in counting process is to backtrack by all possible sub-graphs, but at the same time do the isomorphism tests. This backtracking technique is essentially the same technique employed by other motif-centric approaches like ''MODA'' and ''GK'' algorithms. Taking advantage of common substructures in the sense that at a given time there is a partial isomorphic match for several different candidate sub-graphs.
Among the mentioned algorithms, ''G-Tries'' is the fastest. But, the excessive use of memory is the drawback of this algorithm, which might limit the size of discoverable motifs by a personal computer with average memory.
===Comparison===
Tables and figure below show the results of running the mentioned algorithms on different standard networks. These results are taken from the corresponding sources,<ref name="omi1" /><ref name="kash1" /><ref name="rib1" /> thus they should be treated individually.
[[Image:Runtimes of algorithms.jpg|thumb|''Runtimes of Grochow–Kellis, mfinder, FANMOD, FPF and MODA for subgraphs from three nodes up to nine nodes''.<ref name="omi1" />]]
{|class="wikitable"
|+ Runtimes of Grochow–Kellis, FANMOD, and G-Trie for subgraphs from three nodes up to nine nodes on five different networks.<ref name="rib1" />
|-
!rowspan="2"|Network
!rowspan="2"|Size
!colspan="3"|Census Original Network
!colspan="3"|Average Census on Random Networks
|-
!FANMOD
!GK
!G-Trie
!FANMOD
!GK
!G-Trie
|-
|rowspan="5"|Dolphins
|5 || 0.07 || 0.03 || 0.01 || 0.13 || 0.04 || 0.01
|-
|6||0.48||0.28||0.04||1.14||0.35||0.07
|-
|7||3.02||3.44||0.23||8.34||3.55||0.46
|-
|8||19.44||73.16||1.69||67.94||37.31||4.03
|-
|9||100.86||2984.22||6.98||493.98||366.79||24.84
|-
|rowspan="3"|Circuit
|6||0.49||0.41||0.03||0.55||0.24||0.03
|-
|7||3.28||3.73||0.22||3.53||1.34||0.17
|-
|8||17.78||48.00||1.52||21.42||7.91||1.06
|-
|rowspan="3"|Social
|3||0.31||0.11||0.02||0.35||0.11||0.02
|-
|4||7.78||1.37||0.56||13.27||1.86||0.57
|-
|5||208.30||31.85||14.88||531.65||62.66||22.11
|-
|rowspan="3"|Yeast
|3||0.47||0.33||0.02||0.57||0.35||0.02
|-
|4||10.07||2.04||0.36||12.90||2.25||0.41
|-
|5||268.51||34.10||12.73||400.13||47.16||14.98
|-
|rowspan="5"|Power
|3||0.51||1.46||0.00||0.91||1.37||0.01
|-
|4||1.38||4.34||0.02||3.01||4.40||0.03
|-
|5||4.68||16.95||0.10||12.38||17.54||0.14
|-
|6||20.36||95.58||0.55||67.65||92.74||0.88
|-
|7||101.04||765.91||3.36||408.15||630.65||5.17
|}
{|class="wikitable"
|+ Runtimes of mfinder, FANMOD, Mavisto and Kavosh for subgraphs from three nodes up to ten nodes on three different networks.<ref name="kash1" />
|-
!
!Size →
!rowspan="2"|3
!rowspan="2"|4
!rowspan="2"|5
!rowspan="2"|6
!rowspan="2"|7
!rowspan="2"|8
!rowspan="2"|9
!rowspan="2"|10
|-
!Networks ↓
!Algorithms ↓
|-
|rowspan="4"|E. coli
|Kavosh||0.30||1.84||14.91||141.98||1374.0||13173.7||121110.3||1120560.1
|-
|FANMOD||0.81||2.53||15.71||132.24||1205.9||9256.6||-||-
|-
|Mavisto||13532||-||-||-||-||-||-||-
|-
|Mfinder||31.0||297||23671||-||-||-||-||-
|-
|rowspan="4"|Electronic
|Kavosh||0.08||0.36||8.02||11.39||77.22||422.6||2823.7||18037.5
|-
|FANMOD||0.53||1.06||4.34||24.24||160||967.99||-||-
|-
|Mavisto||210.0||1727||-||-||-||-||-||-
|-
|Mfinder||7||14||109.8||2020.2||-||-||-||-
|-
|rowspan="4"|Social
|Kavosh||0.04||0.23||1.63||10.48||69.43||415.66||2594.19||14611.23
|-
|FANMOD||0.46||0.84||3.07||17.63||117.43||845.93||-||-
|-
|Mavisto||393||1492||-||-||-||-||-||-
|-
|Mfinder||12||49||798||181077||-||-||-||-
|}
===Classification of algorithms===
As seen in the table, motif discovery algorithms can be divided into two general categories: those based on exact counting and those using statistical sampling and estimations instead. Because the second group does not count all the occurrences of a subgraph in the main network, the algorithms belonging to this group are faster, but they might yield in biased and unrealistic results.
In the next level, the exact counting algorithms can be classified to network-centric and subgraph-centric methods. The algorithms of the first class search the given network for all subgraphs of a given size, while the algorithms falling into the second class first generate different possible non-isomorphic graphs of the given size, and then explore the network for each generated subgraph separately. Each approach has its advantages and disadvantages which are discussed above.
On the other hand, estimation methods might utilize color-coding approach as described before. Other approaches used in this category usually skip some subgraphs during enumeration (e.g., as in FANMOD) and base their estimation on the enumerated subgraphs.
Furthermore, table indicates whether an algorithm can be used for directed or undirected networks as well as induced or non-induced subgraphs. For more information refer to the provided web links or lab addresses.
{|class="wikitable"
|+ Classification of Motif Discovery Algorithms
|-
!Counting Method
!Basis
!Name
!Directed / Undirected
!Induced / Non-Induced
|-
| rowspan="9" |Exact
| rowspan="5" |Network-Centric
|[http://www.weizmann.ac.il/mcb/UriAlon/ mfinder]||Both||Induced
|-
|[http://theinf1.informatik.uni-jena.de/~wernicke/motifs/ ESU (FANMOD)]||Both||Induced
|-
|[http://lbb.ut.ac.ir/Download/LBBsoft/Kavosh/ Kavosh] (used in [http://apps.cytoscape.org/apps/cytokavosh CytoKavosh])||Both||Induced
|-
|[http://www.dcc.fc.up.pt/gtries/ G-Tries]||Both||Induced
|-
|[http://nesreenahmed.com/graphlets PGD]
|Undirected
|Induced
|-
|rowspan="4"|Subgraph-Centric
|[http://mavisto.ipk-gatersleben.de/ FPF (Mavisto)]||Both||Induced
|-
|[https://www.msu.edu/~jinchen/ NeMoFinder]||Undirected||Induced
|-
|[http://people.cs.uchicago.edu/~joshuag/index.html Grochow–Kellis]||Both||Both
|-
|[http://lbb.ut.ac.ir/Download/LBBsoft/MODA/ MODA]||Both||Both
|-
|rowspan="3"|Estimation / Sampling
|Color-Coding Approach
|[http://www.math.tau.ac.il/~nogaa/ N. Alon] ''et al.''’s Algorithm||Undirected||Non-Induced
|-
|rowspan="2"|Other Approaches
|[http://www.weizmann.ac.il/mcb/UriAlon/ mfinder]||Both||Induced
|-
|[http://theinf1.informatik.uni-jena.de/~wernicke/motifs/ ESU (FANMOD)]||Both||Induced
|}
{|class="wikitable"
|+ Addresses of Designers of Algorithms
|-
!Algorithm
!Lab / Dept. Name
!Department / School
!Institute
!Address
!E-Mail
|-
|[http://www.weizmann.ac.il/mcb/UriAlon/ mfinder]||Uri Alon's Group||Department of Molecular Cell Biology||Weizmann Institute of Science||Rehovot, Israel, Wolfson, Rm. 607||urialon at weizmann.ac.il
|-
|[http://mavisto.ipk-gatersleben.de/ FPF (Mavisto)]||----||----||Leibniz-Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK)||Corrensstraße 3, D-06466 Stadt Seeland, OT Gatersleben, Deutschland||schreibe at ipk-gatersleben.de
|-
|[http://theinf1.informatik.uni-jena.de/~wernicke/motifs/ ESU (FANMOD)]||Lehrstuhl Theoretische Informatik I||Institut für Informatik||Friedrich-Schiller-Universität Jena||Ernst-Abbe-Platz 2,D-07743 Jena, Deutschland||sebastian.wernicke at gmail.com
|-
|[https://www.msu.edu/~jinchen/ NeMoFinder]||----||School of Computing||National University of Singapore||Singapore 119077||chenjin at comp.nus.edu.sg
|-
|[http://www.cs.colorado.edu/~jgrochow/ Grochow–Kellis]||CS Theory Group & Complex Systems Group||Computer Science||University of Colorado, Boulder||1111 Engineering Dr. ECOT 717, 430 UCB Boulder, CO 80309-0430 USA||jgrochow at colorado.edu
|-
|[http://www.math.tau.ac.il/~nogaa/ N. Alon] ''et al.''’s Algorithm||Department of Pure Mathematics||School of Mathematical Sciences||Tel Aviv University||Tel Aviv 69978, Israel||nogaa at post.tau.ac.il
|-
|[http://lbb.ut.ac.ir/Download/LBBsoft/MODA/ MODA]||Laboratory of Systems Biology and Bioinformatics (LBB)||Institute of Biochemistry and Biophysics (IBB)||University of Tehran||Enghelab Square, Enghelab Ave, Tehran, Iran||amasoudin at ibb.ut.ac.ir
|-
|[http://lbb.ut.ac.ir/Download/LBBsoft/Kavosh/ Kavosh] (used in [http://apps.cytoscape.org/apps/cytokavosh CytoKavosh])||Laboratory of Systems Biology and Bioinformatics (LBB)||Institute of Biochemistry and Biophysics (IBB)||University of Tehran||Enghelab Square, Enghelab Ave, Tehran, Iran||amasoudin at ibb.ut.ac.ir
|-
|[http://www.dcc.fc.up.pt/gtries/ G-Tries]||Center for Research in Advanced Computing Systems||Computer Science||University of Porto||Rua Campo Alegre 1021/1055, Porto, Portugal||pribeiro at dcc.fc.up.pt
|-
|[http://nesreenahmed.com/graphlets PGD]
|Network Learning and Discovery Lab
|Department of Computer Science
|Purdue University
|Purdue University, 305 N University St, West Lafayette, IN 47907
|nkahmed@purdue.edu
|}
==Well-established motifs and their functions==
Much experimental work has been devoted to understanding network motifs in [[gene regulatory networks]]. These networks control which genes are expressed in the cell in response to biological signals. The network is defined such that genes are nodes, and directed edges represent the control of one gene by a transcription factor (regulatory protein that binds DNA) encoded by another gene. Thus, network motifs are patterns of genes regulating each other's transcription rate. When analyzing transcription networks, it is seen that the same network motifs appear again and again in diverse organisms from bacteria to human. The transcription network of ''[[Escherichia coli|E. coli]]'' and yeast, for example, is made of three main motif families, that make up almost the entire network. The leading hypothesis is that the network motif were independently selected by evolutionary processes in a converging manner,<ref name="bab1">{{cite journal |vauthors=Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA |title=Structure and evolution of transcriptional regulatory networks |journal=Current Opinion in Structural Biology |volume=14 |issue=3 |pages=283–91 |date=June 2004 |pmid=15193307 |doi=10.1016/j.sbi.2004.05.004 |citeseerx=10.1.1.471.9692 }}</ref><ref name="con1">{{cite journal |vauthors=Conant GC, Wagner A |title=Convergent evolution of gene circuits |journal=Nat. Genet. |volume=34 |issue=3 |pages=264–6 |date=July 2003 |pmid=12819781 |doi=10.1038/ng1181}}</ref> since the creation or elimination of regulatory interactions is fast on evolutionary time scale, relative to the rate at which genes change,<ref name="bab1"/><ref name="con1"/><ref name="dek1">{{cite journal |vauthors=Dekel E, Alon U |title=Optimality and evolutionary tuning of the expression level of a protein |journal=Nature |volume=436 |issue=7050 |pages=588–92 |date=July 2005 |pmid=16049495 |doi=10.1038/nature03842 |bibcode=2005Natur.436..588D }}</ref> Furthermore, experiments on the dynamics generated by network motifs in living cells indicate that they have characteristic dynamical functions. This suggests that the network motif serve as building blocks in gene regulatory networks that are beneficial to the organism.
The functions associated with common network motifs in transcription networks were explored and demonstrated by several research projects both theoretically and experimentally. Below are some of the most common network motifs and their associated function.
===Negative auto-regulation (NAR)===
[[Image:Autoregulation motif.png|thumb|Schematic representation of an auto-regulation motif]]
One of simplest and most abundant network motifs in ''[[Escherichia coli|E. coli]]'' is negative auto-regulation in which a transcription factor (TF) represses its own transcription. This motif was shown to perform two important functions. The first function is response acceleration. NAR was shown to speed-up the response to signals both theoretically <ref name="zab1">{{cite journal |doi=10.1016/j.jtbi.2011.06.021 |author=Zabet NR |title=Negative feedback and physical limits of genes |journal=Journal of Theoretical Biology |volume= 284|issue=1 |pages=82–91 |date=September 2011 |pmid=21723295 |arxiv=1408.1869 |citeseerx=10.1.1.759.5418 }}</ref> and experimentally. This was first shown in a synthetic transcription network<ref name="ros1">{{cite journal |doi=10.1016/S0022-2836(02)00994-4 |vauthors=Rosenfeld N, Elowitz MB, Alon U |title=Negative autoregulation speeds the response times of transcription networks |journal=J. Mol. Biol. |volume=323 |issue=5 |pages=785–93 |date=November 2002 |pmid=12417193 |citeseerx=10.1.1.126.2604 }}</ref> and later on in the natural context in the SOS DNA repair system of E .coli.<ref name="cam1">{{cite journal |vauthors=Camas FM, Blázquez J, Poyatos JF |title=Autogenous and nonautogenous control of response in a genetic network |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=103 |issue=34 |pages=12718–23 |date=August 2006 |pmid=16908855 |pmc=1568915 |doi=10.1073/pnas.0602119103 |bibcode=2006PNAS..10312718C }}</ref> The second function is increased stability of the auto-regulated gene product concentration against stochastic noise, thus reducing variations in protein levels between different cells.<ref name="bec1">{{cite journal |vauthors=Becskei A, Serrano L |title=Engineering stability in gene networks by autoregulation |journal=Nature |volume=405 |issue=6786 |pages=590–3 |date=June 2000 |pmid=10850721 |doi=10.1038/35014651}}</ref><ref name="dub1">{{cite journal |vauthors=Dublanche Y, Michalodimitrakis K, Kümmerer N, Foglierini M, Serrano L |title=Noise in transcription negative feedback loops: simulation and experimental analysis |journal=Mol. Syst. Biol. |volume=2 |pages=41 |year=2006 |pmid=16883354 |pmc=1681513 |doi=10.1038/msb4100081 |issue=1}}</ref><ref name="shi1">{{cite journal |vauthors=Shimoga V, White J, Li Y, Sontag E, Bleris L |title= Synthetic mammalian transgene negative autoregulation |journal=Mol. Syst. Biol. |volume=9 |pages=670 |year=2013|doi=10.1038/msb.2013.27|pmid= 23736683 |pmc= 3964311 }}</ref>
===Positive auto-regulation (PAR)===
Positive auto-regulation (PAR) occurs when a transcription factor enhances its own rate of production. Opposite to the NAR motif this motif slows the response time compared to simple regulation.<ref name="mae1">{{cite journal |vauthors=Maeda YT, Sano M |title=Regulatory dynamics of synthetic gene networks with positive feedback |journal=J. Mol. Biol. |volume=359 |issue=4 |pages=1107–24 |date=June 2006 |pmid=16701695 |doi=10.1016/j.jmb.2006.03.064 }}</ref> In the case of a strong PAR the motif may lead to a bimodal distribution of protein levels in cell populations.<ref name="bec2">{{cite journal |vauthors=Becskei A, Séraphin B, Serrano L |title=Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion |journal=EMBO J. |volume=20 |issue=10 |pages=2528–35 |date=May 2001 |pmid=11350942 |pmc=125456 |doi=10.1093/emboj/20.10.2528}}</ref>
===Feed-forward loops (FFL)===
[[Image:Feed-forward motif.GIF|thumb|Schematic representation of a Feed-forward motif]]
This motif is commonly found in many gene systems and organisms. The motif consists of three genes and three regulatory interactions. The target gene C is regulated by 2 TFs A and B and in addition TF B is also regulated by TF A . Since each of the regulatory interactions may either be positive or negative there are possibly eight types of FFL motifs.<ref name="man1">{{cite journal |vauthors=Mangan S, Alon U |title=Structure and function of the feed-forward loop network motif |journal=Proc. Natl. Acad. Sci. U.S.A. |volume=100 |issue=21 |pages=11980–5 |date=October 2003 |pmid=14530388 |pmc=218699 |doi=10.1073/pnas.2133841100 |bibcode=2003PNAS..10011980M }}</ref> Two of those eight types: the coherent type 1 FFL (C1-FFL) (where all interactions are positive) and the incoherent type 1 FFL (I1-FFL) (A activates C and also activates B which represses C) are found much more frequently in the transcription network of ''[[Escherichia coli|E. coli]]'' and yeast than the other six types.<ref name="man1"/><ref name="ma1">{{cite journal |vauthors=Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP |title=An extended transcriptional regulatory network of ''Escherichia coli'' and analysis of its hierarchical structure and network motifs |journal=Nucleic Acids Res. |volume=32 |issue=22 |pages=6643–9 |year=2004 |pmid=15604458 |pmc=545451 |doi=10.1093/nar/gkh1009 |url=http://nar.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15604458}}</ref> In addition to the structure of the circuitry the way in which the signals from A and B are integrated by the C promoter should also be considered. In most of the cases the FFL is either an AND gate (A and B are required for C activation) or OR gate (either A or B are sufficient for C activation) but other input function are also possible.
===Coherent type 1 FFL (C1-FFL)===
The C1-FFL with an AND gate was shown to have a function of a ‘sign-sensitive delay’ element and a persistence detector both theoretically <ref name="man1"/> and experimentally<ref name="man2">{{cite journal |doi=10.1016/j.jmb.2003.09.049 |vauthors=Mangan S, Zaslaver A, Alon U |title=The coherent feedforward loop serves as a sign-sensitive delay element in transcription networks |journal=J. Mol. Biol. |volume=334 |issue=2 |pages=197–204 |date=November 2003 |pmid=14607112 |citeseerx=10.1.1.110.4629 }}</ref> with the arabinose system of ''[[Escherichia coli|E. coli]]''. This means that this motif can provide pulse filtration in which short pulses of signal will not generate a response but persistent signals will generate a response after short delay. The shut off of the output when a persistent pulse is ended will be fast. The opposite behavior emerges in the case of a sum gate with fast response and delayed shut off as was demonstrated in the flagella system of ''[[Escherichia coli|E. coli]]''.<ref name="kal1">{{cite journal |vauthors=Kalir S, Mangan S, Alon U |title=A coherent feed-forward loop with a SUM input function prolongs flagella expression in ''Escherichia coli'' |journal=Mol. Syst. Biol. |volume=1 |pages=E1–E6 |year=2005 |pmid=16729041 |pmc=1681456 |doi=10.1038/msb4100010 |issue=1}}</ref> De novo evolution of C1-FFLs in [[gene regulatory network]]s has been demonstrated computationally in response to selection to filter out an idealized short signal pulse, but for non-idealized noise, a dynamics-based system of feed-forward regulation with different topology was instead favored.<ref>{{cite journal |last1=Xiong |first1=Kun |last2=Lancaster |first2=Alex K. |last3=Siegal |first3=Mark L. |last4=Masel |first4=Joanna |title=Feed-forward regulation adaptively evolves via dynamics rather than topology when there is intrinsic noise |journal=Nature Communications |date=3 June 2019 |volume=10 |issue=1 |pages=2418 |doi=10.1038/s41467-019-10388-6|pmid=31160574 |pmc=6546794 }}</ref>
===Incoherent type 1 FFL (I1-FFL)===
The I1-FFL is a pulse generator and response accelerator. The two signal pathways of the I1-FFL act in opposite directions where one pathway activates Z and the other represses it. When the repression is complete this leads to a pulse-like dynamics. It was also demonstrated experimentally that the I1-FFL can serve as response accelerator in a way which is similar to the NAR motif. The difference is that the I1-FFL can speed-up the response of any gene and not necessarily a transcription factor gene.<ref name="man3">{{cite journal |vauthors=Mangan S, Itzkovitz S, Zaslaver A, Alon U |title=The incoherent feed-forward loop accelerates the response-time of the gal system of ''Escherichia coli'' |journal=J. Mol. Biol. |volume=356 |issue=5 |pages=1073–81 |date=March 2006 |pmid=16406067 |doi=10.1016/j.jmb.2005.12.003 |citeseerx=10.1.1.184.8360 }}</ref> An additional function was assigned to the I1-FFL network motif: it was shown both theoretically and experimentally that the I1-FFL can generate non-monotonic input function in both a synthetic <ref name="ent1">{{cite journal |vauthors=Entus R, Aufderheide B, Sauro HM |title=Design and implementation of three incoherent feed-forward motif based biological concentration sensors |journal=Syst Synth Biol |volume=1 |issue=3 |pages=119–28 |date=August 2007 |pmid=19003446 |pmc=2398716 |doi=10.1007/s11693-007-9008-6 }}</ref> and native systems.<ref name="kap1">{{cite journal |vauthors=Kaplan S, Bren A, Dekel E, Alon U |title=The incoherent feed-forward loop can generate non-monotonic input functions for genes |journal=Mol. Syst. Biol. |volume=4 |pages=203 |year=2008 |pmid=18628744 |pmc=2516365 |doi=10.1038/msb.2008.43 |issue=1}}</ref> Finally, expression units that incorporate incoherent feedforward control of the gene product provide adaptation to the amount of DNA template and can be superior to simple combinations of constitutive promoters.<ref name="ble1">{{cite journal |vauthors=Bleris L, Xie Z, Glass D, Adadey A, Sontag E, Benenson Y |title=Synthetic incoherent feedforward circuits show adaptation to the amount of their genetic template |journal=Mol. Syst. Biol. |volume=7 |pages=519|year=2011 |doi=10.1038/msb.2011.49 |issue=1 |pmid=21811230 |pmc=3202791}}</ref> Feedforward regulation displayed better adaptation than negative feedback, and circuits based on RNA interference were the most robust to variation in DNA template amounts.<ref name="ble1"/>
===Multi-output FFLs===
In some cases the same regulators X and Y regulate several Z genes of the same system. By adjusting the strength of the interactions this motif was shown to determine the temporal order of gene activation. This was demonstrated experimentally in the flagella system of ''[[Escherichia coli|E. coli]]''.<ref name="kal2">{{cite journal |vauthors=Kalir S, McClure J, Pabbaraju K, etal |title=Ordering genes in a flagella pathway by analysis of expression kinetics from living bacteria |journal=Science |volume=292 |issue=5524 |pages=2080–3 |date=June 2001 |pmid=11408658 |doi=10.1126/science.1058758 }}</ref>
===Single-input modules (SIM)===
This motif occurs when a single regulator regulates a set of genes with no additional regulation. This is useful when the genes are cooperatively carrying out a specific function and therefore always need to be activated in a synchronized manner. By adjusting the strength of the interactions it can create temporal expression program of the genes it regulates.<ref name="zas1">{{cite journal |vauthors=Zaslaver A, Mayo AE, Rosenberg R, etal |title=Just-in-time transcription program in metabolic pathways |journal=Nat. Genet. |volume=36 |issue=5 |pages=486–91 |date=May 2004 |pmid=15107854 |doi=10.1038/ng1348|doi-access=free }}</ref>
In the literature, Multiple-input modules (MIM) arose as a generalization of SIM. However, the precise definitions of SIM and MIM have been a source of inconsistency. There are attempts to provide orthogonal definitions for canonical motifs in biological networks and algorithms to enumerate them, especially SIM, MIM and Bi-Fan (2x2 MIM).<ref>{{cite journal |vauthors=Konagurthu AS, Lesk AM |title=Single and Multiple Input Modules in regulatory networks |journal=Proteins |volume=73 |issue=2 |pages=320–324 |year=2008 |doi=10.1002/prot.22053|pmid=18433061 }}</ref>
===Dense overlapping regulons (DOR)===
This motif occurs in the case that several regulators combinatorially control a set of genes with diverse regulatory combinations. This motif was found in ''[[Escherichia coli|E. coli]]'' in various systems such as carbon utilization, anaerobic growth, stress response and others.<ref name="she1"/><ref name="boy1"/> In order to better understand the function of this motif one has to obtain more information about the way the multiple inputs are integrated by the genes. Kaplan ''et al.''<ref name="kap2">{{cite journal |vauthors=Kaplan S, Bren A, Zaslaver A, Dekel E, Alon U |title=Diverse two-dimensional input functions control bacterial sugar genes |journal=Mol. Cell |volume=29 |issue=6 |pages=786–92 |date=March 2008 |pmid=18374652 |pmc=2366073 |doi=10.1016/j.molcel.2008.01.021 }}</ref> has mapped the input functions of the sugar utilization genes in ''[[Escherichia coli|E. coli]]'', showing diverse shapes.
==Activity motifs==
An interesting generalization of the network-motifs, '''activity motifs''' are over occurring patterns that can be found when nodes and edges in the network are annotated with quantitative features. For instance, when edges in a metabolic pathways are annotated with the magnitude or timing of the corresponding gene expression, some patterns are over occurring '''given''' the underlying network structure.<ref name="agc">{{cite journal |vauthors=Chechik G, Oh E, Rando O, Weissman J, Regev A, Koller D |title=Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network |journal=Nat. Biotechnol. |volume=26 |issue=11 |pages=1251–9 |date=November 2008 |pmid=18953355 |pmc=2651818 |doi=10.1038/nbt.1499}}</ref>
==Criticism==
An assumption (sometimes more sometimes less implicit) behind the preservation of a topological sub-structure is that it is of a particular functional importance. This assumption has recently been questioned. Some authors have argued that motifs, like ''bi-fan motifs'', might show a variety depending on the network context, and therefore,<ref name="ad">{{cite journal |vauthors=Ingram PJ, Stumpf MP, Stark J |title=Network motifs: structure does not determine function |journal=BMC Genomics |volume=7 |pages=108 |year=2006 |pmid=16677373 |pmc=1488845 |doi=10.1186/1471-2164-7-108 }} {{open access}}</ref> structure of the motif does not necessarily determine function. Network structure certainly does not always indicate function; this is an idea that has been around for some time, for an example see the Sin operon.<ref>{{cite journal |vauthors=Voigt CA, Wolf DM, Arkin AP |title=The ''Bacillus subtilis'' sin operon: an evolvable network motif |journal=Genetics |volume=169 |issue=3 |pages=1187–202 |date=March 2005 |pmid=15466432 |pmc=1449569 |doi=10.1534/genetics.104.031955 |url=http://www.genetics.org/cgi/pmidlookup?view=long&pmid=15466432}}</ref>
Most analyses of motif function are carried out looking at the motif operating in isolation. Recent research<ref>{{cite journal |vauthors=Knabe JF, Nehaniv CL, Schilstra MJ |title=Do motifs reflect evolved function?—No convergent evolution of genetic regulatory network subgraph topologies |journal=BioSystems |volume=94 |issue=1–2 |pages=68–74 |year=2008 |pmid=18611431 |doi=10.1016/j.biosystems.2008.05.012 }}</ref> provides good evidence that network context, i.e. the connections of the motif to the rest of the network, is too important to draw inferences on function from local structure only — the cited paper also reviews the criticisms and alternative explanations for the observed data. An analysis of the impact of a single motif module on the global dynamics of a network is studied in.<ref>{{cite journal |vauthors=Taylor D, Restrepo JG |title=Network connectivity during mergers and growth: Optimizing the addition of a module |journal=Physical Review E |volume=83 |issue=6 |year=2011 |page=66112 |doi=10.1103/PhysRevE.83.066112 |pmid=21797446 |bibcode=2011PhRvE..83f6112T |arxiv=1102.4876 }}</ref> Yet another recent work suggests that certain topological features of biological networks naturally give rise to the common appearance of canonical motifs, thereby questioning whether frequencies of occurrences are reasonable evidence that the structures of motifs are selected for their functional contribution to the operation of networks.<ref>{{cite journal|last1=Konagurthu|first1=Arun S.|last2=Lesk|first2=Arthur M.|title=Single and multiple input modules in regulatory networks|journal=Proteins: Structure, Function, and Bioinformatics|date=23 April 2008|volume=73|issue=2|pages=320–324|doi=10.1002/prot.22053|pmid=18433061}}</ref><ref>{{cite journal |vauthors=Konagurthu AS, Lesk AM |title=On the origin of distribution patterns of motifs in biological networks |journal=BMC Syst Biol |volume=2 |pages=73 |year=2008 |pmid=18700017 |pmc=2538512 |doi=10.1186/1752-0509-2-73 }} {{open access}}</ref>
While the study of motifs was mostly applied to static complex networks, research of temporal complex networks<ref>Braha, D., & Bar‐Yam, Y. (2006). [https://static1.squarespace.com/static/5b68a4e4a2772c2a206180a1/t/5c5de3faf4e1fc43e7b3d21e/1549657083988/Complexity_Braha_Original_w_Cover.pdf From centrality to temporary fame: Dynamic centrality in complex networks]. Complexity, 12(2), 59-63. </ref> suggested a significant reinterpretation of network motifs, and introduced the concept of '''temporal network motifs'''. Braha and Bar-Yam<ref> Braha D., Bar-Yam Y. (2009) [https://s3.amazonaws.com/academia.edu.documents/4892116/Adaptive_Networks__Theory__Models_and_Applications__Understanding_Complex_Systems_.pdf?response-content-disposition=inline%3B%20filename%3DRedes_teoria.pdf&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWOWYYGZ2Y53UL3A%2F20191111%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20191111T173250Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=89d08c9e92b88ed817e4eb0f87c480757ef79c4b865919a5e0890cbefa164c61#page=55 Time-Dependent Complex Networks: Dynamic Centrality, Dynamic Motifs, and Cycles of Social Interactions]. In: Gross T., Sayama H. (eds) Adaptive Networks. Understanding Complex Systems. Springer, Berlin, Heidelberg </ref> studied the dynamics of local motif structure in time-dependent/temporal networks, and find recurrent patterns that might provide empirical evidence for cycles of social interaction. Counter to the perspective of stable motifs and motif profiles in complex networks, they demonstrated that for temporal networks the local structure is time-dependent and might evolve over time. Braha and Bar-Yam further suggested that analyzing the temporal local structure might provide important information about the dynamics of system-level task and functionality.
==See also==
* [[Clique (graph theory)]]
* [[Graphical model]]
==References==
{{reflist|2}}
==External links==
* [http://www.weizmann.ac.il/mcb/UriAlon/groupNetworkMotifSW.html A software tool that can detect network motifs]
* [http://www.bio-physics.at/wiki/index.php?title=Network_Motifs bio-physics-wiki NETWORK MOTIFS]
* [http://theinf1.informatik.uni-jena.de/~wernicke/motifs/ FANMOD: a tool for fast network motif detection]
* [http://mavisto.ipk-gatersleben.de/ MAVisto: network motif analysis and visualisation tool]
* [https://www.msu.edu/~jinchen/ NeMoFinder]
* [http://people.cs.uchicago.edu/~joshuag/index.html Grochow–Kellis]
* [http://lbb.ut.ac.ir/Download/LBBsoft/MODA/ MODA]
* [http://lbb.ut.ac.ir/Download/LBBsoft/Kavosh/ Kavosh]
* [http://apps.cytoscape.org/apps/cytokavosh CytoKavosh]
* [http://www.dcc.fc.up.pt/gtries/ G-Tries]
* [http://www.ft.unicamp.br/docentes/meira/accmotifs/ acc-MOTIF detection tool]
[[Category:Gene expression]]
[[Category:Networks]]