信息生产：Network motif定义历史
大家好，我们的公众号计划要推送一篇关于网络模体的综述文章，我们希望可以配套建议该重要概念：网络模体。现在希望可以大家一起协作完成这个词条。 翻译任务主要分为以下5个内容：
网络定义和历史 许菁
网络模体的发现算法 mfinder和FPF算法李鹏
已有网络模体及其函数表示 祺哥
活动模体+批判  孙宇
代码实现
大家可以在对应感兴趣的部分下面，写上姓名。我们的协作方式是石墨文档上翻译，最后再编辑成文。 对应的词条链接：https://en.wikipedia.org/wiki/Network_motif#Wellestablished_motifs_and_their_functions
选取好了的同学可以私信给厚朴哦
Network science  

模板:CSS image crop  
模板:Hlist  
Network types  
Graphs  


模板:Hlist  
Models  


模板:Hlist  
All networks, including biological networks, social networks, technological networks (e.g., computer networks and electrical circuits) and more, can be represented as graphs, which include a wide variety of subgraphs. One important local property of networks are socalled network motifs, which are defined as recurrent and statistically significant subgraphs or patterns.
所有网络，包括生物网络(biological networks)、社会网络(social networks)、技术网络（例如计算机网络和电路）等，都可以用图的形式来表示，这些图中会包括各种各样的子图(subgraphs)。网络的一个重要的局部性质是所谓的网络基序，即重复且具有统计意义的子图或模式(patterns)。
Network motifs are subgraphs that repeat themselves in a specific network or even among various networks. Each of these subgraphs, defined by a particular pattern of interactions between vertices, may reflect a framework in which particular functions are achieved efficiently. Indeed, motifs are of notable importance largely because they may reflect functional properties. They have recently gathered much attention as a useful concept to uncover structural design principles of complex networks.^{[1]} Although network motifs may provide a deep insight into the network's functional abilities, their detection is computationally challenging.
网络模体(Network motifs)是指在特定网络或各种网络中重复出现的相同的子图。这些子图由顶点之间特定的交互模式定义，一个子图便可以反映一个框架，这个框架可以有效地实现某个特定的功能。事实上，之所以说模体是一个重要的特性，正是因为它们可能反映出对应网络功能的这一性质。近年来这一概念作为揭示复杂网络结构设计原理的一个有用概念而受到了广泛的关注。[1]但是，虽然通过研究网络模体可以深入了解网络的功能，但是对于模体的检测在计算上是具有挑战性的。
Definition 定义
Let G = (V, E) and G′ = (V′, E′) be two graphs. Graph G′ is a subgraph of graph G (written as G′ ⊆ G) if V′ ⊆ V and E′ ⊆ E ∩ (V′ × V′). If G′ ⊆ G and G′ contains all of the edges 〈u, v〉 ∈ E with u, v ∈ V′, then G′ is an induced subgraph of G. We call G′ and G isomorphic (written as G′ ↔ G), if there exists a bijection (onetoone) f:V′ → V with 〈u, v〉 ∈ E′ ⇔ 〈f(u), f(v)〉 ∈ E for all u, v ∈ V′. The mapping f is called an isomorphism between G and G′.^{[2]}
设G(V, E)和G′(V′, E′)是两个图，若V′ ⊆ V且满足E′ ⊆ E ∩ (V′ × V′)（即图G‘的所有边和点都属于图G），则称图G′是图G的一个子图（记作：G‘ ⊆ G）。
若G′ ⊆ G，且对于顶点u、v及其连边，只要u和v存在于G′（即若u, v ∈ V′），那么G′中就应该包含原图G中的所有u和v的对应连边（即<u, v> ∈ E），则称此时图G′就是图G的导出子图。
如果存在一个双射（一对一）f:V′→ V，且对所有u，v ∈ V′都有〈u, v〉∈ E′ ⇔ 〈f(u), f(v)〉∈ E，则称G′是G的同构图（记作：G′ ↔ G），映射f则称为G与G′之间的同构（isomorphism）。[2]
When G″ ⊂ G and there exists an isomorphism between the subgraph G″ and a graph G′, this mapping represents an appearance of G′ in G. The number of appearances of graph G′ in G is called the frequency F_{G} of G′ in G. A graph is called recurrent (or frequent) in G, when its frequency F_{G}(G′) is above a predefined threshold or cutoff value. We use terms pattern and frequent subgraph in this review interchangeably. There is an ensemble Ω(G) of random graphs corresponding to the nullmodel associated to G. We should choose N random graphs uniformly from Ω(G) and calculate the frequency for a particular frequent subgraph G′ in G. If the frequency of G′ in G is higher than its arithmetic mean frequency in N random graphs R_{i}, where 1 ≤ i ≤ N, we call this recurrent pattern significant and hence treat G′ as a network motif for G. For a small graph G′, the network G and a set of randomized networks R(G) ⊆ Ω(R), where R(G) = N, the ZScore that has been defined by the following formula:
当G″ ⊂ G，且G〃与图G’之间存在同构时，该映射表示G′对于G存在。图G′在G的出现次数称为G′出现在G的频率FG(G′)。当一个子图的频率FG(G′)高于预定的阈值或截止值时，则称G‘是G中的递归（或频繁）子图。
在接下来的内容中，我们交替使用术语“模式（motifs）”和“频繁子图（frequent subgraph）”。
设从与G相关联的零模型（the nullmodel）获得的随机图集合为Ω(G)，我们从Ω(G)中均匀地选择N个随机图，并计算其特定频繁子图的频率。如果G′出现在G的频率高于N个随机图Ri的算术平均频率，其中1≤i≤N，我们称此递归模式是有意义的，因此可以将G′视为G的网络模体。
对于一个小图G′，网络G和一组随机网络R(G)⊆Ω(R)，当R(G)=N时，由其Z分数（Zscore）的定义如下式：
[math]\displaystyle{ Z(G^\prime) = \frac{F_G(G^\prime)  \mu_R(G^\prime)}{\sigma_R(G^\prime)} }[/math]
where μ_{R}(G′) and σ_{R}(G′) stand for mean and standard deviation frequency in set R(G), respectively.^{[3]}^{[4]}^{[5]}^{[6]}^{[7]}^{[8]} The larger the Z(G′), the more significant is the subgraph G′ as a motif. Alternatively, another measurement in statistical hypothesis testing that can be considered in motif detection is the PValue, given as the probability of F_{R}(G′) ≥ F_{G}(G′) (as its nullhypothesis), where F_{R}(G′) indicates the frequency of G' in a randomized network.^{[6]} A subgraph with Pvalue less than a threshold (commonly 0.01 or 0.05) will be treated as a significant pattern. The Pvalue is defined as
式中，μR（G′）和σR（G′）分别代表集合R（G）中的平均和标准偏差频率。[3][4][5][6][7][8] Z（G′）越大，子图G′作为模体的意义也就越大。
此外还可以使用统计假设检验中的另一个测量方法，可以作为模体检测中的一种方法，即P值（Pvalue），以FR（G′）≥FG（G′）的概率给出（作为其零假设nullhypothesis），其中FR(G′)表示随机网络中G′的频率。[6]当P值小于阈值（通常为0.01或0.05）时，该子图可以被称为显著模式。该P值定义为：
[math]\displaystyle{ P(G^\prime) = \frac{1}{N}\sum_{i=1}^N \delta(c(i)) ; c(i): F_R^i(G^\prime) \ge F_G(G^\prime) }[/math]
Where N indicates number of randomized networks, i is defined over an ensemble of randomized networks and the Kronecker delta function δ(c(i)) is one if the condition c(i) holds. The concentration ^{[9]}^{[10]} of a particular nsize subgraph G′ in network G refers to the ratio of the subgraph appearance in the network to the total nsize nonisomorphic subgraphs’ frequencies, which is formulated by
其中N表示随机网络的数目，i定义在随机网络的集合上，若条件c(i)成立，则Kroneckerδ函数δ(c(i))是1。在网络G中，一个特定的n维子图G′的集中度是指子图在网络中出现频率与n维非同构子图的总频率之比，其计算公式如下：
[math]\displaystyle{ C_G(G^\prime) = \frac{F_G(G^\prime)}{\sum_i F_G(G_i)} }[/math]
where index i is defined over the set of all nonisomorphic nsize graphs. Another statistical measurement is defined for evaluating network motifs, but it is rarely used in known algorithms. This measurement is introduced by Picard et al. in 2008 and used the Poisson distribution, rather than the Gaussian normal distribution that is implicitly being used above.^{[11]}
其中索引i是在所有非同构n维图的集合上定义的。还有另一种用于评估网络模体统计测量方法，但在已知算法中很少使用。2008年Picard等人介绍了这种测量方法，这种使用了泊松分布（Poisson distribution），而不是上面隐含使用到的高斯正态分布（Gaussian normal distribution）。[11]
In addition, three specific concepts of subgraph frequency have been proposed.^{[12]} As the figure illustrates, the first frequency concept F_{1} considers all matches of a graph in original network. This definition is similar to what we have introduced above. The second concept F_{2} is defined as the maximum number of edgedisjoint instances of a given graph in original network. And finally, the frequency concept F_{3} entails matches with disjoint edges and nodes. Therefore, the two concepts F_{2} and F_{3} restrict the usage of elements of the graph, and as can be inferred, the frequency of a subgraph declines by imposing restrictions on network element usage. As a result, a network motif detection algorithm would pass over more candidate subgraphs if we insist on frequency concepts F_{2} and F_{3}.
此外，他们还提出了子图频率的三个具体概念。如图所示，第一频率概念F1考虑原始网络中图的所有匹配，这与我们前面介绍过的类似。第二个概念F2定义为原始网络中给定图的最大不相交边的数量。最后，频率概念F3包含与不相交边（disjoint edges）和节点的匹配。因此，两个概念F2和F3限制了图元素的使用，并且可以看出，通过对网络元素的使用施加限制，子图的频率下降。因此，如果我们坚持使用频率概念F2和F3，网络模体检测算法将可以筛选出更多的候选子图。
History 历史
The study of network motifs was pioneered by Holland and Leinhardt^{[13]}^{[14]}^{[15]}^{[16]} who introduced the concept of a triad census of networks. They introduced methods to enumerate various types of subgraph configurations, and test whether the subgraph counts are statistically different from those expected in random networks.
霍兰（Holland）和莱因哈特（Leinhardt）率先提出了网络三合会普查（a triad census of networks）的概念，开创了网络模体研究的先河。[13][14][15][16]他们介绍了枚举各种子图配置的方法，并测试子图计数是否与随机网络中的期望值存在统计学上的差异。
这里对于网络三合会普查（a triad census of networks）这一概念的翻译存疑
This idea was further generalized in 2002 by Uri Alon and his group ^{[17]} when network motifs were discovered in the gene regulation (transcription) network of the bacteria E. coli and then in a large set of natural networks. Since then, a considerable number of studies have been conducted on the subject. Some of these studies focus on the biological applications, while others focus on the computational theory of network motifs.
2002年，Uri Alon和他的团队[17]在大肠杆菌的基因调控（gene regulation network）（转录 transcription）网络中发现了网络模体，随后在大量的自然网络中也发现了网络模体，从而进一步推广了这一观点。自那时起，许多科学家都对这一问题进行了大量的研究。其中一些研究集中在生物学应用上，而另一些则集中在网络模体的计算理论上。
The biological studies endeavor to interpret the motifs detected for biological networks. For example, in work following,^{[17]} the network motifs found in E. coli were discovered in the transcription networks of other bacteria^{[18]} as well as yeast^{[19]}^{[20]} and higher organisms.^{[21]}^{[22]}^{[23]} A distinct set of network motifs were identified in other types of biological networks such as neuronal networks and protein interaction networks.^{[5]}^{[24]}^{[25]}
生物学研究试图解释为生物网络检测到的模体。例如，在接下来的工作中，文献[17]在大肠杆菌中发现的网络模体存在于其他细菌[18]以及酵母[19][20]和高等生物的转录网络中。文献[21][22][23]在其他类型的生物网络中发现了一组不同的网络模体，如神经元网络和蛋白质相互作用网络。[5][24][25]
The computational research has focused on improving existing motif detection tools to assist the biological investigations and allow larger networks to be analyzed. Several different algorithms have been provided so far, which are elaborated in the next section in chronological order.
另一方面，对于计算研究的重点则是改进现有的模体检测工具，以协助生物学研究，并允许对更大的网络进行分析。到目前为止，已经提供了几种不同的算法，这些算法将在下一节按时间顺序进行阐述。
Most recently, the accMOTIF tool to detect network motifs was released.^{[26]}
最近，还发布了用于检测网络基序的acc基序工具。
Motif discovery algorithms
Various solutions have been proposed for the challenging problem of motif discovery. These algorithms can be classified under various paradigms such as exact counting methods, sampling methods, pattern growth methods and so on. However, motif discovery problem comprises two main steps: first, calculating the number of occurrences of a subgraph and then, evaluating the subgraph significance. The recurrence is significant if it is detectably far more than expected. Roughly speaking, the expected number of appearances of a subgraph can be determined by a Nullmodel, which is defined by an ensemble of random networks with some of the same properties as the original network.
Here, a review on computational aspects of major algorithms is given and their related benefits and drawbacks from an algorithmic perspective are discussed.
mfinder
mfinder, the first motifmining tool, implements two kinds of motif finding algorithms: a full enumeration and a sampling method. Until 2004, the only exact counting method for NM (network motif) detection was the bruteforce one proposed by Milo et al..^{[3]} This algorithm was successful for discovering small motifs, but using this method for finding even size 5 or 6 motifs was not computationally feasible. Hence, a new approach to this problem was needed.
Kashtan et al. ^{[9]} presented the first sampling NM discovery algorithm, which was based on edge sampling throughout the network. This algorithm estimates concentrations of induced subgraphs and can be utilized for motif discovery in directed or undirected networks. The sampling procedure of the algorithm starts from an arbitrary edge of the network that leads to a subgraph of size two, and then expands the subgraph by choosing a random edge that is incident to the current subgraph. After that, it continues choosing random neighboring edges until a subgraph of size n is obtained. Finally, the sampled subgraph is expanded to include all of the edges that exist in the network between these n nodes. When an algorithm uses a sampling approach, taking unbiased samples is the most important issue that the algorithm might address. The sampling procedure, however, does not take samples uniformly and therefore Kashtan et al. proposed a weighting scheme that assigns different weights to the different subgraphs within network.^{[9]} The underlying principle of weight allocation is exploiting the information of the sampling probability for each subgraph, i.e. the probable subgraphs will obtain comparatively less weights in comparison to the improbable subgraphs; hence, the algorithm must calculate the sampling probability of each subgraph that has been sampled. This weighting technique assists mfinder to determine subgraph concentrations impartially.
In expanded to include sharp contrast to exhaustive search, the computational time of the algorithm surprisingly is asymptotically independent of the network size. An analysis of the computational time of the algorithm has shown that it takes O(n^{n}) for each sample of a subgraph of size n from the network. On the other hand, there is no analysis in ^{[9]} on the classification time of sampled subgraphs that requires solving the graph isomorphism problem for each subgraph sample. Additionally, an extra computational effort is imposed on the algorithm by the subgraph weight calculation. But it is unavoidable to say that the algorithm may sample the same subgraph multiple times – spending time without gathering any information.^{[10]} In conclusion, by taking the advantages of sampling, the algorithm performs more efficiently than an exhaustive search algorithm; however, it only determines subgraphs concentrations approximately. This algorithm can find motifs up to size 6 because of its main implementation, and as result it gives the most significant motif, not all the others too. Also, it is necessary to mention that this tool has no option of visual presentation. The sampling algorithm is shown briefly:
mfinder 

Definitions: E_{s}is the set of picked edges. V_{s} is the set of all nodes that are touched by the edges in E. 
Init V_{s} and E_{s} to be empty sets.
1. Pick a random edge e_{1} = (v_{i}, v_{j}). Update E_{s} = {e_{1}}, V_{s} = {v_{i}, v_{j}} 2. Make a list L of all neighbor edges of E_{s}. Omit from L all edges between members of V_{s}. 3. Pick a random edge e = {v_{k},v_{l}} from L. Update E_{s} = E_{s} ⋃ {e}, V_{s} = V_{s} ⋃ {v_{k}, v_{l}}. 4. Repeat steps 23 until completing an nnode subgraph (until V_{s} = n). 5. Calculate the probability to sample the picked nnode subgraph. 
FPF (Mavisto)
Schreiber and Schwöbbermeyer ^{[12]} proposed an algorithm named flexible pattern finder (FPF) for extracting frequent subgraphs of an input network and implemented it in a system named Mavisto.^{[27]} Their algorithm exploits the downward closure property which is applicable for frequency concepts F_{2} and F_{3}. The downward closure property asserts that the frequency for subgraphs decrease monotonically by increasing the size of subgraphs; however, this property does not hold necessarily for frequency concept F_{1}. FPF is based on a pattern tree (see figure) consisting of nodes that represents different graphs (or patterns), where the parent of each node is a subgraph of its children nodes; in other words, the corresponding graph of each pattern tree's node is expanded by adding a new edge to the graph of its parent node.
At first, the FPF algorithm enumerates and maintains the information of all matches of a subgraph located at the root of the pattern tree. Then, onebyone it builds child nodes of the previous node in the pattern tree by adding one edge supported by a matching edge in the target graph, and tries to expand all of the previous information about matches to the new subgraph (child node). In next step, it decides whether the frequency of the current pattern is lower than a predefined threshold or not. If it is lower and if downward closure holds, FPF can abandon that path and not traverse further in this part of the tree; as a result, unnecessary computation is avoided. This procedure is continued until there is no remaining path to traverse.
The advantage of the algorithm is that it does not consider infrequent subgraphs and tries to finish the enumeration process as soon as possible; therefore, it only spends time for promising nodes in the pattern tree and discards all other nodes. As an added bonus, the pattern tree notion permits FPF to be implemented and executed in a parallel manner since it is possible to traverse each path of the pattern tree independently. However, FPF is most useful for frequency concepts F_{2} and F_{3}, because downward closure is not applicable to F_{1}. Nevertheless, the pattern tree is still practical for F_{1} if the algorithm runs in parallel. Another advantage of the algorithm is that the implementation of this algorithm has no limitation on motif size, which makes it more amenable to improvements. The pseudocode of FPF (Mavisto) is shown below:
Mavisto 

Data: Graph G, target pattern size t, frequency concept F
Result: Set R of patterns of size t with maximum frequency. 
R ← φ, f_{max} ← 0
P ←start pattern p1 of size 1 M_{p1} ←all matches of p_{1} in G While P ≠ φ do P_{max} ←select all patterns from P with maximum size. P ← select pattern with maximum frequency from P_{max} Ε = ExtensionLoop(G, p, M_{p}) Foreach pattern p ∈ E If F = F_{1} then f ← size(M_{p}) Else f ← Maximum Independent set(F, M_{p}) End If size(p) = t then If f = f_{max} then R ← R ⋃ {p} Else if f > f_{max} then R ← {p}; f_{max} ← f End Else If F = F_{1} or f ≥ f_{max} then P ← P ⋃ {p} End End End End 
ESU (FANMOD)
The sampling bias of Kashtan et al. ^{[9]} provided great impetus for designing better algorithms for the NM discovery problem. Although Kashtan et al. tried to settle this drawback by means of a weighting scheme, this method imposed an undesired overhead on the running time as well a more complicated implementation. This tool is one of the most useful ones, as it supports visual options and also is an efficient algorithm with respect to time. But, it has a limitation on motif size as it does not allow searching for motifs of size 9 or higher because of the way the tool is implemented.
Wernicke ^{[10]} introduced an algorithm named RANDESU that provides a significant improvement over mfinder.^{[9]} This algorithm, which is based on the exact enumeration algorithm ESU, has been implemented as an application called FANMOD.^{[10]} RANDESU is a NM discovery algorithm applicable for both directed and undirected networks, effectively exploits an unbiased node sampling throughout the network, and prevents overcounting subgraphs more than once. Furthermore, RANDESU uses a novel analytical approach called DIRECT for determining subgraph significance instead of using an ensemble of random networks as a Nullmodel. The DIRECT method estimates the subgraph concentration without explicitly generating random networks.^{[10]} Empirically, the DIRECT method is more efficient in comparison with the random network ensemble in case of subgraphs with a very low concentration; however, the classical Nullmodel is faster than the DIRECT method for highly concentrated subgraphs.^{[3]}^{[10]} In the following, we detail the ESU algorithm and then we show how this exact algorithm can be modified efficiently to RANDESU that estimates subgraphs concentrations.
The algorithms ESU and RANDESU are fairly simple, and hence easy to implement. ESU first finds the set of all induced subgraphs of size k, let S_{k} be this set. ESU can be implemented as a recursive function; the running of this function can be displayed as a treelike structure of depth k, called the ESUTree (see figure). Each of the ESUTree nodes indicate the status of the recursive function that entails two consecutive sets SUB and EXT. SUB refers to nodes in the target network that are adjacent and establish a partial subgraph of size SUB ≤ k. If SUB = k, the algorithm has found an induced complete subgraph, so S_{k} = SUB ∪ S_{k}. However, if SUB < k, the algorithm must expand SUB to achieve cardinality k. This is done by the EXT set that contains all the nodes that satisfy two conditions: First, each of the nodes in EXT must be adjacent to at least one of the nodes in SUB; second, their numerical labels must be larger than the label of first element in SUB. The first condition makes sure that the expansion of SUB nodes yields a connected graph and the second condition causes ESUTree leaves (see figure) to be distinct; as a result, it prevents overcounting. Note that, the EXT set is not a static set, so in each step it may expand by some new nodes that do not breach the two conditions. The next step of ESU involves classification of subgraphs placed in the ESUTree leaves into nonisomorphic sizek graph classes; consequently, ESU determines subgraphs frequencies and concentrations. This stage has been implemented simply by employing McKay's nauty algorithm,^{[28]}^{[29]} which classifies each subgraph by performing a graph isomorphism test. Therefore, ESU finds the set of all induced ksize subgraphs in a target graph by a recursive algorithm and then determines their frequency using an efficient tool.
The procedure of implementing RANDESU is quite straightforward and is one of the main advantages of FANMOD. One can change the ESU algorithm to explore just a portion of the ESUTree leaves by applying a probability value 0 ≤ p_{d} ≤ 1 for each level of the ESUTree and oblige ESU to traverse each child node of a node in level d1 with probability p_{d}. This new algorithm is called RANDESU. Evidently, when p_{d} = 1 for all levels, RANDESU acts like ESU. For p_{d} = 0 the algorithm finds nothing. Note that, this procedure ensures that the chances of visiting each leaf of the ESUTree are the same, resulting in unbiased sampling of subgraphs through the network. The probability of visiting each leaf is ∏_{d}p_{d} and this is identical for all of the ESUTree leaves; therefore, this method guarantees unbiased sampling of subgraphs from the network. Nonetheless, determining the value of p_{d} for 1 ≤ d ≤ k is another issue that must be determined manually by an expert to get precise results of subgraph concentrations.^{[8]} While there is no lucid prescript for this matter, the Wernicke provides some general observations that may help in determining p_d values. In summary, RANDESU is a very fast algorithm for NM discovery in the case of induced subgraphs supporting unbiased sampling method. Although, the main ESU algorithm and so the FANMOD tool is known for discovering induced subgraphs, there is trivial modification to ESU which makes it possible for finding noninduced subgraphs, too. The pseudo code of ESU (FANMOD) is shown below:
Enumeration of ESU (FANMOD) 

EnumerateSubgraphs(G,k)
Input: A graph G = (V, E) and an integer 1 ≤ k ≤ V. Output: All sizek subgraphs in G. for each vertex v ∈ V do VExtension ← {u ∈ N({v})  u > v} call ExtendSubgraph({v}, VExtension, v) endfor 
ExtendSubgraph(VSubgraph, VExtension, v)
if VSubgraph = k then output G[VSubgraph] and return while VExtension ≠ ∅ do Remove an arbitrarily chosen vertex w from VExtension VExtension′ ← VExtension ∪ {u ∈ N_{excl}(w, VSubgraph)  u > v} call ExtendSubgraph(VSubgraph ∪ {w}, VExtension′, v) return 
NeMoFinder
Chen et al. ^{[30]} introduced a new NM discovery algorithm called NeMoFinder, which adapts the idea in SPIN ^{[31]} to extract frequent trees and after that expands them into nonisomorphic graphs.^{[8]} NeMoFinder utilizes frequent sizen trees to partition the input network into a collection of sizen graphs, afterward finding frequent sizen subgraphs by expansion of frequent trees edgebyedge until getting a complete sizen graph K_{n}. The algorithm finds NMs in undirected networks and is not limited to extracting only induced subgraphs. Furthermore, NeMoFinder is an exact enumeration algorithm and is not based on a sampling method. As Chen et al. claim, NeMoFinder is applicable for detecting relatively large NMs, for instance, finding NMs up to size12 from the whole S. cerevisiae (yeast) PPI network as the authors claimed.^{[32]}
NeMoFinder consists of three main steps. First, finding frequent sizen trees, then utilizing repeated sizen trees to divide the entire network into a collection of sizen graphs, finally, performing subgraph join operations to find frequent sizen subgraphs.^{[30]} In the first step, the algorithm detects all nonisomorphic sizen trees and mappings from a tree to the network. In the second step, the ranges of these mappings are employed to partition the network into sizen graphs. Up to this step, there is no distinction between NeMoFinder and an exact enumeration method. However, a large portion of nonisomorphic sizen graphs still remain. NeMoFinder exploits a heuristic to enumerate nontree sizen graphs by the obtained information from the preceding steps. The main advantage of the algorithm is in the third step, which generates candidate subgraphs from previously enumerated subgraphs. This generation of new sizen subgraphs is done by joining each previous subgraph with derivative subgraphs from itself called cousin subgraphs. These new subgraphs contain one additional edge in comparison to the previous subgraphs. However, there exist some problems in generating new subgraphs: There is no clear method to derive cousins from a graph, joining a subgraph with its cousins leads to redundancy in generating particular subgraph more than once, and cousin determination is done by a canonical representation of the adjacency matrix which is not closed under join operation. NeMoFinder is an efficient network motif finding algorithm for motifs up to size 12 only for proteinprotein interaction networks, which are presented as undirected graphs. And it is not able to work on directed networks which are so important in the field of complex and biological networks. The pseudocode of NeMoFinder is shown below:
NeMoFinder 

Input:
G  PPI network; N  Number of randomized networks; K  Maximal network motif size; F  Frequency threshold; S  Uniqueness threshold; Output: U  Repeated and unique network motif set; D ← ∅; for motifsize k from 3 to K do T ← FindRepeatedTrees(k); GD_{k} ← GraphPartition(G, T) D ← D ∪ T; D′ ← T; i ← k; while D′ ≠ ∅ and i ≤ k × (k  1) / 2 do D′ ← FindRepeatedGraphs(k, i, D′); D ← D ∪ D′; i ← i + 1; end while end for for counter i from 1 to N do G_{rand} ← RandomizedNetworkGeneration(); for each g ∈ D do GetRandFrequency(g, G_{rand}); end for end for U ← ∅; for each g ∈ D do s ← GetUniqunessValue(g); if s ≥ S then U ← U ∪ {g}; end if end for return U; 
Grochow–Kellis
Grochow and Kellis ^{[33]} proposed an exact algorithm for enumerating subgraph appearances. The algorithm is based on a motifcentric approach, which means that the frequency of a given subgraph,called the query graph, is exhaustively determined by searching for all possible mappings from the query graph into the larger network. It is claimed ^{[33]} that a motifcentric method in comparison to networkcentric methods has some beneficial features. First of all it avoids the increased complexity of subgraph enumeration. Also, by using mapping instead of enumerating, it enables an improvement in the isomorphism test. To improve the performance of the algorithm, since it is an inefficient exact enumeration algorithm, the authors introduced a fast method which is called symmetrybreaking conditions. During straightforward subgraph isomorphism tests, a subgraph may be mapped to the same subgraph of the query graph multiple times. In the Grochow–Kellis (GK) algorithm symmetrybreaking is used to avoid such multiple mappings. Here we introduce the GK algorithm and the symmetrybreaking condition which eliminates redundant isomorphism tests.
The GK algorithm discovers the whole set of mappings of a given query graph to the network in two major steps. It starts with the computation of symmetrybreaking conditions of the query graph. Next, by means of a branchandbound method, the algorithm tries to find every possible mapping from the query graph to the network that meets the associated symmetrybreaking conditions. An example of the usage of symmetrybreaking conditions in GK algorithm is demonstrated in figure.
As it is mentioned above, the symmetrybreaking technique is a simple mechanism that precludes spending time finding a subgraph more than once due to its symmetries.^{[33]}^{[34]} Note that, computing symmetrybreaking conditions requires finding all automorphisms of a given query graph. Even though, there is no efficient (or polynomial time) algorithm for the graph automorphism problem, this problem can be tackled efficiently in practice by McKay's tools.^{[28]}^{[29]} As it is claimed, using symmetrybreaking conditions in NM detection lead to save a great deal of running time. Moreover, it can be inferred from the results in ^{[33]}^{[34]} that using the symmetrybreaking conditions results in high efficiency particularly for directed networks in comparison to undirected networks. The symmetrybreaking conditions used in the GK algorithm are similar to the restriction which ESU algorithm applies to the labels in EXT and SUB sets. In conclusion, the GK algorithm computes the exact number of appearance of a given query graph in a large complex network and exploiting symmetrybreaking conditions improves the algorithm performance. Also, GK algorithm is one of the known algorithms having no limitation for motif size in implementation and potentially it can find motifs of any size.
Colorcoding approach
Most algorithms in the field of NM discovery are used to find induced subgraphs of a network. In 2008, Noga Alon et al. ^{[35]} introduced an approach for finding noninduced subgraphs too. Their technique works on undirected networks such as PPI ones. Also, it counts noninduced trees and bounded treewidth subgraphs. This method is applied for subgraphs of size up to 10.
This algorithm counts the number of noninduced occurrences of a tree T with k = O(logn) vertices in a network G with n vertices as follows:
 Color coding. Color each vertex of input network G independently and uniformly at random with one of the k colors.
 Counting. Apply a dynamic programming routine to count the number of noninduced occurrences of T in which each vertex has a unique color. For more details on this step, see.^{[35]}
 Repeat the above two steps O(e^{k}) times and add up the number of occurrences of T to get an estimate on the number of its occurrences in G.
As available PPI networks are far from complete and error free, this approach is suitable for NM discovery for such networks. As Grochow–Kellis Algorithm and this one are the ones popular for noninduced subgraphs, it is worth to mention that the algorithm introduced by Alon et al. is less timeconsuming than the Grochow–Kellis Algorithm.^{[35]}
MODA
Omidi et al. ^{[36]} introduced a new algorithm for motif detection named MODA which is applicable for induced and noninduced NM discovery in undirected networks. It is based on the motifcentric approach discussed in the Grochow–Kellis algorithm section. It is very important to distinguish motifcentric algorithms such as MODA and GK algorithm because of their ability to work as queryfinding algorithms. This feature allows such algorithms to be able to find a single motif query or a small number of motif queries (not all possible subgraphs of a given size) with larger sizes. As the number of possible nonisomorphic subgraphs increases exponentially with subgraph size, for large size motifs (even larger than 10), the networkcentric algorithms, those looking for all possible subgraphs, face a problem. Although motifcentric algorithms also have problems in discovering all possible large size subgraphs, but their ability to find small numbers of them is sometimes a significant property.
Using a hierarchical structure called an expansion tree, the MODA algorithm is able to extract NMs of a given size systematically and similar to FPF that avoids enumerating unpromising subgraphs; MODA takes into consideration potential queries (or candidate subgraphs) that would result in frequent subgraphs. Despite the fact that MODA resembles FPF in using a tree like structure, the expansion tree is applicable merely for computing frequency concept F_{1}. As we will discuss next, the advantage of this algorithm is that it does not carry out the subgraph isomorphism test for nontree query graphs. Additionally, it utilizes a sampling method in order to speed up the running time of the algorithm.
Here is the main idea: by a simple criterion one can generalize a mapping of a ksize graph into the network to its same size supergraphs. For example, suppose there is mapping f(G) of graph G with k nodes into the network and we have a same size graph G′ with one more edge &langu, v〉; f_{G} will map G′ into the network, if there is an edge 〈f_{G}(u), f_{G}(v)〉 in the network. As a result, we can exploit the mapping set of a graph to determine the frequencies of its same order supergraphs simply in O(1) time without carrying out subgraph isomorphism testing. The algorithm starts ingeniously with minimally connected query graphs of size k and finds their mappings in the network via subgraph isomorphism. After that, with conservation of the graph size, it expands previously considered query graphs edgebyedge and computes the frequency of these expanded graphs as mentioned above. The expansion process continues until reaching a complete graph K_{k} (fully connected with 模板:Frac edge).
As discussed above, the algorithm starts by computing subtree frequencies in the network and then expands subtrees edge by edge. One way to implement this idea is called the expansion tree T_{k} for each k. Figure shows the expansion tree for size4 subgraphs. T_{k} organizes the running process and provides query graphs in a hierarchical manner. Strictly speaking, the expansion tree T_{k} is simply a directed acyclic graph or DAG, with its root number k indicating the graph size existing in the expansion tree and each of its other nodes containing the adjacency matrix of a distinct ksize query graph. Nodes in the first level of T_{k} are all distinct ksize trees and by traversing T_{k} in depth query graphs expand with one edge at each level. A query graph in a node is a subgraph of the query graph in a node's child with one edge difference. The longest path in T_{k} consists of (k^{2}3k+4)/2 edges and is the path from the root to the leaf node holding the complete graph. Generating expansion trees can be done by a simple routine which is explained in.^{[36]}
MODA traverses T_{k} and when it extracts query trees from the first level of T_{k} it computes their mapping sets and saves these mappings for the next step. For nontree queries from T_{k}, the algorithm extracts the mappings associated with the parent node in T_{k} and determines which of these mappings can support the current query graphs. The process will continue until the algorithm gets the complete query graph. The query tree mappings are extracted using the Grochow–Kellis algorithm. For computing the frequency of nontree query graphs, the algorithm employs a simple routine that takes O(1) steps. In addition, MODA exploits a sampling method where the sampling of each node in the network is linearly proportional to the node degree, the probability distribution is exactly similar to the wellknown BarabásiAlbert preferential attachment model in the field of complex networks.^{[37]} This approach generates approximations; however, the results are almost stable in different executions since subgraphs aggregate around highly connected nodes.^{[38]} The pseudocode of MODA is shown below:
MODA 

Input: G: Input graph, k: subgraph size, Δ: threshold value
Output: Frequent Subgraph List: List of all frequent ksize subgraphs Note: F_{G}: set of mappings from G in the input graph G fetch T_{k} do G′ = GetNextBFS(T_{k}) // G′ is a query graph if E(G′) = (k – 1) call MappingModule(G′, G) else call EnumeratingModule(G′, G, T_{k}) end if save F_{2} if F_{G} > Δ then add G′ into Frequent Subgraph List end if Until E(G') = (k – 1)/2 return Frequent Subgraph List 
Kavosh
A recently introduced algorithm named Kavosh ^{[39]} aims at improved main memory usage. Kavosh is usable to detect NM in both directed and undirected networks. The main idea of the enumeration is similar to the GK and MODA algorithms, which first find all ksize subgraphs that a particular node participated in, then remove the node, and subsequently repeat this process for the remaining nodes.^{[39]}
For counting the subgraphs of size k that include a particular node, trees with maximum depth of k, rooted at this node and based on neighborhood relationship are implicitly built. Children of each node include both incoming and outgoing adjacent nodes. To descend the tree, a child is chosen at each level with the restriction that a particular child can be included only if it has not been included at any upper level. After having descended to the lowest level possible, the tree is again ascended and the process is repeated with the stipulation that nodes visited in earlier paths of a descendant are now considered unvisited nodes. A final restriction in building trees is that all children in a particular tree must have numerical labels larger than the label of the root of the tree. The restrictions on the labels of the children are similar to the conditions which GK and ESU algorithm use to avoid overcounting subgraphs.
The protocol for extracting subgraphs makes use of the compositions of an integer. For the extraction of subgraphs of size k, all possible compositions of the integer k1 must be considered. The compositions of k1 consist of all possible manners of expressing k1 as a sum of positive integers. Summations in which the order of the summands differs are considered distinct. A composition can be expressed as k_{2},k_{3},…,k_{m} where k_{2} + k_{3} + … + k_{m} = k1. To count subgraphs based on the composition, k_{i} nodes are selected from the ith level of the tree to be nodes of the subgraphs (i = 2,3,…,m). The k1 selected nodes along with the node at the root define a subgraph within the network. After discovering a subgraph involved as a match in the target network, in order to be able to evaluate the size of each class according to the target network, Kavosh employs the nauty algorithm ^{[28]}^{[29]} in the same way as FANMOD. The enumeration part of Kavosh algorithm is shown below:
Enumeration of Kavosh 

Enumerate_Vertex(G, u, S, Remainder, i)
Input: G: Input graph if Remainder = 0 then 
Validate(G, Parents, u) Input: G: input graph, Parents: selected vertices of last layer, u: Root vertex. ValList ← NILL 
Recently a Cytoscape plugin called CytoKavosh ^{[40]} is developed for this software. It is available via Cytoscape web page [1].
GTries
In 2010, Pedro Ribeiro and Fernando Silva proposed a novel data structure for storing a collection of subgraphs, called a gtrie.^{[41]} This data structure, which is conceptually akin to a prefix tree, stores subgraphs according to their structures and finds occurrences of each of these subgraphs in a larger graph. One of the noticeable aspects of this data structure is that coming to the network motif discovery, the subgraphs in the main network are needed to be evaluated. So, there is no need to find the ones in random network which are not in the main network. This can be one of the timeconsuming parts in the algorithms in which all subgraphs in random networks are derived.
A gtrie is a multiway tree that can store a collection of graphs. Each tree node contains information about a single graph vertex and its corresponding edges to ancestor nodes. A path from the root to a leaf corresponds to one single graph. Descendants of a gtrie node share a common subgraph. Constructing a gtrie is well described in.^{[41]} After constructing a gtrie, the counting part takes place. The main idea in counting process is to backtrack by all possible subgraphs, but at the same time do the isomorphism tests. This backtracking technique is essentially the same technique employed by other motifcentric approaches like MODA and GK algorithms. Taking advantage of common substructures in the sense that at a given time there is a partial isomorphic match for several different candidate subgraphs.
Among the mentioned algorithms, GTries is the fastest. But, the excessive use of memory is the drawback of this algorithm, which might limit the size of discoverable motifs by a personal computer with average memory.
Comparison
Tables and figure below show the results of running the mentioned algorithms on different standard networks. These results are taken from the corresponding sources,^{[36]}^{[39]}^{[41]} thus they should be treated individually.
Network  Size  Census Original Network  Average Census on Random Networks  

FANMOD  GK  GTrie  FANMOD  GK  GTrie  
Dolphins  5  0.07  0.03  0.01  0.13  0.04  0.01 
6  0.48  0.28  0.04  1.14  0.35  0.07  
7  3.02  3.44  0.23  8.34  3.55  0.46  
8  19.44  73.16  1.69  67.94  37.31  4.03  
9  100.86  2984.22  6.98  493.98  366.79  24.84  
Circuit  6  0.49  0.41  0.03  0.55  0.24  0.03 
7  3.28  3.73  0.22  3.53  1.34  0.17  
8  17.78  48.00  1.52  21.42  7.91  1.06  
Social  3  0.31  0.11  0.02  0.35  0.11  0.02 
4  7.78  1.37  0.56  13.27  1.86  0.57  
5  208.30  31.85  14.88  531.65  62.66  22.11  
Yeast  3  0.47  0.33  0.02  0.57  0.35  0.02 
4  10.07  2.04  0.36  12.90  2.25  0.41  
5  268.51  34.10  12.73  400.13  47.16  14.98  
Power  3  0.51  1.46  0.00  0.91  1.37  0.01 
4  1.38  4.34  0.02  3.01  4.40  0.03  
5  4.68  16.95  0.10  12.38  17.54  0.14  
6  20.36  95.58  0.55  67.65  92.74  0.88  
7  101.04  765.91  3.36  408.15  630.65  5.17 
Size →  3  4  5  6  7  8  9  10  

Networks ↓  Algorithms ↓  
E. coli  Kavosh  0.30  1.84  14.91  141.98  1374.0  13173.7  121110.3  1120560.1 
FANMOD  0.81  2.53  15.71  132.24  1205.9  9256.6      
Mavisto  13532                
Mfinder  31.0  297  23671            
Electronic  Kavosh  0.08  0.36  8.02  11.39  77.22  422.6  2823.7  18037.5 
FANMOD  0.53  1.06  4.34  24.24  160  967.99      
Mavisto  210.0  1727              
Mfinder  7  14  109.8  2020.2          
Social  Kavosh  0.04  0.23  1.63  10.48  69.43  415.66  2594.19  14611.23 
FANMOD  0.46  0.84  3.07  17.63  117.43  845.93      
Mavisto  393  1492              
Mfinder  12  49  798  181077         
Classification of algorithms
As seen in the table, motif discovery algorithms can be divided into two general categories: those based on exact counting and those using statistical sampling and estimations instead. Because the second group does not count all the occurrences of a subgraph in the main network, the algorithms belonging to this group are faster, but they might yield in biased and unrealistic results.
In the next level, the exact counting algorithms can be classified to networkcentric and subgraphcentric methods. The algorithms of the first class search the given network for all subgraphs of a given size, while the algorithms falling into the second class first generate different possible nonisomorphic graphs of the given size, and then explore the network for each generated subgraph separately. Each approach has its advantages and disadvantages which are discussed above.
On the other hand, estimation methods might utilize colorcoding approach as described before. Other approaches used in this category usually skip some subgraphs during enumeration (e.g., as in FANMOD) and base their estimation on the enumerated subgraphs.
Furthermore, table indicates whether an algorithm can be used for directed or undirected networks as well as induced or noninduced subgraphs. For more information refer to the provided web links or lab addresses.
Counting Method  Basis  Name  Directed / Undirected  Induced / NonInduced 

Exact  NetworkCentric  mfinder  Both  Induced 
ESU (FANMOD)  Both  Induced  
Kavosh (used in CytoKavosh)  Both  Induced  
GTries  Both  Induced  
PGD  Undirected  Induced  
SubgraphCentric  FPF (Mavisto)  Both  Induced  
NeMoFinder  Undirected  Induced  
Grochow–Kellis  Both  Both  
MODA  Both  Both  
Estimation / Sampling  ColorCoding Approach  N. Alon et al.’s Algorithm  Undirected  NonInduced 
Other Approaches  mfinder  Both  Induced  
ESU (FANMOD)  Both  Induced 
Algorithm  Lab / Dept. Name  Department / School  Institute  Address  

mfinder  Uri Alon's Group  Department of Molecular Cell Biology  Weizmann Institute of Science  Rehovot, Israel, Wolfson, Rm. 607  urialon at weizmann.ac.il 
FPF (Mavisto)      LeibnizInstitut für Pflanzengenetik und Kulturpflanzenforschung (IPK)  Corrensstraße 3, D06466 Stadt Seeland, OT Gatersleben, Deutschland  schreibe at ipkgatersleben.de 
ESU (FANMOD)  Lehrstuhl Theoretische Informatik I  Institut für Informatik  FriedrichSchillerUniversität Jena  ErnstAbbePlatz 2,D07743 Jena, Deutschland  sebastian.wernicke at gmail.com 
NeMoFinder    School of Computing  National University of Singapore  Singapore 119077  chenjin at comp.nus.edu.sg 
Grochow–Kellis  CS Theory Group & Complex Systems Group  Computer Science  University of Colorado, Boulder  1111 Engineering Dr. ECOT 717, 430 UCB Boulder, CO 803090430 USA  jgrochow at colorado.edu 
N. Alon et al.’s Algorithm  Department of Pure Mathematics  School of Mathematical Sciences  Tel Aviv University  Tel Aviv 69978, Israel  nogaa at post.tau.ac.il 
MODA  Laboratory of Systems Biology and Bioinformatics (LBB)  Institute of Biochemistry and Biophysics (IBB)  University of Tehran  Enghelab Square, Enghelab Ave, Tehran, Iran  amasoudin at ibb.ut.ac.ir 
Kavosh (used in CytoKavosh)  Laboratory of Systems Biology and Bioinformatics (LBB)  Institute of Biochemistry and Biophysics (IBB)  University of Tehran  Enghelab Square, Enghelab Ave, Tehran, Iran  amasoudin at ibb.ut.ac.ir 
GTries  Center for Research in Advanced Computing Systems  Computer Science  University of Porto  Rua Campo Alegre 1021/1055, Porto, Portugal  pribeiro at dcc.fc.up.pt 
PGD  Network Learning and Discovery Lab  Department of Computer Science  Purdue University  Purdue University, 305 N University St, West Lafayette, IN 47907  nkahmed@purdue.edu 
Wellestablished motifs and their functions
Much experimental work has been devoted to understanding network motifs in gene regulatory networks. These networks control which genes are expressed in the cell in response to biological signals. The network is defined such that genes are nodes, and directed edges represent the control of one gene by a transcription factor (regulatory protein that binds DNA) encoded by another gene. Thus, network motifs are patterns of genes regulating each other's transcription rate. When analyzing transcription networks, it is seen that the same network motifs appear again and again in diverse organisms from bacteria to human. The transcription network of E. coli and yeast, for example, is made of three main motif families, that make up almost the entire network. The leading hypothesis is that the network motif were independently selected by evolutionary processes in a converging manner,^{[42]}^{[43]} since the creation or elimination of regulatory interactions is fast on evolutionary time scale, relative to the rate at which genes change,^{[42]}^{[43]}^{[44]} Furthermore, experiments on the dynamics generated by network motifs in living cells indicate that they have characteristic dynamical functions. This suggests that the network motif serve as building blocks in gene regulatory networks that are beneficial to the organism.
The functions associated with common network motifs in transcription networks were explored and demonstrated by several research projects both theoretically and experimentally. Below are some of the most common network motifs and their associated function.
Negative autoregulation (NAR)
One of simplest and most abundant network motifs in E. coli is negative autoregulation in which a transcription factor (TF) represses its own transcription. This motif was shown to perform two important functions. The first function is response acceleration. NAR was shown to speedup the response to signals both theoretically ^{[45]} and experimentally. This was first shown in a synthetic transcription network^{[46]} and later on in the natural context in the SOS DNA repair system of E .coli.^{[47]} The second function is increased stability of the autoregulated gene product concentration against stochastic noise, thus reducing variations in protein levels between different cells.^{[48]}^{[49]}^{[50]}
Positive autoregulation (PAR)
Positive autoregulation (PAR) occurs when a transcription factor enhances its own rate of production. Opposite to the NAR motif this motif slows the response time compared to simple regulation.^{[51]} In the case of a strong PAR the motif may lead to a bimodal distribution of protein levels in cell populations.^{[52]}
Feedforward loops (FFL)
This motif is commonly found in many gene systems and organisms. The motif consists of three genes and three regulatory interactions. The target gene C is regulated by 2 TFs A and B and in addition TF B is also regulated by TF A . Since each of the regulatory interactions may either be positive or negative there are possibly eight types of FFL motifs.^{[53]} Two of those eight types: the coherent type 1 FFL (C1FFL) (where all interactions are positive) and the incoherent type 1 FFL (I1FFL) (A activates C and also activates B which represses C) are found much more frequently in the transcription network of E. coli and yeast than the other six types.^{[53]}^{[54]} In addition to the structure of the circuitry the way in which the signals from A and B are integrated by the C promoter should also be considered. In most of the cases the FFL is either an AND gate (A and B are required for C activation) or OR gate (either A or B are sufficient for C activation) but other input function are also possible.
Coherent type 1 FFL (C1FFL)
The C1FFL with an AND gate was shown to have a function of a ‘signsensitive delay’ element and a persistence detector both theoretically ^{[53]} and experimentally^{[55]} with the arabinose system of E. coli. This means that this motif can provide pulse filtration in which short pulses of signal will not generate a response but persistent signals will generate a response after short delay. The shut off of the output when a persistent pulse is ended will be fast. The opposite behavior emerges in the case of a sum gate with fast response and delayed shut off as was demonstrated in the flagella system of E. coli.^{[56]} De novo evolution of C1FFLs in gene regulatory networks has been demonstrated computationally in response to selection to filter out an idealized short signal pulse, but for nonidealized noise, a dynamicsbased system of feedforward regulation with different topology was instead favored.^{[57]}
Incoherent type 1 FFL (I1FFL)
The I1FFL is a pulse generator and response accelerator. The two signal pathways of the I1FFL act in opposite directions where one pathway activates Z and the other represses it. When the repression is complete this leads to a pulselike dynamics. It was also demonstrated experimentally that the I1FFL can serve as response accelerator in a way which is similar to the NAR motif. The difference is that the I1FFL can speedup the response of any gene and not necessarily a transcription factor gene.^{[58]} An additional function was assigned to the I1FFL network motif: it was shown both theoretically and experimentally that the I1FFL can generate nonmonotonic input function in both a synthetic ^{[59]} and native systems.^{[60]} Finally, expression units that incorporate incoherent feedforward control of the gene product provide adaptation to the amount of DNA template and can be superior to simple combinations of constitutive promoters.^{[61]} Feedforward regulation displayed better adaptation than negative feedback, and circuits based on RNA interference were the most robust to variation in DNA template amounts.^{[61]}
Multioutput FFLs
In some cases the same regulators X and Y regulate several Z genes of the same system. By adjusting the strength of the interactions this motif was shown to determine the temporal order of gene activation. This was demonstrated experimentally in the flagella system of E. coli.^{[62]}
Singleinput modules (SIM)
This motif occurs when a single regulator regulates a set of genes with no additional regulation. This is useful when the genes are cooperatively carrying out a specific function and therefore always need to be activated in a synchronized manner. By adjusting the strength of the interactions it can create temporal expression program of the genes it regulates.^{[63]}
In the literature, Multipleinput modules (MIM) arose as a generalization of SIM. However, the precise definitions of SIM and MIM have been a source of inconsistency. There are attempts to provide orthogonal definitions for canonical motifs in biological networks and algorithms to enumerate them, especially SIM, MIM and BiFan (2x2 MIM).^{[64]}
Dense overlapping regulons (DOR)
This motif occurs in the case that several regulators combinatorially control a set of genes with diverse regulatory combinations. This motif was found in E. coli in various systems such as carbon utilization, anaerobic growth, stress response and others.^{[17]}^{[22]} In order to better understand the function of this motif one has to obtain more information about the way the multiple inputs are integrated by the genes. Kaplan et al.^{[65]} has mapped the input functions of the sugar utilization genes in E. coli, showing diverse shapes.
Activity motifs
An interesting generalization of the networkmotifs, activity motifs are over occurring patterns that can be found when nodes and edges in the network are annotated with quantitative features. For instance, when edges in a metabolic pathways are annotated with the magnitude or timing of the corresponding gene expression, some patterns are over occurring given the underlying network structure.^{[66]}
Criticism
An assumption (sometimes more sometimes less implicit) behind the preservation of a topological substructure is that it is of a particular functional importance. This assumption has recently been questioned. Some authors have argued that motifs, like bifan motifs, might show a variety depending on the network context, and therefore,^{[67]} structure of the motif does not necessarily determine function. Network structure certainly does not always indicate function; this is an idea that has been around for some time, for an example see the Sin operon.^{[68]}
Most analyses of motif function are carried out looking at the motif operating in isolation. Recent research^{[69]} provides good evidence that network context, i.e. the connections of the motif to the rest of the network, is too important to draw inferences on function from local structure only — the cited paper also reviews the criticisms and alternative explanations for the observed data. An analysis of the impact of a single motif module on the global dynamics of a network is studied in.^{[70]} Yet another recent work suggests that certain topological features of biological networks naturally give rise to the common appearance of canonical motifs, thereby questioning whether frequencies of occurrences are reasonable evidence that the structures of motifs are selected for their functional contribution to the operation of networks.^{[71]}^{[72]}
While the study of motifs was mostly applied to static complex networks, research of temporal complex networks^{[73]} suggested a significant reinterpretation of network motifs, and introduced the concept of temporal network motifs. Braha and BarYam^{[74]} studied the dynamics of local motif structure in timedependent/temporal networks, and find recurrent patterns that might provide empirical evidence for cycles of social interaction. Counter to the perspective of stable motifs and motif profiles in complex networks, they demonstrated that for temporal networks the local structure is timedependent and might evolve over time. Braha and BarYam further suggested that analyzing the temporal local structure might provide important information about the dynamics of systemlevel task and functionality.
See also
References
 ↑ MasoudiNejad A, Schreiber F, Razaghi MK Z (2012). "Building Blocks of Biological Networks: A Review on Major Network Motif Discovery Algorithms". IET Systems Biology. 6 (5): 164–74. doi:10.1049/ietsyb.2011.0011. PMID 23101871.
 ↑ Diestel R (2005). "Graph Theory (Graduate Texts in Mathematics)". 173. New York: SpringerVerlag Heidelberg.
{{cite journal}}
: Cite journal requiresjournal=
(help)  ↑ ^{3.0} ^{3.1} ^{3.2} Milo R, ShenOrr SS, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (2002). "Network motifs: simple building blocks of complex networks". Science. 298 (5594): 824–827. Bibcode:2002Sci...298..824M. CiteSeerX 10.1.1.225.8750. doi:10.1126/science.298.5594.824. PMID 12399590.
 ↑ Albert R, Barabási AL (2002). "Statistical mechanics of complex networks". Reviews of Modern Physics. 74 (1): 47–49. arXiv:condmat/0106096. Bibcode:2002RvMP...74...47A. CiteSeerX 10.1.1.242.4753. doi:10.1103/RevModPhys.74.47.
 ↑ ^{5.0} ^{5.1} Milo R, Itzkovitz S, Kashtan N, Levitt R, ShenOrr S, Ayzenshtat I, Sheffer M, Alon U (2004). "Superfamilies of designed and evolved networks". Science. 303 (5663): 1538–1542. Bibcode:2004Sci...303.1538M. doi:10.1126/science.1089167. PMID 15001784.
 ↑ ^{6.0} ^{6.1} Schwöbbermeyer, H (2008). "Network Motifs". In Junker BH, Schreiber F (ed.). Analysis of Biological Networks. Hoboken, New Jersey: John Wiley & Sons. pp. 85–108.
 ↑ Bornholdt, S; Schuster, HG (2003). "Handbook of graphs and networks : from the genome to the Internet". Handbook of Graphs and Networks: From the Genome to the Internet. p. 417. Bibcode:2003hgnf.book.....B.
 ↑ ^{8.0} ^{8.1} ^{8.2} Ciriello G, Guerra C (2008). "A review on models and algorithms for motif discovery in proteinprotein interaction networks". Briefings in Functional Genomics and Proteomics. 7 (2): 147–156. doi:10.1093/bfgp/eln015. PMID 18443014.
 ↑ ^{9.0} ^{9.1} ^{9.2} ^{9.3} ^{9.4} ^{9.5} Kashtan N, Itzkovitz S, Milo R, Alon U (2004). "Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs". Bioinformatics. 20 (11): 1746–1758. doi:10.1093/bioinformatics/bth163. PMID 15001476.
 ↑ ^{10.0} ^{10.1} ^{10.2} ^{10.3} ^{10.4} ^{10.5} Wernicke S (2006). "Efficient detection of network motifs". IEEE/ACM Transactions on Computational Biology and Bioinformatics. 3 (4): 347–359. CiteSeerX 10.1.1.304.2576. doi:10.1109/tcbb.2006.51. PMID 17085844.
 ↑ Picard F, Daudin JJ, Schbath S, Robin S (2005). "Assessing the Exceptionality of Network Motifs". J. Comp. Bio. 15 (1): 1–20. CiteSeerX 10.1.1.475.4300. doi:10.1089/cmb.2007.0137. PMID 18257674.
 ↑ ^{12.0} ^{12.1} ^{12.2} Frequency concepts and pattern detection for the analysis of motifs in networks. Lecture Notes in Computer Science. 3737. 2005. pp. 89–104. doi:10.1007/11599128_7. ISBN 9783540308836.
 ↑ Holland, P. W., & Leinhardt, S. (1974). The statistical analysis of local structure in social networks. Working Paper No. 44, National Bureau of Economic Research.
 ↑ Hollandi, P., & Leinhardt, S. (1975). The Statistical Analysis of Local. Structure in Social Networks. Sociological Methodology, David Heise, ed. San Francisco: JoseyBass.
 ↑ Holland, P. W., & Leinhardt, S. (1976). Local structure in social networks. Sociological methodology, 7, 145.
 ↑ Holland, P. W., & Leinhardt, S. (1977). A method for detecting structure in sociometric data. In Social Networks (pp. 411432). Academic Press.
 ↑ ^{17.0} ^{17.1} ^{17.2} ShenOrr SS, Milo R, Mangan S, Alon U (May 2002). "Network motifs in the transcriptional regulation network of Escherichia coli". Nat. Genet. 31 (1): 64–8. doi:10.1038/ng881. PMID 11967538.
 ↑ Eichenberger P, Fujita M, Jensen ST, et al. (October 2004). "The program of gene transcription for a single differentiating cell type during sporulation in Bacillus subtilis". PLOS Biology. 2 (10): e328. doi:10.1371/journal.pbio.0020328. PMC 517825. PMID 15383836.
 ↑ Milo R, ShenOrr S, Itzkovitz S, Kashtan N, Chklovskii D, Alon U (October 2002). "Network motifs: simple building blocks of complex networks". Science. 298 (5594): 824–7. Bibcode:2002Sci...298..824M. CiteSeerX 10.1.1.225.8750. doi:10.1126/science.298.5594.824. PMID 12399590.
 ↑ Lee TI, Rinaldi NJ, Robert F, et al. (October 2002). "Transcriptional regulatory networks in Saccharomyces cerevisiae". Science. 298 (5594): 799–804. Bibcode:2002Sci...298..799L. doi:10.1126/science.1075090. PMID 12399584.
 ↑ Odom DT, Zizlsperger N, Gordon DB, et al. (February 2004). "Control of pancreas and liver gene expression by HNF transcription factors". Science. 303 (5662): 1378–81. Bibcode:2004Sci...303.1378O. doi:10.1126/science.1089769. PMC 3012624. PMID 14988562.
 ↑ ^{22.0} ^{22.1} Boyer LA, Lee TI, Cole MF, et al. (September 2005). "Core transcriptional regulatory circuitry in human embryonic stem cells". Cell. 122 (6): 947–56. doi:10.1016/j.cell.2005.08.020. PMC 3006442. PMID 16153702.
 ↑ Iranfar N, Fuller D, Loomis WF (February 2006). "Transcriptional regulation of postaggregation genes in Dictyostelium by a feedforward loop involving GBF and LagC". Dev. Biol. 290 (2): 460–9. doi:10.1016/j.ydbio.2005.11.035. PMID 16386729.
 ↑ Ma'ayan A, Jenkins SL, Neves S, et al. (August 2005). "Formation of regulatory patterns during signal propagation in a Mammalian cellular network". Science. 309 (5737): 1078–83. Bibcode:2005Sci...309.1078M. doi:10.1126/science.1108876. PMC 3032439. PMID 16099987.
 ↑ Ptacek J, Devgan G, Michaud G, et al. (December 2005). "Global analysis of protein phosphorylation in yeast" (PDF). Nature (Submitted manuscript). 438 (7068): 679–84. Bibcode:2005Natur.438..679P. doi:10.1038/nature04187. PMID 16319894.
 ↑ "AccMotif: Accelerated Motif Detection".
 ↑ Schreiber F, Schwobbermeyer H (2005). "MAVisto: a tool for the exploration of network motifs". Bioinformatics. 21 (17): 3572–3574. doi:10.1093/bioinformatics/bti556. PMID 16020473.
 ↑ ^{28.0} ^{28.1} ^{28.2} McKay BD (1981). "Practical graph isomorphism". Congressus Numerantium. 30: 45–87. arXiv:1301.1493. Bibcode:2013arXiv1301.1493M.
 ↑ ^{29.0} ^{29.1} ^{29.2} McKay BD (1998). "Isomorphfree exhaustive generation". Journal of Algorithms. 26 (2): 306–324. doi:10.1006/jagm.1997.0898.
 ↑ ^{30.0} ^{30.1} Chen J, Hsu W, Li Lee M, et al. (2006). NeMoFinder: dissecting genomewide proteinprotein interactions with mesoscale network motifs. the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. Philadelphia, Pennsylvania, USA. pp. 106–115.
 ↑ Huan J, Wang W, Prins J, et al. (2004). SPIN: mining maximal frequent subgraphs from graph databases. the 10th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 581–586.
 ↑ Uetz P, Giot L, Cagney G, et al. (2000). "A comprehensive analysis of proteinprotein interactions in Saccharomyces cerevisiae". Nature. 403 (6770): 623–627. Bibcode:2000Natur.403..623U. doi:10.1038/35001009. PMID 10688190.
 ↑ ^{33.0} ^{33.1} ^{33.2} ^{33.3} Grochow JA, Kellis M (2007). Network Motif Discovery Using Subgraph Enumeration and SymmetryBreaking (PDF). RECOMB. pp. 92–106. doi:10.1007/9783540716815_7.
 ↑ ^{34.0} ^{34.1} Grochow JA (2006). On the structure and evolution of protein interaction networks (PDF). Thesis M. Eng., Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science.
 ↑ ^{35.0} ^{35.1} ^{35.2} Alon N; Dao P; Hajirasouliha I; Hormozdiari F; Sahinalp S.C (2008). "Biomolecular network motif counting and discovery by color coding". Bioinformatics. 24 (13): i241–i249. doi:10.1093/bioinformatics/btn163. PMC 2718641. PMID 18586721.
 ↑ ^{36.0} ^{36.1} ^{36.2} ^{36.3} ^{36.4} Omidi S, Schreiber F, MasoudiNejad A (2009). "MODA: an efficient algorithm for network motif discovery in biological networks". Genes Genet Syst. 84 (5): 385–395. doi:10.1266/ggs.84.385. PMID 20154426.
 ↑ Barabasi AL, Albert R (1999). "Emergence of scaling in random networks". Science. 286 (5439): 509–512. arXiv:condmat/9910332. Bibcode:1999Sci...286..509B. doi:10.1126/science.286.5439.509. PMID 10521342.
 ↑ Vázquez A, Dobrin R, Sergi D, et al. (2004). "The topological relationship between the largescale attributes and local interaction patterns of complex networks". PNAS. 101 (52): 17940–17945. arXiv:condmat/0408431. Bibcode:2004PNAS..10117940V. doi:10.1073/pnas.0406024101. PMC 539752. PMID 15598746.
 ↑ ^{39.0} ^{39.1} ^{39.2} ^{39.3} Kashani ZR, Ahrabian H, Elahi E, NowzariDalini A, Ansari ES, Asadi S, Mohammadi S, Schreiber F, MasoudiNejad A (2009). "Kavosh: a new algorithm for finding network motifs". BMC Bioinformatics. 10 (318): 318. doi:10.1186/1471210510318. PMC 2765973. PMID 19799800.
 ↑ Ali MasoudiNejad; Mitra Anasariola; Ali SalehzadehYazdi; Sahand Khakabimamaghani (2012). "CytoKavosh: a Cytoscape Plugin for Finding Network Motifs in Large Biological Networks". PLoS ONE. 7 (8): e43287. Bibcode:2012PLoSO...743287M. doi:10.1371/journal.pone.0043287. PMC 3430699. PMID 22952659.
 ↑ ^{41.0} ^{41.1} ^{41.2} ^{41.3} Ribeiro P, Silva F (2010). GTries: an efficient data structure for discovering network motifs. ACM 25th Symposium On Applied Computing  Bioinformatics Track. Sierre, Switzerland. pp. 1559–1566.
 ↑ ^{42.0} ^{42.1} Babu MM, Luscombe NM, Aravind L, Gerstein M, Teichmann SA (June 2004). "Structure and evolution of transcriptional regulatory networks". Current Opinion in Structural Biology. 14 (3): 283–91. CiteSeerX 10.1.1.471.9692. doi:10.1016/j.sbi.2004.05.004. PMID 15193307.
 ↑ ^{43.0} ^{43.1} Conant GC, Wagner A (July 2003). "Convergent evolution of gene circuits". Nat. Genet. 34 (3): 264–6. doi:10.1038/ng1181. PMID 12819781.
 ↑ Dekel E, Alon U (July 2005). "Optimality and evolutionary tuning of the expression level of a protein". Nature. 436 (7050): 588–92. Bibcode:2005Natur.436..588D. doi:10.1038/nature03842. PMID 16049495.
 ↑ Zabet NR (September 2011). "Negative feedback and physical limits of genes". Journal of Theoretical Biology. 284 (1): 82–91. arXiv:1408.1869. CiteSeerX 10.1.1.759.5418. doi:10.1016/j.jtbi.2011.06.021. PMID 21723295.
 ↑ Rosenfeld N, Elowitz MB, Alon U (November 2002). "Negative autoregulation speeds the response times of transcription networks". J. Mol. Biol. 323 (5): 785–93. CiteSeerX 10.1.1.126.2604. doi:10.1016/S00222836(02)009944. PMID 12417193.
 ↑ Camas FM, Blázquez J, Poyatos JF (August 2006). "Autogenous and nonautogenous control of response in a genetic network". Proc. Natl. Acad. Sci. U.S.A. 103 (34): 12718–23. Bibcode:2006PNAS..10312718C. doi:10.1073/pnas.0602119103. PMC 1568915. PMID 16908855.
 ↑ Becskei A, Serrano L (June 2000). "Engineering stability in gene networks by autoregulation". Nature. 405 (6786): 590–3. doi:10.1038/35014651. PMID 10850721.
 ↑ Dublanche Y, Michalodimitrakis K, Kümmerer N, Foglierini M, Serrano L (2006). "Noise in transcription negative feedback loops: simulation and experimental analysis". Mol. Syst. Biol. 2 (1): 41. doi:10.1038/msb4100081. PMC 1681513. PMID 16883354.
 ↑ Shimoga V, White J, Li Y, Sontag E, Bleris L (2013). "Synthetic mammalian transgene negative autoregulation". Mol. Syst. Biol. 9: 670. doi:10.1038/msb.2013.27. PMC 3964311. PMID 23736683.
 ↑ Maeda YT, Sano M (June 2006). "Regulatory dynamics of synthetic gene networks with positive feedback". J. Mol. Biol. 359 (4): 1107–24. doi:10.1016/j.jmb.2006.03.064. PMID 16701695.
 ↑ Becskei A, Séraphin B, Serrano L (May 2001). "Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion". EMBO J. 20 (10): 2528–35. doi:10.1093/emboj/20.10.2528. PMC 125456. PMID 11350942.
 ↑ ^{53.0} ^{53.1} ^{53.2} Mangan S, Alon U (October 2003). "Structure and function of the feedforward loop network motif". Proc. Natl. Acad. Sci. U.S.A. 100 (21): 11980–5. Bibcode:2003PNAS..10011980M. doi:10.1073/pnas.2133841100. PMC 218699. PMID 14530388.
 ↑ Ma HW, Kumar B, Ditges U, Gunzer F, Buer J, Zeng AP (2004). "An extended transcriptional regulatory network of Escherichia coli and analysis of its hierarchical structure and network motifs". Nucleic Acids Res. 32 (22): 6643–9. doi:10.1093/nar/gkh1009. PMC 545451. PMID 15604458.
 ↑ Mangan S, Zaslaver A, Alon U (November 2003). "The coherent feedforward loop serves as a signsensitive delay element in transcription networks". J. Mol. Biol. 334 (2): 197–204. CiteSeerX 10.1.1.110.4629. doi:10.1016/j.jmb.2003.09.049. PMID 14607112.
 ↑ Kalir S, Mangan S, Alon U (2005). "A coherent feedforward loop with a SUM input function prolongs flagella expression in Escherichia coli". Mol. Syst. Biol. 1 (1): E1–E6. doi:10.1038/msb4100010. PMC 1681456. PMID 16729041.
 ↑ Xiong, Kun; Lancaster, Alex K.; Siegal, Mark L.; Masel, Joanna (3 June 2019). "Feedforward regulation adaptively evolves via dynamics rather than topology when there is intrinsic noise". Nature Communications. 10 (1): 2418. doi:10.1038/s41467019103886. PMC 6546794. PMID 31160574.
 ↑ Mangan S, Itzkovitz S, Zaslaver A, Alon U (March 2006). "The incoherent feedforward loop accelerates the responsetime of the gal system of Escherichia coli". J. Mol. Biol. 356 (5): 1073–81. CiteSeerX 10.1.1.184.8360. doi:10.1016/j.jmb.2005.12.003. PMID 16406067.
 ↑ Entus R, Aufderheide B, Sauro HM (August 2007). "Design and implementation of three incoherent feedforward motif based biological concentration sensors". Syst Synth Biol. 1 (3): 119–28. doi:10.1007/s1169300790086. PMC 2398716. PMID 19003446.
 ↑ Kaplan S, Bren A, Dekel E, Alon U (2008). "The incoherent feedforward loop can generate nonmonotonic input functions for genes". Mol. Syst. Biol. 4 (1): 203. doi:10.1038/msb.2008.43. PMC 2516365. PMID 18628744.
 ↑ ^{61.0} ^{61.1} Bleris L, Xie Z, Glass D, Adadey A, Sontag E, Benenson Y (2011). "Synthetic incoherent feedforward circuits show adaptation to the amount of their genetic template". Mol. Syst. Biol. 7 (1): 519. doi:10.1038/msb.2011.49. PMC 3202791. PMID 21811230.
 ↑ Kalir S, McClure J, Pabbaraju K, et al. (June 2001). "Ordering genes in a flagella pathway by analysis of expression kinetics from living bacteria". Science. 292 (5524): 2080–3. doi:10.1126/science.1058758. PMID 11408658.
 ↑ Zaslaver A, Mayo AE, Rosenberg R, et al. (May 2004). "Justintime transcription program in metabolic pathways". Nat. Genet. 36 (5): 486–91. doi:10.1038/ng1348. PMID 15107854.
 ↑ Konagurthu AS, Lesk AM (2008). "Single and Multiple Input Modules in regulatory networks". Proteins. 73 (2): 320–324. doi:10.1002/prot.22053. PMID 18433061.
 ↑ Kaplan S, Bren A, Zaslaver A, Dekel E, Alon U (March 2008). "Diverse twodimensional input functions control bacterial sugar genes". Mol. Cell. 29 (6): 786–92. doi:10.1016/j.molcel.2008.01.021. PMC 2366073. PMID 18374652.
 ↑ Chechik G, Oh E, Rando O, Weissman J, Regev A, Koller D (November 2008). "Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network". Nat. Biotechnol. 26 (11): 1251–9. doi:10.1038/nbt.1499. PMC 2651818. PMID 18953355.
 ↑ Ingram PJ, Stumpf MP, Stark J (2006). "Network motifs: structure does not determine function". BMC Genomics. 7: 108. doi:10.1186/147121647108. PMC 1488845. PMID 16677373.
 ↑ Voigt CA, Wolf DM, Arkin AP (March 2005). "The Bacillus subtilis sin operon: an evolvable network motif". Genetics. 169 (3): 1187–202. doi:10.1534/genetics.104.031955. PMC 1449569. PMID 15466432.
 ↑ Knabe JF, Nehaniv CL, Schilstra MJ (2008). "Do motifs reflect evolved function?—No convergent evolution of genetic regulatory network subgraph topologies". BioSystems. 94 (1–2): 68–74. doi:10.1016/j.biosystems.2008.05.012. PMID 18611431.
 ↑ Taylor D, Restrepo JG (2011). "Network connectivity during mergers and growth: Optimizing the addition of a module". Physical Review E. 83 (6): 66112. arXiv:1102.4876. Bibcode:2011PhRvE..83f6112T. doi:10.1103/PhysRevE.83.066112. PMID 21797446.
 ↑ Konagurthu, Arun S.; Lesk, Arthur M. (23 April 2008). "Single and multiple input modules in regulatory networks". Proteins: Structure, Function, and Bioinformatics. 73 (2): 320–324. doi:10.1002/prot.22053. PMID 18433061.
 ↑ Konagurthu AS, Lesk AM (2008). "On the origin of distribution patterns of motifs in biological networks". BMC Syst Biol. 2: 73. doi:10.1186/17520509273. PMC 2538512. PMID 18700017.
 ↑ Braha, D., & Bar‐Yam, Y. (2006). From centrality to temporary fame: Dynamic centrality in complex networks. Complexity, 12(2), 5963.
 ↑ Braha D., BarYam Y. (2009) TimeDependent Complex Networks: Dynamic Centrality, Dynamic Motifs, and Cycles of Social Interactions. In: Gross T., Sayama H. (eds) Adaptive Networks. Understanding Complex Systems. Springer, Berlin, Heidelberg