LFR algorithm

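From 集智百科 (complex systems | artificial intelligence | complexity science | complex networks | self-organization)
Revision as of 23:05, 22 October 2020 (Thursday) by 黄秋莉 (talk | contribs)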

This entry was translated by 袁一博. It has not been manually edited or proofread, so it may be hard to read in places.



The Lancichinetti–Fortunato–Radicchi (LFR) benchmark is an algorithm that generates benchmark networks, that is, artificial networks that resemble real-world networks. The generated networks have communities that are known a priori, so they can be used to compare different community detection methods.[1] The advantage of the LFR benchmark over other benchmarks is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.[2]



The algorithm

The benchmark assumes that both the node degrees and the community sizes follow power law distributions, with different exponents [math]\displaystyle{ \gamma }[/math] and [math]\displaystyle{ \beta }[/math], respectively. [math]\displaystyle{ N }[/math] is the number of nodes and [math]\displaystyle{ \langle k \rangle }[/math] is the average degree. The mixing parameter [math]\displaystyle{ \mu }[/math] is the average fraction of a node's neighbors that do not belong to any community the node itself belongs to. This parameter controls the fraction of edges that run between communities,[2] and thus reflects the amount of noise in the network. At the extremes, when [math]\displaystyle{ \mu = 0 }[/math] all links are within communities, and when [math]\displaystyle{ \mu = 1 }[/math] all links are between nodes belonging to different communities.[3]
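As an illustration of the mixing parameter, the following minimal sketch measures it on a small graph with a known, non-overlapping partition. The function name `mixing_parameter`, the edge-list input, and the `community` dictionary are assumptions made for this example and are not part of the benchmark's reference implementation.

```python
from collections import defaultdict

def mixing_parameter(edges, community):
    """Average, over all nodes with at least one link, of the fraction of a
    node's links that lead to a node in a different community."""
    degree = defaultdict(int)
    external = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] != community[v]:
            # The edge crosses a community boundary for both endpoints.
            external[u] += 1
            external[v] += 1
    fractions = [external[node] / degree[node] for node in degree]
    return sum(fractions) / len(fractions)

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mu = mixing_parameter(edges, community)
print(mu)
```

Only the two bridge endpoints have an external link (one out of three links each), so the measured value is small, which matches the reading of a low mixing parameter as well-separated communities.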



One can generate the benchmark network using the following steps.



Step 1: Generate a network whose node degrees follow a power law distribution with exponent [math]\displaystyle{ \gamma }[/math], and choose the extremes of the distribution, [math]\displaystyle{ k_{\min} }[/math] and [math]\displaystyle{ k_{\max} }[/math], so as to obtain the desired average degree [math]\displaystyle{ \langle k\rangle }[/math].
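The degree part of Step 1 can be sketched as follows, under the assumption of a discrete power law truncated to integer degrees between k_min and k_max. Here k_max is fixed and k_min is taken as the smallest value whose expected degree reaches the target, which is one possible way to choose the extremes. All function names are hypothetical, and wiring the sampled degrees into an actual graph is not shown.

```python
import random

def power_law_weights(gamma, k_min, k_max):
    """Unnormalised weights k**(-gamma) for every integer k in [k_min, k_max]."""
    return [k ** -gamma for k in range(k_min, k_max + 1)]

def mean_degree(gamma, k_min, k_max):
    """Expected degree under the truncated discrete power law."""
    weights = power_law_weights(gamma, k_min, k_max)
    ks = range(k_min, k_max + 1)
    return sum(k * w for k, w in zip(ks, weights)) / sum(weights)

def choose_k_min(gamma, k_max, target_mean):
    """Smallest k_min whose expected degree is at least target_mean.
    Raising k_min cuts off the low-degree tail, so the mean grows with k_min."""
    for k_min in range(1, k_max + 1):
        if mean_degree(gamma, k_min, k_max) >= target_mean:
            return k_min
    raise ValueError("target average degree is larger than k_max")

def sample_degrees(n, gamma, k_min, k_max, rng):
    """Draw n degrees independently from the truncated power law."""
    ks = list(range(k_min, k_max + 1))
    return rng.choices(ks, weights=power_law_weights(gamma, k_min, k_max), k=n)

rng = random.Random(42)
k_min = choose_k_min(gamma=2.0, k_max=50, target_mean=15.0)
degrees = sample_degrees(1000, 2.0, k_min, 50, rng)
print(k_min, sum(degrees) / len(degrees))
```

Because degrees are integers, the expected degree can only be matched approximately; the sample average also fluctuates around the expected value.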



Step 2: For every node, a fraction [math]\displaystyle{ (1 - \mu) }[/math] of its links goes to nodes of the same community, while a fraction [math]\displaystyle{ \mu }[/math] goes to the other nodes.



Step 3: Generate community sizes from a power law distribution with exponent [math]\displaystyle{ \beta }[/math]. The sum of all sizes must be equal to [math]\displaystyle{ N }[/math]. The minimal and maximal community sizes, [math]\displaystyle{ s_{\min} }[/math] and [math]\displaystyle{ s_{\max} }[/math], must satisfy the definition of community so that every non-isolated node is in at least one community:



[math]\displaystyle{ s_{\min} > k_{\min} }[/math]

[math]\displaystyle{ s_{\max} > k_{\max} }[/math]
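A minimal sketch of Step 3 follows. It assumes a discrete power law truncated to integer sizes between s_min and s_max, and it makes the sizes sum to N by letting the last community absorb the remainder, redrawing everything if that remainder is too small. This is a simplification chosen for the example (the last size is not an independent power law draw), and the function name and `max_tries` parameter are hypothetical.

```python
import random

def sample_community_sizes(n, beta, s_min, s_max, k_min, k_max, rng, max_tries=10_000):
    """Draw community sizes in [s_min, s_max] that sum exactly to n."""
    # The two constraints stated in Step 3.
    if not (s_min > k_min and s_max > k_max):
        raise ValueError("need s_min > k_min and s_max > k_max")
    candidates = list(range(s_min, s_max + 1))
    weights = [s ** -beta for s in candidates]
    for _ in range(max_tries):
        sizes = []
        remaining = n
        # Keep drawing while a further draw cannot overshoot n.
        while remaining > s_max:
            s = rng.choices(candidates, weights=weights)[0]
            sizes.append(s)
            remaining -= s
        # Here 1 <= remaining <= s_max; accept it as the last community
        # only if it is also at least s_min, otherwise redraw from scratch.
        if remaining >= s_min:
            sizes.append(remaining)
            return sizes
    raise RuntimeError("could not make the sizes sum to n")

rng = random.Random(7)
sizes = sample_community_sizes(n=1000, beta=1.0, s_min=20, s_max=100,
                               k_min=10, k_max=50, rng=rng)
print(len(sizes), sum(sizes))
```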


Step 4: Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. The node is added to that community as long as the number of its neighboring nodes within the community (its internal degree) does not exceed the community size; otherwise it stays out. In the following iterations, each "homeless" node is randomly assigned to some community. If that community is complete, i.e. its size is exhausted, a randomly selected node of that community must be unlinked and becomes homeless in turn. Stop the iteration when all the communities are complete and all the nodes belong to at least one community.
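The assignment loop of Step 4 can be sketched as below. The sketch assumes non-overlapping communities whose sizes sum to the number of nodes, takes each node's internal degree as the rounded (1 - mu) share of its degree from Step 2, and uses the strict test "internal degree smaller than community size" so that a node can actually find that many neighbors inside its community. The function name and the `max_steps` guard are hypothetical.

```python
import random
from collections import deque

def assign_communities(internal_degrees, sizes, rng, max_steps=1_000_000):
    """Return one member list per community, filled by random assignment
    with eviction from communities that are already complete."""
    if sum(sizes) != len(internal_degrees):
        raise ValueError("community sizes must sum to the number of nodes")
    members = [[] for _ in sizes]
    order = list(range(len(internal_degrees)))
    rng.shuffle(order)
    homeless = deque(order)
    for _ in range(max_steps):
        if not homeless:
            return members
        node = homeless.popleft()
        c = rng.randrange(len(sizes))
        if internal_degrees[node] >= sizes[c]:
            # Community too small for this node: it stays out for now.
            homeless.append(node)
        elif len(members[c]) < sizes[c]:
            members[c].append(node)
        else:
            # Community is complete: unlink a random member to make room.
            j = rng.randrange(len(members[c]))
            homeless.append(members[c][j])
            members[c][j] = node
    if not homeless:
        return members
    raise RuntimeError("no valid assignment found within max_steps")

mu = 0.25
degrees = [4] * 18 + [12] * 2
internal_degrees = [round((1 - mu) * k) for k in degrees]  # Step 2 split
members = assign_communities(internal_degrees, [5, 5, 10], random.Random(0))
print([sorted(m) for m in members])
```

In this example the two high-degree nodes have internal degree 9, so they can only settle in the community of size 10; the eviction rule is what lets them displace low-degree nodes that arrived there first.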



Step 5: Rewire the links, keeping every node degree the same and changing only the split between internal and external links, so that for each node the fraction of links leading outside its community is approximately equal to the mixing parameter [math]\displaystyle{ \mu }[/math].[2]



Testing

Consider a partition of the network into communities that do not overlap. The community of a randomly chosen node follows a distribution [math]\displaystyle{ p(C) }[/math], which is the probability that a randomly picked node belongs to community [math]\displaystyle{ C }[/math]. The benchmark partition has distribution [math]\displaystyle{ p(C_1) }[/math], and a partition of the same network predicted by some community finding algorithm has distribution [math]\displaystyle{ p(C_2) }[/math].


The joint distribution is [math]\displaystyle{ p(C_1, C_2) }[/math]. The similarity of these two partitions is captured by the normalized mutual information.



[math]\displaystyle{ I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} }[/math]


If [math]\displaystyle{ I_n=1 }[/math], the benchmark and the detected partitions are identical; if [math]\displaystyle{ I_n=0 }[/math], they are independent of each other.[4]
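The normalized mutual information above can be computed directly from two label lists, one label per node. The sketch below estimates the distributions from label counts and takes H to be the Shannon entropy in bits; the function name is hypothetical, and returning 1.0 when both partitions consist of a single community (zero entropy) is a convention assumed here.

```python
from collections import Counter
from math import log2

def normalized_mutual_information(labels_1, labels_2):
    """I_n for two partitions given as per-node community labels."""
    n = len(labels_1)
    count_1 = Counter(labels_1)
    count_2 = Counter(labels_2)
    count_joint = Counter(zip(labels_1, labels_2))

    # Numerator: mutual information of the joint distribution p(C_1, C_2).
    mutual = 0.0
    for (c1, c2), count in count_joint.items():
        p_joint = count / n
        mutual += p_joint * log2(p_joint / ((count_1[c1] / n) * (count_2[c2] / n)))

    # Denominator: average of the two Shannon entropies.
    entropy_1 = -sum((c / n) * log2(c / n) for c in count_1.values())
    entropy_2 = -sum((c / n) * log2(c / n) for c in count_2.values())
    denominator = 0.5 * entropy_1 + 0.5 * entropy_2
    if denominator == 0:
        return 1.0  # assumed convention: two trivial partitions are identical
    return mutual / denominator

benchmark = [0, 0, 1, 1]
same_up_to_names = ["x", "x", "y", "y"]
unrelated = [0, 1, 0, 1]
print(normalized_mutual_information(benchmark, same_up_to_names))
print(normalized_mutual_information(benchmark, unrelated))
```

The measure depends only on how nodes are grouped, not on the community names, so relabeling a partition does not change the score.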



References

  1. Hua-Wei Shen (2013). "Community Structure of Complex Networks". Springer Science & Business Media. pp. 11–12.
  2. A. Lancichinetti, S. Fortunato, and F. Radicchi (2008). "Benchmark graphs for testing community detection algorithms". Physical Review E, 78.
  3. Twan van Laarhoven and Elena Marchiori (2013). "Network community detection with edge classifiers trained on LFR graphs". https://www.cs.ru.nl/~elenam/paper-learning-community.pdf
  4. A.-L. Barabási (2014). "Network Science". Chapter 9: Communities.

Category:Algorithms
Category:Random graphs
Category:Benchmarks (computing)
Category:Statistical methods


This page was moved from wikipedia:en:Lancichinetti–Fortunato–Radicchi benchmark. Its edit history can be viewed at LFR算法/edithistory