LFR algorithm

From Jizhi Encyclopedia (集智百科). Revision as of 14:58, 11 August 2020 by Moonscar (moved page from wikipedia:en:Lancichinetti–Fortunato–Radicchi benchmark).



The Lancichinetti–Fortunato–Radicchi (LFR) benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real-world networks). The generated networks have communities that are known a priori, and they are used to compare different community detection methods.[1] The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.[2]



The algorithm

The benchmark assumes that both the node degrees and the community sizes follow power law distributions, with different exponents [math]\displaystyle{ \gamma }[/math] and [math]\displaystyle{ \beta }[/math], respectively. [math]\displaystyle{ N }[/math] is the number of nodes and [math]\displaystyle{ \langle k \rangle }[/math] is the average degree. The mixing parameter [math]\displaystyle{ \mu }[/math] is the average fraction of a node's neighbors that do not belong to any community the node itself belongs to. This parameter controls the fraction of edges that run between communities.[2] Thus, it reflects the amount of noise in the network. At the extremes, when [math]\displaystyle{ \mu = 0 }[/math] all links are within communities, and when [math]\displaystyle{ \mu = 1 }[/math] all links are between nodes belonging to different communities.[3]
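To make the definition of the mixing parameter concrete, the following minimal sketch measures it on a toy graph. The function names and the example graph are illustrative assumptions, not part of the benchmark's reference implementation, and every node is assumed to belong to exactly one community.

```python
from collections import defaultdict

def mixing_fractions(edges, community):
    """For each node, the fraction of its neighbours that lie outside its community."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {
        node: sum(community[n] != community[node] for n in nbrs) / len(nbrs)
        for node, nbrs in neighbours.items()
    }

def average_mixing(edges, community):
    """Average of the per-node fractions, i.e. an empirical estimate of mu."""
    fractions = mixing_fractions(edges, community)
    return sum(fractions.values()) / len(fractions)

# Toy graph (hypothetical): two triangles, "a" and "b", joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mu_per_node = mixing_fractions(edges, community)
mu_average = average_mixing(edges, community)
```

Here only nodes 2 and 3 have a neighbour in the other community, so the average is small; removing the edge (2, 3) would give the extreme case of zero mixing.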



One can generate the benchmark network using the following steps.



Step 1: Generate a network whose node degrees follow a power law distribution with exponent [math]\displaystyle{ \gamma }[/math], and choose the extremes of the distribution, [math]\displaystyle{ k_{\min} }[/math] and [math]\displaystyle{ k_{\max} }[/math], so that the desired average degree [math]\displaystyle{ \langle k\rangle }[/math] is obtained.
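The degree part of Step 1 might be sketched as follows. This is only an illustration under stated assumptions: degrees are drawn from a discrete power law truncated to the given range, the upper cutoff is fixed and the lower cutoff is tuned to approach the target mean, and all function names are hypothetical. Wiring the nodes into an actual network (for example with a configuration-model style procedure) is not shown.

```python
import random

def power_law_weights(k_min, k_max, gamma):
    """Unnormalised weights k**(-gamma) for the integer degrees k_min..k_max."""
    return {k: k ** (-gamma) for k in range(k_min, k_max + 1)}

def expected_degree(k_min, k_max, gamma):
    """Mean of the truncated discrete power law on k_min..k_max."""
    weights = power_law_weights(k_min, k_max, gamma)
    return sum(k * w for k, w in weights.items()) / sum(weights.values())

def choose_k_min(target_mean, k_max, gamma):
    """Lower cutoff whose expected degree is closest to target_mean."""
    return min(
        range(1, k_max + 1),
        key=lambda k_min: abs(expected_degree(k_min, k_max, gamma) - target_mean),
    )

def sample_degrees(n, k_min, k_max, gamma, seed=None):
    """Draw n degrees independently from the truncated power law."""
    rng = random.Random(seed)
    weights = power_law_weights(k_min, k_max, gamma)
    return rng.choices(list(weights), weights=list(weights.values()), k=n)

# Example values are arbitrary choices for illustration.
k_max, gamma = 50, 2.0
k_min = choose_k_min(target_mean=15, k_max=k_max, gamma=gamma)
degrees = sample_degrees(1000, k_min, k_max, gamma, seed=42)
```

Because the cutoffs are integers, the achievable mean is only approximately equal to the target.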



Step 2: Every node shares a fraction [math]\displaystyle{ (1 - \mu) }[/math] of its links with nodes of its own community and a fraction [math]\displaystyle{ \mu }[/math] with nodes of other communities.
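As a small worked example of Step 2, the split of a node's degree into internal and external links could look like this. Rounding the external count to the nearest integer is an assumption made for this sketch, and `split_degree` is a hypothetical helper name.

```python
def split_degree(k, mu):
    """Split degree k into (internal, external) link counts for mixing parameter mu."""
    external = round(mu * k)   # links to nodes of other communities
    internal = k - external    # links that stay inside the node's own community
    return internal, external

degrees = [4, 10, 25]
mu = 0.2
splits = [split_degree(k, mu) for k in degrees]
```

The internal count produced here is the quantity that the later community assignment has to accommodate.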



Step 3: Generate community sizes from a power law distribution with exponent [math]\displaystyle{ \beta }[/math]. The sum of all sizes must equal [math]\displaystyle{ N }[/math]. The minimal and maximal community sizes [math]\displaystyle{ s_{\min} }[/math] and [math]\displaystyle{ s_{\max} }[/math] must satisfy the definition of community, so that every non-isolated node is in at least one community:



[math]\displaystyle{ s_{\min} \gt k_{\min} }[/math]

[math]\displaystyle{ s_{\max} \gt k_{\max} }[/math]
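Step 3 and its two constraints can be sketched as below. How the final sizes are adjusted so that they sum exactly to the number of nodes is an assumption of this sketch (the last draw is replaced by the remaining node count, or the remainder is spread over existing communities); the function names are hypothetical.

```python
import random

def satisfies_degree_bounds(s_min, s_max, k_min, k_max):
    """Check the two constraints s_min > k_min and s_max > k_max."""
    return s_min > k_min and s_max > k_max

def sample_community_sizes(n, s_min, s_max, beta, seed=None):
    """Draw sizes from a truncated power law and adjust them to sum to n."""
    if not 1 <= s_min <= s_max <= n:
        raise ValueError("need 1 <= s_min <= s_max <= n")
    rng = random.Random(seed)
    candidates = list(range(s_min, s_max + 1))
    weights = [s ** (-beta) for s in candidates]
    sizes = []
    while sum(sizes) < n:
        sizes.append(rng.choices(candidates, weights=weights)[0])
    sizes.pop()                      # the last draw reached or overshot n
    remainder = n - sum(sizes)       # always between 1 and s_max
    if remainder >= s_min:
        sizes.append(remainder)      # large enough to form its own community
    else:
        # Too small for a community: hand the nodes to communities with room.
        for i in range(len(sizes)):
            room = min(s_max - sizes[i], remainder)
            sizes[i] += room
            remainder -= room
        if remainder > 0:
            raise ValueError("sizes cannot be made to sum to n with these bounds")
    return sizes

# Example values are arbitrary choices for illustration.
sizes = sample_community_sizes(n=1000, s_min=20, s_max=100, beta=1.0, seed=1)
```

The adjustment slightly distorts the tail of the size distribution, which is usually acceptable for a sketch but would need care in a faithful implementation.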


Step 4: Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. A node is added to the community as long as its number of neighboring nodes within the community does not exceed the community size; otherwise it stays out. In the following iterations, each “homeless” node is randomly assigned to some community. If that community is complete, i.e. its size is exhausted, a randomly selected node of that community must be removed from it and becomes homeless in turn. Stop the iteration when all the communities are complete and all the nodes belong to at least one community.
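The assignment loop of Step 4 could be sketched as follows for non-overlapping communities. Several details are assumptions of this sketch rather than statements about the reference implementation: a node is accepted only if its internal degree is strictly smaller than the community size (a node with d internal neighbours needs d other members), a node that does not fit simply tries again later, and a step limit guards against infeasible inputs. `assign_communities` is a hypothetical name.

```python
import random

def assign_communities(internal_degrees, sizes, seed=None, max_steps=1_000_000):
    """Return a list giving the community index of every node."""
    if sum(sizes) != len(internal_degrees):
        raise ValueError("community sizes must sum to the number of nodes")
    if max(internal_degrees) >= max(sizes):
        raise ValueError("the largest community is too small for some node")
    rng = random.Random(seed)
    members = [[] for _ in sizes]
    homeless = list(range(len(internal_degrees)))
    steps = 0
    while homeless:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("no valid assignment found within max_steps")
        node = homeless.pop(rng.randrange(len(homeless)))
        c = rng.randrange(len(sizes))
        if internal_degrees[node] >= sizes[c]:
            homeless.append(node)          # does not fit, stays homeless
            continue
        if len(members[c]) == sizes[c]:
            # Community is complete: evict a random member, who becomes homeless.
            homeless.append(members[c].pop(rng.randrange(sizes[c])))
        members[c].append(node)
    membership = [None] * len(internal_degrees)
    for c, nodes in enumerate(members):
        for node in nodes:
            membership[node] = c
    return membership

# Hypothetical input: 20 nodes with given internal degrees, three communities.
internal = [2] * 10 + [3] * 4 + [5] * 4 + [8] * 2
sizes = [4, 6, 10]
membership = assign_communities(internal, sizes, seed=0)
```

Since the sizes sum to the number of nodes, the loop can only end with every community exactly full.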



Step 5: Rewire the links, keeping every node's degree unchanged and altering only the split between internal and external links, so that the fraction of each node's links that lead outside its community is approximately equal to the mixing parameter [math]\displaystyle{ \mu }[/math].[2]
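One way to illustrate the rewiring of Step 5 is a greedy sequence of degree-preserving double edge swaps. This is a swapped-in technique chosen for the sketch, not necessarily the procedure used by the benchmark's authors: a swap is kept only if it does not increase the total gap between each node's external link count and its target. All names are hypothetical, and the input is assumed to be a simple graph.

```python
import random

def node_degrees(edges, community):
    """Degree of every node listed in the community mapping."""
    degree = {node: 0 for node in community}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def deviation(edges, community, degree, mu):
    """Total gap between external link counts and their targets mu * degree."""
    external = {node: 0 for node in community}
    for u, v in edges:
        if community[u] != community[v]:
            external[u] += 1
            external[v] += 1
    return sum(abs(external[n] - mu * degree[n]) for n in degree)

def rewire(edges, community, mu, attempts=2000, seed=None):
    """Greedy double edge swaps; every node keeps its degree."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    degree = node_degrees(edges, community)
    present = set(edges)
    best = deviation(edges, community, degree, mu)
    for _ in range(attempts):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c                      # try both orientations of the swap
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if a == d or c == b or new1 in present or new2 in present:
            continue                         # would create a self-loop or duplicate
        trial = edges[:]
        trial[i], trial[j] = new1, new2
        score = deviation(trial, community, degree, mu)
        if score <= best:                    # keep swaps that do not worsen the fit
            present -= {edges[i], edges[j]}
            present |= {new1, new2}
            edges, best = trial, score
    return edges

# Hypothetical input: two 4-node cliques, so every link starts out internal.
community = {n: n // 4 for n in range(8)}
cliques = [(u, v) for start in (0, 4)
           for u in range(start, start + 4)
           for v in range(u + 1, start + 4)]
rewired = rewire(cliques, community, mu=0.5, seed=0)
```

Recomputing the deviation from scratch for every trial is quadratic in practice; an incremental update would be preferable for large networks.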



Testing

Consider a partition into communities that do not overlap. Let [math]\displaystyle{ p(C) }[/math] denote the probability that a randomly picked node is from the community [math]\displaystyle{ C }[/math]. A partition of the same network predicted by some community finding algorithm has the distribution [math]\displaystyle{ p(C_2) }[/math]. The benchmark partition has the distribution [math]\displaystyle{ p(C_1) }[/math].


The joint distribution is [math]\displaystyle{ p(C_1, C_2) }[/math]. The similarity of these two partitions is captured by the normalized mutual information.



[math]\displaystyle{ I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} }[/math]


If [math]\displaystyle{ I_n=1 }[/math] the benchmark and the detected partitions are identical, and if [math]\displaystyle{ I_n=0 }[/math] then they are independent of each other.[4]
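The normalized mutual information can be computed directly from two label lists, as in the sketch below. It assumes that H denotes the Shannon entropy in bits and that each node carries exactly one label in each partition; returning 1 when both partitions consist of a single community is a convention chosen here to avoid dividing by zero.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of the community distribution p(C)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def normalized_mutual_information(benchmark, detected):
    """I_n for two partitions given as per-node label sequences."""
    n = len(benchmark)
    p1 = Counter(benchmark)                 # counts for p(C_1)
    p2 = Counter(detected)                  # counts for p(C_2)
    joint = Counter(zip(benchmark, detected))
    mutual = sum(
        (c / n) * log2((c / n) / ((p1[a] / n) * (p2[b] / n)))
        for (a, b), c in joint.items()
    )
    denominator = 0.5 * entropy(benchmark) + 0.5 * entropy(detected)
    return mutual / denominator if denominator > 0 else 1.0

# Same grouping under different label names, and a grouping unrelated to it.
same = normalized_mutual_information([0, 0, 1, 1], ["x", "x", "y", "y"])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The first call illustrates that the measure depends only on how nodes are grouped, not on the label names.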



References

  1. Hua-Wei Shen (2013). "Community Structure of Complex Networks". Springer Science & Business Media. pp. 11–12.
  2. A. Lancichinetti, S. Fortunato, and F. Radicchi (2008). "Benchmark graphs for testing community detection algorithms". Physical Review E, 78.
  3. Twan van Laarhoven and Elena Marchiori (2013). "Network community detection with edge classifiers trained on LFR graphs". https://www.cs.ru.nl/~elenam/paper-learning-community.pdf
  4. Barabasi, A.-L. (2014). "Network Science". Chapter 9: Communities.

Category:Algorithms

Category:Random graphs

Category:Benchmarks (computing)

Category:Statistical methods


This page was moved from wikipedia:en:Lancichinetti–Fortunato–Radicchi benchmark. Its edit history can be viewed at LFR算法/edithistory