更改

LFR算法 (查看源代码)

2020年10月5日 (一) 21:38的版本

添加990字节、 2020年10月5日 (一) 21:38

无编辑摘要

第11行：第11行：

Lancichinetti–Fortunato–Radicchi benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real-world networks). They have a priori known communities and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.

−

~~Lancichinetti~~-~~Fortunato~~-~~Radicchi benchmark~~ 是一种生成基准网络(类似于真实世界网络的人工网络)的算法。他们有一个先验已知的社区，并用于比较不同的社区检测方法。与其他方法相比，基准测试的优点在于它解释了节点度分布和群落规模分布的异质性。

+

'''兰奇基内蒂-福图纳托-拉迪奇基准程序'''是一种生成基准网络(类似于真实世界网络的人工网络)的算法。他们有一个预先已知的社区，用于比较不同的社区检测方法。与其他方法相比，基准测试的优点在于它解释了度分布和社区规模分布的异质性。

−

==The algorithm==

+

==The algorithm 算法==

The node degrees and the community sizes are distributed according to a [[power law]], with different exponents. The benchmark assumes that both the degree and the community size have [[Power law distribution|power law distributions]] with different exponents, <math>\gamma</math> and <math>\beta</math>, respectively. <math>N</math> is the number of nodes and the average degree is <math>\langle k \rangle</math>. There is a mixing parameter <math>\mu</math>, which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities.<ref name="original"/> Thus, it reflects the amount of noise in the network. At the extremes, when <math>\mu = 0</math> all links are within community links, if <math> \mu = 1 </math> all links are between nodes belonging to different communities.<ref>Twan van Laarhoven and Elena Marchiori (2013). "Network community detection with edge classifiers trained on LFR graphs". https://www.cs.ru.nl/~elenam/paper-learning-community.pdf</ref>

第21行：第21行：

The node degrees and the community sizes are distributed according to a power law, with different exponents. The benchmark assumes that both the degree and the community size have power law distributions with different exponents, <math>\gamma</math> and <math>\beta</math>, respectively. <math>N</math> is the number of nodes and the average degree is <math>\langle k \rangle</math>. There is a mixing parameter <math>\mu</math>, which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities.

−

~~节点度和社区规模按幂律分布，但指数不同。基准测试假设度和社区规模都具有幂指数分布，分别是~~ < ~~math~~ > ~~gamma~~ </~~math~~ > 和 < ~~math~~ > ~~beta~~ </~~math~~ > ~~。是节点的数量，平均程度是。有一个混合参数~~ < ~~math~~ > mu </~~math~~ > ~~，它是一个节点相邻节点的平均分数，这些节点不属于基准节点所属的任何社区。这个参数控制社区之间的边的比例。~~

+

节点度和社区规模按幂律分布，但指数不同。基准测试假设度和社区规模都具有不同指数的幂律分布，分别为'''此处需插入公式'''和'''此处需插入公式'''。'''此处需插入公式'''是节点的数量，平均度为'''此处需插入公式'''。混合参数'''此处需插入公式'''是一个节点的相邻节点的平均比例，这些相邻节点不属于基准节点所属的任何社区。这个参数控制着社区之间的边缘比例。

第29行：第29行：

One can generate the benchmark network using the following steps.

−

~~可以使用以下步骤生成基准网络。~~

+

可以通过以下步骤生成基准网络。

第36行：第36行：

<big>Step 1:</big> Generate a network with nodes following a power law distribution with exponent <math>\gamma</math> and choose extremes of the distribution <math> k_{\min} </math> and <math> k_{\max} </math> to get desired average degree is <math>\langle k\rangle</math>.

+

--[[用户:粲兰|袁一博]]（[[用户讨论:粲兰|讨论]]）“to get desired average degree is”中的“is”怀疑是原文的误输入，因为它完全违背正确的英语语法。

−

< 大 > 步骤1: </big > ~~生成一个网络，其节点按指数~~ < ~~math~~ > ~~gamma~~ </~~math~~ > 的幂律分布，并选择分布的极值 < ~~math~~ > ~~k { min }~~ </~~math~~ > 和 < ~~math~~ > ~~k { max }~~ </~~math~~ > ~~来获得期望的平均程度是~~ < ~~math~~ > ~~langle k rangle~~ </~~math~~ > 。

+

< 大 > 步骤1: </big > 生成一个网络，其节点遵循指数为'''此处需插入公式'''的幂律分布，并选择分布的极值'''此处需插入公式'''和'''此处需插入公式'''来获得期望平均度'''此处需插入公式'''。

第45行：第46行：

<big>Step 2:</big> <math>(1 - \mu)</math> fraction of links of every node is with nodes of the same community, while fraction <math>\mu</math> is with the other nodes.

−

< 大 > 步骤2: </big > < ~~math~~ > ~~(1-mu)~~ </~~math~~ > ~~每个节点的链接分数与同一社区的节点相同，而分数~~ < ~~math~~ > mu </~~math~~ > ~~与其他节点相同。~~

+

< 大 > 步骤2: </big >每个节点的'''此处需插入公式'''链接部分与同一社区的节点相同，而'''此处需插入公式'''部分与其他节点相同。

第53行：第54行：

<big>Step 3:</big> Generate community sizes from a power law distribution with exponent <math>\beta</math>. The sum of all sizes must be equal to <math>N</math>. The minimal and maximal community sizes <math> s_{\min} </math> and <math> s_{\max} </math> must satisfy the definition of community so that every non-isolated node is in at least in one community:

−

< big > 步骤3: </big > ~~根据指数~~ < ~~math~~ > ~~beta~~ </~~math~~ > ~~的幂律分布生成社区规模。所有大小的和必须等于~~ < ~~math~~ > n </~~math~~ > ~~。最小和最大的群落规模~~ < ~~math~~ > ~~和 < math > s _ max~~ </~~math~~ > ~~必须满足群落的定义，这样每个非孤立的节点至少在一个群落中~~:

+

< big > 步骤3: </big > 根据指数为'''此处需插入公式'''的幂律分布生成社区规模。所有规模大小的和必须等于'''此处需插入公式'''。最小和最大的社区规模'''此处需插入公式'''必须满足社区的定义，这样每个非孤立的节点至少存在于一个群落中:

第75行：第76行：

<big>Step 4:</big> Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. As long as the number of neighboring nodes within the community does not exceed the community size a new node is added to the community, otherwise stays out. In the following iterations the “homeless” node is randomly assigned to some community. If that community is complete, i.e. the size is exhausted, a randomly selected node of that community must be unlinked. Stop the iteration when all the communities are complete and all the nodes belong to at least one community.

−

< big > 步骤4: </big > 最初，没有为社区分配节点。然后，每个节点被随机分配到一个社区。只要社区内相邻节点的数量不超过社区规模，就会向社区添加一个新节点，否则就置身事外。在接下来的迭代中，“无家可归者”节点被随机分配给某个社区。如果该社区是完整的，即。规模已经耗尽，必须解除社区中随机选择的节点的链接。当所有社区都完成并且所有节点至少属于一个社区时停止迭代。

+

< big > 步骤4: </big > 最初，没有为任何社区分配任何节点。然后，每个节点被随机分配到一个社区。只要社区内相邻节点的数量不超过社区规模，就会向社区添加一个新节点，否则就不会添加。在接下来的迭代中，无归属的节点被随机分配给某个社区。如果该社区是完备的，即，规模已经用尽，必须随机选择社区中的一个节点并断开其链接。当所有社区都完备且所有节点都至少属于一个社区时停止迭代。

第83行：第84行：

<big>Step 5:</big> Implement rewiring of nodes keeping the same node degrees but only affecting the fraction of internal and external links such that the number of links outside the community for each node is approximately equal to the mixing parameter <math>\mu</math>.

−

< big > 步骤5: </big > ~~实现节点重新布线，保持相同的节点度，但只影响内部和外部链接的分数，这样每个节点在社区外的链接数量大约等于混合参数。~~

+

< big > 步骤5: </big > 对节点重新布线，保持相同的节点度，但只影响内部和外部链接，使得每个节点在社区外的链接数量约等于混合参数'''此处需插入公式'''。

−

==Testing==

+

==Testing 调试==

Consider a [[Partition of a set|partition]] into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a <math>p(C)</math> distribution that represents the probability that a randomly picked node is from the community <math>C</math>. Consider a partition of the same network that was predicted by some community finding algorithm and has <math>p(C_2)</math> distribution. The benchmark partition has <math>p(C_1)</math> distribution.

第93行：第94行：

Consider a partition into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a <math>p(C)</math> distribution that represents the probability that a randomly picked node is from the community <math>C</math>. Consider a partition of the same network that was predicted by some community finding algorithm and has <math>p(C_2)</math> distribution. The benchmark partition has <math>p(C_1)</math> distribution.

−

~~考虑将一个分区划分为不重叠的社区。每次迭代中随机选择的节点的社区遵循一个~~ < ~~math~~ > ~~p (c)~~ </~~math~~ > ~~分布，它表示随机选择的节点来自社区~~ < ~~math~~ > c </~~math~~ > ~~的概率。考虑同一个网络的一个分区，这个分区由一些社区搜索算法预测，并且具有~~ < ~~math~~ > ~~p (c _ 2)~~ </~~math~~ > ~~分布。基准分区具有~~ < ~~math~~ > ~~p (c _ 1)~~ </~~math~~ > 分布。

+

考虑社区的一个不重叠分割。每次迭代中随机选择的节点的社区遵循一个'''此处需插入公式'''分布，这个分布表示随机选择的节点来自社区'''此处需插入公式'''的概率。考虑同一个网络的一个分割，这个分割由一些社区搜索算法预测得出，并且具有'''此处需插入公式'''分布。基准分割具有'''此处需插入公式'''分布。

The joint distribution is <math>p(C_1, C_2)</math>. The similarity of these two partitions is captured by the normalized [[mutual information]].

第99行：第100行：

The joint distribution is <math>p(C_1, C_2)</math>. The similarity of these two partitions is captured by the normalized mutual information.

−

联合分布为 < ~~math~~ > ~~p (c1，c2)~~ < ~~math~~ > ~~。这两个分区的相似性可以通过规范化的相互信息获得。~~

+

联合分布为'''此处需插入公式'''。这两个分割的相似性可以通过归一化互信息得到。

第115行：第116行：

If <math> I_n=1 </math> the benchmark and the detected partitions are identical, and if <math> I_n=0 </math> then they are independent of each other.

−

如果 < ~~math~~ > ~~i _ n = 1~~ </~~math~~ > ~~基准和检测到的分区是相同的，并且如果~~ < ~~math~~ > ~~i _ n = 0~~ </~~math~~ > ~~那么它们彼此独立。~~

+

如果'''此处需插入公式'''基准和检测到的分割是相同的，并且如果'''此处需插入公式'''，那么它们彼此独立。

−

==References==

+

==References 参考文献==

粲兰

97

个编辑

更改

LFR算法 (查看源代码)

2020年10月5日 (一) 21:38的版本

导航菜单

搜索