Line 1: |
Line 1: |
| {{#seo: | | {{#seo: |
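− | |keywords=Achilles' heel,Greek mythology,Trojan War | + | |keywords=Lancichinetti–Fortunato–Radicchi benchmark,benchmark network,node degree distribution,community size distribution |
| |description=Achilles' heel,Greek mythology,Trojan War | | |description=Achilles' heel,Greek mythology,Trojan War |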
| }} | | }} |
| | | |
− | '''Lancichinetti–Fortunato–Radicchi''' '''benchmark''' is an algorithm that generates [[Benchmark (computing)|benchmark]] networks (artificial networks that resemble real-world networks). They have ''a priori'' known [[Community structure|communities]] and are used to compare different community detection methods.<ref>Hua-Wei Shen (2013). "Community Structure of Complex Networks". Springer Science & Business Media. 11–12.</ref> The advantage of the benchmark over other methods is that it accounts for the [[Homogeneity (statistics)|heterogeneity]] in the distributions of [[Vertex (graph theory)|node]] [[Degree (graph theory)|degrees]] and of community sizes. | + | The '''<font color="#ff8000">Lancichinetti–Fortunato–Radicchi benchmark (LFR)</font>''' is an algorithm that generates '''benchmark networks''' (artificial networks that resemble real-world networks). These networks have ''a priori'' known communities and are used to compare different community detection methods.<ref>Hua-Wei Shen (2013). "Community Structure of Complex Networks". Springer Science & Business Media. 11–12.</ref> The advantage of the benchmark over other methods is that it accounts for the [[异质性|heterogeneity]] in the '''node degree distribution''' and the '''community size distribution'''.<ref name="original">A. Lancichinetti, S. Fortunato, and F. Radicchi.(2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78. {{ArXiv|0805.4770}}</ref> |
| | | |
− | Lancichinetti–Fortunato–Radicchi benchmark is an algorithm that generates benchmark networks (artificial networks that resemble real-world networks). They have a priori known communities and are used to compare different community detection methods. The advantage of the benchmark over other methods is that it accounts for the heterogeneity in the distributions of node degrees and of community sizes.
| |
| | | |
− | '''<font color="#ff8000">兰奇基内蒂-福图纳托-拉迪奇基准测试 Lancichinetti–Fortunato–Radicchi benchmark</font>'''是一种生成'''基准网络 baseline network'''(类似于真实世界网络的人工网络)的算法。他们有一个预先已知的社区,用于比较不同的社区检测方法。与其他方法相比,基准测试的优点在于它解释了节点度分布和社区规模分布的异质性。<ref name="original">A. Lancichinetti, S. Fortunato, and F. Radicchi.(2008) Benchmark graphs for testing community detection algorithms. Physical Review E, 78. {{ArXiv|0805.4770}}</ref>
| |
| | | |
| + | ==Algorithm== |
| | | |
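| + | The '''node degrees''' and the '''community sizes''' follow [[幂律分布|power law distributions]] with different exponents, <math>\gamma</math> and <math>\beta</math>, respectively. <math>N</math> is the number of nodes and <math>\langle k \rangle</math> is the average degree. The mixing parameter <math>\mu</math> is the average fraction of a node's neighbors that do not belong to any community the node itself belongs to. It therefore controls the fraction of edges that run between communities.<ref name="original"/> At the extremes, <math>\mu = 0</math> means that all links are within communities, and <math>\mu = 1</math> means that all links connect nodes belonging to different communities.<ref>Twan van Laarhoven and Elena Marchiori (2013). "Network community detection with edge classifiers trained on LFR graphs". https://www.cs.ru.nl/~elenam/paper-learning-community.pdf</ref> |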
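To make the definition of the mixing parameter concrete, the following minimal sketch measures <math>\mu</math> on a small graph whose communities are already known. It assumes non-overlapping communities (one label per node); the function name `mixing_parameter` and the toy graph are illustrative and are not part of the benchmark itself.

```python
from collections import defaultdict


def mixing_parameter(edges, community):
    """Average, over all non-isolated nodes, of the fraction of a node's
    neighbours that lie outside the node's own community.

    Assumption: `community` maps every node to exactly one label.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    fractions = []
    for node, nbrs in neighbours.items():
        external = sum(1 for n in nbrs if community[n] != community[node])
        fractions.append(external / len(nbrs))
    return sum(fractions) / len(fractions)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
mu = mixing_parameter(edges, community)
```

Here only nodes 2 and 3 have an external neighbor (one out of three each), so the measured value is small. A graph with no bridge edges would give 0, and a graph in which every edge crosses communities would give 1, matching the two extremes of <math>\mu</math>.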
| | | |
− | ==The algorithm 算法==
| |
| | | |
− | The node degrees and the community sizes are distributed according to a [[power law]], with different exponents. The benchmark assumes that both the degree and the community size have [[Power law distribution|power law distributions]] with different exponents, <math>\gamma</math> and <math>\beta</math>, respectively. <math>N</math> is the number of nodes and the average degree is <math>\langle k \rangle</math>. There is a mixing parameter <math>\mu</math>, which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities.<ref name="original"/> Thus, it reflects the amount of noise in the network. At the extremes, when <math>\mu = 0</math> all links are within community links, if <math> \mu = 1 </math> all links are between nodes belonging to different communities.<ref>Twan van Laarhoven and Elena Marchiori (2013). "Network community detection with edge classifiers trained on LFR graphs". https://www.cs.ru.nl/~elenam/paper-learning-community.pdf</ref>
| + | The benchmark network can be generated using the following steps. |
| | | |
− | The node degrees and the community sizes are distributed according to a power law, with different exponents. The benchmark assumes that both the degree and the community size have power law distributions with different exponents, <math>\gamma</math> and <math>\beta</math>, respectively. <math>N</math> is the number of nodes and the average degree is <math>\langle k \rangle</math>. There is a mixing parameter <math>\mu</math>, which is the average fraction of neighboring nodes of a node that do not belong to any community that the benchmark node belongs to. This parameter controls the fraction of edges that are between communities.
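| + | :Step 1: Generate a network whose node degrees follow a power law distribution with exponent <math>\gamma</math>, and choose the extremes of the distribution, <math> k_{\min} </math> and <math> k_{\max} </math>, so as to obtain the desired average degree <math>\langle k\rangle</math>. |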
| | | |
− | '''<font color="#ff8000">节点度(node degree)</font>'''和'''<font color="#ff8000">社区规模(community sizes)</font>'''按幂律分布,但指数不同。基准测试假设'''<font color="#ff8000">节点度(node degree)</font>'''和'''<font color="#ff8000">社区规模(community sizes)</font>'''都具有不同指数的'''<font color="#ff8000">幂律分布(power law distribution)</font>''',分别为'''<font color="#32CD32">此处需插入公式</font>'''和'''<font color="#32CD32">此处需插入公式</font>'''。'''<font color="#32CD32">此处需插入公式</font>'''是节点的数量,平均度为'''<font color="#32CD32">此处需插入公式</font>'''。混合参数'''<font color="#32CD32">此处需插入公式</font>'''是一个节点的相邻节点的平均比例,这些相邻节点不属于基准节点所属的任何社区。这个参数控制着社区之间的边缘比例。
| |
| | | |
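− | + | :Step 2: A fraction <math>(1 - \mu)</math> of the links of every node connects to nodes of the same community, while the remaining fraction <math>\mu</math> connects to the other nodes. |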
− | | |
− | One can generate the benchmark network using the following steps.
| |
− | | |
− | One can generate the benchmark network using the following steps.
| |
− | | |
− | 可以通过以下步骤生成基准网络。
| |
− | | |
− | | |
− | | |
− | <big>'''Step 1:'''</big> Generate a network with nodes following a power law distribution with exponent <math>\gamma</math> and choose extremes of the distribution <math> k_{\min} </math> and <math> k_{\max} </math> to get desired average degree is <math>\langle k\rangle</math>.
| |
− | | |
− | <big>Step 1:</big> Generate a network with nodes following a power law distribution with exponent <math>\gamma</math> and choose extremes of the distribution <math> k_{\min} </math> and <math> k_{\max} </math> to get desired average degree is <math>\langle k\rangle</math>.
| |
− | --[[用户:粲兰|袁一博]] ([[用户讨论:粲兰|talk]]) The "is" in "to get desired average degree is" is probably a typo in the original text, since it is ungrammatical.
| |
− | | |
− | <big>Step 1:</big> Generate a network whose node degrees follow a power law distribution with exponent <math>\gamma</math>, and choose the extremes of the distribution, <math> k_{\min} </math> and <math> k_{\max} </math>, so as to obtain the desired average degree <math>\langle k\rangle</math>.
| |
− | | |
− | | |
− | | |
− | <big>'''Step 2:'''</big> <math>(1 - \mu)</math> fraction of links of every node is with nodes of the same community, while fraction <math>\mu</math> is with the other nodes.
| |
− | | |
− | <big>Step 2:</big> <math>(1 - \mu)</math> fraction of links of every node is with nodes of the same community, while fraction <math>\mu</math> is with the other nodes.
| |
− | | |
− | <big>Step 2:</big> A fraction <math>(1 - \mu)</math> of the links of every node connects to nodes of the same community, while the remaining fraction <math>\mu</math> connects to the other nodes.
| |
| | | |
| | | |
Line 51: |
Line 26: |
| <big>Step 3:</big> Generate community sizes from a power law distribution with exponent <math>\beta</math>. The sum of all sizes must be equal to <math>N</math>. The minimal and maximal community sizes <math> s_{\min} </math> and <math> s_{\max} </math> must satisfy the definition of community so that every non-isolated node is in at least one community: | | <big>Step 3:</big> Generate community sizes from a power law distribution with exponent <math>\beta</math>. The sum of all sizes must be equal to <math>N</math>. The minimal and maximal community sizes <math> s_{\min} </math> and <math> s_{\max} </math> must satisfy the definition of community so that every non-isolated node is in at least one community: |
| | | |
− | <big>Step 3:</big> Generate community sizes from a power law distribution with exponent <math>\beta</math>. The sum of all sizes must equal <math>N</math>. The minimal and maximal community sizes, <math> s_{\min} </math> and <math> s_{\max} </math>, must satisfy the definition of a community, so that every non-isolated node is in at least one community:
| + | :Step 3: Generate community sizes from a power law distribution with exponent <math>\beta</math>. The sum of all sizes must equal <math>N</math>. The minimal and maximal community sizes, <math> s_{\min} </math> and <math> s_{\max} </math>, must satisfy the definition of a community, so that every non-isolated node is in at least one community: |
− | | |
− | | |
| | | |
| : <math> s_{\min} > k_{\min} </math> | | : <math> s_{\min} > k_{\min} </math> |
− |
| |
− | <math> s_{\min} > k_{\min} </math>
| |
− |
| |
− | <math> s_{\min} > k_{\min} </math>
| |
| | | |
| : <math> s_{\max} > k_{\max} </math> | | : <math> s_{\max} > k_{\max} </math> |
| | | |
− | <math> s_{\max} > k_{\max} </math>
| |
| | | |
− | <math> s_{\max} > k_{\max} </math>
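| + | :Step 4: Initially, no nodes are assigned to communities. Each node is then assigned to a randomly chosen community. The node is added only if the number of its neighbors within the community does not exceed the community size; otherwise it stays unassigned. In the following iterations, each unassigned ("homeless") node is again assigned to a random community. If that community is complete, i.e. its size is exhausted, a randomly selected node of that community must be unlinked from it. The iteration stops when all communities are complete and every node belongs to at least one community. |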
| | | |
| | | |
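| + | :Step 5: Rewire the links while keeping every node's degree unchanged, altering only the split between internal and external links, so that the fraction of each node's links that leave its community is approximately equal to the mixing parameter <math>\mu</math>.<ref name="original"/> |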
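The sampling parts of the steps above (Steps 1 and 3) can be sketched as follows. This is a simplified illustration and not the reference implementation: it draws integers from a truncated discrete power law, takes <math>k_{\min}</math> and <math>k_{\max}</math> as given instead of tuning them to hit a target average degree, and handles the "sizes must sum to <math>N</math>" requirement by trimming the last community and redrawing if the trimmed size falls below <math>s_{\min}</math>. All function names and parameter values are hypothetical. The node assignment and rewiring of Steps 4 and 5 are not covered.

```python
import random


def truncated_power_law(exponent, lo, hi, rng):
    """Draw one integer x in [lo, hi] with probability proportional to x**(-exponent)."""
    values = list(range(lo, hi + 1))
    weights = [x ** (-exponent) for x in values]
    return rng.choices(values, weights=weights, k=1)[0]


def sample_degrees(n, gamma, k_min, k_max, rng):
    """Step 1 (simplified): one power-law degree per node."""
    return [truncated_power_law(gamma, k_min, k_max, rng) for _ in range(n)]


def sample_community_sizes(n, beta, s_min, s_max, rng, max_tries=1000):
    """Step 3 (simplified): power-law sizes that sum exactly to n.

    Assumption: the last size is trimmed to fit; the whole draw is repeated
    if trimming pushes it below s_min.
    """
    for _ in range(max_tries):
        sizes, remaining = [], n
        while remaining > 0:
            size = min(truncated_power_law(beta, s_min, s_max, rng), remaining)
            sizes.append(size)
            remaining -= size
        if sizes[-1] >= s_min:
            return sizes
    raise RuntimeError("could not draw community sizes summing to n")


# Illustrative parameter values only.
N, gamma, beta = 200, 2.5, 1.5
k_min, k_max = 3, 20
s_min, s_max = 10, 50

# Constraints from Step 3: communities must be able to hold a node's neighbours.
if not (s_min > k_min and s_max > k_max):
    raise ValueError("need s_min > k_min and s_max > k_max")

rng = random.Random(42)  # fixed seed so the sketch is reproducible
degrees = sample_degrees(N, gamma, k_min, k_max, rng)
sizes = sample_community_sizes(N, beta, s_min, s_max, rng)
average_degree = sum(degrees) / N
```

In a fuller implementation, `average_degree` would be compared with the desired <math>\langle k\rangle</math> and the extremes adjusted until the two agree.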
| | | |
− | <big>'''Step 4:'''</big> Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. As long as the number of neighboring nodes within the community does not exceed the community size a new node is added to the community, otherwise stays out. In the following iterations the “homeless” node is randomly assigned to some community. If that community is complete, i.e. the size is exhausted, a randomly selected node of that community must be unlinked. Stop the iteration when all the communities are complete and all the nodes belong to at least one community.
| + | ==Testing== |
| | | |
− | <big>Step 4:</big> Initially, no nodes are assigned to communities. Then, each node is randomly assigned to a community. As long as the number of neighboring nodes within the community does not exceed the community size a new node is added to the community, otherwise stays out. In the following iterations the “homeless” node is randomly assigned to some community. If that community is complete, i.e. the size is exhausted, a randomly selected node of that community must be unlinked. Stop the iteration when all the communities are complete and all the nodes belong to at least one community. | + | Consider a partition of the network into communities that do not overlap. The communities of randomly chosen nodes follow a distribution <math>p(C)</math>, which gives the probability that a randomly picked node belongs to community <math>C</math>. Consider also a partition of the same network predicted by some community finding algorithm, with distribution <math>p(C_2)</math>. The benchmark partition has distribution <math>p(C_1)</math>. |
| | | |
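− | <big>Step 4:</big> Initially, no nodes are assigned to communities. Each node is then assigned to a randomly chosen community. The node is added only if the number of its neighbors within the community does not exceed the community size; otherwise it stays unassigned. In the following iterations, each unassigned node is again assigned to a random community. If that community is complete, i.e. its size is exhausted, a randomly selected node of that community must be unlinked from it. The iteration stops when all communities are complete and every node belongs to at least one community.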
| |
− |
| |
− |
| |
− |
| |
− | <big>'''Step 5:'''</big> Implement rewiring of nodes keeping the same node degrees but only affecting the fraction of internal and external links such that the number of links outside the community for each node is approximately equal to the mixing parameter <math>\mu</math>.<ref name="original"/>
| |
− |
| |
− | <big>Step 5:</big> Implement rewiring of nodes keeping the same node degrees but only affecting the fraction of internal and external links such that the number of links outside the community for each node is approximately equal to the mixing parameter <math>\mu</math>.
| |
− |
| |
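− | <big>Step 5:</big> Rewire the links while keeping every node's degree unchanged, altering only the split between internal and external links, so that the fraction of each node's links that leave its community is approximately equal to the mixing parameter <math>\mu</math>.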
| |
− |
| |
− | ==Testing 调试==
| |
− |
| |
− | Consider a [[Partition of a set|partition]] into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a <math>p(C)</math> distribution that represents the probability that a randomly picked node is from the community <math>C</math>. Consider a partition of the same network that was predicted by some community finding algorithm and has <math>p(C_2)</math> distribution. The benchmark partition has <math>p(C_1)</math> distribution.
| |
− |
| |
− | Consider a partition into communities that do not overlap. The communities of randomly chosen nodes in each iteration follow a <math>p(C)</math> distribution that represents the probability that a randomly picked node is from the community <math>C</math>. Consider a partition of the same network that was predicted by some community finding algorithm and has <math>p(C_2)</math> distribution. The benchmark partition has <math>p(C_1)</math> distribution.
| |
− |
| |
− | 考虑社区的一个不重叠分割。每次迭代中随机选择的节点的社区遵循一个'''<font color="#32CD32">此处需插入公式</font>'''分布,这个分布表示随机选择的节点来自社区'''<font color="#32CD32">此处需插入公式</font>'''的概率。考虑同一个网络的一个分割,这个分割由一些社区搜索算法预测得出,并且具有'''<font color="#32CD32">此处需插入公式</font>'''分布。基准分割具有'''<font color="#32CD32">此处需插入公式</font>'''分布。
| |
− |
| |
− | The joint distribution is <math>p(C_1, C_2)</math>. The similarity of these two partitions is captured by the normalized [[mutual information]].
| |
− |
| |
− | The joint distribution is <math>p(C_1, C_2)</math>. The similarity of these two partitions is captured by the normalized mutual information.
| |
− |
| |
− | 联合分布为'''<font color="#32CD32">此处需插入公式</font>'''。这两个分割的相似性可以通过'''<font color="#ff8000">归一化互信息</font>'''得到。
| |
| | | |
| + | The joint distribution is <math>p(C_1, C_2)</math>. The similarity of the two partitions is measured by the '''<font color="#ff8000">normalized mutual information</font>'''. |
| | | |
| | | |
| : <math> I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} </math> | | : <math> I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} </math> |
| | | |
− | <math> I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} </math>
| |
− |
| |
− | <math> I_n = \frac{\sum_{C_1,C_2} p(C_1,C_2) \log_2 \frac{p(C_1,C_2)}{p(C_1)p(C_2)} }{\frac 1 2 H(\{p(C_1)\}) + \frac 1 2 H(\{p(C_2)\})} </math>
| |
− |
| |
− |
| |
− |
| |
− | If <math> I_n=1 </math> the benchmark and the detected partitions are identical, and if <math> I_n=0 </math> then they are independent of each other.<ref>Barabasi, A.-L. (2014). "Network Science". Chapter 9: Communities.</ref>
| |
− |
| |
− | If <math> I_n=1 </math> the benchmark and the detected partitions are identical, and if <math> I_n=0 </math> then they are independent of each other.
| |
| | | |
− | If <math> I_n=1 </math>, the benchmark and the detected partitions are identical, and if <math> I_n=0 </math>, they are independent of each other. | + | If <math> I_n=1 </math>, the benchmark and the detected partitions are identical; if <math> I_n=0 </math>, they are independent of each other.<ref>Barabasi, A.-L. (2014). "Network Science". Chapter 9: Communities.</ref> |
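As an illustration of the formula for <math>I_n</math>, the sketch below computes the normalized mutual information of two non-overlapping partitions given as label lists, one label per node. It estimates <math>p(C_1)</math>, <math>p(C_2)</math> and <math>p(C_1, C_2)</math> from label counts and takes <math>H</math> to be the Shannon entropy in bits. The function name is illustrative, and returning 1 when both partitions consist of a single community (zero entropy) is a convention chosen here, not something stated in the text.

```python
from collections import Counter
from math import log2


def normalized_mutual_information(labels_1, labels_2):
    """I_n = I(C1; C2) / (H(C1)/2 + H(C2)/2) for two label lists of equal length."""
    n = len(labels_1)
    if n == 0 or n != len(labels_2):
        raise ValueError("label lists must be non-empty and of equal length")

    count_1 = Counter(labels_1)
    count_2 = Counter(labels_2)
    count_joint = Counter(zip(labels_1, labels_2))

    # Numerator: mutual information from the empirical joint distribution.
    mutual = sum(
        (c / n) * log2((c / n) / ((count_1[a] / n) * (count_2[b] / n)))
        for (a, b), c in count_joint.items()
    )

    # Denominator: mean of the two Shannon entropies.
    entropy_1 = -sum((c / n) * log2(c / n) for c in count_1.values())
    entropy_2 = -sum((c / n) * log2(c / n) for c in count_2.values())
    denominator = 0.5 * entropy_1 + 0.5 * entropy_2

    if denominator == 0:
        # Assumed convention: two single-community partitions are identical.
        return 1.0
    return mutual / denominator


benchmark = [0, 0, 1, 1]
same_up_to_relabelling = ["x", "x", "y", "y"]
unrelated = [0, 1, 0, 1]

nmi_same = normalized_mutual_information(benchmark, same_up_to_relabelling)
nmi_unrelated = normalized_mutual_information(benchmark, unrelated)
```

The two calls reproduce the limiting cases: a detected partition that matches the benchmark up to relabelling scores 1, and one that is statistically independent of it scores 0.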
| | | |
| | | |
| | | |
− | ==References 参考文献== | + | ==References== |
| | | |
| {{Reflist}} | | {{Reflist}} |