Changes

Network science (view source)

Revision as of 08:44, 22 April 2020 (Wed)

2,292 bytes added, 08:44, 22 April 2020 (Wed)

→ Web link analysis

Line 488: Line 488:

====Web link analysis====

+

Web link analysis

+

Several [[Web search]] [[ranking]] algorithms use link-based centrality metrics, including (in order of appearance) [[Massimo Marchiori|Marchiori]]'s [[Hyper Search]], [[Google]]'s [[PageRank]], Kleinberg's [[HITS algorithm]], and the [[CheiRank]] and [[TrustRank]] algorithms. Link analysis is also conducted in information science and communication science to understand and extract information from the structure of collections of web pages. For example, the analysis might examine the interlinking between politicians' websites or blogs.

+

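Some web search ranking algorithms use link-based centrality measures, including Marchiori's Hyper Search, Google's PageRank, Kleinberg's HITS algorithm, and the CheiRank and TrustRank algorithms. Link analysis is also carried out in information science and communication studies to understand and extract information from the structure of web pages. For example, one can analyze the interlinking among politicians' websites or blogs.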

+

[[用户:思无涯咿呀咿呀|思无涯咿呀咿呀]] ([[用户讨论:思无涯咿呀咿呀|talk]]) link-based centrality metrics [[用户:思无涯咿呀咿呀|思无涯咿呀咿呀]] ([[用户讨论:思无涯咿呀咿呀|talk]])

=====PageRank=====

+

[[PageRank]] models a random walk over "nodes" (web pages): the walker follows links and, with a certain probability, "randomly jumps" to another node. These random jumps help the walk cover the whole network, since some pages sit on the periphery and would otherwise rarely be reached.

Each node <math>i</math> has a PageRank <math>x_i</math>, defined as the sum, over all pages <math>j</math> that link to <math>i</math>, of the "importance" or PageRank <math>x_j</math> of <math>j</math> divided by the number of outlinks, or "out-degree", <math>N_j</math> of <math>j</math>.

+

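The PageRank algorithm works by randomly selecting "nodes" (websites) and then, with a certain probability, "randomly jumping" to other nodes. Random jumps to these other nodes help the PageRank algorithm traverse the network completely, since some peripheral websites might otherwise be hard to reach.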

: <math>x_i^{(k+1)} = \sum_{j\rightarrow i}{1\over N_j}x_j^{(k)}</math>
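The update above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name <code>basic_pagerank_step</code>, the <code>links</code> dictionary (page to list of outlinked pages), and the four-page graph are all hypothetical and chosen only for this example.

```python
def basic_pagerank_step(links, x):
    """Apply one update x_i <- sum over j->i of x_j / N_j (no random jumps).

    links: hypothetical adjacency dict, page -> list of pages it links to.
    x:     current PageRank estimate, page -> float.
    """
    new_x = {page: 0.0 for page in links}
    for j, targets in links.items():
        if not targets:
            continue  # a page without outlinks passes nothing on
        share = x[j] / len(targets)  # x_j / N_j
        for i in targets:
            new_x[i] += share
    return new_x


# Illustrative graph: D links to A, but nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
x0 = {page: 0.25 for page in links}
x1 = basic_pagerank_step(links, x0)
print(x1)
```

In this toy graph page <code>D</code> has no inbound links, so its value drops to 0 after a single step.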

======Random jumping======

+

Random jumping

As explained above, PageRank uses random jumps in an attempt to assign a PageRank to every website on the internet. These random jumps reach websites that might not be found by ordinary link-following search methods such as [[Breadth-First Search]] and [[Depth-First Search]].

+

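As explained above, random jumps are used in an attempt to assign a page rank to every website on the internet. Random jumps can reach peripheral websites that ordinary search methods (such as breadth-first search and depth-first search) would not find.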
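To illustrate why link-following search alone can miss pages, here is a small sketch of breadth-first search over outlinks. The <code>links</code> dictionary, the function name <code>bfs_reachable</code>, and the four-page graph are assumptions made up for this example.

```python
from collections import deque


def bfs_reachable(links, start):
    """Return the set of pages reachable from start by following outlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Illustrative graph: D links to A, but no page links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
reached = bfs_reachable(links, "A")
unreached = set(links) - reached
print(sorted(reached), sorted(unreached))
```

Starting from <code>A</code>, the search never discovers <code>D</code>, because no link points to it; a jump to a uniformly random page can still land there.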

+

An improvement over the aforementioned formula for determining PageRank adds these random jump components. Without the random jumps, some pages, such as those with no inbound links, would receive a PageRank of 0 and could not be ranked meaningfully.

+

The main improvement over the formula above is the addition of random jumps. Without them, some pages could end up with a rank of 0, which is undesirable.

The random jump component introduces <math>\alpha</math>, the probability that a random jump will occur. Its complement, <math>1 - \alpha</math>, is the "damping factor".

−

+

The first symbol, <math>\alpha</math>, denotes the probability that a random jump occurs. Its counterpart is the damping factor, <math>1 - \alpha</math>.

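: <math>x_i^{(k+1)} = {\alpha\over N} + (1 - \alpha) \sum_{j\rightarrow i} {1\over N_j} x_j^{(k)}</math>

Here <math>N</math> is the total number of pages and <math>N_j</math> is the out-degree of page <math>j</math>.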
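The random-jump formula above can be iterated until the values settle. The sketch below is one possible way to do this under stated assumptions: the function name <code>pagerank</code>, the <code>links</code> dictionary, the fixed iteration count, and the default <code>alpha=0.15</code> are illustrative choices, and every page in the toy graph has at least one outlink (pages without outlinks are simply skipped here, which a fuller implementation would need to handle).

```python
def pagerank(links, alpha=0.15, iterations=100):
    """Iterate x_i <- alpha/N + (1 - alpha) * sum over j->i of x_j / N_j.

    links: hypothetical adjacency dict, page -> list of pages it links to.
    alpha: probability of a random jump (1 - alpha is the damping factor).
    """
    pages = list(links)
    n = len(pages)
    x = {page: 1.0 / n for page in pages}  # uniform starting guess
    for _ in range(iterations):
        new_x = {page: alpha / n for page in pages}  # random-jump share
        for j, targets in links.items():
            if not targets:
                continue  # simplification: dangling pages pass nothing on
            share = (1 - alpha) * x[j] / len(targets)
            for i in targets:
                new_x[i] += share
        x = new_x
    return x


# Illustrative graph: D links to A, but no page links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(links)
for page, value in sorted(ranks.items()):
    print(page, round(value, 4))
```

Page <code>D</code>, which has no inbound links, now keeps the floor value <math>\alpha / N</math> instead of dropping to 0, and the values still sum to 1 because every page in this example has outlinks.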

Another way of looking at it:

−

+

From another perspective:

: <math>R(A) = {R_B\over B_\text{(outlinks)}} + \cdots + {R_n \over n_\text{(outlinks)}}</math>

where <math>B, \ldots, n</math> are the pages that link to <math>A</math>.
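As a worked example with made-up values (not taken from the source), suppose only pages <math>B</math> and <math>C</math> link to <math>A</math>, where <math>B</math> has PageRank 0.25 and two outlinks, and <math>C</math> has PageRank 0.25 and one outlink. Ignoring random jumps, the contribution of each page is its PageRank divided by its number of outlinks:

: <math>R(A) = {0.25 \over 2} + {0.25 \over 1} = 0.375</math>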

思无涯咿呀咿呀 (administrator, 2,443 edits)
