更改

跳到导航 跳到搜索
添加3,818字节 、 2020年5月2日 (六) 21:43
创建页面,内容为“==原理<ref name="DeepWalk">{{cite journal | last = Bryan Perozzi | first = Rami Al-Rfou | date = 2014 | title = DeepWalk: Online Learning of Social Representation…”
==原理<ref name="DeepWalk">{{cite journal | last = Bryan Perozzi | first = Rami Al-Rfou | date = 2014 | title = DeepWalk: Online Learning of Social Representations | journal = KDD}}</ref>==
主要是利用word2vec技术将流网络嵌入到向量空间中(不能说是欧式空间),所以关键在于如何利用流网络生成语料库:给定一个流网络,从源节点出发,从其邻居中依概率选择下一步移动到的节点,概率以当前所在节点全部邻居权重归一化后来表示,概率越大到越容易被选择。重复这一选择过程,直至到达汇节点,则称完成一次随机游走。记录这一次游走的路线,它就是广义的“句子”,每一个经过的节点即广义的“单词”。重复若干次这种随机游走即可得到一系列“句子”,这就是我们需要的语料库。将它给word2vec即可完成嵌入。

==嵌入程序==
<syntaxhighlight lang="python">

import gensim
import numpy as np

NODES = []
NODES_NUM= []

# deepwalk嵌入--返回由流网络构建的语料库 g为网络图 corpus_num为句子数目
def deepwalk(_g, _corpus_num):
_corpus = []
for i in range(_corpus_num):
sentence = ['o'] # 'o'为源节点 's'表示汇节点
current_word = 'o'
while current_word != 's':
_node_list = []
_weight_list = []
for _nbr, _data in _g[current_word].items():
_node_list.append(_nbr)
_weight_list.append(_data['weight'])
_ps = [float(_weight) / sum(_weight_list) for _weight in _weight_list]
sel_node = roulette(_node_list, _ps)
sentence.append(sel_node)
current_word = sel_node
_corpus.append(sentence)
return _corpus

# 轮盘赌模型-按概率选择指定区域
def roulette(_datas, _ps):
return np.random.choice(_datas, p=_ps)

# word2vec训练词向量
def word_sequence_2_vec(_word_sequence, _dim):
model = gensim.models.Word2Vec(_word_sequence, min_count=1, size=_dim)
global NODES
NODES = model.vocab.keys()
_em_vec = []
for _node in NODES:
_em_vec.append(model[_node].tolist())
global NODES_NUM
NODES_NUM = len(NODES)
return np.mat(_em_vec)

</syntaxhighlight>

==测试样例==
<syntaxhighlight lang="python">
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from([('o', '1', 80), ('1', '2', 50), ('1', '3', 30), ('2', '4', 20), ('2', '5', 30),
('2', 's', 10), ('3', '2', 10), ('3', 's', 25), ('4', '5', 10), ('4', 's', 10), ('5', '3', 5), ('5', 's', 35)])

# 测试用,实际中dim和num要取得更大
dim = 3
num = 10
corpus = deepwalk(g, num) # num个句子
embed_vec = word_sequence_2_vec(corpus, dim)

</syntaxhighlight>

==结果==
网络图:

[[File:exampleflownetwork.PNG|500px]]

corpus结果:
<syntaxhighlight lang="python">
corpus =
[['o', '1', '2', 's'], ['o', '1', '2', 's'], ['o', '1', '3', 's'], ['o', '1', '3', 's'], ['o', '1', '2', 's'],
['o', '1', '3', 's'], ['o', '1', '2', '4', 's'], ['o', '1', '2', '4', 's'], ['o', '1', '2', '5', 's'], ['o', '1', '3', 's']]
</syntaxhighlight>
embed_vec结果:
<syntaxhighlight lang="python">
embed_vec =
[[-0.10715285 -0.02122051 0.05372581]
[-0.00904658 -0.01074309 0.09090474]
[-0.16823524 -0.16535626 0.15866642]
[-0.14700833 0.03448284 0.15175475]
[ 0.12771051 -0.02255488 -0.02870698]
[ 0.02428926 -0.14465304 -0.16563286]
[-0.00537517 0.15891171 0.00340934]]
</syntaxhighlight>
NODES结果:
<syntaxhighlight lang="python">
NODES = ['s', 'o', '1', '3', '2', '5', '4']
</syntaxhighlight>

==流距离嵌入与Deepwalk嵌入的对比==

请参看:[[Word2Vec与流网络]]

=参考文献=
<references />



==相关wiki==
*[[Word2Vec与流网络]]

[[Category:深度学习]]
[[Category:复杂系统]]
330

个编辑

导航菜单