流网络根据流距离嵌入空间
我们的目的是根据流网络的流距离(平均首达时间矩阵L)矩阵,将整个网络嵌入到一个欧氏空间里面去,使得任意两点的欧氏距离尽量靠近他们的流距离.[1][2]
原理解析
给定某流网络,假想网络中各节点两两之间用弹簧相互连接,嵌入要做的即调整弹簧,使其松弛长度等于流距离。初始时各节点在欧式空间的坐标完全随机,以此我们可以计算节点两两之间的欧式距离。如果两节点间的流距离大于相应的欧式距离,表示弹簧被压缩了,那么我们应该拉伸这两节点间的弹簧;反之,我们应该压缩该弹簧。具体拉伸或者压缩多少取决于弹簧系数k及流距离和欧式距离的偏差x,即形变量=k*x。我们用该形变量去更新相应的节点欧式位置。反复这一过程,直至收敛。(注:实际运用之,我们常常不考虑流网络的源和汇节点,即流距离矩阵中不需要含有源和汇的项)
嵌入程序(Matlab版本)
初始准备
此处程序没有考虑流矩阵中无穷大的情况
输入变量流距离矩阵-flow_distance,节点编号-nodes_num,嵌入维数dim
dim=10;
max_distance=max(max(flow_distance));
max_x=max_distance;
max_y=max_x;
%为每个节点生成随机位置
initial_pos=[];
for i=1:length(nodes_num)
tmp=[];
for j=1:dim
tmp=[tmp max_x-2*max_x*rand(1)];
end
initial_pos=[initial_pos;tmp];
end
第一步优化,按照欧氏距离和流距离的差别大小受力
开始运动
current_pos=initial_pos;
iterations=1;
complete_temperature_iterations = 0;
current_target= compute_distance(initial_pos,flow_distance );
basic_size=20;
target=[];
all_temperature=[];
probility=[];
while iterations< 200
iterations
%-------------产生新的坐标-------------%
new_pos=[];
distance_matrix=compute_distance_matrix(current_pos);
k=1.0000e-0;
for i=1:size(current_pos,1)
%计算每一个节点受到的合力
%计算节点到其他节点距离和流距离的差值
diff_distance=(distance_matrix(i,:)-flow_distance(i,:))';
force_direction=-1*(repmat(current_pos(i,:),size(current_pos,1),1)-current_pos);
force_direction=force_direction./repmat(sqrt(sum(force_direction.^2,2)),1,size(current_pos,2)) ;
force_direction(isnan(force_direction))=0;
force_vector=diff_distance;
final_force_vector=force_direction.*repmat(force_vector,1,size(current_pos,2));
all_force=sum(final_force_vector,1)/length(diff_distance);
new_pos=[new_pos;current_pos(i,:)+all_force*k];
end
%-------------比较新的位置与原来的位置-------------%
current_distance=compute_distance(current_pos,flow_distance);
new_distance=compute_distance(new_pos,flow_distance);
diff = abs(current_distance - new_distance);
target=[target;current_distance];
if complete_temperature_iterations >= 10
temperature = cooling_rate*temperature;
complete_temperature_iterations = 0;
end
iterations = iterations + 1;
complete_temperature_iterations = complete_temperature_iterations + 1;
current_pos=new_pos;
end
第二步优化,按照每个节点失真程度大小受力
开始运动
initial_pos=new_pos;
iterations=1;
complete_temperature_iterations = 0;
current_target= compute_distance(initial_pos,flow_distance );
cooling_rate=0.9;
basic_size=20;
target=[];
all_temperature=[];
probility=[];
while iterations< 200
iterations
%-------------产生新的坐标-------------%
new_pos=[];
distance_matrix=compute_distance_matrix(current_pos);
distortion=distance_matrix./flow_distance-1;
distortion(isnan(distortion))=0;
k=30.0000e-0;
for i=1:size(current_pos,1)
%计算每一个节点受到的合力
%计算节点到其他节点距离和流距离的差值
diff_distance=distortion(:,i);
force_direction=-1*(repmat(current_pos(i,:),size(current_pos,1),1)-current_pos);
force_direction=force_direction./repmat(sqrt(sum(force_direction.^2,2)),1,size(current_pos,2)) ;
force_direction(isnan(force_direction))=0;
force_vector=diff_distance;
final_force_vector=force_direction.*repmat(force_vector,1,size(current_pos,2));
all_force=sum(final_force_vector,1)/length(diff_distance);
new_pos=[new_pos;current_pos(i,:)+all_force*k];
end
%-------------比较新的位置与原来的位置-------------%
current_distance=compute_distance(current_pos,flow_distance);
new_distance=compute_distance(new_pos,flow_distance);
diff = abs(current_distance - new_distance);
target=[target;current_distance];
iterations = iterations + 1;
all_temperature=[all_temperature;temperature];
end
计算拟合效果
横轴是流距离,纵轴是欧式距离
figure
tmp=bone_L;
tmp(logical(eye(size(bone_L,1))))=1;
tmp_L=flow_distance(find(tmp>0));
tmp_dis=distance_matrix(find(tmp>0));
plot(tmp_L(:),tmp_dis(:),'.')
hold on
plot([0; max([max(tmp_dis(:)) max(tmp_dis(:))])],[0;max([max(tmp_dis(:)) max(tmp_dis(:))])],'r-')
平均失真程度
distortion=tmp_dis./tmp_L;
distortion(isnan(distortion))=0;
distortion(find(distortion<1&distortion>0))=1./distortion(find(distortion<1&distortion>0));
average_distortion=sum(sum(distortion))/length(find(distortion>0))
中间计算距离的函数
现在距离与流距离的总差值
function [ target_distance ] = compute_distance(new_pos,flow_distance )
%UNTITLED Summary of this function goes here
%Detailed explanation goes here
new_distance_matrix=compute_distance_matrix(new_pos);
target_distance=sum(sum((new_distance_matrix-flow_distance).^2));
end
根据节点坐标,计算节点之间距离
function [ distance_matrix] = compute_distance_matrix( nodes )
%UNTITLED3 Summary of this function goes here
% Detailed explanation goes here
distance_matrix=zeros(size(nodes,1),size(nodes,1));
for i=1:size(nodes,1)
for j=i+1:size(nodes,1)
distance_matrix(i,j)=sqrt((nodes(i,1)-nodes(j,1))^2+(nodes(i,2)-nodes(j,2))^2);
distance_matrix(j,i)= distance_matrix(i,j);
end
end
flow_distance=distance_matrix;
end
嵌入程序(python版本)
输入参数
def spring_algorithm(_c, _dim):
pass
_c:去掉源和汇节点的流距离矩阵,且默认已经将流距离无穷大的节点做处理了
_dim:嵌入维数
初始化
# 最大小幅变化次数(可弹性调整)
max_count = 10
# 弹簧系数(可弹性调整,一般取在(0,1)之间,且k1>k2)
k1, k2 = [0.8, 0.5]
# 阈值(可弹性调整, 依据具体问题而定)
threshold1, threshold2 = [0.05, 0.01]
# 各节点初始位置(dim维)
initial_pos = np.random.rand(_c.shape[0], _dim) # [0, 1)
Step1 按照欧氏距离和流距离的差别大小受力(快速调整)
# step1 按照欧氏距离和流距离的差别大小受力
total_pos_mat = np.mat(initial_pos)
t1 = 0
# 连续小幅度变化的次数
small_count1 = 0
# 与真实流距离的误差
error_list = []
# 上一次迭代的总距离误差
last_distance_sum_error = float("inf")
while True:
print '距离差受力第%d次迭代' % t1
for _i in range(_c.shape[0]):
node_i_pos_mat = total_pos_mat[_i]
repeat_node_i_pos_mat = np.matlib.repmat(node_i_pos_mat, _c.shape[0], 1) # 重复len(_c)*1次
##### 节点至其它节点的欧式距离euclidean_distance #####
distance_offset_mat = repeat_node_i_pos_mat - total_pos_mat # 当前节点i与其它节点实际位置偏移矩阵
dis_square = np.multiply(distance_offset_mat, distance_offset_mat) # 对应元素相乘
euclidean_distance = np.mat(np.sqrt(np.sum(dis_square, 1))) # 列向量:当前节点到其余所有节点(包括自己)的欧式距离
# 欧式距离和流距离的偏差行向量,即弹簧形变量x
diff_distance = np.mat(_c[_i, :] - euclidean_distance.T)
# 防止除以该向量中的0元素出现Nan
modifier_euclidean_distance = np.ones((_c.shape[0], 1)) - np.sign(euclidean_distance) + euclidean_distance
# 受力方向:单位向量 当前节点距其余节点的距离向量/|当前节点距其余节点的距离向量|
direction = distance_offset_mat / modifier_euclidean_distance
##### 距离差受力矢量 F = k*x*direction #####
force = np.multiply(diff_distance.T, direction) * k1
##### 位置调整 #####
sum_force = np.sum(force, 0)
total_pos_mat[_i] += np.divide(sum_force, _c.shape[0]).astype(np.float64)
##### 误差项 #####
node_i_error = np.sum(np.multiply(diff_distance, diff_distance))
error_list.append(node_i_error)
sum_distance_error = sum(error_list)
print sum_distance_error
if abs(last_distance_sum_error - sum_distance_error) <= threshold1:
small_count1 += 1
if small_count1 >= max_count:
break
t1 += 1
last_distance_sum_error = sum_distance_error
error_list = []
Step2 按照每个节点失真程度大小受力(微调)
# step2 按照每个节点失真程度大小受力
t2 = 0
# 连续小幅度变化的次数
small_count2 = 0
# 失真度误差
distortion_list = []
# 上一次迭代的总失真度误差
last_sum_distortion_error = float("inf")
while True:
print '失真度受力第%d次迭代' % t2
for _i in range(_c.shape[0]):
node_i_pos_mat = total_pos_mat[_i]
repeat_node_i_pos_mat = np.matlib.repmat(node_i_pos_mat, _c.shape[0], 1) # 重复len(_c)*1次
##### 节点至其它节点的欧式距离euclidean_distance #####
distance_offset_mat = repeat_node_i_pos_mat - total_pos_mat # 当前节点i与其它节点实际位置差距矩阵
dis_square = np.multiply(distance_offset_mat, distance_offset_mat) # 对应元素相乘
euclidean_distance = np.mat(np.sqrt(np.sum(dis_square, 1))) # 列向量:当前节点到其余所有节点(包括自己)的欧式距离
# 防止除以该向量中的0元素出现Nan
modifier_euclidean_distance = np.ones((_c.shape[0], 1)) - np.sign(euclidean_distance) + euclidean_distance
# 失真度=max(_c[_i] / modifier_euclidean_distance, modifier_euclidean_distance / _c[_i])
# 这里为方便,只采用 失真度=_c[_i] / modifier_euclidean_distance
distortion_ratio = _c[_i, :] / modifier_euclidean_distance.T
modifier_distortion = np.ones((1, _c.shape[0])) - np.sign(distortion_ratio) + distortion_ratio - 1
# 受力方向:单位向量 当前节点距其余节点的距离向量/\当前节点距其余节点的距离向量\
direction = distance_offset_mat / modifier_euclidean_distance
##### 失真度受力矢量 #####
force = np.multiply(modifier_distortion.T, direction) * k2
##### 位置调整 #####
sum_force = np.sum(force, 0)
total_pos_mat[_i] += np.divide(sum_force, _c.shape[0]).astype(np.float64)
##### 误差项 #####
node_i_error = np.sum(np.multiply((distortion_ratio - 1), (distortion_ratio - 1)))
distortion_list.append(node_i_error)
sum_distortion_error = sum(distortion_list)
print sum_distortion_error
if abs(last_sum_distortion_error - sum_distortion_error) <= threshold2:
small_count2 += 1
if small_count2 >= max_count:
break
t2 += 1
last_sum_distortion_error = sum_distortion_error
distortion_list = []
return total_pos_mat
测试样例
思路
随机生成n个dim维欧式空间中的点,计算各点间的欧式距离得到n*n维欧式距离矩阵,以此当作c矩阵输入到弹簧算法中,得到n个dim维嵌入向量。若dim>2,则降维到2维,可视化前后两种点坐标表示,比较
程序
n, dim = 10, 2
# n各点初始位置(2维)
initial_pos = np.random.rand(n, dim) # [0, 1)
plt.subplot(121)
plt.title('original position')
plt.plot(initial_pos[:, 0], initial_pos[:, 1], 'ro')
for i, (_x, _y) in enumerate(zip(initial_pos[:, 0], initial_pos[:, 1])):
plt.text(_x, _y, i)
# 各节点间的欧式距离
edistance_list = []
for i in range(n):
distance_offset_vec = initial_pos[i] - initial_pos # 当前节点i与所有节点(包括自己)欧式位置差距
dis_square = np.multiply(distance_offset_vec, distance_offset_vec) # 对应元素相乘
edistance = np.sqrt(np.sum(dis_square, 1)).T.tolist() # 行向量
edistance_list.append(edistance)
edistance_matrix = np.mat(edistance_list)
# 嵌入(此处采用迭代500次作为终止条件,也可换成采用阈值,但是需要手动调整大小)
embed_vec = spring_algorithm(edistance_matrix, 2)
plt.subplot(122)
plt.title('embedding position')
plt.plot(embed_vec[:, 0], embed_vec[:, 1], 'bo')
for i, (_x, _y) in enumerate(zip(embed_vec[:, 0], embed_vec[:, 1])):
plt.text(_x, _y, i)
plt.show()
结果
结果如下:
可以看到原始图和嵌入图非常相似,这证明弹簧算法的正确性。
注意:两幅图之间可能存在平移、旋转、对称线性变换关系,但是这属于正常现象。(要想完全吻合,需要加入摩擦系数)
补充:Deepwalk方式嵌入
请参考Word2Vec与流网络