
Added 3,572 bytes, 15 October 2020 (Thu) 15:59

=RQ: what is a good question=

==Zhihu tag tree==

Zhihu data scraping: https://github.com/7sDream/zhihu-oauth

https://www.zhihu.com/topic/19776749/organize/entire#anchor-children-topic

<syntaxhighlight lang="python">
import pandas as pd

# https://github.com/Lynxmac/zhihu_topic_tree/
with open('/Users/datalab/bigdata/zhihu_topic_tree.txt', 'r', encoding='gb18030') as f:
    lines = f.readlines()

# parse each line of the tree dump into position, prefix length, branch sign, id and name
df_list = []
for index, line in enumerate(lines):
    a = line.rstrip().split('─')
    hierarchy = len(a[0])  # length of the tree-drawing prefix before the first '─'
    if index > 312482:
        # offset correction applied to every line after line 312482
        hierarchy -= 1
    sign = a[0][-1]
    b = a[-1].split('_', maxsplit=2)
    ids = b[0]
    name = '_'.join(b[1:])  # the topic name may itself contain '_'
    df_list.append([index, hierarchy, sign, ids, name, line])

df = pd.DataFrame(df_list, columns=['loc', 'hierarchy', 'sign', 'id', 'name', 'line'])

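# clean the hierarchy variable: round each prefix length down to the form 3k + 1
new_hierarchy = []
for i in df.hierarchy:
    if i % 3 == 1:
        new_hierarchy.append(i)
    elif i % 3 == 2:
        new_hierarchy.append(i - 1)
    elif i % 3 == 0:
        new_hierarchy.append(i - 2)

df['new_hierarchy'] = new_hierarchy
# integer tree level: prefix lengths 1, 4, 7, ... become levels 1, 2, 3, ...
df['good_hierarchy'] = [(i - 1) // 3 + 1 for i in new_hierarchy]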

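# add missing id for level 1 topics: (row label, topic id, topic name)
id_list = [(29855, 19778298, '「形而上」话题'),
           (178555, 19560891, '产业'),
           (190122, 19618774, '学科'),
           (223661, 19778287, '实体'),
           (312482, 19778317, '生活、艺术、文化与活动')]

for row, topic_id, topic_name in id_list:
    # use .loc so the assignment writes to df itself, not to a temporary copy
    df.loc[row, 'id'] = topic_id
    df.loc[row, 'name'] = topic_name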

# delete wrong ids: collect the rows whose id cannot be parsed as an integer
error_id_index = []
for k, i in enumerate(df.id):
    try:
        int(i)
    except (TypeError, ValueError):
        error_id_index.append(k)
print(len(error_id_index))

df = df.drop(error_id_index)
df['id'] = df['id'].astype(int)

# construct network
# it takes around 3 hours
# search for the nearest high level neighbor and link together

from flownetwork.flownetwork import flushPrint

net = []
for i in df.index:
    if i % 100 == 0:
        flushPrint(i)  # progress indicator
    ids = df.loc[i, 'id']
    hierarchy = df.loc[i, 'good_hierarchy']
    loc = df.loc[i, 'loc']
    if hierarchy == 1:
        # level 1 topics hang directly off the root
        net.append(('root', ids))
    else:
        # the parent is the closest preceding row that sits one level up
        upper_hierarchy = hierarchy - 1
        upper_nodes = df[df['good_hierarchy'] == upper_hierarchy]
        upper_node_loc = [j for j in upper_nodes['loc'] if (loc - j) > 0][-1]
        upper_node_id = df['id'][df['loc'] == upper_node_loc]
        net.append((int(upper_node_id.iloc[0]), ids))
</syntaxhighlight>
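
The network-construction loop above re-filters the whole table for every row, which is probably why it takes hours. Below is a minimal sketch of a single-pass alternative. It assumes the rows are visited in file order and remembers the most recent topic seen at each level. The helper name <code>build_edges</code> and the toy rows are hypothetical and are not part of the original analysis.

```python
def build_edges(rows):
    """Link each (topic_id, level) row to the nearest preceding row one level up.

    rows must be in file order; level 1 rows are attached to 'root'.
    """
    edges = []
    last_seen = {}  # level -> id of the most recent row at that level
    for topic_id, level in rows:
        level = int(level)
        if level == 1:
            edges.append(('root', topic_id))
        elif level - 1 in last_seen:
            edges.append((last_seen[level - 1], topic_id))
        # rows with no earlier row one level up are skipped rather than raising
        last_seen[level] = topic_id
    return edges


# toy rows for illustration only (the ids are made up)
toy_rows = [(1, 1), (2, 2), (3, 3), (4, 2), (5, 1), (6, 2)]
toy_edges = build_edges(toy_rows)
print(toy_edges)
```

With the DataFrame built above, a call such as <code>build_edges(zip(df['id'], df['good_hierarchy']))</code> should produce the same parent links as the loop, provided <code>df</code> is still sorted by <code>loc</code>.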

==StackOverflow tag network==

StackOverflow uses tags to organize questions. The full list of tags is here: https://stackoverflow.com/tags

Given a tag, such as javascript, you can see the tagged questions: https://stackoverflow.com/questions/tagged/javascript

StackOverflow also lists the related tags for each tag, together with a count for each. For example, the javascript tag is related to:

'''Related Tags'''

* jquery × 518122
* html × 317264
* css × 146448
* angularjs × 116430
* php × 111251
* node.js × 92095
* ajax × 88493
* json × 58117
* html5 × 51808
* reactjs × 51308
* arrays × 49362
* asp.net × 31550
* regex × 28362
* twitter-bootstrap × 24516
* angular × 24174
* c# × 23339
* forms × 22346
* google-chrome × 21292
* d3.js × 21232
* dom × 19442
* google-maps × 18658
* typescript × 18244
* java × 17724
* canvas × 17054
* express × 16103
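
A related-tags list like the one above can be turned into weighted edges of a tag network. The sketch below is only an illustration: it assumes the list has been copied as plain text in the "tag × count" form, and the function name <code>parse_related_tags</code> is hypothetical. It uses the first three entries from the list.

```python
related_text = """
* jquery × 518122
* html × 317264
* css × 146448
"""


def parse_related_tags(source_tag, text):
    """Return (source_tag, related_tag, count) tuples from 'tag × count' lines."""
    edges = []
    for line in text.strip().splitlines():
        if '×' not in line:
            continue  # ignore lines that are not list entries
        name, _, count = line.rpartition('×')
        # strip the wiki bullet and surrounding spaces from the tag name
        edges.append((source_tag, name.strip(' *'), int(count)))
    return edges


edges = parse_related_tags('javascript', related_text)
print(edges)
```

Repeating this for each tag page would give an edge list that a graph library could load as a weighted network.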

==Quora==

https://www.quora.com/topic/Computer-Science

https://github.com/tapaswenipathak/pyQTopic/blob/master/qtopic/pyqtopics.py

https://github.com/csu/quora-api

[[Category:所有人]]
[[Category:计算传播学]]
[[User:Wangchj04]]


[[category:旧词条迁移]]