Clickstream network datasets

Sources

The Flickr and Delicious data sets are provided by the PINTS project.

The Bitcoin data set is provided by Kondor et al.

Flickr

Growth of system size

The figure above shows the growth of the total number of unique users, unique resources, and unique tags. The dashed line shows the exponential growth model [math]\displaystyle{ T_{day}\sim e^{0.01day} }[/math] for comparison.
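
The rate in such a model can be estimated by regressing the logarithm of the cumulative count on the day index. The following is a minimal sketch under stated assumptions: the function name `fit_growth_rate` and the synthetic series are illustrative and are not taken from the PINTS data.

```python
import math

def fit_growth_rate(counts):
    """Least-squares slope of log(count) against the day index."""
    n = len(counts)
    days = list(range(n))
    logs = [math.log(c) for c in counts]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    return sxy / sxx

# Synthetic cumulative totals growing as e^(0.01 * day); illustrative, not real data.
totals = [100 * math.exp(0.01 * d) for d in range(365)]
rate = fit_growth_rate(totals)
```

On this synthetic series the fitted rate recovers the value used to generate it; real cumulative counts would deviate from the straight line in log space.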

Growth of system variance

The left figure shows five daily quantities: the number of unique active users (UV), the number of unique tags created, the number of unique resources uploaded, the number of successive tagging activities (effective PV), and the total time cost of the tagging behavior (TT).

Accelerating growth of information production

1. Effective PV grows faster than UV, satisfying [math]\displaystyle{ PV\sim UV^{\theta} }[/math] with [math]\displaystyle{ \theta > 1 }[/math]. This means that the average contribution per user, [math]\displaystyle{ AC = PV/UV \sim UV^{\theta - 1} }[/math], increases with system size, because [math]\displaystyle{ \theta - 1 > 0 }[/math].

2. Total time TT grows slower than both PV and UV, satisfying [math]\displaystyle{ TT \sim PV^{\alpha} }[/math] with [math]\displaystyle{ \alpha < 1 }[/math] and [math]\displaystyle{ TT \sim UV^{\beta} }[/math] with [math]\displaystyle{ \beta < 1 }[/math]. This means that both the average time spent per user, [math]\displaystyle{ ET = TT/UV \sim UV^{\beta -1} }[/math], and the average duration of a single tagging action, [math]\displaystyle{ AP = TT/PV \sim PV^{\alpha -1}\sim UV^{\theta(\alpha-1)} }[/math], decrease with system size, because both exponents are negative.

3. In sum, as the system grows, users produce information faster and generate more information resources.
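
The three exponents are not independent. A short derivation, using only the relations listed above and assuming the power laws hold exactly, makes the link explicit:

[math]\displaystyle{ TT \sim PV^{\alpha} \sim \left(UV^{\theta}\right)^{\alpha} = UV^{\theta\alpha} \;\Rightarrow\; \beta = \theta\alpha }[/math]

[math]\displaystyle{ AP = \frac{TT}{PV} \sim \frac{UV^{\theta\alpha}}{UV^{\theta}} = UV^{\theta(\alpha-1)} }[/math]

As an illustration only, taking [math]\displaystyle{ \theta = 1.1 }[/math] and a hypothetical [math]\displaystyle{ \alpha = 0.8 }[/math] (the value of [math]\displaystyle{ \alpha }[/math] is an assumption, not a reported result) gives [math]\displaystyle{ \beta = 0.88 }[/math] and [math]\displaystyle{ AP \sim UV^{-0.22} }[/math].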

Relative growth rate between users, resources, and tags

Resources grow faster than users because users continuously add new resources (photos) to the system, whereas tags grow more slowly than both users and resources. This indicates the efficiency of natural language: a limited number of words suffices to describe a limitless number of objects.

Example clickstream networks in three days

The figure above shows the clickstream networks on 2004-01-01, 2004-06-01, and 2004-12-01. The nodes are tags and the edges are the directed, weighted clickstreams between tags. A directed clickstream is generated between two tags when a user uses them successively to annotate the same resource.
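
The construction rule can be sketched in a few lines. This is a minimal illustration, assuming each tagging event is a tuple `(user, resource, timestamp, tag)`; the record layout, the function name `build_clickstream`, and the sample events are hypothetical and do not reflect the actual PINTS file format.

```python
from collections import defaultdict

def build_clickstream(events):
    """Return {(tag_a, tag_b): weight} for tags used successively
    by the same user on the same resource."""
    # Group tags by (user, resource), in timestamp order.
    sessions = defaultdict(list)
    for user, resource, ts, tag in sorted(events, key=lambda e: e[2]):
        sessions[(user, resource)].append(tag)
    # Each pair of consecutive tags adds one unit of weight to a directed edge.
    edges = defaultdict(int)
    for tags in sessions.values():
        for a, b in zip(tags, tags[1:]):
            edges[(a, b)] += 1
    return dict(edges)

# Hypothetical sample events, for illustration only.
events = [
    ("u1", "photo1", 1, "beach"),
    ("u1", "photo1", 2, "sunset"),
    ("u1", "photo1", 3, "sea"),
    ("u2", "photo2", 4, "beach"),
    ("u2", "photo2", 5, "sunset"),
]
network = build_clickstream(events)
```

Here the edge from `beach` to `sunset` gets weight 2 because two users followed the same tag sequence, which is how the edge weights in a daily network accumulate.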

Growth of daily network size

Scaling of clickstream networks and the efficiency of information production

The scaling across daily networks implies that large networks are simply scaled-up versions of small networks. The increase in nodes (tags), edges (semantic relationships), through-flow (UV), and total flow (PV) does not change the flow structure of the networks. In particular, we find that

[math]\displaystyle{ PV\sim UV^{\theta} \;\Rightarrow\; \theta \sim \log(PV)/\log(UV) = 1.1 > 1 }[/math]

[math]\displaystyle{ \theta }[/math] can be used as an indicator of the efficiency of information production, because it describes the constant rate at which information output (PV, or tagging behavior) is generated from attention input (UV).
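
To make the exponent concrete, here is a worked example under the assumption that the power law holds exactly with [math]\displaystyle{ \theta = 1.1 }[/math]. Doubling the attention input multiplies the output by

[math]\displaystyle{ \frac{(2\,UV)^{1.1}}{UV^{1.1}} = 2^{1.1} \approx 2.14 }[/math]

so the output per user, PV/UV, rises by a factor of [math]\displaystyle{ 2^{0.1} \approx 1.07 }[/math], that is, by about 7% for each doubling of UV.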

Note that the scaling of the networks is cleaner than the scaling of the system. This means that the clickstream network model captures the growth dynamics of the system and successfully removes the noise, especially in the early stage of the system's development.

Delicious

Growth of system size

The figure above shows the growth of the total number of unique users, unique resources, and unique tags. The dashed line shows the exponential growth model [math]\displaystyle{ T_{day}\sim e^{0.0057day} }[/math] for comparison.
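
An exponential rate is easier to read as a doubling time. For a model of the form [math]\displaystyle{ T_{day}\sim e^{r \cdot day} }[/math], the doubling time is [math]\displaystyle{ \ln 2 / r }[/math], which for the reference curve here gives

[math]\displaystyle{ t_{2} = \frac{\ln 2}{0.0057} \approx 122 \text{ days} }[/math]

This figure describes the dashed comparison curve only; it is not a fitted value for any individual quantity.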

Growth of system variance

The left figure shows five daily quantities: the number of unique active users (UV), the number of unique tags created, the number of unique resources uploaded, the number of successive tagging activities (effective PV), and the total time cost of the tagging behavior (TT).

Accelerating growth of information production

1. Effective PV grows faster than UV, satisfying [math]\displaystyle{ PV\sim UV^{\theta} }[/math] with [math]\displaystyle{ \theta > 1 }[/math]. This means that the average contribution per user, [math]\displaystyle{ AC = PV/UV \sim UV^{\theta - 1} }[/math], increases with system size, because [math]\displaystyle{ \theta - 1 > 0 }[/math].

2. Total time TT grows slower than both PV and UV, satisfying [math]\displaystyle{ TT \sim PV^{\alpha} }[/math] with [math]\displaystyle{ \alpha < 1 }[/math] and [math]\displaystyle{ TT \sim UV^{\beta} }[/math] with [math]\displaystyle{ \beta < 1 }[/math]. This means that both the average time spent per user, [math]\displaystyle{ ET = TT/UV \sim UV^{\beta -1} }[/math], and the average duration of a single tagging action, [math]\displaystyle{ AP = TT/PV \sim PV^{\alpha -1}\sim UV^{\theta(\alpha-1)} }[/math], decrease with system size, because both exponents are negative.

3. In sum, as the system grows, users produce information faster and generate more information resources.
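
Exponents of this kind can be estimated by ordinary least squares on log-transformed daily values. The sketch below is illustrative only: `loglog_slope` is a hypothetical helper, and the series are synthetic, generated with assumed exponents of 1.1 for PV against UV and 0.8 for TT against PV rather than taken from the Delicious data.

```python
import math

def loglog_slope(x, y):
    """Slope of log(y) against log(x) by ordinary least squares."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Synthetic daily values with assumed exponents; not real data.
uv = [10 * 1.05 ** d for d in range(100)]   # daily unique users
pv = [2.0 * u ** 1.1 for u in uv]           # PV ~ UV^theta
tt = [5.0 * p ** 0.8 for p in pv]           # TT ~ PV^alpha

theta = loglog_slope(uv, pv)   # exponent of PV against UV
alpha = loglog_slope(pv, tt)   # exponent of TT against PV
beta = loglog_slope(uv, tt)    # exponent of TT against UV
```

Because the synthetic series follow exact power laws, the fitted exponent of TT against UV equals the product of the other two. Real daily data scatter around the fitted line, so the relation would hold only approximately.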

Relative growth rate between users, resources, and tags

Resources grow faster than users because users continuously add new resources (bookmarked web pages) to the system, whereas tags grow more slowly than both users and resources. This indicates the efficiency of natural language: a limited number of words suffices to describe a limitless number of objects.

Example clickstream networks in three days

The figure above shows the Delicious clickstream networks on 2003-01-01, 2003-06-01, and 2003-12-01. The nodes are tags and the edges are the directed, weighted clickstreams between tags. A directed clickstream is generated between two tags when a user uses them successively to annotate the same resource.

The Delicious clickstreams between tags look more "reasonable" to researchers than the Flickr clickstreams. This is because Delicious users tag resources in the public domain, such as news and technology blogs, whereas Flickr users usually tag their personal photos.

Scaling of clickstream networks and the efficiency of information production

The scaling across daily networks implies that large networks are simply scaled-up versions of small networks. The increase in nodes (tags), edges (semantic relationships), through-flow (UV), and total flow (PV) does not change the flow structure of the networks. In particular, we find that

[math]\displaystyle{ PV\sim UV^{\theta} \;\Rightarrow\; \theta \sim \log(PV)/\log(UV) = 1.1 > 1 }[/math]

[math]\displaystyle{ \theta }[/math] can be used as an indicator of the efficiency of information production, because it describes the constant rate at which information output (PV, or tagging behavior) is generated from attention input (UV).

Note that the scaling of the networks is cleaner than the scaling of the system. This means that the clickstream network model captures the growth dynamics of the system and successfully removes the noise, especially in the early stage of the system's development.

Bitcoin

Growth of system size

The figure above shows the growth of the total number of users, transactions, and total money transferred. The dashed line shows the exponential growth model [math]\displaystyle{ T_{day}\sim e^{0.007day} }[/math] for comparison.

Example money-flow networks in three days

The figure above shows the money-flow networks on 2009-01-12, 2010-09-12, and 2011-01-13. The nodes are users and the directed, weighted edges show the money transferred between them. The color of a node indicates its daily balance (green for positive, red for negative) and the size of a node denotes the absolute value of the balance. As the system grows, money transfers gradually show more complex patterns. Although balance is not necessarily correlated with degree, we find that the hubs are usually green. In other words, it is more common for one user to collect money from many others than to split their wealth into small parts.
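
The node attributes described here follow directly from a day's transactions. The following is a minimal sketch, assuming each transaction is a `(sender, receiver, amount)` tuple; this layout, the function names `daily_balance` and `node_style`, and the sample values are hypothetical and are not the format of the Kondor et al. data.

```python
from collections import defaultdict

def daily_balance(transactions):
    """Net balance per user: money received minus money sent."""
    balance = defaultdict(float)
    for sender, receiver, amount in transactions:
        balance[sender] -= amount
        balance[receiver] += amount
    return dict(balance)

def node_style(balance):
    """Map each user to (color, size): green for a positive balance,
    red for a negative one, size equal to the absolute balance.
    A zero balance is drawn red here, which is an arbitrary choice."""
    return {user: ("green" if b > 0 else "red", abs(b))
            for user, b in balance.items()}

# Hypothetical transactions for one day: three users pay one hub.
txs = [("a", "hub", 2.0), ("b", "hub", 3.0), ("c", "hub", 1.0), ("hub", "d", 0.5)]
balances = daily_balance(txs)
styles = node_style(balances)
```

In this toy day the hub collects from three users and pays out once, so it ends with a positive balance and would be drawn as a large green node.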

Growth of size variance

Scaling of clickstream networks and the efficiency of information production

The scaling across daily networks implies that large networks are simply scaled-up versions of small networks. The increase in nodes (users, UV), edges (transaction relationships, PV), through-flow (historical storage and newly generated Bitcoins needed to supply the transactions), and total flow (total money involved in transactions) does not change the flow structure of the networks. In particular, we find that

[math]\displaystyle{ PV\sim UV^{\theta} \;\Rightarrow\; \theta \sim \log(PV)/\log(UV) = 1.1 > 1 }[/math]

[math]\displaystyle{ \theta }[/math] can be used as an indicator of the efficiency of information production, because it describes the constant rate at which information output (PV, here transactions) is generated from attention input (UV). The mechanism of the Bitcoin system requires that new money can only be generated in the validation of new transactions. In other words, the system rewards transactions (PV) with money. Collective attention is therefore first turned into information (transactions) and then converted into money. The efficiency of information production, which is related to wealth generation, is again 1.1, as in Flickr and Delicious.

