Brilliant Wiki Collaboration Graph Part 2
In March, our post Brilliant Wiki Collboration Graph studied properties of the Brilliant wiki collaboration network and its similarity to other networks, such as social networks and citation networks. Studying properties of these networks gives insight into many important phenomena such as group dynamics. The social networks of Facebook, Twitter, and LinkedIn (to name a few) are extensively studied as they continuously evolve. Now that the Brilliant wiki project has grown significantly, we now take a deeper dive into properties of the Wiki collaboration graph.
Contents
The Evolving Brilliant Wiki Graph
As described in Brilliant Wiki Collboration Graph, we consider the graph with vertices representing wiki authors and edges between pairs of authors who have collaborated on a Brilliant wiki page. We have removed any authors who have no collaborators, as well as all staff members. Here are a few wiki collaboration graph statistics for March 2015 and August 2015:
\[ \begin{array}{rcc} & \mbox{March 2015} & \mbox{ August 2015} \\ \mbox{Number of pages: } & 535 & 856 \\ \mbox{Number of vertices/contributors: } & 328 & 580 \\ \mbox{Number of edges/collaborations: } & 1480 & 3538 \\ \end{array}\]
The interactive Wiki Collaboration Graph is posted here. Hover over any vertex to highlight the neighbors in the graph.
We can see that the graph has evolved from the collaboration graph of March 2015, with more clear clusters of collaborators (recall that the edges are bundled because some authors have worked on mostly algebra pages, while other authors have worked on mostly geometry pages, and similar subjects are grouped together).
The degree of a vertex \(v\) in a graph is the number of edges including vertex \(v\). In the wiki collaboration graph, the degree of a vertex is the number of collaborators she or he has. Here is the distribution of degrees in the wiki collaboration graph:
The interactive version of this histogram is here.
The contributors with the greatest number of collaborators are:
\[ \begin{array}{ccccc} \mbox{Sravanth: 62 } & & \mbox{Anuj: 62 } & & \mbox{Jubayer: 57 } \\ \mbox{Azhaghu: 51} & & \mbox{Hemang: 50 } & & \mbox{Omkar: 49} \\ \mbox{Mursalin: 48 } & & \mbox{Sharky: 47} & & \mbox{Agnishom: 46} \\ \mbox{Niranjan: 45} & & \mbox{Yash: 45} & & \mbox{Pranjal: 41} \\ \mbox{Trevor: 41} & & \mbox{Ashley: 36 } & & \mbox{Prasun: 33} \\ \mbox{Julian: 32 } & & \mbox{Gautam: 31} & & \mbox{Pranshu: 30 } \\ \mbox{Kishlaya: 29} & & \mbox{Krishna: 29 } & & \mbox{Chung: 25} \\ \mbox{Archit: 24} & & \mbox{Hamza: 23} & & \mbox{Christopher: 23}\\ \mbox{Ishan: 23} & & \mbox{Daniel: 21} & & \mbox{A: 28} \\ \mbox{Ritu: 21} & & \mbox{ Sandeep: 21} & & \\ \end{array}\]
Distribution of contributions
Let’s consider the histogram of contributors and compare this to other types of distributions in networks. For example, one question that has been extensively studied is the following. Consider the internet graph where each vertex is a webpage and there is a directed edge from webpage A to webpage B if webpage A includes a link to webpage B. What is the distribution of in-degrees in this graph?
One observation is that it is easy to think of examples of webpages with very large in-degree, such as wikipedia.org. There may also be webpages with in-degree zero. What kind of distribution would you expect? As the in-degree \(k\) increases, what do you expect happens to the number of pages with in-degree equal to \(k\) ?
To answer this question, researchers studied the distribution of web links over many different web snapshots throughout the history of the internet and recurrently found that the fraction of web pages with in-degree equal to \(k\) is roughly proportional to
\[ \mbox{ ( Number of websites with in-degree } k ) \propto \frac{1}{k^{\alpha}}, \]
where \(\alpha\) is a number slightly larger than \(2\). Then the number of websites with in-degree \(k\) decreases as an inverse power of \(k\), following what is known as a power law.
How does this compare to other well-known distributions, such as the normal distribution? If we think about how the function \(f(k) = \frac{1}{k^\alpha}\) behaves as \(k\) gets large, we see that this function decreases much more slowly than an exponential function (such as the normal distribution). This is also known as the long-tail phenomenon, and in the context of in-degrees in the web graph, this says that the number of pages with large values of \(k\) decreases more slowly than exponential functions. Is this is consistent with your intuition about popular webpages? Think of the pages you visit regularly and think about how large you believe their in-degrees are. In fact, there is a good chance many of you thought of the same pages, since there are a reasonable number of web pages with very large in-degree.
Becoming a Power Law Detective
The next time you see data that quantifies popularity in some way, consider asking if the distribution behaves in proportion to a power law \(\frac{1}{k^\alpha}\) for some constant \(\alpha\). One way to do this is to let \(f(k)\) be the fraction of the total with value \(k\). If \( f(k)\) can be approximated by a power law, then
\[\begin{align} f(k) & = \frac{c}{k^{\alpha} } \qquad \mbox{ for some constants } c \mbox{ and } \alpha \\ \log f(k) &= \log (c) - \alpha \log (k). \\ \end{align}\]
So if \(f(k)\) follows a power law distribution, then plotting \(\log f(k)\) as a function of \(\log(k)\) gives a straight line with slope \(-\alpha\) and \(y\)-intercept \(\log(c)\). This is also called a log-log plot, and if the result is roughly a straight line, we can use the slope and \(y\)-intercept to read off the constants involved in the power-law distribution.
Modeling power law distributions
What is the intuition behind power law distributions? Since power laws arise in situations which measure popularity, let’s think about a model of how popularity is gained over time. First, suppose we have a biased coin with probability of heads \(p\) and probability of tails \(1-p\) for some \(0 < p < 1\). Now, for the example of the web graph, consider the following model of web page creation:
- Web pages are created over time in order \(w_1, w_2, w_3, \ldots \)
- When page \(w_i\) is created, the biased coin is flipped. If the result is heads, then a link is created to an earlier webpage uniformly at random. If the result is tails, then \(w_i\) chooses an earlier webpage \(w_j\) with probability proportional to the popularity of webpage \(w_j\) (measured by the current number of in-links to \(w_j\) ) and creates a link to \(w_j\). This creates a single link out of page \(w_i\) and the process is repeated multiple times to generate independent links.
The parameters of the model are the probability \(p\) of the coin flip and the number of links out of each newly created web page. This process also models what is referred to as the “rich get richer” phenomenon, since webpages that are already popular (have many in-links) become even more popular as new pages are created. This is a simplified model, but it can be shown that if we create many pages using this model, the fraction of pages with in-degree \(k\) will indeed be distributed approximately according to a power law \(f(k) = \frac{c}{k^\alpha}\), where \(c\) and \(\alpha\) depend on the model parameters of the coin flip probability and the number of out-links created.
Thank You!
We have seen many great contributions from the wiki writing project from all over the world and would like to thank all of the wiki writers for helping to make Brilliant wiki into the very best math resource online. Thank you on behalf of the many people who are benefiting from your experience and knowledge!!