We investigate the impact of gradual edits on the repositioning and organization of factual information in Wikipedia articles. The literature shows that in a collaborative system, a set of contributors is responsible for seeking, perceiving, and organizing information. Based on our analysis, we show that in a Wikipedia article, the crowd is capable of placing factual information in its correct position, eventually reducing knowledge gaps. We also show that the majority of information rearrangement occurs in the initial stages of article development and gradually decreases in the later stages.

Methodology

For each Wikipedia article, we create a list of factoids for each of its revisions. Suppose article $a_i$ has $n_i$ revisions; then for each revision $R_j$, where $j \in \{1, 2, \ldots, n_i\}$, we create a list of factoids ordered as they appear in $R_j$. We call this list $F_j$. We focus our analysis on the placement of the factoids in the Wikipedia articles.

We first obtain a vector embedding of each factoid using the Universal Sentence Encoder. We then compute the similarity between the embeddings of two factoids using cosine similarity, a metric that measures how similar two vectors are irrespective of their magnitude. It is the cosine of the angle between two vectors projected in a multi-dimensional space: $\cos(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$.
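As an illustration of the similarity step, the following is a minimal sketch in plain Python. It assumes the factoid embeddings have already been computed and are available as lists of floats. The function names `cosine_similarity` and `average_adjacent_similarity` are hypothetical, and averaging over consecutive factoids in $F_j$ is an assumption made here for illustration; the text above does not specify which factoid pairs are compared.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def average_adjacent_similarity(embeddings):
    """Mean cosine similarity over consecutive pairs of an ordered list.

    Assumption (not stated in the text): only neighbouring factoids
    in the revision's ordered list are compared.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    sims = [cosine_similarity(a, b)
            for a, b in zip(embeddings, embeddings[1:])]
    return sum(sims) / len(sims)


# Toy 2-dimensional "embeddings" standing in for real sentence vectors.
toy_revision = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
print(average_adjacent_similarity(toy_revision))
```

Because cosine similarity depends only on direction, the first two toy vectors count as identical even though their lengths differ, which is the "irrespective of magnitude" property described above.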

Dataset

We took a random sample of 500 articles from the 5000 most frequently edited English Wikipedia articles collected in April 2020.

Results

In 458 of the 500 articles (91.6%), the average sentence similarity is positively correlated with the revisions, whereas 42 articles (8.4%) show a negative correlation. These results show that in the majority of the articles, the average semantic similarity increases with revisions.
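The per-article trend reported above could be checked along the following lines. This is a sketch under stated assumptions, not the procedure used in the study: it assumes the Pearson coefficient as the correlation measure, and revision indices $1, 2, \ldots, n_i$ as the variable paired with each revision's average similarity. The names `pearson_correlation` and `trend_sign` are hypothetical, and the input values are made up for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def trend_sign(avg_similarities):
    """Classify an article by the sign of the correlation between
    revision index (1..n, an assumed choice) and average similarity."""
    revisions = list(range(1, len(avg_similarities) + 1))
    r = pearson_correlation(revisions, avg_similarities)
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"


# Made-up average similarities for one article's successive revisions.
print(trend_sign([0.21, 0.25, 0.24, 0.31, 0.35]))
```

Applying `trend_sign` to every sampled article and counting the labels would give the kind of positive/negative split reported in this section.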