MikeMirzayan's blog

By MikeMirzayan, 4 years ago, In English

Hello, Codeforces!

I want to suggest a small research problem.

Let's call the depth of an English Wikipedia page the smallest number of clicks on links (in the article text) needed to reach this page.

What is, approximately, the expected value of this depth if the page is chosen uniformly at random? I'm waiting for your approaches (with descriptions)!

UPD: Sorry, I didn't know such services existed; I thought it would be interesting to use some heuristics to estimate the needed values. The thread can be closed now. Thanks for watching, subscribe on YouTube and o***f***.


»
4 years ago, # |

lmao, why that page?

It may be more interesting to compute the average depth to an arbitrary page, or maybe the average distance between two pages. Also, some pages might not have any links, so they may not have a "depth" at all. Wikipedia can be represented as a directed graph.

»
4 years ago, # |

This project page contains a mostly pre-parsed database of Wikipedia as a directed graph: https://github.com/jwngr/sdow/blob/master/docs/data-source.md

It is then a matter of writing the correct code and running your graph algorithms on it (e.g. Dijkstra's algorithm, or plain BFS, since every link has the same weight).

Time estimation (upper bound): there are about $$$10^7$$$ articles. We can probably safely say there are no more than 100 links per article on average (only big articles have that many), so the number of edges is $$$E \sim 10^9$$$.

Running a BFS would be on the order of $$$10^9$$$ operations. One hour of computation is probably enough. Of course, you will have to do the coding yourself.
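
Spelling the estimate out (same assumptions as above, just made explicit): $$$E \lesssim 10^7 \cdot 100 = 10^9$$$ edges, and a single BFS touches every vertex and edge a constant number of times, so the work is $$$O(V + E) \approx 10^9$$$ elementary operations; storing $$$10^9$$$ edges as 4-byte ids already takes about 4 GB of memory, so plan for that.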

Alternatively, if you only want a simulation: just use https://www.sixdegreesofwikipedia.com, provided you have a way to generate a uniformly random Wikipedia article. You can work through the statistics yourself, but around 200 samples should be enough for a precision of about 0.1.
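
A minimal sketch of the sampling side in Python (my own illustration, not part of the project above): it draws approximately uniform random articles via Wikipedia's Special:Random redirect and summarises whatever depths you measure for them (via sixdegreesofwikipedia.com or your own BFS); if the per-article spread is around one click, roughly 200 samples indeed give a standard error below 0.1.

```python
# Sketch only: sample (approximately) uniform random English Wikipedia
# articles via the Special:Random redirect, then summarise the depths you
# measured for them by hand or with your own BFS.
import math
from urllib.parse import unquote

import requests

RANDOM_URL = "https://en.wikipedia.org/wiki/Special:Random"
HEADERS = {"User-Agent": "depth-experiment/0.1 (research script)"}

def sample_random_titles(k):
    """Return k article titles; Special:Random is close to uniform over articles."""
    titles = []
    for _ in range(k):
        # requests follows the redirect, so the final URL is the random article.
        resp = requests.get(RANDOM_URL, headers=HEADERS, timeout=10)
        titles.append(unquote(resp.url.rsplit("/", 1)[-1]))
    return titles

def mean_and_stderr(depths):
    """Sample mean and standard error of the mean of the measured depths."""
    n = len(depths)
    mean = sum(depths) / n
    var = sum((d - mean) ** 2 for d in depths) / (n - 1)
    return mean, math.sqrt(var / n)

if __name__ == "__main__":
    print(sample_random_titles(3))
    # Depths below are made-up placeholders, just to show the summary step.
    print(mean_and_stderr([3, 4, 3, 5, 4]))
```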

»
4 years ago, # |

So are you looking for the shortest path from the home page, or for the expected number of random clicks? If it is random clicks, then the answer is approximately $$$\frac n 2$$$, where $$$n$$$ is the total number of Wikipedia pages. This is under the assumption that the given page is a reasonably average Wikipedia page, neither extremely popular nor extremely unpopular. The answer is the same if you start from a random page rather than the home page.
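
To make the $$$\frac n 2$$$ figure concrete, here is one possible model behind it (an assumption, not something stated above): if the pages you click through behave like a uniformly random ordering of all $$$n$$$ pages with no repeats, the target is equally likely to sit at any position, so $$$\mathbb{E}[\text{clicks}] = \sum_{k=1}^{n} k \cdot \frac{1}{n} = \frac{n+1}{2} \approx \frac{n}{2}$$$. If instead each click lands on an independent uniform page (repeats allowed), the count is geometric with success probability $$$\frac{1}{n}$$$ and the expectation is $$$n$$$; either way the answer is linear in $$$n$$$.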

If you start from a random page and then search for the shortest path, it should be pretty short; the idea is similar to this.

Another way to think about this is that you can first reach some very general page in a couple of clicks (any country, for example), then go to culture, subculture, list of subcultures. This way the number of clicks is at most about 10.

»
4 years ago, # |

Lol, I suppose researching this page would bring you more pleasure; not everything there is obvious.

  • »
    »
    4 years ago, # ^ |

    bruh... I thought that one would be offensive, so I tried to choose something neutral but memetic.

»
4 years ago, # |

Auto comment: topic has been updated by blyat (previous revision, new revision, compare).

»
4 years ago, # |

Just download the entire Wikipedia as one large XML dump, extract all the links, run BFS on the reversed link graph (a single BFS from the chosen page then gives, for every article, its distance to that page along forward links), and compute the average of all the distances. I remember running the PageRank algorithm on the entire English Wikipedia a few years ago as a university assignment, so it's certainly possible.
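
A minimal sketch of that pipeline in Python, assuming the links have already been extracted from the dump into a plain edge list with one "source_id target_id" pair per line (the file name, the id format and the target id below are placeholders; it follows the reversed-graph reading above, i.e. depth = distance from a page to one fixed target, and for the opposite reading you would BFS on the forward graph instead):

```python
# Sketch: BFS on the reversed link graph from one fixed page, then average
# the distances. "links.txt" and TARGET_ID are placeholders, not real data.
from collections import defaultdict, deque

TARGET_ID = "42"  # placeholder id of the fixed page

def load_reversed_graph(path):
    """Adjacency lists of the link graph with every edge reversed (dst -> src)."""
    rev = defaultdict(list)
    with open(path) as f:
        for line in f:
            src, dst = line.split()
            rev[dst].append(src)
    return rev

def depths_to(rev, target):
    """Distance from every page to `target` along the original (forward) links."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        for u in rev[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

if __name__ == "__main__":
    rev = load_reversed_graph("links.txt")
    dist = depths_to(rev, TARGET_ID)
    # Pages that cannot reach the target never enter `dist`; average the rest
    # (excluding the target itself at distance 0).
    reachable = [d for d in dist.values() if d > 0]
    print(len(reachable), sum(reachable) / len(reachable))
```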