# Can you explain how Google PageRank works?

This QA first appeared in Data Science Briefings, the DataMiningApps newsletter. Want to submit your own question? Just Tweet us @DataMiningApps. Want to remain anonymous? Then send us a direct message and we’ll keep all your details private. Subscribe now for free if you want to be the first to receive our articles and stay up to date on data science news, or follow us @DataMiningApps.

The ranking module of Google’s search engine is based on the PageRank algorithm introduced by Page et al. in 1999 (Page L., Brin S., Motwani R., Winograd T., The PageRank citation ranking: Bringing order to the Web, Proceedings of the 7th International World Wide Web Conference, pages 161-172, Brisbane, Australia, 1998). The PageRank algorithm aims to simulate the behavior of a web surfer.

The figure below represents a network of web pages linking to each other. Given the figure, what is the probability that a surfer will visit web page A? Assume for now that a surfer only browses web pages by following the links on the web page s/he is currently visiting. The figure shows that web page A has three incoming links. A surfer currently visiting web page B will visit web page A next with a probability of 20% (= 1/5), because web page B has five outgoing links, one of which points to web page A. Analogously, a surfer currently on web page C or D will visit web page A next with a probability of 33.33% (= 1/3) or 50% (= 1/2) respectively, since C has three outgoing links and D has two. The probability of visiting a web page is called the PageRank of that web page. Because A can only be reached from B, C or D, the PageRank of A is the sum of these three link probabilities, each weighted by the probability of being on the linking page: PageRank(A) = PageRank(B)/5 + PageRank(C)/3 + PageRank(D)/2. Hence, to know the PageRank of web page A, we must know the PageRank of web pages B, C and D. This is often called collective inference: the ranking of one web page depends on the ranking of other web pages, and a change in the ranking of one web page might affect the ranking of all other web pages.
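To make the collective-inference idea concrete, here is a minimal Python sketch of the link-following surfer. It assumes a hypothetical six-page graph: the figure is not reproduced here, so pages E and F and every edge not mentioned in the text are invented; only A's in-links (from B, C and D) and the out-degrees of B (5), C (3) and D (2) follow the text. The helper name `link_following_pagerank` is likewise our own. It uses power iteration: every page repeatedly passes its current score, split evenly over its out-links, to the pages it links to, until the scores stop changing. It deliberately leaves out the random jumps (damping) of the full PageRank formulation, so it only converges because the invented graph is strongly connected and aperiodic.

```python
# Hypothetical link graph (invented for illustration). Only the in-links of A
# (from B, C, D) and the out-degrees of B (5), C (3) and D (2) follow the text.
LINKS = {
    "A": ["B", "E"],
    "B": ["A", "C", "D", "E", "F"],
    "C": ["A", "B", "D"],
    "D": ["A", "B"],
    "E": ["B"],
    "F": ["C"],
}


def link_following_pagerank(links, max_iterations=1000, tol=1e-12):
    """Power iteration for a surfer who only follows links (no random jumps).

    Each page passes its current score, split evenly, to the pages it links to.
    Assumes every page has at least one out-link (no dangling pages).
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # start from a uniform guess
    for _ in range(max_iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, targets in links.items():
            share = rank[page] / len(targets)  # e.g. B sends 1/5 of its score to A
            for target in targets:
                new_rank[target] += share
        # Total absolute change between two successive iterations.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


ranks = link_following_pagerank(LINKS)
for page, score in sorted(ranks.items()):
    print(f"{page}: {score:.4f}")

# Collective inference: A's score is determined by the scores of B, C and D.
print(ranks["B"] / 5 + ranks["C"] / 3 + ranks["D"] / 2, ranks["A"])
```

The last line compares PageRank(B)/5 + PageRank(C)/3 + PageRank(D)/2 with PageRank(A) on the converged scores; the two numbers should agree up to the convergence tolerance, and all scores sum to 1 because each page hands out its entire score.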

## The PageRank algorithm