Exploring the Algorithm Behind Google's PageRank

Categories: ScienceTechnology

Introduction

In the vast expanse of the internet, finding the right information can feel like searching for a needle in a haystack. This is where Google's PageRank algorithm comes into play, revolutionizing how we find information online. Developed by Google founders Larry Page and Sergey Brin, PageRank assigns a score to web pages, determining their importance and relevance in search results. This essay delves into the workings of PageRank, shedding light on the mathematics and logic that power one of the most used search engines globally.

PageRank views the internet as a network of web pages connected by hyperlinks. Each hyperlink from one page to another is an endorsement of content, contributing to the receiving page's importance. The algorithm uses a model called a Hyperlink Graph to represent these connections, where the significance of a page increases with the number of inlinks (hyperlinks pointing to it). However, not all inlinks are equal. A link from a highly important page weighs more than one from a lesser-known site.

Get quality help now
Prof. Finch
Prof. Finch
checked Verified writer

Proficient in: Science

star star star star 4.7 (346)

“ This writer never make an mistake for me always deliver long before due date. Am telling you man this writer is absolutely the best. ”

avatar avatar avatar
+84 relevant experts are online
Hire writer

Conversely, a page with many outlinks dilutes the value of each link it provides.

To mathematically model and calculate PageRank, Hyperlink Graphs are transformed into Hyperlink Matrices. These matrices organize web pages and their link relationships in a format suitable for computation. Normalizing these matrices allows the algorithm to quantify the importance of each page relative to the entire web.

Calculating PageRank: Beyond Simple Inlink Counting

Hyperlink - a reference to data which users can follow by clicking and points to either a whole document or to a specific element within a document.

Get to Know The Price Estimate For Your Paper
Topic
Number of pages
Email Invalid email

By clicking “Check Writers’ Offers”, you agree to our terms of service and privacy policy. We’ll occasionally send you promo and account related email

"You must agree to out terms of services and privacy policy"
Write my paper

You won’t be charged yet!

Hyperlink graph of a system - a system of a web page with hyperlinks. Matrix arrangement of number into rows and columns. Matrix operation addition of two matrices by the corresponding entries. Multiplication of binary operations from two matrices. Determinant the quality obtained by the addition of products of the element of a square matrix according to a given rule. Upper triangular matrix -a square matrix in which all entries below the main diagonal are zero. Elementary Row operation - the operations such as multiplication or division are performed in the original matrix to get the elementary matrix. Parameter - a quantity that influences the output or behavior of a mathematical object but is viewed as being held constant.

Hyperlink pointing from page Pa to Pb is an outlink from page Pa and an inlink to page to Pb. Intuitively, the more inlinks a page has, the more important it is. The more important a page is, the more it should be subscribed to the importance of the pages which it links to. Controversially, a link to a page with a large number of outlinks are not as vulnerable. The higher amounts of outlinks a page has, the less value in the recommendation provided by each of its individual outlinks.

Let the PageRank of page Pa be denoted by π(Pa), the number which measures the quality of page Pa. Also, let the number of outlinks from page Pa be denoted by | Pi | which equals the sum of the entries in the ith row of the hyperlink matrix A. The more outlinks, the less contribution to the importance of Page Pa. If Pa has an inlink from Pb it implies that Pb contributes to the PageRank π(Pa) of Pa.

π(Pa) / | Pi |

If there were to be a five-page web the PageRank equations:

π(P1) = π(P2) / 2 + π(P4) / 1

π(P2) = π(P1) / 1 + π(P3) / 3

π(P3) = π(P5) / 2

π(P4) = π(P2) / 2 + π(P3) / 3 + π(P5) / 2

π(P5) = π(P3) / 3

PageRank row vector satisfying the five equations:

v = [ π(P1), π(P2), π(P3), π(P4), π(P5), ]

Probability vector r(Pa) ≥ 0 for each page Pa:

π(P1) + π(P2) + π(P3) + π(P4) + π(P5) + = 1

Summation notation of each PageRank equation if the set of pages with a hyperlink to Pa by LPa:

π(Pa) = Σ r(Pb) / | Pj | Pj ∈ LPa

According to the knowledge of the more inlinks the more important, web page 4 has the most inlinks of four inlinks, therefore, web page 4 seems the most important.

Hyperlink Matrix

The hyperlink matrix A of W:

The cell in row Pa, column Pb contains a 1 if there is a link from Page Pa to Pb and contains a 0 otherwise. Removing the row and column labels of the table leave us with the hyperlink matrix A of the graph W. Then, to summarise graph W as a single matrix equation, normalise each row of the hyperlink matrix A by the sum of its row so each non-zero will sum to 1.

The five PageRank equations are now equivalent to the following matrix equation:

[ π(P1) π(P2) π(P3) π(P4) π(P5) ]

= [ π(P1) π(P2) π(P3) π(P4) π(P5) ]

The equation can be written more briefly as:

v = v H.

Calculation

However, although a web page may have a lot of inlinks, it does not necessarily mean it is the most important page because the related web pages may have low quality. Therefore the importance of the origin must be considered in order to determine which web page is truly the most important one.

As πi being the importance of a web page, α the parameter and da the number of outlinks of a, the equation of PageRank would be as:

πi (i = 1,2….n) = (1- α) Σ πa / da + α 1/n

j∈N(i)

In the vector form equation:

Π = (1 - α) A Π + α / n I

The five web pages equations will be:

π1 = (1- α) (π2 / 2 + π4 ) + α / 5

π2 = (1- α) (π1+ π3 / 3) + α / 5

π3 = (1- α) π5 / 2 + α / 5

π4 = (1- α) (π2 / 2 + π3 / 3 + π5 / 2) + α / 5

π5 = (1- α) π3 / 3 + α / 5

Substituting the five web page to the vector form equation will be:

The calculation for A is to find the determinant. In this case the determinant for 5 × 5 matrices. The first step is to turn the 5 × 5 matrices into an upper triangular matrix by turning the numbers inside the red triangle 0.

Now, the method called the row operation is applied.

Second, the determinant of the matrix equals the diagonal product of the matrix.

(1/2) (1/3) (1) (1/2) = 1/12 A = 0.08

Next, the web pages in vector form. When α = 0.15 (the parameter of Google):

π1 = (1- α) (π2 / 2 + π4 ) + α / 5 = 4.52

π2 = (1- α) (π1+ π3 / 3) + α / 5 = 4.38

π3 = (1- α) π5 / 2 + α / 5 = 0.28

π4 = (1- α) (π2 / 2 + π3 / 3 + π5 / 2) + α / 5 = 2.54

π5 = (1- α) π3 / 3 + α / 5 = 0.14

Conclusion

The PageRank algorithm is a testament to the power of mathematics in solving real-world problems. By ingeniously applying concepts from graph theory and matrix algebra, Google has been able to sift through the internet's chaos and deliver relevant information to users. This exploration into PageRank not only unveils the mathematical elegance behind search engines but also highlights the importance of algorithmic thinking in the digital age. As we continue to rely on the internet for knowledge, understanding the principles behind PageRank becomes essential for grasping how information is organized and presented in our digital world.

Updated: Feb 16, 2024
Cite this page

Exploring the Algorithm Behind Google's PageRank. (2024, Feb 16). Retrieved from https://studymoose.com/document/exploring-the-algorithm-behind-google-s-pagerank

Live chat  with support 24/7

👋 Hi! I’m your smart assistant Amy!

Don’t know where to start? Type your requirements and I’ll connect you to an academic expert within 3 minutes.

get help with your assignment