Showing posts with label Google Algorithms. Show all posts

31 March 2010

Google Pagerank Algorithm Secrets And How It Works

PageRank is a numeric value that Google uses to represent the importance of a page on the internet. According to Google, every link from one page to another counts in favour of the page being linked to. When page A links to page B, page A is in effect casting a vote for page B. Google weighs each vote by the importance of the page that casts it, so a link from an important page is worth more than a link from an obscure one. To measure that importance, Google has come up with a system for calculating it for every page. The level of PageRank helps determine if and where a page appears in search results.




Calculating PageRank
Google uses an equation to calculate PageRank. The equation is as follows:

PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))


In that equation, t1 to tn are the pages that link to page A, the page whose rank is being calculated. C(t) is the number of outbound links on page t, and ‘d’ is the damping factor, which is normally set to 0.85. Google also uses the term ‘share’ when calculating the final PageRank of a page. A linking page's share is its PageRank divided by the number of outbound links on it, and that share is the ‘vote’ it casts for each page it links to. In other words, a page's ‘vote’ is split evenly among all the pages it ‘votes’ for, so each of them gets a portion towards its own PageRank. Therefore, the fewer outbound links there are on a linking page, the more each of its ‘votes’ is worth to the pages it links to.
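As a rough illustration of the equation and the ‘share’ idea, here is a minimal Python sketch. The function name pagerank_once and the example numbers are made up for this post; they are not taken from Google.

```python
def pagerank_once(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d(PR(t1)/C(t1) + ... + PR(tn)/C(tn)) once.

    inbound is a list of (PR(t), C(t)) pairs, one per page t linking to A.
    """
    # Each linking page hands over its PageRank divided by its outbound links.
    shares = [pr / c for pr, c in inbound]
    return (1 - d) + d * sum(shares)


# Hypothetical page A with two inbound links:
# t1 has PageRank 4.0 and 2 outbound links, so its share is 2.0;
# t2 has PageRank 3.0 and 6 outbound links, so its share is 0.5.
pr_a = pagerank_once([(4.0, 2), (3.0, 6)])
print(round(pr_a, 3))  # 0.15 + 0.85 * 2.5 = 2.275
```

Note that t2 passes on far less than t1 here, partly because its ‘vote’ is split across six links instead of two.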
According to many experts, the PageRank that Google displays is on a logarithmic scale. The base of that scale is not known by anyone other than Google. As on all logarithmic scales, moving from level 1 to 2 will be much easier than moving from level 2 to 3. When calculating PageRank, previous positions do not count: all earlier information is discarded and PageRank is calculated afresh. If two pages with no other links ‘vote’ for each other, each page's PageRank depends on the other's. Whichever page is calculated first has to use a value for the other page that has not been calculated yet, so its result is inaccurate. The second page is then calculated from that inaccurate result, so its value is inaccurate as well.
To arrive at more accurate values, Google feeds the newly calculated PageRank values back into the calculation to get slightly more accurate ones. After the calculation has been repeated about fifty times, the difference between successive results becomes negligible, and Google takes its values at that point. However, with this approach it is almost impossible for PageRank values to be 100% accurate, because the initial values were guesses and every later calculation was built on them. This long process is the reason why PageRank updates take as long as they do.
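The repeated calculation can be sketched in a few lines of Python for the case of two pages that ‘vote’ only for each other. This is only a toy model: the function name iterate_pagerank, the starting guess of 0.0 and the page names A and B are assumptions made for the example, and Google's real process is not public.

```python
def iterate_pagerank(links, d=0.85, rounds=50, start=0.0):
    """Recalculate PageRank repeatedly, starting from a rough guess.

    links maps each page to the list of pages it links to.
    Returns the final values and the biggest change seen in each round.
    """
    pr = {page: start for page in links}
    changes = []
    for _ in range(rounds):
        biggest = 0.0
        for page in links:
            # Add up the share from every page that links to this one.
            votes = sum(pr[src] / len(outs)
                        for src, outs in links.items() if page in outs)
            new_value = (1 - d) + d * votes
            biggest = max(biggest, abs(new_value - pr[page]))
            pr[page] = new_value  # pages later in this round see the new value
        changes.append(biggest)
    return pr, changes


# Two pages with no other links, each voting for the other.
links = {"A": ["B"], "B": ["A"]}
pr, changes = iterate_pagerank(links)
print(pr)                       # both values end up very close to 1.0
print(changes[0], changes[-1])  # the change per round shrinks towards zero
```

The first round is built on a guess, so its results are off, but each further round shrinks the error until the remaining change is too small to matter.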
contributed by, ABHISHEK SEO

13 April 2009

Google Hilltop Algorithm

Hilltop is all about measuring the quality of a particular web page. The quality pages it relies on are referred to as ‘Expert Documents’, and Hilltop was tuned for result accuracy.


The Hilltop algorithm was divided into two phases:

(a) Expert Look up

(b) Target Ranking

The first phase defines an expert page as a page that is about a certain topic and has links to many non-affiliated pages on that topic. Two pages are conceptually non-affiliated if they are authored by authors from non-affiliated organizations. In a pre-processing step, a subset of the pages crawled by a search engine is identified as experts. Given an input query, a lookup is done on the expert index to find and rank matching expert pages. This phase computes the best expert pages on the query topic as well as associated match information.

The second phase ranks the target pages, and the rankings depend on authority. A page is an authority on the query topic if and only if some of the best experts on the query topic point to it. By combining relevant out-links from many experts on the query topic, the Hilltop algorithm can find the pages that are most highly regarded by the community of pages related to the query topic. This is the basis of the high relevance that the algorithm delivers.

The complete processing of Hilltop is done in the following manner:
The Hilltop algorithm first computes a list of the experts most relevant to the query topic, and then identifies the targets through relevant links from these experts. The targets are ranked by the number, quality and relevance of the non-affiliated experts that point to them.

Thus, the score of the target page reflected the collective opinion of the best independent experts on the query topic.
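The two phases can be pictured with a small Python sketch. Everything in it is a simplification invented for illustration: the expert index, the "org" field used to decide affiliation, the substring test for matching the query, and the scoring by simple counts are all assumptions and not the real Hilltop formulas.

```python
from collections import defaultdict

# Hypothetical expert index. Each expert has an owning organisation and
# out-links of the form (target page, key phrase describing the link).
EXPERTS = {
    "reviews.example": {"org": "r", "links": [
        ("shop-a.example", "running shoes review"),
        ("shop-b.example", "best running shoes")]},
    "club.example": {"org": "c", "links": [
        ("shop-a.example", "running shoes"),
        ("forum.example", "marathon training")]},
    "shop-a-blog.example": {"org": "a", "links": [
        ("shop-a.example", "running shoes sale")]},
}
# Hypothetical owner of each target page, used to spot affiliation.
TARGET_ORG = {"shop-a.example": "a", "shop-b.example": "b", "forum.example": "f"}


def find_experts(query, experts, top_n=3):
    """Phase (a): score experts by how many key phrases contain the query."""
    scores = {}
    for name, info in experts.items():
        hits = sum(1 for _, phrase in info["links"] if query in phrase)
        if hits:
            scores[name] = hits
    best = sorted(scores, key=lambda n: (-scores[n], n))[:top_n]
    return {name: scores[name] for name in best}


def rank_targets(query, experts, expert_scores, target_org):
    """Phase (b): add up expert scores over relevant, non-affiliated links."""
    totals = defaultdict(float)
    for name, score in expert_scores.items():
        info = experts[name]
        counted = set()
        for target, phrase in info["links"]:
            if query not in phrase:
                continue  # link is not about the query topic
            if target_org.get(target) == info["org"]:
                continue  # expert and target are affiliated, vote ignored
            if target not in counted:
                counted.add(target)
                totals[target] += score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


best_experts = find_experts("running shoes", EXPERTS)
ranking = rank_targets("running shoes", EXPERTS, best_experts, TARGET_ORG)
print(ranking)  # shop-a.example first, then shop-b.example
```

In this toy data the blog run by shop A matches the query but its vote for shop A is dropped, so the ranking reflects only the independent experts.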

+ve points:

1 In computing the authority of a target page, only good and independent parents (experts) are taken into account instead of all parents.
2 Content analysis and link analysis are combined.

-ve points:

1 A target page inherits the same score from its expert parent whether the title or the anchor text of the parent page contains the query term. However, the influence of the anchor text should be stronger than that of the title, especially when the link is far away from the title. So I think the inherited score should be weighted by the distance between the out-link to the target page and its qualified key phrases.

2 When calculating the score of a target page, only the contents of its expert parents are taken into account; its own content is ignored.

3 Exact matching of query terms will exclude some good results when none of their parents' key phrases contain the query.

4 The rule to detect host affiliation is not enough to locate all affiliated hosts (link spam), and it doesn't work for identifying textual spam.

5 The first experiment is not enough to show the recall performance of a ranking algorithm.

contributed by, ABHISHEK SEO