13 April 2009

Google Hilltop Algorithm

It is all about measuring the quality of a particular web page. Pages judged to be of high quality on a topic are referred to as 'Expert Documents'. Hilltop was tuned for result accuracy.


The Hilltop algorithm was divided into two phases:

(a) Expert Lookup

(b) Target Ranking

The first phase defined an expert page as a page that is about a certain topic and has links to many non-affiliated pages on that topic. Two pages are conceptually non-affiliated if they are authored by non-affiliated organizations. In a pre-processing step, a subset of the pages crawled by a search engine is identified as experts. Given an input query, a lookup is done on the expert index to find and rank matching expert pages. This phase computes the best expert pages on the query topic as well as associated match information.
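The expert test described above can be sketched roughly as follows. This is only an illustration: the affiliation rule (same last two hostname labels), the function names `org_key` and `is_expert`, and the threshold `min_orgs` are assumptions made for the example, not the rule or values Hilltop actually used.

```python
from urllib.parse import urlparse


def org_key(url: str) -> str:
    """Hypothetical affiliation key: the last two labels of the hostname.

    Assumption for illustration only; a real affiliation test would need
    more signals than the hostname suffix.
    """
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def is_expert(page_url: str, out_links: list[str], min_orgs: int = 3) -> bool:
    """Return True if the page links to at least `min_orgs` distinct
    organizations other than its own (`min_orgs` is an assumed threshold)."""
    own = org_key(page_url)
    # Links back to the page's own organization do not count as independent.
    others = {org_key(link) for link in out_links} - {own}
    return len(others) >= min_orgs
```

A directory page that links to three unrelated sites would pass this test, while a page that links only to its own subdomains would not.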

The second phase based the rankings on authority pages. A page is an authority on the query topic if and only if some of the best experts on the query topic point to it. By combining relevant out-links from many experts on the query topic, the Hilltop algorithm can find the pages that are most highly regarded by the community of pages related to the query topic. This is the basis of the high relevance that the algorithm delivers.

The complete Hilltop processing ran in the following manner:
The Hilltop algorithm first computed a list of the experts most relevant to the query topic, and then identified the targets through relevant links from these experts. The targets were ranked by the number, quality and relevance of the non-affiliated experts that point to them.

Thus, the score of the target page reflected the collective opinion of the best independent experts on the query topic.
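As a rough sketch of the target ranking step, assume the expert lookup has already produced a score per expert for the query. The names `rank_targets`, `expert_scores`, `expert_links` and `org_of`, the rule of counting only the best expert per organization, and the `min_experts` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_targets(expert_scores, expert_links, org_of, min_experts=2):
    """Rank target pages by the summed scores of independent experts.

    expert_scores: expert id -> relevance score for the query
    expert_links:  expert id -> iterable of target ids it links to
    org_of:        page id -> organization key (affiliation group)
    """
    # best[target][org] = highest score among experts of that organization,
    # so several affiliated experts count only once.
    best = defaultdict(dict)
    for expert, score in expert_scores.items():
        org = org_of[expert]
        for target in expert_links.get(expert, ()):
            if org_of.get(target) == org:
                continue  # an expert affiliated with the target is ignored
            if score > best[target].get(org, 0.0):
                best[target][org] = score

    ranked = [
        (target, sum(by_org.values()))
        for target, by_org in best.items()
        if len(by_org) >= min_experts  # require several independent experts
    ]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

With this rule a target endorsed by two experts from the same organization is treated as having a single endorsement, which is the sense in which the score reflects independent opinion.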

+ve points:

1 In computing the authority of a target page, only good and independent parents (experts) are taken into account instead of all parents.
2 Content analysis and link analysis are combined.

-ve points:

1 A target page inherits the same score from its expert parent whether the title or the anchor text of the parent page contains the query term. However, the influence of the anchor text should be stronger than that of the title, supposing the link is far away from the title. So I think the inherited score should be weighted by the distance between the out-link to the target page and its qualifying key phrases.

2 When calculating the score of a target page, only the contents of its expert parents are taken into account; its own content is ignored.

3 Exact matching of query terms will exclude some good results when none of a page's parents' key phrases contain the query.

4 The rule for detecting host affiliation is not enough to locate all affiliated hosts (link spam), and it does not work for identifying textual spam.

5 The first experiment is not enough to show the recall performance of a ranking algorithm.
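
The distance weighting suggested in point 1 could look something like the sketch below. The function name `inherited_score`, the position arguments (token offsets within the expert page) and the `decay` constant are hypothetical; this is one possible weighting, not something Hilltop itself defines.

```python
def inherited_score(expert_score: float, link_pos: int, phrase_pos: int,
                    decay: float = 0.01) -> float:
    """Scale the score a target inherits by how far the out-link sits
    from the key phrase that matched the query.

    Positions are token offsets in the expert page; `decay` is an assumed
    tuning constant controlling how fast the influence drops off.
    """
    distance = abs(link_pos - phrase_pos)
    return expert_score / (1.0 + decay * distance)
```

Under this weighting a match in the anchor text (distance zero) passes on the full score, while a match in a title far from the link passes on less.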

contributed by, ABHISHEK SEO
