GTM: Google Trigram Model for Relatedness

Word-Word Relatedness  Text-Text Relatedness   Document-Document Relatedness  Document-Corpus Relatedness   Corpus Relatedness


GTM uses the Google n-gram corpus [1] to compute the relatedness of representative words from each text and, from those scores, the relatedness of a text pair, as described in [2]. Word-pair relatedness takes into account the frequencies of the Google trigrams that start and end with the given pair of words, together with the unigram frequencies of the two words. The text relatedness method first separates the words shared by the two texts being compared, then computes word relatedness over the remaining words. The count of the separated words and the relatedness scores of representative words between the texts are then normalized using the texts' lengths.
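The two steps above can be sketched roughly as follows. This is an illustrative sketch only, not the GTM implementation: the function names, the log-ratio normalization in word_relatedness, the "best match per word" stand-in for representative-word selection, and the (m + n) / (2mn) length normalization are all assumptions made for the example; the exact formulas are given in [2]. Texts are assumed to be lists of distinct, already-tokenized words.

```python
import math


def word_relatedness(c1, c2, tri_mean, c_max):
    """Hypothetical word-pair score in [0, 1].

    c1, c2   -- unigram frequencies of the two words
    tri_mean -- mean frequency of trigrams that start with one word
                and end with the other (both orders)
    c_max    -- largest unigram frequency in the corpus (assumed known)
    """
    if tri_mean <= 0 or c1 <= 0 or c2 <= 0:
        return 0.0
    c_min = min(c1, c2)
    if c_min >= c_max:
        # The normalizer below would be zero; treated as unrelated here.
        return 0.0
    denom = -2.0 * math.log(c_min / c_max)
    ratio = tri_mean * c_max ** 2 / (c1 * c2 * c_min ** 2)
    # Assumed floor for weakly co-occurring pairs.
    numer = math.log(ratio) if ratio > 1 else math.log(1.01)
    return min(numer / denom, 1.0)


def text_relatedness(words_a, words_b, rel):
    """Hypothetical text-pair score.

    rel is any callable rel(w1, w2) -> float in [0, 1].
    """
    m, n = len(words_a), len(words_b)
    if m == 0 or n == 0:
        return 0.0
    # Step 1: separate the words shared by both texts.
    shared = set(words_a) & set(words_b)
    rest_a = [w for w in words_a if w not in shared]
    rest_b = [w for w in words_b if w not in shared]
    # Step 2: score the remaining words. The best match per word is
    # used here as a simple stand-in for representative-word selection.
    total = float(len(shared))
    if rest_b:
        for w in rest_a:
            total += max(rel(w, v) for v in rest_b)
    # Step 3: normalize by the lengths of the two texts.
    return total * (m + n) / (2.0 * m * n)
```

With a word-level scorer plugged in as rel, two identical texts score 1.0 under this sketch, and two texts with no shared words and no related words score 0.0.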

[LAST UPDATE: Feb 24, 2015] Corpus-related services fixed.

[1] Brants, T., Franz, A.: Web 1T 5-gram corpus version 1.1. Technical report, Google Research (2006).

[2] Islam, A., Milios, E.E., Keselj, V.: Text similarity using Google tri-grams. In: Kosseim, L., Inkpen, D. (eds.): Canadian Conference on AI. Volume 7310 of Lecture Notes in Computer Science, Springer (2012) 312-317.
