GTM: Google Trigram Model for Relatedness


Help Page

Word-Word Relatedness

This tool allows you to submit pairs of words and receive their relatedness scores.
Enter each pair on its own line, with the two words separated by a space.
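To make the input format concrete, here is a minimal Python sketch of how one pair per line could be parsed. The function name parse_word_pairs is hypothetical, and the handling of blank or malformed lines is an assumption; this page does not say how the tool itself treats them.

```python
def parse_word_pairs(text):
    """Split input into (word1, word2) pairs, one pair per non-empty line."""
    pairs = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        words = line.split()  # split on any run of whitespace
        if not words:
            continue  # assumption: blank lines are skipped
        if len(words) != 2:
            # assumption: anything other than two words is rejected
            raise ValueError(
                f"line {line_number}: expected 2 words, got {len(words)}"
            )
        pairs.append((words[0], words[1]))
    return pairs


sample = "car automobile\nbank money\n"
word_pairs = parse_word_pairs(sample)
```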



Text-Text Relatedness

This tool allows you to submit pairs of texts and receive their relatedness scores. Each text should be entered on its own line. Texts are compared in consecutive pairs: the first with the second, the third with the fourth, and so on. If you submit an odd number of texts, the last entry is ignored. The tool assumes that the two texts in a pair do not differ much in length.
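The pairing rule can be sketched in a few lines of Python. This is an illustration of the rule as described here, not the tool's own code; the function name pair_texts is hypothetical, and skipping blank lines is an assumption.

```python
def pair_texts(text):
    """Group non-empty lines into consecutive (first, second) pairs."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # zip stops at the shorter slice, so a trailing unpaired text is dropped
    return list(zip(lines[0::2], lines[1::2]))


sample = "text one\ntext two\ntext three\ntext four\ntext five"
text_pairs = pair_texts(sample)
```

With five input lines, the sketch yields two pairs and leaves out the fifth line, which matches the behavior for an odd number of texts.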



Document-Document Relatedness

This tool allows you to submit two documents and compare them against each other. Enter the contents of the first document into the left text box, and the second document into the right text box. Documents may contain newline characters, so you can paste longer documents as they are. We assume that the two documents in any comparison are of roughly the same length.



Document-Corpus Relatedness

This tool allows you to submit a primary document and a number of secondary documents to compare it against. First, paste the contents of the primary document into the text box. Then save each secondary document as its own .txt file, zip the files together, and upload the resulting .zip file using the 'Choose File' button. The primary document is compared against every secondary document, and you receive one relatedness score for each.
The .zip file containing the secondary documents is currently limited to 10MB in size or 1000 documents.
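As a rough sketch of preparing the upload, the Python below packs each document into its own .txt entry of a .zip archive and checks the result against the stated limits. The function name build_corpus_zip and the entry names are hypothetical. Reading "10MB" as 10,000,000 bytes is an assumption (the stricter of the two common readings), as is applying both limits at once.

```python
import io
import zipfile

MAX_DOCUMENTS = 1000
MAX_ZIP_BYTES = 10 * 1000 * 1000  # assumption: stricter reading of "10MB"


def build_corpus_zip(documents):
    """Return the bytes of a .zip archive with one .txt entry per document."""
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(
            f"too many documents: {len(documents)} > {MAX_DOCUMENTS}"
        )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in documents.items():
            # each document becomes its own .txt entry, encoded as UTF-8
            archive.writestr(f"{name}.txt", text)
    data = buffer.getvalue()
    if len(data) > MAX_ZIP_BYTES:
        raise ValueError(f"archive too large: {len(data)} bytes")
    return data


secondary = {"doc1": "First secondary document.", "doc2": "Second one."}
zip_bytes = build_corpus_zip(secondary)
```

Writing zip_bytes to a file opened in binary mode produces an archive that can then be selected with the 'Choose File' button.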



Corpus Relatedness

This tool allows you to submit a set of documents, and compares every document against every other document in the set. It returns a matrix that displays the relatedness of all the documents. Make sure each of the documents is in its own .txt file, then zip the documents into a single .zip file and upload it using the file uploader.
The .zip file is currently limited to 10MB in size or 1000 documents.
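The shape of the all-against-all comparison can be illustrated with a short Python sketch. The scoring function used here is a plain Jaccard word overlap, chosen only as a stand-in; it is not the GTM relatedness measure. The names jaccard and relatedness_matrix are hypothetical, and whether the tool's matrix includes the diagonal or is symmetric is not stated on this page.

```python
def jaccard(a, b):
    """Stand-in score (NOT the GTM measure): shared words / all words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def relatedness_matrix(documents, score):
    """Score every document against every other; rows and columns follow names."""
    names = sorted(documents)
    matrix = [
        [score(documents[row], documents[col]) for col in names]
        for row in names
    ]
    return names, matrix


corpus = {
    "a.txt": "the cat sat",
    "b.txt": "the cat ran",
    "c.txt": "stock prices fell",
}
names, matrix = relatedness_matrix(corpus, jaccard)
```

Entry matrix[i][j] holds the score for the documents names[i] and names[j], so a corpus of n documents produces an n by n table.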



Stop Words

This text box contains a list of stop words. Stop words are words that the text analytics tool will ignore when making comparisons.
By default, this list contains words that are very common in the English language, such as 'the'. Because most documents contain these words, calculating similarity without ignoring them can lead to artificially high relatedness scores. Feel free to edit the default list as you like, entering each stop word on its own line. If the list is empty, no words will be ignored when the comparison is made.
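The effect of the stop word list can be sketched in Python as follows. The function names parse_stop_words and remove_stop_words are hypothetical, and both the whitespace tokenization and the case-insensitive matching are assumptions; the tool may tokenize and match differently.

```python
def parse_stop_words(text):
    """Read one stop word per line, ignoring blank lines."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stop_words(text, stop_words):
    """Return the words of text that are not in the stop word set."""
    # assumption: matching is case-insensitive and tokens are whitespace-separated
    return [word for word in text.lower().split() if word not in stop_words]


stop_words = parse_stop_words("the\nof\na\n")
filtered = remove_stop_words("The history of a word", stop_words)

# An empty list means nothing is ignored.
unfiltered = remove_stop_words("The history of a word", parse_stop_words(""))
```

In this sketch, filtered keeps only "history" and "word", while unfiltered keeps all five words.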


