COALS
the correlated occurrence analogue to lexical semantics


This interface finds the nearest neighbors (the most similar other words) for each word or short phrase that you specify. Queries are limited to the 500,000 most common words in the corpus on which COALS is based. You may want to further limit the size of the candidate set of possible nearest neighbors.

Read the introduction to COALS to learn more about the vector types that can be used. In short, COALS vectors are the original high-dimensional real-valued vectors. Typically, 14,000 dimensions are used, although you can choose fewer, or as many as 100,000. COALS-SVD vectors are lower-dimensional real-valued vectors obtained from the singular value decomposition of the full COALS matrix. These can have up to 1,500 dimensions, although you may want just a few. Using 500-1,000 dimensions usually gives the best performance. Finally, COALS-SVDB vectors are like COALS-SVD vectors, but they have been discretized to binary values.
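As a rough illustration of how a COALS-SVDB vector relates to a COALS-SVD vector, the sketch below keeps the first few dimensions of a real-valued vector and discretizes them. It assumes that the SVD components are ordered from most to least significant and that discretization maps positive components to 1 and all others to 0; neither detail is stated on this page, and the names truncate and binarize are hypothetical.

```python
def truncate(vector, dims):
    """Keep only the first `dims` components of a vector.

    Assumption: components are ordered by decreasing singular value,
    so the leading ones carry the most information.
    """
    return vector[:dims]


def binarize(vector):
    """Discretize a real-valued vector to binary values.

    Assumption: a positive component becomes 1, anything else becomes 0.
    """
    return [1 if x > 0 else 0 for x in vector]


# Toy 6-dimensional "SVD" vector; real vectors have up to 1,500 dimensions.
svd_vector = [0.42, -0.17, 0.03, -0.88, 0.0, 0.25]

binary_vector = binarize(truncate(svd_vector, 4))
print(binary_vector)  # [1, 0, 1, 0]
```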


Vector Type:
COALS: use 1,000-100,000 dimensions, 14,000 recommended
COALS-SVD: use 10-1,500 dimensions, 800 recommended
COALS-SVDB: use 10-1,500 dimensions, 500 recommended
Dimensions:
This is the number of neighbors returned for each word:
Neighbors:
This is the number of words, taking only the most frequent ones, that will be considered as possible neighbors:
Candidates:
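To show how the Neighbors and Candidates settings interact, here is a minimal sketch of a nearest-neighbor search restricted to the most frequent words. It assumes that the vocabulary is stored in order of decreasing frequency and that similarity is measured with the cosine of the angle between vectors; the actual interface may use a different measure. The function names and the toy vocabulary are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, vocab, neighbors, candidates, exclude=()):
    """Return the `neighbors` words most similar to `query`.

    vocab:      list of (word, vector) pairs, most frequent word first
                (an assumption about how the data is stored).
    candidates: only the first `candidates` entries of vocab are considered.
    exclude:    words to skip, typically the query word itself.
    """
    pool = [(w, vec) for w, vec in vocab[:candidates] if w not in exclude]
    scored = sorted(pool, key=lambda item: cosine(query, item[1]), reverse=True)
    return [w for w, _ in scored[:neighbors]]


# Toy vocabulary, ordered from most to least frequent.
vocab = [
    ("the", [1.0, 0.0, 0.0]),
    ("cat", [0.0, 1.0, 0.2]),
    ("dog", [0.0, 0.9, 0.3]),
    ("kitten", [0.0, 1.0, 0.25]),
]
cat = vocab[1][1]

# With 3 candidates, "kitten" is too rare to be considered.
print(nearest_neighbors(cat, vocab, neighbors=1, candidates=3, exclude={"cat"}))
# With 4 candidates, "kitten" becomes the closest neighbor.
print(nearest_neighbors(cat, vocab, neighbors=1, candidates=4, exclude={"cat"}))
```

A smaller candidate set means fewer similarity computations, at the cost of never returning rarer words.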
Enter one word per line. In place of a single word, you can enter a short phrase or set of words enclosed in curly braces; the vectors of those words will be averaged before the neighbors are computed.
Words:
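The input format and the averaging step might be handled along the following lines. This is only a sketch: it assumes that the words inside curly braces are separated by whitespace and that averaging is a plain component-wise mean, and the function names are hypothetical.

```python
def parse_queries(text):
    """Split the Words field into queries, one per non-empty line.

    A line such as "{hot dog}" becomes ["hot", "dog"]; any other line
    is treated as a single word. Assumption: words inside the braces
    are separated by whitespace.
    """
    queries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("{") and line.endswith("}"):
            words = line[1:-1].split()
        else:
            words = [line]
        if words:
            queries.append(words)
    return queries


def average_vectors(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


print(parse_queries("cat\n{hot dog}\n"))            # [['cat'], ['hot', 'dog']]
print(average_vectors([[1.0, 2.0], [3.0, 4.0]]))    # [2.0, 3.0]
```

The averaged vector is then used as the query in place of a single word's vector.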
Finding the neighbors could take a while. We recommend that you enter your email address here so we can notify you when the file is ready:
Email:


Written by Douglas Rohde.