COALS
the correlated occurrence analogue to lexical semantics


This interface lets you compute pairwise similarity ratings for words or short phrases using the COALS vectors. Only the 500,000 most common words in the corpus on which COALS is based are available.

Read the introduction to COALS to learn more about the vector types that can be used. In short, COALS vectors are the original high-dimensional real-valued vectors. Typically, 14,000 dimensions are used, although you can choose fewer, or as many as 100,000. COALS-SVD vectors are lower-dimensional real-valued vectors obtained by applying the singular value decomposition to the full COALS matrix. These can have up to 1,500 dimensions, although you may want just a few; 500-1,000 dimensions usually give the best performance. Finally, COALS-SVDB vectors are like COALS-SVD vectors, but they have been discretized to binary values.
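As a rough illustration of how a COALS-SVD vector relates to a COALS-SVDB vector, the sketch below cuts a vector down to a chosen number of dimensions and then discretizes it. It rests on assumptions that this page does not state: that using fewer dimensions means keeping the leading components, and that discretization means mapping positive values to 1 and everything else to 0. The function names and the sample values are made up for the example.

```python
def truncate(vector, k):
    """Keep only the first k components of a vector (assumed ordering)."""
    if k < 1 or k > len(vector):
        raise ValueError("k must be between 1 and the vector length")
    return list(vector[:k])


def binarize(vector, threshold=0.0):
    """Map each component to 1 if it exceeds the threshold, else 0.

    Thresholding at zero is an assumption made for this sketch.
    """
    return [1 if x > threshold else 0 for x in vector]


# Made-up values standing in for a real-valued, SVD-style vector.
svd_vector = [0.42, -0.17, 0.03, -0.58, 0.00, 0.21]

reduced = truncate(svd_vector, 4)   # a 4-dimensional real-valued vector
binary = binarize(reduced)          # the same vector with binary components

print(reduced)
print(binary)
```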


Vector Type:
  COALS: use 1,000-100,000 dimensions, 14,000 recommended
  COALS-SVD: use 10-1,500 dimensions, 800 recommended
  COALS-SVDB: use 10-1,500 dimensions, 500 recommended
Dimensions:
Enter one word pair per line, with the two words separated by whitespace. In place of a single word, you can give a short phrase or a set of words enclosed in curly braces. The vectors of the words inside the braces are averaged before the similarity is computed.
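The input format and the averaging step can be sketched as follows. This is only an illustration under stated assumptions, not the code behind this interface: the three-dimensional vectors are invented, the helper names (parse_pair, average, correlation, rate_pair) are hypothetical, and the use of Pearson correlation as the similarity measure is an assumption.

```python
import math
import re

# One item is either a {braced group of words} or a single bare word.
ITEM = re.compile(r"\{([^{}]*)\}|(\S+)")

# Invented toy vectors; real COALS vectors have thousands of dimensions.
VECTORS = {
    "hot": [1.0, 0.0, 2.0],
    "dog": [3.0, 2.0, 0.0],
    "sausage": [2.0, 1.0, 1.0],
}


def parse_pair(line):
    """Split one input line into two items, each a list of words."""
    items = []
    for match in ITEM.finditer(line):
        if match.group(1) is not None:
            items.append(match.group(1).split())   # words inside braces
        else:
            items.append([match.group(2)])         # a single bare word
    if len(items) != 2 or any(not words for words in items):
        raise ValueError("expected exactly two items: %r" % line)
    return items[0], items[1]


def average(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def correlation(u, v):
    """Pearson correlation of two vectors (assumed similarity measure)."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [x - mean_v for x in v]
    numerator = sum(a * b for a, b in zip(du, dv))
    denominator = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return numerator / denominator if denominator else 0.0


def rate_pair(line, vectors=VECTORS):
    """Average each item's word vectors, then compare the two averages."""
    left, right = parse_pair(line)
    left_vec = average([vectors[w] for w in left])
    right_vec = average([vectors[w] for w in right])
    return correlation(left_vec, right_vec)


print(parse_pair("{hot dog} sausage"))
print(rate_pair("{hot dog} sausage"))
```

In this sketch a line such as "dog sausage" compares two single words, while "{hot dog} sausage" first averages the vectors for "hot" and "dog" and compares the result with the vector for "sausage".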
Word Pairs:
Generating the ratings could take a while. We recommend that you enter your email address here so we can notify you when the file is ready:
Email:


Written by Douglas Rohde.