COALS
the correlated occurrence analogue to lexical semantics


This interface allows you to download COALS vectors by specifying a list of words and a vector format. Words are limited to the 500,000 most common in the corpus on which COALS is based.

Read the introduction to COALS to learn more about the vector types. In short, COALS vectors are the original high-dimensional, real-valued vectors. Typically 14,000 dimensions are used, although you can choose anywhere from 1,000 to 100,000. COALS-SVD vectors are lower-dimensional, real-valued vectors obtained by applying the singular value decomposition to the full COALS matrix. These can have up to 1,500 dimensions, although you may want just a few; 500 to 1,000 dimensions usually give the best performance. Finally, COALS-SVDB vectors are just like COALS-SVD vectors, except that they are discretized to binary values: negative values map to 0 and positive values map to 1.
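The relationship between COALS-SVD and COALS-SVDB vectors can be illustrated with a short Python sketch. This is not the code used to generate the downloadable files. The function name `truncate_and_binarize` is hypothetical, keeping the leading k columns is an assumed way of choosing fewer dimensions, and mapping a value of exactly zero to 0 is also an assumption, since the description above only covers negative and positive values.

```python
import numpy as np


def truncate_and_binarize(svd_vectors, k):
    """Keep the first k dimensions of each vector and discretize them.

    Assumption: fewer dimensions are obtained by keeping the leading columns.
    Assumption: a value of exactly zero maps to 0 (the source does not say).
    """
    truncated = np.asarray(svd_vectors, dtype=float)[:, :k]
    # Positive values become 1; negative values (and zero) become 0.
    binary = (truncated > 0).astype(np.uint8)
    return truncated, binary


# Two made-up 3-dimensional vectors, purely for illustration.
vectors = np.array([[0.42, -0.10, 0.03],
                    [-0.77, 0.25, -0.01]])
real, binary = truncate_and_binarize(vectors, 2)
print(binary)
```

Each binary vector keeps only the sign of every retained dimension, which makes it far more compact than the real-valued version.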


Vector Type:
  COALS: use 1,000-100,000 dimensions; 14,000 recommended
  COALS-SVD: use 10-1,500 dimensions; 800 recommended
  COALS-SVDB: use 10-1,500 dimensions; 500 recommended
Dimensions:
File Format: (see Explanation of the File Formats)
  Text Encoding
  Binary Encoding
Words:
  Generate vectors for the most common words.
  Generate vectors for the following list of words:
Generating the vectors could take a while. We recommend that you enter your email address here so we can notify you when the file is ready:
Email:
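
If you prepare requests programmatically, the dimension limits listed for each vector type can be checked ahead of time. The sketch below is only an illustration: the names `DIMENSION_RANGES` and `resolve_dimensions` are hypothetical, and falling back to the recommended value when no dimension count is given is an assumption, not documented behavior of this interface.

```python
# (minimum, maximum, recommended) dimensions for each vector type,
# copied from the limits listed in the form.
DIMENSION_RANGES = {
    "COALS": (1_000, 100_000, 14_000),
    "COALS-SVD": (10, 1_500, 800),
    "COALS-SVDB": (10, 1_500, 500),
}


def resolve_dimensions(vector_type, requested=None):
    """Return a valid dimension count for the given vector type.

    Assumption: when no count is requested, use the recommended value.
    """
    low, high, recommended = DIMENSION_RANGES[vector_type]
    if requested is None:
        return recommended
    if not low <= requested <= high:
        raise ValueError(
            f"{vector_type} supports {low} to {high} dimensions, got {requested}"
        )
    return requested


print(resolve_dimensions("COALS-SVD"))
print(resolve_dimensions("COALS", 20_000))
```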


Written by Douglas Rohde.