The utility learns a semantic space from documents stored in a Bag-Of-Words input file ("-i"). The semantic space is then stored as a binary ".ssp" file ("-ob"), as a text file ("-ot"), or both. The binary output can then be used as input for other utilities, while the text output shows the most common words for each basis vector of the semantic space.
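As a rough illustration of what learning a semantic space involves, the Python sketch below computes an LSI-style basis with a truncated SVD and lists the highest-weighted words for each basis vector. It is not the utility's implementation: NumPy, the toy matrix, the vocabulary and the function names learn_semantic_space and top_words are all assumptions made for illustration, and the real ".bow" and ".ssp" file formats are not modelled.

```python
import numpy as np

# Hypothetical toy data: rows are documents, columns are word counts.
vocab = ["cat", "dog", "pet", "stock", "market", "trade"]
bow = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

def learn_semantic_space(bow, dims):
    """Return the first `dims` right singular vectors, one per row."""
    # Truncated SVD of the document-by-word matrix, as in LSI.
    _, _, vt = np.linalg.svd(bow, full_matrices=False)
    return vt[:dims]

def top_words(basis, vocab, n=3):
    """List the n words with the largest absolute weight in each basis vector."""
    return [[vocab[i] for i in np.argsort(-np.abs(vec))[:n]] for vec in basis]

basis = learn_semantic_space(bow, dims=2)
for k, words in enumerate(top_words(basis, vocab)):
    print(f"basis vector {k}: {', '.join(words)}")
```

Here dims plays the role of "-dims", and the printed word lists are loosely analogous to what the text output ("-ot") is described as showing.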
The parameter "-t" determines which method is used to generate the
semantic space (LSI or PCA). The parameter "-dims" sets the dimensionality
of the semantic space. The parameter "-reorto" is specific to LSI and
determines what kind of reorthogonalization is performed: "none" is the
fastest but the least numerically stable, "selective" is in between, and
"full" is the slowest but the most stable. When the number of documents in
the input file is small, use "full"; otherwise "selective" should work
fine. Use "none" only when time is really critical and neither "full" nor
"selective" is fast enough.
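To make the speed versus stability trade-off concrete, here is a minimal Python sketch of a Lanczos iteration with and without full reorthogonalization. The source does not say which solver the LSI mode uses; options of this kind are typical of Lanczos-type methods, so the sketch assumes one. The function names, the test matrix and the step count are hypothetical, and the "selective" strategy is not shown.

```python
import numpy as np

def lanczos_basis(A, k, full_reorth=False, seed=0):
    """Run k Lanczos steps on a symmetric matrix A and return the basis Q."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Q = np.zeros((n, k))
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    q_prev = np.zeros(n)
    beta = 0.0
    for j in range(k):
        Q[:, j] = q
        # Standard three-term recurrence.
        w = A @ q - beta * q_prev
        alpha = q @ w
        w -= alpha * q
        if full_reorth:
            # Project out every previous basis vector (two passes for safety).
            for _ in range(2):
                w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta = np.linalg.norm(w)
        if beta < 1e-12:  # invariant subspace found, stop early
            return Q[:, :j + 1]
        q_prev, q = q, w / beta
    return Q

def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I; zero for a perfectly orthonormal basis."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))

# Hypothetical test matrix: two dominant eigenvalues plus a dense cluster.
eigs = np.concatenate([[1000.0, 500.0], np.linspace(0.0, 1.0, 198)])
A = np.diag(eigs)

err_none = orthogonality_error(lanczos_basis(A, 40, full_reorth=False))
err_full = orthogonality_error(lanczos_basis(A, 40, full_reorth=True))
print(f"no reorthogonalization:   {err_none:.2e}")
print(f"full reorthogonalization: {err_full:.2e}")
```

The extra projections in the full variant cost time proportional to the number of basis vectors at every step, which is the price paid for keeping the basis orthogonal.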
usage: Bow2SemSpace.exe
-i:Input-BagOfWords-File (default:'')
-ob:Output-SemanticSpace-Binary-File (default:'')
-ot:Output-SemanticSpace-Text-File (default:'')
-t:Semantic-Space-Type (lsi, pca) (default:'lsi')
-dims:Number-Of-Space-Dimensions (default:50)
-reorto:Reortogonalization (none, selective, full) (default:'selective')
Example 1:
Bow2SemSpace.exe -i:fp6-ist.bow -ob:fp6.ssp -ot:fp6.txt -t:lsi -dims:30 -reorto:full
The above example learns a semantic space with 30 dimensions from the set of documents in the
Bag-Of-Words file fp6-ist.bow using LSI with full reorthogonalization, and writes it to
fp6.ssp (binary) and fp6.txt (text).