Bag-Of-Words To Semantic-Space

The utility learns a semantic space from documents stored in a Bag-Of-Words input file ("-i"). The semantic space is then stored as a binary ".ssp" file ("-ob"), as a text file ("-ot"), or both. The binary output can then be used as input for other utilities, while the text output shows the most common words for each basis vector of the semantic space.

The parameter "-t" determines which method ("lsi" or "pca") is used to generate the semantic space. The parameter "-dims" determines the dimensionality of the semantic space. The parameter "-reorto" is specific to LSI and determines what kind of reorthogonalization is performed: "none" is the fastest but also the least numerically stable, "selective" is in the middle, and "full" is the slowest but the most stable. When the number of documents in the input file is small, use "full"; otherwise "selective" should work fine. Use "none" only when time is really critical and neither "full" nor "selective" is fast enough.
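
To make the roles of "-t" and "-dims" concrete, here is a minimal Python sketch of the general idea behind the two methods. It is not the utility's implementation: it uses a dense SVD from NumPy, so the "-reorto" setting has no counterpart here. The function names semantic_space and top_words and the toy vocabulary are made up for illustration. The sketch assumes terms are rows and documents are columns, and that "pca" differs from "lsi" only by centering each term's weights across documents.

```python
import numpy as np

def semantic_space(bow, dims, method="lsi"):
    """Return (basis, singular_values) for a term-by-document matrix.

    bow  : array-like of shape (n_terms, n_docs), illustrative layout
    dims : number of basis vectors to keep (the role of "-dims")
    """
    X = np.asarray(bow, dtype=float)
    if method == "pca":
        # Assumption: PCA here means SVD of the mean-centered matrix.
        X = X - X.mean(axis=1, keepdims=True)
    elif method != "lsi":
        raise ValueError("method must be 'lsi' or 'pca'")
    # Truncated SVD: keep the leading left singular vectors as the basis.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :dims], s[:dims]

def top_words(basis, vocab, k=3):
    """List the k heaviest words (by absolute weight) per basis vector."""
    return [[vocab[i] for i in np.argsort(-np.abs(col))[:k]]
            for col in basis.T]

# Toy data: two topics that share no words.
vocab = ["grid", "network", "protein", "gene"]
bow = [[2, 3, 0, 0],
       [1, 2, 0, 0],
       [0, 0, 3, 1],
       [0, 0, 2, 2]]

basis, sing_vals = semantic_space(bow, dims=2, method="lsi")
for words in top_words(basis, vocab, k=2):
    print(words)
```

Listing the heaviest words per basis vector is similar in spirit to what the text output ("-ot") is described as showing, although the actual file format may differ.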

usage: Bow2SemSpace.exe
-i:Input-BagOfWords-File (default:'')
-ob:Output-SemanticSpace-Binary-File (default:'')
-ot:Output-SemanticSpace-Text-File (default:'')
-t:Semantic-Space-Type (lsi, pca) (default:'lsi')
-dims:Number-Of-Space-Dimensions (default:50)
-reorto:Reortogonalization (none, selective, full) (default:'selective')

Example 1:
Bow2SemSpace.exe -i:fp6-ist.bow -ob:fp6.ssp -ot:fp6.txt -t:lsi -dims:30 -reorto:full

The above example learns a semantic space with 30 dimensions from the set of documents in the Bag-Of-Words file fp6-ist.bow, using LSI with full reorthogonalization. The result is written in binary form to fp6.ssp and as text to fp6.txt.
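
When the utility is called from a script, the argument list can be assembled programmatically. The following Python sketch uses only the flags and defaults listed in the usage text; the helper name build_command and its parameter names are hypothetical, and the choice to leave out "-reorto" for PCA is an assumption based on the statement that the parameter is specific to LSI.

```python
def build_command(bow_file, binary_out="", text_out="",
                  space_type="lsi", dims=50, reorto="selective",
                  exe="Bow2SemSpace.exe"):
    """Build the argument list for one run of the utility (hypothetical helper)."""
    if space_type not in ("lsi", "pca"):
        raise ValueError("space_type must be 'lsi' or 'pca'")
    if reorto not in ("none", "selective", "full"):
        raise ValueError("reorto must be 'none', 'selective' or 'full'")
    if dims < 1:
        raise ValueError("dims must be a positive integer")

    args = [exe, f"-i:{bow_file}"]
    if binary_out:
        args.append(f"-ob:{binary_out}")
    if text_out:
        args.append(f"-ot:{text_out}")
    args += [f"-t:{space_type}", f"-dims:{dims}"]
    if space_type == "lsi":
        # Assumption: "-reorto" is only meaningful for LSI, so skip it for PCA.
        args.append(f"-reorto:{reorto}")
    return args

# Reproduces the command line of Example 1.
cmd = build_command("fp6-ist.bow", "fp6.ssp", "fp6.txt",
                    space_type="lsi", dims=30, reorto="full")
print(" ".join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) on a machine where Bow2SemSpace.exe is on the path.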