Bag-Of-Words Logistic Regression Classification

Learns a Logistic Regression classifier on a set of training documents. A logistic curve is fitted over the space of documents so that it matches their labels as closely as possible. Both positive and negative examples are needed for learning.

The utility learns a Logistic Regression classifier on the input file ("-i" and "-iw") for classifying documents into one category ("-cat"). It produces a model ("-o") in the Bag-Of-Words format ".bowmd".

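The parameter "-v" determines verbosity during learning. The parameters "-eps" and "-max_step" determine the stopping criteria for learning. The parameter "-td" is used for the Reuters21578 dataset. It determines which documents from the ModApte split of this dataset are used for learning. The parameter "-t" sets the threshold on the value of the logistic curve above which a document is classified into the category.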
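The interplay of the two stopping criteria can be illustrated with a minimal sketch. It is not the utility's implementation: it uses plain gradient descent instead of CG, it assumes that "-eps" is compared against the norm of the gradient (the text above does not say what exactly is measured), and the names train_logreg and learning_rate are hypothetical.

```python
import math

def train_logreg(docs, labels, eps=0.01, max_step=100, learning_rate=0.5):
    """Fit logistic regression weights on dense document vectors.

    Assumption: training stops when the gradient norm falls below eps
    or after max_step steps, whichever comes first.
    Returns (weights, steps_used, converged).
    """
    n_features = len(docs[0])
    w = [0.0] * n_features
    for step in range(1, max_step + 1):
        # Gradient of the average log-loss with respect to the weights.
        grad = [0.0] * n_features
        for x, y in zip(docs, labels):
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))  # value of the logistic curve
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        grad = [g / len(docs) for g in grad]
        grad_norm = math.sqrt(sum(g * g for g in grad))
        if grad_norm < eps:
            return w, step, True  # stopped by the eps criterion
        w = [wi - learning_rate * g for wi, g in zip(w, grad)]
    return w, max_step, False  # stopped by the step limit

# One positive and one negative example; the first feature acts as a bias.
weights, steps, converged = train_logreg(
    [[1.0, 2.0], [1.0, -2.0]], [1, 0], eps=0.01, max_step=3)
```

A smaller eps or a larger max_step lets learning run longer; whichever criterion is met first ends it.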

usage: BowTrainLogReg.exe
-i:Input-BagOfWords-FileName (default:'')
-iw:Input-BagOfWordWeights-FileName (default:'')
-o:Output-Logistic-Regression-Model-FileName (default:'')
-cat:Category-Name (default:'')
-td:Training-Documents (0 - all, 1 - train, 2 - test) (default:0)
-v:Verbosity (default:0)
-t:Threshold (default:0.5)
-eps:Stop-Criteria (default:0.01)
-max_step:Maximal-Number-of-Steps-for-CG (default:100)

Example:
BowTrainLogReg.exe -i:Reuters21578.Bow -iw:Reuters21578.Boww -cat:corn -td:1 -t:0.7

The above example learns a Logistic Regression classifier for the category corn using documents from Reuters21578 tagged as training documents. A document is classified into the category when the value of the logistic curve on that document is higher than 0.7. The model is saved into the file reuters21578.BowMd.
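The effect of the threshold can be sketched as follows. This is an illustration only, not the utility's code: the function names, the sparse dictionary representation of a document, and the weight values are all made up for the example.

```python
import math

def sigmoid(z):
    """Logistic curve: maps any real score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(weights, bias, doc, threshold=0.5):
    """Return (value of the logistic curve, whether the document is in the class).

    weights and doc are sparse dictionaries mapping a word to a number
    (a hypothetical layout chosen for this sketch).
    """
    score = bias + sum(weights.get(word, 0.0) * value
                       for word, value in doc.items())
    p = sigmoid(score)
    return p, p > threshold

# Made-up model with a single weighted word.
model_weights = {"corn": 2.0}
model_bias = -1.0

# The same document can fall inside the class at the default threshold 0.5
# and outside it at the stricter threshold 0.7.
p_default, in_class_default = classify(model_weights, model_bias, {"corn": 0.9})
p_strict, in_class_strict = classify(model_weights, model_bias, {"corn": 0.9},
                                     threshold=0.7)
```

Raising the threshold admits fewer documents into the category, so only those with a higher value of the logistic curve are accepted.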