Bag-Of-Words One-Class SVM Classification

The utility learns a One-Class Support Vector Machine (SVM) classifier from the input file ("-i") for classifying documents into one category ("-cat"). It writes the model ("-o") in the Bag-Of-Words model format ".bowmd". Only positive examples are needed for learning. Input vectors can be weighted ("-w") using one of the schemes none, norm, bin or tfidf.

The parameter "-nu" sets the nu parameter of the SVM (it plays a role similar to the cost parameter of a binary SVM) and must be between 0 and 1. The parameter "-t" selects the kernel used for learning:

  1. 0 - linear kernel (much faster than the others)
  2. 1 - polynomial kernel k(x, y) = (s (x^T y) + c)^p
  3. 2 - radial kernel k(x, y) = exp(-gamma ||x - y||^2)
  4. 3 - sigmoid kernel k(x, y) = tanh(s (x^T y) + c)

The parameters "-ker_p", "-ker_s", "-ker_c" and "-ker_gamma" set the parameters p, s, c and gamma of the nonlinear kernels.
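
The four kernel formulas above can be written out as a minimal Python sketch. This is only an illustration of the formulas, not the utility's own code: the function names are hypothetical, the vectors are plain dense lists (the utility works on Bag-Of-Words vectors), and the mapping of "-ker_s" and "-ker_c" to s and c in the sigmoid kernel is an assumption based on the shared symbols. The default values are taken from the usage listing.

```python
import math


def dot(x, y):
    """Inner product x^T y of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # -t:0, k(x, y) = x^T y
    return dot(x, y)


def polynomial_kernel(x, y, s=1.0, c=1.0, p=3):
    # -t:1, k(x, y) = (s (x^T y) + c)^p
    return (s * dot(x, y) + c) ** p


def radial_kernel(x, y, gamma=1.0):
    # -t:2, k(x, y) = exp(-gamma ||x - y||^2)
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, s=1.0, c=1.0):
    # -t:3, k(x, y) = tanh(s (x^T y) + c)
    return math.tanh(s * dot(x, y) + c)
```

For instance, polynomial_kernel([1, 2], [3, 4], p=2) evaluates (1 * 11 + 1)^2, which corresponds to the setting "-t:1 -ker_p:2" with the other kernel parameters left at their defaults.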

The parameter "-cachesize" sets the size of the cache (in MB) that a non-linear SVM can use for caching evaluated kernel functions. The parameter "-time" sets the maximal time in seconds allowed for learning the classifier. The parameter "-v" sets the verbosity during learning. The parameter "-subsize" sets the size of the sub-problems used by the learning algorithm (-1 means the classifier decides). The parameter "-ter" sets the termination criterion; increasing it makes learning faster, but the resulting classifier is less accurate. The parameter "-shrink" determines whether support vectors are predicted while learning. Using this option can increase learning time dramatically.

The parameter "-td" is used for the Reuters21578 dataset. It determines which documents from the ModApte split of this dataset are used for learning (0 - all, 1 - train, 2 - test).

usage: BowTrainOneClassSVM.exe
-i:Input-BagOfWords-FileName (default:'')
-o:Output-One-Class-SVM-Model-FileName (default:'')
-cat:Category-Name (default:'')
-td:Training-Documents (0 - all, 1 - train, 2 - test) (default:0)
-w:Weighting (none, norm, bin, tfidf) (default:'tfidf')
-nu:Nu-Parameter (default:0.1)
-t:SVM-Type: 0-linear, 1-polynomial, 2-radial, 3-sigmoid (default:0)
-ker_p:Degree-of-Polynomail-Kernel (default:3)
-ker_s:Linear-Part-in-Polynomial-Kernel (default:1)
-ker_c:Constant-Part-in-Polynomail-Kernel (default:1)
-ker_gamma:Gamma-for-Radial-Kernel (default:1)
-cachesize:Memory-Cache-Size (default:50)
-time:Upper-Time-Limit (default:-1)
-v:Verbosity (default:0)
-subsize:Subproblem-Size (default:-1)
-ter:Terminating-Condition (default:0.001)
-shrink:Shrinking (default:'T')

Example 1:
BowTrainOneClassSVM.exe -i:Reuters21578.Bow -w:tfidf -cat:corn -nu:0.2 -td:1

The above example learns a linear SVM classifier for the category corn using the documents from Reuters21578 tagged as training documents. The nu parameter is set to 0.2. The model is saved into the file reuters21578.BowMd.

Example 2:
BowTrainOneClassSVM.exe -i:Reuters21578.Bow -w:tfidf -cat:corn -nu:0.2 -t:1 -ker_p:2 -td:1

The above example learns an SVM classifier with a polynomial kernel for the category corn using the documents from Reuters21578 tagged as training documents. The degree of the polynomial kernel is set to 2 with the parameter "-ker_p". The model is saved into the file reuters21578.BowMd.
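
When the utility is driven from a script, the "-name:value" argument form shown in the usage listing can be assembled programmatically. The following is a minimal sketch under stated assumptions: build_command and check_nu are hypothetical helper names, not part of the utility, and whether the utility itself accepts the endpoints 0 and 1 for "-nu" is not stated here, so the check only rejects values outside that range.

```python
def check_nu(nu):
    """Reject nu values outside the documented range of 0 to 1."""
    if not 0.0 <= nu <= 1.0:
        raise ValueError(f"-nu must be between 0 and 1, got {nu}")
    return nu


def build_command(exe="BowTrainOneClassSVM.exe", **options):
    """Turn keyword options into a list of '-name:value' arguments.

    Options are emitted in the order they are passed in.
    """
    if "nu" in options:
        check_nu(options["nu"])
    args = [exe]
    for name, value in options.items():
        args.append(f"-{name}:{value}")
    return args


# Arguments corresponding to Example 1.
example_1 = build_command(i="Reuters21578.Bow", w="tfidf", cat="corn", nu=0.2, td=1)

# To actually run the utility (assumes the executable is on the PATH):
# import subprocess
# subprocess.run(example_1, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues if a file name contains spaces.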