Active learning on sparse training sets using a binary SVM model

 

The utility ALTrainBinSVM.exe runs an active learning loop on the specified input. The input is a set of unlabelled vectors stored in a sparse trainset (“.sts”) file. Until a sufficient number of samples has been acquired for both classes, labeling requests are chosen either randomly or by a smart selection based on the structure of the data. Once enough examples are labeled for the SVM to converge to a meaningful result, the SimpleMargin algorithm is used to calculate the next batch of queries.

The batch size sets the number of answers required before the main loop runs again. A batch size of 1 usually gives the best results, but it also requires the most CPU time, which is inversely proportional to the batch size.

The program terminates with an exit code of 0 when the prespecified number of queries has been processed, the user returns an “END” literal, or the unlabelled pool is depleted. On termination it writes the resulting SVM model, the union of the labeled and unlabelled pools, or both, to the specified locations.
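The SimpleMargin selection step can be illustrated with a short Python sketch. It shows only the general idea of the technique (query the unlabelled vectors that lie closest to the current SVM hyperplane) and is not the code used by ALTrainBinSVM.exe. The linear decision function, the dictionary-based sparse vectors, and the names decision_value and simple_margin_batch are assumptions made for the illustration.

```python
from typing import Dict, List

# Assumed representation: a sparse vector maps feature index -> value.
SparseVector = Dict[int, float]


def decision_value(weights: SparseVector, bias: float, x: SparseVector) -> float:
    """Linear SVM decision function f(x) = w . x + b over sparse vectors."""
    return sum(weights.get(i, 0.0) * v for i, v in x.items()) + bias


def simple_margin_batch(
    weights: SparseVector,
    bias: float,
    pool: Dict[int, SparseVector],
    batch_size: int,
) -> List[int]:
    """Return the ids of the batch_size pool vectors closest to the hyperplane.

    Closeness is measured by |f(x)|; ties are broken by sample id so the
    result is deterministic.
    """
    ranked = sorted(
        pool,
        key=lambda sid: (abs(decision_value(weights, bias, pool[sid])), sid),
    )
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical model and unlabelled pool, for illustration only.
    w = {0: 1.0, 1: -1.0}
    b = 0.0
    unlabelled = {
        10: {0: 2.0},
        11: {0: 1.0, 1: 0.9},
        12: {1: 3.0},
        13: {0: 0.5},
    }
    print(simple_margin_batch(w, b, unlabelled, batch_size=2))
```

With a batch size of 1 the model would be retrained after every answer, so each query reflects the latest hyperplane; larger batches reuse one ranking for several queries, which is the trade-off between result quality and CPU time described above.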

 

usage: ALTrainBinSVM.exe

-i:Input-SparseTrainset-Data (default:'')

-b:Batch-Size (default:1)

-s:Initial-Selection-Mode (default:0) (0:random, 1:data analysis)

-q:Max-Queries (default:-1)

-do:Data-Output-File (default:'')

-mo:Model-Output-File (default:'')

 

Example:

 

ALTrainBinSVM.exe -i:input.sts -b:2 -s:1 -q:100 -mo:test.svm

[communication on standard input/output]

> 001 IS-A 43 0.501 1

< 001 1

> 002 IS-A 2231 0.447 -1

< 002

< END
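
A script that answers the queries automatically would need to split each query line into its fields. The Python sketch below parses lines shaped like the ones in the example and formats an answer line. The meaning of the fields is not documented here, so the names query_id, sample_index, score and suggested_label are guesses based only on the example; the helpers parse_query and format_answer are hypothetical and not part of the tool.

```python
from typing import NamedTuple


class Query(NamedTuple):
    # Field names are assumptions inferred from the example transcript.
    query_id: str         # e.g. "001"
    sample_index: int     # e.g. 43
    score: float          # e.g. 0.501
    suggested_label: int  # e.g. 1 or -1


def parse_query(line: str) -> Query:
    """Parse a query line such as '> 001 IS-A 43 0.501 1'."""
    # Drop the leading '>' direction marker used in the transcript, if present.
    parts = line.lstrip("> ").split()
    if len(parts) != 5 or parts[1] != "IS-A":
        raise ValueError(f"unexpected query line: {line!r}")
    return Query(parts[0], int(parts[2]), float(parts[3]), int(parts[4]))


def format_answer(query_id: str, label: int) -> str:
    """Format an answer line such as '001 1' (without the '<' marker)."""
    return f"{query_id} {label}"


if __name__ == "__main__":
    q = parse_query("> 001 IS-A 43 0.501 1")
    print(q)
    print(format_answer(q.query_id, 1))
```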