Bag-Of-Words To Graph-Visualization

The utility reads a Bag-Of-Words file ("-i"), computes the Graph-Visualization procedure and writes the result to an XML file ("-o"). The parameter "-docs" sets the number of documents to consider (the value "-1" means all documents). The parameter "-clusts" sets the final number of graph nodes (document clusters). The parameter "-rseed" sets the random-number-generator seed; the value 0 selects a nondeterministic seed. The parameter "-ctrials" sets the number of separate runs (trials) of the K-Means algorithm performed in search of the best solution. The parameter "-ceps" sets the convergence epsilon, which controls the stopping criterion of the K-Means algorithm. The parameter "-cutww" sets the percentage of the sum of the centroid word weights that is covered by the best words appearing in the textual output file. The parameter "-mnwfq" sets the minimal document frequency a word must have to be used in the document representation.
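The effect of "-mnwfq" and "-cutww" can be illustrated with a small Python sketch. This is not the utility's source code: the function names and data layout are hypothetical, and the reading of "-cutww" (keep the heaviest words until their weights reach the given share of the total) is an assumption based on the description above.

```python
from collections import Counter

def filter_by_document_frequency(docs, min_doc_freq):
    """Keep only words that occur in at least min_doc_freq documents (cf. "-mnwfq")."""
    doc_freq = Counter()
    for words in docs:
        doc_freq.update(set(words))  # count each word once per document
    return [[w for w in words if doc_freq[w] >= min_doc_freq] for words in docs]

def cut_by_weight_sum(centroid, fraction):
    """Return the heaviest words whose weights reach `fraction` of the total (cf. "-cutww").

    `centroid` maps word -> weight; this stopping rule is an assumed interpretation.
    """
    total = sum(centroid.values())
    kept, running = [], 0.0
    for word, weight in sorted(centroid.items(), key=lambda kv: kv[1], reverse=True):
        if running >= fraction * total:
            break  # the requested share of the weight sum is already covered
        kept.append(word)
        running += weight
    return kept
```

For example, cut_by_weight_sum({"oil": 5, "price": 3, "bank": 2}, 0.5) keeps only "oil", because that single word already carries half of the total weight.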

usage: Bow2VizGraph.exe
-i:Input-BagOfWords-File (default:'')
-o:Output-Graph-File (default:'VizGraph.Xml')
-docs:Documents (default:-1)
-clusts:Clusters (default:10)
-rseed:RNG-Seed (default:1)
-ctrials:Clustering-Trials (default:1)
-ceps:Convergence-Epsilon (default:10)
-cutww:Cut-Word-Weight-Sum-Percentage (default:0.5)
-cssp:Cluster-Similarity-Sum-Percent-Treshold (default:0.3)
-mnwfq:Minimal-Word-Frequency (default:5)
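
When the utility is driven from a script, the "-name:value" switches listed above can be assembled programmatically. The following Python sketch is only an illustration: the helper names build_command and run are hypothetical, and the sketch assumes that Bow2VizGraph.exe is reachable on the search path.

```python
import subprocess

def build_command(exe="Bow2VizGraph.exe", **params):
    """Format keyword arguments as the '-name:value' switches shown in the usage text."""
    return [exe] + [f"-{name}:{value}" for name, value in params.items()]

def run(cmd):
    """Run the assembled command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)

cmd = build_command(i="Reuters21578.Bow", o="VizGraph.Xml", clusts=10, docs=1000, rseed=1)
print(" ".join(cmd))
# run(cmd)  # uncomment on a machine where the executable is installed
```

Switches that are left out keep the defaults listed in the usage text.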

Bow2VizGraph.exe -i:Reuters21578.Bow -clusts:10 -docs:1000

Runs K-Means clustering on 1000 documents ("-docs") from Reuters21578 to produce 10 clusters ("-clusts"), transforms the clusters into a similarity graph and writes the graph to an XML file with the default name "VizGraph.Xml" ("-o").
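
The step from clusters to a similarity graph can be sketched in Python as follows. This is a minimal illustration, not the utility's actual algorithm: it assumes that each cluster is represented by a centroid given as a word-to-weight mapping, that similarity is measured by cosine similarity, and that an edge is kept when the similarity reaches a plain threshold. The exact rule behind "-cssp" is not described here, so min_similarity is a hypothetical stand-in.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors given as word -> weight dicts."""
    dot = sum(w * b.get(word, 0.0) for word, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(centroids, min_similarity):
    """Return edges (i, j, similarity) between clusters that are similar enough."""
    edges = []
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            sim = cosine(centroids[i], centroids[j])
            if sim >= min_similarity:  # assumed thresholding rule
                edges.append((i, j, sim))
    return edges

centroids = [{"oil": 1.0, "price": 1.0}, {"oil": 1.0}, {"bank": 1.0}]
edges = similarity_graph(centroids, 0.3)
```

In this toy input only clusters 0 and 1 share a word, so the graph contains a single edge between them; cluster 2 remains an isolated node.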