New publications

October 12th, 2014

Two of our new papers have recently been accepted.

The paper titled Boosting for Vote Learning in High-dimensional kNN Classification has been accepted for presentation at the International Conference on Data Mining (ICDM) workshop on High-dimensional Data Analysis. The paper examines the possibility of using boosting for vote learning in high-dimensional data, since it turns out that hubness-aware k-nearest neighbor classifiers permit boosting in the classical sense. Standard kNN baselines are known to be robust to training data sub-sampling, so the instance sampling and instance re-weighting approaches to boosting do not typically work on kNN; it is usually boosted by feature sub-sampling instead. In the case of hubness-aware classifiers, however, re-weighting-based boosting becomes possible without greatly increasing the computational complexity, as the kNN graph only needs to be calculated once on the training data for the neighbor occurrence model. We have extended the basic neighbor occurrence models by introducing instance weights and weighted neighbor occurrences, which requires only trivial changes to the hubness-aware voting frameworks. The results look promising, though we have only tried the AdaBoost.M2 boosting approach so far, and some other boosting variants are less prone to over-fitting and more robust to noise. So, there is more work to be done here.
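
For the curious, here is a minimal Python sketch of the weighted neighbor occurrence idea (our illustration for this post, not the code from the paper; the function names, the scikit-learn neighbor search, and the integer class labels are assumptions made for the example). The point is that the expensive kNN graph is built once, while each boosting round only re-runs the cheap weighted aggregation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_knn_graph(X, k):
    # Computed once on the training data and reused across all
    # boosting rounds; this is what keeps re-weighting cheap.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return idx[:, 1:]  # column 0 is each point itself, so drop it

def weighted_occurrences(knn_graph, y, n_classes, w):
    # Class-conditional weighted neighbor occurrence counts: each
    # occurrence of point j in the k-NN list of point i is scaled
    # by the current boosting weight w[i] of that point.
    occ = np.zeros((knn_graph.shape[0], n_classes))
    for i, neighbors in enumerate(knn_graph):
        for j in neighbors:
            occ[j, y[i]] += w[i]
    return occ
```

A hubness-aware voter (h-FNN, for instance) would then derive its class-conditional votes from occ, and an AdaBoost.M2-style loop would update w between rounds without ever touching the kNN graph again.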

Speaking of noise, our paper on Hubness-aware kNN Classification of High-dimensional Data in Presence of Label Noise has just been accepted for publication in the Neurocomputing Special Issue on Learning from Label Noise. It is an in-depth study of the impact of data hubness and the curse of dimensionality on classification performance under mislabeling, and of the inherent robustness of hubness-aware approaches in particular. Additionally, we have introduced the novel concept of hubness-proportional random label noise as a way of testing worst-case scenarios. To show that this noise model is realistic, we have demonstrated an adversarial label-flip attack on SMS spam data, exploiting the fact that the estimated TF-IDF message weights were inversely correlated with point-wise hubness under standard TF-IDF normalization. We hope to do more work on hubness-aware learning under label noise soon.
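
To make the noise model concrete, here is a rough Python sketch of hubness-proportional random label flipping (again our illustrative reading, not the paper's implementation; the function name, the sampling details, and the integer-label convention are assumptions):

```python
import numpy as np

def hubness_proportional_noise(knn_graph, y, n_classes, noise_rate,
                               seed=None):
    # Flip a noise_rate fraction of labels, choosing which points to
    # corrupt with probability proportional to their k-occurrence
    # counts, so that the most influential hubs are the likeliest to
    # be mislabeled: a worst-case scenario for kNN-based methods.
    rng = np.random.default_rng(seed)
    n = len(y)
    # N_k(x): how many k-NN lists of other points x appears in.
    occurrence_counts = np.bincount(knn_graph.ravel(), minlength=n)
    p_flip = occurrence_counts / occurrence_counts.sum()

    noisy = y.copy()
    # Assumes enough points with nonzero hubness to sample from.
    flip_idx = rng.choice(n, size=int(round(noise_rate * n)),
                          replace=False, p=p_flip)
    for i in flip_idx:
        # Replace the true label with a uniformly random wrong one.
        wrong = [c for c in range(n_classes) if c != y[i]]
        noisy[i] = rng.choice(wrong)
    return noisy
```

Under this model, the points that dominate kNN vote aggregation are exactly the ones most likely to carry wrong labels, which is what makes it a useful stress test for hubness-aware methods.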
