Archive for the ‘Clustering’ Category

Hub Miner Development

April 3rd, 2015

Hub Miner (https://github.com/datapoet/hubminer) has been significantly improved since its initial release. It now has full OpenML support for networked classification experiments and a detailed user manual covering all common use cases.

There have also been many new method implementations, especially for data filtering, reduction and outlier detection.

I have ambitious implementation plans for future versions.
If you would like to join the project as a contributor, let me know!

While I am still dedicated to the project, I have somewhat less time than before since joining Google in January 2015, so I have decided to open up the project to new contributors who can help make this an awesome machine learning library.

I am also interested in developing Python/R/Julia/C++ implementations of hubness-aware approaches, so feel free to ping me if you would be interested in that as well.

First Hub Miner release

October 18th, 2014

This is the announcement for the first release of Hub Miner code.

Hub Miner is the machine learning library that I have been working on during the course of my Ph.D. research. It is written in Java and released as open source on GitHub. This is the first release and updates are already underway, so please be a little patient. The code is well documented, with many comments – but the library is quite large and it is not that easy to navigate without a manual.

Luckily, a full manual should be done by the end of October and will also appear on GitHub along with the code, as well as on this website, under the Hub Miner page.

Hub Miner is a hubness-aware machine learning library that implements methods for classification, clustering, instance selection, metric learning, stochastic optimization, and more. It handles both dense and sparse data representations, with continuous, discrete and discretized features. There is also some basic support for text and image data processing.

Image Hub Explorer, a GUI for visual hubness inspection in image data, is also included in the Hub Miner source.

A powerful experimentation framework under learning.supervised.evaluation.cv.BatchClassifierTester and learning.unsupervised.evaluation.BatchClusteringTester allows for testing the various baselines in challenging conditions.

OpenML support is also under way and should be completed by the end of October, so expect it to appear in the next release.

A Novel Kernel Clustering Algorithm

July 26th, 2014

We have a new book chapter coming out on high-dimensional data clustering, in a book on partitional clustering algorithms. It is titled ‘Hubness-Based Clustering of High-Dimensional Data’ and extends our earlier work, where we showed that kNN hubs can be exploited for effective clustering of high-dimensional data.
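For readers new to the term: the hubness of a point is usually measured by its k-occurrence count, that is, how many times it appears in the k-nearest-neighbor lists of the other points. Below is a minimal brute-force sketch of that count in Java. It is not Hub Miner's actual implementation; the class and method names are hypothetical, and Euclidean distance is assumed.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Hub Miner.
final class HubnessSketch {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * Returns, for every point, the number of times it occurs among the
     * k nearest neighbors of the other points (brute-force search).
     */
    static int[] kOccurrences(double[][] data, int k) {
        int n = data.length;
        if (k < 1 || k >= n) {
            throw new IllegalArgumentException("k must be in [1, n - 1]");
        }
        int[] counts = new int[n];
        for (int i = 0; i < n; i++) {
            final int query = i;
            // Candidate neighbors: every index except the query itself.
            Integer[] others = new Integer[n - 1];
            for (int j = 0, pos = 0; j < n; j++) {
                if (j != query) {
                    others[pos++] = j;
                }
            }
            // Order the candidates by distance to the query point.
            Arrays.sort(others, (x, y) -> Double.compare(
                    squaredDistance(data[query], data[x]),
                    squaredDistance(data[query], data[y])));
            // Each of the k closest candidates gains one occurrence.
            for (int r = 0; r < k; r++) {
                counts[others[r]]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {1, 0}, {0, 1.5}, {-2, 0}, {6, 6} };
        System.out.println(Arrays.toString(kOccurrences(data, 1)));
    }
}
```

Points with unusually high counts are the hubs. The counts always sum to n times k, since every point contributes exactly k neighbor slots.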

In the chapter, we extend the original algorithm with a ‘kernel trick’ so that it can handle non-hyperspherical clusters in the data. The result is the Kernel Global Hubness-proportional K-Means algorithm (Kernel-GHPKM), which our experiments show to be highly promising and preferable to standard kernel K-means on some high-dimensional datasets.
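To illustrate what the kernel trick buys in a K-means setting, here is a small Java sketch of the standard identity for the squared distance between a point and a cluster centroid in kernel feature space, computed from kernel evaluations alone. It is not the Kernel-GHPKM implementation from the chapter: the Gaussian kernel, the `sigma` parameter, and all class and method names are assumptions made for the example.

```java
// Hypothetical illustration class, not part of Hub Miner.
final class KernelDistanceSketch {

    /** Gaussian (RBF) kernel: exp(-||a - b||^2 / (2 * sigma^2)). */
    static double rbfKernel(double[] a, double[] b, double sigma) {
        double sq = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sq += diff * diff;
        }
        return Math.exp(-sq / (2.0 * sigma * sigma));
    }

    /**
     * Squared feature-space distance from x to the centroid of a cluster:
     * K(x,x) - (2/m) * sum_i K(x,c_i) + (1/m^2) * sum_i sum_j K(c_i,c_j).
     * The centroid itself is never computed explicitly.
     */
    static double squaredDistanceToCentroid(double[] x, double[][] cluster, double sigma) {
        int m = cluster.length;
        if (m == 0) {
            throw new IllegalArgumentException("cluster must not be empty");
        }
        double cross = 0.0;   // sum of K(x, c_i)
        double within = 0.0;  // sum of K(c_i, c_j) over all pairs
        for (int i = 0; i < m; i++) {
            cross += rbfKernel(x, cluster[i], sigma);
            for (int j = 0; j < m; j++) {
                within += rbfKernel(cluster[i], cluster[j], sigma);
            }
        }
        return rbfKernel(x, x, sigma) - 2.0 * cross / m + within / ((double) m * m);
    }
}
```

Because the distance depends only on kernel values, cluster assignment can follow non-linear boundaries in the original space, which is what allows non-hyperspherical clusters to be separated.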

The implementation is available in Hub Miner and will be released very soon along with the rest of the library.

Stay tuned for more updates.