Archive for May, 2013

Image Hub Explorer Demo Video

May 10th, 2013

We have completed the initial demo video of the Image Hub Explorer system, and it is now available on YouTube.

The demo covers the basic functionality of the system and walks through some of the anticipated use cases.
The user interface will undergo further changes and improvements, and the functionality will be extended with new models and learning approaches.

Improving the semantic representations for cross-lingual document retrieval

May 4th, 2013

I had the pleasure of presenting some of our recent results at PAKDD 2013 in Gold Coast, Australia. The conference was great and the location couldn’t have been better, so we were able to catch some sun and walk along the beaches while discussing future collaborations, theory and applications.

Hubs, the points that occur very frequently in the nearest-neighbor lists of other points, act as centers of influence and are known to arise in textual data. They are also known to cause problems by being frequent neighbors of (that is, very similar to) semantically different types of documents. However, it was previously unknown whether this property is language-dependent and how it affects cross-lingual information retrieval.

What we have shown by analyzing aligned text corpora can be summarized as follows: hubs in one language are not necessarily hubs in another language, so different documents become influential. However, the percentage of label mismatches in reverse neighbor sets remains more or less unchanged. In other words, the nature of the occurrences is preserved across languages. This comes as a bit of a surprise, since hubness is arguably a geometric property arising from the interplay of metrics and data representations. Yet it seems that more semantics than previously thought remains hidden there, captured and preserved across different languages.
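To make the quantities above concrete, here is a minimal sketch of how occurrence counts and the percentage of label mismatches in reverse neighbor sets could be computed for one language. It is an illustration only, not the code used in the paper: the function name, the choice of Euclidean distance (text data would more typically use cosine similarity) and the toy data are all assumptions.

```python
import numpy as np

def k_occurrences(X, labels, k=5):
    """Count how often each point appears in the k-nearest-neighbor lists of
    other points (total), and how many of those appearances involve a label
    mismatch (bad). Euclidean distance is an assumption made for brevity."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # Pairwise squared Euclidean distances; a point is never its own neighbor.
    sq = np.sum(X ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d, np.inf)

    knn = np.argsort(d, axis=1)[:, :k]  # row i holds the k nearest neighbors of i

    total = np.zeros(n, dtype=int)
    bad = np.zeros(n, dtype=int)
    for i in range(n):
        for j in knn[i]:
            total[j] += 1                # i belongs to the reverse neighbor set of j
            if labels[i] != labels[j]:
                bad[j] += 1              # label mismatch in that reverse neighbor set
    return total, bad

# Toy data (hypothetical): four one-dimensional "documents" with two labels.
X = [[0.0], [1.0], [2.5], [10.0]]
labels = [0, 0, 1, 1]
total, bad = k_occurrences(X, labels, k=1)
print(total)                     # [1 2 1 0]
print(bad)                       # [0 1 0 0]
print(bad.sum() / total.sum())   # 0.25
```

Running the same computation on each side of an aligned corpus and comparing the two results is one way to check both claims: the points with the largest total counts may differ between languages, while the overall mismatch percentage can stay similar.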

We have used this observation to show that the common semantic representation produced by canonical correlation analysis (CCA) can be improved simply by introducing hubness-aware instance weights. This is certainly not the only way to go about it, and probably not the best one, but it served as a good proof of concept.
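As a rough illustration of what instance weighting in CCA can look like, the sketch below computes weighted covariance matrices and solves CCA by whitening followed by an SVD. This is a sketch under stated assumptions, not the method from the paper: the function names, the regularization term and in particular the weighting rule (down-weighting documents with many label-mismatched occurrences) are hypothetical.

```python
import numpy as np

def inverse_bad_hubness_weights(bad_counts):
    """Hypothetical weighting rule: documents with more label-mismatched
    occurrences get smaller weights. Not the scheme used in the paper."""
    return 1.0 / (1.0 + np.asarray(bad_counts, dtype=float))

def weighted_cca(X, Y, w, n_components=2, reg=1e-3):
    """Instance-weighted CCA for aligned views X (n x p) and Y (n x q).
    Returns projection matrices A, B and the canonical correlations."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize weights to sum to one

    # Weighted centering of both views.
    Xc = X - w @ X
    Yc = Y - w @ Y

    # Weighted covariance matrices, with a small ridge for numerical stability.
    Xw = Xc * w[:, None]
    Cxx = Xw.T @ Xc + reg * np.eye(X.shape[1])
    Cyy = (Yc * w[:, None]).T @ Yc + reg * np.eye(Y.shape[1])
    Cxy = Xw.T @ Yc

    # Whiten both views with Cholesky factors, then take the SVD of the
    # whitened cross-covariance.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(M)

    # Map the singular vectors back to the original feature spaces.
    A = np.linalg.solve(Lx.T, U[:, :n_components])
    B = np.linalg.solve(Ly.T, Vt.T[:, :n_components])
    return A, B, s[:n_components]
```

With uniform weights this reduces to ordinary regularized CCA, so the effect of any particular weighting can be checked by comparing retrieval quality in the projected space against that baseline.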

The entire paper can be found here: The Role of Hubs in Cross-lingual Supervised Document Retrieval

Categories: Application, Data Mining, Hubness, Text