WIKImage: Correlating Images and Words

May 9th, 2013

Text and image data are both abundant and central to many machine learning and information retrieval systems, as they encode a great deal of potentially useful information. Extracting and using this information effectively has been a research topic for many years, and sophisticated approaches now exist for many individual analytic tasks.

In 2010, we at the Artificial Intelligence Laboratory started a bilateral project with the Department of Mathematics and Informatics in Novi Sad (Serbia), named “Correlating Images and Words: Enhancing Image Analysis through Machine Learning and Semantic Web Technologies” (Project Code: BI-SR/10-11-029). The people currently participating in the collaboration are (in no particular order): Nenad Tomašev, Jan Rupnik, Doni Pracner, Miloš Radovanović, Boštjan Pajntar, Dunja Mladenić and Mirjana Ivanović. Raluca Brehar from the Cluj Technical Institute has also recently started working with us on image feature extraction.

The idea was to combine our experience in order to move towards a better understanding of correlations between images and written text. The final goal was to allow for effective cross-representational search, where one could easily obtain the image corresponding to a particular textual query, or a document corresponding to the scene captured in a particular image query.
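As a rough illustration of what such a search amounts to, here is a minimal sketch that assumes images and text have already been mapped to vectors in some shared space. The function name cosine_rank and the toy vectors are made up for this example and are not part of the project code.

```python
import numpy as np


def cosine_rank(query_vec, candidate_vecs):
    """Return candidate indices sorted from most to least similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    # Negate the similarities so that argsort yields a descending order.
    return np.argsort(-(c @ q))


# Toy shared-space vectors (invented for illustration only).
image_vecs = np.array([[0.9, 0.1, 0.0],
                       [0.0, 1.0, 0.2],
                       [0.1, 0.0, 1.0]])
text_query = np.array([0.1, 0.9, 0.1])

ranking = cosine_rank(text_query, image_vecs)
print(ranking)  # image indices, best match first
```

The same function works in the other direction: pass an image vector as the query and a matrix of document vectors as the candidates.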

Most image search tools that allow for textual queries are based on searching the paragraphs and captions that surround the images in their webpage context. However, we wished to build a different kind of system, where one could run textual searches over images that have no captions or annotations. In order to do this, we have to correlate image features (visual words) with actual textual words and their meanings. This can be done by creating a common semantic representation and the corresponding projections for both textual and image representations. One possible way to construct the common semantic space is Kernel Canonical Correlation Analysis (KCCA).
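To make the idea more concrete, here is a minimal NumPy sketch of one common regularized formulation of KCCA. It is not the project's implementation: the function names center_kernel and fit_kcca, the regularization parameter kappa, the linear kernels and the synthetic two-view data are all assumptions chosen for illustration.

```python
import numpy as np


def center_kernel(K):
    """Center a kernel matrix in feature space: K <- H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H


def fit_kcca(Kx, Ky, kappa=0.1, n_components=2):
    """Regularized KCCA on two centered n x n kernel matrices.

    Maximizes alpha^T Kx Ky beta subject to
    alpha^T (Kx + kappa I)^2 alpha = beta^T (Ky + kappa I)^2 beta = 1.
    Returns the dual weights and the regularized canonical correlations.
    """
    n = Kx.shape[0]
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    # Substituting u = Rx @ alpha and v = Ry @ beta turns the constrained
    # problem into an ordinary SVD of T = Rx^-1 Kx Ky Ry^-1.
    T = np.linalg.solve(Rx, Kx) @ np.linalg.solve(Ry, Ky)
    U, s, Vt = np.linalg.svd(T)
    alpha = np.linalg.solve(Rx, U[:, :n_components])
    beta = np.linalg.solve(Ry, Vt[:n_components].T)
    return alpha, beta, s[:n_components]


# Synthetic aligned data: both views are noisy functions of a shared latent z.
rng = np.random.default_rng(0)
n = 60
z = rng.normal(size=(n, 2))
X = z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(n, 5))  # stand-in image features
Y = z @ rng.normal(size=(2, 4)) + 0.05 * rng.normal(size=(n, 4))  # stand-in text features

Kx = center_kernel(X @ X.T)  # linear kernels, for simplicity
Ky = center_kernel(Y @ Y.T)
alpha, beta, corrs = fit_kcca(Kx, Ky)

img_sem = Kx @ alpha  # images in the common space
txt_sem = Ky @ beta   # texts in the common space
top_corr = np.corrcoef(img_sem[:, 0], txt_sem[:, 0])[0, 1]
```

Once both modalities are projected this way, an image and its annotation should land close to each other in the common space, which is what makes cross-representational retrieval possible. In practice the kernels, the value of kappa and the number of components would have to be chosen on real data.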

The mappings can be learned from an aligned dataset of images and their annotations. So, our first step was to create a dataset that could be used to infer the correlation model. This has resulted in WIKImage, a dataset of publicly available Wikipedia images and their annotations and captions. The images were crawled by Doni Pracner, and students at the Department of Informatics labeled the data through a system that he designed for this purpose. Each image can have several different labels.

We have mostly been working with quantized SIFT (scale-invariant feature transform) features, but have recently introduced SURF and ORB features as well. Textual data is available as it appears in Wikipedia, though we are also working on various processing filters for forming the bag-of-words (BOW) textual representation. We will soon add the common semantic representations that allow for the desired type of cross-representational retrieval.
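For readers unfamiliar with feature quantization, the sketch below shows the usual idea: local descriptors are clustered into a codebook of "visual words", and each image is then described by a histogram of word counts. This is only an illustration under stated assumptions. The function names, the codebook size and the random 128-dimensional vectors standing in for SIFT descriptors are hypothetical, and the actual WIKImage pipeline may differ.

```python
import numpy as np


def assign_words(descriptors, centres):
    """Index of the nearest centre for every descriptor (squared Euclidean distance)."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; the k cluster centres act as the visual words."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].copy()
    for _ in range(n_iter):
        labels = assign_words(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return centres


def bovw_histogram(descriptors, centres):
    """L1-normalized histogram of visual-word counts for one image."""
    words = assign_words(descriptors, centres)
    counts = np.bincount(words, minlength=len(centres)).astype(float)
    return counts / counts.sum()


# Random vectors stand in for real descriptors (assumption: 128-d, as in SIFT).
rng = np.random.default_rng(1)
training_descriptors = rng.random((200, 128))
centres = build_codebook(training_descriptors, k=8)

one_image = rng.random((40, 128))  # descriptors extracted from a single image
hist = bovw_histogram(one_image, centres)
```

Euclidean k-means is a natural fit for real-valued descriptors such as SIFT and SURF. ORB descriptors are binary, so a codebook for them would more likely be built with a Hamming-distance clustering variant.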

Of course, correlating textual image annotations and image features or objects is not a trivial task, as the information is encoded in entirely different ways. Additionally, many problems with interpretation can arise as outlined by the following example:

In the first part of the project, we mostly studied how some general aspects of high dimensionality influence the analytic process. The publications associated with the formal part of the bilateral project (2010-2011) are listed here:

Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, Mirjana Ivanović: The role of hubness in clustering high-dimensional data. PAKDD, Shenzhen, 2011 (Best Research Paper Runner-up Award)
Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, Mirjana Ivanović: Hubness-Based Fuzzy Measures for High-Dimensional k-Nearest Neighbor Classification. MLDM, New York, 2011
Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, Mirjana Ivanović: A Probabilistic Approach to Nearest-Neighbor Classification: Naive Hubness Bayesian kNN. CIKM, Glasgow, 2011
Doni Pracner, Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, Mirjana Ivanović: WIKImage: Correlated Image and Text Datasets. SiKDD, Ljubljana, 2011

In the meantime, we have published more papers on high-dimensional data processing, the emergence of hubs and the resulting skewed distribution of influence. Some of them can be seen in my bibliography.

We are currently working on the main part of the project, the actual correlation between the image and textual features, and will upload new results and content if and when they become available. If you would like to participate in some way or use the data for your own research, feel free to contact us.
