Research · 2019 → present

Projects

Years of European research projects on language technology, health informatics, and small-language-model deployment.

Active

Currently running · 2

SLM4IE

Mar 2026 — Present

ARIS Postdoctoral

SLM4IE is an ARIS-funded postdoctoral research project focused on developing computationally efficient Small Language Models (SLMs) for zero-shot information extraction in European languages. The project addresses key limitations of current large language models — high computational costs for on-premise deployment, training data gaps for sensitive domains and low-resource languages, and inconsistent outputs in information extraction tasks.

The project develops SLMs optimized for commercial GPU hardware (targeting fewer than 1B parameters and under 8 GB of VRAM), creates multilingual benchmark datasets across sensitive domains such as medicine and science, and conducts systematic evaluations against established metrics. All models, datasets, training code, and documentation will be publicly released where possible.

WEBSITE

Project Homepage

PREPARE

Jun 2023 — Present

Horizon Europe

PREPARE is a Horizon Europe project aimed at improving the lives of people with chronic noncommunicable diseases by developing tools that enable patients and healthcare providers to select optimal therapy strategies. The project transforms rehabilitation through personalized care approaches, combining advances in clinical research, socio-behavioral science, data science, and AI methods to overcome challenges in patient stratification and outcome prediction.

Our contributions include text anonymization tools (Anonipy) and medical terminology extraction methods (Medtermex) that support privacy-preserving health data analysis. In addition, we developed the PREPARE Extraction Tool, which extracts medical terms and maps them to standardized medical vocabularies.

WEBSITE

Project Homepage

CODE

Anonipy · Medtermex · PREPARE Extraction Tool

Past

Completed · 10

SEMTEH

Mar 2023 — Oct 2023

Slovenian Ministry of Culture

SEMTEH upgraded open-access semantic resources for the Slovenian language by creating a new, manually reviewed version of the Slovenian WordNet semantic lexicon and establishing semantic links to the Digital Dictionary Database (DSB). The project used deep learning methods to prepare data for manual review and to link the semantic resources, improving both the vocabulary coverage and the data reliability of the existing WordNet. The resulting linked data makes Slovenian semantic resources available in major language technology collections such as BabelNet and Wikidata.

European Statistics Awards

Jan 2022 — Dec 2025

Eurostat Contract

The European Statistics Awards is a Eurostat initiative promoting statistical literacy and data-driven thinking among students and young researchers across Europe. The project developed and maintained a competition platform hosting annual challenges in which participants analyzed official European statistics to address real-world questions.

Our contribution involved developing and maintaining the web platform that managed competition organization, team registrations, submissions, and evaluation workflows.

WEBSITE

Competition Platform

Humane AI Network

Sep 2020 — Aug 2024

Horizon 2020

The Humane AI Network was a European initiative advancing human-centered artificial intelligence that meaningfully integrates with human contexts and capabilities. The network coordinated research across multiple work packages covering human-in-the-loop systems, multimodal perception, ethics, and societal applications, and funded over 120 collaborative micro-projects. The initiative bridged AI research with societal impact, examining how artificial intelligence affects diverse communities and addressing the ethical dimensions of AI deployment.

WEBSITE

Project Homepage

CURLICAT

Jun 2020 — Nov 2022

CEF Telecom Programme

CURLICAT compiled curated multilingual language datasets in seven Central and Eastern European languages (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak, and Slovenian) to enhance the EU's Automated Translation infrastructure. The project targeted domains relevant to European Digital Service Infrastructures, creating high-quality parallel corpora for training machine translation systems. These resources support improved translation capabilities for underrepresented European languages within the CEF digital services.

WEBSITE

Project Homepage

RSDO

May 2020 — Feb 2023

Development of Slovene in a Digital Environment

RSDO developed modern language technologies for the Slovenian language, creating computational tools and services for research institutions, businesses, and the general public. The project delivered applications for speech recognition, machine transcription, machine translation, and terminology extraction, all released under open licenses through a public portal.

Our contributions included developing commonsense knowledge resources (SloATOMIC) and language models for Slovenian to ensure the language remains viable in the digital age.

WEBSITE

Project Homepage

CODE

SloATOMIC 2020 Dataset · SloMET ATOMIC 2020 Model

Infinitech

Oct 2019 — Dec 2022

Horizon 2020

Infinitech developed novel Big Data, IoT, and AI technologies for managing diverse data types in the finance and insurance sectors, with a strong emphasis on regulatory compliance and data governance. The project established nine testbeds and sandboxes offering open APIs for testing and validating innovative fintech solutions. These tools support both regulatory authorities and financial institutions in adopting data-driven decision-making while maintaining compliance.

WEBSITE

Project Homepage

EnviroLENS

Dec 2018 — Jun 2021

Horizon 2020

EnviroLENS bridged the gap between European satellite capabilities provided by Copernicus and environmental law enforcement needs. The project delivered Earth observation-based services providing evidence on environmental incidents and legal violations to support judicial data-gathering processes and foster data-driven decision-making.

Our contribution included developing the eLENS Miner System for processing and analyzing environmental legal documents and connecting them to geo-locations.

WEBSITE

Project Homepage

CODE

eLENS Miner System

X5GON

Sep 2017 — Dec 2020

Horizon 2020

X5GON created an AI-driven platform connecting open educational resources across languages, cultures, and domains to deliver personalized learning experiences. The project developed discovery, recommendation, and translation services that adapt to learner needs, making educational materials accessible regardless of language barriers. The technology formed the basis for the education work of UNESCO's International Research Centre on Artificial Intelligence.

WEBSITE

Project Homepage · X5GON Platform · X5GON Discovery

CODE

X5GON Organization

European Data Science Academy

Sep 2015 — Jan 2018

Horizon 2020

The European Data Science Academy (EDSA) addressed Europe's data science skills shortage by designing modular curricula for data science training across the European Union. The project analyzed sector-specific skillsets, developed adaptable training programs, and delivered learning resources through multi-platform and multilingual channels. The initiative established a virtuous learning cycle connecting industry demands with educational offerings through an interactive dashboard and an aggregated courses portal.

WEBSITE

Project Homepage

QMiner

Feb 2015 — Jun 2022

Open Source Project

QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data. Built as a Node.js addon, it provides support for text mining, full-text search, similarity matching, stream processing, and machine learning on continuous data flows. The platform enables anomaly detection, keyword analysis, and document classification for applications requiring real-time data analysis.

WEBSITE

Project Homepage

CODE

qminer