Projects
Years of European research projects on language technology, health informatics, and small-language-model deployment.
Active
Currently running · 2
SLM4IE is an ARIS-funded postdoctoral research project focused on developing computationally efficient Small Language Models (SLMs) for zero-shot information extraction in European languages. The project addresses three key limitations of current large language models: high computational costs for on-premise deployment, training data gaps for sensitive domains and low-resource languages, and inconsistent outputs in information extraction tasks.
The project develops SLMs optimized for commercial GPU hardware (targeting <1B parameters and <8 GB VRAM), creates multilingual benchmark datasets across sensitive domains such as medicine and science, and conducts systematic evaluations against established metrics. All models, datasets, training code, and documentation will be publicly released where possible.
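To make the hardware target concrete, the sketch below estimates how much memory the weights of a model occupy at common numeric precisions. It is a back-of-the-envelope illustration, not the project's own tooling: the function names are hypothetical, the bytes-per-parameter values are the standard sizes for each format, and the estimate covers weights only, ignoring activations, the KV cache, and framework overhead.

```python
# Bytes needed to store one parameter at common precisions (standard sizes).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Return the memory needed for the weights alone, in GB (10**9 bytes).

    Assumption: activations, KV cache, and runtime overhead are NOT included,
    so real-world usage will be higher than this figure.
    """
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_budget(num_params: int, precision: str, vram_gb: float = 8.0) -> bool:
    """Check whether the weights alone stay under the given VRAM budget."""
    return weight_memory_gb(num_params, precision) < vram_gb


if __name__ == "__main__":
    # A model at the upper bound of the stated target: one billion parameters.
    for precision in BYTES_PER_PARAM:
        gb = weight_memory_gb(1_000_000_000, precision)
        print(f"{precision}: {gb:.1f} GB of weights")
```

Under these assumptions, a 1B-parameter model needs about 2 GB for its weights in 16-bit precision, which leaves headroom within an 8 GB budget for activations and longer input contexts.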
PREPARE is a Horizon Europe project aimed at improving the lives of people with chronic noncommunicable diseases by developing tools that enable patients and healthcare providers to select optimal therapy strategies. The project transforms rehabilitation through personalized care approaches, combining advances in clinical research, socio-behavioral science, data science, and AI methods to overcome challenges in patient stratification and outcome prediction.
Our contributions include developing text anonymization tools (Anonipy) and medical terminology extraction methods (Medtermex) to support privacy-preserving health data analysis. In addition, we developed a tool that extracts medical terms and maps them to standardized medical vocabularies.
Past
Completed · 10
SEMTEH upgraded open-access semantic resources for the Slovenian language by creating a new, manually reviewed version of the Slovenian WordNet semantic lexicon and establishing semantic links to the Digital Dictionary Database (DSB). The project employed deep learning and neural network methods to prepare data for manual review and to connect semantic resources, improving both vocabulary coverage and data reliability of the existing WordNet. The resulting linked data ensures availability of Slovenian semantic resources in major language technology collections such as BabelNet and Wikidata.
The European Statistics Awards is a Eurostat initiative promoting statistical literacy and data-driven thinking among students and young researchers across Europe. The project developed and maintains a competition platform that hosts annual challenges where participants analyze official European statistics to address real-world questions.
Our contribution involves developing and maintaining the web platform that manages competition organization, team registrations, submissions, and evaluation workflows.
The Humane AI Network is a European initiative advancing human-centered artificial intelligence that meaningfully integrates with human contexts and capabilities. The network coordinates research across multiple work packages covering human-in-the-loop systems, multimodal perception, ethics, and societal applications, while funding over 120 collaborative micro-projects. The initiative bridges AI research with societal impact, examining how artificial intelligence affects diverse communities and addressing ethical dimensions of AI deployment.
CURLICAT compiled curated multilingual language datasets in seven Central and Eastern European languages (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak, and Slovenian) to enhance the EU's Automated Translation infrastructure. The project targeted domains relevant to European Digital Service Infrastructures, creating high-quality parallel corpora for training machine translation systems. These resources support improved translation capabilities for underrepresented European languages within the CEF digital services.
RSDO developed modern language technologies for the Slovenian language, creating computational tools and services for research institutions, businesses, and the general public. The project delivered applications for speech recognition, machine transcription, machine translation, and terminology extraction, all released under open licenses through a public portal.
Our contributions included developing commonsense knowledge resources (SloATOMIC) and language models for Slovenian to ensure the language remains viable in the digital age.
Infinitech developed novel Big Data, IoT, and AI technologies for managing diverse data types in the finance and insurance sectors, with strong emphasis on regulatory compliance and data governance. The project established nine testbeds and sandboxes offering open APIs for testing and validating innovative fintech solutions. These tools support both regulatory authorities and financial institutions in adopting data-driven decision-making while maintaining compliance.
EnviroLENS bridged the gap between European satellite capabilities provided by Copernicus and environmental law enforcement needs. The project delivered Earth observation-based services providing evidence on environmental incidents and legal violations to support judicial data-gathering processes and foster data-driven decision-making.
Our contribution included developing the eLENS Miner System for processing and analyzing environmental legal documents and connecting them to geo-locations.
X5GON created an AI-driven platform connecting open educational resources across languages, cultures, and domains to deliver personalized learning experiences. The project developed discovery, recommendation, and translation services that adapt to learner needs, making educational materials accessible regardless of language barriers. The technology formed the basis for the education work of UNESCO's International Research Centre on Artificial Intelligence.
EDSA addressed Europe's data science skills shortage by designing modular curricula for data science training across the European Union. The project analyzed sector-specific skillsets, developed adaptable training programs, and delivered learning resources through multi-platform and multilingual channels. The initiative established a virtuous learning cycle connecting industry demands with educational offerings through an interactive dashboard and aggregated courses portal.
QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data. Built as a Node.js addon, it provides support for text mining, full-text search, similarity matching, stream processing, and machine learning on continuous data flows. The platform enables anomaly detection, keyword analysis, and document classification for applications requiring real-time data analysis.