Wednesday, Sep 14 Thursday, Sep 15 Friday, Sep 16
Image processing and reconstruction

9:00 – 10:00 (60 min)
Friday, Sep 16

Multimedia understanding and classification

9:00 – 10:00 (60 min)
Thursday, Sep 15


9:00 – 9:30 (30 min)
Wednesday, Sep 14

Keynote 1: Multimedia in Global Free Knowledge Ecosystems

9:30 – 10:30 (60 min)
Wednesday, Sep 14
Miriam Redi, Wikimedia Foundation


10:00 – 10:30 (30 min)
Friday, Sep 16


10:00 – 10:30 (30 min)
Thursday, Sep 15

Image analysis and enrichment

10:30 – 11:30 (60 min)
Thursday, Sep 15


10:30 – 11:00 (30 min)
Wednesday, Sep 14

Best Papers

11:00 – 12:30 (90 min)
Wednesday, Sep 14

Special Session: Computer-Assisted Clinical Applications

11:30 – 12:30 (60 min)
Thursday, Sep 15

Keynote 2: The Machine Learning of Time and Dynamics in Images, Videos, Simulations

11:40 – 12:40 (60 min)
Friday, Sep 16
Efstratios Gavves, University of Amsterdam


12:30 – 14:00 (90 min)
Thursday, Sep 15


12:30 – 13:50 (80 min)
Wednesday, Sep 14


12:40 – 13:00 (20 min)
Friday, Sep 16


13:00 – 14:30 (90 min)
Friday, Sep 16

Multimedia Indexing and Retrieval

14:00 – 15:20 (80 min)
Thursday, Sep 15

Posters and Demos

15:00 – 16:20 (80 min)
Wednesday, Sep 14
and coffee


15:20 – 15:50 (30 min)
Thursday, Sep 15

Music Meets Science

16:30 – 17:30 (60 min)
Wednesday, Sep 14

Steering Committee Meeting

17:30 – 18:30 (60 min)
Thursday, Sep 15
(closed session)

Guided Tour

18:00 – 19:30 (90 min)
Wednesday, Sep 14
Graz tour, ending at reception venue


19:30 – 22:00 (150 min)
Thursday, Sep 15


19:30 – 22:00 (150 min)
Wednesday, Sep 14

Keynote 1: Multimedia in Global Free Knowledge Ecosystems (Miriam Redi, Wikimedia Foundation)

Chair: Stevan Rudinac

Miriam Redi is a Research Manager at the Wikimedia Foundation and Visiting Research Fellow at King’s College London. Formerly, she worked as a Research Scientist at Yahoo Labs in Barcelona and Nokia Bell Labs in Cambridge. She received her PhD from EURECOM, Sophia Antipolis. She conducts research in social multimedia computing, working on fair, interpretable, multimodal machine learning solutions to improve knowledge equity.

Best Papers

Chair: Werner Bailer

Retrieval-Augmented Transformer for Image Captioning
Sara Sarto, Marcella Cornia, Lorenzo Baraldi and Rita Cucchiara

Hybrid Transformer Network for Deepfake Detection
Sohail Ahmed Khan and Duc-Tien Dang-Nguyen

An exploration into the benefits of the CLIP model for lifelog retrieval
Ly-Duyen Tran, Naushad Alam, Yvette Graham, Liting Zhou and Cathal Gurrin

Special Session: Multimodal Signal Processing Technologies for Protecting People and Environment against Natural Disasters

Chair: Krishna Chandramouli

BiasUNet: Learning Change Detection over Sentinel-2 Image Pairs
Maria Eirini Pegia, Anastasia Moumtzidou, Ilias Gialampoukidis, Björn Þór Jónsson, Stefanos Vrochidis and Ioannis Kompatsiaris

Wildfire Segmentation using Deep-RegSeg Semantic Segmentation Architecture
Rafik Ghali, Moulay Akhloufi, Wided Souidene Mseddi and Marwa Jmal

Ecological Impact Assessment Framework for areas affected by Natural Disasters
Gardyas Bidari Adninda, Kusrini Kusrini, Arief Setyanto, Renindya A Kartikakirana, Rhisa A Suprapto, Arif D Laksito, I Made A Agastya, Krishna Chandramouli, Andrea Majlingova, Yvonne Brodrechtová, Konstantinos Demestichas and Ebroul Izquierdo

Posters and Demos

Chair: Mathias Lux


  • StyleGAN-based CLIP-guided Image Shape Manipulation | Yuchen Qian, Kohei Yamamoto and Keiji Yanai
  • Streaming learning with Move-to-Data approach for image classification | Abel Kahsay Gebreslassie, Jenny Benois-Pineau and Akka Zemmari
  • Analysing the Memorability of a Procedural Crime-Drama TV Series | Seán Cummins, Lorin Sweeney and Alan Smeaton
  • A large-scale TV video and metadata database for French political content analysis and fact-checking | Frédéric Rayar, Mathieu Delalandre and Van-Hao Le
  • Relational Database Performance for Multimedia: A Case Study | Björn Þór Jónsson, Aaron Duane and Nikolaj Mertz
  • The Potential of Webcam Based Real Time Eye-Tracking to Reduce Rendering Cost | Isabel Kütemeyer and Mathias Lux
  • Self-Supervised Spiking Neural Networks applied to Digit Classification | Benjamin Chamand and Philippe Joly


  • A Virtual Reality Talking Avatar for Investigative Interviews of Maltreat Children | Syed Zohaib Hassan, Pegah Salehi, Michael Alexander Riegler, Miriam Sinkerud Johnson, Gunn Astrid Baugerud, Pål Halvorsen and Saeed Shafiee Sabet
  • A Toolchain for Extracting and Visualising Road Traffic Data | Helmut Neuschmied, Florian Krebs, Stefan Ladstätter and Georg Thallinger

Multimedia understanding and classification

Chair: Giuseppe Amato

An Audio-Visual Dataset and Deep Learning Frameworks for Crowded Scene Classification
Lam Pham, Dat Ngo, Tho Nguyen, Phu Nguyen, Truong Hoang and Alexander Schindler

A Fine Grained Quality Assessment of Video Anomaly Detection
Jiang Zhou, Kevin McGuinness, Noel E. O’Connor and Joseph Antony

Learning Co-occurrence Features Across Spatial and Temporal Domains for Hand Gesture Recognition
Mohammad Rehan, Hazem Wannous, Jafar Alkheir and Kinda Aboukassem

Image analysis and enrichment

Chair: Klaus Schöffmann

Sentiment analysis on 2D images of urban and indoor spaces using deep learning architectures
Konstantinos Chatzistavros, Theodora Pistola, Sotiris Diplaris, Konstantinos Ioannidis, Stefanos Vrochidis and Ioannis Kompatsiaris

Urban Image Geo-Localization Using Open Data on Public Spaces
Mathias Glistrup, Stevan Rudinac and Björn Þór Jónsson

A domain adaptive deep learning solution for scanpath prediction of paintings
Mohamed Amine Kerkouri, Marouane Tliba, Aladine Chetouani and Alessandro Bruno

Special Session: Computer-Assisted Clinical Applications

Chair: Klaus Schöffmann

Segmenting partially annotated medical images
Nicolas Martin, Jean-Pierre Chevallet and Georges Quénot

Chest Diseases classification using CXR and deep ensemble learning
Adnane Ait Nasser and Moulay Akhloufi

Skin Cancer Detection using Ensemble Learning and Grouping of Deep Models
Takfarines Guergueb and Moulay Akhloufi

Multimedia Indexing and Retrieval

Chair: Björn Þór Jónsson

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval
Nicola Messina, Matteo Stefanini, Marcella Cornia, Lorenzo Baraldi, Fabrizio Falchi, Giuseppe Amato and Rita Cucchiara

Improving Nearest Neighbor Indexing by Multitask Learning
Amorntip Prayoonwong, Ke-Long Zeng and Chih-Yi Chiu

Towards Human Performance on Sketch-Based Image Retrieval
Omar Seddati, Stéphane Dupont, Saïd Mahmoudi and Thierry Dutoit

Analysis of the Complementarity of Latent and Concept Spaces for Cross-Modal Video Search
Varsha Devi, Philippe Mulhem and Georges Quénot

Panel: Multimedia Indexing and Retrieval Challenges in Media Archives

Chair: Georg Thallinger

  • Brecht DECLERCQ, President of FIAT/IFTA (International Association of TV Archives), Digitisation and Acquisition Manager at meemoo, the Flemish Institute for Archives
  • Richard WRIGHT, Preservation Guide
  • Johan OOMEN, Netherlands Institute of Sound and Vision
  • Christoph BAUER, Multimedia Archive of the Austrian Broadcasting Corporation

Brecht Declercq, MA, MSc, is the President of FIAT/IFTA, the world association of media archives, and the Digitisation and Acquisition Manager at meemoo, the Flemish Institute for Archives. He is responsible for the preservation of the Flemish audiovisual heritage, including one of the largest audiovisual digitisation programmes currently under way worldwide. He worked for the Belgian public broadcaster VRT for almost 10 years on several digitisation, media asset management and access projects, and led the FIAT/IFTA Preservation and Migration Commission from 2016 to 2019. He is a frequent conference curator, presenter, guest lecturer, writer and reviewer, and advises policy makers, audiovisual archives and media organisations worldwide.

Richard Wright is an independent consultant on audiovisual preservation and access. His previous positions include Archive Preservation at BBC Research and Development (2007-2011) and Archive Technology Manager at BBC Information & Archives (1994-2007). He has been working on European standards for 30 years, including the widely used Broadcast Wave Format and the EBU guidance on core metadata (EBUCore). Since 1995 Richard has worked on European R&D projects, beginning with Euromedia, which developed the first European audiovisual asset management system. He started the series of Presto projects (Presto, PrestoSpace, PrestoPRIME, Presto4U, Presto Centre) on audiovisual digitisation and digital preservation. Before joining the BBC, he worked in speech and hearing research, including speech recognition and synthesis, from 1967 to 1994.

Christoph Bauer was born in 1960 in Vienna, Austria, studied at Vienna’s University of Economics and holds several other qualifications, including cantor, pianist, organist, choir conductor, composer, IT developer and theologian. Deciding to quit fooling around, he joined ORF in 1981. He has served as ORF Project Officer for several EC/IST/ICT/H2020/FAA projects (PRESTO, PRIMAVERA, FIRST, NODAL, PRESTOSPACE, eCHASE, PRESTOPRIME, DAVID, EUNOMIA, TailoredMedia, etc.) and is the senior specialist for preservation, digitization and restoration in the ORF archive department. He also acts as system administrator for archive systems and AV digitization, workflow developer and AI mining specialist (audio and video). Christoph was chairman of the SNML-TNG Management Board (2011-2013), vice-chair of maa (Media-Archives-Austria Association) (2012-2016), a member of the ARD K-ARL Expert group for Video-Mining, NKE for the EU project “Empowering Society” and a lecturer at the University of Vienna. He is currently general secretary of maa, a member of the ARD medas Expert group Mining (AI) and a member of the Digitization & Migration Commission of FIAT/IFTA.

Image processing and reconstruction

Chair: Werner Bailer

Real-time deblurring network for face AR applications
Juhwan Lee, Jongha Lee and Sangwook Yoo

Hyperspectral Image Reconstruction of Heritage Artwork Using RGB Images and Deep Neural Networks
Ailin Chen, Rui Jesus and Márcia Vilarigues

A survey for image based methods in construction: from images to digital twins
Ilias Koulalis, Nikolaos Dourvas, Theocharis Triantafyllidis, Konstantinos Ioannidis, Stefanos Vrochidis and Ioannis Kompatsiaris

Special Session: Learning from Scarce Data Challenges in the Media Domain

Chair: Hannes Fassold

Learning to Detect Fallen People in Virtual Worlds
Fabio Carrara, Lorenzo Pasco, Claudio Gennaro and Fabrizio Falchi

Few-shot Object Detection as a Semi-supervised Learning Problem
Werner Bailer and Hannes Fassold

Deep Features for CBIR with Scarce Data using Hebbian Learning
Gabriele Lagani, Davide Bacciu, Claudio Gallicchio, Fabrizio Falchi, Claudio Gennaro and Giuseppe Amato

Keynote 2: The Machine Learning of Time and Dynamics in Images, Videos, Simulations (Efstratios Gavves, University of Amsterdam)

Chair: Stevan Rudinac

Efstratios Gavves is an Associate Professor at the University of Amsterdam and co-founder of Ellogon.AI. He explores temporal machine learning and dynamics on many fronts, from neural system dynamics and symmetries to computational memory, dynamical systems and causality, with applications ranging from fluid dynamics to biomedicine, astronomy, climate science, oncology and more.

Music Meets Science

Cultural event supported by ACM SIGMM.

Trio concert | Schubert, Haydn

Performers: Olga Chepovetsky (CH), piano; François Pineau-Benois (FR), violin; Dorottya Standi (AT), cello.