THESIS: Mamadou KEITA

Mr. Mamadou KEITA will publicly submit his thesis work entitled:
AI-generated image detection using multimodal deep learning.



Defense: January 28, 2026
Amphitheatre - IEMN Site de Valenciennes - UPHF - Valenciennes

Jury

Reviewers (rapporteurs):

  • Mr. Zahid AKHTAR - State University of New York Polytechnic Institute
  • Mrs. Laetitia JOURDAN - CRIStAL, Lille

Examiners:

  • Mr. Azeddine BEGHDADI - Université Sorbonne Paris Nord
  • Mr. Yassine RUICHEK - Université de Technologie de Belfort-Montbéliard

Invited guests:

  • Mr. Smail NIAR - Université Polytechnique Hauts-de-France
  • Mrs. Atika RIVENQ - Université Polytechnique Hauts-de-France

Thesis supervisors:

  • Mr. Abdelmalik TALEB-AHMED - Université Polytechnique Hauts-de-France
  • Mr. Abdenour HADID - Sorbonne University, Abu Dhabi (UAE)

Summary

Generative artificial intelligence (AI) and its applications offer considerable advantages, but they also raise potentially critical societal and ethical issues. This thesis explores the field of synthetic content generation and detection, with a particular focus on synthetic images: AI-generated images that can appear indistinguishable from real photographs, creating new challenges for digital forensics, media integrity, and public trust. While generative models such as generative adversarial networks (GANs) and diffusion models have evolved rapidly, most existing detection methods remain limited in terms of generalizability, robustness, and interpretability. To address these challenges, this research investigates new AI-based approaches to improve current image detection methods along these three axes.

The various aspects of synthetic image detection are examined in this thesis through four major contributions:

  • Bi-LORA proposes an efficient vision-language approach that reformulates the detection problem as an image caption generation task, and demonstrates remarkable zero-shot generalization to previously unseen generative models.
  • RAVID introduces a retrieval-augmented visual detection framework that enhances the robustness and interpretability of detection systems by integrating relevant external visual context.
  • DeeCLIP presents a lightweight transformer-based model that combines shallow and deep features to improve resilience to visual degradation and post-processing operations.
  • FIDAVL proposes a unified approach to synthetic image detection and source attribution, within a multi-task framework based on soft prompt tuning of vision-language models.

Beyond these contributions, the thesis provides an in-depth analysis of the current state of research on synthetic images and their societal impact, underlining the importance and urgency of designing detection methods that are both effective and robust. The approaches developed in this work advance scientific research and have concrete application potential in media authenticity and digital security. In this perspective, this research represents a significant step towards addressing the challenges raised by generative AI.
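The announcement gives no implementation details, but the Bi-LORA idea of recasting detection as caption generation can be illustrated with a minimal, purely hypothetical sketch (none of the names below come from the thesis): a captioning model fine-tuned to emit a short verdict phrase, whose output is then mapped back to a binary detection label.

```python
# Hypothetical sketch of detection-as-captioning (not the thesis code).
# Assumption: a vision-language captioner has been fine-tuned (e.g. with
# low-rank adapters) so that its caption contains "fake" or "synthetic"
# for AI-generated inputs and describes the scene normally otherwise.

def caption_to_label(caption: str) -> str:
    """Map a model-generated caption back to a binary detection label."""
    text = caption.lower()
    # Assumed output convention of the fine-tuned captioner.
    if "fake" in text or "synthetic" in text:
        return "ai-generated"
    return "real"

print(caption_to_label("this image is fake"))     # -> ai-generated
print(caption_to_label("a photo of a brown dog")) # -> real
```

The point of the reformulation, as the summary describes it, is that the detector inherits the broad visual-semantic knowledge of the underlying vision-language model, which is what enables zero-shot behavior on unseen generators.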