28. April 2026

What AI Actually Does in Diffusion Models

+++ RESEARCH TICKER UNIVERSITY OF BONN: Artificial Intelligence for Drug Design +++

In the search for new drugs, artificial intelligence in the form of diffusion models is increasingly used for molecular design. What exactly does the AI do in this context? Dr. Andrea Mastropietro and Prof. Dr. Jürgen Bajorath from Life Science Informatics at the University of Bonn and the Lamarr Institute for Machine Learning and Artificial Intelligence have investigated this question.

Isolated fragments are connected by a linker generated by the diffusion model. DiffSHAPer rationalizes the process by determining which atoms favor (or oppose) the generation. © Image: Andrea Mastropietro

WHAT IS IT ABOUT?
Diffusion models have mainly been used for image and video generation. Recently, their use has been extended to new domains such as chemistry, where they generate new molecules. In our analysis, we aimed for generality and set out to explain diffusion models that design linkers for molecules with different applications.

WHAT IS A “LINKER”?
A linker is a substructure of a molecule that connects two or more disconnected fragments of atoms. Linker design is an important task in drug development, as it plays a central role in the design of effective molecules with specific properties.

HOW DO DIFFUSION MODELS WORK IN PRINCIPLE?
Diffusion models learn a data distribution and generate new data by sampling from that distribution. The diffusion model itself is an advanced AI model. We try to understand its generative process.

HOW DOES “NOISE” COME INTO PLAY?
Adding and removing noise is the hallmark of diffusion models. Starting from a sample in the dataset (an image or, in our case, a molecule), they add “noise” until the original sample is “destroyed,” much like the transition from a detailed image to a “TV static effect.” The model then learns how this added noise must be removed to recover a valid sample, which is how it generates a new image (or molecule).
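The noising step described above can be illustrated with a minimal sketch. It assumes the standard closed-form forward process of denoising diffusion models with a linear noise schedule, applied to 3D atom coordinates. The schedule values, the function names (alpha_bar, add_noise), and the toy coordinates are illustrative assumptions, not details of the model analyzed in the study.

```python
import math
import random


def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Fraction of the original signal variance left after t noising steps.

    Assumes a linear beta schedule (an illustrative choice).
    """
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(coords, t, rng, num_steps=1000):
    """Draw a noisy copy x_t of the coordinates x_0 in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    """
    a = alpha_bar(t, num_steps)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [
        tuple(signal * c + noise * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


rng = random.Random(0)
# Hypothetical three-atom toy "molecule" (coordinates in arbitrary units)
toy_molecule = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.2, 1.3, 0.0)]

slightly_noisy = add_noise(toy_molecule, t=10, rng=rng)   # still close to the input
mostly_noise = add_noise(toy_molecule, t=1000, rng=rng)   # essentially pure static
```

At t=10 almost all of the original signal survives, while at t=1000 the coordinates are practically indistinguishable from random noise. A trained model would learn to run this process in reverse; that learned denoising network is not part of this sketch.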

HOW DID YOU PROCEED?
For our study, we selected a state-of-the-art diffusion model for linker design and developed a novel explainability strategy that extends a well-known concept from the field of explainable artificial intelligence: Shapley values. For our method, DiffSHAPer, we adapted the Shapley value formalism, which is widely used to explain machine learning predictions, to diffusion models. Our goal was to find out which fragment atoms were the most influential for linker generation.
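To make the Shapley value idea concrete, here is a minimal sketch of the textbook computation: each "player" (here, a fragment atom) is credited with its average marginal contribution over all possible orderings. The scoring function toy_score and the atom labels are purely hypothetical. How DiffSHAPer defines the value of a set of atoms, or approximates the computation for realistic molecule sizes, is not described here and is not what this code shows.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    Only feasible for a handful of players, since the number of orderings
    grows factorially.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_score(present):
    """Hypothetical score for a set of fragment atoms (not from the study):
    atoms a1 and a2 help only when both are present, a3 slightly hurts.
    """
    score = 0.0
    if "a1" in present and "a2" in present:
        score += 1.0
    if "a3" in present:
        score -= 0.2
    return score


phi = shapley_values(["a1", "a2", "a3"], toy_score)
# a1 and a2 share the joint benefit equally; a3 receives a negative value
```

In this toy setting, a positive value marks an atom that favors the outcome and a negative value marks one that opposes it. The values also add up to the difference between the score with all atoms present and the score with none.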

WHAT IS THE MOST IMPORTANT FINDING?
We found that, to generate chemically valid linkers, diffusion models do not learn or exploit chemical principles but mostly rely on distance constraints between atoms. In other words, they pick up recurrent statistical patterns in the data without learning generalizable chemical rules.

WHAT WAS THE BIGGEST CHALLENGE?
From a computational perspective, running inference and explaining the generations of diffusion models are time-consuming tasks. From a methodological perspective, our approach is new, so we had to find the best way to present our results effectively.

IS THERE AN APPLICATION?
Our methodology can be used to understand what molecular diffusion models learn. In the specific case of linker design, it’s useful to determine what drives the generation of the linker. Linkers are important in drug design, as they can improve critical molecular properties (such as potency and stability). Consequently, a linker generated solely based on distance and geometric constraints does not guarantee optimization of properties or practical chemical utility.

WHAT ARE THE NEXT STEPS?
The first step would be to apply DiffSHAPer to molecular diffusion models tailored to different tasks. Future research will focus on developing models that can include more chemical context in their internal reasoning.

Andrea Mastropietro and Jürgen Bajorath: Explaining a molecular diffusion model, Cell Reports Physical Science, DOI: 10.1016/j.xcrp.2026.103270, URL: https://www.cell.com/cell-reports-physical-science/fulltext/S2666-3864(26)00176-1

Prof. Dr. Jürgen Bajorath
Dr. Andrea Mastropietro
Bonn-Aachen International Center for Information Technology (b-it)
Lamarr Institute for Machine Learning and Artificial Intelligence
University of Bonn
Tel. +49 228 7369 100
Email: bajorath@bit.uni-bonn.de, mastropietro@bit.uni-bonn.de
