2022 Thematic meetings (ML & Sampling)

Goal: To hold focused meetings on hot topics in structural bioinformatics

When: 5-8 December 2022

Where: Paris Sorbonne Université, Jussieu

Organizers: Frédéric Cazals (Sampling; Inria Sophia-Antipolis), Sergei Grudinin (ML), Yann Ponty, Alessandra Carbone

Location

The events will be hosted by the Jussieu campus of Sorbonne Université in Paris, France.

Lectures will be delivered in room C404 on the Jussieu campus (Metro Jussieu). Coffee breaks will be served in the Colibri room.

Program

Monday Dec 5th
14:00-14:05 Introduction – Welcome address
Machine Learning 1
14:05-14:35 Vincent Mallet Institut Pasteur
Representations learning on the structure of biomolecules (abstract)
14:35-15:05 Yasser Mohseni Behbahani Sorbonne Université
Deep Local Analysis estimates effects of mutations on protein-protein interactions (abstract)
15:05-15:35 Dmitrii Zhemchuzhnikov Université Grenoble Alpes
6DCNN with Roto-Translational Convolution Filters for Volumetric Data Processing (abstract)
15:35-16:20 Coffee break
16:20-16:50 Valentin Lombard LCQB
Toward a comprehensive description of protein conformational diversity (abstract)
16:50-17:20 Pablo Chacon Institute of Physical Chemistry (IQFR-CSIC)
Geometric Algebra Models of Proteins for learning orientations (abstract)
17:20-18:00 Free discussions
Tuesday Dec 6th
Machine Learning 2
9:00-9:50 Martin Weigt LCQB
Generative models of biomolecular sequences (abstract)
9:50-10:20 Nicolas Buton Data and Knowledge Management, Univ. Rennes, Inria, IRISA, Rennes, 35000, France
Predicting enzymatic function of protein sequences with attention (abstract)
10:20-11:00 Coffee break
11:00-11:20 Yulia Kacher Université de Lorraine, CNRS, LPCT, F-54000 Nancy, France
Proteins conformational space investigation: a machine learning approach to uncover missing intermediates (abstract)
11:20-11:40 Romain Menegaux INRIA
Encoding graph structure in transformers (abstract)
12:00-14:00 Lunch break
14:00-14:20 Stéphane Téletchéa Nantes Université, US2B, CNRS, UMR6286, F-44000 Nantes
Strengths and weaknesses of protein structure prediction methods (abstract)
14:20-14:40 Helene Bret CEA / I2BC
Modeling disordered regions in interactomes using AlphaFold (abstract)
14:40-15:00 Vaitea Opuu Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
CARNAGE: structure-in-sequence compression with graph neural network (abstract)
15:00-15:30 Marianne Defresne INSA Toulouse
Protein Design with Automated Reasoning and Deep Learning (abstract)
15:30-16:00 Sergei Grudinin LJK CNRS
News from CASP15 and Discussion topics (abstract)
16:00-18:00 Coffee break and Discussion
Wednesday Dec 7th
Sampling 1
8:45-9:40 Jérôme Hénin CNRS – IBPC
Tools for exploring molecular free energy landscapes using mostly natural (and some artificial) neural networks (abstract)
9:40-10:35 Gabriel Stoltz Ecole des Ponts
Coarse-graining and efficiently sampling with autoencoders (abstract)
10:30-11:00 Coffee break
11:00-12:00 Elias Tsigaridas Inria Paris and IMJ-PRG
Geometric random walks and sampling from convex bodies (abstract)
12:00-14:00 Lunch break
14:00-14:45 Augustin Chevallier Université Strasbourg
Sampling using Piecewise Deterministic Markov Processes (abstract)
14:45-15:50 David Wales University of Cambridge
Energy landscapes: from machine learning to enhanced sampling (abstract)
15:50-16:20 Coffee break
16:20-17:00 Samuela Pasquali Université Paris Cité
Modeling RNA Polymorphism (abstract)
Thursday Dec 8th
Sampling 2
8:45-9:40 Tony Lelievre Ecole des Ponts
Sampling probability measures on submanifolds (abstract)
9:40-10:35 Juan Cortés CNRS
A unified computational method with reinforcement learning for modeling conformational ensembles of complex protein architectures (abstract)
10:35-11:05 Coffee break
11:05-12:00 Frederic Cazals Inria
Sampling molecular conformations with seven-league boots: can we reconcile structure and thermodynamics? (abstract)
12:00-14:00 Lunch break

Abstracts

Vincent Mallet Institut Pasteur
Representations learning on the structure of biomolecules
With countless success stories, machine learning now appears to hold a lot of potential to make the most of biological data. In the structural biology field, AlphaFold2 was a game changer for protein structure prediction. However, computer vision or natural language processing models do not capture the relevant properties of structural biology data. To leverage this data in the most efficient way, we must model our biomolecules as mathematical objects, devise learning methods that respect the mathematical properties of these objects and adapt these models to each problem at hand. Several representations (graph, surface, point cloud, voxels…) and many methods, collectively termed geometric deep learning, exist.
We will start by reviewing and comparing several such approaches. We will then talk about getting vector embeddings for structures. These embeddings can approximate the feature maps of kernels or derive from self-supervised learning. They can be used to find structural homologs or to mine motifs.
Yasser Mohseni Behbahani Sorbonne Université
Deep Local Analysis estimates effects of mutations on protein-protein interactions
The spectacular recent advances in protein and protein complex structure prediction hold promise for the reconstruction of interactomes at large scale and at residue resolution. Beyond determining the 3D arrangement of interacting partners, modeling approaches should be able to sense the impact of sequence variations such as point mutations on the strength of the association. In this work, we report on Deep Local Analysis (DLA), a novel and efficient deep learning framework for accurately predicting mutation-induced binding affinity changes. It relies on a 3D-invariant description of local 3D environments at protein-protein interfaces. DLA leverages the large amount of available protein complex structures through self-supervised learning. We show that the learnt representations are informative about residue- and interface-based physico-chemical and functional properties. Enriching these representations with evolutionary information and a description of interface structural regions in a siamese architecture shows promising results in predicting binding affinity changes. We assess DLA performance against experimental binding affinity measurements from SKEMPI-2.0. We generated mutant and wild-type conformations for 142 complexes displaying 2003 single-point mutations (S2003) using the Rosetta Backrub algorithm. The latter explicitly models the flexibility of the backbone and side-chains, and accounts for their fluctuations around the native state. We demonstrate that DLA competes with or outperforms the state-of-the-art on two blind test sets derived from S2003: (i) 391 mutations from 39 complexes (not seen during training), with a Pearson correlation of 0.72, and (ii) 112 mutations from 17 complexes (less than 30% sequence identity with any complex from the training set), with a Pearson correlation of 0.54.
We further evaluate the influence of conformational variability and the contribution of the different types of information, namely evolutionary, geometrical and physico-chemical. DLA has a very fast inference performance even on a user’s machine with no GPU. With the rapid increase of protein complex structural coverage and potential improvements on the quality of mutant structures, DLA will become instrumental for high-throughput mutational scans of interactomes.
Yasser Mohseni Behbahani, Elodie Laine, Alessandra Carbone
Dmitrii Zhemchuzhnikov Université Grenoble Alpes
6DCNN with Roto-Translational Convolution Filters for Volumetric Data Processing
In this work, we introduce a 6D Convolutional Neural Network (6DCNN) designed to tackle the problem of detecting relative positions and orientations of local patterns when processing three-dimensional volumetric data. 6DCNN also includes SE(3)-equivariant message-passing and nonlinear activation operations constructed in the Fourier space. Working in the Fourier space significantly reduces the computational complexity of our operations. We demonstrate the properties of the 6D convolution and its efficiency in the recognition of spatial patterns. We also assess the 6DCNN model on several datasets from the recent CASP protein structure prediction challenges. Here, 6DCNN improves over the baseline architecture and also outperforms the state-of-the-art.
Valentin Lombard LCQB
Toward a comprehensive description of protein conformational diversity
I will present my study on proteins’ deformations and conformational diversity as they influence their functions. This work consists of creating a database of similar protein clusters from the PDB and applying linear and non-linear dimensionality reduction techniques to these clusters. Our database consists of more than 48,000 clusters that rigorously cover all the protein chains contained in the PDB. From these clusters, we sub-selected 12 well-documented case studies. These are of therapeutic interest, with many literature references. They have multiple conformations with known functional roles. This resource and our results with dimensionality reduction techniques can be helpful to the community to benchmark methods predicting functional states.
Pablo Chacon Institute of Physical Chemistry (IQFR-CSIC)
Geometric Algebra Models of Proteins for learning orientations
We are interested in exploring the advantages of using Geometric Algebra (GA) to model proteins. After a brief introduction to GA and some successful applications to protein structure, I will present our work on using these techniques to learn the orientation of the amino acids in the protein chain. We propose a single map based on a GA description of the protein geometry, which is intuitive, compact, descriptive of the protein fold, and easily predictable, with the potential to simplify both protein modeling and complex protein structure prediction pipelines.
Martin Weigt LCQB
Generative models of biomolecular sequences
Nicolas Buton Data and Knowledge Management, Univ. Rennes, Inria, IRISA, Rennes, 35000, France
Predicting enzymatic function of protein sequences with attention
Deep Learning has recently demonstrated tremendous success in Natural Language Processing. In this presentation, we will show how a Transformer language model pre-trained on protein sequences can be used to predict enzyme commission (EC) numbers. This results in a state-of-the-art model for monofunctional enzyme class prediction, with improvements in level-two EC prediction accuracy on the EC40 dataset from 84% to 95% and in level-four EC prediction macro-F1 score on the ECPred40 dataset from 41% to 54%. We also show that, when validating against catalytic residues, a combination of attention maps is on par with or better than other interpretability methods, with a max F-Gain score of 96.05% (compared with a max score of 91.44% for known methods).
Yulia Kacher Université de Lorraine, CNRS, LPCT, F-54000 Nancy, France
Proteins conformational space investigation: a machine learning approach to uncover missing intermediates
Proteins are not static structures, and most often their function requires large conformational rearrangements and transitions between key intermediate states. The more complex the protein function, the harder it is to determine these rearrangements and elucidate the mechanisms that underlie the structure-function relationship. Due to the lack of experimental data able to uncover the whole protein conformational space, one often resorts to atomistic molecular dynamics (MD) simulations as an alternative. These, however, fail to correctly sample regions that involve crossing high-energy barriers and very long timescales.

Recently, Machine Learning algorithms have been introduced to boost MD simulation sampling and data post-processing. Some were shown to succeed in uncovering intermediate structures and the path between them for small proteins and model systems. We present here a recent algorithm introduced by Ramaswamy et al. that encompasses an interpolation scheme able to predict transition paths between different conformations of a given protein. The deep-learning algorithm, a 1D convolutional autoencoder, reconstructs a complex manifold representing the conformational space by training the neural network on the atom coordinates obtained from MD simulations.

We discuss here particular limitations of the model and show how we adapted and improved the neural network to investigate the conformational space of several proteins. We introduce furthermore an interactive visualization module that makes it possible to represent the manifold in an intuitively interpretable way.

[1] Ramaswamy, V. K., Musson, S. C., Willcocks, C. G. & Degiacomi, M. T. Deep Learning Protein Conformational Space with Convolutions and Latent Interpolations. Phys. Rev. X 11, 011052 (2021).
Romain Menegaux INRIA
Encoding graph structure in transformers
In this talk we show a new way to incorporate both edge features and graph structure in graph transformers, as well as a new more expressive flavour of self-attention. Our method reaches state of the art results on molecular benchmarks.
Stéphane Téletchéa Nantes Université, US2B, CNRS, UMR6286, F-44000 Nantes
Strengths and weaknesses of protein structure prediction methods
Recent developments in deep learning have enabled large-scale prediction of the structure of all known proteins. The 220 million sequences available in public databases have thus been processed to obtain a three-dimensional model of each protein. However, these models, which have been made available to the community, are not all of equal quality. This can lead to over-interpretation or misinterpretation of the function of unknown proteins, or of the actual structure of these proteins. This talk will present a study of these predictions on several model sequences with varying degrees of homology to natural proteins.
Lucas DAVID, Florian ECHELARD, Damien GARCIA
Helene Bret CEA / I2BC
Modeling disordered regions in interactomes using AlphaFold
The AlphaFold2 (AF) model is a breakthrough in the structural prediction of protein-protein interactions and also appears to be effective in predicting protein-peptide interactions. We explored the performance of this tool to unravel the complexity of PPI networks mediated by disordered regions of proteins. Using an unbiased dataset, we benchmarked several predictive strategies and evaluated their ability to produce accurate models of interactions.

The strategies tested vary the way multiple sequence alignments are built and probe the impact of input protein length on prediction success rates. We also questioned the specificity of the prediction when the correct binding partner needs to be identified within a set of multiple interactions. Altogether, we propose a protocol to maximize the chances of identifying the proper binding regions.
Raphael Guérois, Jessica Andreani
Vaitea Opuu Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
CARNAGE: structure-in-sequence compression with graph neural network
The biological function of natural non-coding RNAs (ncRNA) is tightly bound to their molecular structure. Sequence analyses such as multiple sequence alignments (MSA) are the bread and butter of DNA, protein, and RNA functional analysis; however, analyzing sequence and structure simultaneously is not easy. In contrast with protein and DNA, the RNA sequence alignment usually requires explicit secondary structure constraints, which eventually yield higher computational costs.

In this work, we propose CARNAGE (Clustering/Alignment of RNA with Graph-network Embedding), which leverages a graph neural network encoder to imprint secondary structure information into a sequence-like embedding; usual downstream sequence analyses applied to these embeddings therefore account implicitly for structural constraints. Our method is versatile and has been tested on 1) designing RNAs for targeted structures, 2) clustering sequences, and 3) aligning sequences.

In the mainstream approach, alignments maximize likely mutations by extracting substitution scores from hand-crafted alignments—BLOSUM60 is one famous example. Many modern deep learning-based alignment methods are no different as they are being trained directly on high-quality structural alignments. In contrast, we trained our network on a masking problem, independent from the alignment problem, where the sequence-like embedding is used as a bottleneck of information flow, allowing a rich embedding.

While our method performed fairly in designing RNA sequences to fold into targeted structures, it performed well in clustering synthetic sequences designed to adopt specific structures. Similarly, it performed well at clustering natural sequences. Moreover, we used our approach to build MSAs whose quality was found to be comparable to that of state-of-the-art tools while using only the simple Needleman-Wunsch alignment algorithm.
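
For readers unfamiliar with the Needleman-Wunsch algorithm mentioned above, the following is a minimal sketch of global alignment by dynamic programming. It is not the CARNAGE implementation: it aligns plain character strings rather than learned embeddings, and the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b (illustrative scoring values).

    Returns (score, aligned_a, aligned_b), where '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

In an embedding-based setting, the character comparison would presumably be replaced by a similarity between embedding vectors; that substitution is not shown here.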

Not only can this approach be readily extended to RNA tridimensional structures, but it can also be applied to proteins. Our ongoing efforts aim at introducing information beyond intra-interactions, for example, RNA-protein contact sites.
Hélène Bret, I2BC/CEA, Université Paris-Saclay.
Marianne Defresne INSA Toulouse
Protein Design with Automated Reasoning and Deep Learning
In this talk, we introduce a method for Computational Protein Design (CPD) based on two different Artificial Intelligence technologies: automated reasoning, augmented with deep learning. The goal of CPD is to design proteins with enhanced or new properties or functions. We formulate CPD as an optimization problem: given an input backbone (crafted to carry out the desired function/property), we want to find the most suitable sequence, i.e., the sequence minimizing the energy of the backbone. Our approach is based on the exact optimization of a pairwise decomposable energy function.
This algorithm has been successfully applied to various real-case designs, but it is limited by the energy function being optimized. Existing decomposable energy functions are based on simplified physics-based force fields or on statistics. Here, we tried to directly learn a pairwise decomposable fitness function on known protein structures using Deep Learning techniques. We tested the quality of the learned energy function in silico, and found it to be competitive with state-of-the-art hybrid and statistical functions such as those available in Rosetta [1] or KORP [2].
[1] Park, Hahnbeom et al. (2016). “Simultaneous Optimization of Biomolecular Energy Functions on Features from Small Molecules and Macromolecules”. In: Journal of Chemical Theory and Computation 12.12, pp. 6201–6212.
[2] Lopez-Blanco, José Ramon and Pablo Chacon (Jan. 2019). “KORP: knowledge-based 6D potential for fast protein and loop modeling”. In: Bioinformatics 35.17, pp. 3013–3019. issn: 1367-4803.
Sergei Grudinin LJK CNRS
News from CASP15 and Discussion topics
In my talk, I will present new challenges introduced in the CASP15 blind community-wide computational exercise and outline some of the proposed deep-learning architectures. I will also present several topics for the subsequent discussion.
Jérôme Hénin CNRS – IBPC
Tools for exploring molecular free energy landscapes using mostly natural (and some artificial) neural networks
A powerful family of methods for exploring free energy landscapes relies on dimensionality reduction processes to obtain reduced representations of manageable size. If these representations describe the low-dimension manifold within which the important configurations lie, then enhancing sampling within that manifold leads to correct estimates of properties averaged over the whole distribution of configurations. I will present the Collective Variables Module, a library to design, run, and interpret simulations based on such reduced representations. At the other end of the spectrum, one may want to avoid the delicate operation of finding low-dimension representations, and stay clear of the statistical limitations of simulations by bypassing simulations altogether. Instead, we would like to sample directly from the Boltzmann distribution using a generator approach. I will discuss progress in this respect, in particular, the quest for a generator that can be trained without simulation data.

[A] J. Hénin, Laura J. S. Lopes, G. Fiorin, Human learning for molecular simulations: the Collective Variables Dashboard in VMD, J. Chem. Theo. Comput. 18 (3), 1945–1956 (2022)

[B] Lesage A., Lelièvre T., Stoltz G., Hénin J., Smoothed biasing forces yield unbiased free energies with the extended-system Adaptive Biasing Force method, J. Phys. Chem. B 121 (15), 3676–3685 (2017)

[C] G. Fiorin, M. L. Klein, J. Hénin, Using collective variables to drive molecular dynamics simulations, Molecular Physics 111 (22-23), 3345-3362 (2013)
Gabriel Stoltz Ecole des Ponts
Coarse-graining and efficiently sampling with autoencoders
A coarse-grained description of atomistic systems in molecular dynamics is provided by reaction coordinates. These nonlinear functions of the atomic positions are a basic ingredient to compute more efficiently average properties of the system of interest, such as free energy profiles. However, reaction coordinates are often based on an intuitive understanding of the system, and one would like to complement this intuition or even replace it with automated tools. One appealing tool is autoencoders, for which the bottleneck layer provides a low dimensional representation of high dimensional atomistic systems. In order to have an efficient numerical method, autoencoders should be combined with importance sampling techniques based on adaptive biasing methods. The algorithm then iterates between an update of the reaction coordinate, and free energy biasing. I will discuss some mathematical foundations of this method, and present illustrative applications for biophysical systems, including alanine dipeptide and chignolin. Some on-going extensions to more demanding systems, namely HSP90, will also be hinted at. Depending on time, I will also mention current extensions aiming at sampling reactive paths.

[1] Z. Belkacemi, P. Gkeka, T. Lelièvre, G. Stoltz, Chasing collective variables using autoencoders and biased trajectories, J. Chem. Theory Comput. 18(1), 59-78 (2022)

[2] P. Gkeka, G. Stoltz, A. Barati Farimani, Z. Belkacemi, M. Ceriotti, J. Chodera, A. Dinner, A. Ferguson, J.-B. Maillet, H. Minoux, C. Peter, F. Pietrucci, A. Silveira, A. Tkatchenko, Z. Trstanova, R. Wiewiora, T. Lelièvre, Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems, J. Chem. Theory Comput. 16(8), 4757–4775 (2020)
Elias Tsigaridas Inria Paris and IMJ-PRG
Geometric random walks and sampling from convex bodies
We present algorithmic, complexity, and implementation results on the problem of sampling points from polytopes (the feasible regions of linear programs) and spectrahedra (the feasible regions of semidefinite programs). To sample from log-concave distributions, our main tool is geometric random walks, which we realize using primitive geometric operations that in turn exploit the algebraic and geometric properties of convex bodies and the (polynomial) eigenvalue problem.
We apply our methods to sample from polytopes that correspond to the steady states of metabolic networks. In particular, we sample a polytope of dimension 5,335, corresponding to (part of) the human metabolic network Recon3D, in less than 30 hours.
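
As an illustration of what a geometric random walk in a polytope looks like, here is a minimal sketch of the classical hit-and-run walk targeting the uniform distribution on {x : Ax <= b}. The abstract does not say which walks the speakers use, so hit-and-run is an assumption chosen for simplicity; the function name, the parameters, and the unit-square example are also hypothetical.

```python
import random


def hit_and_run(A, b, x0, n_steps, seed=0):
    """Hit-and-run walk in the bounded polytope {x : A x <= b}.

    A is a list of rows, b a list of bounds, x0 a strictly interior point.
    Returns the list of visited points (approximately uniform after mixing).
    """
    rng = random.Random(seed)
    x = list(x0)
    dim = len(x)
    samples = []
    for _ in range(n_steps):
        # Uniform random direction: normalise a standard Gaussian vector.
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(v * v for v in d) ** 0.5
        d = [v / norm for v in d]
        # Intersect the line x + t*d with every half-space to get the chord.
        t_lo, t_hi = float("-inf"), float("inf")
        for row, bi in zip(A, b):
            ad = sum(a * v for a, v in zip(row, d))
            slack = bi - sum(a * v for a, v in zip(row, x))
            if ad > 1e-12:
                t_hi = min(t_hi, slack / ad)
            elif ad < -1e-12:
                t_lo = max(t_lo, slack / ad)
        # Move to a uniformly chosen point on the chord.
        t = rng.uniform(t_lo, t_hi)
        x = [xi + t * di for xi, di in zip(x, d)]
        samples.append(x)
    return samples


if __name__ == "__main__":
    # Unit square 0 <= x, y <= 1 written as A x <= b (illustrative example).
    A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    b = [1.0, 0.0, 1.0, 0.0]
    points = hit_and_run(A, b, [0.5, 0.5], 1000)
    print(points[-1])
```

Each step costs one pass over the constraints, which is why the per-step primitive (intersecting a line with the body) matters so much in high dimension.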

The talk is based on joint works with Apostolos Chalkis and Vissarion Fisikopoulos and also Ioannis Emiris, Marios Papachristou, and Haris Zafeiropoulos.

[1] Apostolos Chalkis, Vissarion Fisikopoulos, Elias Tsigaridas, Haris Zafeiropoulos:
Geometric Algorithms for Sampling the Flux Space of Metabolic Networks. SoCG 2021: 21:1-21:16

[2] Apostolos Chalkis, Vissarion Fisikopoulos, Marios Papachristou, Elias Tsigaridas:
Truncated Log-concave Sampling with Reflective Hamiltonian Monte Carlo. CoRR abs/2102.13068 (2021)

[3] Apostolos Chalkis, Ioannis Emiris, Vissarion Fisikopoulos, Panagiotis Repouskos, and Elias Tsigaridas, 2020. Efficient Sampling from Feasible Sets of SDPs and Volume Approximation. arXiv preprint arXiv:2010.03817 (to appear in Linear Algebra and its Applications).
Augustin Chevallier Université Strasbourg
Sampling using Piecewise Deterministic Markov Processes
I will present PDMP sampling, a class of non-reversible sampling algorithms that have garnered a lot of attention in the last few years. These algorithms are hard to implement since they require computing bounds (by hand) on some function that depends on the target. This talk will focus on recent techniques that allow us to bypass this bound computation.

Efficient computation of the volume of a polytope in high-dimensions using Piecewise Deterministic Markov Processes
A. Chevallier, F. Cazals, and P. Fearnhead
AISTATS, 2022
F. Cazals and P. Fearnhead
David Wales University of Cambridge
Energy landscapes: from machine learning to enhanced sampling
The potential energy landscape provides a conceptual and computational framework for investigating structure, dynamics and thermodynamics in atomic and molecular science. This talk will summarise new approaches for global optimisation, quantum dynamics, the thermodynamic properties of systems exhibiting broken ergodicity, and rare event dynamics. Applications will be presented that range from prediction and analysis of high-resolution spectra, to coarse-grained models and design principles for self-assembly of mesoscopic structures. The computational energy landscapes approach is based on geometry optimisation, and the methodology can be applied to explore the solution landscapes for neural networks. Recent results include applications to patient outcomes based on electronic healthcare data, and to K-means clustering of gene expression datasets for identification of cancer subtypes.

Selected Publications:
[1] Elucidating the Solution Structure of the K-means Cost Function Using Energy Landscape Theory, J. Chem. Phys., 156, 054109, 2022.
[2] Ann. Rev. Phys. Chem., 69, 401-425, (2017). Exploring Energy Landscapes
[3] Chem. Commun, 53, 6974-6988 (2017). Exploring biomolecular energy landscapes
[4] Perspective: Energy Landscapes for Machine Learning, PCCP, 19, 12585-12603, 2017.
[5] Perspective: Insight Into Reaction Coordinates and Dynamics From the Potential Energy Landscape, JCP, 142, 130901, 2015.
[6] Energy Landscapes: Some New Horizons, Curr. Op. Struct. Biol., 20, 3-10, 2010.
[7] Energy Landscapes, Cambridge University Press, Cambridge, 2003
Samuela Pasquali Université Paris Cité
Modeling RNA Polymorphism
RNA molecules are characterized by the existence of a multitude of stable states that result in a frustrated energy landscape, where the observed structures depend sensitively on experimental conditions. Using both atomistic and coarse-grained models for RNAs, combined with enhanced sampling methods, we investigate the energy landscape of these systems to understand which structures are most relevant under different conditions. I will describe the different simulation methods we employ to study these systems and discuss a few examples highlighting the complementarity of the various methods to obtain a comprehensive picture of the molecule’s behavior.


[1] T. Cragnolini, Y. Laurin, P. Derreumaux, S. Pasquali, « The coarse-grained HiRE-RNA model for de novo calculations of RNA free energy surfaces, folding, pathways and complex structure predictions », JCTC, 11, 3510 (2015);
[2] Joseph, J., Roeder, K., Chakraborty, D., Mantell, R., & Wales, D., « Exploring biomolecular energy landscapes », Chem Comm, 53 (52), 6974-6988 (2017);
[3] S. Pasquali, E. Frezza, F.L. Barroso da Silva, « Coarse-grained dynamic RNA titration simulations », Interface Focus 9: 20180066 (2019)
[4] K Röder, G Stirnemann, AC Dock-Bregeon, DJ Wales, S Pasquali, « Structural transitions in the RNA 7SK 5′ hairpin and their effect on HEXIM binding », Nucleic Acids Research 48 (1), 373-389 (2020)
[5] K Röder, AM Barker, A Whitehouse, S Pasquali, « Investigating the structural changes due to adenosine methylation of the Kaposi’s sarcoma-associated herpes virus ORF50 transcript », PLOS Computational Biology (2022)
[6] G Lazzeri, C Micheletto, S Pasquali, P Faccioli, « RNA Folding Pathways from All-Atom Simulations with a Variationally Improved History-Dependent Bias » arXiv:2205.12603 (2022)
Tony Lelievre Ecole des Ponts
Sampling probability measures on submanifolds
Various applications require sampling probability measures restricted to submanifolds. For example, in molecular dynamics, one often considers molecular systems whose configurations are distributed according to the Boltzmann–Gibbs measure with so-called molecular constraints such as fixed bond lengths or fixed bending angles in molecules and/or fixed values of the so-called reaction coordinate function for the computation of free energy differences using thermodynamic integration. Such sampling problems also appear in computational statistics and machine learning.

Probability measures supported on submanifolds can be sampled by adding an extra momentum variable to the state of the system, and discretizing the associated Hamiltonian dynamics with some stochastic perturbation in the extra variable. In order to avoid biases in the invariant probability measures sampled by discretizations of these stochastically perturbed Hamiltonian dynamics, a Metropolis rejection procedure can be considered. The scheme obtained in this way belongs to the class of generalized Hybrid Monte Carlo algorithms. Special care must be taken in the rejection procedure to avoid biases. We will in particular explain generalizations of a procedure suggested by Goodman, Holmes-Cerfon and Zappa for Metropolis random walks on submanifolds, where a reverse projection check is performed to enforce the reversibility of the algorithm.
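
To make the reverse projection check concrete, here is a minimal sketch of a Metropolis random walk on a submanifold defined by a single scalar constraint q(x) = 0, targeting the uniform surface measure. It is a simplified illustration of the random-walk procedure mentioned above, not the generalized Hybrid Monte Carlo scheme of the talk; the function names, the Newton tolerances, and the unit-circle example are all assumptions.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project(x, v, q, grad_q, tol=1e-10, max_iter=50):
    """Newton solve for a in q(x + v + a * grad_q(x)) = 0; None on failure."""
    n = grad_q(x)
    a = 0.0
    for _ in range(max_iter):
        y = [xi + vi + a * ni for xi, vi, ni in zip(x, v, n)]
        val = q(y)
        if abs(val) < tol:
            return y
        slope = dot(grad_q(y), n)
        if abs(slope) < 1e-14:
            return None
        a -= val / slope
    return None


def tangent_part(w, x, grad_q):
    """Component of w orthogonal to grad_q(x), i.e. in the tangent space at x."""
    n = grad_q(x)
    c = dot(w, n) / dot(n, n)
    return [wi - c * ni for wi, ni in zip(w, n)]


def step(x, sigma, q, grad_q, rng):
    # 1. Gaussian move in the tangent space, then projection back to q = 0.
    v = tangent_part([rng.gauss(0.0, sigma) for _ in x], x, grad_q)
    y = project(x, v, q, grad_q)
    if y is None:
        return x  # projection failed: reject
    # 2. Tangent move at y that would lead back to x.
    v_rev = tangent_part([xi - yi for xi, yi in zip(x, y)], y, grad_q)
    # 3. Metropolis test (uniform target: ratio of proposal densities only).
    log_ratio = (dot(v, v) - dot(v_rev, v_rev)) / (2.0 * sigma ** 2)
    if rng.random() >= math.exp(min(0.0, log_ratio)):
        return x
    # 4. Reverse projection check: the solver must actually recover x from y.
    x_back = project(y, v_rev, q, grad_q)
    if x_back is None or max(abs(p - r) for p, r in zip(x_back, x)) > 1e-8:
        return x
    return y


def sample(x0, n_steps, sigma, q, grad_q, seed=0):
    rng = random.Random(seed)
    x, chain = list(x0), []
    for _ in range(n_steps):
        x = step(x, sigma, q, grad_q, rng)
        chain.append(x)
    return chain


if __name__ == "__main__":
    # Unit circle in the plane (illustrative constraint).
    circle = lambda p: p[0] ** 2 + p[1] ** 2 - 1.0
    circle_grad = lambda p: [2.0 * p[0], 2.0 * p[1]]
    print(sample([1.0, 0.0], 5, 0.3, circle, circle_grad))
```

On the circle the reverse check in step 4 always succeeds whenever the forward projection does; on less symmetric manifolds the forward and reverse Newton solves can disagree, and rejecting those moves is what keeps the chain reversible.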

References:
[A] T. Lelièvre, M. Rousset and G. Stoltz, Langevin dynamics with constraints and computation of free energy differences, Mathematics of Computation, 81(280), 2071-2125, (2012).
[B] T. Lelièvre, M. Rousset and G. Stoltz, Hybrid Monte Carlo methods for sampling probability measures on submanifolds, Numerische Mathematik, 143(2), 379-421, (2019).
[C] T. Lelièvre, G. Stoltz and W. Zhang, Multiple projection MCMC algorithms on submanifolds, to appear in IMA Journal of Numerical Analysis.
Juan Cortés CNRS
A unified computational method with reinforcement learning for modeling conformational ensembles of complex protein architectures
Proteins can have very different architectures, generally involving a concatenation of relatively rigid domains and flexible regions. Indeed, most proteins in prokaryotes and eukaryotes are composed of several domains connected by linkers, and flexible tails are also frequently found at the termini of rigid domains. In addition, flexible loops connecting secondary structure elements within domains are omnipresent in proteins. All these types of flexible regions (linkers, tails and loops) play key functional roles, usually related to inter- or intramolecular interactions [1,2].

While the structure of rigid domains can be accurately determined using experimental methods or predictors such as AlphaFold2, the structural study of flexible regions remains a challenge. It requires computational methods for the construction of conformational ensemble models that are fitted or refined on the basis of experimental measurements. In recent years, we have developed several algorithms, based on fragment databases and robotics-inspired techniques, for the conformational sampling of flexible loops [3] and intrinsically disordered regions [4]. Building on this work, we present here a unified approach to sample conformations of proteins with complex architectures composed of rigid and flexible regions. Our approach integrates a multi-agent reinforcement learning technique to improve sampling performance while taking into account the specificities of each flexible/disordered region of the protein.

References:

[1] I. Clerc, A. Sagar, A. Barducci, N. Sibille, P. Bernadó, J. Cortés. The diversity of molecular interactions involving intrinsically disordered proteins: A molecular modeling perspective. Computational and Structural Biotechnology Journal, 19:3817-3828, 2021.
[2] A. Barozet, P. Chacón, J. Cortés. Current approaches to flexible loop modeling. Current Research in Structural Biology, 3:187-191, 2021.
[3] A. Barozet, K. Molloy, M. Vaisset, T. Siméon, H. Minoux, J. Cortés. A reinforcement learning-based approach to enhance exhaustive protein loop sampling. Bioinformatics, 36(4):1099-1106, 2020.
[4] A. Estaña, N. Sibille, E. Delaforge, M. Vaisset, J. Cortés, P. Bernadó. Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database. Structure, 27(2):381-391, 2019.
I. Clerc and P. Bernadó
Frederic Cazals Inria
Sampling molecular conformations with seven-league boots: can we reconcile structure and thermodynamics?
In molecular modeling, the term sampling ambiguously refers to the generation of diverse conformational ensembles expected to be representative of meta-stable states typically seen in structures, and to that of thermodynamic ensembles from which macroscopic averages are obtained. As a matter of fact, the techniques used in these two contexts are quite different.

In this talk, I will review recent work aiming to reconcile both tiers by developing algorithms that deliver high-quality, diverse ensembles [1,2] while relying on core algorithmic techniques meant to sample the uniform measure in conformational space [3,4].

[1] Geometric constraints within tripeptides and the existence of tripeptide reconstructions
T. O’Donnell and F. Cazals
https://www.biorxiv.org/content/10.1101/2022.06.21.497005v1

[2] Enhanced conformational exploration of protein loops using a global parameterization of the backbone geometry
T. O’Donnell and F. Cazals
https://www.biorxiv.org/content/10.1101/2022.06.21.497022v1

[3] Efficient computation of the volume of a polytope in high-dimensions using Piecewise Deterministic Markov Processes
A. Chevallier, F. Cazals, and P. Fearnhead
AISTATS, 2022

[4] Improved polytope volume calculations based on Hamiltonian Monte Carlo with boundary reflections and sweet arithmetics
A. Chevallier, S. Pion, and F. Cazals
J. of Computational Geometry, 2022
T. O’Donnell (Inria) and Augustin Chevallier (Univ. Strasbourg)

Participants

Elena Álvarez Sánchez Nantes Université, US2B, CNRS, UMR6286, F-44000 Nantes, France / Affilogic, SAS, Nantes, France
Marina Abakarova LCQB
Samia Aci-Sèche CNRS
Camille Alliot Université Paris Cité
Diego AMAYA INRIA Nancy Grand Est
Jessica Andreani CEA I2BC
Benjamin Bardiaux CNRS-Institut Pasteur
Clément Bernard Laboratoire IBISC
Vadim BERTRAND UGA
WARDA BOUTEGRABET Institut Mondor de recherche biomédicale
Costas Bouyioukos Université Paris Cité
Helene Bret CEA / I2BC
Nicolas Buton Data and Knowledge Management, Univ. Rennes, Inria, IRISA, Rennes, 35000, France
Alessandra Carbone Sorbonne Université – LCQB
Mathilde Carpentier Sorbonne Université
Frederic Cazals Inria
Maud CHAN YAO CHONG QST institute
Augustin Chevallier Université Strasbourg
Nathalie Colloc’h ISTCT UMR 6030 CNRS Université de Caen-Normandie
Juan Cortés CNRS
François Coste Inria
Wagner Alan Aparecido DA ROCHA Laboratoire d’informatique de l’École polytechnique
Davy DARANKOUM Laboratoire Jean Kuntzmann
Rajkumar Darbar xxx
Isaure de Beauchêne CNRS
Sjoerd de Vries CNRS
Élise Duboué-Dijon Laboratoire de biochimie théorique
Elodie Duprat Sorbonne Université (IMPMC)
Delphine Flatters Université Paris Cité
Tatiana Galochkina DSIMB, UMRS 1134 – Inserm and Université Paris Cité
Aria Gheeraert INSERM
Paraskevi Gkeka Sanofi R&D
Nohad Gresh LCT UMR 7616 CNRS
Sergei Grudinin LJK CNRS
Konrad Hinsen CNRS – Centre de Biophysique Moléculaire
Samuel Holvec Institut de Génétique et de Biologie Moléculaire et Cellulaire – Illkirch
Jérôme Hénin CNRS – IBPC
Diego Inostroza Laboratoire de Chimie Théorique
Laurent Jacob CNRS
Slavica JONIC CNRS & Sorbonne Université
Yulia Kacher Université de Lorraine
Yasaman Karami Institut Pasteur
Florencia Klein Laboratoire de Biochimie Théorique-Institut de Biologie Physico-Chimique (IBPC)
Anna Kravchenko LORIA
Louis Lagardère Sorbonne Université
Elodie Laine Sorbonne Université
Olivier Languin-Cattoën Laboratoire de Biochimie Théorique – IBPC
Benoist LAURENT CNRS
Quoc Khang Le Paris-Saclay University
Fabrice Leclerc I2BC CNRS-CEA-Univ. Paris Saclay
Julie Ledoux ENS Paris-Saclay
Tony LELIEVRE Ecole des Ponts
Valentin Lombard LCQB
Gianluca Lombardi Laboratoire LCQB, Sorbonne Université
Irene Maffucci Université de technologie de Compiègne – UMR 7025
Ikram Mahmoudi PhD student
Bernard Maigret LORIA – CNRS, INRIA, Université de Lorraine
Vincent MALLET Institut Pasteur
Therese Malliavin Laboratoire de Physique et Chimie Théoriques, Université de Lorraine and CNRS
R. Charbel Maroun UMR-S U1204 INSERM/U-Paris Saclay
Juliette Martin CNRS
Romain Menegaux INRIA
Dominique MIAS LORIA
David Mignon Laboratoire BIOC, Ecole Polytechnique
Yasser MOHSENI BEHBAHANI Sorbonne Université
Samuel Murail Université Paris Cité
Roberto Netti LCQB
Phuong NGUYEN Laboratoire de Biochimie Théorique, UPR9080 CNRS, 13 rue Pierre et Marie Curie, 75005 Paris
Khanh-Chi NGUYEN-PHAM Université Grenoble Alpes
Maxence Noble Ecole Polytechnique
Loïc Omnes Laboratoire IBISC – Université d’Evry
Vaitea Opuu Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany
Nicolas Panel Toulouse Biotechnology Institute
Marco Pasi ENS Paris-Saclay
Samuela Pasquali Université Paris Cité
Thomas Pigeon INRIA – IFPEN
Jean-Philip PIQUEMAL Sorbonne Université
Lélia Polit LCQB
Yann Ponty CNRS/Ecole Polytechnique
Chantal Prévost Laboratoire de Biochimie Théorique – IBPC – CNRS
Ivan Reveguk PhD student, BIOC, Ecole Polytechnique, Palaiseau
Charles ROBERT CNRS Laboratoire de Biochimie Théorique, IBPC Paris 5ème
Klypa Roman Ecole Polytechnique
Gaspar Roy Institut de Recherche pour le Développement
Seyedehafra Sabei Laboratoire de Biochimie Théorique – IBPC – CNRS (incoming PhD student)
Edoardo Sarti Inria
Vittore Scolari Institut Curie
Thomas Simonson CNRS/Ecole Polytechnique
Gabriel Stoltz Ecole des Ponts
Dirk Stratmann Sorbonne Université
Fariza TAHI IBISC, Univ Evry, Université Paris-Saclay
Antoine Taly CNRS – Laboratoire de Biochimie Théorique
Mounir Tarek CNRS – Université de Lorraine