Journée Graphes et Bioinfos 2023

Quand : 8 Novembre 2023

Où : Sorbonne Universités, campus Jussieu – Salle 211 (2ème étage), couloir 55-65

Organisateurs : Cédric Bentz (CEDRIC, CNAM), Bruno Escoffier (LIP6, Sorbonne Universités) et Yann Ponty (LIX, Polytechnique)

Inscription : Closes pour limite de capacité – Lien visio prochainement disponible ici

Cette journée est consacrée à l’utilisation des graphes comme modélisation/abstraction des problèmes en Bioinformatique, et aux algorithmes qui permettent leur résolution. L’évènement, à l’initiative des GdR ROD (Axe CAGDO) et GdR BIM (GT MASIM) vise à renforcer les liens entre les communautés « Algo Discrète/Recherche Opérationnelle » et Bioinformatique dans une acception transverse.

Des bioinformaticiens y évoqueront des problèmes algorithmiques (résolus ou ouverts) naturels dans leur domaine de recherche, de façon formalisés. Des « algorithmiciens purs » viendront les rejoindre, dans le but ultime de susciter des collaborations nouvelles sur des sujets de pointe en algorithmique discrète, ou bénéficiant potentiellement d’un apport de techniques « modernes » (complexité paramétrée, algo faiblement exponentiels …). En complément d’exposés « appliqués », le programme comprendra un tutoriel consacré à la mise en oeuvre concrète des techniques de complexité paramétrée en bioinformatique.

Accès

La journée sera hébergée par le LIP6, au coeur du campus Jussieu de Sorbonne Université. Il est facilement accessible via les lignes de métro 6 et 10 (arrêt Jussieu). Les exposés auront lieu dans la salle 211, située au deuxième étage dans le couloir 55/65 (situé entre les tours 55 et 65).

Programme préliminaire

Mercredi 8 novembre
9:30-10:15 Accueil (café)
10:15-10:30 Cédric Bentz – Yann Ponty  
Accueil et introduction des journées (slides)
10:30-11:15 Laurent Bulteau LIGM
Parameterized Complexity in Bioinformatics (abstract) (slides)
11h15-12:00 Karel Brinda INRIA/IRISA
K-mers and their graphs in computational biology (abstract) (slides)
12:00-13:45 Lunch break
13:45-14:30 Alain Denise LISN
Search and analysis of recurrent patterns in 3D RNA structures and graph problems (abstract) (slides)
14:30-15:15 Geraldine Jean LS2N
Genome rearrangements on both gene order and intergenic regions (abstract) (slides)
15:15-15:35 Pause café
15:35-16:20 Arnaud Mary LBBE
Enumeration algorithms in bioinformatics (abstract) (slides)
16:20-16:30 Cédric Bentz – Yann Ponty
Conclusions

Abstracts

Laurent Bulteau LIGM
Parameterized Complexity in Bioinformatics
Many natural problems admit graph-based modelisations, but the search for exact algorithms often stumbles on NP-hardness. This excludes fully general solutions, but real-world instances often satisfy additional specificities, quantified as *parameters*, allowing for efficient specialized solutions. We will introduce several well-established techniques for parameterized algorithmics, with recent applications e.g. to RNA bioinformatics.
Karel Brinda INRIA/IRISA
K-mers and their graphs in computational biology
In recent decades, the exponential growth of DNA sequencing data has outpaced computational capacities, resulting in a rapidly decreasing proportion of searchable genomic data. To tackle this challenge, the bioinformatics community has largely shifted from traditional alignment-based methods to techniques based on so-called k-mers, which are fixed-length substrings of the original genomes. In modern bioinformatics applications, k-mers serve as elementary data analysis tokens, offering a relatively accurate representation of the original genomes and their collections, as well as facilitating their rapid comparisons, often in conjunction with sketching. Under the hood of these methods, k-mers are frequently organized in various types of graphs such as Bruijn graphs, overlap graphs, and evolutionary graphs. In this talk, we will explore how algorithms applied to k-mer graphs have recently led to efficient data representations such as simplitigs, matchings, and masked superstrings, as well as to novel compression and search approaches such as phylogenetic compression. We will also discuss related algorithmic challenges and underlying theoretical questions.
Alain Denise LISN
Search and analysis of recurrent patterns in 3D RNA structures and graph problems
Résumé : Les molécules d’ARN se replient dans l’espace en prenant des conformations tridimensionnelles particulières dues aux interactions entre les nucléotides qui les composent. La difficulté de prédire la structure (ou les structures) qu’adoptera une molécule d’ARN est due à plusieurs causes, l’une d’entre elles étant le fait que cette structure dépend beaucoup de la présence de ce qu’on appelle des motifs 3D d’ARN : des motifs composés de nucléotides liés par des interactions à faible énergie, motifs qui se retrouvent à l’identique ou presque dans un grand nombre de structures différentes. Savoir caractériser ces motifs, et éventuellement prédire leur apparition au cours du repliement, serait un atout pour mieux comprendre et donc peut-être mieux prédire la structuration de l’ARN. La représentation des molécules d’ARN par des graphes (et donc des motifs 3D par des sous-graphes) permet d’avancer dans cette problématique, et en même temps elle soulève des problèmes en termes de comparaison et de classification de graphes. Je présenterai quelques études récentes (et d’autres moins récentes) sur ce sujet.

Abstract: RNA molecules fold in space, taking on particular three-dimensional conformations due to interactions between the nucleotides of which they are composed. The difficulty of predicting the structure (or structures) that an RNA molecule will adopt is due to several causes, one of which is the fact that this structure depends very much on the presence of what are known as RNA 3D motifs: motifs composed of nucleotides linked by low-energy interactions, motifs that are found identically or almost identically in a large number of different structures. Knowing how to characterize these motifs, and possibly predict their appearance during folding, would be an asset for better understanding and therefore perhaps better predicting how RNA folds. The representation of RNA molecules by graphs (and therefore of 3D motifs by sub-graphs) enables us to make progress in this area, and at the same time raises problems in terms of graph comparison and classification. I will present some recent (and not so recent) studies on this subject.
Geraldine Jean LS2N
Genome rearrangements on both gene order and intergenic regions
During evolution, genomes undergo large-scale mutations known as genomic rearrangements. Evolutionary distance between two genomes can be estimated by computing a minimum length sequence of rearrangements required to transform one genome into another. Typically, to calculate such a distance, a genome is represented by an ordered sequence of genes, and the rearrangements are simulated through mathematical operations on this model. There are many variants according to the type of genomes (e.g linear or circular, signed or unsigned) and the type of rearrangements (e.g. reversals, transpositions, DCJ…) we consider. Since the mid-nineties, a very large amount of work has been done concerning the algorithmic issues of computing distances between pairs of genomes. However, the original combinatorial works on genome rearrangements do not take into account the whole biological information available. In particular, taking into account the intergenic regions between consecutive genes may improve the distance estimate from an evolutionary point of view. In this talk, I will present the genome rearrangement model that modifies both the gene order and the intergene size distribution of the genome (introduced by Fertin et al. in 2017), some interesting algorithmic results and will discuss some open problems.
Arnaud Mary LBBE
Enumeration algorithms in bioinformatics
Enumeration problems, in contrast to classical algorithmic problems that typically involve determining the existence of a solution or finding a single solution, focus on the task of identifying all solutions to a given problem. This presentation will present some examples of enumeration problems on graphs applied to bioinformatics, covering different topics, from de novo variant detection in sequencing data to the analysis of metabolic pathways. We will explore the current limitations of some of these strategies and present open problems related to them.

Participants

Camille Alliot
Manuel AmoussouLIP6Virginia Ardévol MartínezLAMSADESandrine ArosInstitut PasteurThomas BaudeauUniversité de Lille, CRIStAL, BonsaiAlexis BaudinLip6Cédric BentzCNAM & Laboratoire CEDRICRonan BocquillonUniversité de ToursThéo BouryLaboratoire d’Informatique de l’École PolytechniqueFlorent CabretUniversité de ToursBryan DafnietWhiteLab GenomicsOscar De FeliceWHITELAB GENOMICSAlain DeniseLISN et I2BC – Université Paris-SaclayMarine Djaffardjylisn – équipe bioinfoYoann DufresneInstitut PasteurBruno EscoffierLIP6, Sorbonne UniversitéPaul EtheimerMines PSL / CurieColine GianfrottaUniversité de Versailles Saint-Quentin-en-Yvelines, Université Paris SaclayEgor GladkikhTopplanLuís Felipe Ignácio CunhaFluminense Federal UniversityFlorian IngelsLaboratoire CRIStAL, équipe BonsaïCamille JuignéINRIAJulia KendeInstitut PasteurDavid LagorceINSERMDamien Lamydamien.lamy@emse.frPierre Lechatinstitut PasteurSalma MakboulLIST3N UTTLilian MarchandUNIVERSITÉ DE LILLECamille MarchetCNRSChahdil MarouaUS14Igor MartayanUniv LilleArnaud MengInstitut PasteurMaxence MorinGREYC UNICAEN CNRS UMR 6072 – équipe SAFERene NatowiczUGE ESIEESouhil Omari
Adeline PierrotLISN – Université Paris SaclayLéo PlancheParis Saclay LISNYann PontyCNRS/Ecole PolytechniqueChantal PrévostLBT – CNRSJérémy RousseauMuséum National d’Histoire NaturelleTimothé RouzéCRIStAL – BONSAIAfra SabeiPhdYoshihiro ShibuyaInstitut PasteurFlorian SikoraLAMSADE, Université Paris DauphineJonic SlavicaCNRS – Sorbonne UniversitéGuy SomekhGroupe DynnovationsLéa VandammeLaboratoire CRIStALSofia Vazquez AlferezUniversité Paris Dauphine