Complex Network Mining
Active since 2015
Network data modeling has emerged in various disciplines as a unified way of representing complex relational data. Formally, these complex networks (which we call multidimensional networks) are multigraphs for which nodes and edges are (multi-)labeled. The core of our research activity focuses on analyzing these complex networks for information extraction purposes (Pasquier 2018).
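As an illustration only, a multidimensional network in the sense described above (a multigraph whose nodes and edges carry one or more labels) can be sketched as a small data structure. The class name, method names and the example nodes (`mirna_a`, `gene_x`, `mirna_b`) are hypothetical and are not taken from our software.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MultidimensionalNetwork:
    """Multigraph whose nodes and edges can each carry several labels."""

    # node -> set of labels attached to that node
    node_labels: dict = field(default_factory=lambda: defaultdict(set))
    # (source, target) -> list of label sets, one entry per parallel edge
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_node(self, node, *labels):
        self.node_labels[node].update(labels)

    def add_edge(self, source, target, *labels):
        # Parallel edges are allowed: each call appends a new edge.
        self.add_node(source)
        self.add_node(target)
        self.edges[(source, target)].append(frozenset(labels))

    def neighbors(self, node, label=None):
        """Targets reachable from `node`, optionally restricted to one edge label."""
        found = set()
        for (source, target), label_sets in self.edges.items():
            if source != node:
                continue
            if label is None or any(label in labels for labels in label_sets):
                found.add(target)
        return found


# Hypothetical toy network
net = MultidimensionalNetwork()
net.add_node("mirna_a", "miRNA")
net.add_node("gene_x", "gene")
net.add_edge("mirna_a", "gene_x", "targets")
net.add_edge("mirna_a", "gene_x", "co-mentioned")  # second, parallel edge
net.add_edge("mirna_a", "mirna_b", "neighbor")
```

Restricting `neighbors` to a single edge label corresponds to looking at one "dimension" of the network at a time.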
Prediction of microRNA-disease associations
A microRNA (miRNA) is a small RNA molecule that, by its ability to regulate gene expression, plays a critical role in many physiological processes. Since its discovery, a great deal of information has been gained about its involvement in disease development and drug resistance. However, there is still much to be done to gain a full understanding of the miRNA world. A challenge for miRNA research is establishing a clear relationship between miRNA dysregulation, target dysregulation and ultimate biological impact. Computational methods can make an important contribution to this goal.
To this end, we have been working on a new method to predict associations between miRNAs and diseases. The method represents miRNAs, together with their links to elements that highlight various facets of these molecules (targeted genes, neighboring miRNAs, terms associated with them in scientific articles), as a multidimensional network. This network is then projected into a vector space, and metrics defined in this space are used to predict miRNA-disease associations. The performance of our algorithm, MiRAI, was characteristic of an excellent classifier and matched the state of the art in the field (Pasquier and Gardès 2016). Subsequently, we proposed an improvement that uses a parallel surrogate-assisted evolutionary algorithm to automatically find an optimal configuration of our predictive method (Pallez et al. 2017).
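The general idea of scoring associations in a vector space can be sketched as follows. This is a deliberately simplified illustration and not the MiRAI implementation: each entity is described directly by the features it is linked to, no dimensionality reduction is applied, and cosine similarity is assumed as the metric. All identifiers and weights are hypothetical toy data.

```python
import math

# Hypothetical toy data: each entity is a sparse vector over linked features
# (targeted genes, terms from the literature, ...).
links = {
    "mirna_a": {"gene_x": 1.0, "gene_y": 1.0, "term_apoptosis": 1.0},
    "mirna_b": {"gene_z": 1.0, "term_migration": 1.0},
    "disease_1": {"gene_x": 1.0, "term_apoptosis": 1.0},
    "disease_2": {"gene_z": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_diseases(mirna, diseases, vectors):
    """Return (disease, score) pairs sorted from most to least similar."""
    scored = [(d, cosine(vectors[mirna], vectors[d])) for d in diseases]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_diseases("mirna_a", ["disease_2", "disease_1"], links)
```

In this toy example `disease_1` shares two features with `mirna_a` and is therefore ranked first, while `disease_2` shares none and receives a score of zero.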
Study of triplex topology
It has been known since the 1960s that some short RNA sequences are likely to match particular areas of DNA to form triple-stranded structures called triplex DNA. We have undertaken an in silico study to locate, quantify and analyze triplex DNA on a genome in order to increase our knowledge of these structures. Our analyses, which identified many potential triplex sites within genes, strongly suggest that some RNA fragments, coding or not, could have a significant influence on many chromosomal loci for large-scale genetic or epigenetic control. This study points to a possible new pathway for genetic regulation through RNA fragments (Pasquier et al. 2017).
Network of genetic interactions via lncRNA:DNA triplex formation highlighting 5 sub-networks corresponding to distinct processes. More details can be found in our article on triplex analysis (Pasquier et al. 2017).
Computational analysis of double-stranded RNA
RNA interference (RNAi) refers to a conserved post-transcriptional mechanism for the degradation of RNA by short double-stranded RNAs (dsRNAs). A genome-wide analysis of mRNAs that are complementary to RNAs was performed through computational searches in the Drosophila model. We report segments originating from pre-mRNA introns and exons, as well as from lncRNAs, as potential sources of siRNAs. The computationally predicted interactions have been modeled as a network in which the central genes (those potentially most regulated by RNA interference) are strongly involved in the processes of development, morphogenesis and neurogenesis. Genes whose transcripts are engaged in intermolecular segmental pairing are largely absent from the gene sets defined as showing no expression in every developmental stage from early embryos to adulthood. The same trend was observed for the genes showing very low expression from the 8-12-hour embryonic stage to larval stage 2. These results suggest a genome-wide scale of mRNA homeostasis via RNAi metabolism and could extend the known roles of canonical miRNAs and hairpin RNAs (Pasquier et al. 2020).
Network of RNA-RNA interactions in which central genes are involved in development, morphogenesis and neurogenesis processes.
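A search for complementary RNA segments of the kind mentioned above can be illustrated with a minimal sketch. It assumes perfect Watson-Crick pairing over a fixed length `k` and ignores G:U wobble pairs, mismatches and any thermodynamic criteria, so it should not be read as the procedure used in the study. The sequences and function names are hypothetical.

```python
# Watson-Crick complement table for RNA
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_segments(rna_a, rna_b, k):
    """Find (i, j) such that rna_a[i:i+k] pairs perfectly, in antiparallel
    orientation, with rna_b[j:j+k]."""
    # Index every k-mer of rna_b by its start positions.
    index = {}
    for j in range(len(rna_b) - k + 1):
        index.setdefault(rna_b[j:j + k], []).append(j)

    hits = []
    for i in range(len(rna_a) - k + 1):
        # A segment of rna_a pairs with the k-mer equal to its reverse complement.
        probe = reverse_complement(rna_a[i:i + k])
        for j in index.get(probe, []):
            hits.append((i, j))
    return hits


# Hypothetical toy sequences sharing one 5-nt complementary segment
hits = complementary_segments("AAGGCUAC", "AAUAGCCAA", k=5)
```

Each hit could then become an edge between the two transcripts in an RNA-RNA interaction network.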
Active module identification
The identification of condition-specific gene sets from transcriptomic experiments has important biological applications, ranging from the discovery of altered pathways between different phenotypes to the selection of disease-related biomarkers. Statistical approaches using only gene expression data rest on the overly simplistic assumption that the genes with the most altered expression are the most important in the process under study. However, a phenotype is rarely a direct consequence of the activity of a single gene; rather, it reflects the interplay of several genes that together perform certain molecular processes. We are working on different approaches to analyze gene activity in the light of our knowledge about their molecular interactions. These include a population-based meta-heuristic built on new crossover and mutation operators (Correa et al. 2019) as well as methods based on network embedding.
Sentiment analysis and multi-domain transfer
Sentiment analysis consists of automatically determining the polarity (positive, negative or neutral) of documents. In this field of research, we particularly study how different polarities, depending on the domain, can be learned for the same concept. The approach we are developing combines a multidimensional graph representing the semantics of terms with a method for propagating polarities using fuzzy logic. Our method shows improved performance over the state of the art, good cross-domain generalization capabilities, and excellent coverage (Pasquier and Robichon 2020).
|Program|ARC foundation grant|
|Grant name|Role of electrical remodeling of pancreatic adenocarcinoma epithelial cells in response to the micro environment|
|Project coordinator|Olivier Soriani|

|Funder|Université Côte d'Azur|
|Grant name|Multi-objective evolutionary algorithms for the identification of master regulators in pancreatic cancer|
|Grant recipient|Leandro Corrêa|
|Project coordinator|Claude Pasquier|
- MiRAI: Prediction of miRNA-disease associations
Publications related to this theme
Correa, L., Pallez, D., Tichit, L., Soriani, O., and Pasquier, C. (2019), “Population-based meta-heuristic for active modules identification,” in Proceedings of the tenth international conference on computational systems-biology and bioinformatics, New York, NY, USA: ACM, pp. 1–8. https://doi.org/10.1145/3365953.3365957.
Pallez, D., Gardès, J., and Pasquier, C. (2017), “Prediction of miRNA-disease Associations using an Evolutionary Tuned Latent Semantic Analysis.” Scientific reports, Nature Publishing Group, 7, 10548. https://doi.org/10.1038/s41598-017-10065-y.
Pasquier, C. (2018), “Contributions à la fouille de données complexes,” Habilitation Thesis, Université Côte d’Azur.
Pasquier, C., Agnel, S., and Robichon, A. (2017), “The Mapping of Predicted Triplex DNA:RNA in the Drosophila Genome Reveals a Prominent Location in Development- and Morphogenesis-Related Genes,” G3: Genes, Genomes, Genetics, Genetics Society of America, 7, 2295–2304. https://doi.org/10.1534/g3.117.042911.
Pasquier, C., Agnel, S., and Robichon, A. (2020), “Transcriptome-wide-scale-predicted dsRNAs potentially involved in RNA homoeostasis are remarkably excluded from genes with no/very low expression in all developmental stages,” RNA Biology, 17, 554–570. https://doi.org/10.1080/15476286.2020.1717154.
Pasquier, C., and Gardès, J. (2016), “Prediction of miRNA-disease associations with a vector space model.” Scientific reports, Nature Publishing Group, 6, 27036. https://doi.org/10.1038/srep27036.
Pasquier, C., and Robichon, A. (2020), “Computational search of hybrid human/SARS-CoV-2 dsRNA reveals unique viral sequences that diverge from other coronavirus strains,” bioRxiv, 1–15. https://doi.org/10.1101/2020.04.08.031856.