Robert Finn (EMBL-EBI, UK)
Title: Understanding community composition of different microbiomes using resolved metagenomics
The field of metagenomics has expanded rapidly over the past decade, in both the number and the diversity of datasets. Furthermore, with the application of deep short-read sequencing or long-read sequencing, the depth of biological insight that can be gleaned from the sequence data has changed dramatically. One area of significant growth has been the assembly of datasets and the subsequent elucidation of genomes, so-called metagenome-assembled genomes (MAGs). In this presentation I will describe our latest efforts in understanding the microbial biodiversity found in different environmental samples. Having these large contiguous units also allows deeper insights into the metabolic functions encoded, and which microbes are producing them. For example, biosynthetic gene clusters (BGCs) comprise the genes necessary to produce natural products such as antimicrobials and signalling molecules, which can play major roles in various ecological processes. Many of these natural products have been exploited for industrial biotechnology or pharmaceutical applications. Thus, accurate identification of BGCs in (meta)genomic data is key to unveiling ecological dynamics and/or discovering new commercially important products. I will therefore also present a new machine-learning-based tool for BGC detection in either genomic or metagenomic assemblies, called SanntiS. Our benchmarks show that SanntiS outperforms other tools in detecting BGCs across different classes and, notably, retains precision in metagenomic datasets. Application to our metagenomic assemblies has revealed millions of potential BGCs, many of which are likely to give rise to new natural products.
Dr Rob Finn heads the Genome Assembly and Annotation Section and is the lead of the Microbiome Informatics team at EMBL’s European Bioinformatics Institute (EMBL-EBI). This team produces MGnify, a world-leading resource for the functional and taxonomic analysis and archiving of microbiome-derived sequence data. In addition to making available large numbers of datasets that have been processed in a systematic way, the resource allows scientists to upload their own data, either privately or publicly, and to assemble and analyse them. The MGnify resource contains one of the largest public collections of assembled metagenomes, which have been used to derive billions of proteins. In the past year, MGnify has also started to produce biome-specific catalogues of metagenome-assembled genomes, derived from the aforementioned assemblies and community contributions. Collectively, these are providing new insights into the microbial diversity found in a range of environments, or associated with different hosts, such as humans. Previously, Rob led a range of different data resources at EMBL-EBI, namely InterPro, Pfam, Rfam and RNAcentral, which all build on his background of using probabilistic models for biological sequence analysis and genomic annotation. Rob joined EMBL-EBI from the Janelia Research Campus in the US, where he led a group that designed fast, web-based, interactive protein-sequence searches and annotations. Between 2001 and 2010, he was the project leader for Pfam at the Wellcome Trust Sanger Institute in the UK. Rob’s academic background is in microbiology and he holds a PhD in biochemistry from Imperial College, London.
João Meidanis (IC-UNICAMP, Brazil)
Title: Distinguishing tumor types using mass spectra in pediatric brain tissue
João Meidanis completed his PhD in Computer Sciences at the University of Wisconsin-Madison in 1992. He has been a faculty member at the University of Campinas since 1986. He received the Science and Technology Medal from the State of São Paulo in 2000 for his achievements in several Brazilian genome projects. He was one of the founders of the Brazilian bioinformatics company Scylla. His interests include computational biology, algorithms, and graph theory.
Martin Morgan (Fred Hutchinson Cancer Research Center, USA)
Title: Bioconductor Symposium
Miguel Rocha (Universidade do Minho, Portugal)
Title: A sweet tale on deep learning applications to focused molecule generation
In this talk, I will describe some recent work from our group on the development of deep learning approaches towards predicting the properties and activity of compounds and proteins, based on different representations and model classes from traditional machine learning and deep learning. I will also address the development of some tools and applications of deep generative models, to create novel compounds with desired activities, and how we use multi-objective Evolutionary Computation to guide the search of these compounds towards different aims. A case study on computationally designing novel sweeteners will be used to illustrate the approach.
Miguel Rocha is an Associate Professor at the University of Minho, where he teaches subjects related to Artificial Intelligence/Machine Learning, Bioinformatics and other Computer Science topics at BSc, MSc and PhD levels, and has been Director of the MSc in Bioinformatics since 2007. He is also a senior researcher at the Centre of Biological Engineering, where he leads the Bioinformatics and Systems Biology research team, an interdisciplinary group with around 20 researchers. He has been the PI of national and international research projects, is the author of over 230 publications and 3 books, has supervised 16 PhD and 80+ MSc students, has coordinated several open-source software projects, and maintains relevant international collaborations with researchers from EMBL-EBI, UCL, Heidelberg Univ., Argonne NL, Leiden UMC, UFSC (Brazil) and U. Cambridge, among others. He is also the founder of the spin-off companies SilicoLife and OmniumAI. In recent years, he has focused his research on two distinct but complementary topics: (i) Systems Biology, where different modeling paradigms have been used to optimize biological processes, mainly metabolic systems, using optimization approaches from Evolutionary Computation, with relevant applications in the design of strain optimization methods for in silico Metabolic Engineering and, more recently, in cancer research; (ii) Machine and deep learning, with the development of algorithms and computational tools that handle different types of input data (including omics data, literature, compounds, and protein sequences) and are applied to distinct biomedical problems; more recently, these include deep generative models applied to the generation of compounds with activities of interest. The main aim of the research group for the forthcoming years is to integrate these two approaches to address relevant biomedical problems.
Helder Takashi Imoto Nakaya (Hospital Israelita Albert Einstein, Brazil)
Network science is an emerging field of research that analyzes complex networks of biological data (and people). In this seminar I will show how this approach can be used for projects related to human health.
Dr Nakaya is an associate professor at the University of São Paulo, in the Department of Clinical Analyses and Toxicology, School of Pharmaceutical Sciences. He has a PhD in Molecular Biology with extensive training in Bioinformatics. He is an expert in Systems Vaccinology, an interdisciplinary field that combines systems-wide measurements, networks, and predictive modeling in the context of vaccines and infectious disease. Dr Nakaya has developed systems biology approaches to understand and predict the mechanisms of vaccine-induced immunity for Yellow Fever, seasonal Influenza, Meningococcal, and Tularemia vaccines. His lab is focused on investigating the basis of infectious diseases using computational systems biology. Additionally, Dr Nakaya is an adjunct professor at Emory University School of Medicine in the Department of Pathology.
Sameer Velankar (EMBL-EBI, UK)
As a founding member of the Worldwide Protein Data Bank (wwPDB), the Protein Data Bank in Europe (PDBe) manages the worldwide biomacromolecular structure archive, the Protein Data Bank (PDB). The wwPDB partners accept and annotate worldwide depositions of biomacromolecular structures determined using X-ray crystallography, Nuclear Magnetic Resonance (NMR) spectroscopy, 3D Electron Microscopy (EM) and other structure-determination methods. PDBe is a founding member of EMDataBank, which manages the deposition and annotation of EM data in EMDB. PDBe aims to ensure that this important resource truly serves the needs of the biomedical community. The PDBe team, led by Dr Sameer Velankar, works to improve the web interface so that users can make the most of existing tools and services. The team designs new tools as needed in order to make structural data available and accessible to all. In the context of the SIFTS project, the team integrates structural data with other biological data to facilitate discovery. These integrated data form the basis for many query interfaces that allow biomacromolecular structure data to be presented in its biological context. Specific focus areas are data integrity, data quality, integration and data dissemination to the non-expert biomedical community.