Distinguished Speaker Series of the Faculty of Exact Sciences

Distinguished Speaker Series: De novo transcriptome reconstruction from long reads

Prof. Paul Medvedev, Associate Professor in the Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology, Director of the Center for Computational Biology and Bioinformatics, Pennsylvania State University

26 June 2019, 12:00

Schreiber 006, Computer Science, TAU

Abstract: Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven central to the study of complex isoform landscapes in many organisms. However, current algorithms for de novo transcript reconstruction from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. A second bottleneck is distinguishing sequencing errors from true variation within these families. I will present two recent methods that address these challenges. The first is IsoCon (Sahlin et al., 2018, Nat Comm), a method for determining the full-length transcripts of multicopy gene families at nucleotide-level precision from PacBio data. I will show how IsoCon was applied to the Y chromosome ampliconic gene families, each of which contains many nearly identical gene copies. The second is isONclust (Sahlin & Medvedev, RECOMB 2019), a clustering algorithm that assigns Nanopore reads to their gene family of origin.

Host: Prof. Ron Shamir, School of Computer Science, TAU.