Distinguished Speaker Series: De novo transcriptome reconstruction from long reads
Abstract: Long-read sequencing of transcripts with PacBio Iso-Seq and Oxford Nanopore Technologies has proven central to the study of complex isoform landscapes in many organisms. However, current algorithms for de novo transcript reconstruction from long-read data are limited, leaving the potential of these technologies unfulfilled. One common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. A second bottleneck is distinguishing sequencing errors from true variation within these families. I will present two recent methods that address these challenges. The first is IsoCon (Sahlin et al., 2018, Nat Comm), a method for determining the full-length transcripts of multicopy gene families at nucleotide-level precision from PacBio data. I will show how IsoCon was applied to Y chromosome ampliconic gene families, each of which contains many nearly identical gene copies. The second is isONclust (Sahlin & Medvedev, RECOMB 2019), a clustering algorithm that assigns Nanopore reads to their gene family of origin.
Host: Prof. Ron Shamir, Computer Science School, TAU.