Congratulations to our CSI PhD student Tham Cheng Yong from Dr Touati Benoukraf’s lab!
Cheng Yong’s abstract titled ‘Characterizing Structural Variants in Acute Myeloid Leukemia using Long and Short Read Sequencing’ was selected for presentation at the 12th Great Lakes Bioinformatics Conference (GLBIO). He will be heading to the University of Illinois at Chicago, USA in May 2017.
The conference is organized by the Great Lakes Bioinformatics Consortium and is an official conference of the International Society for Computational Biology. A key goal of the two-day GLBIO conference is to foster long-term collaborations and networking opportunities among researchers working on computational approaches to biology.
Characterizing Structural Variants in Acute Myeloid Leukemia using Long and Short Read Sequencing
Cheng Yong THAM¹, Touati BENOUKRAF¹
¹ Cancer Science Institute of Singapore, National University of Singapore, Singapore
Current next-generation sequencing (NGS) techniques face multiple challenges in accurately detecting genomic structural variants (SVs) because of their short read lengths. Here, we present recent findings on SV detection in primary acute myeloid leukemia (AML) using a combination of long-read sequencing from Oxford Nanopore Technologies (ONT) and short-read whole-genome sequencing (WGS) from Illumina.
Long reads produced by the ONT MinION were mapped to the GRCh38 reference genome using BWA-MEM, and chimeric reads were selected. The alignments of these chimeric reads were refined by local BLASTn against the GRCh38 reference genome, and alignments to telomeric, centromeric and constitutive heterochromatin regions were masked. Next, the chimeric reads were filtered on their alignment results using a Python script to remove most false positives (e.g. reads containing repetitive elements that align to multiple genomic locations). Finally, the Illumina WGS short reads were mapped to the filtered chimeric reads using Bowtie2 to validate the SVs in those reads.
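The filtering step can be illustrated with a minimal Python sketch. The abstract does not describe the actual script's criteria, so everything here is an assumption made for illustration: the `Segment` record, the `filter_chimeric` function, the `min_len` threshold, and the rule that a read is kept only when it splits across two different targets and at least one segment is uniquely placed in the genome.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a long read (hypothetical record layout)."""
    read_id: str
    read_start: int  # 1-based, inclusive position on the read
    read_end: int    # 1-based, inclusive position on the read
    target: str      # reference sequence or repeat family that was hit
    n_hits: int      # number of genomic locations with an equally good hit


def filter_chimeric(segments, min_len=100):
    """Return {read_id: segments} for reads that pass the assumed filter."""
    by_read = defaultdict(list)
    for seg in segments:
        # Ignore very short alignments (assumed threshold).
        if seg.read_end - seg.read_start + 1 >= min_len:
            by_read[seg.read_id].append(seg)

    kept = {}
    for read_id, segs in by_read.items():
        segs.sort(key=lambda s: s.read_start)
        # Chimeric: adjacent, non-overlapping read segments hit different targets.
        split = any(
            a.read_end < b.read_start and a.target != b.target
            for a, b in zip(segs, segs[1:])
        )
        # Anchored: at least one segment maps to a single genomic location.
        anchored = any(s.n_hits == 1 for s in segs)
        if split and anchored:
            kept[read_id] = segs
    return kept


# Toy input (invented values, not data from the study).
toy = [
    Segment("r1", 1, 3000, "chr7", 1),
    Segment("r1", 3010, 6000, "L1HS", 50),
    Segment("r2", 1, 2000, "AluY", 40),
    Segment("r2", 2010, 4000, "L1PA2", 30),
    Segment("r3", 1, 5000, "chr1", 1),
]
print(sorted(filter_chimeric(toy)))
```

In this toy input only `r1` survives: `r2` has no uniquely placed segment and `r3` is not split. Reads that pass such a filter would then serve as the reference sequences for the short-read mapping step.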
Preliminary results from ONT MinION sequencing of a primary AML sample revealed several novel SVs, one of which was an insertion of a LINE-1 (long interspersed nuclear element 1) retrotransposon within the gene NRCAM (neuronal cell adhesion molecule) at chromosome 7q31.1. This event was captured on a single read 6,803 nucleotides long, in which positions 3 to 2,961 align to NRCAM intron 20 and positions 2,972 to 6,783 align to LINE elements (L1HS, L1PA2 and L1PA3). When short reads from Illumina WGS were mapped to this long read, the breakpoint between NRCAM and the LINE was captured with good coverage, which validated the chimeric read sequence and increased confidence in the SV.
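The read coordinates reported above can be checked with a few lines of Python. This sketch assumes the positions are 1-based and inclusive, which the abstract does not state; the helper names are hypothetical.

```python
READ_LENGTH = 6803
NRCAM = (3, 2961)    # segment aligned to NRCAM intron 20
LINE = (2972, 6783)  # segment aligned to LINE elements


def span_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def gap_between(left, right):
    """Number of unaligned bases between two non-overlapping intervals."""
    return right[0] - left[1] - 1


nrcam_len = span_length(*NRCAM)
line_len = span_length(*LINE)
junction_gap = gap_between(NRCAM, LINE)
unaligned = READ_LENGTH - nrcam_len - line_len

print(nrcam_len, line_len, junction_gap, unaligned)  # 2959 3812 10 32
```

Under that assumption, the two segments account for all but 32 nucleotides of the read, and the junction between them spans a 10-nucleotide unaligned gap, which is the region the short reads would need to cover to confirm the breakpoint.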
These results suggest great potential in combining long- and short-read sequencing technologies for validated SV characterization, especially for large SVs that contain repetitive or mobile elements. To characterize the SVs in the AML sample comprehensively, ONT MinION sequencing output is currently being scaled up to produce more long reads and increase SV coverage.