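Presentation information
[BO-1] Detailed sequence analysis of repetitive sequences by targeted local assembly using Nanopore data
○池本 滉, 藤本 明洋 (Department of Human Genetics, School of International Health, Graduate School of Medicine, The University of Tokyo)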
Long reads can compensate for the shortcomings of short reads. However, long reads are difficult to handle because of their substantial error rates and because analyses depend on the accuracy of read alignment. These hurdles, which arise from both the sequencing and the bioinformatic processes, can potentially be overcome by a new analysis tool. Here, we developed LoMA, an analysis method for long reads. LoMA overlaps and aligns the reads from a region of interest and constructs an error-corrected consensus sequence. It detects partial haplotype structures and, on the basis of structural variants in the target region, produces two different sequences. Error rates dropped from 8.7% to 0.6%, and heterozygous loci classified by LoMA were experimentally validated, suggesting that LoMA is sufficiently accurate.
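The idea of building an error-corrected consensus from aligned reads can be illustrated with a toy example. The sketch below is not LoMA's algorithm; it assumes the reads have already been placed in a gapped multiple alignment of equal width (with `-` for gaps) and simply takes a per-column majority vote. The function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority-vote consensus over pre-aligned reads (toy illustration).

    Assumption: every read is a string of equal length taken from a gapped
    multiple alignment, with '-' marking a gap. This is NOT LoMA's method.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(read) != width for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    bases = []
    for column in zip(*aligned_reads):
        # Pick the most frequent symbol in this alignment column.
        symbol, _count = Counter(column).most_common(1)[0]
        # A column where the gap wins contributes nothing to the consensus.
        if symbol != "-":
            bases.append(symbol)
    return "".join(bases)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACGTTA", "AC-T-A", "ACGT-A", "TCGT-A"]
    print(consensus(reads))
```

Because each column is decided independently, isolated errors in individual reads are outvoted as long as most reads agree at that position, which is the intuition behind the drop in error rate reported above.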
To identify the true structures of human genomes, we analyzed insertions using Nanopore sequencing data from NA18943 and NA19240. We first defined target regions based on indels and clips in reads and then detected 5,516 and 6,542 insertions (≥100 bp) in NA18943 and NA19240, respectively. Tandem repeats and transposable elements accounted for approximately 83-84% of these insertions. Additionally, our analysis identified various types of insertions: dispersed duplications, processed pseudogenes, and alternative sequences. In-depth analysis showed a repeat element bias in tandem repeats and an enrichment of short tandem duplications in transposons and genes. This study indicates that LoMA is applicable to human genetics studies and can reveal repetitive and complexly structured regions in the human genome.
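Defining target regions from indels and clips in reads can be sketched as follows. This is a minimal illustration, not the procedure used in the study: it assumes each read is described by a reference start position and a SAM-style CIGAR string, flags insertions, deletions, and soft clips of at least `min_len` bases, and merges nearby signals into regions. The function names, the reuse of the 100 bp threshold for deletions and clips, and the 500 bp merge window are all assumptions made for the example.

```python
import re

# SAM CIGAR operations: a length followed by a one-letter operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def candidate_signals(ref_start, cigar, min_len=100):
    """Return (ref_pos, kind, length) for large indels and soft clips.

    min_len=100 mirrors the >=100 bp insertion cutoff in the abstract;
    applying it to deletions and clips is an assumption.
    """
    kinds = {"I": "insertion", "D": "deletion", "S": "clip"}
    signals = []
    pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in kinds and length >= min_len:
            signals.append((pos, kinds[op], length))
        # Only these operations consume reference bases.
        if op in "MDN=X":
            pos += length
    return signals


def merge_regions(positions, window=500):
    """Merge sorted signal positions lying within `window` bp of each other."""
    regions = []
    for p in sorted(positions):
        if regions and p - regions[-1][1] <= window:
            regions[-1][1] = p  # extend the current region
        else:
            regions.append([p, p])  # start a new region
    return [(start, end) for start, end in regions]


if __name__ == "__main__":
    signals = candidate_signals(1000, "500M150I300M20D200M120S")
    print(signals)
    print(merge_regions([pos for pos, _kind, _length in signals] + [9000]))
```

In practice the read positions and CIGAR strings would come from an alignment file, and signals from many reads would be pooled before merging so that a region is supported by more than one read.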