Automated Telomere-To-Telomere crop genome assembly using only Oxford Nanopore sequencing presented at international conferences

March 21, 2023

Bandage graph from tomato Heinz 1706 Verkko assembly showing the twelve T2T chromosomes

Using Oxford Nanopore stereo duplex sequencing, KeyGene, together with the Telomere-to-Telomere consortium, has constructed reference genomes of tomato and maize of unprecedented quality. Generating all data on a single platform that produces both highly accurate and ultra-long sequence reads makes high-quality crop genome assemblies accessible to the wider research community.

Results of this collaboration will be presented by Sergey Koren at the AGBT AG meeting at the end of March in Texas, USA, and the Oxford Nanopore Technologies London Calling conference in May.

Alexander Wittenberg, Genomics Scientist at KeyGene, will present this work at an upcoming Oxford Nanopore Knowledge Exchange webinar on April 20, 2023.

In May 2012, an international consortium published the tomato genome of accession Heinz 1706 in the journal Nature, after nearly a decade of work. Since then, the genome has undergone five updates (SL5, latest version), with each version more complete and accurate than the previous one.

Limitations in T2T assemblies

Recently, the combination of ultra-long Oxford Nanopore sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of the human genome and has spurred similar efforts for other species. However, the current recipe for “telomere-to-telomere” genome assembly relies on sequencing data from two different instruments, limiting its adoption.

To test the ability of a single platform to produce complete genome assemblies, we generated Oxford Nanopore sequencing data for two important crop species, Zea mays B73 and Solanum lycopersicum Heinz 1706. For both species, KeyGene generated two types of reads: duplex sequencing reads, where each molecule is read twice, once from each strand, to obtain very high-quality reads; and simplex sequencing reads, where molecules are read from a single strand, with a focus on read length over quality. In addition, for tomato, ONT ultra-long (>100 kb) data was used to close the last remaining gaps in the assembly. HiFi datasets were used to validate the telomere-to-telomere assembly.

Analyzing KeyGene’s ONT data only

The data was analyzed together with scientists from the National Human Genome Research Institute, the Chinese Academy of Agricultural Sciences, Johns Hopkins University and Cold Spring Harbor Laboratory. We compared sequencing coverage and assemblies between HiFi data and ONT duplex/simplex sequencing reads across hard-to-sequence regions to better characterize sequence context dependencies in both technologies. Finally, we assembled the combined Solanum lycopersicum Heinz 1706 data with the recently published Verkko genome assembler, followed by manual analysis. The result is a complete, telomere-to-telomere genome assembly with only one gap remaining, corresponding to the rDNA.

Duplex data quality is similar to that of PacBio HiFi, but the duplex reads are tens of kilobases longer. This allows a very high-quality initial assembly graph to be constructed from the duplex data, which is then further resolved using the ultra-long simplex reads. The final assembly has a base accuracy exceeding 99.999% (Q50).
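For readers less familiar with quality values: the "Q50" figure is on the standard Phred scale, where Q = -10 * log10(error probability). The short Python sketch below shows how Q50 and 99.999% accuracy relate. The function names are illustrative only and are not part of any tool used in this work.

```python
import math


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred quality value to per-base accuracy (0..1)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def accuracy_to_phred(accuracy: float) -> float:
    """Convert per-base accuracy (0..1) to a Phred quality value."""
    error = 1.0 - accuracy
    if error <= 0.0:
        raise ValueError("accuracy must be below 1.0 for a finite Phred value")
    return -10.0 * math.log10(error)


def expected_errors_per_megabase(q: float) -> float:
    """Expected number of erroneous bases per one million bases at quality q."""
    return 1_000_000 * 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    print(f"Q50 accuracy: {phred_to_accuracy(50) * 100:.3f}%")
    print(f"Expected errors per Mb at Q50: {expected_errors_per_megabase(50):.1f}")
```

On this scale, every 10 points is a tenfold reduction in error rate, so Q50 corresponds to roughly one error per 100,000 bases.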

One sequencing technology

We conclude that Oxford Nanopore duplex sequencing reads are a viable substitute for PacBio HiFi reads, and, in combination with simplex sequencing, have the potential to provide a single-instrument solution for complete genome assembly. These findings have important implications for the field of genomics, as they demonstrate that complete genome assembly can be achieved using a single sequencing platform.

Are you interested in telomere-to-telomere crop genomes? Please contact us by mail.

Oxford Nanopore Knowledge Exchange webinar

Registration for the Oxford Nanopore Knowledge Exchange webinar on April 20, 2023 is open.