Exploring neighborhoods in large metagenome assembly graphs reveals hidden sequence diversity
Published in Genome Biology, 2020
Abstract
Genomes computationally inferred from large metagenomic data sets are often
incomplete and may be missing functionally important content and strain
variation. We introduce an information retrieval system for large metagenomic
data sets that exploits the sparsity of DNA assembly graphs to efficiently
extract subgraphs surrounding an inferred genome. We apply this system to
recover missing content from genome bins and show that substantial genomic
sequence variation is present in a real metagenome. Our software implementation,
spacegraphcats, is available at github.com/spacegraphcats under the 3-Clause BSD
License.