Uneven sequencing (coverage) depth can bias microbial intraspecies diversity estimates and how to account for it
Publisher
Oxford University Press
Abstract
An unbiased and accurate estimation of intraspecies diversity, i.e. the extent of genetic diversity within species (or microdiversity), is crucial for clinical and environmental microbiome studies. Although it is well appreciated that sequencing depth (or coverage depth) below 10X can provide biased estimates of microdiversity, typically underestimating diversity due to the random sampling of alleles, there is a widely accepted convention that microdiversity estimates tend to be relatively stable at sequencing depths exceeding 10X. Therefore, discarding species with <10X sequencing depth or rarefying to 10-20X sequencing depth is generally used to compare microdiversity among taxa and samples. Our findings showed that these biases may persist even at depth levels above 50-200X for all popular sequencing platforms, including Illumina, PacBio, and Oxford Nanopore. The biases mostly, but not always, represent an underestimation of diversity and were attributable to the incomplete recovery of Single Nucleotide Variants (SNVs) at lower sequencing depth levels. To address this issue, we recommend using rarefaction-based approaches to standardize data to at least 50X, and ideally to 200X sequencing depth, which reduces differences between observed and expected microdiversity values to <0.5%. Furthermore, the Average Nucleotide Identity of reads (ANIr) metric is significantly less sensitive to sequencing depth variability than nucleotide diversity (π), making it a robust alternative for estimating microdiversity at sequencing depths close to or exceeding 10X, without the need to rarefy data. Therefore, the sequencing depth thresholds proposed herein provide a more standardized framework for direct comparisons of microdiversity across samples and studies.
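The effect described in the abstract, incomplete SNV recovery at lower depth, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (their published code is in R); the function names, the single-site allele counts, and the use of a standard unbiased per-site heterozygosity estimator for π are all assumptions made for illustration. Real variant callers also apply minimum-read thresholds that this sketch ignores.

```python
import random
from collections import Counter


def site_pi(counts):
    """Per-site nucleotide diversity from allele counts.

    Assumed estimator: probability that two reads drawn without
    replacement carry different alleles.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))


def rarefy_site(counts, depth, rng):
    """Subsample allele observations without replacement to `depth`.

    Returns None when the site has fewer than `depth` observations,
    mirroring the practice of discarding under-covered data.
    """
    pool = [allele for allele, c in counts.items() for _ in range(c)]
    if len(pool) < depth:
        return None
    return Counter(rng.sample(pool, depth))


def snv_recovery(counts, depth, trials, rng):
    """Fraction of subsamples in which the site is still polymorphic."""
    hits = 0
    for _ in range(trials):
        sub = rarefy_site(counts, depth, rng)
        if sub is not None and len(sub) >= 2:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical site: 200 reads, minor allele at 2% frequency.
    site = {"A": 196, "G": 4}
    print("pi at full depth:", site_pi(site))
    for depth in (10, 20, 50, 200):
        frac = snv_recovery(site, depth, 2000, rng)
        print(f"{depth}X: SNV recovered in {frac:.1%} of subsamples")
```

In this toy setting a low-frequency SNV is often lost entirely when the site is rarefied to 10X, which pushes π toward zero, whereas rarefying to higher depth retains it far more often.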
Description
DATA AVAILABILITY: The datasets analyzed during the current study are available in the European Nucleotide Archive (ENA) repository, at https://www.ebi.ac.uk/ena/browser/home under BioProject accession numbers PRJEB75750, PRJEB52999, and PRJNA763692. Custom R code developed in this study for the estimation of average error (%) is available at https://github.com/ebustos128/Uneven-sequencing-can-bias-estimates-of-microbial-intraspecies-diversity.
Keywords
Sequencing depth, Nucleotide diversity, ANIr, Metagenomics, Bias
Sustainable Development Goals
SDG-15: Life on land
Citation
Bustos-Caparros, E., Viver, T., Gago, J.F. et al. 2025, 'Uneven sequencing (coverage) depth can bias microbial intraspecies diversity estimates and how to account for it', ISME Communications, vol. 5, no. 1, art. ycaf228, pp. 1-11. https://doi.org/10.1093/ismeco/ycaf228.
