Uneven sequencing (coverage) depth can bias microbial intraspecies diversity estimates and how to account for it

dc.contributor.author: Bustos-Caparros, Esteban
dc.contributor.author: Viver, Tomeu
dc.contributor.author: Gago, Juan F.
dc.contributor.author: Venter, S.N. (Stephanus Nicolaas)
dc.contributor.author: Bosch, Rafael
dc.contributor.author: Konstantinidis, Konstantinos T.
dc.contributor.author: Rodriguez-R, Luis M.
dc.contributor.author: Rossello-Mora, Ramon
dc.date.accessioned: 2026-04-07T13:35:45Z
dc.date.available: 2026-04-07T13:35:45Z
dc.date.issued: 2025-01
dc.description: DATA AVAILABILITY: The datasets analyzed during the current study are available in the European Nucleotide Archive (ENA) repository at https://www.ebi.ac.uk/ena/browser/home under BioProject accession numbers PRJEB75750, PRJEB52999, and PRJNA763692. Custom R code developed in this study for the estimation of average error (%) is available at https://github.com/ebustos128/Uneven-sequencing-can-bias-estimates-of-microbial-intraspecies-diversity.
dc.description.abstract: An unbiased and accurate estimation of intraspecies diversity, i.e. the extent of genetic diversity within species (or microdiversity), is crucial for clinical and environmental microbiome studies. Although it is well appreciated that sequencing depth (or coverage depth) below 10X can provide biased estimates of microdiversity, typically underestimating diversity due to the random sampling of alleles, there is a widely accepted convention that microdiversity estimates tend to be relatively stable at sequencing depths exceeding 10X. Therefore, discarding species with <10X depth or rarefying to 10-20X sequencing depth are common practices when comparing microdiversity among taxa and samples. Our findings showed that these biases may persist even at depth levels above 50-200X for all popular sequencing platforms, including Illumina, PacBio, and Oxford Nanopore. The biases mostly, but not always, represent an underestimation of diversity and were attributable to the incomplete recovery of Single Nucleotide Variants (SNVs) at lower sequencing depth levels. To address this issue, we recommend using rarefaction-based approaches to standardize data to at least 50X, and ideally to 200X sequencing depth, which reduces differences between observed and expected microdiversity values to <0.5%. Furthermore, the Average Nucleotide Identity of reads (ANIr) metric is significantly less sensitive to sequencing depth variability than nucleotide diversity (π), making it a robust alternative for estimating microdiversity at sequencing depths close to or exceeding 10X, without the need to rarefy data. Taken together, the sequencing depth thresholds proposed herein provide a more standardized framework for direct comparisons of microdiversity across samples and studies.
dc.description.department: Biochemistry, Genetics and Microbiology (BGM)
dc.description.department: Forestry and Agricultural Biotechnology Institute (FABI)
dc.description.librarian: am2026
dc.description.sdg: SDG-15: Life on land
dc.description.sponsorship: Funded by the Spanish Ministry of Science, Innovation and Universities projects; supported with European Regional Development Funds; a pre-doctoral contract from the Spanish Government Ministry for Science and Innovation; the financial support of the Research and Training Grants from the Federation of European Microbiological Societies (FEMS) for a 3-month stay in DiSC of University of Innsbruck, Austria.
dc.description.uri: https://academic.oup.com/ismecommun
dc.identifier.citation: Bustos-Caparros, E., Viver, T., Gago, J.F. et al. 2025, 'Uneven sequencing (coverage) depth can bias microbial intraspecies diversity estimates and how to account for it', ISME Communications, vol. 5, no. 1, art. ycaf228, pp. 1-11. https://doi.org/10.1093/ismeco/ycaf228.
dc.identifier.issn: 2730-6151 (online)
dc.identifier.other: 10.1093/ismeco/ycaf228
dc.identifier.uri: http://hdl.handle.net/2263/109445
dc.language.iso: en
dc.publisher: Oxford University Press
dc.rights: © The Author(s) 2024. This article is licensed under a Creative Commons Attribution 4.0 International License.
dc.subject: Sequencing depth
dc.subject: Nucleotide diversity
dc.subject: ANIr
dc.subject: Metagenomics
dc.subject: Bias
dc.title: Uneven sequencing (coverage) depth can bias microbial intraspecies diversity estimates and how to account for it
dc.type: Article
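The abstract attributes the depth bias mainly to SNVs that are not recovered at lower depth, and recommends rarefying to a common depth before comparing nucleotide diversity (π). The Python sketch below illustrates that mechanism on a synthetic pileup; it is not the authors' pipeline. Everything in it is an assumption made for illustration: the hypothetical helper names (`site_pi`, `genome_pi`, `rarefy`, `percent_error`), the toy data (a 10 kb genome carrying 200 SNVs at 5% frequency, sequenced at 400X), the rule that an SNV only counts when its minor allele has at least two supporting reads, per-site rather than per-read subsampling, and an error defined as |observed - expected| / expected × 100, which may differ from the average error (%) computed by the study's R code.

```python
import random
from collections import Counter

def site_pi(counts, min_alt=2):
    """Per-site nucleotide diversity: fraction of read pairs carrying different
    bases, n/(n-1) * (1 - sum p_i^2). Hypothetical calling rule: the site only
    counts as an SNV if its second most common base has >= min_alt reads."""
    n = sum(counts.values())
    ranked = counts.most_common()
    if n < 2 or len(ranked) < 2 or ranked[1][1] < min_alt:
        return 0.0
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (n * (n - 1))

def genome_pi(variable_sites, genome_length, min_alt=2):
    """Mean per-site pi over the whole genome; invariant sites contribute 0."""
    return sum(site_pi(c, min_alt) for c in variable_sites) / genome_length

def rarefy(variable_sites, depth, rng):
    """Toy rarefaction: draw `depth` bases without replacement at each site.
    (Real pipelines subsample whole reads, which also preserves linkage.)"""
    return [Counter(rng.sample(list(c.elements()), depth)) for c in variable_sites]

def percent_error(observed, expected):
    """Assumed error metric: absolute relative difference, in percent."""
    return abs(observed - expected) / expected * 100.0

# Synthetic population: 10 kb genome, 200 SNVs at 5% frequency, 400X depth.
GENOME_LENGTH = 10_000
sites = [Counter({"A": 380, "T": 20}) for _ in range(200)]
rng = random.Random(42)

expected = genome_pi(sites, GENOME_LENGTH)  # full-depth value used as reference
errors = {}
for depth in (10, 50, 200):
    observed = genome_pi(rarefy(sites, depth, rng), GENOME_LENGTH)
    errors[depth] = percent_error(observed, expected)
    print(f"{depth:>4}X  pi={observed:.6f}  error={errors[depth]:.1f}%")
```

Without a calling threshold (`min_alt=1`), this pairwise estimator is unbiased in expectation under random subsampling, because a random pair of bases from a random subsample is also a random pair from the full pileup. In this toy model, the downward bias at low depth therefore comes from SNV sites whose minor allele falls below the threshold.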
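To see why low-frequency SNVs can still be missed well above 10X, one can compute the chance that an allele at frequency f appears in at least a minimum number of reads at depth d, treating reads as independent draws (a binomial model). This is a back-of-the-envelope sketch: `detection_probability` and its two-read threshold (`min_alt=2`) are hypothetical choices, not parameters taken from the study.

```python
from math import comb

def detection_probability(depth, allele_freq, min_alt=2):
    """Probability that at least `min_alt` of `depth` reads carry an allele at
    frequency `allele_freq`, modelling reads as independent Binomial draws."""
    p_below = sum(
        comb(depth, k) * allele_freq**k * (1 - allele_freq) ** (depth - k)
        for k in range(min_alt)
    )
    return 1.0 - p_below

# Detection probability for a few depths and allele frequencies.
for depth in (10, 20, 50, 100, 200):
    row = "  ".join(
        f"f={f:.2f}: {detection_probability(depth, f):.3f}" for f in (0.01, 0.05, 0.10)
    )
    print(f"{depth:>4}X  {row}")
```

Under this model an allele at 5% frequency is detected only about 9% of the time at 10X, and an allele at 1% frequency still has only about a 60% chance of detection at 200X. This is consistent in direction with the abstract's observation that biases can persist above 50-200X.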
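The abstract reports that ANIr is much less sensitive to sequencing depth than π. One possible intuition, offered here as an assumption rather than the authors' argument: if ANIr is treated as a mean of per-read identities against the reference, randomly discarding reads changes only the variance of that mean, not its expected value, and no individual allele has to reach a detection threshold. The sketch below uses a plain mean over simulated 150 bp reads at 2% divergence. The published ANIr implementation may filter or weight alignments differently, and all names here (`read_identity`, `anir`, `simulate_reads`) are hypothetical.

```python
import random
from statistics import mean

READ_LENGTH = 150

def read_identity(mismatches, length=READ_LENGTH):
    """Percent identity of a single read alignment."""
    return 100.0 * (length - mismatches) / length

def anir(identities):
    """Simplified ANIr: plain mean of per-read identities (%)."""
    return mean(identities)

def simulate_reads(n_reads, divergence, rng):
    """Hypothetical reads: each base mismatches the reference with prob. `divergence`."""
    return [
        read_identity(sum(rng.random() < divergence for _ in range(READ_LENGTH)))
        for _ in range(n_reads)
    ]

rng = random.Random(7)
reads = simulate_reads(5_000, divergence=0.02, rng=rng)
full = anir(reads)
subsampled = anir(rng.sample(reads, 250))  # keep 5% of reads, a 20-fold lower depth
print(f"ANIr full={full:.2f}%  subsampled={subsampled:.2f}%")
```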

Files

Original bundle

Name: BustosCaparros_Uneven_2025.pdf
Size: 2.45 MB
Format: Adobe Portable Document Format
Description: Article

Name: BustosCaparros_UnevenSuppl1_2025.docx
Size: 1.76 MB
Format: Microsoft Word XML
Description: Supplementary Material 1

Name: BustosCaparros_UnevenSuppl2_2025.xlsx
Size: 24.02 KB
Format: Microsoft Excel XML
Description: Supplementary Material 2

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission