Show Notes
Episode 63: Sampling Scale Matters - A Spatial Model for Rare Variant Discovery in Biobank-Scale Datasets
In this episode of Base by Base, we explore a methodological advance by Steiner et al. (2025), published in PNAS. The paper introduces a spatial branching process framework for studying how the geographic breadth of sampling shapes the discovery and allele frequency spectrum of rare deleterious variants in large sequencing cohorts.
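To build intuition for why sampling breadth matters, here is a deliberately simplified toy. It is not the authors' spatial branching process. It assumes each rare variant is a hypothetical "family" of carriers scattered with Gaussian dispersal (scale `sigma`) around a random origin on a line. It also assumes a uniform population `density` and a fixed number `n` of individuals drawn uniformly from a sampling window of a given `width`. The names `place_families` and `expected_discovery` and all parameter values are invented for illustration. Carriers cluster locally, so each variant is rare overall but can be locally common. The sketch computes expected values under binomial sampling to show how widening the window at a fixed sample size changes what is observed.

```python
import random


def place_families(num_families, carriers_per_family, length, sigma, seed=1):
    """Hypothetical toy: each rare variant is a 'family' of carriers scattered
    around a uniformly random origin on [0, length) with Gaussian dispersal of
    scale sigma. Returns one list of carrier positions per variant."""
    rng = random.Random(seed)
    families = []
    for _ in range(num_families):
        origin = rng.uniform(0.0, length)
        families.append([rng.gauss(origin, sigma) for _ in range(carriers_per_family)])
    return families


def expected_discovery(families, n, center, width, density):
    """Expected number of distinct variants observed, and their rough mean
    carrier frequency, when n individuals are drawn uniformly (with
    replacement, i.e. binomially) from [center - width/2, center + width/2)
    in a population with `density` individuals per unit length."""
    lo, hi = center - width / 2.0, center + width / 2.0
    people_in_window = density * width
    distinct = 0.0        # sum over variants of P(at least one carrier drawn)
    carrier_draws = 0.0   # sum over variants of expected carriers drawn
    for positions in families:
        carriers = sum(lo <= x < hi for x in positions)
        q = min(carriers / people_in_window, 1.0)  # P(one draw is a carrier)
        distinct += 1.0 - (1.0 - q) ** n
        carrier_draws += n * q
    # Ratio of expectations, used as a rough mean carrier count per observed variant.
    mean_freq = carrier_draws / distinct / n if distinct > 0 else 0.0
    return distinct, mean_freq


if __name__ == "__main__":
    fams = place_families(2000, 20, length=1000.0, sigma=2.0)
    for w in (10.0, 100.0, 1000.0):
        d, f = expected_discovery(fams, n=100, center=500.0, width=w, density=10.0)
        print(f"width={w:6.0f}  distinct={d:7.1f}  mean carrier freq={f:.4f}")
```

With these made-up parameters, widening the window from 10 to 1000 units at n = 100 raises the expected number of distinct variants and lowers their mean carrier frequency. In this toy, the expected total number of carrier draws stays roughly constant across widths, near n·20·2000/(10·1000). This is because both the carriers and the people inside the window scale with its width.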
Study highlights: The authors show analytically that as sampling breadth increases, more distinct deleterious variants are uncovered while their observed frequencies decrease, a dual effect they term “discovery and dilution.” They derive effective mutation supply and selection intensity parameters that depend on the ratio of the sampling kernel width to the allelic dispersal scale, which yields a closed-form expression for the sample site frequency spectrum. Spatial branching process simulations and forward-time SLiM simulations both confirm the accuracy of these theoretical predictions. Finally, in silico resampling of UK Biobank exome data on chromosome 1 shows that broader sampling substantially raises the number of variants and singletons per kilobase and lowers heterozygosity at variant sites, yet leaves genome-wide average heterozygosity unchanged.
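The UK Biobank result can seem paradoxical: heterozygosity at variant sites falls while genome-wide heterozygosity holds steady. A minimal sketch of the summaries named above shows how both can happen at once. It rests on our own assumptions, not the authors' pipeline. Per-site heterozygosity is taken as 2p(1-p) from the sample frequency p with no small-sample correction. Singletons are derived-allele singletons only. Genome-wide heterozygosity averages over every sequenced base pair. The helper `summarize_sample` and the allele counts are hypothetical, not UK Biobank data.

```python
def summarize_sample(derived_counts, n_chrom, region_bp):
    """Per-region summaries for one sample (hypothetical helper).

    derived_counts: derived-allele count in the sample at each candidate site;
        0 or n_chrom means the site is not variable in this sample.
    n_chrom: number of sampled chromosomes (2 per diploid individual).
    region_bp: total sequenced length, monomorphic sites included.
    """
    variable = [c for c in derived_counts if 0 < c < n_chrom]
    # Assumption: heterozygosity per site is 2p(1-p) with p the sample
    # frequency, without a small-sample correction.
    het = [2.0 * (c / n_chrom) * (1.0 - c / n_chrom) for c in variable]
    kb = region_bp / 1000.0
    return {
        "variants_per_kb": len(variable) / kb,
        # Derived-allele singletons only (count == 1).
        "singletons_per_kb": sum(1 for c in variable if c == 1) / kb,
        "het_at_variant_sites": sum(het) / len(het) if het else 0.0,
        "het_genome_wide": sum(het) / region_bp,
    }


# Made-up allele counts for 10 chromosomes over a 1,000 bp region.
narrow = summarize_sample([5, 3, 1], n_chrom=10, region_bp=1000)
broad = summarize_sample([3, 1, 1, 1, 1], n_chrom=10, region_bp=1000)
print(narrow)
print(broad)
```

In these hypothetical counts, the broader sample adds singletons. The per-variant-site average therefore drops from about 0.37 to about 0.23, while the per-base-pair value barely moves (0.00110 versus 0.00114). Each singleton among 10 chromosomes adds a full site to the denominator but contributes only 2(0.1)(0.9) = 0.18 to the sum.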
Conclusion: Steiner et al.’s work underscores the critical impact of geographic sampling design on rare variant discovery and frequency estimates in biobank-scale studies, providing a quantitative foundation for optimizing sampling strategies and interpreting genetic diversity metrics in both population and biomedical genetics.
Reference:
Steiner, M. C., Rice, D. P., Biddanda, A., Ianni-Ravna, M. K., Porras, C., & Novembre, J. (2025). Study design and the sampling of deleterious rare variants in biobank-scale datasets. Proceedings of the National Academy of Sciences, 122(23), e2425196122. https://doi.org/10.1073/pnas.2425196122
License:
This episode is based on an open access article distributed under the Creative Commons Attribution License 4.0 International (CC BY 4.0) — https://creativecommons.org/licenses/by/4.0/