Show Notes
Steiner MC et al., PNAS - This episode examines a theoretical and empirical study showing how the geographic breadth of sampling affects discovery and observed frequencies of deleterious rare variants. The authors develop a spatial stochastic model, validate it with simulations, and test predictions using UK Biobank exome resampling. Key terms: sampling breadth, rare variants, negative selection, site frequency spectrum, UK Biobank.
Study Highlights:
The authors build a stochastic spatial model that combines dispersal, genetic drift, selection, mutation, and geographically concentrated sampling to derive expected sample site frequency spectra for deleterious variants. They find that broader sampling increases the number of distinct variants discovered ("discovery") while reducing the average observed frequency per variant ("dilution"). The magnitude of these effects scales with the sampling breadth relative to the allele spread length (ℓ_c), sample size, and selection strength. Theoretical predictions are validated with branching-process and SLiM simulations and by in silico resampling of UK Biobank exomes.
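The discovery and dilution effects can be illustrated with a toy simulation. This is a minimal sketch and not the authors' model: it assumes each variant exists only in a single "home" deme at a fixed local frequency, and it ignores dispersal, drift, and selection dynamics entirely. All names and parameter values (`sample_counts`, `summarize`, `n_demes`, `variants_per_deme`, `local_freq`) are hypothetical and chosen only for illustration.

```python
import random


def sample_counts(n_samples, breadth, n_demes=50, variants_per_deme=100,
                  local_freq=0.002, seed=0):
    """Return per-variant allele counts for a sample spread over `breadth` demes.

    Assumption (illustrative only): every variant is confined to one home deme,
    where it segregates at frequency `local_freq`.
    """
    rng = random.Random(seed)
    per_deme = n_samples // breadth  # samples drawn from each sampled deme
    counts = []
    for deme in range(n_demes):
        # Only the first `breadth` demes contribute samples.
        sampled_here = per_deme if deme < breadth else 0
        for _ in range(variants_per_deme):
            # Binomial draw: how many sampled genomes carry this variant.
            count = sum(rng.random() < local_freq for _ in range(sampled_here))
            counts.append(count)
    return counts


def summarize(counts, n_samples):
    """Summarize discovery (distinct variants seen) and per-variant frequency."""
    discovered = [c for c in counts if c > 0]
    total = sum(discovered)
    mean_freq = total / (len(discovered) * n_samples) if discovered else 0.0
    return {
        "discovered": len(discovered),          # number of distinct variants seen
        "mean_freq_per_discovered": mean_freq,  # average sample frequency of those
        "total_alleles": total,                 # total variant copies in the sample
    }


N = 2000  # same total sample size for both designs
narrow = summarize(sample_counts(N, breadth=1), N)   # all samples from one deme
broad = summarize(sample_counts(N, breadth=50), N)   # samples spread over 50 demes
print("narrow:", narrow)
print("broad: ", broad)
```

Under these assumptions, the broad design should report more distinct variants, each at a lower mean sample frequency, while the total number of variant copies in the sample stays roughly similar between the two designs.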
Conclusion:
Geographic sampling breadth produces a trade-off: broader samples discover more distinct deleterious variants but each at lower frequency, a pattern that affects association study power and SFS-based inference of negative selection. Study design for biobank-scale genetics should explicitly account for sampling breadth and its downstream effects.
Music:
Enjoy the music based on this article at the end of the episode.
Article title:
Study design and the sampling of deleterious rare variants in biobank-scale datasets
First author:
Steiner MC
Journal:
PNAS
DOI:
10.1073/pnas.2425196122
Reference:
Steiner MC, Rice DP, Biddanda A, Ianni-Ravna MK, Porras C, Novembre J. Study design and the sampling of deleterious rare variants in biobank-scale datasets. PNAS. 2025;122:e2425196122. https://doi.org/10.1073/pnas.2425196122
License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/
Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00
Official website: https://basebybase.com
On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.
Episode link: https://basebybase.com/episodes/discovery-dilution-sampling-breadth-rare-variants
QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2025-07-02.
QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript portions describing (1) geographic breadth effects on variant discovery and frequency (discovery vs dilution), (2) the stochastic spatial model and sampling breadth parameter w, (3) UK Biobank exome resampling results, and (4) implications for GWAS power and inference of negative selection, inclu
- transcript topics: Geographic breadth and site frequency spectrum (SFS); Discovery and dilution trade-off; Stochastic spatial model and sampling breadth (w) and characteristic length (c); UK Biobank exome resampling results (LoF variants, heterozygosity, singletons); Implications for GWAS power and inference of negative selection; Model limitations and boundary conditions (torus assumption vs real geography)
QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 7
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0
Metadata Audited:
- article_doi
- article_title
- article_journal
- license
Factual Items Audited:
- Broad geographic sampling increases the number of distinct deleterious variants discovered (discovery).
- Broad geographic sampling reduces the average frequency of each discovered variant (dilution).
- The average allele frequency across the entire study is unchanged by sampling breadth (discovery-dilution balance).
- UK Biobank resampling shows 72.3% more loss-of-function (LoF) variants discovered with broader sampling, and 36.75% lower heterozygosity at LoF variant sites; singletons increase s
- There is a discovery vs dilution trade-off affecting GWAS power and inference of negative selection.
QC result: Pass.