Episode 316

March 14, 2026

00:23:18

316: Inclusion bias in UCLA ATLAS: enrollment models, weighting, and effects on GWAS and PGS

Hosted by

Gustavo B Barra
Base by Base

Show Notes

Pimplaskar A et al., The American Journal of Human Genetics - In UCLA ATLAS EHR-linked biobank analyses, random forest-derived enrollment probabilities and inverse-probability weighting increased replication of known GWAS variants and altered PGS associations. Key terms: inclusion bias, UCLA ATLAS, inverse-probability weighting, random forest, polygenic scores.

Study Highlights:
Using the UCLA ATLAS EHR-linked biobank, the authors trained random forest classifiers on demographics, healthcare utilization, and ICD-10 features to distinguish enrolled from background patients. They converted predicted enrollment probabilities into inverse-probability weights and applied these to GWAS replication tests and PGS-PheWAS scans. The classifier achieved AUROC≈0.85 and weighting increased replication of known GWAS variants by 54% while changing phenome-wide PGS association patterns. These results indicate that enrollment-driven inclusion bias can materially affect variant discovery and downstream PGS-based phenotypic associations in health-system biobanks.
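To make the weighting step concrete, here is a minimal Python sketch of turning predicted enrollment probabilities into inverse-probability weights. It is an illustration, not the authors' pipeline: the function name, the `min_prob` clipping floor, and the mean-1 normalization are all assumptions, and the probabilities are made-up values standing in for classifier output.

```python
def inverse_probability_weights(enroll_probs, min_prob=0.01):
    """Convert predicted enrollment probabilities into normalized weights.

    enroll_probs: predicted probability of enrollment for each enrolled
    participant (hypothetical values here, e.g. from a classifier).
    min_prob: assumed clipping floor so that a very small probability
    cannot produce an extreme weight.
    """
    if not enroll_probs:
        return []
    # Clip probabilities from below before inverting.
    clipped = [max(p, min_prob) for p in enroll_probs]
    # Raw weight: participants who were unlikely to enroll count for more.
    raw = [1.0 / p for p in clipped]
    # Assumed normalization: rescale so the weights average to 1.
    mean_raw = sum(raw) / len(raw)
    return [w / mean_raw for w in raw]


# Illustrative probabilities only, not values from the article.
probs = [0.9, 0.5, 0.1]
weights = inverse_probability_weights(probs)
for p, w in zip(probs, weights):
    print(f"p(enroll)={p:.2f} -> weight={w:.3f}")
```

Participants the model considers unlikely enrollees receive the largest weights, which is how the weighted sample is pulled toward the background patient population.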

Conclusion:
Inclusion bias in EHR-linked biobanks like UCLA ATLAS measurably affects common-variant discovery and PGS associations, and enrollment-aware inverse-probability weighting can improve replication while reducing effective sample size.
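The trade-off with effective sample size can be illustrated with Kish's approximation, n_eff = (Σw)² / Σw². The show notes do not say which estimator the authors used, so treat this Python sketch as one common choice rather than the article's method; the function name and the example weights are hypothetical.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("weights must contain at least one non-zero value")
    return total * total / total_sq


# Equal weights: nothing is lost, n_eff equals the number of participants.
equal = kish_effective_sample_size([1.0] * 100)

# One heavily up-weighted participant: n_eff drops sharply.
skewed = kish_effective_sample_size([1.0] * 99 + [50.0])

print(equal, skewed)
```

The more uneven the weights, the smaller the effective sample size, which is why weighting can improve replication while costing statistical power.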

Music:
Enjoy the music based on this article at the end of the episode.

Article title:
Inclusion bias affects common variant discovery and replication in a health-system linked biobank

First author:
Pimplaskar A

Journal:
The American Journal of Human Genetics

DOI:
10.1016/j.ajhg.2026.02.011

Reference:
Pimplaskar A, Qiu J, Lapinska S, Tozzo V, Chiang JN, Pasaniuc B, Olde Loohuis LM. Inclusion bias affects common variant discovery and replication in a health-system linked biobank. The American Journal of Human Genetics. 2026;113:1–13. https://doi.org/10.1016/j.ajhg.2026.02.011

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website: https://basebybase.com

On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/inclusion-bias-ucla-atlas

QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-03-14.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript sections describing enrollment-bias methodology (random forest classifier, inverse-probability weighting), key numeric results (AUROC/AUPRC, enrollment counts, ORs), GWAS replication improvements, and PGS-PheWAS outcomes, plus implications and limitations.
- transcript topics: Enrollment bias in UCLA ATLAS biobank; Random forest classifier for enrollment prediction; Inverse-probability weighting and normalization; Effective sample size and trade-offs; GWAS variant replication under weighting; Variant-level associations and ancestry effects

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- Enrollment in ATLAS: background population ~1.57 million; enrolled ~104,516
- Primary care at UCLA strongly predicts enrollment: ~70.2% of enrolled patients vs ~21.8% of unenrolled patients receive primary care at UCLA; OR ≈ 8.44
- Enrolled individuals have higher healthcare utilization: ~12.8 visits/year vs ~6.7
- RF model discriminates enrollment with AUROC ≈ 0.85 and AUPRC ≈ 0.82
- Inverse-probability weighting reduces effective sample size to ≈11,319.9 (4.3× reduction; from ≈48k)
- Weighting increases replication of known GWAS variants by ≈54%
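
The rounded figures in this list can be cross-checked with simple arithmetic. The Python sketch below recomputes the odds ratio from the two primary-care percentages and the sample-size reduction from the two rounded sample sizes. The function name is hypothetical, and because the inputs are rounded, the results are only expected to land close to the reported values.

```python
def odds_ratio(p_group_a, p_group_b):
    """Odds ratio comparing the odds of an exposure in two groups."""
    odds_a = p_group_a / (1.0 - p_group_a)
    odds_b = p_group_b / (1.0 - p_group_b)
    return odds_a / odds_b


# Share of patients with UCLA primary care: enrolled vs unenrolled.
primary_care_or = odds_ratio(0.702, 0.218)

# Effective sample size reduction, using the rounded "≈48k" baseline.
reduction = 48_000 / 11_319.9

print(f"OR ~ {primary_care_or:.2f}, reduction ~ {reduction:.2f}x")
```

Both values come out close to the audited numbers (OR ≈ 8.44 and a roughly 4.3× reduction); the small differences are consistent with rounding in the inputs.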

QC result: Pass.
