Episode 111

August 19, 2025

00:17:06

111: HANCOCK: Multimodal Dataset for Precision Oncology in Head and Neck Cancer

Hosted by

Gustavo B Barra
Base by Base

Show Notes

Dörrich M et al., Nature Communications - This episode summarizes HANCOCK, a monocentric multimodal dataset of 763 head and neck cancer patients that combines demographics, structured pathology and blood data, surgery reports, whole-slide images (WSIs), and tissue microarrays (TMAs). The paper shows that multimodal machine learning, and multiple instance learning with histopathology foundation models, improve prediction of recurrence and survival. The dataset is publicly available for research. Key terms: multimodal dataset, head and neck cancer, histopathology, machine learning, precision oncology.

Study Highlights:
The authors assembled HANCOCK, a harmonized multimodal cohort of 763 head and neck cancer patients that includes 701 primary-tumor WSIs, 396 lymph-node WSIs, and 368 TMAs alongside clinical, laboratory, and surgery-report data. They encoded each modality and combined the encodings into multimodal patient vectors (early fusion), used UMAP to explore patient clusters, and trained Random Forests to predict recurrence and survival, reaching a maximum average AUC of 0.79. Multiple instance learning (CLAM) with self-supervised histology encoders (e.g., UNI, ResNet18) produced high localization AUCs (~0.94–0.96), and combining WSI and TMA inputs improved survival prediction (test AUC ~0.69, versus 0.65 for WSIs alone and 0.52 for TMAs alone). The dataset and code are publicly released to enable reproducible multimodal research in precision oncology.
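
As a rough illustration of the patient-vector step described above, the sketch below joins per-modality feature lists into one fixed-length vector. It is not the authors' code: the modality names, the dimensions, and the choice to zero-fill a missing modality are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical modality order and names (illustrative, not HANCOCK's schema).
MODALITIES = ["clinical", "pathology", "blood", "text", "image"]


def fuse_early(encodings: Dict[str, List[float]], dims: Dict[str, int]) -> List[float]:
    """Concatenate per-modality features into a single patient vector.

    Assumption: a modality missing for a patient is filled with zeros so that
    every patient vector has the same length.
    """
    vector: List[float] = []
    for name in MODALITIES:
        features = encodings.get(name)
        if features is None:
            features = [0.0] * dims[name]  # zero-fill a missing modality
        if len(features) != dims[name]:
            raise ValueError(f"{name}: expected {dims[name]} features, got {len(features)}")
        vector.extend(features)
    return vector


# Toy dimensions and a patient lacking pathology and text encodings.
dims = {"clinical": 2, "pathology": 1, "blood": 2, "text": 1, "image": 2}
patient = {"clinical": [0.5, 1.0], "blood": [0.1, 0.2], "image": [0.9, 0.3]}
patient_vector = fuse_early(patient, dims)
```

A vector built this way can be passed to any tabular classifier, such as the Random Forests mentioned in the highlights.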

Conclusion:
HANCOCK is a large, publicly available multimodal resource that enables development and validation of multimodal ML and MIL approaches in head and neck oncology and is positioned to accelerate biomarker discovery and precision treatment research.
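
For listeners new to multiple instance learning (MIL), the sketch below shows the core idea in simplified form: a slide is a bag of patch embeddings, each patch gets an attention weight, and the slide-level representation is the weighted average. CLAM uses a learned, gated attention network; the single linear scoring vector `w` here is a stand-in assumed for illustration only.

```python
import math
from typing import List, Tuple


def attention_pool(patches: List[List[float]], w: List[float]) -> Tuple[List[float], List[float]]:
    """Pool patch embeddings into one slide embedding with softmax attention.

    `w` is a hypothetical scoring vector; in a real model it would be learned.
    Returns the slide embedding and the per-patch attention weights.
    """
    # One scalar score per patch (dot product with the scoring vector).
    scores = [sum(wi * xi for wi, xi in zip(w, patch)) for patch in patches]
    # Numerically stable softmax over the patch scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted average of the patch embeddings.
    dim = len(patches[0])
    slide = [sum(a * patch[j] for a, patch in zip(weights, patches)) for j in range(dim)]
    return slide, weights


patches = [[1.0, 0.0], [0.0, 1.0]]
uniform_slide, uniform_weights = attention_pool(patches, [0.0, 0.0])
focused_slide, focused_weights = attention_pool(patches, [10.0, 0.0])
```

With a zero scoring vector every patch counts equally; a scoring vector that favors the first feature shifts almost all of the weight onto the first patch.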

Music:
Enjoy the music based on this article at the end of the episode.

Article title:
A multimodal dataset for precision oncology in head and neck cancer

First author:
Dörrich M

Journal:
Nature Communications

DOI:
10.1038/s41467-025-62386-6

Reference:
Dörrich M., Balk M., Heusinger T., et al. A multimodal dataset for precision oncology in head and neck cancer. Nature Communications (2025) 16:7163. doi:10.1038/s41467-025-62386-6

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website: https://basebybase.com

On PaperCast Base by Base you'll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/a-multimodal-dataset-for-precision-oncology-in-head-and-neck-cancer

QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2025-08-19.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Evaluated the transcript sections describing HANCOCK dataset composition, multimodal fusion strategy, imaging-based MIL using CLAM and UNI, data-split strategy with genetic algorithm, performance outcomes (AUCs), biomarker context (HPV/PD-L1), and dataset availability/future directions.
- transcript topics: HANCOCK dataset overview and modalities; Multimodal data integration and early fusion; Imaging data: MIL/CLAM and histology foundation models (UNI); Genetic algorithm-based data splits for training/testing; Performance outcomes: recurrence and survival AUCs; Biomarkers and biology: HPV status and PD-L1

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 6
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- article_doi
- article_title
- article_journal
- license

Factual Items Audited:
- HANCOCK contains real-world data from 763 head and neck cancer patients with multimodal data (demographics, blood data, pathology/surgery reports, histologic images).
- WSIs: 701 primary-tumor WSIs; 396 lymph-node WSIs; 368 TMAs with HE/IHC staining.
- Multimodal integration uses early fusion; maximum average AUC ~0.79 for recurrence and survival.
- MIL (CLAM) with histology foundation models (UNI) yields high localization AUCs (~0.94–0.96).
- Combining WSIs and TMAs improves survival prediction (average AUC ~0.69) versus WSIs alone (0.65) or TMAs alone (0.52).
- Reproduction of known biomarkers: HPV status and PD-L1 have predictive value for outcomes; dataset public availability is highlighted.
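
To make the AUC figures in the list above easier to interpret, here is a minimal sketch of how an AUC can be computed: it is the fraction of (positive, negative) patient pairs in which the positive patient receives the higher predicted score, with ties counted as half. The labels and scores below are invented toy values, not HANCOCK results.

```python
from typing import List


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """Compute ROC AUC by comparing every positive case with every negative case."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = event (e.g., recurrence), 0 = no event.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which values such as 0.79 or 0.52 should be read.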

QC result: Pass.
