Episode 175

October 22, 2025

00:19:13

175: Predictive Prioritization of Pancreatic Enhancers Linked to Disease Risk

Hosted by

Gustavo B Barra

Show Notes

Episode 175: Predictive Prioritization of Pancreatic Enhancers Linked to Disease Risk

In this episode of PaperCast Base by Base, we explore how enhancer–promoter 3D chromatin maps from five primary human pancreatic cell types were transformed into graph “tree” models to quantify enhancer connectivity and prioritize elements most critical for cell-type-specific gene expression, creating a framework to connect noncoding variants to function in pancreatic disease.

Study Highlights:
The authors profiled H3K27ac HiChIP and ATAC‑seq across 28 donors, building enhancer–promoter tree models that capture direct and indirect loops and reveal modular “forests” centered on promoter–promoter hubs.
They developed EPIC, a k‑nearest‑neighbors model using chromatin features and tree topology to rank enhancers by their predicted effect on cell‑type‑specific transcription and validated top predictions in primary human cells using CRISPRa/i with single‑cell RNA FISH readouts.
Direct E1 enhancer loops predominated and multiple enhancers additively boosted expression of lineage‑defining genes, while EPIC‑prioritized enhancers overlapped germline risk variants for type 2 diabetes and pancreatic ductal adenocarcinoma.
GWAS integration pointed to unexpected enrichment of PDAC‑associated variants in acinar enhancers and experimental perturbation at the XBP1 locus reduced transcripts in line with predicted effect sizes.
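
As a rough illustration of the k-nearest-neighbors idea mentioned in the highlights, the sketch below predicts an effect size for an unscored enhancer by averaging the effect sizes of its closest neighbors in feature space, then ranks candidates by that prediction. This is not the authors' EPIC code: the three features, every number, the enhancer names, and the choice of k are hypothetical and exist only to show the mechanics.

```python
import math

def knn_predict(train, query, k=3):
    """Predict an effect size as the mean target of the k nearest training points.

    train: list of (feature_vector, effect_size) pairs
    query: feature vector of an enhancer without a measured effect
    """
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training examples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest = by_distance[:k]
    return sum(effect for _, effect in nearest) / k

# Hypothetical toy features: (accessibility, loop strength, tree level).
# A real model would use many more features and standardize them first.
training = [
    ((0.9, 0.8, 1.0), 0.70),
    ((0.8, 0.9, 1.0), 0.60),
    ((0.7, 0.7, 1.0), 0.50),
    ((0.2, 0.3, 2.0), 0.10),
    ((0.1, 0.2, 3.0), 0.05),
]

# Hypothetical enhancers to prioritize.
candidates = {
    "enh_A": (0.85, 0.85, 1.0),
    "enh_B": (0.15, 0.25, 2.0),
}

scores = {name: knn_predict(training, feats, k=3) for name, feats in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # highest predicted effect first
```

With these made-up inputs, the enhancer that resembles the strong, directly looped training examples ranks first. The point of the sketch is only that prioritization reduces to sorting enhancers by a predicted effect size.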

Conclusion:
Enhancer tree models coupled with the EPIC prioritization algorithm provide a scalable route to nominate and validate functional noncoding elements and their target genes in the human pancreas, sharpening variant‑to‑function studies and disease mechanism discovery.

Reference:
Wang L, Baek S, Prasad G, Wildenthal J, Guo K, Sturgill D, Truongvo T, Char E, Pegoraro G, McKinnon K, The Pancreatic Cancer Cohort Consortium, The Pancreatic Cancer Case‑Control Consortium, Hoskins JW, Amundadottir LT, Arda HE. Predictive prioritization of enhancers associated with pancreatic disease risk. Cell Genomics. 2026;6:101040. https://doi.org/10.1016/j.xgen.2025.101040

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support:
If you'd like to support Base by Base, you can make a one-time or monthly donation here: https://basebybase.castos.com/

Chapters

  • (00:00:14) - Pancreatic cancer risk map
  • (00:05:25) - How they mapped the enhancer network in the pancreas
  • (00:10:16) - EPIC and the enhancer network: diabetes, cancer
  • (00:13:32) - Pancreatic cancer: The genetic predisposition

Episode Transcript

[00:00:14] Speaker A: Welcome to Base by Base, the papercast that brings genomics to you wherever you are.
[00:00:19] Speaker B: It's great to be diving into this one today.
[00:00:21] Speaker A: So when we think about inherited risk for, you know, really devastating diseases like type 2 diabetes or pancreatic cancer, our first thought often goes to the genes themselves, the instructions for making proteins.
[00:00:34] Speaker B: That's the common assumption. Yeah. A faulty instruction set, a gene coding for a, well, a bad protein.
[00:00:40] Speaker A: But what's really fascinating and kind of counterintuitive is where the danger often physically lies in the genome.
[00:00:46] Speaker B: Exactly, yeah. And here's a fact that really sets the stage for what we're talking about today. Genome-wide studies, GWAS, they show that over 90%, think about that, 90% of genetic variations linked to common diseases are not actually inside the...
[00:01:01] Speaker A: Protein-coding genes. Not in the genes themselves. So where are they?
[00:01:04] Speaker B: They're in the non-coding regions, the vast stretches of DNA we used to call, somewhat dismissively, junk DNA. But it's not junk at all. It's the regulatory territory.
[00:01:13] Speaker A: Uh, the control panel.
[00:01:15] Speaker B: Precisely. It contains these elements, often called enhancers, which act like switches. They control when and how strongly a gene gets turned on or off.
[00:01:24] Speaker A: And if those switches are faulty or maybe connected wrong, then the cell can...
[00:01:29] Speaker B: Lose its proper identity, its function. It can become vulnerable to disease.
[00:01:33] Speaker A: And trying to figure out these switches in the pancreas is, well, it's notoriously difficult, isn't it?
[00:01:40] Speaker B: Oh, incredibly so. The pancreas is just fantastically complex. You've got these two major components. The endocrine part making hormones like insulin.
[00:01:47] Speaker A: With the alpha, beta and delta cells.
[00:01:49] Speaker B: Right. And then the exocrine part making digestive enzymes with acinar and duct cells, each doing a very specific job.
[00:01:56] Speaker A: It sounds like a functional maze. If you want to understand disease risk, you can't just look at the whole organ as one thing.
[00:02:02] Speaker B: You absolutely can't. You need to know exactly which regulatory switch controls which gene and, critically, in which specific type of cell. Because a switch that's vital in a beta cell might be irrelevant or even harmful if active in a duct cell.
[00:02:16] Speaker A: And getting that level of detail, that specific wiring diagram in 3D space, has been the major roadblock.
[00:02:23] Speaker B: It really has. We've had blurry pictures, maybe averages across mixed cell populations. But today we're going to unpack how a frankly groundbreaking new model managed to map the precise 3D DNA organization inside these distinct purified pancreatic cells.
[00:02:40] Speaker A: And this allowed them to predict which genetic variations are the real culprits.
[00:02:44] Speaker B: Yes. And the results are pretty stunning. They actually challenge a really long-standing idea, a dogma almost, about how one of the deadliest cancers, pancreatic cancer, even gets started.
[00:02:54] Speaker A: Okay, I'm definitely intrigued. Who do we have to thank for this leap forward?
[00:02:57] Speaker B: Well, today we really celebrate the fantastic work of Li Wang, Songjoon Baek, Gauri Prasad and the whole team. The lead contact on the paper is H. Efsun Arda. And this work came out of the National Cancer Institute, the NIH and associated consortia.
[00:03:11] Speaker A: A big collaborative effort.
[00:03:13] Speaker B: Absolutely. And their findings were published in Cell Genomics.
It's provided this massive new resource, really, for understanding how enhancers connect to genes and how the genome structure influences pancreatic disease risk.
[00:03:27] Speaker A: So let's zoom out for a second. Why the pancreas? Why is tackling this specific organ so critical?
[00:03:33] Speaker B: Well, the sheer impact, for one. Pancreatic disorders, we're talking type 2 diabetes, chronic pancreatitis, pancreatic cancer. These affect a huge number of people globally. Yeah, over 10% of the population, easily. The health burden is immense.
[00:03:48] Speaker A: And you mentioned the complexity earlier, that seems central.
[00:03:51] Speaker B: It is. Those very distinct cell types, the endocrine and exocrine cells, they come from a common origin during development. And while they normally maintain their specialized identities really well, there's a catch. They also possess a certain degree of cellular plasticity.
[00:04:04] Speaker A: Meaning they can change state, potentially.
[00:04:07] Speaker B: Exactly. Sometimes that plasticity is good, like for tissue repair, but it also creates a vulnerability. If that change goes wrong, it can lead towards malignancy, towards cancer. Pancreatic ductal adenocarcinoma, PDAC, is the classic, very deadly example.
[00:04:22] Speaker A: And the enhancers, these regulatory switches, are key to keeping cells on the right track, maintaining that identity.
[00:04:27] Speaker B: They are the guardians of cell identity in a way. And they do this job often by physically looping across the DNA. An enhancer sequence might be thousands, even millions of base pairs away from the gene it controls on the linear sequence. But in 3D space inside the cell nucleus, the DNA folds up, bringing that distant enhancer right next to the gene's starting point, the promoter, to regulate it.
[00:04:51] Speaker A: And that looping is what makes it so hard to connect the dots.
From a genetic variation found in a GWAS, a SNP, to the actual gene it affects.
[00:04:59] Speaker B: Precisely. The SNP might sit in an enhancer, but the gene it controls isn't necessarily the closest one on the DNA strand. It could be much further away.
[00:05:07] Speaker A: And previous studies just didn't have the tools or the scale to map those long-range, cell-type-specific connections accurately?
[00:05:15] Speaker B: Generally, no. They often lacked either the resolution to pinpoint interactions in specific cell types or the sample size needed to see consistent patterns. You need both.
[00:05:25] Speaker A: Okay, so how did this team crack that mapping challenge? What was different?
[00:05:30] Speaker B: They started with the foundation. Incredibly high quality, high resolution data generated at a really impressive scale for this type of study. They got pancreatic tissue from 28 human donors.
[00:05:42] Speaker A: 28 donors. That is a good cohort size.
[00:05:44] Speaker B: It really is. And crucially, they didn't just grind up the tissue. They painstakingly purified five distinct cell types. The alpha, beta and delta islet cells, plus the exocrine acinar and duct cells. They got these populations to over 95% purity.
[00:05:59] Speaker A: Wow. That purity is key, isn't it? Otherwise you're just averaging signals from different cell types.
[00:06:03] Speaker B: You'd completely muddy the waters. Mixing cell types means mixing up their unique regulatory wiring diagrams. So that purity was absolutely essential.
[00:06:11] Speaker A: Okay, so they have these pure cell populations. What did they do next?
[00:06:14] Speaker B: They applied two main techniques. First, ATAC-seq, which maps out the open, accessible regions of the chromatin, places where regulatory proteins can bind. Think of it as finding potential landing pads. Okay. Second, and this is critical for the 3D structure, they used a method called H3K27ac HiChIP.
This technique specifically captures the physical interactions, the loops between active enhancers marked by H3K27ac and gene promoters. It shows you who is actually talking to whom in 3D space.
[00:06:47] Speaker A: So they have the potential switches from ATAC-seq and the actual connections from HiChIP.
[00:06:52] Speaker B: Exactly. But the real innovation, I think, was how they analyzed this complex web of interactions. Instead of just listing pairs of connected regions, they developed a more sophisticated network approach. They built what they called enhancer-promoter tree models.
[00:07:04] Speaker A: Tree models. How does that work?
[00:07:06] Speaker B: It's quite elegant, actually. Think of the gene's promoter, its start site, as the root of the tree. Let's call it P0.
[00:07:12] Speaker A: Okay. The root.
[00:07:13] Speaker B: Any enhancer that directly loops to that promoter in the HiChIP data is considered a level one enhancer, or E1. It's directly connected. Makes sense. Then if another enhancer loops to an E1 enhancer, but not directly to the promoter itself, that's a level two enhancer, E2, and so on.
[00:07:29] Speaker A: Ah, so it captures these multi-step connections like branches on a tree?
[00:07:33] Speaker B: Precisely. It allows you to see the entire regulatory architecture controlling a gene, not just the direct links. It captures how multiple enhancers might work together, perhaps hierarchically. It moves beyond just simple one-to-one enhancer-gene pairs.
[00:07:48] Speaker A: That sounds much more biologically realistic.
[00:07:50] Speaker B: It does. And once they had these detailed cell-type-specific tree maps, they took another big step. They built a machine learning algorithm.
[00:07:58] Speaker A: Of course. Machine learning. What does this algorithm do?
[00:08:00] Speaker B: It's called EPIC, which stands for Enhancer Prioritizer using Integrated Chromatin data. EPIC is designed to predict how important each enhancer is.
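
The tree levels described in this exchange (P0 as the root, E1 for enhancers that loop directly to the promoter, E2 for enhancers reachable only through an E1) can be sketched as a breadth-first search over the loop graph. This is an illustrative reading of the description, not the authors' pipeline: the loop list and element names are hypothetical, and the sketch treats every non-root anchor as an enhancer.

```python
from collections import deque

def assign_levels(loops, promoter):
    """Breadth-first search from the promoter over an undirected loop graph.

    Returns {element: level}. The promoter is level 0, elements looping
    directly to it are level 1 (E1), elements first reached through an E1
    are level 2 (E2), and so on.
    """
    neighbors = {}
    for a, b in loops:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    levels = {promoter: 0}
    queue = deque([promoter])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in levels:  # first visit gives the shortest path to the root
                levels[nxt] = levels[node] + 1
                queue.append(nxt)
    return levels

# Hypothetical loops (pairs of interacting anchors) around one gene.
loops = [
    ("P0", "enh1"),
    ("P0", "enh2"),
    ("enh1", "enh3"),  # enh3 reaches P0 only through an E1, so it is E2
    ("enh2", "enh3"),
    ("enh3", "enh4"),  # enh4 is one step further out, so E3
]

levels = assign_levels(loops, "P0")
for name, level in sorted(levels.items(), key=lambda kv: (kv[1], kv[0])):
    print(name, "P0" if level == 0 else f"E{level}")
```

Because breadth-first search visits elements in order of distance from the root, each enhancer gets the smallest level at which it connects, which matches the "directly to the promoter, otherwise via an E1" rule as described.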
[00:08:09] Speaker A: How does it do that?
[00:08:10] Speaker B: It takes in 24 different features derived from their data for each enhancer. Things like the strength of the ATAC-seq signal, how often the HiChIP loop is detected, where the enhancer sits in the tree structure, E1, E2, and other chromatin features. And based on these features, EPIC predicts a functional effect size for every single enhancer. Essentially, it ranks the switches, telling you which ones likely have the biggest impact on controlling that specific gene's expression in that specific cell type.
[00:08:38] Speaker A: A prioritized list of regulatory switches. That's powerful. But how do you know if EPIC's predictions are right? It sounds good theoretically, right?
[00:08:47] Speaker B: That's the crucial question. You need experimental validation. And they did this using a really cutting-edge approach. They used CRISPR gene editing tools, specifically CRISPR interference to turn enhancers down and CRISPR activation to turn them up.
[00:09:03] Speaker A: Okay, so they physically manipulated the enhancers EPIC pointed to.
[00:09:07] Speaker B: Exactly. And they did this in primary human pancreas cells. The real deal. Then they used a technique called RNA FISH, which lets you visualize and count individual RNA molecules inside single cells.
[00:09:18] Speaker A: Single-cell resolution.
[00:09:19] Speaker B: Yes. So they could perturb a specific enhancer in, say, beta cells and then directly measure if the target gene's expression went up or down in those specific beta cells. It's about as direct a validation as you can get.
[00:09:31] Speaker A: That sounds incredibly thorough. Okay, so they built the maps, they built the predictor, they validated it. What did these enhancer trees actually reveal about how gene regulation is organized?
[00:09:43] Speaker B: Well, the first major finding was perhaps a bit surprising in its simplicity. The vast majority of interactions were direct.
These level one enhancers, the ones looping straight to the promoter, they accounted for about 73% of all the enhancers mapped and a whopping 80% of all the detected loops.
[00:09:59] Speaker A: So most of the action is direct contact.
[00:10:02] Speaker B: It suggests that predominantly the system might be simpler than we thought, maybe more additive, with these direct E1 enhancers being the primary drivers of regulation, rather than complex multi-level hierarchies dominating everywhere.
[00:10:15] Speaker A: Interesting. What else?
[00:10:17] Speaker B: Well, they saw strong evidence for something researchers have observed hints of before, sometimes called the skip-the-nearest rule.
[00:10:23] Speaker A: Skip the nearest, meaning the enhancer doesn't always regulate the gene right next door on the DNA strand.
[00:10:28] Speaker B: Exactly. In fact, across all five pancreatic cell types they looked at, over 80% of those direct E1 enhancers completely bypassed closer genes to loop to a more distant target promoter.
[00:10:41] Speaker A: 80%. So proximity on the linear DNA is often irrelevant for function.
[00:10:46] Speaker B: It seems largely irrelevant for which gene gets regulated. Function dictates the connection, not linear distance.
[00:10:52] Speaker A: And did those distally looped genes have special properties?
[00:10:55] Speaker B: They absolutely did. The genes targeted by these long-range E1 loops showed significantly higher expression levels and, very importantly, much higher cell type specificity compared to the genes that were skipped over.
[00:11:07] Speaker A: Ah, so the long-range connections seem to prioritize the genes that are really crucial for defining what that cell is and what it does.
[00:11:14] Speaker B: That's the interpretation. Yes. It ensures these key identity genes get robust, stable expression regardless of where they happen to sit linearly in the genome. Like the system is hardwired to focus on the essentials.
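
One simple way to quantify the skip-the-nearest pattern discussed here is to compare, for each direct E1 loop, the looped target gene against the gene whose transcription start site (TSS) lies closest on the linear genome. The sketch below is a hedged illustration on a single chromosome: the gene names, coordinates, and loops are hypothetical, and how the study itself defined "nearest" is not stated in this episode.

```python
def nearest_gene(position, tss_by_gene):
    """Return the gene whose TSS is closest to `position` on the linear genome."""
    return min(tss_by_gene, key=lambda gene: abs(tss_by_gene[gene] - position))

def skip_fraction(e1_loops, tss_by_gene):
    """Fraction of E1 loops whose looped target is not the linearly nearest gene."""
    skipped = sum(
        1
        for enhancer_pos, target in e1_loops
        if nearest_gene(enhancer_pos, tss_by_gene) != target
    )
    return skipped / len(e1_loops)

# Hypothetical TSS coordinates on one chromosome, in base pairs.
tss_by_gene = {"GENE_A": 100_000, "GENE_B": 250_000, "GENE_C": 900_000}

# Hypothetical E1 loops: (enhancer midpoint, looped target gene).
e1_loops = [
    (120_000, "GENE_A"),  # nearest gene is also the target
    (240_000, "GENE_C"),  # skips GENE_B
    (300_000, "GENE_C"),  # skips GENE_B
    (110_000, "GENE_B"),  # skips GENE_A
]

fraction = skip_fraction(e1_loops, tss_by_gene)
print(f"{fraction:.0%} of E1 loops skip the nearest gene")
```

In this toy input three of the four loops bypass the closest gene. A real analysis would also have to handle multiple chromosomes, strand, and genes with several promoters.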
[00:11:26] Speaker A: And did they see any patterns related to how many enhancers connect to a gene?
[00:11:30] Speaker B: Yes. Another important finding. Genes that were connected to multiple E1 enhancers tended to have even higher expression specificity and abundance.
[00:11:38] Speaker A: So having multiple direct switches provides backup, maybe redundancy?
[00:11:43] Speaker B: It suggests exactly that. Multiple enhancers working together, perhaps additively, to ensure that critical genes are reliably expressed at the right levels. It builds robustness into the system.
[00:11:53] Speaker A: And did the experimental validation with EPIC back this up? You mentioned PCSK1 and PCSK2 earlier.
[00:12:00] Speaker B: It did, beautifully. For example, they looked at PCSK1, which is important in beta cells, and EPIC predicted different effect sizes for various enhancers controlling it.
[00:12:09] Speaker A: Right.
[00:12:09] Speaker B: When they used CRISPR to activate the enhancer EPIC ranked as having the highest predicted effect size, they saw a significantly larger increase in PCSK1 gene expression compared to when they activated a lower-ranking enhancer.
[00:12:21] Speaker A: So EPIC's predictions matched the real biological impact.
[00:12:24] Speaker B: The relative impacts matched really well. Yes. And for PCSK2 in alpha cells, they tried activating multiple enhancers simultaneously. They found that the increase in gene expression was roughly the sum of the effects of activating each enhancer individually.
[00:12:40] Speaker A: Near-additive effects.
[00:12:41] Speaker B: Exactly. Supporting that idea of functional redundancy and collective action among these enhancers.
[00:12:46] Speaker A: Okay, this detailed cell-type-specific map sounds like a huge leap. How did it help connect the dots for actual human diseases like diabetes and cancer?
[00:12:56] Speaker B: This is where it gets really powerful.
For interpreting those GWAS results we talked about earlier, they could now take known disease risk variants and see which enhancer trees in which cell types they were statistically enriched in.
[00:13:08] Speaker A: Did it confirm things we already expected?
[00:13:11] Speaker B: It did. As you'd anticipate, genetic variants associated with type 2 diabetes and related glycemic traits showed significant enrichment, specifically within the enhancer trees active in the islet cells, particularly the beta cells.
[00:13:22] Speaker A: Makes sense. Beta cells make insulin.
[00:13:24] Speaker B: Exactly. It validated the approach. The framework correctly links known diabetes risk variants to their relevant cell type's regulatory network.
[00:13:32] Speaker A: But you mentioned a twist earlier, something challenging dogma, particularly for pancreatic cancer. PDAC.
[00:13:38] Speaker B: Yes. This was arguably the most striking finding. The conventional wisdom, for decades really, has been that PDAC primarily originates from the ductal cells of the pancreas.
[00:13:48] Speaker A: That's what I've always heard.
[00:13:49] Speaker B: Right. But when this team overlaid the known PDAC risk SNPs onto their cell-type-specific enhancer maps, the strongest signal wasn't in the duct cells. The PDAC variants showed significantly stronger enrichment in the regulatory regions active in acinar cells.
[00:14:04] Speaker A: Wait, in the acinar cells? The ones making digestive enzymes, not the duct cells.
[00:14:10] Speaker B: That's what the data showed. And the statistics were very strong. P = 1.47 × 10^-5 for acinar versus P = 3.65 × 10^-3 for duct. A much more significant enrichment in the acinar enhancer landscape, that is.
[00:14:22] Speaker A: Wow, that really challenges the standard model of PDAC origin, doesn't it?
[00:14:25] Speaker B: It fundamentally does.
It adds very strong weight to an emerging body of evidence suggesting that acinar cells, perhaps through a process called acinar-to-ductal metaplasia, can actually give rise to the precursor lesions that lead to PDAC.
[00:14:39] Speaker A: So the genetic predisposition might actually be rooted more in the acinar cell's regulatory machinery. Even if the final cancer looks ductal.
[00:14:47] Speaker B: That's what this strongly suggests. It means we might need to refocus some of our efforts in early detection and prevention towards understanding what goes wrong in acinar cells.
[00:14:57] Speaker A: And did the EPIC algorithm help pinpoint specific risk enhancers for PDAC?
[00:15:01] Speaker B: It did. Across all the traits they tested, type 1 diabetes, type 2 diabetes, PDAC, enhancers that physically overlapped with known disease SNPs consistently ranked higher in EPIC's predicted functional effect size.
[00:15:13] Speaker A: So the algorithm correctly flagged the disease-relevant enhancers as being more important?
[00:15:18] Speaker B: Yes. For instance, looking at PDAC risk and acinar cells, EPIC highlighted a specific high-impact enhancer near the gene XBP1. And when they went back to the lab and used CRISPRi to specifically silence that enhancer in acinar cells, they saw a significant drop in XBP1 gene expression. It provided a direct functional link: PDAC SNP, specific acinar enhancer, XBP1 gene regulation.
[00:15:44] Speaker A: Incredible. Connecting all those pieces, did they find anything else about how different cell types might interact in disease?
[00:15:50] Speaker B: Yes. Another interesting layer, they found evidence of what they called inter-compartment crosstalk. For example, they saw some enrichment of type 2 diabetes SNPs within enhancers active in exocrine cells like acinar cells, specifically around a gene called GATA4.
[00:16:07] Speaker A: So a diabetes risk variant influencing a gene in the non-diabetes part of the pancreas?
[00:16:12] Speaker B: Potentially, yes. It highlights that these diseases might involve complex interplay between the different cell compartments. You might not fully understand diabetes risk by only looking at islet cells. The exocrine neighbors can be playing a role too.
[00:16:24] Speaker A: The pancreas really is an interconnected system.
[00:16:26] Speaker B: It truly is. Now, it's important to mention the limitation they acknowledged. The HiChIP data, which maps the 3D loops, is still an average signal from many cells. It gives you a great snapshot of the common connections, but it doesn't capture potential cell-to-cell variability or dynamic changes in chromatin structure over time.
[00:16:46] Speaker A: So we don't know if these loops are constant in every single cell or if they flicker on and off.
[00:16:50] Speaker B: Exactly. Getting that single-cell resolution for 3D structure is the next frontier. Technically very challenging, but important for understanding the dynamics.
[00:16:59] Speaker A: Understood. So, pulling it all together, what's the main takeaway here?
[00:17:03] Speaker B: I think the central insight is that by combining these high-resolution, cell-type-specific 3D genomics methods with smart network modeling, these enhancer-promoter trees, and then adding machine learning like EPIC for prioritization, well, they've built an incredibly powerful framework.
[00:17:18] Speaker A: A framework for what, specifically?
[00:17:20] Speaker B: For finely mapping non-coding genetic disease risk, those GWAS SNPs, down to the specific functional enhancer in the specific relevant cell type and linking it to its target gene. It cuts through the complexity.
[00:17:31] Speaker A: And the key finding about the genome...
[00:17:33] Speaker B: Architecture itself is that it seems perhaps more streamlined than previously imagined, dominated by these direct, often long-range enhancer-promoter contacts that appear to work collectively, maybe additively, to ensure robust and specific gene expression, which is fundamental for maintaining cell identity. Mapping those contacts is like finding a key to unlock complex disease mechanisms.
[00:17:58] Speaker A: Okay, that's a clear and powerful message. Before we wrap up, any final thought for our listeners to chew on?
[00:18:05] Speaker B: Well, here's something interesting. EPIC can prioritize enhancers based purely on their predicted functional importance using those chromatin features, even if we don't yet know of any disease SNP located there.
[00:18:16] Speaker A: Right. It ranks them based on intrinsic properties.
[00:18:19] Speaker B: Exactly. So it makes you wonder how many really important, high-impact enhancers, potentially harboring undiscovered disease variants, are still hiding in plain sight within our non-coding genome, just waiting for an algorithm like EPIC to point them out?
[00:18:32] Speaker A: A fascinating question indeed. There's likely much more to uncover in that regulatory landscape. No doubt about it. This episode was based on an open-access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Thanks for listening, and join us next time as we explore more science, base by base.
