Episode 122

August 30, 2025

00:19:57

122: Patient Stratification Reveals the Molecular Basis of Disease Co-Occurrences

Hosted by

Gustavo B Barra
Base by Base



Show Notes


Episode 122: Patient Stratification Reveals the Molecular Basis of Disease Co-Occurrences

In this episode of PaperCast Base by Base, we explore a study that investigates the molecular underpinnings of why certain diseases tend to co-occur. By using large-scale RNA sequencing data, the authors present a novel approach to identify disease co-occurrences, revealing a shared molecular basis in many comorbidities, particularly involving the immune system. The study introduces patient stratification based on gene expression profiles, which uncovers known and potential new disease associations, providing a framework for personalized approaches to managing comorbidities.

Study Highlights:
The researchers built a Disease Similarity Network (DSN) that uses gene expression data to map disease relationships, capturing 46.2% of known disease co-occurrences, and a Stratified Similarity Network (SSN) based on molecularly defined patient subgroups ("metapatients") that raises this to 64%. They show that many comorbidities, including those involving inflammatory bowel disease (IBD) and various cancers, share common molecular pathways, particularly immune-related processes. The study also points to potentially underdiagnosed comorbidities, offering insights that could inform therapeutic strategies. A web application is provided for exploring the relationships between diseases and their associated molecular mechanisms.
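To make the network idea concrete, here is a minimal Python sketch of a disease similarity network. It assumes each disease is summarized as a vector of pathway-level changes versus healthy controls (+1 up, -1 down, 0 unchanged) and that two diseases are linked when the Pearson correlation of their vectors passes a threshold, with the sign of the correlation giving a positive or negative link. The disease names, scores and the 0.5 threshold are hypothetical; the paper's actual similarity measure and cut-offs are not described in these notes.

```python
import math
from itertools import combinations

# Hypothetical pathway-level signatures: one entry per pathway, in the same
# order for every disease (+1 = up, -1 = down, 0 = unchanged vs. controls).
SIGNATURES = {
    "disease_A": [1, 1, 0, -1, 1, 0],
    "disease_B": [1, 1, 0, -1, 0, 0],
    "disease_C": [-1, -1, 0, 1, -1, 0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_dsn(signatures, threshold=0.5):
    """Return {(disease, disease): '+' or '-'} for strongly correlated pairs.

    A strong positive correlation becomes a positive link (shared molecular
    profile); a strong negative correlation becomes a negative link.
    """
    edges = {}
    for a, b in combinations(sorted(signatures), 2):
        r = pearson(signatures[a], signatures[b])
        if abs(r) >= threshold:
            edges[(a, b)] = "+" if r > 0 else "-"
    return edges


if __name__ == "__main__":
    for pair, sign in build_dsn(SIGNATURES).items():
        print(pair, sign)
```

With these toy vectors, disease_A and disease_B are linked positively (r is about 0.87), while disease_C, a mirror image of disease_A, gets negative links to both. Raising `threshold` to 0.9 keeps only the perfectly anti-correlated pair disease_A and disease_C.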

Conclusion:
This work underscores the importance of patient stratification and molecular profiling in understanding disease co-occurrences, potentially improving diagnosis and treatment by revealing hidden connections between diseases.

Reference:
Urda-García, B., Sánchez-Valle, J., Lepore, R., & Valencia, A. (2025). Patient stratification reveals the molecular basis of disease co-occurrences. *Proceedings of the National Academy of Sciences, 122*(35), e2421060122. https://doi.org/10.1073/pnas.2421060122

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support:
If you'd like to support Base by Base, you can make a one-time or monthly donation here: https://basebybase.castos.com/

On PaperCast Base by Base, you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Keywords: disease co-occurrence, RNA sequencing, patient stratification, immune system, molecular mechanisms.

Chapters

  • (00:00:00) - Deep Dive: The molecular logic of disease co-occurrences
  • (00:04:43) - RNA Sequencing: the game changer
  • (00:06:55) - The nature of the disease similarity network
  • (00:09:52) - Discovery of the disease network
  • (00:12:56) - The Stratified Similarity Network (SSN) and the metapatients
  • (00:16:56) - Measuring the molecular basis of diseases

Episode Transcript

[00:00:14] Speaker B: Welcome to Base by Base, the papercast that brings genomics to you wherever you are. Have you ever noticed how sometimes, when you're dealing with one health issue, another one, maybe something totally unexpected, seems to pop up? [00:00:28] Speaker A: Yeah, it happens all the time. [00:00:29] Speaker B: Or, even weirder, how sometimes having one disease might actually somehow protect you from getting another. [00:00:35] Speaker A: Uh huh. Those negative correlations are fascinating too. [00:00:38] Speaker B: It's almost like our bodies have these hidden conversations going on between different conditions, right? Influencing each other behind the scenes. [00:00:46] Speaker A: Exactly. It's definitely more than just coincidence. [00:00:49] Speaker B: We're talking about these really deep, often invisible molecular dialogues that could explain these complex links. So today we're doing a deep dive. We want to pull back the curtain, look at those fundamental biological connections and figure out what they mean for your health. Just imagine for a second: maybe you're managing a chronic illness, something you live with day to day, and then suddenly a completely new condition appears. It seems unrelated. But is it? [00:01:17] Speaker A: Or you hear about someone with, say, disease X who just never seems to get disease Y. [00:01:23] Speaker B: Exactly. These patterns, these disease co-occurrences (the technical term is comorbidities), have a massive impact across the globe. [00:01:31] Speaker A: Oh, absolutely. Huge. They make patient care way more complicated. They sadly increase mortality rates, and the... [00:01:38] Speaker B: ...strain on healthcare systems is immense. [00:01:41] Speaker A: Right. And for such a long time, the basic why behind these patterns, why these specific diseases cluster together, has been, well, pretty hard to pin down.
[00:01:51] Speaker B: We've sort of known which diseases hang out together based on big population studies. [00:01:56] Speaker A: The epidemiological evidence has been strong for a while. [00:01:59] Speaker B: But the how, the actual biology, that's been the tricky part. It's like knowing two people are friends but not knowing why they connect. [00:02:06] Speaker A: Precisely. The molecular mechanisms, the actual biological conversation, that's been the missing piece. [00:02:10] Speaker B: Okay, and this is where today's deep dive gets really exciting. [00:02:13] Speaker A: Yeah. [00:02:14] Speaker B: What if we told you that new breakthroughs are letting us decode these disease relationships? Like actually read the molecular signals and... [00:02:21] Speaker A: ...even start predicting new connections just by looking at which genes are active or inactive inside our cells. [00:02:27] Speaker B: It's not just spotting patterns anymore. It's about understanding the biological language they're speaking. [00:02:32] Speaker A: Exactly. [00:02:32] Speaker B: So how could this totally change the game? How we think about preventing diseases, diagnosing them earlier, treating them more effectively. For you, for everyone. Today we really want to celebrate the incredible work of Beatriz Urda-García, Jon Sánchez-Valle, Rosalba Lepore and Alfonso Valencia. [00:02:52] Speaker A: Yeah, fantastic work. [00:02:53] Speaker B: They're affiliated with places like the Barcelona Supercomputing Center and Universitat Pompeu Fabra. And their research has seriously pushed forward our understanding of the molecular roots of these disease connections. It gets us much closer to answering those why questions. [00:03:08] Speaker A: It really does. And maybe before we dive into how they did it, let's just quickly make sure we're all on the same page with the background. [00:03:13] Speaker B: Good idea.
[00:03:14] Speaker A: So comorbidity basically just means having two or more different health conditions at the same time. And, like we mentioned, epidemiology, looking at large populations, often using electronic health records, has shown us again and again that these pairings aren't random. Some diseases definitely show up together more often than you'd expect by chance, and others, well, they seem to avoid each other. [00:03:38] Speaker B: So we had all this observational data, mountains of it, pointing out these connections. But the traditional methods, even with all that data, kind of hit a wall, didn't they? They told us what, but not really how or why. [00:03:50] Speaker A: Biologically, that's the crux of it. What's actually going on at the cellular level, the genetic level, that links these conditions? [00:03:58] Speaker B: Right. [00:03:58] Speaker A: Previous attempts tried to bridge that gap. People looked at things like protein-protein interactions, or used older tech like microarrays. But honestly, they didn't have huge success in explaining the big picture from the epidemiological studies. They could maybe explain a small percentage, sometimes only around 8%, maybe 16%, of the known links. [00:04:18] Speaker B: So a huge chunk of the why... [00:04:19] Speaker A: ...was just missing, pretty much. A big gap in our understanding of the fundamental biology. [00:04:24] Speaker B: And this isn't just, you know, an interesting scientific question. It's got real-world urgency. [00:04:30] Speaker A: Absolutely. As populations get older globally, we're seeing more and more people with multiple conditions. [00:04:36] Speaker B: So figuring this out is critical for public health and for improving how we care for individual patients. It's really vital research. [00:04:43] Speaker A: Okay, so if those older methods couldn't quite crack it, how did this team break through? What was their innovative angle?
[00:04:51] Speaker B: Well, they really harnessed the power of something called RNA sequencing, or RNA-seq, specifically large-scale, publicly available RNA-seq data. [00:05:01] Speaker A: RNA-seq. Tell us a bit more about why that's such a game changer here. [00:05:05] Speaker B: Sure. Compared to older methods like microarrays, RNA-seq gives you much better sensitivity, it's more reproducible, and it can measure a much wider range of gene activity levels. [00:05:14] Speaker A: So it gives you a much clearer, more detailed picture. [00:05:16] Speaker B: Exactly. A really detailed snapshot of which genes are switched on or off in a particular disease state. It's ideal for mapping out these gene expression signatures. [00:05:25] Speaker A: And they didn't just look at one or two diseases. This was ambitious. They analyzed data from over 2,700 samples covering 45 different human diseases. A huge undertaking. And their method was pretty smart. The first step was the gene expression analysis. [00:05:40] Speaker B: Okay. [00:05:40] Speaker A: They took the raw RNA-seq data, the gene counts, processed it all uniformly, did a lot of quality checks, normalization, the works. [00:05:48] Speaker B: Standard stuff to make sure the data are reliable. [00:05:50] Speaker A: Right. Then they identified what they call significantly differentially expressed genes. [00:05:56] Speaker B: SDEGs for short, meaning genes that were much more active or much less active in people with the disease compared to healthy people. [00:06:04] Speaker A: Precisely. Those are the key players in each disease's molecular signature. [00:06:08] Speaker B: Okay, so you have this list of important genes for each disease, but how do you know what they actually do, biologically speaking? [00:06:14] Speaker A: Ah, that's the next crucial step: functional enrichment analysis. They didn't just stop at the gene list. They used methods like gene set enrichment analysis, or GSEA. This involves checking those lists of SDEGs against big databases of known biological pathways, things like Reactome, KEGG and Gene Ontology.
[00:06:32] Speaker B: So it's like connecting the dots, seeing if the active genes in a disease are all involved in, say, the immune system or metabolism. [00:06:41] Speaker A: Exactly. It lets you see the bigger picture: which biological processes are really being disrupted in each disease. Moving from just listing the words, the genes, to understanding the sentences and paragraphs, the pathways. [00:06:54] Speaker B: That makes a lot of sense. Okay, so they've got this incredibly rich molecular data and an understanding of the pathways involved. How did they use that to map the relationships between diseases? You mentioned two network models. [00:07:06] Speaker A: Yes, two key models. The first one was the disease similarity network, or DSN. [00:07:10] Speaker B: DSN. Okay. [00:07:11] Speaker A: Think of it as a map. It connects diseases based on how similar their gene expression profiles are. [00:07:16] Speaker B: So if two diseases have lots of the same genes going up or down, the same pathways being affected, they get... [00:07:21] Speaker A: ...a strong link in the network. It shows they share a kind of molecular fingerprint. That was the foundational map. [00:07:27] Speaker B: Okay, that seems logical. But then they did something really novel, introducing this concept of metapatients. Tell us about that, and about the second network, the SSN. [00:07:36] Speaker A: Right. The metapatient idea is, I think, really where they broke new ground, and it led to the stratified similarity network, the SSN. [00:07:43] Speaker B: So what is a metapatient? It sounds intriguing. [00:07:45] Speaker A: It tackles a fundamental problem. Patients with the same disease diagnosis can actually be very different at the molecular level. Think about breast cancer. It's not just one thing. [00:07:55] Speaker B: Right, right. There are different types: ER-positive, triple-negative.
[00:07:58] Speaker A: Exactly. So instead of lumping all breast cancer patients together, they used clever computer algorithms, clustering methods, to find subgroups within that disease. [00:08:07] Speaker B: Subgroups based on... [00:08:09] Speaker A: Based on having really similar gene expression patterns. These molecularly defined subgroups are what they called metapatients. [00:08:17] Speaker B: Ah, okay, so it's like refining the categories. Instead of just disease A, you have a disease A type 1 profile and a disease A type 2 profile, based on the actual gene activity. [00:08:27] Speaker A: Precisely. It captures that hidden heterogeneity. [00:08:29] Speaker B: And how did that change the network, the SSN? [00:08:32] Speaker A: The SSN took this much more detailed, granular information and built upon the DSN. It didn't just look for links between whole diseases anymore; it looked deeper. It looked for similarities between specific metapatient groups from different diseases, or even between a whole disease and a specific metapatient group from another disease. Much more nuanced. [00:08:53] Speaker B: Wow, okay, that sounds much more powerful. But how did they check whether these networks, especially the complex SSN, were actually reflecting reality and not just finding patterns in the data? [00:09:06] Speaker A: Great question. Validation was key. They compared the links they found in their DSN and SSN against established epidemiological networks, like ones based on those big health record studies we talked about, to see... [00:09:18] Speaker B: ...if the molecular links matched the real-world co-occurrences. [00:09:22] Speaker A: Exactly. And they compared against other types of molecular networks too. It was about seeing whether their findings lined up with what we already know from different lines of evidence. [00:09:31] Speaker B: Makes sense. And they even built a tool for people to explore this.
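The metapatient step described in this part of the conversation can be sketched in a few lines of Python. The conversation only says the authors used clustering methods; this sketch swaps in a plain k-means with a naive deterministic start, applied to made-up pathway scores for four patients of a hypothetical disease X, and then correlates each metapatient's average profile with a hypothetical disease Y. None of the values, names or the choice of k come from the paper.

```python
import math

# Hypothetical pathway scores (4 pathways) for four patients with disease X.
PATIENTS_X = [
    [2.0, 1.8, -1.0, 0.1],
    [-1.9, -2.1, 1.2, 0.0],
    [2.1, 2.0, -0.9, 0.2],
    [-2.0, -1.8, 1.0, -0.1],
]
# Hypothetical average profile of disease Y over the same pathways.
DISEASE_Y = [1.5, 1.7, -0.8, 0.0]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(rows, k, iterations=20):
    """Plain k-means; seeds centroids with the first k rows for reproducibility."""
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each patient to the nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(row, centroids[c]))
            for row in rows
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels, centroids


if __name__ == "__main__":
    labels, metapatients = kmeans(PATIENTS_X, k=2)
    print("metapatient labels:", labels)
    for i, profile in enumerate(metapatients):
        print(f"metapatient {i} vs disease Y: r = {pearson(profile, DISEASE_Y):+.2f}")
```

Here the two metapatients of disease X point in opposite directions relative to disease Y: one correlates strongly and positively, the other strongly and negatively. Averaging all four patients would largely cancel these opposite signals, which illustrates how treating a disease as one uniform block can hide subgroup-specific links.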
[00:09:34] Speaker A: They did. An interactive web app lets anyone dive in and explore these disease connections and the underlying molecular pathways. We'll definitely put a link to that in the description. Pretty cool. [00:09:43] Speaker B: Fantastic. Okay, so the methodology is clear: using RNA-seq, identifying pathways, building these networks and introducing metapatients. Let's get to the payoff. What did they actually find? What were the big revelations? [00:09:55] Speaker A: The results were, frankly, quite stunning. Let's start with that first network, the DSN. [00:10:00] Speaker B: The one based just on overall disease profiles. [00:10:02] Speaker A: Right. Just using gene expression similarity, the DSN managed to capture 46.2% of known, medically established disease co-occurrences. [00:10:11] Speaker B: 46.2%? You said earlier the older methods were down around 8% or 16%. [00:10:16] Speaker A: Exactly. So this was a massive jump. It strongly suggested that a large chunk of why certain diseases occur together has a real, detectable basis in shared molecular activity. Much more than we could see before. [00:10:28] Speaker B: That's incredible. What kind of molecular mechanisms were driving these connections? Were there common themes? [00:10:33] Speaker A: Oh, yeah. One theme stood out dramatically: the immune system. [00:10:36] Speaker B: Really? How so? [00:10:38] Speaker A: Get this. 95.2% of the disease links they captured involved at least one shared overexpressed immune system pathway. [00:10:44] Speaker B: 95%? [00:10:45] Speaker A: And on average, linked diseases shared over 21 altered immune pathways. It paints a picture where immune system dysfunction is like a central hub connecting many different conditions. [00:10:58] Speaker B: Wow. So inflammation and immunity are key players. Were other pathways involved too? [00:11:02] Speaker A: Definitely. Things related to the extracellular matrix, the scaffolding around cells. Metabolism and cell signaling pathways were also very commonly shared, involved in over 90% of the connections.
[00:11:13] Speaker B: So it's like shared biological machinery going wrong in similar ways. Can you give some examples of diseases the DSN linked together? [00:11:21] Speaker A: Sure. It picked up connections you might expect, like different inflammatory bowel diseases grouping together, or similarities between lung cancer and liver cancer. [00:11:29] Speaker B: Okay, makes sense. [00:11:30] Speaker A: But it also caught connections that are clinically known but maybe less obvious molecularly at first glance, like Kaposi's sarcoma and HIV. [00:11:39] Speaker B: Ah, right, because HIV weakens the immune system, allowing Kaposi's sarcoma to develop. [00:11:43] Speaker A: Exactly. So the network was capturing these biologically meaningful links, both the straightforward ones and the more complex ones. [00:11:49] Speaker B: That covers diseases that do co-occur. What about the flip side? Did the DSN find diseases that seem to push each other away, that don't tend to occur together? [00:11:58] Speaker A: Yes, and this is just as insightful. They found significant negative correlations, too. [00:12:04] Speaker B: Like what? [00:12:05] Speaker A: Well, a really interesting example was Huntington's disease. It showed consistent negative links with several types of cancer: liver, lung and breast cancer. [00:12:15] Speaker B: So people with Huntington's were less likely to get these cancers? [00:12:18] Speaker A: The data suggest that. And when they looked at the molecules, it made sense. Cancers usually involve cells dividing too much, right? Overactive cell cycle genes, lots of gene transcription. [00:12:28] Speaker B: Right. [00:12:28] Speaker A: Huntington's, on the other hand, showed the opposite molecular signature in many ways: increased cell death, problems with mitochondria and, crucially, negative regulation of gene transcription.
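The negative-link reasoning around Huntington's disease comes down to counting how many shared pathways move in opposite directions. Below is a minimal Python sketch, assuming each disease is reduced to a mapping from pathway name to direction of change (+1 up, -1 down). The directions are illustrative placeholders loosely following the description in the conversation, not values from the paper, and the 0.5 cut-off is a hypothetical rule for flagging a pair as antagonistic.

```python
# Illustrative pathway directions (+1 = up, -1 = down); not data from the paper.
CANCER = {
    "cell_cycle": +1,
    "gene_transcription": +1,
    "apoptosis": -1,
    "immune_response": +1,
}
HUNTINGTON = {
    "cell_cycle": -1,
    "gene_transcription": -1,
    "apoptosis": +1,
    "immune_response": +1,
    "mitochondrial_function": -1,
}


def opposite_fraction(sig_a, sig_b):
    """Fraction of pathways altered in both diseases that change in opposite directions.

    Assumes every direction is either +1 or -1, so unequal means opposite.
    """
    shared = set(sig_a) & set(sig_b)
    if not shared:
        return 0.0
    opposite = sum(1 for pathway in shared if sig_a[pathway] != sig_b[pathway])
    return opposite / len(shared)


def link_type(sig_a, sig_b, cutoff=0.5):
    """Hypothetical rule: call the pair antagonistic if most shared pathways oppose."""
    return "negative" if opposite_fraction(sig_a, sig_b) > cutoff else "positive"


if __name__ == "__main__":
    print(opposite_fraction(CANCER, HUNTINGTON))  # 3 of 4 shared pathways oppose
    print(link_type(CANCER, HUNTINGTON))
```

Pathways altered in only one disease (here `mitochondrial_function`) do not count toward the fraction, which mirrors the "around 85% of the pathways they shared" phrasing in the conversation.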
[00:12:39] Speaker B: So, completely opposing molecular forces at play. [00:12:43] Speaker A: Pretty much. Around 85% of the pathways they shared showed opposite directions of change. It suggests a kind of molecular antagonism between the conditions. [00:12:51] Speaker B: Fascinating. A biological reason why they might not coexist easily. Okay, but then came the metapatients and the SSN. You said that unlocked even more. [00:13:01] Speaker A: It really did. The stratified similarity network, the SSN, the one using metapatients, pushed the boundary even further. It successfully captured 64.1% of known comorbidities. [00:13:11] Speaker B: 64%. So going from 46% with the DSN to over 64%, just by accounting for patient subgroups. [00:13:18] Speaker A: Yep, a huge jump: almost 18 percentage points more recall. It really drives home how important patient stratification is. You miss a lot of connections if you treat diseases as uniform blocks. [00:13:29] Speaker B: Because different subgroups within the same disease can have totally different risk profiles for other conditions. [00:13:34] Speaker A: Exactly. Let's go back to breast cancer for an example. The SSN showed that a negative link with multiple sclerosis seems specific to patients in the ER-positive and ER-negative... [00:13:45] Speaker B: ...metapatient groups, but not others, like triple-negative? [00:13:48] Speaker A: Apparently not in this analysis. And conversely, positive links with things like autism or bipolar disorder were mainly seen in the triple-negative and ER-negative metapatients. [00:13:56] Speaker B: Wow. So your specific type of breast cancer, down to the molecular level, could really influence your likelihood of developing other, seemingly unrelated conditions. [00:14:06] Speaker A: That's what the data strongly suggest. It's not just one disease interacting with another; it's specific molecular subtypes interacting. [00:14:13] Speaker B: That's incredibly precise. And this wasn't just about refining known links, right? Did the SSN uncover potentially new connections, things maybe underdiagnosed or overlooked?
[00:14:23] Speaker A: Absolutely. That was another major outcome. It started highlighting connections that might not be widely recognized yet. [00:14:29] Speaker B: Such as? [00:14:30] Speaker A: Well, things like links between COPD and schizophrenia, or celiac disease and asthma or lupus, and muscular dystrophy showing molecular connections to cardiomyopathy or MS. [00:14:40] Speaker B: Okay, some of those are maybe less surprising than others, but seeing the molecular basis is key. [00:14:45] Speaker A: Right. And take Down syndrome. The SSN confirmed known links, like with autism, but it also highlighted the increased risk of childhood leukemia alongside a decreased risk of many solid tumors. [00:14:57] Speaker B: That dual risk profile, again, really complex. [00:15:00] Speaker A: Very complex. And the SSN helped map the molecular patterns underlying both sides of that coin. [00:15:06] Speaker B: So how sure were they about these new potential links? Did they check them out further? [00:15:11] Speaker A: They did. They went back to the literature and used other epidemiological databases for comparison. And after that extra validation, the precision of their predicted links shot up to about 76.6%. [00:15:23] Speaker B: Meaning over three quarters of the connections their SSN predicted seemed to be real, validated comorbidities. [00:15:30] Speaker A: Exactly. Very high confidence, providing strong molecular evidence for these relationships, both known and newly suggested. [00:15:37] Speaker B: Oh, okay. This is huge. Stepping back, what's the big picture here? What does this mean for medicine, for understanding health, for, well, for us? [00:15:45] Speaker A: I think the most profound implication is seeing this really strong agreement between the molecular networks and the real-world epidemiological patterns.
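The recall and precision figures quoted in the episode can be unpacked with a little arithmetic. Below is a sketch in Python, assuming links are unordered disease pairs, recall is the share of known comorbidities a network recovers, and precision is the share of predicted links that are supported. The toy disease pairs are hypothetical; 46.2% and 64.1% are the DSN and SSN recall values quoted in the episode.

```python
def recall_precision(predicted, known):
    """Recall and precision of predicted disease pairs against known comorbidities.

    Pairs are compared as unordered sets, so ("A", "B") matches ("B", "A").
    Assumes both collections are non-empty.
    """
    predicted = {frozenset(pair) for pair in predicted}
    known = {frozenset(pair) for pair in known}
    hits = predicted & known
    return len(hits) / len(known), len(hits) / len(predicted)


# Hypothetical toy data.
KNOWN = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
PREDICTED = [("B", "A"), ("A", "C"), ("A", "D")]

# Recall values quoted in the episode, in percent.
DSN_RECALL, SSN_RECALL = 46.2, 64.1
# Absolute gain in percentage points, and gain relative to the DSN's recall.
GAIN_POINTS = round(SSN_RECALL - DSN_RECALL, 1)
GAIN_RELATIVE = round((SSN_RECALL - DSN_RECALL) / DSN_RECALL * 100, 1)

if __name__ == "__main__":
    recall, precision = recall_precision(PREDICTED, KNOWN)
    print(f"toy recall = {recall:.2f}, toy precision = {precision:.2f}")
    print(f"SSN vs DSN: +{GAIN_POINTS} percentage points (+{GAIN_RELATIVE}% relative)")
```

The SSN's gain over the DSN is therefore about 17.9 percentage points in absolute terms, or roughly 39% relative to the DSN's recall. In the toy example, two of the four known pairs are recovered (recall 0.5) and two of the three predictions are supported (precision about 0.67).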
[00:15:53] Speaker B: And not just numbers matching up. [00:15:54] Speaker A: No, it strongly suggests that these shared molecular mechanisms aren't just associated with comorbidities; they are often fundamentally driving them. [00:16:01] Speaker B: So we're moving beyond correlation to potentially understanding causation. [00:16:05] Speaker A: We're getting much closer. It could be shared underlying problems, or one disease literally causing molecular changes that lead to another, or maybe some third factor influencing both. But the molecular link is key. [00:16:19] Speaker B: And this idea of metapatients, understanding the molecular variation within a disease, sounds like the core of personalized medicine, doesn't it? [00:16:28] Speaker A: Absolutely. It's a massive argument for it. If we can understand a patient's specific molecular subtype, their metapatient profile, we could... [00:16:36] Speaker B: ...potentially predict which other conditions they're more likely to develop. [00:16:40] Speaker A: Yes, and develop targeted prevention strategies or treatments. Imagine tailoring care based not just on the disease label, but on the patient's unique molecular landscape. That could lead to much earlier interventions and much more effective management. [00:16:53] Speaker B: Proactive, precise, personalized. [00:16:55] Speaker A: That's the goal. [00:16:56] Speaker B: Now, this is obviously incredible work, but like all science, it has limitations, right? Areas where more research is merited? [00:17:02] Speaker A: Of course, and the researchers were clear about this. One big challenge is systematically validating those negative comorbidities, the diseases that avoid each other. We just don't have as much solid epidemiological data focused specifically on that yet. [00:17:17] Speaker B: Okay, that makes sense. What else?
[00:17:19] Speaker A: Well, many of the existing epidemiological datasets used for comparison are based on older populations in industrialized countries. [00:17:27] Speaker B: So we might be missing patterns important for younger people or people in different parts of the world. [00:17:32] Speaker A: Potentially, yes. We need more diverse datasets. And ideally, future studies would have even more detailed patient information: age, sex, treatments received, plus data from multiple tissues, not just one. That would add even more richness. [00:17:47] Speaker B: So still plenty of room to build on this. [00:17:49] Speaker A: Oh, definitely. These aren't dead ends; they're exciting next steps. [00:17:52] Speaker B: Which points to a bright future for this kind of research. [00:17:55] Speaker A: Absolutely. People are already thinking about expanding sample sizes, looking specifically at sex differences, digging into the non-coding parts of the... [00:18:03] Speaker B: ...genome, the dark matter of the genome, kind of. [00:18:06] Speaker A: And integrating other types of omics data, looking at proteins and metabolites, to get a truly complete picture. [00:18:12] Speaker B: And the practical applications? Drug development has huge potential there. [00:18:16] Speaker A: Understanding these shared pathways could help us repurpose existing drugs, finding medicines already approved for one condition that might work for... [00:18:23] Speaker B: ...a related comorbidity, or design totally new drugs that target the core mechanisms linking diseases. [00:18:29] Speaker A: Exactly. It's a roadmap towards smarter, more integrated therapies. So if we boil it all down, the main takeaway is that disease co-occurrences aren't random. They're deeply rooted in shared molecular biology, especially in how our genes are expressed. [00:18:43] Speaker B: And, crucially, by looking deeper, by stratifying patients into these metapatient groups based on their molecular signatures, we get a much... [00:18:51] Speaker A: ...much clearer picture of how diseases relate to each other. And we can even predict new connections with pretty impressive accuracy.
[00:18:58] Speaker B: It really does point towards that future of truly personalized medicine. A future where understanding your specific molecular profile could transform how we prevent, diagnose and treat complex health conditions. What does this mean for how we diagnose and treat complex diseases? Moving from that one-size-fits-all model to something truly tailored, based on the unique molecular symphony happening inside each of us. This episode was based on an open-access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Thanks for listening, and join us next time as we explore more science, base by base.
