Episode Transcript
[00:00:00] Speaker A: Welcome to Base by Base, the papercast that brings genomics to you wherever you are. Thanks for listening and don't forget to follow and rate us in your podcast app.
[00:00:28] Speaker B: I am really looking forward to getting into this one today.
[00:00:31] Speaker A: So today I want to start with a number that should honestly terrify anyone who is waiting for a new medicine.
90%.
[00:00:39] Speaker B: That is a really rough statistic.
[00:00:41] Speaker A: It really is. I mean, that is the failure rate of drug candidates entering clinical trials. Just imagine if an architect built 10 skyscrapers. You know, fully funded the construction, employed thousands of people, and then watched nine of them just collapse into a pile of rubble before a single tenant moved in.
[00:00:59] Speaker B: Uh, it's a staggering inefficiency. And the tragedy isn't just the money lost, you know, it's the time lost for the patients waiting for those drugs.
[00:01:06] Speaker A: Exactly. And the root of the problem isn't that we lack ideas. I mean, we have massive libraries of genetic data, millions of variations linked to diseases. We have become very good at finding the scene of the crime in the genome.
[00:01:17] Speaker B: But we don't have the index.
[00:01:19] Speaker A: We really don't.
We know a genetic variation is present at the crime scene, but we don't know if it's the mastermind, an accomplice, or just some innocent bystander walking their dog.
[00:01:30] Speaker B: Which is a crucial distinction because if you target the bystander with a drug, nothing happens to the actual disease. The building collapses.
[00:01:38] Speaker A: So here's the premise for today's deep dive. What if we could run a clinical trial without dosing a single patient? What if we could use nature's own randomized experiments, spread across 1.2 million people, to pinpoint the exact biological switches that control human health?
[00:01:56] Speaker B: And that brings us to the study we are looking at today. They tested over 30 million potential connections to do exactly that.
[00:02:02] Speaker A: 30 million?
[00:02:03] Speaker B: Yeah. It's a massive undertaking. And they didn't just find new targets. They actually managed to rediscover drugs we already use, but for diseases we never expected them to treat.
[00:02:13] Speaker A: This isn't just some small academic exercise we're talking about, is it?
[00:02:16] Speaker B: Definitely not.
[00:02:17] Speaker A: Today we celebrate the massive collaborative work led by Brian R. Ferolito and his colleagues. This is a heavy hitter lineup, it really is.
[00:02:25] Speaker B: We are talking about a major effort involving the Million Veteran Program, or MVP, the VA healthcare system, the Broad Institute of MIT and Harvard, plus collaborators from the UK Biobank and FinnGen.
[00:02:39] Speaker A: Basically the avengers of biobanking.
[00:02:42] Speaker B: Pretty much. They are integrating data from the largest biobanks on the planet. And that scale is entirely non-negotiable here.
[00:02:49] Speaker A: Why is the scale so important?
[00:02:51] Speaker B: Because when you are trying to cut through the noise of human biology to find a real causal signal, you need hundreds of thousands of participants. Or the math just doesn't work out.
[00:03:01] Speaker A: Okay, let's set the stage on the problem they're actually trying to solve. For those of you listening, we've discussed GWAS, genome-wide association studies, on the show before.
[00:03:10] Speaker B: Right. We've gotten really efficient at scanning the genome and finding those dots that light up for certain diseases.
[00:03:15] Speaker A: So we know where to look.
[00:03:16] Speaker B: Generally, we know the neighborhood. We can say people with this genetic marker are statistically more likely to have high blood pressure.
[00:03:22] Speaker A: But that's the crucial gap, isn't it?
[00:03:24] Speaker B: Yeah, because GWAS doesn't give you the mechanism. It tells you there is an association, but it doesn't tell you what to do about it.
[00:03:30] Speaker A: Because a genetic marker is just a signpost.
[00:03:33] Speaker B: Exactly. It doesn't tell you if the gene is cranking up a protein or shutting it down entirely.
[00:03:37] Speaker A: And if you're making a drug, you absolutely need to know that.
[00:03:41] Speaker B: You do. Do I block this pathway or do I boost it?
If you guess wrong, you just spent a billion dollars to make a drug that does the exact opposite of what you wanted.
[00:03:50] Speaker A: Which is an incredibly expensive coin flip.
[00:03:52] Speaker B: It is.
So to solve this, we have to integrate GWAS with omics, specifically transcriptomics, which is the RNA, or the recipe, and proteomics, the actual proteins, or the cake.
[00:04:05] Speaker A: Moving from association to causality.
[00:04:07] Speaker B: That is the whole mission of this deep dive. Harmonizing all that data to prioritize drug targets that have actual genetic validation.
[00:04:15] Speaker A: The statistic that jumped out at me from the reading was about the success rate when you actually have that validation.
[00:04:20] Speaker B: It's wild. Historically, if a drug target has genetic evidence backing it up, it is twice as likely to succeed in clinical trials compared to one that doesn't.
[00:04:30] Speaker A: Doubling your odds in a 90% failure industry is huge.
[00:04:34] Speaker B: It's a total game changer.
[00:04:36] Speaker A: So let's unpack the how. How do you physically do this without running a trial? I guess it starts with the data set size.
[00:04:42] Speaker B: The scale is the engine here. They meta-analyzed over 1.2 million individuals across 2,003 different phenotypes,
[00:04:51] Speaker A: which is just scientific shorthand for traits or diseases.
[00:04:54] Speaker B: Right. And the methodology they used is Mendelian randomization, or MR.
[00:04:58] Speaker A: I've heard this described as nature's clinical trial, but that can sound a bit, you know, abstract. How does it actually work in practice?
[00:05:05] Speaker B: Well, think about a standard clinical trial. A researcher flips a coin to decide if you get the drug or the placebo.
[00:05:11] Speaker A: And that randomization is magic because it removes bias.
[00:05:14] Speaker B: Exactly. It ensures that the two groups are identical, except for that one drug. It prevents things like, oh, the people taking the drug also happen to exercise more from ruining your data.
[00:05:24] Speaker A: So you know that the result is because of the drug, not because one group was younger or ate healthier.
[00:05:29] Speaker B: Right. Now, in Mendelian randomization, nature flips the coin for us
[00:05:33] Speaker A: at conception, when you inherit your DNA.
[00:05:36] Speaker B: Exactly. It's completely random. Some people inherit a variant that naturally raises the level of a specific protein in their blood just a little bit higher than average,
[00:05:46] Speaker A: while others inherit a variant that lowers it.
[00:05:49] Speaker B: Precisely. Okay, so these people have essentially been on a natural dose of that protein their entire lives.
[00:05:56] Speaker A: Wow.
[00:05:56] Speaker B: Yeah. So if we look at the people who naturally have higher levels of protein X because of their genetics, and we see they have a significantly lower risk of heart disease,
[00:06:05] Speaker A: we can infer causality.
[00:06:08] Speaker B: Yes. We can say protein X likely protects against heart disease and it cuts right through the confusion of lifestyle factors. Because your genes don't care if you smoke or jog. They are fixed at birth.
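The protein X reasoning above can be sketched in a few lines of Python. This uses the Wald ratio, a standard single-variant MR estimator; the episode does not say which estimator the authors used, so treat that choice as an assumption. The function name and every number below are hypothetical, chosen only for illustration.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Single-variant MR estimate of the exposure's effect on the outcome.

    beta_exposure: variant's effect on the exposure (e.g. protein level)
    beta_outcome:  variant's effect on the outcome (e.g. log-odds of disease)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_outcome / beta_exposure
    # First-order approximation: ignores uncertainty in beta_exposure.
    se = se_outcome / abs(beta_exposure)
    z = estimate / se
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value

# Hypothetical variant: raises "protein X" by 0.30 SD per allele and
# lowers the log-odds of heart disease by 0.06 per allele.
estimate, se, p = wald_ratio(beta_exposure=0.30, beta_outcome=-0.06, se_outcome=0.01)
print(f"effect per SD of protein X: {estimate:.2f} (SE {se:.3f}, p = {p:.2e})")
```

A negative estimate here would read as "more protein X, less heart disease", which is the protective pattern the hosts describe.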
[00:06:19] Speaker A: But they didn't just look at the genome and the disease, they looked at the steps in between.
[00:06:22] Speaker B: Yes. This is where those instruments come in. You can't just draw a straight line from gene to disease. You need the bridge.
[00:06:28] Speaker A: Right.
[00:06:29] Speaker B: So they looked at eQTLs, which are expression quantitative trait loci. Those capture how much RNA is being produced.
[00:06:35] Speaker A: The instructions.
[00:06:36] Speaker B: Exactly. And they also looked at pQTLs, protein quantitative trait loci, which capture how much of the protein is actually floating in the blood.
[00:06:44] Speaker A: They pulled this from massive databases too, didn't they? Like GTEx, ARIC, deCODE.
[00:06:49] Speaker B: Yes. Those are basically the gold standards for this kind of molecular data.
[00:06:52] Speaker A: And the computational lift here must have been absurd.
[00:06:55] Speaker B: It was. They performed two-sample MR on every single combination.
That is 31.5 million unique gene-trait associations tested.
[00:07:05] Speaker A: 31.5 million tests. I mean, if you did that manually, you'd be done in about 3,000 years, easily.
[00:07:11] Speaker B: But you can't just trust every result a computer spits out. Obviously, there is noise in the data, tons of noise.
[00:07:16] Speaker A: That's why they applied strict filtering. They look for concordance.
[00:07:19] Speaker B: Right. Imagine you have three witnesses to a crime. One is the RNA data, one is the protein data from the UK and one is protein data from Iceland.
[00:07:28] Speaker A: And if one says the suspect is tall and the other says the suspect is short, you throw the case Out.
[00:07:33] Speaker B: You only want the cases where everyone agrees on the description.
They only kept signals where different data sources agreed on the direction. Like does more protein equal more disease?
[00:07:45] Speaker A: If the data conflicted, they tossed it.
[00:07:47] Speaker B: Exactly. They needed robust concordant signals.
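The three-witness filter described above can be sketched as a sign check across data sources. This is a minimal illustration, not the authors' pipeline: the function name, the source labels, the gene names, and the effect sizes are all hypothetical.

```python
def concordant_direction(effects):
    """Return +1 or -1 if every source agrees on the sign of the effect,
    otherwise None. `effects` maps a source name to an MR effect estimate."""
    signs = {1 if e > 0 else -1 for e in effects.values() if e != 0}
    return signs.pop() if len(signs) == 1 else None

# Hypothetical gene-trait pairs with effect estimates from three "witnesses".
candidates = {
    ("GENE_A", "LDL cholesterol"): {"eqtl": 0.21, "pqtl_uk": 0.18, "pqtl_iceland": 0.25},
    ("GENE_B", "heart failure"): {"eqtl": 0.10, "pqtl_uk": -0.07, "pqtl_iceland": 0.12},
}

# Keep only the pairs where all sources point the same way.
kept = {}
for pair, effects in candidates.items():
    direction = concordant_direction(effects)
    if direction is not None:
        kept[pair] = direction

print(kept)
```

GENE_B is dropped because one witness disagrees on direction, which mirrors the "throw the case out" rule from the conversation.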
[00:07:51] Speaker A: Now this is where it gets really cool for me. They didn't just stop at finding links. They basically built a drug hunter robot using machine learning.
[00:08:00] Speaker B: They used XGBoost, which is a gradient boosting algorithm.
But the tech itself isn't as important as their strategy.
[00:08:07] Speaker A: What do you mean by their strategy?
[00:08:08] Speaker B: They needed a truth set. So they went to the ChEMBL database and pulled a list of all approved, successful drugs.
[00:08:15] Speaker A: So they gave the computer the answer key?
[00:08:17] Speaker B: In a way, yeah. They told the model, here is what a winner looks like genetically.
These are the genetic patterns of targets that actually became drugs. And then now look at our 31 million new test results and rank them based on how much they resemble these known winners.
[00:08:32] Speaker A: It's like training a sniffer dog. You give it the scent of the target and send it into the woods. So what did the dog find out of those 31 million?
[00:08:39] Speaker B: They narrowed it down to 69,669 gene-trait pairs with strong causal evidence.
[00:08:45] Speaker A: That's a massive reduction.
[00:08:47] Speaker B: It is. And the statistical bar was incredibly high. A p-value less than 1.6 times 10 to the negative 9,
[00:08:53] Speaker A: which is, well, practically zero.
[00:08:57] Speaker B: It means there is virtually zero chance of it just being a coincidence.
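For a sense of where a bar like that can come from, here is the arithmetic for a Bonferroni correction over the number of tests mentioned in the episode. That the authors used exactly this correction, and a starting alpha of 0.05, is an assumption; the episode only states the threshold and the test count.

```python
alpha = 0.05            # assumed family-wise error rate
n_tests = 31_500_000    # gene-trait associations tested, per the episode

# Bonferroni: divide the overall error budget evenly across all tests.
threshold = alpha / n_tests
print(f"{threshold:.2e}")  # prints 1.59e-09
```

Rounded to two significant figures, that lands on the 1.6 times 10 to the negative 9 figure quoted above.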
[00:09:00] Speaker A: Okay, but I am going to play the skeptic here for a second.
Anyone can generate a list of 70,000 things and say these are important. How do we know this method actually works in the real world?
[00:09:11] Speaker B: That is the validation step. And it's crucial. To prove the model works, they checked if it could find drugs we already have. It's called rediscovery.
[00:09:20] Speaker A: So if the model is smart, it should be able to look at the data and say, hey, this HMGCR gene looks like a great target for cholesterol.
[00:09:27] Speaker B: Which we know is true because that's what statins target. And it did exactly that. It rediscovered 9% of all approved drug targets purely through this blind data analysis.
[00:09:37] Speaker A: 9%? Initially, that sounds kind of low. I feel like if I built a robot to find cars and it only found 9% of the cars in the parking lot, I'd be pretty worried.
[00:09:44] Speaker B: But context is everything here.
Remember, they are looking at the entire genome blindly.
[00:09:50] Speaker A: Right?
[00:09:50] Speaker B: They aren't looking in a parking lot. They're looking at the whole city from space.
Finding 9% of the entire pharmacopoeia without knowing what you are looking for is actually wildly impressive.
[00:10:02] Speaker A: That makes a lot of sense when you put it that way.
[00:10:04] Speaker B: Plus, look at where it succeeded. It was incredibly good at finding cardiovascular targets because we have excellent data on lipids and blood pressure.
[00:10:13] Speaker A: And less good at cancer.
[00:10:14] Speaker B: Right. Cancer is complex, it's localized, and the datasets just aren't as robust for those specific pathways yet.
[00:10:21] Speaker A: But there's a kicker regarding the mechanism, right?
[00:10:23] Speaker B: Oh, definitely. For the drugs it did rediscover, the genetics correctly predicted the mechanism of action, meaning whether it's an inhibitor or an activator, 84% of the time.
[00:10:34] Speaker A: That brings us back to that expensive coin flip. If this method tells you to block a protein, you can be 84% confident that blocking is the right move, not boosting.
[00:10:42] Speaker B: Exactly. That saves years of failed experiments. It provides directional certainty, which is just invaluable.
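The block-versus-boost logic can be written down directly: the sign of the genetically predicted effect suggests the mechanism, and comparing that to approved drugs gives an agreement rate like the 84% quoted above. This is a hypothetical sketch; the function names and the four example targets are invented, and they do not reproduce the study's data.

```python
def suggested_mechanism(effect_on_disease):
    """Map the sign of a genetically predicted effect to a drug mechanism.

    Positive: more of the protein means more disease, so inhibit it.
    Negative: more of the protein means less disease, so activate it.
    """
    if effect_on_disease > 0:
        return "inhibitor"
    if effect_on_disease < 0:
        return "activator"
    return "undetermined"

def agreement_rate(pairs):
    """Fraction of (effect, known_mechanism) pairs where the prediction matches."""
    hits = sum(suggested_mechanism(effect) == known for effect, known in pairs)
    return hits / len(pairs)

# Hypothetical rediscovered targets: (MR effect on disease, approved drug's mechanism).
rediscovered = [
    (0.4, "inhibitor"),
    (0.2, "inhibitor"),
    (-0.3, "activator"),
    (0.1, "activator"),  # genetics and the approved drug disagree here
]
print(agreement_rate(rediscovered))  # prints 0.75
```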
[00:10:49] Speaker A: Let's talk about the treasure hunt aspect of this: repurposing. Finding old drugs that can learn new tricks. This seems like the absolute lowest-hanging fruit for pharma.
[00:10:59] Speaker B: It is, because these drugs are already proven safe in humans. The study found 3,364 potential repurposing opportunities.
[00:11:08] Speaker A: Give us the highlights. What really stood out to you?
[00:11:10] Speaker B: Well, take metformin. It's the frontline drug for type 2 diabetes. We consume tons of it globally.
[00:11:16] Speaker A: Yeah.
[00:11:16] Speaker B: The data suggests it has a causal link to reducing atrial fibrillation.
[00:11:20] Speaker A: Afib. That's a heart rhythm issue that feels pretty disconnected from blood sugar.
[00:11:24] Speaker B: You would think so, but it suggests there is a metabolic component to heart rhythm that we are underestimating.
[00:11:29] Speaker A: Fascinating.
[00:11:29] Speaker B: Then there is tocilizumab. That's an arthritis drug. It targets the IL-6 receptor to lower inflammation in joints.
[00:11:36] Speaker A: And the model flagged that for afib, too.
[00:11:39] Speaker B: Yes, it did.
[00:11:40] Speaker A: So we are seeing inflammation pathways popping up in heart conditions.
[00:11:43] Speaker B: Precisely. It's blurring the lines between our medical specialties. Inflammation is inflammation, whether it's in your knee or your heart atria.
[00:11:50] Speaker A: The genetics don't care about our medical textbook chapters.
[00:11:52] Speaker B: Exactly. Another really interesting one was anrukinzumab, which is an IL-13 antagonist usually looked at for colitis. The data flagged it as a strong candidate for psoriasis.
[00:12:03] Speaker A: It really emphasizes how connected these immune systems are.
[00:12:06] Speaker B: It does.
[00:12:07] Speaker A: Speaking of the heart, the paper had a specific vignette, or case study, on lipids. That seemed to be the proof of concept for their whole approach.
[00:12:16] Speaker B: Yes, the dyslipidemia dive. They wanted to see if they could Find a brand new way to lower cholesterol using this method.
[00:12:23] Speaker A: And they recovered the known hits, like PCSK9 and HMGCR.
[00:12:27] Speaker B: Right. We know PCSK9 is the bad guy. It destroys the receptors that clear bad cholesterol from your blood. And we already have drugs that inhibit it.
[00:12:34] Speaker A: Okay, so that's the control. It found the stuff. We know. But what was the new find?
[00:12:38] Speaker B: They identified a high-probability target called ANXA2, or annexin A2. This isn't a drug target we currently use.
[00:12:46] Speaker A: What does ANXA2 actually do?
[00:12:48] Speaker B: The mechanism is really elegant. ANXA2 is a natural inhibitor of PCSK9.
[00:12:53] Speaker A: Wait, let me get this straight. So PCSK9 inhibits the cholesterol cleaners?
[00:12:56] Speaker B: Yes, it destroys them.
[00:12:57] Speaker A: And ANXA2 inhibits PCSK9?
[00:13:01] Speaker B: Correct. It's a double negative. ANXA2 inhibits the inhibitor.
[00:13:04] Speaker A: So it's basically taking the brakes off the cleaners.
[00:13:07] Speaker B: That is a perfect way to look at it. So logically, if you can boost ANXA2 or mimic its effects, you stop PCSK9 from doing its damage,
[00:13:16] Speaker A: which lowers the cholesterol.
[00:13:18] Speaker B: Exactly. And the model flagged this as a top tier candidate. It's effectively handing pharma a roadmap for a new class of cholesterol drugs.
[00:13:27] Speaker A: That's the aha moment. Nature essentially already has a drug for high cholesterol inside us. We just need to figure out how to bottle it.
[00:13:34] Speaker B: It validates that the method can find biological logic, not just random statistical spikes.
[00:13:39] Speaker A: So, zooming out a bit. If I am sitting in a boardroom at Pfizer or Merck right now, why does this paper matter to my bottom line?
[00:13:46] Speaker B: It matters because it shifts your betting odds. Instead of guessing based on a mouse model that might not translate to humans at all, you are prioritizing targets that have human genetic validation.
[00:13:56] Speaker A: You are placing smarter bets.
[00:13:58] Speaker B: Exactly. It's an efficiency roadmap.
[00:14:00] Speaker A: But it's not just about finding hits, is it? It's also about avoiding disasters.
[00:14:05] Speaker B: That's the other side of the coin. Safety.
The data reveals risks just as clearly as benefits. For example, the study flagged the target of a drug called trastuzumab.
[00:14:16] Speaker A: That's a breast cancer drug, right? Extremely effective for HER2-positive cancer.
[00:14:21] Speaker B: Yes. But clinically, we know it has a nasty side effect.
Cardiotoxicity. It can cause heart failure.
[00:14:27] Speaker A: And the genetic method picked this up blindly?
[00:14:29] Speaker B: It did. The algorithm flagged the gene associated with trastuzumab as being causally linked to heart failure.
[00:14:36] Speaker A: That is incredible. It implies that if we had run this analysis before the drug was ever invented, we would have known about the heart failure risk.
[00:14:43] Speaker B: Exactly. It allows us to anticipate toxicity. If you see a genetic link to a serious side effect, you design your clinical trial differently.
[00:14:51] Speaker A: You monitor hearts from day one.
[00:14:53] Speaker B: Or if the risk is simply too high, you kill the program before you spend a billion dollars. It turns unforeseen side effects into foreseen risks.
[00:15:01] Speaker A: We do have to be realistic, though. This sounds like a crystal ball, but it can't be perfect. Where does it fail?
[00:15:06] Speaker B: Oh, it is absolutely not perfect.
We mentioned directionality earlier. Getting that block versus boost decision right is tricky, even with that 84% success rate.
[00:15:16] Speaker A: Right.
[00:15:17] Speaker B: But the biggest limitation is tissue specificity.
[00:15:20] Speaker A: Because most of this data comes from blood draws.
[00:15:22] Speaker B: Exactly. The pQTL data, the protein levels, are mostly from plasma. But biology is local.
[00:15:29] Speaker A: What do you mean by local?
[00:15:30] Speaker B: Well, a protein might be doing one thing in your blood, but something totally different inside your brain or your liver.
[00:15:35] Speaker A: Got it.
[00:15:36] Speaker B: So if we are looking for an Alzheimer's drug, a blood protein level might not reflect what is actually happening behind the blood-brain barrier.
[00:15:45] Speaker A: So assuming blood equals body is a dangerous simplification.
[00:15:49] Speaker B: It is. And that's why the authors really emphasized triangulation. They didn't just rely on this one MR method.
[00:15:54] Speaker A: What else did they use?
[00:15:55] Speaker B: They combined the results with databases like OMIM, which tracks rare Mendelian genetic diseases, and mouse knockout databases.
[00:16:02] Speaker A: Mouse knockouts are where they delete a gene in a mouse to see what breaks, right?
[00:16:07] Speaker B: Yes.
The study found that if a gene hits in their MR analysis and shows up in a mouse model, the odds of it being a valid drug target skyrocket.
[00:16:16] Speaker A: It's like building a legal case.
[00:16:18] Speaker B: Exactly. DNA evidence, the MR, is great. But DNA evidence plus a fingerprint, the mouse data, plus an eyewitness, the OMIM data, is unbeatable.
[00:16:29] Speaker A: It moves genomics from a descriptive science, you know, just cataloging a list of genes, to a predictive tool.
It finally answers, what can we actually do with this?
[00:16:40] Speaker B: That is the core shift here. And it leaves us with a really provocative thought. Think about the thousands of failed compounds sitting in pharmaceutical libraries right now.
[00:16:48] Speaker A: The dusty shelves of the Valley.
[00:16:50] Speaker B: Exactly. Compounds that were safe but failed on efficacy. They simply didn't work for the disease they were tested on.
[00:16:56] Speaker A: But maybe they were just tested on the wrong disease.
[00:16:58] Speaker B: Precisely. This method suggests that the failure wasn't necessarily the molecule. It was the question we asked it.
[00:17:04] Speaker A: Wow.
[00:17:05] Speaker B: Maybe that failed asthma drug is actually a breakthrough for heart disease. And we just never looked at the right genetic map to tell us so.
[00:17:11] Speaker A: That is a thought to keep you up at night, in a good way. The cures for diseases we consider incurable might already be sitting in a freezer somewhere, just waiting for the right data to unlock them. What a fascinating place to leave it.
[00:17:22] Speaker B: It really is an exciting time for genomics.
[00:17:24] Speaker A: This episode was based on an Open Access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Now stay with us for an original track created especially for this episode and inspired by the article you've just heard about. Thanks for listening, and join us next time as we explore more science, Base by Base.
[00:18:29] Speaker C: Late nights breast cleans the pipeline homes we stitch the traces where the molecules come 69,669 gene-trait links like constellations over 2000 phenotypes fold into our map Colocalization sharpens aim while it denies some light Instruments are Cuba patterns start to write our ranking Clouds Precision Recall AUC 0.79 Matthew Jeans Light away Turn the noise to the side 6447 voices rising from the data lines nominated targetings ANXA2 for lipids in sight we're not the genesis we bring them into life.
We bring them into life.
Blood signals whisper pleiotropy blurs the trail Harmonize the phenotypes man what might derail the machine Ranks the edges a short list for the brave Repurposing the new paths waiting to be paved A pipeline of discovery charts that glow as well we balance specificity and sensitivity spell from network loops to a prioritized design the future waits where signal meets the sign Maca jeans light the way Turn the noise to science these 447 names rising from the dead alive precision clock still breach the clearer high Wind up the genes and push the dark into light.
And push the dark into light.