Episode Transcript
[00:00:11] Speaker A: Midnight on the monitor Lines in neon haze Millions in the numbers.
[00:00:20] Speaker B: Welcome to Base by Base, the paper cast that brings genomics to you wherever you are. Thanks for listening and don't forget to follow and rate us in your podcast. Appreciate it.
So I want to start by asking a question. What if the key to the next, like, massive blockbuster medical treatment isn't a new synthetic chemical at all?
[00:00:38] Speaker C: Oh, right. What if it's actually finding a completely healthy person who's just walking around with broken DNA?
[00:00:46] Speaker B: Exactly. I mean, it sounds totally contradictory, right? Because we are so conditioned to think that genetic mutations are always the root of disease, not the cure.
[00:00:53] Speaker C: Yeah, we really are. But, you know, identifying people with specific inactivated genes is actually, it's quickly becoming one of the absolute most powerful strategies we have for discovering new therapeutics.
[00:01:05] Speaker B: It really is. And today we celebrate the work of the incredible research team behind a massive new meta-analysis that just dropped in the American Journal of Human Genetics. We are taking a deep dive into the hidden world of what geneticists call human knockouts.
[00:01:19] Speaker C: Right? And the sheer scale of what they've done here is just staggering. I mean, they pooled the genetic and electronic health data of nearly a million people globally to hunt for these exact individuals.
[00:01:30] Speaker B: And by doing that, they are basically untangling the really complex mechanisms behind diseases and traits that affect all of us. Because finding just one person who naturally lacks a functioning gene, well, it answers a huge question for pharmaceutical companies.
[00:01:47] Speaker C: Exactly. It tells them, hey, if we develop a drug to intentionally block this specific genetic pathway in a sick patient, is it actually going to be safe to do that? Because, you know, here's a healthy person who already lives without it, right?
[00:01:59] Speaker B: They act as these naturally occurring in vivo experiments. Like, the paper brings up a couple of amazing real world examples right off the bat. There's this gene called PCSK9.
[00:02:09] Speaker C: Oh, yeah, that's a classic one, right?
[00:02:11] Speaker B: Researchers found people with naturally knocked out versions of PCSK9. And these people had, like, exceptionally low cholesterol, but otherwise they were completely fine.
[00:02:20] Speaker C: Yeah, they were totally healthy. And that single discovery, I mean, that biological proof of concept directly launched an entire class of blockbuster cholesterol-lowering drugs.
[00:02:31] Speaker B: And they also mentioned the HAO1 gene, right? Finding just one healthy adult with a broken HAO1 gene gave researchers the, well, the green light to target that exact pathway for a really severe kidney condition called primary hyperoxaluria.
[00:02:46] Speaker C: It's just wild how one person's genome can do that. But to actually find these individuals on a global scale, you have to really understand the underlying mechanics of how genes work, specifically the difference between additive and recessive genetic effects.
[00:02:59] Speaker B: Yeah, because a lot of complex genetics is focused on additive effects, Right. Where, like, having one variant bumps your risk up a little bit, and a second variant bumps it up a bit more, it just sort of accumulates.
[00:03:10] Speaker C: But recessive effects are totally.
[00:03:11] Speaker B: It's kind of like the launch protocol on a nuclear submarine.
[00:03:14] Speaker C: Oh, I like that analogy. Yeah.
[00:03:16] Speaker B: Right. To actually execute the launch command, you need two separate officers to turn their totally independent keys at the exact same time. And since we inherit two copies of every gene, you know, one from mom, one from dad, a true knockout requires a recessive effect.
[00:03:32] Speaker C: Exactly. Both copies of the gene have to be broken, because if only one key is turned, the submarine doesn't launch.
The remaining functional copy just picks up the slack, and the body produces enough of the protein to function totally normally.
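To make the additive versus recessive distinction concrete, here is a minimal Python sketch. The genotype strings ("0/0", "0/1", "1/1", where 1 marks a damaging variant) follow a common variant-call convention, and the function names are hypothetical, chosen only for this illustration rather than taken from the paper.

```python
def additive_dosage(genotype: str) -> int:
    """Count damaging alleles across both gene copies: 0, 1, or 2."""
    alleles = genotype.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele == "1")


def recessive_indicator(genotype: str) -> int:
    """Return 1 only when both copies carry the variant (both keys turned)."""
    return 1 if additive_dosage(genotype) == 2 else 0


genotypes = ["0/0", "0/1", "1/1"]
additive = [additive_dosage(g) for g in genotypes]       # risk accumulates per copy
recessive = [recessive_indicator(g) for g in genotypes]  # effect only with two hits
```

Under the additive coding a carrier of one copy sits halfway between the extremes, while under the recessive coding a single-copy carrier is treated exactly like a non-carrier.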
[00:03:46] Speaker B: So finding the people where both keys are turned is apparently a massive computational hurdle. I guess the easiest scenario to spot is when a person inherits the exact
[00:03:56] Speaker C: same mutation on both copies, which we call being homozygous. The variant on the maternal chromosome is perfectly identical to the variant on the paternal chromosome. It's a clean match.
[00:04:07] Speaker B: But the paper points out that there's a second, much stealthier way to get a knockout. Right, being compound heterozygous.
[00:04:14] Speaker C: Yeah, that's where it gets really tricky. That's when both copies are broken, but the mutations themselves are totally different from each other.
[00:04:20] Speaker B: Like maybe the mom's copy has a premature stop codon, and the dad's copy has a frameshift mutation or something.
[00:04:28] Speaker C: Exactly. Both of those errors destroy the resulting protein, but they do it in completely different ways.
And identifying those compound heterozygous variants, that is where the math gets incredibly complex.
[00:04:40] Speaker B: Okay, but wait, why is that? Because if you run my genome through a sequencer and it flags two severe mutations in the exact same gene, shouldn't the computer just be able to say, oh, both copies are broken. It's a knockout.
[00:04:55] Speaker C: Well, it would be that simple if standard sequencing read your entire chromosome continuously from end to end. But, you know, it doesn't do that at all.
[00:05:03] Speaker B: Oh, because it chops it up.
[00:05:04] Speaker C: Right, Right. Standard short read sequencing chops your DNA into millions of tiny little fragments, reads them, and then tries to map them back to a reference genome. So when the algorithm spots a mutation over here and another mutation nearby, it completely loses the structural context.
[00:05:20] Speaker B: So it literally can't tell if both mutations are sitting on the exact same chromosome, which. What is that called?
[00:05:25] Speaker C: That's called being in cis configuration. Right.
[00:05:27] Speaker B: In cis or if they are on opposite chromosomes, which would be in trans.
[00:05:32] Speaker C: Exactly. And that distinction is everything because if both mutations are in cis, like they're both just sitting on the maternal chromosome, well, the paternal chromosome is still perfectly intact.
[00:05:42] Speaker B: Ah, so the submarine doesn't launch. The gene still works. You only have a true knockout if the mutations are in trans.
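The cis versus trans logic described here can be sketched in a few lines of Python, assuming the genotypes have already been phased. The "a|b" notation (left of the bar is one parental chromosome, right is the other, 1 marks a damaging variant) and the `classify_gene` helper are assumptions made for illustration, not the paper's pipeline.

```python
from typing import List


def classify_gene(phased_genotypes: List[str]) -> str:
    """Classify one gene in one person from phased damaging-variant genotypes."""
    first_copy_hit = False   # any damaging variant on the first chromosome
    second_copy_hit = False  # any damaging variant on the second chromosome
    homozygous = False       # same variant present on both chromosomes

    for genotype in phased_genotypes:
        left, right = genotype.split("|")
        on_first, on_second = left == "1", right == "1"
        if on_first and on_second:
            homozygous = True
        first_copy_hit = first_copy_hit or on_first
        second_copy_hit = second_copy_hit or on_second

    if homozygous:
        return "homozygous knockout"
    if first_copy_hit and second_copy_hit:
        return "compound heterozygous knockout (trans)"
    if first_copy_hit or second_copy_hit:
        return "not a knockout (one copy intact)"
    return "no damaging variant"
```

For example, `classify_gene(["1|0", "0|1"])` reports a compound heterozygous knockout, while `classify_gene(["1|0", "1|0"])` reports that one copy is still intact, because both variants sit on the same chromosome.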
[00:05:49] Speaker C: You nailed it. And since the sequencer jumbles all the fragments, you have to perform this process called phasing, basically reconstructing the parentage of the individual chromosomes.
[00:06:00] Speaker B: And historically, wouldn't you just have to sequence the person's parents to figure that out?
[00:06:04] Speaker C: You would, but doing that on a massive scale is impossible. So the researchers here use this really advanced technique called statistical phasing. Instead of needing the parents' DNA, they use massive reference populations and algorithms.
[00:06:18] Speaker B: How does that actually work though, like mathematically?
[00:06:20] Speaker C: Well, the algorithms look for these established patterns of genetic inheritance. They're known as haplotype blocks within global populations. And by comparing the fragmented sequencing data against those known patterns, they use probability to infer which mutation is sitting on which chromosome.
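As a toy illustration of that probabilistic idea, the sketch below takes a person who is heterozygous at two sites and asks which arrangement a reference panel makes more plausible. It is a deliberately simplified two-site model: the panel, the `trans_probability` helper, and the pseudocount are all assumptions for illustration, and production phasing software uses far richer models than this.

```python
from collections import Counter
from typing import Iterable, Tuple

Haplotype = Tuple[int, int]  # alleles at (site 1, site 2) on one chromosome


def trans_probability(reference_haplotypes: Iterable[Haplotype],
                      pseudocount: float = 0.01) -> float:
    """Estimate P(trans) for a double heterozygote from panel haplotype frequencies."""
    counts = Counter(reference_haplotypes)
    total = sum(counts.values()) + 4 * pseudocount

    def freq(haplotype: Haplotype) -> float:
        # Small pseudocount keeps unseen haplotypes from having zero frequency.
        return (counts[haplotype] + pseudocount) / total

    # cis: both variants on one chromosome, the other chromosome clean.
    cis = freq((1, 1)) * freq((0, 0))
    # trans: one variant on each chromosome.
    trans = freq((1, 0)) * freq((0, 1))
    return trans / (cis + trans)


# Hypothetical panel where the two variants never travel together.
panel_apart = [(0, 0)] * 90 + [(1, 0)] * 5 + [(0, 1)] * 5
# Hypothetical panel where the two variants are always inherited together.
panel_together = [(0, 0)] * 90 + [(1, 1)] * 10

p_apart = trans_probability(panel_apart)
p_together = trans_probability(panel_together)
```

When the panel never shows the two variants on the same chromosome, the trans arrangement comes out as far more likely; when they always co-occur, cis dominates.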
[00:06:36] Speaker B: Wow. So they are statistically guessing the structure. And it works because the paper says by applying this across hundreds of thousands of people, they increase their discovery of these true compound knockouts by an impressive 19%.
[00:06:48] Speaker C: Yeah, a 19% bump is huge in this field.
[00:06:51] Speaker B: Yeah.
[00:06:52] Speaker C: But to actually train an algorithm to do that, you need an unprecedented amount
[00:06:56] Speaker B: of data, which they definitely had. They pulled together a staggering collaborative data set. We're talking 948,690 individuals, almost a million genomes.
[00:07:06] Speaker C: It's incredible.
[00:07:07] Speaker B: Across six totally distinct global biobanks, like the UK Biobank, the All of Us Research Program here in the US, Genomics England, Biobank Japan.
And they evaluated all that against 41 distinct clinical traits.
[00:07:22] Speaker C: And what's really fascinating is just the sheer volume of discovery that that kind of scale enables.
[00:07:27] Speaker A: Yeah.
[00:07:27] Speaker C: They ended up identifying 5563 total gene knockouts.
[00:07:32] Speaker B: That's a lot of broken DNA.
[00:07:34] Speaker C: It is. It actually expanded the known universe of entirely knocked out human genes by nearly 20%.
They found 1,767 totally new genes that we didn't even know could be safely inactivated in humans.
[00:07:48] Speaker B: And we have to talk about the demographics of those newly discovered genes, because that is vital. Out of those 1,767 new knockouts, 1,371 of them were found in individuals of non-European ancestries.
[00:08:01] Speaker C: Right. Particularly within South Asian subcohorts. In the data.
[00:08:04] Speaker B: Yeah. And it really just exposes this massive blind spot in historical genetic research. I mean, studying only one population, which historically has almost always been European data, it's like, I don't know, trying to catalog all the rare birds on Earth by exclusively walking through a forest in Germany.
[00:08:20] Speaker C: That is a great way to put it. You're never going to find the species native to the Amazon or the Arctic if you don't look there. Different populations have totally different historical bottlenecks and migration patterns.
[00:08:28] Speaker B: Plus, the paper mentions that in some cultures, there are higher rates of endogamy and consanguinity, right?
[00:08:34] Speaker C: Yeah, absolutely. When parents share a recent common ancestor, it dramatically increases the rate of autozygosity. That's where identical chromosomal segments are inherited
[00:08:44] Speaker B: from both sides, which naturally drives up the occurrence of those homozygous knockouts we were talking about earlier.
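A standard population-genetics formula shows why relatedness matters so much here: for a variant with allele frequency q and an inbreeding coefficient F, the expected frequency of homozygotes is q² + F·q·(1 − q). The sketch below plugs in an illustrative allele frequency of 1 in 1,000 and F = 1/16, the textbook value for children of first cousins; both numbers are assumptions chosen for the example, not figures from the paper.

```python
def homozygous_frequency(allele_frequency: float, inbreeding_coefficient: float) -> float:
    """Expected homozygote frequency: q^2 + F * q * (1 - q)."""
    q = allele_frequency
    f = inbreeding_coefficient
    return q * q + f * q * (1.0 - q)


q = 0.001                                       # illustrative rare-variant frequency
outbred = homozygous_frequency(q, 0.0)          # unrelated parents
first_cousins = homozygous_frequency(q, 1 / 16) # parents are first cousins
enrichment = first_cousins / outbred
```

For rare variants the F term dominates, so even modest relatedness multiplies the number of homozygous carriers many times over.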
[00:08:49] Speaker C: Precisely. It makes sequencing these globally diverse populations mathematically essential.
If you don't do it, the statistical power to find novel drug targets just isn't there.
[00:09:02] Speaker B: So, okay, they assemble this incredibly diverse database. They find over 5,500 broken genes.
The next logical step is seeing what actually happens to the health of the people carrying them.
[00:09:13] Speaker C: Right. They cross-referenced all those genetic knockouts directly with the individuals' electronic health records, and they found 58 significant associations between a broken gene and a specific clinical trait.
[00:09:24] Speaker B: And they were really rigorous about making sure these were true recessive effects, right?
[00:09:28] Speaker C: Very. Out of the 58, they isolated 17 distinct instances where the clinical impact was strictly driven by that two-key knockout mechanism, not just some additive effect masquerading in the data.
[00:09:41] Speaker B: And looking at those 17 associations, the intersection of the genetics and the hospital records is just deeply revealing. Let's talk about the PYGM gene.
[00:09:49] Speaker C: Oh, this is one of my favorite findings in the whole paper.
[00:09:51] Speaker B: It's so wild because when both copies of PYGM are completely knocked out, the data shows a massive statistical association with elevated AST levels. And AST is an enzyme. Right. So if you go to a hospital for routine blood work and your AST is spiking, the standard medical assumption is that your liver is damaged.
[00:10:11] Speaker C: Right. Doctors look at AST and immediately think, hepatic issue. But the thing is, PYGM is famously a muscle gene. It has practically nothing to do with the liver.
[00:10:19] Speaker B: Wait, so if it's a muscle gene, why is it causing a liver enzyme to spike in the hospital records?
[00:10:24] Speaker C: Because of the fundamental limitations of how we code diagnoses. The knockout of the PYGM gene actually causes something called McArdle disease, which is a really rare recessive glycogen storage disorder. Okay, so because the muscles can't properly break down glycogen for energy, physical exertion literally damages the muscle cells. They undergo lysis. They basically break open.
[00:10:45] Speaker B: Ah. And when those muscle cells rupture, they
[00:10:48] Speaker C: release all their AST directly into the bloodstream. So the patient shows up at the clinic, the blood panel flags high AST, and the hurried physician just codes it into the electronic health record as a potential liver problem.
[00:11:00] Speaker B: Wow.
So the symptom is captured totally accurately in the record, but the actual biological mechanism is completely misidentified.
[00:11:07] Speaker C: Exactly. And they observed the exact same phenomenon with a gene called ODAD1 in the health records. People with ODAD1 knockouts frequently carried a diagnosis of COPD, chronic obstructive pulmonary disease, which, you know, is incredibly common.
[00:11:22] Speaker B: But I'm guessing an ODAD1 mutation doesn't actually cause standard, like, smoking-induced COPD.
[00:11:28] Speaker C: Nope. It causes primary ciliary dyskinesia. The microscopic hair-like structures in the respiratory tract, the cilia, they fail to beat properly, so they can't sweep mucus and pathogens out of the lungs.
[00:11:39] Speaker B: Which leads to chronic airway obstruction. Which clinically looks almost identical to standard COPD.
[00:11:45] Speaker C: Right, and if we connect this to the bigger picture, it just fundamentally changes how we view standard medical databases. What this proves is that a significant number of quote unquote common diseases clogging up hospital records are actually rare Mendelian genetic disorders just hiding in plain sight.
[00:12:04] Speaker B: That is mind blowing. They are miscategorized because our diagnostic tools are built around grouping symptoms together rather than identifying the root genetic cause.
[00:12:13] Speaker C: Yeah, and moving from a symptom based diagnostic model to a mechanism based one is going to reveal some incredible biological paradoxes. Like a single genetic mechanism influencing multiple totally contradictory traits.
[00:12:25] Speaker B: Which perfectly sets up the mystery they found with the HBB gene.
[00:12:28] Speaker C: Oh, the HBB paradox. This is fascinating.
[00:12:30] Speaker B: It really is. So the HBB gene is arguably one of the most well known genes in all of genetics. Mutations there are the primary cause of severe red blood cell disorders like sickle cell disease and beta thalassemia.
But when this meta-analysis looked at people with knocked out HBB genes, specifically in the African and South Asian cohorts, a totally counterintuitive cardiovascular profile emerged.
[00:12:55] Speaker C: Very counterintuitive.
[00:12:56] Speaker B: Right. These individuals exhibited significantly lower cholesterol levels and a lower body mass index, which,
[00:13:02] Speaker C: if you just look at those metrics in complete isolation, you might assume the knockout provides Some robust protective benefit for your heart.
[00:13:09] Speaker B: Except they also displayed a dramatically higher risk of heart failure. Now, my immediate thought reading that was, well, aren't the lipid and heart issues simply secondary side effects of suffering from a really severe chronic illness?
[00:13:21] Speaker C: A very logical assumption.
[00:13:22] Speaker B: Right. Because if a patient is battling severe hereditary anemia, their physiological stress is immense. Like malnourishment could cause BMI and cholesterol to drop. And chronic hypoxia could just overwork the heart until it fails.
Isn't the algorithm just picking up the systemic fallout of a known disease?
[00:13:42] Speaker C: The researchers anticipated that exact confounder. So to isolate the direct effect of the gene from the secondary effects of the illness, they performed what's called a conditional analysis.
[00:13:52] Speaker B: What does that mean in this context?
[00:13:53] Speaker C: It means they statistically removed every single individual who carried a clinical diagnosis of sickle cell disease or beta thalassemia from the dataset and just erased them from the pool to see if the cardiovascular associations would vanish.
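The exclusion step described here can be sketched as a simple filter followed by the same comparison. Everything below is a toy: the records, field names, and cholesterol values are invented purely to show the mechanics, and a plain difference in means stands in for the regression-style association test a real study would use.

```python
from statistics import mean
from typing import Any, Dict, List

Record = Dict[str, Any]


def mean_difference(records: List[Record], trait: str) -> float:
    """Mean trait value in knockout carriers minus the mean in everyone else."""
    carriers = [r[trait] for r in records if r["knockout"]]
    others = [r[trait] for r in records if not r["knockout"]]
    return mean(carriers) - mean(others)


def exclude_diagnosed(records: List[Record]) -> List[Record]:
    """Drop everyone who carries the clinical diagnosis being conditioned on."""
    return [r for r in records if not r["diagnosed"]]


# Invented toy records, NOT data from the paper.
toy_records = [
    {"knockout": True, "diagnosed": True, "cholesterol": 3.0},
    {"knockout": True, "diagnosed": False, "cholesterol": 3.4},
    {"knockout": True, "diagnosed": False, "cholesterol": 3.6},
    {"knockout": False, "diagnosed": False, "cholesterol": 5.0},
    {"knockout": False, "diagnosed": False, "cholesterol": 5.2},
]

overall = mean_difference(toy_records, "cholesterol")
conditional = mean_difference(exclude_diagnosed(toy_records), "cholesterol")
```

If the difference survives after the diagnosed individuals are removed, the diagnosis alone cannot explain it, which is the logic of the check.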
[00:14:05] Speaker B: But the associations held firm even without the diagnosed disease present in the data. The HBB knockout still strongly correlated with lower cholesterol and increased heart failure.
[00:14:15] Speaker C: Exactly. The genetic mechanism itself is driving a very distinct cardiometabolic profile. The elevated risk of heart failure is likely tied to iron overload cardiomyopathy.
[00:14:25] Speaker B: Because the red blood cell turnover is compromised.
[00:14:28] Speaker C: Yeah. Iron starts accumulating in the myocardial tissue of the heart, eventually causing it to fail. And at the exact same time, the body detects the compromised red blood cells and aggressively attempts to synthesize new cell membranes.
[00:14:42] Speaker B: Oh, I see.
[00:14:43] Speaker C: Yeah, that rapid, desperate synthesis acts as a massive metabolic sink.
It pulls huge amounts of available cholesterol right out of the blood plasma to build those membranes. Yeah, so the patient's overall cholesterol score absolutely plummets.
[00:14:57] Speaker B: That is incredible. What does it say to you that a single genetic tweak can protect your cholesterol while simultaneously threatening your heart with iron overload? I mean, it completely shatters the binary concept of a mutation being strictly beneficial or strictly harmful.
[00:15:10] Speaker C: It really highlights the intense pleiotropy of our genome. Yeah. You know, where one gene regulates multiple totally disparate systems, there's rarely a biological free lunch. Altering a fundamental pathway almost always demands a trade off.
[00:15:24] Speaker B: And we actually see that exact same contradiction when we look at physical traits in the study. Like, breaking a gene doesn't always lead to a deficit. The paper analyzed human height and found two fascinating associations.
[00:15:36] Speaker C: Yeah, the height data was super interesting.
[00:15:38] Speaker B: First, they found that knocking out a gene called LECT2 leads to a decrease in height. Okay, makes sense. But then they found an entirely uncharacterized gene. It doesn't even have a formal name yet, just the identifier ENSG00000267561.
[00:15:56] Speaker C: Catchy name.
[00:15:57] Speaker B: Very catchy. When this specific gene is knocked out, it actually increases height, which is highly unusual.
[00:16:03] Speaker C: Finding a knockout that enhances a complex polygenic trait like height is really rare.
[00:16:08] Speaker B: Extremely rare, especially when you consider the broader rules of population genetics. The paper notes that autozygosity, you know, inheriting identical broken copies from related parents, almost universally decreases human height across the board.
[00:16:22] Speaker C: Right. It's a classic manifestation of inbreeding depression, a general reduction in overall biological fitness.
[00:16:27] Speaker B: So how does breaking this random, unnamed gene manage to do the exact opposite and make people taller?
[00:16:33] Speaker C: Well, it really comes down to the highly localized function of the proteins involved.
Even though this specific gene is uncharacterized, it's situated near genomic regions that we know regulate height.
And there is some strong evidence pointing toward its involvement in selenium metabolism.
[00:16:50] Speaker B: Selenium. How does that affect height?
[00:16:52] Speaker C: Well, selenium plays a really critical role in regulating oxidative stress and thyroid hormone metabolism within chondrocytes.
[00:16:59] Speaker B: Chondrocytes are the cartilage cells, right?
[00:17:01] Speaker C: Exactly. They're the specialized cells that produce and maintain cartilage.
[00:17:05] Speaker B: So if you knock out a gene regulating that specific metabolic pathway, you alter the cellular environment of the cartilage.
[00:17:13] Speaker C: Precisely. And the growth plates at the ends of our long bones are made of this exact cartilage. Normally, as we age, these plates ossify and fuse, which is what halts our vertical growth.
[00:17:23] Speaker B: Oh, wow. So if disrupting the selenium pathway alters the hormonal signals within those cells, it
[00:17:28] Speaker C: could potentially delay the fusion of the growth plates. A slight delay in fusion gives the bones more time to elongate, resulting in a taller individual. And even though the broader biological system might actually be experiencing inbreeding depression, that
[00:17:43] Speaker B: entirely defies the general rule. Which really underscores why empirical base by base mapping is so vital. We just cannot rely on phenotypic assumptions anymore.
[00:17:53] Speaker C: We really can't.
[00:17:54] Speaker B: So, to synthesize all this for you listening, we started out discussing how difficult and murky genetic diagnosis can be. But the central insight here is that being well informed in this space today means recognizing that the methodologies are finally catching up to the complexity of our biology.
[00:18:10] Speaker C: Absolutely. By combining massive, ancestrally diverse global databases with incredibly clever statistical phasing, researchers can finally cut through the noise, they can
[00:18:20] Speaker B: separate the cis from the trans mutations, locate the compound heterozygotes, and identify the true human knockouts. And this computational heavy lifting is literally the only way to find the specific biological targets that will lead to the next blockbuster treatment that changes millions of lives.
What does this mean for the future of clinical practice in your eyes?
[00:18:40] Speaker C: I think the overarching implication of the Lassen paper is that global collaboration is the absolute prerequisite for the future of medicine. The genetic architecture of human biology is just too intricately woven for any single biobank or any single ancestral population to unravel on its own. It requires a unified international effort.
[00:18:59] Speaker B: It really does. And that leaves me with a final, somewhat provocative thought for you to explore on your own. If a meta-analysis of nearly a million genomes proves that standard diseases in our hospital records, things like common COPD or routine liver enzyme spikes, are frequently just rare recessive genetic disorders misdiagnosed by outdated coding, what happens to medicine when we sequence a billion people?
[00:19:21] Speaker C: That's a huge question.
[00:19:22] Speaker B: Yeah, when massive statistical phasing connects a billion genomes to a billion health records, well, the very concept of a common disease may cease to exist altogether. We may soon realize there is no such thing as standard asthma or standard heart failure, and that the entire medical dictionary will be replaced by thousands of highly personalized, distinct genetic fingerprints.
[00:19:44] Speaker C: It's going to be a completely different world.
[00:19:47] Speaker B: This episode was based on an Open Access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Now, stay with us for an original track created especially for this episode and inspired by the article you've just heard about. Thanks for listening and join us next time as we explore more science. Base by base.
[00:20:27] Speaker A: Midnight on the monitor Lines in neon haze Millions in the numbers but I'm tracing hidden ways Two quiet hits in one gene tucked where no one sees When you split the story wrong, you miss the missing piece So I slow it down Let the sequences align Put the strands in order Let the signals find the time Not loud like headlines, just the pattern clicking through If you map the pairs correctly, the truth comes into view Phase it, don't guess it, let the haplotypes talk Two small shadows meeting on a long unwinding walk Knockout in the code and the trait begins to move We were counting one by one, now we're seeing two on two.
Six big rooms of data Different faces in the stream Same old human questions in a high definition dream Some effects are ancient Some are rare and cut so deep But when the copies match in silence that's the secret they will keep Not every cohort sings the same Not every signal stays But we stitch the scattered echoes into clear brighter waves A catalog of breakpoints where function falls away To light up biology tomorrow from the noise of yesterday Phase it, don't guess it, let the haplotypes talk Two small shadows meeting on a long unwinding walk Knockout in the code and the trait begins to move We were counting one by one, now we're seeing two on two.