Episode 328

March 27, 2026

00:23:39

328: Variant selection boosts R2 for haptoglobin (HP) in cis‑Mendelian randomization

Hosted by

Gustavo B Barra
Base by Base

Show Notes

Zhou A et al., Human Genetics and Genomics Advances - Comparing LD‑pruning, COJO, SuSiE and PCA in haptoglobin (HP) cis‑region data, the study finds including non‑lead variants substantially increases variance explained (R2) and MR precision. Key terms: haptoglobin, cis-Mendelian randomization, LD-pruning, SuSiE, COJO.

Study Highlights:
The study analyzed circulating haptoglobin (HP) using Fenland protein GWAS summary statistics with LD from UK Biobank, compared four variant selection methods (modified LD‑pruning, COJO, SuSiE, PCA), and extended results with simulations and 15 additional gene regions. In the HP region, incorporating non‑lead variants produced a median proportional gain in R2 of 145.1% and a median reduction in MR standard error of 36.3% relative to the lead variant alone. In simulations with one or two causal variants the methods recovered the expected genetic variance (≈40%) and, when causal variants were removed, non‑lead‑inclusive methods recovered more variance than lead‑only. The functional implication supported by the data is that including correlated non‑lead variants can materially increase instrument strength and precision in cis‑MR, but may raise risks of pleiotropy and numerical instability.

Conclusion:
Variant selection methods that incorporate correlated non‑lead variants reliably improve instrument strength (R2) and MR precision in cis‑MR compared with the lead‑variant‑only approach; comparisons with the lead variant are advised to detect instability.

Music:
Enjoy the music based on this article at the end of the episode.

Article title:
Variant selection to maximize variance explained in cis-Mendelian randomization

First author:
Zhou A

Journal:
Human Genetics and Genomics Advances

DOI:
10.1016/j.xhgg.2026.100573

Reference:
Zhou A, Karhunen V, Tian H, Pott J, Patel A, Slob EAW, Burgess S. Variant selection to maximize variance explained in cis-Mendelian randomization. Human Genetics and Genomics Advances. 2026 Apr 9;7:100573. https://doi.org/10.1016/j.xhgg.2026.100573.

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website https://basebybase.com

On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/hp-variant-selection-cis-mr

QC:
This episode was checked against the original article PDF and publication metadata for the episode release published on 2026-03-27.

QC Scope:
- article metadata and core scientific claims from the narration
- excludes analogies, intro/outro, and music
- transcript coverage: Audited the transcript sections describing: (1) the four variant selection methods and their rationale; (2) HP region results including R2 gains and SE reductions; (3) simulation studies with known causal variance (40%); (4) extension to 15 gene regions; (5) pleiotropy concerns and safeguards; (6) practical recommendations
- transcript topics: Four variant selection methods (LD-pruning, COJO, SuSiE, PCA); Modified LD-pruning with adjusted R2 and LD-matrix checks; HP region results: variance explained (R2) gains and MR precision; Simulations with known causal variance (40%); Two-causal-variant scenario and lead variant variance explained; Extension to 15 additional gene regions

QC Summary:
- factual score: 10/10
- metadata score: 10/10
- supported core claims: 8
- claims flagged for review: 0
- metadata checks passed: 4
- metadata issues found: 0

Metadata Audited:
- doi
- article_title
- article_journal
- license
- episode_title

Factual Items Audited:
- Variant selection methods tested: LD-pruning, COJO, SuSiE, and PCA
- HP region results: median proportional gain in R2 of 145.1% and median MR SE reduction of 36.3%
- Simulations recovered about 40% variance explained across methods
- Across 15 gene regions, non-lead variant methods outperformed lead-variant-only approach
- Recommendation to compare multivariant estimates with lead-variant-only baseline
- ABO locus cited as a pleiotropy risk example

QC result: Pass.

Chapters

  • (00:00:20) - How a single matrix can cripple genomics
  • (00:01:34) - Deep Dive: The Search for genetic instruments without breaking the math
  • (00:05:59) - The Hidden Problem with Standard LD Pruning
  • (00:11:13) - The Lead Variants vs Non-Lead variants in disease prediction
  • (00:15:27) - Multivariant Analysis: The Right Mix of Variants

Episode Transcript

[00:00:20] Speaker A: Welcome to Base by Base, the papercast that brings genomics to you wherever you are. Thanks for listening, and don't forget to follow and rate us in your podcast app. So in genetic research, scientists basically operate on this golden rule, which is that more data means more accuracy, right? [00:00:37] Speaker B: Yeah, volume is usually king. [00:00:38] Speaker A: Exactly. You figure, hey, you feed the algorithm more genetic variants and the picture of human biology just gets sharper, you know? [00:00:45] Speaker B: That is the assumption. Yes. [00:00:46] Speaker A: But I want you to imagine a scenario. What happens when you have so much good information, when you feed all your available data into the system and it literally causes the mathematics to self-destruct? [00:00:58] Speaker B: It completely freaks the system out. [00:01:00] Speaker A: Yeah, it's wild. We are looking at a problem today where adding more genetic information causes the most advanced supercomputers to just, well, hit a brick wall. A wall called a singular matrix, which [00:01:12] Speaker B: is a term that strikes fear into the heart of any statistician. [00:01:15] Speaker A: Right. It's this complete mathematical meltdown. And it's triggered just by having too much highly correlated information. [00:01:22] Speaker B: And it really forces us to rethink our entire approach to genetic data. I mean, we always assume volume solves everything, but in genetics, redundancy can actually be fatal to your analysis. [00:01:34] Speaker A: Which brings us to the actual heroes of our deep dive. Today we celebrate the work of Zhou and colleagues, who have seriously advanced our understanding of how to carefully select genetic instruments without breaking the math. [00:01:47] Speaker B: It's a fantastic paper. [00:01:49] Speaker A: It really is. The mission for this deep dive is to explore how this research team tested four pretty ingenious methodologies.
They wanted to squeeze the absolute maximum predictive power out of our DNA, basically filtering out the noise to find truly independent biological signals. [00:02:06] Speaker B: And to really grasp the solution they came up with, we have to clearly define the clinical problem they're addressing here. [00:02:11] Speaker A: Let's do it. [00:02:12] Speaker B: So this paper is focused on Mendelian randomization, or MR. Now, MR is essentially nature's randomized clinical trial. We use genetic variants as our instrumental variables. [00:02:22] Speaker A: Because your genes are randomly assigned at birth. [00:02:24] Speaker B: Exactly. They do not change based on your lifestyle or what you eat. So we can use them to see if a specific biological trait, say the level of a certain protein in your blood, actually causes a disease, rather than just being a random coincidence. [00:02:40] Speaker A: It cuts through all the noise of diet or environment. [00:02:43] Speaker B: It does, but, you know, Zhou's paper focuses on a very specific flavor of this called cis-Mendelian randomization. [00:02:49] Speaker A: Okay. So if I understand this correctly, in cis-MR, we aren't looking all over the entire genome; we are restricting our search for those genetic instruments to just a single gene region, one specific locus. Yes, usually because we're trying to figure out how a specific drug might affect a specific protein target. [00:03:07] Speaker B: Right, exactly. But that geographic restriction, that is the crux of the entire problem. [00:03:12] Speaker A: How so? [00:03:13] Speaker B: Well, when you limit your search to just one neighborhood of the genome, you run headfirst into something called linkage disequilibrium, or LD. [00:03:21] Speaker A: Ah, LD. The pack mentality of genes. [00:03:24] Speaker B: That's a great way to put it. Genetic variants that sit close to each other on a chromosome, they tend to travel as a pack. They are highly correlated.
[00:03:31] Speaker A: So if you inherit variant A, you almost certainly inherit variant B and variant C sitting right next to it. [00:03:37] Speaker B: Precisely. They are mathematically tethered together. [00:03:40] Speaker A: Okay, let's unpack this with an analogy. Imagine you are putting together a choir. [00:03:44] Speaker B: Okay, I like where this is going. [00:03:46] Speaker A: If you only pick your single best singer, the sound is going to be a bit thin. You're lacking power, right? [00:03:52] Speaker B: Right. A solo act can only project so much. [00:03:55] Speaker A: But if you pick 50 singers who all have the exact same voice, the exact same tone, and they all sing the exact same notes at the exact same time, they just drown each other out. [00:04:08] Speaker B: They might even cause audio feedback. [00:04:10] Speaker A: Yeah, it's just noise at that point. [00:04:12] Speaker B: And that choir analogy perfectly maps to the math here. If you throw all 50 of those highly correlated genetic variants into the standard MR equations, you introduce something called multicollinearity. [00:04:24] Speaker A: Multicollinearity. So basically, if they are all inherited together, it sounds like we're just feeding the computer the exact same information 50 times, just in different fonts. [00:04:34] Speaker B: That is exactly what you are doing. [00:04:36] Speaker A: Does the computer realize it's redundant? [00:04:38] Speaker B: It does. And that redundancy is exactly what causes the crash. Because to solve the equation, the algorithm has to invert the genotype correlation matrix. [00:04:48] Speaker A: Okay, getting into the matrix math just briefly. [00:04:51] Speaker B: I promise. If the variants don't offer independent information, the matrix becomes what mathematicians call singular or ill-conditioned. [00:05:00] Speaker A: Singular matrix. [00:05:01] Speaker B: Right. Conceptually, it is the matrix equivalent of trying to divide by zero.
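For readers who want to see the singular-matrix problem concretely, here is a minimal Python sketch (illustrative only, not from the paper; the toy data and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# One underlying signal echoed by three near-identical "variants",
# mimicking variants in tight LD that travel as a pack.
g = rng.standard_normal(n)
G = np.column_stack([g + 1e-4 * rng.standard_normal(n) for _ in range(3)])

# The genotype correlation matrix that MR solvers must invert:
R = np.corrcoef(G, rowvar=False)
print(np.linalg.cond(R))  # enormous condition number: numerically singular

# Swap in a genuinely independent variant and the matrix is
# well conditioned again.
G_ok = np.column_stack([g, rng.standard_normal(n)])
print(np.linalg.cond(np.corrcoef(G_ok, rowvar=False)))  # close to 1
```

A condition number this large means tiny rounding errors are amplified into wildly unstable estimates, which is exactly the "divide by zero" behavior described above.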
The equation simply cannot be solved. [00:05:08] Speaker A: Oh, wow. So it just breaks? [00:05:10] Speaker B: It completely breaks. You end up with wildly inaccurate, unstable or completely impossible results. [00:05:16] Speaker A: So what did scientists do before this paper? Did they just avoid the meltdown by, keeping with the analogy, firing everyone in the choir except for the loudest singer? [00:05:25] Speaker B: Basically, yes. They used what we call the lead variant approach. They found the single strongest genetic signal in that region and completely ignored all the surrounding variants. [00:05:34] Speaker A: Just one singer? [00:05:35] Speaker B: Just one. And it's statistically safe, sure. But it leaves a massive amount of statistical power on the table. [00:05:41] Speaker A: Because there's more than one voice in that neighborhood. [00:05:44] Speaker B: Exactly. There is growing recognition that a single gene locus can contain more than one distinct biological signal. You are missing out on the full picture of how that gene region affects the trait simply because you are afraid of breaking the calculator. [00:05:59] Speaker A: Which brings us to the core methodology of this paper. Zhou and the team systematically tested four advanced mathematical techniques to intelligently select non-lead variants. [00:06:11] Speaker B: They wanted to safely expand the choir. [00:06:13] Speaker A: Right. Methods that find the other good singers who add distinct harmony rather than just singing the exact same note louder. Let's start with the first method they tested, which is classic LD pruning. [00:06:24] Speaker B: So, standard LD pruning filters out variants based on a strict correlation threshold. If a new variant is too highly correlated with one you've already picked, you prune it. [00:06:33] Speaker A: You just throw it out. [00:06:34] Speaker B: You throw it out. You only keep the ones that are somewhat independent.
[00:06:37] Speaker A: But the paper highlights a glaring flaw with the standard version. Right. Because standard pruning only looks at the correlation, it doesn't actually check if adding that new variant improves your [00:06:47] Speaker B: prediction. That is the major weakness. And the researchers showed just how dangerous standard pruning can be when they tested it on the haptoglobin gene, or the HP gene. [00:06:57] Speaker A: Yeah, the HP gene test was incredibly revealing. What happened there? [00:07:01] Speaker B: Well, when they applied standard LD pruning to predict circulating HP protein levels, the multicollinearity was still so severe that the math went totally crazy. [00:07:10] Speaker A: Like a singular matrix crash? [00:07:12] Speaker B: Worse, actually. It gave an answer, but the results showed a reverse direction of effect. [00:07:17] Speaker A: Wait, really? [00:07:18] Speaker B: Yes. It suggested the protein did the exact opposite of what it actually does in reality. And the worst part, it presented this impossible result with a tiny error margin. [00:07:28] Speaker A: Oh, man. So it was confidently wrong. [00:07:30] Speaker B: Dangerously confidently wrong. Which is why Zhou and the team introduced a modified LD pruning algorithm. They added a crucial safety check. [00:07:39] Speaker A: Okay, how does the modified version work? [00:07:42] Speaker B: Modified pruning only keeps a variant if it genuinely increases something called the adjusted R squared. [00:07:47] Speaker A: Let's define that, because R squared comes up a lot. [00:07:49] Speaker B: Sure. Regular R squared just measures how much variance your model explains. The problem is it artificially goes up every time you add a new variable, even if that variable is totally useless. [00:08:00] Speaker A: So it tricks you into thinking your model's getting better. [00:08:02] Speaker B: Right. But adjusted R squared mathematically penalizes you for adding fluff.
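To make the adjusted R squared penalty concrete, here is a small Python sketch (our own illustration, not the paper's implementation; the "keep only if adjusted R squared rises" rule is a simplified stand-in for the modified pruning check described above):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.standard_normal(n)      # a genuinely predictive variant
noise = rng.standard_normal(n)   # a useless "variant"
y = 0.5 * x1 + rng.standard_normal(n)

def r2(X, y):
    # Ordinary least squares R^2 with an intercept.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return 1 - (y - Xd @ beta).var() / y.var()

def adj_r2(X, y):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1):
    # each extra predictor must earn its keep.
    n_obs, p = len(y), X.shape[1]
    return 1 - (1 - r2(X, y)) * (n_obs - 1) / (n_obs - p - 1)

X1 = x1[:, None]
X2 = np.column_stack([x1, noise])
print(r2(X1, y), r2(X2, y))          # plain R^2 never decreases
print(adj_r2(X1, y), adj_r2(X2, y))  # adjusted R^2 is always lower
# A simplified pruning rule: keep the extra variant only if adjusted R^2 rises.
keep = adj_r2(X2, y) > adj_r2(X1, y)
```

The key property: plain R squared can only go up when a variable is added, while adjusted R squared subtracts a penalty proportional to the number of predictors, so fluff variants tend to fail the check.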
[00:08:08] Speaker A: I love that. [00:08:09] Speaker B: It forces the algorithm to ask, you know, does this new genetic variant actually add real predictive value? Or is it just making the data [00:08:16] Speaker A: look busier? So it has to earn its spot in the choir. [00:08:19] Speaker B: Exactly. And furthermore, the modified algorithm constantly checks the matrix condition number to ensure the new variant won't trigger that singular matrix meltdown we talked about. [00:08:28] Speaker A: It's a brilliant upgrade, but pruning is ultimately just a filtering process, isn't it? It's defensive. [00:08:34] Speaker B: Yes, it is. It's about what you throw away. [00:08:36] Speaker A: Right. So the second and third methods the researchers tested go on the offensive. They belong to a family of techniques called fine-mapping. We have COJO and SuSiE. Let's start with COJO, which stands for conditional and joint analysis. [00:08:49] Speaker B: So COJO is more of an iterative hunting process. It picks the strongest variant first, and then it scans all the remaining ones, asking, are any of these independently associated with the trait, conditional on the one I just picked? [00:09:02] Speaker A: By conditional, you mean it essentially subtracts the effect of the first singer and then listens to see if there's any new melody left over in the room. [00:09:10] Speaker B: That is a highly accurate way to visualize it. It isolates the residual effect, finds the next best independent signal, and repeats the process. [00:09:18] Speaker A: Okay, so that's COJO. Now, SuSiE, on the other hand, stands for sum of single effects, and it abandons that step-by-step approach completely. Right. It uses Bayesian logic. [00:09:29] Speaker B: It does, and it's a very elegant solution. [00:09:31] Speaker A: I want to break down that Bayesian approach because, honestly, it can sound like a black box to a lot of people.
If COJO is interviewing singers one by one, how does SuSiE work under the hood? [00:09:41] Speaker B: Well, instead of looking for absolute certainty one variant at a time, SuSiE surveys the entire genetic region simultaneously. Bayesian logic is all about weighing probabilities. [00:09:52] Speaker A: Okay, probabilities. [00:09:53] Speaker B: So the algorithm groups the variants into what it calls credible sets. A credible set is a cluster of highly correlated variants where the algorithm is mathematically confident, usually about 95% confident, that at least one true causal signal exists somewhere inside that specific cluster. [00:10:11] Speaker A: So it finds the general neighborhood of the signal. [00:10:13] Speaker B: Yes. And once it maps out these distinct clusters, these credible sets, it simply picks one representative variant from each set to serve as the instrument. [00:10:23] Speaker A: Okay, back to the choir. It identifies the different sections, the sopranos, the altos, the tenors, and just pulls one representative from each section. [00:10:30] Speaker B: Exactly. It ensures you have one voice from each essential harmony group. [00:10:33] Speaker A: That makes perfect sense. So we have modified LD pruning, we have COJO, and we have SuSiE. All very clever ways of picking physical, individual genetic variants on the chromosome. But the fourth method they tested, PCA, or Principal Component Analysis, takes a totally different philosophical approach. [00:10:51] Speaker B: It really does. PCA doesn't pick specific genetic variants at all. [00:10:55] Speaker A: What does it do instead? [00:10:56] Speaker B: It takes the entire massive data set of correlated genetic variants and mathematically squishes it down. It transforms the physical data into a smaller number of completely independent, uncorrelated variables called principal components. [00:11:10] Speaker A: Okay, wait, wait, wait. I have to push back here. [00:11:12] Speaker B: Go for it.
[00:11:13] Speaker A: We are studying human biology, right? DNA, physical As, Cs, Ts and Gs sitting on an actual chromosome. You are telling me PCA creates these principal components that don't physically exist in the human genome and we are using those mathematical ghosts to predict actual disease risk. [00:11:30] Speaker B: I know, I know, it sounds completely counterintuitive. [00:11:33] Speaker A: How can a researcher use a ghost as a biological instrument? [00:11:36] Speaker B: Well, in the realm of Mendelian randomization, it is entirely mathematically sound. You have to remember an instrumental variable does not have to be the physical biological cause. [00:11:46] Speaker A: It doesn't? [00:11:47] Speaker B: No. It merely has to be a reliable proxy. As long as these new abstract principal components strongly correlate with the exposure we are studying, like the protein level, and they don't affect the disease outcome through some completely separate pathway, they satisfy the core statistical assumptions. [00:12:04] Speaker A: So they act as perfect, independent mathematical summaries of the biological reality. [00:12:08] Speaker B: Exactly. They capture the essence of the data without the redundancy. [00:12:12] Speaker A: Okay, so we have our four contenders: modified pruning, COJO, SuSiE and PCA. The researchers put them to the test using data from over 10,000 people in the Fenland study and linkage data from over 350,000 people in the UK Biobank. [00:12:29] Speaker B: A very robust data set. [00:12:31] Speaker A: Huge. So let's look at the key findings from these proving grounds. We already mentioned the HP gene, where standard pruning failed so miserably. How did the four advanced methods do on that exact same gene? [00:12:43] Speaker B: The results were a massive validation of the multivariant approach. [00:12:46] Speaker A: Yeah. [00:12:46] Speaker B: Oh, yeah.
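The "mathematical ghosts" idea can be seen in a few lines of Python. This is our own toy sketch, not the paper's pipeline: two underlying signals, each echoed by correlated variants, are compressed into principal components that are mutually uncorrelated by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Two underlying signals, each echoed by three correlated "variants" (LD).
s1, s2 = rng.standard_normal((2, n))
G = np.column_stack(
    [s1 + 0.1 * rng.standard_normal(n) for _ in range(3)]
    + [s2 + 0.1 * rng.standard_normal(n) for _ in range(3)]
)
G = (G - G.mean(axis=0)) / G.std(axis=0)  # standardize each variant

# PCA via eigendecomposition of the correlation matrix.
R = np.corrcoef(G, rowvar=False)
vals, vecs = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
order = np.argsort(vals)[::-1]
PCs = G @ vecs[:, order]                  # principal components: the "ghosts"

# The leading components are uncorrelated with each other,
# so the matrix inversion no longer blows up.
C = np.corrcoef(PCs[:, :2], rowvar=False)
print(np.round(C, 6))
```

None of these components is a physical variant, but each still tracks the biology it summarizes, which is all an instrumental variable needs to do.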
Compared to using just the lead variant alone, the four advanced methods produced a median proportional gain in variance explained of 145.1%. [00:12:55] Speaker A: Wow. A 145% gain in statistical power just by efficiently utilizing the data we were previously throwing in the trash. [00:13:02] Speaker B: Exactly. And the precision of the results improved dramatically too. They shrank the standard error of the MR estimate by a median of 36.3%. [00:13:10] Speaker A: So the target became much, much clearer. [00:13:12] Speaker B: Significantly clearer. But you know, observational data can always be messy, right? So to truly stress test these algorithms, the researchers turned to simulations. They created synthetic biological traits where they dictated the underlying math. [00:13:27] Speaker A: So they knew the answer key before taking the test. [00:13:29] Speaker B: Precisely. They knew for an absolute fact that the genetic variance of the synthetic trait was exactly 40%. The challenge was whether the algorithms could find that 40% hidden in all the noise. [00:13:42] Speaker A: And all four methods successfully recovered that 40% variance. Which is great. But the most revealing part of the simulation was a specific scenario they ran where they intentionally crippled the dataset. Right. [00:13:53] Speaker B: Oh, this was brilliant. They completely hid the true causal variant. [00:13:57] Speaker A: They deleted the star singer from the dataset entirely. [00:13:59] Speaker B: Yes. And this is where the utilization of non-lead variants shows its true value. If you only use the lead variant approach in that scenario, your analysis just [00:14:07] Speaker A: collapses because the main driver is missing. [00:14:09] Speaker B: Right. You capture very little of the genetic variance. But the advanced methods, the ones that selected multiple non-lead variants, they recovered significantly more of the genetic variance even [00:14:21] Speaker A: without the true cause being there.
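The hidden-causal-variant scenario can be reproduced in miniature. This Python sketch is our own simplified simulation in the spirit of the one described above (the 0.8 echo correlation and four echoes are arbitrary choices, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

# A causal variant and four LD "echoes" that each partly track it.
causal = rng.standard_normal(n)
echoes = np.column_stack(
    [0.8 * causal + 0.6 * rng.standard_normal(n) for _ in range(4)]
)
# A trait with exactly 40% of its variance driven by the causal variant.
y = np.sqrt(0.4) * causal + np.sqrt(0.6) * rng.standard_normal(n)

def r2(X, y):
    # Variance explained by an OLS fit with intercept.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return 1 - (y - Xd @ beta).var() / y.var()

# Causal variant observed: it recovers roughly the full 40% on its own.
print(r2(causal[:, None], y))

# Causal variant hidden: one echo alone underperforms all four used jointly.
lead_only = r2(echoes[:, :1], y)
all_echoes = r2(echoes, y)
print(lead_only, all_echoes)
```

The joint echoes partially reconstruct the missing signal, which is exactly the "backup singers who learned from the star" effect the hosts describe next.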
Because those extra variants, even if they weren't the main cause, they held echoes of the missing true variant. [00:14:29] Speaker B: Exactly. They were correlated just enough with the missing star to capture the missing information, but distinct enough from each other to avoid breaking the math. [00:14:37] Speaker A: It's like you can reconstruct the song if you have five backup singers who learned by listening to the star. [00:14:43] Speaker B: They pieced together the missing biological signal from the surrounding statistical noise. It's really elegant. [00:14:50] Speaker A: And just to prove this wasn't a fluke confined to one or two genes, the researchers expanded their analysis, right? [00:14:55] Speaker B: Yes. They applied these methods across 15 other gene regions, looking at 15 different proteins. A true cross-genome stress test. [00:15:03] Speaker A: And across the board, the methods using non-lead variants consistently crushed the lead-variant-only approach in terms of variance explained. [00:15:10] Speaker B: They did. And when they ranked them, modified LD pruning usually came out on top, closely followed by COJO. [00:15:17] Speaker A: So we have a clear consensus from the data. Incorporating correlated non-lead variants reliably and dramatically increases your instrument strength, undeniably. Which brings us to the implications for future research. And this is where it gets tricky. If the multivariant methods win every single time, does this mean researchers should just jam every non-lead variant they can filter into their models? [00:15:41] Speaker B: That is the temptation. But the paper offers a very stark warning against doing that. [00:15:45] Speaker A: Why? [00:15:46] Speaker B: Because while you gain statistical power, introducing more variants into your instrument increases the risk of a phenomenon called horizontal pleiotropy. [00:15:54] Speaker A: Okay, let's bring back our choir analogy for this one.
Horizontal pleiotropy is like hiring an extra singer for your choir because they have a great independent voice, but you don't realize they also have a side job as the sound engineer. [00:16:06] Speaker B: I like this. [00:16:07] Speaker A: Right. So while they are singing, they are secretly messing with the volume dials on the mixing board. They are affecting the final audio output through a completely separate pathway. [00:16:17] Speaker B: That is exactly what happens in genetics. A pleiotropic variant might influence your disease outcome not through the specific protein you are trying to study, but through a completely separate biological mechanism. [00:16:28] Speaker A: Sneaky. [00:16:29] Speaker B: Very. The paper uses the ABO locus as a prime example of this danger. [00:16:34] Speaker A: The genes that determine our blood type. [00:16:36] Speaker C: Yes. [00:16:37] Speaker B: Variants in the ABO region are notoriously highly pleiotropic. They are known to affect 14 different plasma proteins. [00:16:44] Speaker A: Oh wow, 14? At least. [00:16:47] Speaker B: So if you blindly grab an ABO variant to study, say, protein A's effect on heart disease, you might get a really strong association. But that variant might actually be secretly causing the heart disease by manipulating protein B or protein C. So your causal [00:17:01] Speaker A: map is completely compromised. [00:17:03] Speaker B: Entirely. By adding more variants just to get more statistical power, you are increasing the surface area for these secret side jobs to sneak into your data and bias your results. [00:17:12] Speaker A: Which leads to the researchers' final critical recommendation for clinical practice and future research: the ultimate sanity check. [00:17:19] Speaker B: And it's a very pragmatic recommendation, right? [00:17:22] Speaker A: They argue that yes, researchers absolutely must use methods like modified pruning or SuSiE to boost statistical power.
But you must always run the old-school lead-variant-only estimate alongside it as the baseline comparison. [00:17:37] Speaker B: It acts as your anchor. If your multivariant result is essentially the same as your single-variant baseline, just with a tighter, more precise confidence interval, then you are on solid ground. [00:17:47] Speaker A: But if the result is wildly different, if the point estimate shifts dramatically, like [00:17:52] Speaker B: the reversed effect we saw with the failed standard pruning on the HP gene. [00:17:56] Speaker A: Precisely. If it deviates drastically from the baseline, investigators must assume that numerical instability or horizontal pleiotropy is polluting the data. [00:18:05] Speaker B: You have to step back and reevaluate. [00:18:06] Speaker A: The paper also warns investigators to ensure their F statistic remains above 10. That sounds like a statistical check-engine light, but what is it physically measuring in the math? [00:18:16] Speaker B: Think of the F statistic as the signal-to-noise ratio of your genetic instrument. It measures how strongly your selected variants actually predict the exposure. Okay, if that number drops below 10, your instrument is considered statistically weak. [00:18:31] Speaker A: And what happens if it's weak? [00:18:32] Speaker B: Well, when an instrument is weak, any tiny unmeasured bias in your observational sample doesn't just sit there. It gets magnified by the math. It completely skews your final results. It is a vital threshold, especially when working with smaller sample sizes. [00:18:47] Speaker A: So what this paper ultimately delivers is a framework for balance: maximizing statistical power without sacrificing scientific validity. [00:18:56] Speaker B: It's a tightrope walk. [00:18:57] Speaker A: It really is. The take-home message here is that when analyzing complex genetic data, relying on a solo act, just the lead variant, is safe but highly inefficient.
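For the curious, the F > 10 rule of thumb can be computed directly from variance explained. This is a standard textbook formula, sketched in Python with made-up numbers that are not from the paper:

```python
def f_statistic(r2, n, k):
    # Approximate instrument-strength F statistic from variance
    # explained (r2), sample size n, and number of instruments k:
    # F = (r2 / (1 - r2)) * (n - k - 1) / k
    return (r2 / (1 - r2)) * (n - k - 1) / k

# Illustrative numbers only:
print(f_statistic(0.02, 10000, 1))  # one strong variant, large sample: F >> 10
print(f_statistic(0.02, 500, 10))   # same R2 spread over 10 instruments
                                    # in a small sample: F < 10, a weak instrument
```

The formula makes the trade-off in the episode explicit: adding instruments (larger k) without a matching gain in variance explained pushes F down toward the weak-instrument danger zone.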
A well-managed choir of intelligently selected non-lead variants is far more powerful. [00:19:13] Speaker B: But you have to use strict, mathematically sound algorithms. [00:19:16] Speaker A: Right? You have to ensure those voices are actually complementing each other rather than causing a system crash or sneaking in pleiotropic melodies that ruin the entire study. [00:19:24] Speaker B: Beautifully summarized. [00:19:25] Speaker A: Thank you. So what does this mean for how we view data outside the laboratory? [00:19:30] Speaker B: Well, you know, this raises an important question for you to consider: applying these concepts to our everyday lives. Think about linkage disequilibrium, this biological phenomenon of highly redundant, correlated information traveling in packs. We face that every single day outside of genetics. Look at our news feeds, our financial data or social media algorithms. They are saturated with highly correlated echoes of the exact same information, constantly presenting themselves as multiple independent sources when really [00:20:02] Speaker A: they're just singing the exact same note. [00:20:04] Speaker B: Right? So if world-class scientists have to invent advanced algorithms like modified LD pruning to carefully filter out these echoes and find truly independent signals in our DNA, how might we apply these exact same principles to our own lives? [00:20:18] Speaker A: That's a great point. [00:20:18] Speaker B: How can we actively seek out conditionally independent voices to maximize the variance explained in our daily media diets instead of just listening to an echo chamber? [00:20:28] Speaker A: This episode was based on an open-access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating.
If you'd like to support our work, use the donation link in the description. Now, stay with us for an original track created especially for this episode and inspired by the article you've just heard about. Thanks for listening and join us next time as we explore more science, Base by Base. [00:21:14] Speaker C: Late night numbers on a bright screen glow One loud signal doesn't tell you what you know in the shadow of the strongest others hide side by side in patterns braided in the tide. Don't cut the chorus down to one clear tone there's strength in the harmony you've never known hold the link steady, let the math stay true Pull the quiet threads that tighten up the view more than believe More than a single lie we take the whole skyline and sharpen the side Higher power, tighter lines Less doubt to read when we listen to the neighbors More than belief Prune it smarter Condition what remains Separate the voices running through the veins Single effects are clean Set on the page O Let components turn the network into stage Til we double check the mirrors for a bend Side roads can trick you Drift you from the end if the matrix wobbles, slow it down Reset Keep the simple answer close Compare the net More than the leaf More than a single life we take the whole skyline and sharpen the side Stronger instruments smaller era truer red with the signals working with us More than the leaf.
