Episode 300

February 24, 2026

00:21:48

300: Population-scale WGS links MHC class II antigen presentation to persistent Epstein–Barr virus (EBV) DNA

Hosted by

Gustavo B Barra
Base by Base


Show Notes

Nyeo SS et al., Nature - Population-scale WGS reanalysis quantifies persistent EBV DNA and shows MHC class II–mediated antigen presentation predicts EBV DNAemia and links to autoimmune and respiratory disease.

Study Highlights:
Using whole-genome sequencing from UK Biobank (n≈490,560) and All of Us (n≈245,394), the authors extracted chrEBV-mapping reads, masked low-mappability regions, and defined EBV DNAemia (>1.2 genomes per 10^4 cells) in 9.7–11.9% of donors. They performed PheWAS, GWAS and ExWAS and identified 22 genome-wide significant loci and 686 missense variants across 148 genes with heritability enrichment in immune regulatory regions and B cells/antigen-presenting cells. Single-cell module scoring, pathway analyses and NetMHCpan/NetMHCIIpan peptide-presentation modeling implicated variable antigen processing and MHC class II presentation as primary determinants of EBV persistence, with stronger predicted presentation linked to lower EBV DNAemia. EBV DNAemia was reproducibly associated with autoimmune, respiratory, neurological and cardiovascular phenotypes across cohorts.
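
The quantification step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the function names, the coverage-ratio normalization, the genome-length constants, the masked interval, and the read counts are all assumptions made for the example. Only the threshold of 1.2 genomes per 10^4 cells comes from the show notes.

```python
"""Toy estimate of EBV genomes per 10^4 cells from WGS read counts."""

# Threshold quoted in the show notes: >1.2 EBV genomes per 10^4 cells.
DNAEMIA_THRESHOLD = 1.2

# Assumed constants (illustrative, not taken from the paper).
EBV_REF_LEN = 171_823              # approximate length of the chrEBV reference, in bp
HUMAN_HAPLOID_LEN = 3_100_000_000  # approximate haploid human genome size, in bp


def masked_length(mask):
    """Total bases covered by masked intervals (half-open, assumed non-overlapping)."""
    return sum(end - start for start, end in mask)


def count_unmasked_reads(read_starts, mask):
    """Count reads whose start position lies outside every masked interval."""
    return sum(
        1 for pos in read_starts
        if not any(start <= pos < end for start, end in mask)
    )


def ebv_genomes_per_1e4_cells(ebv_reads, human_reads, read_len, mask):
    """Coverage-ratio estimate, assuming two human genome copies per cell."""
    usable_len = EBV_REF_LEN - masked_length(mask)
    ebv_depth = ebv_reads * read_len / usable_len
    human_depth = human_reads * read_len / HUMAN_HAPLOID_LEN
    genomes_per_cell = 2 * ebv_depth / human_depth
    return genomes_per_cell * 1e4


def has_dnaemia(ebv_reads, human_reads, read_len, mask):
    """Apply the show-notes threshold to the coverage-ratio estimate."""
    estimate = ebv_genomes_per_1e4_cells(ebv_reads, human_reads, read_len, mask)
    return estimate > DNAEMIA_THRESHOLD


if __name__ == "__main__":
    # Hypothetical masked repeat interval and read start positions.
    mask = [(0, 21_823)]
    read_starts = [500, 9_000, 30_000, 75_000, 160_000]
    kept = count_unmasked_reads(read_starts, mask)
    estimate = ebv_genomes_per_1e4_cells(kept, 620_000_000, 150, mask)
    print(f"{kept} reads kept, {estimate:.2f} EBV genomes per 10^4 cells")
    print("DNAemia:", has_dnaemia(kept, 620_000_000, 150, mask))
```

In this toy setup the threshold corresponds to only a few unmasked reads per donor, which suggests why spurious reads from repetitive regions could swamp the signal if they were not masked.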

Conclusion:
Reanalysis of population-scale WGS demonstrates that host genetic variation—predominantly in antigen processing and MHC class II peptide presentation—modulates persistent EBV DNA in blood and associates with multiple complex diseases.

Music:
Enjoy the music based on this article at the end of the episode.

Reference:
Nyeo SS, Cumming EM, Burren OS, Pagadala MS, Gutierrez JC, Ali TA, Kida LC, Chen Y, Chu H, Hu F, Zou XZ, Hollis B, Fabre MA, MacArthur S, Wang Q, Ludwig LS, Dey KK, Petrovski S, Dhindsa RS & Lareau CA. Population-scale sequencing resolves determinants of persistent EBV DNA. Nature. 2026 Feb 19;650:664–672. https://doi.org/10.1038/s41586-025-10020-2

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) - https://creativecommons.org/licenses/by/4.0/

Support:
Base by Base – Stripe donations: https://donate.stripe.com/7sY4gz71B2sN3RWac5gEg00

Official website https://basebybase.com

On PaperCast Base by Base you’ll discover the latest in genomics, functional genomics, structural genomics, and proteomics.

Episode link: https://basebybase.com/episodes/ebv-mhc-class-ii

Chapters

  • (00:00:00) - Base by Base
  • (00:00:28) - A viral ghost in my body
  • (00:03:32) - Herpes virus: When it's active, how to spot it
  • (00:05:42) - The Hidden EBV Genome
  • (00:08:45) - The smoking gun in chronic fatigue
  • (00:14:01) - Does Your Genetic Lock Fit With EBV?
  • (00:19:01) - Finding the Secret of the Immune Program

Episode Transcript

[00:00:00] Speaker A: Hundreds of cycles scattered nuisance whispers in the chain, Heritability written in the places memory remains linked to breathing joints and mind. Small echoes that persist, a pattern in the crowd.
[00:00:20] Speaker B: Welcome to Base by Base, the papercast that brings genomics to you wherever you are. Thanks for listening, and don't forget to follow and rate us in your podcast app. Today I want to start by asking you to think about your own body. We tend to view ourselves as these solitary biological fortresses.
[00:00:36] Speaker C: Right. Just us in here.
[00:00:37] Speaker B: Exactly. Our cells, our DNA, our machinery. But that isn't, well, it's not entirely true. What if I told you that you almost certainly have a silent roommate? A squatter, effectively. Someone living inside your cells right now, utilizing your machinery, potentially for decades, without paying a cent of rent.
[00:00:55] Speaker C: That is a somewhat unsettling, albeit accurate, way to start a deep dive. But statistically speaking, you are almost certainly correct.
[00:01:04] Speaker B: I'm talking about a specific virus that has managed to achieve something almost no other pathogen has. It has successfully infected over 90% of the adult population on Earth. Most of us know it as the cause of the kissing disease.
[00:01:16] Speaker C: Right. Mononucleosis. You might remember it as a rite of passage in high school or college. You get sick, you get swollen lymph nodes, you feel terrible for a few weeks, and then you get better. Or at least you think you do.
[00:01:28] Speaker B: And that is the crux of the issue, isn't it? You recover from the symptoms, but the virus doesn't leave. It moves in. It hides. And usually it stays hidden. That's the sort of biological contract we strike with it. But not always.
[00:01:41] Speaker C: No, not always. And that "not always" is where the biology gets incredibly complex. And honestly, the stakes get very high.
Because while this virus is sitting quietly in most of us, for an unlucky subset it is linked to devastating cancers.
[00:01:56] Speaker B: Like Burkitt lymphoma.
[00:01:57] Speaker C: Exactly. Burkitt lymphoma, nasopharyngeal carcinoma, and even severe, life-altering autoimmune disorders like multiple sclerosis
[00:02:05] Speaker B: and lupus, which creates a massive paradox. I mean, if nine out of 10 people walking down the street have this virus hiding in their B cells, why do most of us live completely normal lives while others develop these catastrophic diseases? What's the variable?
[00:02:17] Speaker C: It is the difference between a dormant tenant who stays quiet and reads a book and a destructive invader who decides to just burn the house down.
[00:02:26] Speaker B: Right. And to find the answer, we aren't looking at a standard clinical trial of 50 people here. We aren't even looking at a thousand. We are talking about hunting for viral ghosts floating in the blood of nearly three quarters of a million people. It is a massive detective story played out on a genomic scale.
[00:02:43] Speaker C: It really is data mining at its absolute finest. It's looking for a needle in a haystack, where the haystack is the entire human genome.
[00:02:52] Speaker B: Before we dive into exactly how they found these ghosts and what it means for you, we need to acknowledge the scale of this effort. This isn't a solo project in a basement lab.
[00:03:00] Speaker C: Not at all.
[00:03:01] Speaker B: Today we celebrate the work of researchers from Memorial Sloan Kettering Cancer Center, Weill Cornell Medicine, and AstraZeneca, alongside the broader scientific community, utilizing the UK Biobank and All of Us research programs.
[00:03:14] Speaker C: A powerhouse collaboration. And what they've done is fundamentally advance our understanding of host-viral interactions. They didn't just look at the virus in a petri dish.
They looked at how the virus interacts with our specific DNA in the real world, across diverse populations. It's a study of the ecosystem inside us.
[00:03:32] Speaker B: So let's unpack the antagonist here. We are talking about the Epstein–Barr virus, or EBV, right?
[00:03:37] Speaker C: EBV. It's a herpesvirus. Herpesvirus 4, to be exact. First discovered back in 1964. And like all herpesviruses, think chickenpox or cold sores, its signature move is latency. It's a master of playing dead.
[00:03:51] Speaker B: When you say latency, you mean it essentially goes to sleep?
[00:03:55] Speaker C: Essentially, yes. It shuts down most of its gene expression to become invisible to the immune system. It spreads via saliva, hence the kissing disease nickname. It infects the oral cells in your mouth and throat and then moves deeper, into the B cells. Precisely. And B cells are the immune cells responsible for making antibodies. So it's hiding inside the very police station meant to arrest it. It establishes a dormant infection for life. Now, usually this is asymptomatic. The virus replicates just enough to spread to a new host now and then, but not enough to hurt you. But the problem is EBV is not always a polite house guest. It causes between 130,000 and 200,000 cancer deaths annually worldwide. That is not insignificant.
[00:04:37] Speaker B: That is a staggering number for something we consider dormant.
[00:04:40] Speaker C: It is. And the knowledge gap has always been: how do we measure the risk? For decades, we relied on serology, checking for antibodies.
[00:04:47] Speaker B: Right. You go to the doctor, they draw blood, they see if you have antibodies against EBV.
[00:04:51] Speaker C: Exactly. But antibodies only tell you that you were infected at some point. It's a binary yes or no. It doesn't tell you if the virus is currently active, replicating, or how much of it is lurking in your system right now.
[00:05:03] Speaker B: So it's the difference between seeing a wanted poster for a bandit from 10 years ago versus seeing the bandit actually walking down Main Street today with a sack of cash.
[00:05:11] Speaker C: That is a brilliant analogy. Yes. Serology is the wanted poster. It's a historical record. But this study was looking for the bandit. They were looking for something called EBV DNAemia.
[00:05:22] Speaker B: DNAemia. Let's define that for everyone.
[00:05:25] Speaker C: It means actual viral DNA floating in the bloodstream. If you find DNAemia, it means the virus isn't just sleeping. It's likely replicating, or cells are bursting open and releasing it. It's a sign of active persistence or a failure of the immune system to keep it suppressed.
[00:05:42] Speaker B: Okay, so here is my question on the how. The researchers used these massive biobanks, UK Biobank and the All of Us program. These programs were designed to sequence humans. They spent millions of dollars to read human DNA to understand human traits. How do you find viral DNA in a dataset that was specifically designed to ignore it?
[00:06:00] Speaker C: That is the real innovation of this study. They used a data mining approach on existing whole-genome sequencing data. They didn't go out and swab new patients. They took petabytes of data that already existed and looked at the trash. In bioinformatics terms, yes. When scientists sequence a human genome from blood, the computer algorithms are trained to map the DNA reads to the human reference genome. It's like a giant puzzle. If a piece fits the human puzzle, it gets kept and analyzed.
[00:06:26] Speaker B: And if it doesn't?
[00:06:27] Speaker C: Anything that doesn't match the human map is usually termed unmapped reads and is often discarded or ignored. It's considered noise, contamination, or just bacterial junk.
[00:06:37] Speaker B: But one man's noise is another man's high-impact discovery.
[00:06:41] Speaker C: Exactly.
This team realized that within those trash reads were the genomes of the viruses infecting those people. They used the EBV reference genome as a sort of digital sink to catch these off-target reads. They poured the trash bucket through a sieve designed to catch only Epstein–Barr virus.
[00:06:58] Speaker B: That sounds incredibly clever, but I imagine it wasn't quite that simple. If it was, someone else would have done it years ago.
[00:07:03] Speaker C: It never is that simple. They hit a major technical challenge that likely stopped previous researchers in their tracks. When they ran the initial analysis, they found that the EBV genome has these highly repetitive regions, specifically regions called IR1 and W repeats.
[00:07:19] Speaker B: Repetitive in what way? Like a stutter in the genetic code?
[00:07:22] Speaker C: Exactly. Imagine a book where one page just says the word "the" 5,000 times in a row. If you get a snippet that says "the", you have no idea where it came from or how many copies there actually are. When they sequenced people, these repetitive regions were generating massive amounts of noise. False positives everywhere.
[00:07:41] Speaker B: It looked like everyone had high viral loads, just because the sequencer was getting confused by these repeats.
[00:07:45] Speaker C: Yes, like a record skipping and playing the same note over and over, making the song sound louder than it actually is. And critically, before they fixed it, the data showed only a weak link to whether the person actually had antibodies. It didn't match reality.
[00:08:00] Speaker B: So what was the fix?
[00:08:01] Speaker C: The fix was to mask, or digitally remove, those specific repetitive regions from the analysis. They basically told the computer: ignore the stutter. Just look at the unique sentences.
[00:08:11] Speaker B: And once they ignored the stutter?
[00:08:13] Speaker C: The signal-to-noise ratio skyrocketed.
They achieved a massive improvement and established a clean, reliable metric for EBV DNAemia. They set a threshold at roughly 1.2 viral genomes per 10,000 human cells.
[00:08:28] Speaker B: Which sounds small. 1.2 viruses for 10,000 cells. But when you think about how many billions of cells are in a vial of blood, that represents a significant viral burden. It's like they were panning for gold in a river of data that everyone else had already declared empty.
[00:08:43] Speaker C: And they absolutely found gold.
[00:08:45] Speaker B: So let's get into the findings. They have this clean metric now. They apply it to nearly 750,000 people across the UK and the US. What did they find?
[00:08:54] Speaker C: First, they resolved a huge discrepancy. Remember, over 90% of people are seropositive. They have the antibodies indicating past infection. But this study found that only about 10% of participants had this EBV DNAemia.
[00:09:07] Speaker B: Detectable levels of viral DNA floating in their blood.
[00:09:11] Speaker C: Correct.
[00:09:11] Speaker B: Okay, pause there. That's the pivot point. 90% of us have the wanted poster, the memory of the virus. But only 10% have the bandit walking around in broad daylight.
[00:09:20] Speaker C: Precisely. And that 10% is the group we need to worry about.
[00:09:24] Speaker B: So who are the 10%? Is it just a random distribution?
[00:09:28] Speaker C: Not entirely. Demographically, high viral loads were more common in males, older individuals, and, unsurprisingly, people on immunosuppressive drugs.
[00:09:37] Speaker B: Which makes sense. If you dampen the immune system medically, the virus wakes up.
[00:09:41] Speaker C: That checks out biologically. But the real insight came when they did a PheWAS, a phenome-wide association study.
[00:09:48] Speaker B: Just to clarify for everyone listening, a PheWAS is basically taking that viral status and running it against the person's entire medical history.
Every diagnosis code, to see what pops up. Right?
[00:09:58] Speaker C: Exactly. It's a hypothesis-free look at diseases. They weren't looking for anything specific. They let the data tell the story. They confirmed some things we expected. High viral loads were strongly linked to rheumatoid arthritis, COPD, and systemic lupus erythematosus. But then they found something that made me sit up and take notice.
[00:10:16] Speaker B: This is the part that I think will resonate with a lot of people who have struggled with unexplained symptoms.
[00:10:20] Speaker C: They found a significant association with malaise and fatigue. Yes, and not just "I'm tired after work" fatigue. We are talking clinically significant malaise. This is a big deal because there has been a long-debated hypothesis linking EBV to ME/CFS, chronic fatigue syndrome.
[00:10:38] Speaker B: Right. For years, patients have reported viral-like onsets to their fatigue. They get sick with something like mono and never quite recover.
[00:10:46] Speaker C: But the blood work often came back normal because doctors were just checking for antibodies, which, as we said, everyone has.
[00:10:53] Speaker B: So a doctor sees antibodies and says, well, you had mono 10 years ago, but you're fine now. Your tests are normal. But this study suggests that maybe those patients actually have higher levels of persistent viral DNA that the standard tests were completely missing.
[00:11:07] Speaker C: It certainly supports that hypothesis. It gives a biological basis to a symptom that is often dismissed as psychological or psychosomatic. It validates the patient experience. They also found links to rare neurological conditions like neuromyelitis optica. It's a rare disease, but the signal linking it to viral load was incredibly strong.
[00:11:28] Speaker B: So we have this subgroup of people, about 1 in 10, who can't seem to keep the virus suppressed.
They have viral ghosts haunting their blood, and they are prone to fatigue and autoimmune issues. The million-dollar question is: why them? Is it bad luck? Is it environmental exposure? Or is it written in their genes?
[00:11:48] Speaker C: It appears to be heavily written in their genes. They ran a genome-wide association study, or GWAS. Basically, they scanned the human DNA of these 750,000 people to see if any human genetic variants predicted who had the high viral loads.
[00:12:03] Speaker B: And did they find a smoking gun?
[00:12:04] Speaker C: They found an arsenal. They identified 22 independent regions in the human genome associated with EBV DNAemia.
[00:12:11] Speaker B: 22 regions, but I'm guessing there was one that stood out above the rest. Usually in these immune studies, all roads lead to one place.
[00:12:17] Speaker C: You guessed it. The strongest signals were all clustered in the HLA region on chromosome 6. Specifically, the MHC class II genes.
[00:12:25] Speaker B: MHC class II. We need to do a little Bio 101 here because this is the mechanism. What is MHC class II, and why does it matter for a virus hiding in a B cell?
[00:12:33] Speaker C: Think of the MHC molecules as the security guards of your immune system, or more accurately, the informants. Their job is to constantly grab pieces of proteins from inside the cell and hold them up on the surface for the T cells to inspect.
[00:12:48] Speaker B: Like showing an ID card?
[00:12:50] Speaker C: More like showing a piece of evidence. The MHC holds up a piece of the virus and says to the T cells, "Hey, look what I found inside. This doesn't belong here. Attack this."
[00:12:59] Speaker B: Okay, so the T cells are the SWAT team, but they can't see inside the building. The MHC is the one bringing the evidence out to the sidewalk.
[00:13:06] Speaker C: Exactly. If the MHC doesn't show the evidence, the SWAT team drives right by. Now here is the crucial part.
We all have different variations of these MHC genes. It is one of the most variable parts of the human genome. And this study found that the specific shape of your MHC molecules determines how well they can hold onto pieces of the Epstein–Barr virus.
[00:13:26] Speaker B: So it's a physical fit, like a puzzle piece.
[00:13:29] Speaker C: It's molecular geometry. If you have an MHC molecule that binds tightly to the EBV peptides, think of it like a sticky glove catching a baseball. Your immune system gets a clear, prolonged look at the enemy. It mounts a strong defense, clears the active virus, and keeps it in deep latency.
[00:13:47] Speaker B: Result: low viral DNA.
[00:13:48] Speaker C: And if I have a slippery MHC?
[00:13:51] Speaker B: If your specific genetic variant creates a shape that doesn't grip the EBV peptide, well, the evidence falls out of the glove. The T cells don't get a good look. The virus slips under the radar. It persists. It replicates. Result: high viral DNA, or DNAemia.
[00:14:04] Speaker C: It's literally a lock-and-key match between our specific immune genes and the specific viral proteins.
[00:14:09] Speaker B: It is. And they didn't just infer this. They proved it computationally. They used a tool called NetMHC to simulate the binding affinity between every single viral protein and the different human HLA alleles found in the population.
[00:14:21] Speaker C: They simulated the molecular handshake, and the correlation was stunning. People with HLA alleles predicted to bind EBV peptides strongly, the sticky ones, had significantly lower viral loads. Conversely, specific alleles like HLA-A*03:01 were major risk factors. Others, like HLA-DRB1*12:01, were protective.
[00:14:44] Speaker B: That is fascinating. It's not just "is your immune system strong or weak?" It's "does your specific genetic key fit this specific viral lock?"
[00:14:52] Speaker C: Precisely. And this clears up another huge debate in the field.
For a long time, people wondered if the virus was mutating. Maybe some people had a super strain of EBV that caused cancer or autoimmune issues.
[00:15:03] Speaker B: Right. Like a more aggressive variant. We see that with flu or COVID. We always worry about the new strain.
[00:15:08] Speaker C: But this study looked at the viral genomes, too. They found that most of the viral variants, the ones previously thought to be cancer drivers in nasopharyngeal carcinoma, were actually just common geographic variants.
[00:15:19] Speaker B: They're regional differences.
[00:15:20] Speaker C: Right. A strain from Asia looks different from a strain from Europe, but not necessarily because it's more dangerous. The main driver of persistence wasn't the virus's mutation. It was the host's inability to see it.
[00:15:31] Speaker B: So it's not that the burglar is a master of disguise, it's that the security guard has bad glasses.
[00:15:37] Speaker C: That is a very apt way to put it.
[00:15:38] Speaker B: This feels like a massive paradigm shift. We've always treated susceptibility to EBV as this binary thing. You have it or you don't. But this implies a spectrum of susceptibility based on our unique genetic architecture.
[00:15:51] Speaker C: It does. And if we connect this to the bigger picture, it explains why we see such different outcomes in autoimmune diseases. We've known for years that HLA genes are linked to things like multiple sclerosis, but we didn't know why. This study suggests that the reason for that link might be the failure to control EBV.
[00:16:10] Speaker B: Walk me through that. How does failing to control the virus lead to attacking yourself?
[00:16:14] Speaker C: If your MHC is slippery, the virus persists. The immune system knows something is wrong. There are distress signals everywhere. But it can't quite target the virus efficiently because the presentation is weak. So it stays in a state of chronic agitation, constantly firing.
[00:16:28] Speaker B: It's just blindly swinging in the dark.
[00:16:30] Speaker C: Exactly. Eventually, in that confusion and chronic inflammation, it starts making mistakes. It starts attacking the body's own tissues. The immune dysregulation starts with the slippery MHC failing to do its job.
[00:16:42] Speaker B: That is a profound connection. It connects the dots between genetics, viral persistence, and chronic disease in a way that is incredibly logical.
[00:16:51] Speaker C: It supports the persistent driver theories of autoimmunity, with the virus as the engine. And it suggests that treating the virus might actually help treat the autoimmune disease.
[00:17:00] Speaker B: So we have covered a lot of ground in this deep dive. We have mined the trash data, found the ghosts, and identified the genetic culprit. What is the ultimate take-home message here? What does this mean for the future?
[00:17:12] Speaker C: The central insight is that we can now quantify viral persistence using existing population-scale genomic data. We have proven that host genetics, specifically the efficiency of MHC class II antigen presentation, is a primary determinant of whether EBV remains latent or proliferates.
[00:17:28] Speaker B: And surely this framework isn't just for EBV.
[00:17:31] Speaker C: Absolutely not. That is the most exciting part. This can be applied to the human virome globally. We can map how our genes interact with the entire ecosystem of viruses we carry throughout our lives.
[00:17:43] Speaker B: Which leaves us with a pretty thought-provoking prompt for you to consider. What does this mean for the future of personalized medicine? Could we eventually screen your genome to predict exactly which viruses your immune system has a blind spot for, and vaccinate you differently based on your HLA profile?
[00:17:59] Speaker C: It's a real possibility.
Designing vaccines that account for your specific genetic blind spots, we could potentially prevent the autoimmune cascade before it even starts.
[00:18:08] Speaker B: From trash reads to personalized viral defense, that is science at its best.
[00:18:12] Speaker C: I couldn't agree more.
[00:18:13] Speaker B: This episode was based on an open-access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Now stay with us for an original track created especially for this episode and inspired by the article you've just heard about. Thanks for listening, and join us next time as we explore more science, base by base.
[00:19:01] Speaker A: Long nights and bright screens reading the smallest of signs A whisper of virus threaded through the human lines We pull the code apart to find who keeps the echo near Tiny traces in the stream A secret that clings to the clear Maps of genes like constellations Markers in the skin Antigen dose and the keys they hide A tug beneath the din A fragile balance tipping where the immune story begins. We found the locks and the keys in the language of ourselves MHC class II showing the pieces the tale the body tells A pulse becomes a beacon Turning mystery into light Population voices rising from shadow into sight. Hundreds of signals scattered distance Whispers in the chain Heritability written in the places memory remains linked to breathing joints and minds Small echoes that persist A pattern in the crowd invisible. The 22 lighthouses marked along the genome shore.
Immune programs waking in be and the doors they were stronger Presentation A slingering song of magic mechanistic core so raise the strings and light the sky Let the chorus swell and hold population scale Searching turns the quiet into gold from mas repeats to open maps the hidden becomes known Echoes in the blood now and so [00:21:28] Speaker C: al. [00:21:32] Speaker A: We are not alone.