147: Comprehensive Annotation of Complete ABO Alleles and Resolution of ABO Variants

Show Notes

Episode 147: Comprehensive Annotation of Complete ABO Alleles and Resolution of ABO Variants

In this episode of PaperCast Base by Base, we explore a groundbreaking study that introduces an improved long-read sequencing method to fully resolve ABO haplotypes, spanning from the 5′ to the 3′ untranslated regions. This work addresses a major gap in blood group genomics by delivering the most comprehensive annotation of complete ABO alleles to date.

Study Highlights:
Researchers analyzed specimens from 79 blood donors and 47 samples carrying ABO variants using an optimized ultra-long-range PCR (ULR-PCR) method combined with PacBio SMRT sequencing. They amplified and sequenced the full 26.1 kb ABO gene as a single amplicon, without stitching together shorter fragments, achieving complete coverage including the regulatory 5′ and 3′ UTRs. The study provided detailed haplotype sequences of the predominant alleles in a Chinese population, revealing structural variations, recombination events, and previously unknown subtypes. Importantly, the method also resolved complex variants, including large deletions, chimeras, and intronic regulatory motifs, offering new insights into ABO allele diversity and molecular mechanisms.

Conclusion:
This comprehensive full-length ABO haplotype sequencing approach advances transfusion medicine by improving variant resolution, refining allele classification, and enabling more accurate genomic analysis for clinical and evolutionary applications.

Reference:
Ying Y, Zhang J, Hong X, Yuan W, Ma K, Huang X, Xu X, Zhu F. Comprehensive Annotation of Complete ABO Alleles and Resolution of ABO Variants by an Improved Full-Length ABO Haplotype Sequencing. *Clinical Chemistry*. 2025;71(4):510–519. https://doi.org/10.1093/clinchem/hvaf015

License:
This episode is based on an open-access article published under the Creative Commons Attribution 4.0 International License (CC BY 4.0) – https://creativecommons.org/licenses/by/4.0/

Support:
If you'd like to support Base by Base, you can make a one-time or monthly donation here: https://basebybase.castos.com/

Chapters

(00:00:00) - Base by Base: The mystery of blood types
(00:01:27) - Full-length ABO haplotype sequencing
(00:06:05) - Complete annotation and common alleles in a Chinese population
(00:06:54) - Structural variations of the ABO gene
(00:09:38) - ULR-PCR: resolving complex variants

Episode Transcript

[00:00:14] Speaker B: Welcome to Base by Base, the papercast that brings genomics to you, wherever you are. Think about something really fundamental to medicine: blood transfusions. It all hinges on that basic ABO system, right? But what if that system just doesn't give a clear answer? It happens all the time. Lab techs get these confusing results. Maybe mixed-field agglutination, where it looks like multiple types are present, or sometimes, you know, someone has an antigen that's just incredibly weak, like the Bel phenotype. It makes reliable typing really difficult. And the thing is, it's usually not just a lab mistake. These puzzles are often buried deep in the genetics of the ABO gene itself. We've gotten pretty good at sequencing the main part, the coding bits. But the real action driving these strange blood types seems to be hiding out in the gene's dark matter: these long stretches at the beginning and end, the 5′ and 3′ untranslated regions, the UTRs. And for ages, we just couldn't get a clean look at the whole picture, the entire 26,000-base-pair blueprint in one go. We couldn't link those regulatory switches in the UTRs to what was actually happening in the patient. But this deep dive today, it's about a technical breakthrough.
[00:01:17] Speaker C: A way to finally map the complete ABO gene. And it's uncovering everything from, like, subtle regulatory tweaks to, frankly, enormous deletions we just couldn't see before. Okay, let's get into it.
[00:01:28] Speaker D: Absolutely. And today we really should celebrate the fantastic work from the research team over at the Blood Center of Zhejiang Province, their Blood Transfusion Medicine Research Institute in Hangzhou, China. They've really pushed our understanding forward on the complete ABO gene structure and, crucially, on how we can finally resolve some of these really complex variants.
[00:01:49] Speaker C: Right. So we're diving into their paper.
"Comprehensive Annotation of Complete ABO Alleles and Resolution of ABO Variants by an Improved Full-Length ABO Haplotype Sequencing." It's by Yaling Ying and colleagues, published in Clinical Chemistry, Volume 71, Issue 4, in 2025.
[00:02:06] Speaker D: And the clinical need here is just huge. I mean, the ABO system isn't just critical for safe transfusions and transplants, we're also finding links to susceptibility for various diseases. So if we don't grasp the full genetic picture, especially for patients with weird phenotypes, it's like we're operating with only half the instructions, you know?
[00:02:24] Speaker A: Yeah.
[00:02:24] Speaker C: And historically, everyone just zoomed in on the coding DNA sequence, the CDS, which, okay, makes sense. That's the part that codes for the protein. But genetics isn't always neat and tidy, is it? These really important events, big insertions, deletions, key regulatory changes, they often happen way outside those protein-coding exons.
[00:02:42] Speaker D: Exactly. And those UTRs, they're not just junk DNA, they are functional powerhouses. The 5′ UTR plays a big role in starting protein translation. And the 3′ UTR, that controls things like how stable the messenger RNA is, how quickly it gets broken down. So if a change in the 3′ UTR messes with the expression of, say, the A antigen, then the person...
[00:03:04] Speaker C: Might look like they have type O blood. Or just a very weak A.
[00:03:07] Speaker D: Precisely. Or they show that really frustrating weak phenotype. And that's been the big technical hurdle, right? We kind of knew important stuff was happening in the UTRs, but actually capturing it all together, linked to the right allele, that was tough.
[00:03:18] Speaker C: Right, because getting that whole 26.1-kilobase haplotype sequence, UTRs and everything, in one clean piece is, well, it's technically brutal.
You've got this huge distance, plus these really repetitive bits of sequence, especially in the 3′ UTR with those variable repeats. So trying to sequence it in smaller chunks and then stitch them together, like older methods did, created what the paper calls phasing ambiguity. You basically lose track of which variation belongs to which copy of the gene.
[00:03:46] Speaker D: Yeah, you couldn't tell if a specific variant you found way out in the regulatory region was actually on the chromosome carrying the A allele or the one carrying the O allele, for instance. That linkage information is critical.
[00:03:56] Speaker C: Okay, so if standard long-range PCR kind of hits a wall around 10,000 bases, how on earth did they manage to amplify a single piece that's over two and a half times that size? 26.1 kilobases.
[00:04:07] Speaker D: Ah, that's the really clever part. They weren't happy with just sticking fragments together. They developed this improved method, a one-step ultra-long-range PCR. They call it ULR-PCR.
[00:04:18] Speaker C: Okay, ULR-PCR. What's the trick? How do they get past that 10-kilobase barrier without it just falling apart or amplifying junk?
[00:04:27] Speaker D: Well, the key innovation was using a specific pair of primers called PCR suppression primers, or PS primers. Think of it like this: in a normal PCR for a really long target, lots of shorter, random bits of DNA tend to amplify much faster. They kind of hijack the reaction.
[00:04:44] Speaker C: Right. The easy stuff gets made first and uses up all the resources, crowding out the difficult long product you actually want.
[00:04:50] Speaker D: Exactly. So these PS primers, they're designed to specifically latch onto and block the amplification of those shorter unwanted fragments. It's like putting suppressors on the noisy bits.
By shutting down that background amplification, they create the space, chemically speaking, for the reaction to successfully build that one single massive 26.1-kilobase target amplicon.
[00:05:11] Speaker C: Huh, that's actually quite elegant. It's like noise cancellation for PCR. So you ensure only the desired long molecule gets amplified, which solves the linkage problem right there. You get the whole instruction manual in one piece, 5′ UTR to 3′ UTR. No breaks.
[00:05:24] Speaker D: Precisely. And that single 26.1-kilobase amplicon they generated is the longest reported complete ABO gene fragment ever amplified and sequenced. Once they had that huge clean product, they used single-molecule real-time sequencing, SMRT sequencing. That's a long-read technology, essential for accurately reading such a massive piece of DNA from end to end.
[00:05:47] Speaker C: And they didn't just test it on one or two samples. They used, what was it, 79 healthy blood donors plus 47 cases with known complex ABO issues, all from the Chinese population. That gives the findings real robustness.
[00:06:00] Speaker D: Yes, a very solid cohort to establish baseline patterns and test the method on challenging cases.
[00:06:05] Speaker C: Okay, let's get into the findings. So, first big thing, they got the complete sequence. The full annotation finally filled in that big blank on the map, especially for the 3′ UTR, which was mostly guesswork before.
[00:06:15] Speaker D: Right, and having that complete sequence immediately let them see finer details within the common ABO alleles, those five main alleles you see often in the Chinese population, like A1.01, B.01, O.01.01. They could suddenly subdivide them into distinct...
[00:06:28] Speaker C: Subtypes based on coding changes?
[00:06:30] Speaker D: No, that's the interesting part. Based on variations they found in the non-coding regions, the introns and those newly mapped UTRs.
For instance, they found two specific variants linked together in intron 6 that consistently marked out different subtypes of the A1.02 allele. These non-coding variations act like unique identifiers for these allele subgroups.
[00:06:52] Speaker C: Okay, non-coding markers, that's cool. Now you mentioned structural variations. Since they sequenced the whole thing, they could see big changes, right? Like in intron 1.
[00:07:00] Speaker D: Yes, exactly. They found significant structural variation in intron 1. Specifically something called a VNTR, a variable number of tandem repeats. In this case it was repeats of...
[00:07:11] Speaker C: The sequence TA. And variable means the number of repeats changes between alleles?
[00:07:15] Speaker D: Drastically. The number of TA repeats, which they called N, ranged all the way from 11 copies up to 26 copies. And there was a clear pattern. Alleles like A1.02 and B.01 tended to have more repeats, around 21 on average. But the O alleles, they generally had fewer, averaging around 13 repeats.
[00:07:33] Speaker C: Hmm. So is that just a handy marker like the intron 6 variants, or does the actual length of that repeat section maybe do something?
[00:07:40] Speaker D: Functionally, it's highly unlikely to be just a passive marker. We're talking about a pretty big difference in physical length right near the start of the gene. It's plausible that the number of repeats could influence how the DNA is packaged, or maybe how easily the transcription machinery can access or move along the gene. It suggests these intronic structural differences might be fundamentally tied to how the allele functions, not just a random tag.
[00:08:04] Speaker C: Fascinating. Okay, and what about that mysterious 3′ UTR? The one we barely knew anything about?
[00:08:10] Speaker D: They nailed it.
They sequenced the whole 1.6-kilobase region, confirmed it's full of repetitive sequences, especially CA repeats, and managed to categorize this complexity into 14 distinct structural units, numbered 1 through 14.
[00:08:24] Speaker C: Okay, 14 units. But here's the kicker, right? How did this compare to the official reference sequence?
[00:08:29] Speaker D: Ugh, this is a huge point for genetics globally. They found that the standard ABO reference sequence, the one with the identifier NG_006669.2, used by researchers everywhere, was actually incomplete in this 3′ UTR region.
[00:08:44] Speaker C: Incomplete?
[00:08:45] Speaker D: Well, it was completely missing three of those structural units, 11, 12, and 13, compared to what they consistently found in their large Chinese cohort.
[00:08:54] Speaker C: Wait, wait, the global reference sequence?
[00:08:56] Speaker D: Yeah.
[00:08:56] Speaker C: The one people use for designing diagnostics, for research?
[00:08:59] Speaker D: Yeah.
[00:09:00] Speaker C: It was just missing three whole sections?
[00:09:01] Speaker D: That's...
[00:09:02] Speaker C: That's pretty major.
[00:09:03] Speaker D: It is. It really highlights why getting this full-length, population-specific data is so critical. You could be doing experiments on gene regulation based on a reference that's missing key parts of the regulatory machinery.
[00:09:13] Speaker C: Unbelievable. Okay, so they built a better map. Now the clinical payoff: solving those confusing cases we started with. How did the method do there?
[00:09:21] Speaker D: Extremely well. It was highly effective at finding complex structural variations, or SVs. And beyond just mapping known complexity, they identified three brand-new variants within the coding sequence itself, plus 32 previously unreported variations in the introns and promoter regions.
[00:09:36] Speaker C: And the really big stuff. Tell us about that Bel phenotype patient, the one with this super weak B antigen.
What did they find?
[00:09:43] Speaker D: They found something truly massive. A deletion of 7,396 base pairs. Gone. Just completely missing from the gene in that patient.
[00:09:52] Speaker A: Whoa.
[00:09:53] Speaker C: Seven thousand? And standard tests just saw a weak B?
[00:09:56] Speaker D: Exactly. Conventional methods might hint at reduced function, but they couldn't possibly see the scale of this underlying structural catastrophe. And crucially, this enormous deletion wiped out the entire enhancer region specifically responsible for boosting ABO expression in red blood cells.
[00:10:14] Speaker C: Ah, so you remove the main volume...
[00:10:16] Speaker D: Control, and you get a very, very quiet signal. The Bel phenotype. This is by far the largest deletion ever reported for the ABO gene. It fundamentally changes how we should think about weak blood types. It's not always a subtle tweak. Sometimes it's a massive structural failure.
[00:10:30] Speaker C: Okay, that's mind-blowing. What about the mixed-field case, the microchimerism? That's a nightmare scenario, especially after transplants.
[00:10:37] Speaker D: Right, it's common after things like stem cell transplants. So in the case they described, traditional sequencing methods probably only picked up the two most abundant ABO types. Let's say the patient's original type and the main donor type, maybe B.01 and O.01.02.
[00:10:53] Speaker C: But the blood bank sees mixed agglutination, suggesting something else is going on.
[00:10:57] Speaker D: Exactly. Because this ultra-long sequencing method has the resolution to phase and quantify everything, it found not two, but three distinct allele populations. There was the B.01, the O.01.02, but also a small but persistent third population, about 12% A1.02 cells. That small fraction of A cells mixing with the B cells was causing the confusing serology.
The ULR-PCR method could cleanly separate and quantify that minor chimeric population, which standard PCR and Sanger sequencing completely missed or couldn't resolve properly.
[00:11:31] Speaker C: So it directly solved the clinical puzzle.
[00:11:33] Speaker D: It did. Stepping back, this ability to get the full-length haplotype sequence is really powerful. It lets us finally move beyond just relying on the antibody reactions or just the coding sequence to define these variants.
[00:11:43] Speaker C: And it provides functional clues straight away, right? Like, that detailed 3′ UTR map isn't just academic, it's a resource for figuring out how mRNA stability is controlled. Maybe explaining why some alleles have weaker expression.
[00:11:55] Speaker D: Precisely. Those newly mapped repetitive bits in the 3′ UTR might well contain elements that negatively regulate expression or stability. We can now test those hypotheses. And diagnostically, it's a leap forward for accuracy, especially for linking variations. Remember the phasing problem? Now they can definitively say, for example, that a variant way out at position Cato287AG in the far 5′ UTR is physically linked on the same DNA molecule to another variant, C155.5GA, found much later in an intron, specifically in certain abob03 alleles.
[00:12:30] Speaker A: Wow.
[00:12:30] Speaker C: Connecting dots across 26,000 base pairs. Yeah, that linkage information must be invaluable for tricky cases, like you said, ABO-incompatible transplants where you're tracking different cell populations over time.
[00:12:41] Speaker D: Absolutely critical. And thinking bigger picture, this level of detail impacts how we even name ABO alleles. Finding these reliable non-coding markers, specific SNVs, those VNTR links, means we could move towards a much more sophisticated, maybe multi-level naming system, something more like the very detailed system used for HLA alleles. You know, it would bring much-needed clarity and precision.
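The linkage point above, that two distant variants can be assigned to the same DNA molecule when a single long read covers both, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The read model (a dictionary mapping position to base), the function name `phase_counts`, and the positions and bases in the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def phase_counts(reads, pos_a, pos_b):
    """Tally the allele pairs observed together on single reads.

    Each read is modeled as a dict {position: base}. Only reads that
    cover both positions carry linkage (phasing) information.
    """
    pairs = Counter()
    for read in reads:
        if pos_a in read and pos_b in read:
            pairs[(read[pos_a], read[pos_b])] += 1
    return pairs

# Hypothetical toy data: positions and bases are illustrative only.
reads = [
    {100: "G", 20000: "A"},
    {100: "G", 20000: "A"},
    {100: "A", 20000: "G"},
    {100: "G"},  # does not span both sites, so it is ignored
]
counts = phase_counts(reads, 100, 20000)
print(counts)
```

Short-read or fragment-based data corresponds to the last toy read: each site is seen, but never on the same molecule, which is where the phasing ambiguity comes from. Dividing each tally by the total number of spanning reads would give a rough per-haplotype fraction, which is the same idea as quantifying a minor chimeric population.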
[00:13:04] Speaker C: That makes a lot of sense. Seems like the way forward clinically. But let me push on the practical side. This ULR-PCR sounds amazing, but is it, you know, ready for prime time in a regular hospital lab? Is it cost-effective yet, or is it still mainly a research tool for the really tough nuts to crack?
[00:13:21] Speaker D: That's a really important question. And for your average, everyday, high-volume blood typing, no, standard methods are still going to be the workhorse for now. As the researchers themselves point out, this ultra-long PCR needs really good quality DNA, high molecular weight and not degraded. That can be a challenge with routine clinical samples, especially older ones.
[00:13:39] Speaker C: Okay, so sample quality is a limitation.
[00:13:41] Speaker D: It is. But for those complex cases, the ones where the serology just doesn't match the genotype from standard tests, the ones that pose a real risk if you get the interpretation wrong, for those, this technique is becoming essential. It provides answers nothing else currently can.
[00:13:55] Speaker C: Got it. So a powerful tool for specific, challenging situations right now.
[00:14:00] Speaker D: Exactly. So to kind of wrap up the core findings, this study delivered the first truly complete 26.1-kilobase genetic blueprint for the ABO gene. They used this novel ULR-PCR combined with long-read sequencing, and this revealed a whole new layer of complexity: structural variations and regulatory elements in the UTRs and introns that we just couldn't see properly before. And crucially, this allows for the accurate diagnosis of really challenging variants, like those huge deletions or subtle microchimerism.
[00:14:28] Speaker C: So thinking about what this means for you, the listeners, this successful mapping, this integration of the full genomic picture, it really feels like it's pushing us into a new era. Maybe call it blood transfusion omics.
And it makes you wonder, right? If ABO, arguably the most studied blood group system, was hiding this much complexity in its non-coding regions, what secrets are lurking in other important systems like Rh or Duffy or Kell? What clinically vital structural variations might only be revealed by this kind of ultra-long sequencing approach?
[00:15:02] Speaker D: This episode was based on an open-access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five-star rating. If you'd like to support our work, use the donation link in the description. Thanks for listening, and join us next time as we explore more science, base by base.
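As a companion to the intron 1 VNTR discussion in the transcript (TA repeats ranging from 11 to 26 copies), here is a small Python sketch of how a tandem-repeat copy number could be counted in a sequence string. It is an illustration only, not the method used in the paper. The function name `longest_tandem_repeat` and the toy sequence are hypothetical.

```python
import re

def longest_tandem_repeat(seq, unit="TA"):
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    # Find every uninterrupted run of the repeat unit, e.g. "TATATA".
    runs = re.findall(f"(?:{re.escape(unit)})+", seq.upper())
    # Convert each run length to a copy number; 0 if the unit never occurs.
    return max((len(run) // len(unit) for run in runs), default=0)

# Hypothetical toy sequence: 13 TA copies flanked by unrelated bases,
# followed by a shorter 2-copy run that should not win.
toy = "GGC" + "TA" * 13 + "CCGTATAGG"
n = longest_tandem_repeat(toy)
print(n)
```

Real long reads contain sequencing errors inside repeat tracts, so a production tool would need error-tolerant matching and a consensus across reads; this sketch assumes an error-free sequence.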
