Episode Transcript
[00:00:14] Speaker A: Welcome to Base by Base, the papercast that brings genomics to you wherever you are.
Today we are doing a deep dive into, well, one of the most fundamental questions in evolution, I think.
How does a totally new functional gene actually appear, seemingly out of thin air?
[00:00:34] Speaker B: Yeah, it sounds almost philosophical, doesn't it? Yeah, but it's actually a really hard mathematical problem. If you just imagine a pretty short gene, maybe 150 nucleotides long.
[00:00:45] Speaker A: Okay, that's tiny.
[00:00:46] Speaker B: It is, but the number of possible sequences you could make of that length is, well, it's astronomical.
[00:00:52] Speaker A: Something like 10 to the 90th power. I can't even visualize that number.
[00:00:56] Speaker B: Nobody can really. I mean, just for context, that number is way, way bigger than the estimated total number of microbial cells that have ever existed on our planet, which is.
[00:01:05] Speaker A: Already a huge number. Something like 10 to the 40th.
[00:01:07] Speaker B: Exactly. So if the space of possibilities is that enormous, finding a sequence that actually does something useful, that functions, should be, statistically speaking, almost impossible.
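The size of that sequence space can be checked with a few lines of arithmetic. This is only an illustrative sketch: it assumes a four-letter DNA alphabet and the 150-nucleotide length quoted in the episode, and the function names are hypothetical, not taken from the paper.

```python
import math

GENE_LENGTH_NT = 150  # gene length quoted in the episode
ALPHABET_SIZE = 4     # A, C, G, T


def sequence_space_size(length_nt: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Number of distinct sequences of the given length (exact integer)."""
    return alphabet_size ** length_nt


def order_of_magnitude(n: int) -> float:
    """Base-10 logarithm, i.e. the exponent x in n = 10^x."""
    return math.log10(n)


total = sequence_space_size(GENE_LENGTH_NT)
print(f"4^{GENE_LENGTH_NT} is roughly 10^{order_of_magnitude(total):.1f}")
```

The exponent comes out just above 90, which matches the "10 to the 90th" figure mentioned in the conversation.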
[00:01:16] Speaker A: Right. It feels like you're looking for one specific atom in the entire universe, and yet biology innovates all the time. Organisms do get new functions.
[00:01:27] Speaker B: And that's the paradox, isn't it? The central question we're digging into today. What if functional sequences aren't these impossibly rare needles in a cosmic haystack? What if maybe they're actually much more common, more accessible than we thought?
[00:01:42] Speaker A: Like, what if the seemingly random junk DNA in a microbe's genome isn't junk at all, but sort of a constant testing ground?
[00:01:50] Speaker B: Precisely. A source of raw material, sequences that could potentially be useful, just waiting for the right pressure, the right moment to be selected.
[00:01:58] Speaker A: Okay, so that's the mission for this deep dive then. We're looking at this really clever experiment where researchers basically force random bits of DNA in E. coli to invent a defense against, well, their biggest enemy: bacteriophages, viruses that eat bacteria.
[00:02:11] Speaker B: Yeah, talk about intense selective pressure. It's literally life or death.
[00:02:15] Speaker A: And the thinking is, if random sequences can come up with something useful under that kind of pressure, maybe the rules of how evolution creates novelty are, I don't know, more flexible, faster than we typically assume.
[00:02:29] Speaker B: It really gets at the very beginning of complexity, how new things arise.
[00:02:33] Speaker A: Before we dive deeper into the how, we really should give a shout out here. This work looking into gene birth and microbial evolution, it was a big collaborative effort.
[00:02:42] Speaker B: Absolutely. Really cutting edge stuff. It involved teams from MIT, Tel Aviv University and also the University of Montana.
[00:02:51] Speaker A: Yeah, major kudos to them. They designed this incredibly sensitive way to actually catch evolution, like right in the act of creating something new.
[00:02:58] Speaker B: And their focus was squarely on this idea of de novo gene birth. Maybe we should just quickly define that for listeners who know the basics, but maybe don't work on this specific area.
[00:03:07] Speaker A: Good idea. So what exactly is de novo gene birth? How's it different from say, the usual way we think about new genes evolving?
[00:03:15] Speaker B: Right. So often when we talk about new gene functions, we mean an existing gene gets accidentally copied, duplicated, and then you have two copies. One keeps doing the original job, and the spare one is kind of free to mutate and, over long periods, maybe stumble upon a new function.
That's duplication and divergence.
[00:03:35] Speaker A: The standard model, kind of.
[00:03:36] Speaker B: Pretty much. De novo birth is totally different. It means a functional gene emerges, well, basically from scratch, from DNA that previously wasn't a gene at all. Just, you know, random-seeming, non-coding sequence.
[00:03:49] Speaker A: Wow. And this isn't just some theoretical edge case, this actually happens.
[00:03:52] Speaker B: Oh absolutely. We see the result everywhere, especially in microbes. It seems to be a major source of new genes driving diversity. You hear about orphan genes.
[00:04:02] Speaker A: Uh huh. Genes that are only found in one specific species. Or maybe a closely related group.
[00:04:07] Speaker B: Exactly.
There are hundreds of thousands of these known across bacteria, for instance. Many of them look like they might have originated de novo. The big challenge though, has always been proving it directly, showing that these brand new random sequences can actually give a strong enough benefit, a fitness advantage, to be selected and stick around.
[00:04:25] Speaker A: And that's where the phages come in. You need a really strong reason for the cell to keep one of these random sequences around.
[00:04:31] Speaker B: Precisely. T4 phage infection is about as strong a pressure as you can get for E. coli.
Survive or die.
[00:04:37] Speaker A: So how did they tackle that huge sequence space problem? That 10 to the 90th number? You can't test them all.
[00:04:43] Speaker B: No, you definitely can't. So they got clever. They created these huge libraries, basically pools of DNA containing short semi random sequences. They were designed to be about 50 codons long, which is 150 nucleotides, ready to be potentially read as a gene.
[00:05:01] Speaker A: And they didn't just make one library. Right. There were two different types.
[00:05:04] Speaker B: Correct. And this difference is really important, it turns out. The first library, called NNB, was designed so the random sequences would on average produce amino acids in proportions similar to what you find in typical natural proteins.
[00:05:17] Speaker A: Okay. Sort of a baseline randomness.
[00:05:19] Speaker B: Yeah. Reflecting natural composition. But the second library, NYN, was the twist. They deliberately biased this one so it was more likely to code for hydrophobic amino acids.
[00:05:30] Speaker A: Hydrophobic. Water repelling.
[00:05:31] Speaker B: Exactly. Think oily residues. They made about 100 million different random sequences for each of these libraries, put them on plasmids and introduced them into E. coli.
[00:05:40] Speaker A: Why that bias towards hydrophobic residues in the NYN library? What's the thinking there? What does being hydrophobic suggest about a potential protein's job?
[00:05:49] Speaker B: That's a great question. Hydrophobic bits tend to not like being in the watery environment of the cell's cytoplasm. They often prefer to associate with other hydrophobic things like lipids, which make up cell membranes. Right. So proteins involved in interacting with membranes, maybe for signaling or transport or even defense at the cell surface, often need hydrophobic regions to embed or cross those membranes. By biasing the library this way, they were sort of hedging their bets, increasing the odds that some random sequences might fold into a structure that could interact usefully with the cell envelope.
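One way to see how a library name could encode a composition bias is to read NNB and NYN as IUPAC degenerate codon patterns (N = any base, B = C/G/T, Y = C/T). That reading is an assumption on our part; the episode does not spell out how the libraries were built. The sketch below also assumes equal base frequencies at each degenerate position, the standard genetic code, and an illustrative choice of which residues count as hydrophobic.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC ambiguity codes used in the two (assumed) library patterns.
IUPAC = {"N": "ACGT", "B": "CGT", "Y": "CT"}

# Assumption: which residues count as hydrophobic is a judgment call.
HYDROPHOBIC = set("AVILMFW")


def expand(pattern: str) -> list[str]:
    """List every concrete codon matching a three-letter degenerate pattern."""
    return ["".join(c) for c in product(*(IUPAC[symbol] for symbol in pattern))]


def summarize(pattern: str) -> dict:
    """Codon count, stop-codon count and hydrophobic fraction for a pattern."""
    residues = [CODON_TABLE[codon] for codon in expand(pattern)]
    return {
        "codons": len(residues),
        "stops": residues.count("*"),
        "hydrophobic_fraction": sum(r in HYDROPHOBIC for r in residues) / len(residues),
    }


for pattern in ("NNB", "NYN"):
    print(pattern, summarize(pattern))
```

Under these assumptions, restricting the middle base to C or T removes every charged residue and every stop codon from the pattern, which is consistent with the hydrophobic bias described in the conversation.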
[00:06:24] Speaker A: Interesting. Kind of pre selecting for a certain type of physical property.
[00:06:28] Speaker B: Sort of nudging randomness in a potentially useful direction.
[00:06:31] Speaker A: Okay, so they have this population of E. coli cells carrying millions of different random sequences between them.
Then they unleash the T4 phage.
How did they find the survivors? The ones with the working defense?
[00:06:43] Speaker B: This is the other really smart part of the experiment. Instead of just growing everything in liquid broth...
[00:06:48] Speaker A: Where only the super resistant ones might survive and take over.
[00:06:51] Speaker B: Exactly. Liquid culture is often an all-or-nothing selection. They used a soft agar method instead. It's like a plaque assay. You spread the bacteria on a plate, add the phage, and you look for spots where bacteria survive, even when surrounded by phages killing everything else.
[00:07:08] Speaker A: Ah, so you can see even partial protection. A cell that's just slowing the phages down a bit might still form a small colony.
[00:07:14] Speaker B: Precisely. It allowed them to detect a much wider range of defense strengths, including weaker or intermediate ones. They weren't just looking for a silver bullet. They wanted to know if any kind of functional defense could pop up from random sequences.
[00:07:27] Speaker A: And did it? What did they find?
[00:07:29] Speaker B: Oh, they found it big time. Not just a few lucky hits, but thousands of different random sequences that provided some level of phage resistance.
[00:07:37] Speaker A: Thousands? Really?
[00:07:38] Speaker B: Yeah. In that first natural composition NNB library, they got 358 functional hits. Which is already pretty cool.
[00:07:45] Speaker A: Okay, yeah, proof of concept.
[00:07:46] Speaker B: But in the NYN library, the one biased towards hydrophobic residues, they found 4,516 hits.
[00:07:54] Speaker A: Whoa.
That's more than ten times as many.
[00:07:57] Speaker B: Ten times higher success rate. Yeah, just from that slight bias towards hydrophobicity, that's incredible.
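The comparison between the two libraries is simple arithmetic, sketched below. The hit counts are the ones quoted in the episode. The library size is the "about 100 million" figure mentioned earlier, and treating it as the number of sequences actually screened is an assumption made only for illustration.

```python
NNB_HITS = 358    # hits quoted for the natural-composition library
NYN_HITS = 4516   # hits quoted for the hydrophobic-biased library
LIBRARY_SIZE = 100_000_000  # approximate; assumed equal for both libraries


def fold_enrichment(biased_hits: int, baseline_hits: int) -> float:
    """How many times more hits the biased library produced."""
    return biased_hits / baseline_hits


def hit_rate(hits: int, library_size: int) -> float:
    """Fraction of library members that scored as hits."""
    return hits / library_size


print(f"Fold enrichment: {fold_enrichment(NYN_HITS, NNB_HITS):.1f}x")
print(f"NNB hit rate: {hit_rate(NNB_HITS, LIBRARY_SIZE):.2e}")
print(f"NYN hit rate: {hit_rate(NYN_HITS, LIBRARY_SIZE):.2e}")
```

Even the higher rate is a tiny fraction of the library, but both are vastly higher than one would expect if functional sequences were needle-in-a-haystack rare.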
[00:08:02] Speaker A: It really suggests that finding some kind of function isn't astronomically hard. Maybe just finding sequences with basic physical properties like folding or sticking to membranes, is enough to get started.
[00:08:14] Speaker B: It certainly points that way. And what's more, when they looked closely at the functional proteins that came out of the unbiased NNB library, the ones.
[00:08:21] Speaker A: That worked despite having a normal composition to start with.
[00:08:24] Speaker B: Right. They found that natural selection had already pushed them in the same direction. The successful ones from the NNB library were enriched for hydrophobic residues like leucine and phenylalanine. And they had fewer charged residues compared to the non-functional random sequences.
[00:08:39] Speaker A: So evolution immediately selected for those same properties instantly.
[00:08:43] Speaker B: And these functional proteins were also predicted to be more structured, specifically forming alpha helices more often than the random sequences that didn't work.
Function started shaping form right away.
[00:08:54] Speaker A: Okay, this is fascinating. So thousands of hits. Did they figure out how these random sequences were actually protecting the bacteria? What were the mechanisms?
[00:09:04] Speaker B: They dug into that. Yeah. And they found basically two main quite different strategies that these de novo genes had invented.
[00:09:10] Speaker A: Two different paths to survival.
[00:09:12] Speaker B: Exactly. The first group, they called RIP genes. That stands for random inhibitors of phage infection.
[00:09:18] Speaker A: RIP genes. Okay. What were they like?
[00:09:20] Speaker B: These were definitely small proteins. They confirmed that because if you mutated the start codon, the signal to start making a protein, the protective function disappeared.
[00:09:29] Speaker A: Got it. So it's the protein itself doing the work.
[00:09:31] Speaker B: Yes. And these RIP proteins were generally hydrophobic, helical and found inside the cell, in the cytosol.
[00:09:38] Speaker A: Okay, and how did they inhibit the phage? Was it specific to T4?
[00:09:42] Speaker B: That's what's really remarkable about the RIPs. No, their protection was broad spectrum. They worked not just against T4, but against a whole range of different phages, like lambda vir, T5, even one called SECphi17. Different families of viruses.
[00:09:56] Speaker A: Wow. How does a tiny random protein manage that?
[00:10:00] Speaker B: It does it indirectly, by triggering a known bacterial defense pathway: the Rcs stress response.
[00:10:06] Speaker A: Rcs? Isn't that normally triggered by damage to the cell wall or membrane? Like physical stress?
[00:10:12] Speaker B: Usually, yes. It signals that the cell envelope is compromised.
But somehow these random little RIP proteins manage to switch on this Rcs pathway from the inside, without the usual external trigger.
[00:10:23] Speaker A: Huh. They're like hacking the cell's own alarm system.
[00:10:27] Speaker B: Kind of, yeah. And activating Rcs has a very specific downstream effect. It ramps up the production and secretion of something called colanic acid.
[00:10:35] Speaker A: Colanic acid. That's like a sugary slime, right? Forms a capsule?
[00:10:38] Speaker B: Exactly. It's a thick, protective extracellular capsule. The bacteria expressing these RIP proteins actually looked slimy. They had this mucoid appearance on the plates.
[00:10:46] Speaker A: So the random protein triggers an alarm. The cell puts out this slimy capsule.
[00:10:51] Speaker B: And that capsule physically prevents many different kinds of phages from even attaching to the cell surface in the first place.
[00:10:59] Speaker A: It's like the bacteria suddenly put on a thick coat of armor.
[00:11:02] Speaker B: Pretty good analogy. Yeah.
[00:11:03] Speaker A: And did this defense come at a cost? Did making all that slime slow the bacteria down?
[00:11:08] Speaker B: That's another key finding. Apparently not.
At least not in a way they could easily measure during normal exponential growth.
Unlike a lot of engineered or even natural resistance mechanisms that can impair growth, activating Rcs this way seemed to provide a strong benefit without a significant baseline fitness cost.
[00:11:26] Speaker A: That's huge.
An effective broad spectrum defense invented from scratch with no obvious downside. That's quite the evolutionary win.
[00:11:34] Speaker B: It really is a masterstroke from a random sequence.
[00:11:37] Speaker A: Okay, so that's mechanism one, the RIPs hijacking the Rcs pathway for broad protection. You said there were two paths, right?
[00:11:44] Speaker B: So the researchers wondered, are we only finding things that work through Rcs? To check that, they repeated the whole screening process, but this time they used E. coli that were genetically engineered to lack a functional Rcs pathway.
[00:11:56] Speaker A: Ah, clever. Taking that solution off the table to see what else might emerge.
[00:12:00] Speaker B: Exactly. And doing that led them to the second group of hits, which they called RTP genes for random T4 inhibitor products.
[00:12:08] Speaker A: RTP genes. Okay, how were these different?
[00:12:11] Speaker B: Well, first off, they were T4 specific. They protected against T4 and also a few other phages known to use the same entry receptor, OmpC. But they didn't work against phages like lambda or T5 that use different doors.
[00:12:24] Speaker A: So a much more targeted defense this time.
[00:12:26] Speaker B: Right, and here's where it gets really intriguing. For at least two of these, RTP2 and RTP4, the evidence started pointing away from them being proteins at all.
[00:12:34] Speaker A: Wait, not proteins. But they came from the same libraries designed to be read as proteins.
[00:12:39] Speaker B: They did, but the experiments pointed elsewhere. Changing the DNA sequence destroyed the protective function, whereas mutating the protein start codons didn't destroy the function.
[00:12:48] Speaker A: Ah. If messing up the start signal doesn't matter, then maybe the cell isn't actually making a protein from it.
[00:12:54] Speaker B: That's the implication. It suggests these particular RTPs might be functioning as RNA molecules instead, acting as regulatory RNAs.
[00:13:02] Speaker A: Wow. So a random sequence intended to be maybe a protein ends up working as a regulatory RNA?
[00:13:09] Speaker B: It looks that way for some of them.
[00:13:10] Speaker A: Okay, hold on. You have potentially four different random sequences, the RTPs, some possibly acting as RNA, some maybe protein, maybe different mechanisms, but they all ended up doing the same specific thing, inhibiting T4. Did they converge on a single cellular target?
[00:13:27] Speaker B: They did. It's quite remarkable. Despite being unrelated sequences, all the different RTP hits caused the same key change in the cell: a severe downregulation of the OmpC protein.
[00:13:37] Speaker A: OmpC. That's the outer membrane protein that T4 uses as its primary docking site. Its receptor.
[00:13:42] Speaker B: That's the one. It's the keyhole T4 uses to get in. The RTP sequences somehow cause the levels of OmpC protein in the outer membrane to plummet, down to somewhere between 5% and 55% of normal levels.
[00:13:53] Speaker A: Less keyhole, less chance for the phage to unlock the cell. Makes sense. How did they do it? How did these random RNAs or proteins manage to turn down OmpC production?
[00:14:03] Speaker B: They figured that out too. It wasn't by blocking the protein after it was made or degrading the messenger RNA. It was transcriptional repression.
[00:14:11] Speaker A: Meaning they stopped the cell from even making the ompC message in the first place.
[00:14:15] Speaker B: Exactly, yeah. They showed that these RTP sequences somehow interfere with the ompC gene's promoter, the on switch for the gene. They effectively block the gene from being transcribed into RNA.
[00:14:27] Speaker A: So they're shutting it down at the source?
[00:14:29] Speaker B: Pretty much. And they confirmed it wasn't some indirect effect through known small RNA pathways, because the effect still worked even in bacteria missing the key machinery for those pathways, like Hfq. These random sequences seem to be acting directly, or very closely, on the ompC promoter itself.
[00:14:45] Speaker A: That's kind of amazing. You have completely random unrelated sequences, some proteins, some RNA, all converging on this very specific regulatory outcome: shutting down ompC transcription.
[00:14:56] Speaker B: It really is stunning. It suggests the cell's regulatory network might be, I don't know, more hackable than we thought.
That it's surprisingly permissive, allowing these novel randomly generated regulators to integrate and have a potent effect.
[00:15:13] Speaker A: And it connects back to that fitness cost idea again, doesn't it?
[00:15:15] Speaker B: How so?
[00:15:16] Speaker A: Well, completely deleting ompC, the gene, might be a defense, but OmpC probably does other useful things for the cell. Getting rid of it entirely might cause broader problems.
[00:15:26] Speaker B: That's true. Deleting essential membrane proteins often triggers large scale stress responses and transcriptional changes that can be costly.
[00:15:32] Speaker A: Right. But these RTP sequences didn't delete ompC, they just downregulated it. They turned the volume down, not off. So they get the defense benefit while.
[00:15:41] Speaker B: Potentially avoiding the major costs associated with completely lacking the protein. It's a more subtle, perhaps more evolutionarily savvy approach.
[00:15:49] Speaker A: Low cost defense found completely by chance. Okay, so you have these two cool mechanisms, RIP and RTP, providing defense. Did the story end there, or did the phages fight back?
[00:15:59] Speaker B: Oh, the arms race is never over. That was another crucial part of the study. They showed this wasn't just some artificial lab trick. It was biologically relevant, because the T4 phage immediately started evolving to overcome the RTP defense.
[00:16:12] Speaker A: How did they see that?
[00:16:13] Speaker B: On the agar plates where the RTP-carrying bacteria were surviving, they started seeing tiny little clear spots appearing within the resistant bacterial lawn. Plaques. Meaning some phages were breaking through.
[00:16:26] Speaker A: The phages adapted.
[00:16:27] Speaker B: They did. They isolated these evolved phages and found they were much better at infecting the RTP-carrying, resistant bacteria compared to the original T4. They had regained the ability to adsorb, to attach effectively, even with low OmpC levels.
[00:16:41] Speaker A: Wow. And did they find the mutations in the phage responsible?
[00:16:44] Speaker B: Yep. They sequenced the evolved phages and found mutations concentrated in genes that code for the phage's baseplate and its long tail fibers. Specifically, genes like gp6, gp7 and gp34.
[00:16:55] Speaker A: Which are exactly the parts of the phage machine that are responsible for recognizing the host's cell surface and initiating the attachment and injection process.
[00:17:03] Speaker B: Precisely. The bacteria invented a defense by hiding the keyhole, and the phage immediately evolves changes to its key or its lockpick to get in anyway.
[00:17:13] Speaker A: It's a perfect snapshot of that co-evolutionary dynamic. But initiated by a gene that moments before was just random junk DNA.
[00:17:22] Speaker B: Exactly. It validates the whole system beautifully.
[00:17:24] Speaker A: So let's try and wrap this up. What's the really big picture here? The main takeaway message from all this?
[00:17:29] Speaker B: For me, the central insight is that functional novelty, finding sequences that do something useful, might not be the near-impossible statistical miracle we sometimes think it is. It seems much more accessible, a more common feature of sequence space than expected.
[00:17:44] Speaker A: Random sequences aren't just inert background noise. They're constantly being sampled by the cell providing raw material.
[00:17:50] Speaker B: Right. They can rapidly integrate into really complex, essential cellular pathways like stress response or membrane protein regulation, and provide immediate, meaningful fitness benefits, like fighting off a lethal virus.
[00:18:02] Speaker A: It definitely changes how you might think about the speed of evolution, especially in microbes facing new threats.
[00:18:08] Speaker B: Yeah, maybe we've underestimated just how much potential innovation is already sitting there waiting in the non coding regions. The cellular control systems might be easier to tweak, easier to hack with novel regulators than we assumed.
[00:18:20] Speaker A: So maybe a final thought to leave our listeners with. Given how, I guess, easy it seems for these bacteria to invent new regulatory tools from junk DNA in response to phage, what does that suggest about how microbes might respond to other strong immediate pressures, like, say, the introduction of a new antibiotic or rapid environmental change?
[00:18:41] Speaker B: It suggests we should probably expect rapid, often subtle evolutionary responses. They might not always be inventing brand new enzymes immediately, but finding quick regulatory workarounds, ways to tweak existing systems using these readily available random sequences. That seems entirely plausible, maybe even common. The potential for innovation is always bubbling just beneath the surface.
[00:19:03] Speaker A: Always expect the unexpected workaround. A fascinating look at the origins of novelty. This episode was based on an Open Access article under the CC BY 4.0 license. You can find a direct link to the paper and the license in our episode description. If you enjoyed this, follow or subscribe in your podcast app and leave a five star rating. If you'd like to support our work, use the donation link in the description. Thanks for listening, and join us next time as we explore more science, base by base.