Field of Science

Kinky evolution: did we evolve from PVC?

PVC may have played a big part in our evolution...

But no, I'm not talking about polyvinyl chloride (sorry to disappoint!), I'm talking about the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. And I'm talking waaaay back, when the Chlamydiae were much more innocent and hadn't got into that whole sexually transmitted disease scene.

In a Science article, Devos and Reynaud discuss the possibility that the PVC bugs, which appear to be a monophyletic group forming their own "superphylum", are in fact the most likely candidates for the bacterial ancestor of archaea and eukaryotes. The evidence for this comes from the fact that some (but not all) PVC members share several features with eukaryotes, such as subcellular compartmentalisation and membrane-bound DNA. Some features are also shared with archaea, such as loss of the FtsZ protein. PVC bacteria, archaea and eukaryotes also share another protein absence... but that will be discussed in one of our forthcoming papers.

Of the PVC bacteria, Gemmata obscuriglobus is perhaps the most interesting. It surrounds its DNA in a membrane, reminiscent of the eukaryotic nucleus, and seems to undergo a process similar to endocytosis for uptake of extracellular material.

Devos and Reynaud suggest that the PVC bacteria are evolutionary intermediates on the road to eukaryotes and archaea, but as far as I know there is no phylogenomic evidence that PVC bacteria are more closely related to eukaryotes and archaea than the rest of bacteria are. Instead, I wonder whether these features are relics from LUCA, which, as I mentioned in a previous post, might have been surprisingly complex and eukaryote-like. Could the lack of some of these features in other bacteria in fact be a derived, rather than ancestral, state?


It seems the hypothesis doesn't hold up to scrutiny, and in fact "all of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution), not homologous, to eukaryotic traits; or else they are the result of horizontal gene transfers." 
Planctomycetes and eukaryotes: A case of analogy not homology.


Devos, D., & Reynaud, E. (2010). Intermediate Steps. Science, 330 (6008), 1187-1188 DOI: 10.1126/science.1196720

Fuerst JA, & Webb RI (1991). Membrane-bounded nucleoid in the eubacterium Gemmata obscuriglobus. Proceedings of the National Academy of Sciences of the United States of America, 88 (18), 8184-8 PMID: 11607213

Lonhienne, T., Sagulenko, E., Webb, R., Lee, K., Franke, J., Devos, D., Nouwens, A., Carroll, B., & Fuerst, J. (2010). From the Cover: Endocytosis-like protein uptake in the bacterium Gemmata obscuriglobus. Proceedings of the National Academy of Sciences, 107 (29), 12883-12888 DOI: 10.1073/pnas.1001085107

An ancient family of SelB elongation factor-like proteins with a broad but disjunct distribution across archaea

Our newest paper has just been published in BMC Evolutionary Biology!

It describes our phylogenetic analysis of aSelBL, a strange group of proteins in archaea that are close relatives of the selenocysteine-specific GTPase elongation factor SelB. Intriguingly, they're found in archaea that don't use selenocysteine, and they have a very much disrupted GTPase domain. Their function is quite a mystery! We can only speculate from our in silico analyses, but hopefully one day some brave souls will tackle the question experimentally!

Bacterial contamination in eukaryotic genomes

A few times recently, I've been blasting and retrieving sequences from NCBI RefSeq, making phylogenetic trees, and then being shocked to find the odd sequence from Xenopus (frog), Ixodes (tick) or Nematostella (sea anemone) nested deeply within the bacterial part of the tree, with branch lengths comparable to those of the bacterial branches. Initially, when I got these sequences, I was very excited. I probably exclaimed something like "Bloody hell! Ticks have another mitochondrial elongation factor EF-Tu and it looks like very recent horizontal gene transfer from beta-proteobacteria!", cos that's what it looks like: the nesting suggests HGT, and the lack of this version in close tick relatives suggests it's recent. Of course such a find would be astounding: HGT into an animal, wow. But then I checked Entrez Gene and blasted several upstream and downstream genes. They ALL hit bacterial sequences before eukaryotic ones. The whole scaffold was bacterial. I've found the same in these Xenopus and Nematostella cases. So it looks like contamination from bacteria rather than HGT: some bacteria hanging out with the eukaryote of interest inadvertently got parts of their genome sequenced, and these sequences got included in the genome release under the name of the eukaryote. And this ends up in NCBI RefSeq:

"The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq is a foundation for medical, functional, and diversity studies; they provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis (especially RefSeqGene records), expression studies, and comparative analyses."
So RefSeq should be reliable. At the very least, the sequence you're looking at should come from the organism named in the record. These sorts of contaminants are not too much of a problem for me, because I don't do high-throughput genome comparisons, and I check my trees and follow the trail of funny-looking sequences. However, some people do do high-throughput genome comparisons, and unless they are able to check the reliability of each sequence, or have other methods of filtering out dubious sequences, they may be falling foul of such contaminants. I'm not sure just how common these cases are, but I've found around 5 examples in just a couple of proteins in the last few months.

I'm not an expert in sequencing and assembling genomes, so I'm not sure if this is an easy thing to avoid or not. But it seems like it wouldn't be too hard to scan the genome for cases where the whole scaffold is bacteria-like, and remove those sequences until they can be checked.
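To make that idea concrete, here's a minimal sketch in Python of the sort of scan I have in mind. It is only an illustration under some loud assumptions: that you have already blasted every predicted gene and recorded the domain of its top hit, that the input is a simple list of (scaffold, gene, domain) tuples, and that the function name and both thresholds are hypothetical choices of mine rather than anything NCBI or a genome project actually uses.

```python
from collections import defaultdict


def flag_bacterial_scaffolds(gene_hits, min_genes=3, min_fraction=1.0):
    """Return scaffolds whose genes (almost) all have a bacterial top hit.

    gene_hits: iterable of (scaffold_id, gene_id, top_hit_domain) tuples.
        This input format is an assumption made for this sketch.
    min_genes: skip scaffolds with fewer genes than this, since one or
        two bacterial-looking genes are weak evidence either way.
    min_fraction: fraction of genes on a scaffold that must hit
        "Bacteria" before the scaffold is flagged for manual checking.
    """
    domains_by_scaffold = defaultdict(list)
    for scaffold_id, _gene_id, domain in gene_hits:
        domains_by_scaffold[scaffold_id].append(domain)

    flagged = []
    for scaffold_id, domains in domains_by_scaffold.items():
        if len(domains) < min_genes:
            continue
        bacterial = sum(1 for d in domains if d == "Bacteria")
        if bacterial / len(domains) >= min_fraction:
            flagged.append(scaffold_id)
    return sorted(flagged)


# Toy data, entirely made up for illustration.
example_hits = [
    ("scaffold_1", "g1", "Bacteria"),
    ("scaffold_1", "g2", "Bacteria"),
    ("scaffold_1", "g3", "Bacteria"),
    ("scaffold_2", "g4", "Eukaryota"),
    ("scaffold_2", "g5", "Bacteria"),
    ("scaffold_2", "g6", "Eukaryota"),
    ("scaffold_3", "g7", "Bacteria"),
]

print(flag_bacterial_scaffolds(example_hits))  # ['scaffold_1']
```

Flagged scaffolds are candidates for a closer look, not proven contaminants. A single bacterial-looking gene sitting among eukaryotic neighbours (like g5 on scaffold_2 above) is left alone, because that is the pattern you would expect from a genuine HGT rather than from contamination.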

Don't get me wrong, RefSeq and all the NCBI databases are amazing resources, and I use them daily. But it would be really great if the parties who submit genome sequences could do a bit more quality control, to make the resource as reliable and stable a reference as it sets out to be.


Edit: here's an example from the Xenopus tropicalis genome. I'd love to get some comments on this... would removing this kind of contamination be easy, and should it be done? Or is it up to the people who use the sequence for research to check?

The nature of our last common ancestor: simple and streamlined or complex and flabby?


What is the ancestor of all life on Earth? It's one of the biggest questions in Biology... no, in Science... no, in Life. I may be biased, being an evolutionary biologist, but I'm pretty sure that along with the origin of the universe and the nature of consciousness, it's one of the biggies.

The name of this legendary ancestral creature is LUCA (I have Suzanne Vega singing in my head now), which stands for the Last Universal Common Ancestor.

Bacteria, Archaea or Eukaryote?

So what sort of creature was LUCA? There are three main types of organisms alive today: eukaryotes, bacteria and archaea. The latter two can be grouped together under the name prokaryotes, which are all single-celled organisms with diverse habitats, from the human gut to deep-sea vents. We are eukaryotes, and eukaryotes, whether multicellular like us or single-celled like the malaria parasite Plasmodium, are on the whole more "complex" than prokaryotes. We tend to have a wider range of molecules in our cells that interact in more ways, forming larger interaction networks. Most known eukaryotes also have mitochondria, enslaved and highly reduced bacteria that provide the extra energy we need to be so complex.

Figure from M Gouy & M Chaussidon Nature 451 (2008)

So it is natural to expect that prokaryotes came first, and eukaryotes later (most likely from archaea rather than bacteria, but the relationship between eukaryotes and archaea is a very contentious one, so I will gloss over it for now, and maybe discuss it later in another blog post). It is possible to test the nature of LUCA by looking at the genes that are conserved among prokaryotes and eukaryotes. If the hypothesis is that LUCA was a prokaryote, the expectation is that the types of proteins shared by prokaryotes and eukaryotes would most resemble those of prokaryotes. In 2007, this question was addressed by Chuck Kurland et al., using so-called fold superfamilies, or FSFs. These are classifications of proteins based on their three-dimensional structures. This study found that, surprisingly, "the genomes of the last common ancestor (LUCA) encoded a cohort of FSFs not very different from that of modern eukaryotes."

Numbers of FSFs from 19 genomes of eukaryotes, archaea and bacteria. Figure from C.G. Kurland et al. Biochimie 89 (2007)

"What? How can LUCA be so complex?" I hear you asking. "Doesn't that suggest intelligent design?" Well, the answer is very simple: LUCA herself was the result of many years of evolution, natural selection and extinction. No god-like designer required. A common misconception is that the last common ancestor is the first organism on Earth. In fact, LUCA is simply the only organism we can trace back to, given the tiny fraction of lineages that are extant today. A hell of a lot of evolution went on before her, and in parallel to her, as well as after her. So it's possible that LUCA was surprisingly complex, something approaching a single-celled eukaryote (although without the mitochondria, that would be quite a paradox). A question that then arises is "how come prokaryotes are so simple then?" Well, actually, prokaryotes may seem simple in that they have very few genes compared to eukaryotes, but in fact reduced complexity is not an easy thing to achieve. They have become streamlined in order to fill their particular niche, which may involve fast reproduction, aided by having small genomes. "Simple" organisms are often just highly specialised and economical, in contrast to the flabbiness of eukaryotes with all their complex networks of molecular interactions, which enable them to outperform prokaryotes in their particular niches.

Here's a silly analogy. If they were travellers, bacteria and archaea would be backpackers, packing light for speed and ease of getting around. You never know when you might have to run for a train after all, and you really don't want to have to pay for excess baggage when you're flying here and there. Eukaryotes, on the other hand, are those travellers dragging huge suitcases, packing everything they think they might need just in case: hairdryers, snacks, guidebooks, their own mini prokaryotes (mitochondria) etc etc. They didn't necessarily put as much effort as the prokaryotes into figuring out the absolute minimum required to survive, so some of the extra stuff is perhaps unnecessary, but some is very, very useful and allows them to flourish in situations where the prokaryotes get left behind. So they get along just as well as the prokaryotes, but in a different way.

By now you may have accepted that LUCA was a pretty cool, advanced creature, almost like a raptor or Professor Brian Cox in fact. Now though, I will be contrary and burst that bubble. There's one very important possible reason why protein fold superfamilies are present across the tree of life: horizontal gene transfer. We like to think that on the whole, genes are inherited vertically from parent to child every generation, and for multicellular organisms, this is definitely the case. It's rather a relief too, to know that by shaking hands with an acquaintance, patting a dog or eating a bacon sandwich, you probably won't be picking up genes that will pass to your offspring. But for single-celled organisms, particularly prokaryotes, horizontal gene transfer, or HGT, is rampant. So it is hard to be sure that protein folds in common among all domains of life haven't just snuck in through the back door, so to speak, possibly with the help of viruses. LUCA may have been rather simple after all.

There are lots of things we don't know about LUCA. We can't even be sure that she had a DNA rather than RNA genome. For now, the mysterious LUCA is still slightly beyond our reach. But with more genomes being sequenced, particularly from unusual non-culturable organisms, maybe it will be possible to more precisely sort out the horizontal from the vertical transfers, and our picture of LUCA will become a little less obscure.

Kurland, C., Canbäck, B., & Berg, O. (2007). The origins of modern proteomes. Biochimie, 89 (12), 1454-1463 DOI: 10.1016/j.biochi.2007.09.004
Lane N, & Martin W (2010). The energetics of genome complexity. Nature, 467 (7318), 929-34 PMID: 20962839

Back to work

Well, Vasya and I are back at work after the winter break, and we've dived back into it with a vengeance! Before the holidays we were musing about writing a short paper regarding translation initiation factors. After reading up a bit the other day, Vasya found something cool and unexpected, which set me doing some sequence searching and finding out something just as cool and even more unexpected... down the rabbit hole we go! Updates on the story when we emerge!