Field of Science

My creative contributions to the Festive Tree of Life

Last week I got a thick padded envelope from the Wellcome Trust. My colleagues were a bit surprised... I told them it was a grant, and, well, it kind of was, only not wads of cash but lumps of modelling clay!

As part of their Festive Tree of Life project, the Wellcome Trust sent out free packs of colourful modelling clay in the run-up to the festive season. The idea is that you make science-inspired decorations and either hang them on their physical Christmas tree, if you're somewhere in the vicinity, or post them on the Festive Tree of Life Flickr page.

I had a fun afternoon the other day making my decorations. Here they are:



Ribosome, with three tRNAs and EF-Tu. The pink thing is supposed to be mRNA, while the string is the nascent polypeptide chain coming out of the exit tunnel... scale is overrated anyway.

The clay had started to dry out by the time I got to the ribosome, and was getting less sticky and trickier to work with. After a few hours, the tRNAs fell off, and the subunits have now almost dissociated. All of this in the absence of termination and ribosome recycling factors too!

Happy holidays!

Bacterial genes in eukaryotes - function and phylogeny

There have been a couple of interesting papers recently on those eukaryotic genes that are more closely related to bacterial than to archaeal homologues. Such proteins are often organellar (although they may be encoded in the nucleus), having entered eukaryotes with the bacterial endosymbiosis event that gave rise to the mitochondrion (or, in the case of plants, the event that gave rise to the chloroplast).

Giant, glowing mitochondria in the Deutsches Museum, Munich

The first paper, published in GBE, considers humans alone:

The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences.
David Alvarez-Ponce and James O. McInerney

This paper tests whether human genes of different ancestries (bacterial versus archaeal) differ in their effects on phenotype, essentiality (as judged by lethality in mice), function, selective constraint, expression and position in the protein-protein interaction network (PIN). Proteins were classified as bacteria- or archaea-like based on best-hit scores in BLAST searches.
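To make the best-hit idea concrete, here's a minimal Python sketch. It assumes each human protein already has its BLAST hits summarised as (domain, bit score) pairs; that input format, the gene names and the 50-bit cutoff are all hypothetical, and this is not the authors' actual pipeline. Note that GENE_B is called archaea-like even though it also has a respectable bacterial hit, which is exactly the kind of case a best-hit label can't tell apart from a purely archaeal gene.

```python
from typing import Dict, List, Tuple

# A BLAST hit summarised as (taxonomic domain of the subject, bit score).
# Hypothetical input format for illustration only.
Hit = Tuple[str, float]


def classify_by_best_hit(hits: List[Hit], min_bits: float = 50.0) -> str:
    """Label a protein by the domain of its highest-scoring prokaryotic hit.

    min_bits is a hypothetical significance cutoff; weaker hits are ignored.
    """
    significant = [hit for hit in hits if hit[1] >= min_bits]
    if not significant:
        return "no prokaryotic homologue"
    best_domain, _ = max(significant, key=lambda hit: hit[1])
    return "bacteria-like" if best_domain == "Bacteria" else "archaea-like"


# Hypothetical genes and scores.
example: Dict[str, List[Hit]] = {
    "GENE_A": [("Bacteria", 310.0), ("Archaea", 95.0)],
    "GENE_B": [("Bacteria", 120.0), ("Archaea", 260.0)],
    "GENE_C": [("Bacteria", 30.0)],
}

for gene, hits in example.items():
    print(gene, classify_by_best_hit(hits))
```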

They found that human genes of archaeal ancestry, although fewer in number, tend to have higher and broader expression, are more likely to be essential, are involved in core information processes, are under greater selective constraint, and tend to be central in the PIN, compared with bacteria-like genes.

I don't think they mention whether the archaea-like genes they identified also have (more distant) homologues in bacteria... if they do, then we're likely looking at the characteristics of universal, usually essential, core information processing genes. It isn't clear whether archaea-like genes that have been lost in bacteria are just as central in eukaryotes as universal genes are.

It's also not clear just how many of the bacteria-like genes are endosymbiotic in origin. 7,884 human genes were found to be bacteria-like, but the human mitochondrion is predicted to contain only 1000-1500 proteins. Of the remainder, some are likely to be endosymbiotic in origin but have acquired non-mitochondrial functions, while an unknown proportion may actually be of archaeal ancestry but have since been lost in archaea, and so have nothing to do with mitochondria. As these proteins are not universally essential, it follows that they would have a less central role in the cell... maybe the two gene populations considered in this paper are better described as essential for life versus non-essential for life.

Anyway, it's a very interesting paper, particularly the finding that archaea-like genes are less likely to be involved in inherited diseases. It's also surprising just how many genes (58%) did not have an identifiable homologue in either bacteria or archaea.

The second paper, published in MBE, addresses the evolutionary history of mitochondrial genes from a broad range of eukaryotes:

Rooting the eukaryotic tree with mitochondrial and bacterial proteins
Romain Derelle and B. Franz Lang

The idea here is that the endosymbiosis event happened more recently than the divergence of eukaryotes from archaea, and this can be exploited to root the eukaryotic tree of life with a less divergent outgroup. Usually, eukaryotic phylogenies are made using archaea-like information processing genes, rooted with archaea. However, there is a problem of long branch attraction (LBA) to the very distant outgroup. LBA is the phenomenon in molecular phylogenetics where fast-evolving, and therefore long-branched, sequences that should be nested within the tree are pulled down to its base because of spurious similarities to the outgroup. Using mitochondrial genes to make trees rooted with bacteria theoretically reduces the distance to the outgroup and, therefore, the problem of LBA.
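One way to see where those spurious similarities come from is to look at how sequence identity decays with distance. The sketch below uses a Jukes-Cantor-style model generalised to 20 amino acid states (equal rates and equal frequencies, a simplifying assumption of mine, not anything from the paper). As branches get longer, the expected identity between two sequences drops towards the 1/20 = 5% floor expected by chance alone, so a long-branched ingroup sequence and a very distant outgroup end up sharing residues largely by coincidence. Real proteins, with unequal amino acid frequencies and rate variation among sites, have a higher chance floor.

```python
import math


def expected_identity(d: float, k: int = 20) -> float:
    """Expected fraction of identical sites between two sequences separated by
    d substitutions per site, under an equal-rates, equal-frequencies model
    with k states (k = 20 for amino acids)."""
    chance = 1.0 / k  # identity expected between unrelated sequences
    return chance + (1.0 - chance) * math.exp(-k * d / (k - 1))


for d in (0.1, 0.5, 1.0, 2.0, 5.0):
    print(f"d = {d:>3}: expected identity = {expected_identity(d):.3f}")
```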

The idea is very neat and I like it in principle. There are a couple of issues, though, that I think might not help the LBA problem, and might in fact exacerbate it.

1. We don't know just how much more recently the mitochondrion was acquired after the divergence of eukaryotes from archaea. Some people might argue that this event was involved in the separation of the two lineages.
2. Mitochondrial genes have a faster rate of evolution than their cytoplasmic counterparts. 

Still, it's interesting to see the results of rooting the eukaryotic tree in this way. Unlike the first paper, this one doesn't use best hits, but specifically targets known mitochondrial and mitochondrially targeted genes, such as cytochromes and two of the three universal mitochondrial translational GTPases, mIF2 and mEF-Tu. The third, mEF-G, was likely excluded because it does not group with alpha-proteobacteria. Although... come to think of it, I don't see mEF-Tu or mIF2 grouping clearly with alphas in my trees... maybe mEF-G was excluded because of its duplication early in eukaryotic evolution... though mEF-Tu has also been duplicated in its history, and actually mEF-G1 is quite a conservative marker... anyway, this paper isn't about trGTPases specifically, so I shouldn't drift off topic.

So, the root. They find the root between monophyletic unikonts (opisthokonts and amoebozoa) and bikonts (other eukaryotes), supporting one of the most popular hypotheses. There seems to be good statistical support for this topology using Bayesian inference; however, maximum likelihood support is only achieved with extensive filtering of the dataset. It's an interesting new take on rooting the eukaryotic tree, but not one that will convince everyone.

As is so often the conclusion, we're just going to need more eukaryotic protist genomes!

Alvarez-Ponce D, & McInerney JO (2011). The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences. Genome biology and evolution, 3, 782-90 PMID: 21795752

Derelle R, & Lang BF (2011). Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Molecular biology and evolution PMID: 22135192

Coevolution, from hummingbirds to proteins

Coevolution (two or more biological objects evolving together) is a common feature of the evolutionary process on all levels from the molecular to the organismal. One of the most beautiful examples is that of hummingbirds and ornithophilous flowers. Hummingbirds feed on the nectar from the flowers, pollinating them in the process. In this mutually beneficial relationship, the plants have evolved flowers that attract the birds with colours that are conspicuous to the bird, and are shaped to perfectly accommodate the bird's beak. This coevolution has happened in a number of hummingbird/plant pairs.

Pic from Wikipedia article on hummingbirds

For more information on hummingbird/plant coevolution, I direct you to the publications of Ethan Temeles. As usual though, this post will be about proteins, and not whole organisms... and it will include my own crude drawings as usual...

Fig 1. Ta-da! Hummingbird/plant coevolution is a nice analogy for protein receptor/ligand coevolution. Circles show residues directly involved in the interaction.
At the molecular level, an example of coevolution is the establishment of receptor-ligand interactions (Fig 1). The receptor's binding site has evolved in concert with the binding site of the ligand. In Fig 1, variation of the yellow residues in the receptor is correlated with that of the green residues in the ligand. The yellow sites are close together in the structure, but not necessarily neighboring in the sequence. For example, the amino acid sequence backbone of these imaginary proteins might be arranged like this:
Fig 2. Black lines show the amino acid sequence of the protein, within its structural density.

Thus, if the structure of the binding interface is known, it's possible to predict candidate coevolving sites. However, from the sequence alone, it's not so straightforward.

As discussed in a recent paper by Gloor et al. in MBE (and references therein), there are two explanations for how covarying positions arise (and these are actually the extremes of the distribution of possible mutational effects):

1. Suppressor mutations. These arise when a mutation with a deleterious phenotype is suppressed by another mutation at a different position.
2. Covarions. These are cases where both the original residue and the mutated residue are functionally compatible, but the mutation alters the spectrum of amino acids possible at another location.

Covarying sites may occur in the same protein or in different proteins (Figs 3-4).

Fig 3. Stars show between-protein correlated mutations at two interaction sites

In between-protein coevolution, green sites coevolve with yellow sites in our example. But there is also within-protein coevolution among the yellow-site residues and among the green-site residues. Imagine, for instance, a change in a green residue that multiple yellow residues interact with at different times (Fig 4). Or perhaps the middle yellow starred residue in Fig 4 mutates, changing the constraints on which residues the neighboring yellow sites can mutate to. Either way, the three yellow sites will covary. Remember that those sites are far away from each other in the sequence. So by showing that these sites covary, we can predict that they are functionally related, even if we don't have a structure.

Fig 4. Correlated mutations can also occur within one protein

Prediction of co-evolving sites can be useful for understanding cases when binding site residues are unconserved in a multiple sequence alignment. It can also be useful for predicting intermolecular interaction sites, and allosteric sites (for example Chen et al., 2006). An allosteric site can remotely affect the evolutionary pressures on a distant site by affecting the structural orientation of the protein (Fig 5).

Fig 5. Correlated mutations among binding site residues and an allosteric site.

Prediction of covarying sites is challenging, not only because they may not always be clustered together in sequence and structure, but because covariation is a combined result of structural and functional constraints and background noise from shared phylogenetic ancestry and random processes.

There are two classes of methods for predicting covarying sites: tree-aware and tree-unaware. Tree-aware methods search for sites whose covariation cannot be explained by phylogenetic relationships, while tree-unaware methods ignore phylogenetic relationships, instead searching for covarying sites with the strongest signal. The two classes of methods are compared in Caporaso et al. (2008), who conclude that tree-unaware methods perform as well as tree-aware ones.
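For a feel of what a tree-unaware method does, here's a bare-bones Python sketch of one of the simplest covariation scores, mutual information (MI) between two alignment columns. It's a toy: the four-sequence alignment is made up, gaps get no special treatment, and it isn't the code used by Caporaso et al. In the toy alignment, columns 0 and 2 change together perfectly (A goes with K, G with E), while column 1 varies independently.

```python
import math
from collections import Counter
from typing import List


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between columns i and j, ignoring phylogeny."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment (hypothetical): columns 0 and 2 covary, column 1 does not.
toy = ["ALK", "AVK", "GLE", "GVE"]
print(mutual_information(toy, 0, 2))  # 1.0
print(mutual_information(toy, 0, 1))  # 0.0
```

The catch is that shared ancestry alone can produce exactly this pattern: if the two A sequences are close relatives and so are the two G sequences, columns 0 and 2 would covary even with no functional link. That background is what tree-aware methods try to remove.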

Using a tree-unaware method, Gloor et al. (2010) examine covariation in phosphoglycerate kinase evolution. They identify nonconserved sites that covary, and through mutagenesis show that the sites are important for function and epistatic to each other (mutation at one affects the functional effect of the other). They find that covarying positions are just as diverse within and between clades as noncovarying positions are, and suggest that most covarying positions arise from processes more like the covarion model than the suppressor mutation model.

The importance of covariation in sequence evolution is of interest to people like myself who use patterns of sequence variation to predict protein function. In studying molecular evolution of function, we largely rely on the assumption that the most functionally important positions are those that are conserved over time. Although this is generally the case, it seems that some important sites that are able to covary may slip through the net.
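The conservation side of that assumption is often scored with per-column Shannon entropy. Here's a minimal Python sketch on a made-up alignment in which columns 0 and 2 covary perfectly and column 3 is invariant: the covarying columns score 1 bit each, no more conserved than column 1, which varies independently, so a conservation-only filter would pass over them. The alignment and scoring choice are illustrative assumptions, not taken from any of the papers discussed here.

```python
import math
from collections import Counter
from typing import List


def column_entropy(alignment: List[str], i: int) -> float:
    """Shannon entropy (bits) of alignment column i; 0.0 means fully conserved."""
    counts = Counter(seq[i] for seq in alignment)
    n = len(alignment)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical alignment: columns 0 and 2 covary, column 3 is invariant.
toy = ["ALKW", "AVKW", "GLEW", "GVEW"]
for i in range(4):
    print(i, column_entropy(toy, i))
```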
Recently, I've been experimenting with the tree-unaware code of Dunn et al. (2008) to find covarying sites. Preliminary results, based on the RelA family, are... confusing. Residues that would be predicted to interact from the structure are not flagged up as covarying, while there are many pairs of predicted covarying sites that are physically distant and don't seem likely to be allosteric sites from the structure. It seems that, as with many real-life case studies, real biology is a little bit more complicated than naive sketches like mine would have you believe! Oh well, time to delve a little deeper into the data set...
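The key idea in Dunn et al. (2008) is the average product correction (APC): each column pair's MI is adjusted as MIp(a, b) = MI(a, b) - MI(a, x̄) × MI(b, x̄) / mean MI, where MI(a, x̄) is the mean MI of column a with all other columns. The product term soaks up background covariation from shared ancestry and entropy. Below is a stripped-down Python sketch of that correction on a made-up four-column alignment; it's my own simplification, with no gap handling or low-count filtering, and not their actual code.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def mutual_information(alignment: List[str], i: int, j: int) -> float:
    """Mutual information (bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    mi = 0.0
    for (a, b), count in Counter(zip(col_i, col_j)).items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


def mip_scores(alignment: List[str]) -> Dict[Tuple[int, int], float]:
    """MIp(a, b) = MI(a, b) - APC(a, b), with
    APC(a, b) = mean_MI(a) * mean_MI(b) / overall_mean_MI."""
    length = len(alignment[0])
    mi = {(i, j): mutual_information(alignment, i, j)
          for i, j in combinations(range(length), 2)}

    def pair_mi(a: int, b: int) -> float:
        return mi[(min(a, b), max(a, b))]

    # Mean MI of each column with every other column.
    col_mean = [sum(pair_mi(a, b) for b in range(length) if b != a) / (length - 1)
                for a in range(length)]
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:
        return {pair: 0.0 for pair in mi}
    return {(i, j): value - col_mean[i] * col_mean[j] / overall
            for (i, j), value in mi.items()}


toy = ["ALKW", "AVKW", "GLEW", "GVEW"]  # hypothetical alignment
for pair, score in sorted(mip_scores(toy).items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```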

References and further reading:

Caporaso, J., Smit, S., Easton, B., Hunter, L., Huttley, G., & Knight, R. (2008). Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics BMC Evolutionary Biology, 8 (1) DOI: 10.1186/1471-2148-8-327

Codoñer FM, & Fares MA (2008). Why should we care about molecular coevolution? Evolutionary bioinformatics online, 4, 29-38 PMID: 19204805

Chen, Y. (2006). Evolutionarily Conserved Allosteric Network in the Cys Loop Family of Ligand-gated Ion Channels Revealed by Statistical Covariance Analyses Journal of Biological Chemistry, 281 (26), 18184-18192 DOI: 10.1074/jbc.M600349200

Dunn, S., Wahl, L., & Gloor, G. (2008). Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction Bioinformatics, 24 (3), 333-340 DOI: 10.1093/bioinformatics/btm604

Gloor, G., Tyagi, G., Abrassart, D., Kingston, A., Fernandes, A., Dunn, S., & Brandl, C. (2010). Functionally Compensating Coevolving Positions Are Neither Homoplasic Nor Conserved in Clades Molecular Biology and Evolution, 27 (5), 1181-1191 DOI: 10.1093/molbev/msq004

PVC bacteria and the prokaryote to eukaryote transition... maybe not.

It was an interesting hypothesis, but it seems that the evidence for an origin of eukaryotes within the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum, as proposed by Devos and Reynaud in a Science article, doesn't hold up to scrutiny.

In a recent paper in BioEssays, James McInerney et al. address each of the claimed eukaryote-like features and show that they are all likely to be either analogous (the result of parallel evolution, not shared ancestry) or the result of horizontal gene transfer (HGT) events. In the words of the authors:

PVC are no more intermediates in the prokaryote-to-eukaryote transition than dragonflies are intermediates in the evolutionary sequence linking bony fish and birds.

The Bioessays paper is an important reminder that for any grand hypotheses about evolution, distinguishing between homologous and analogous characters is critical, as is establishing the direction of inheritance. And by far the best way to address these points is by taking advantage of the mass of genomic data available.

Molecular biology in the light of comparative genomics.

The week before last, I was at a conference at EMBL Heidelberg on Protein Synthesis and Translational Control. I found it a stimulating, very enjoyable meeting, although I was rather disappointed that it was dominated by eukaryotic mechanisms, with only a few talks devoted to bacterial translation (and, from what I remember, none on organellar or archaeal translation). Translation is a universal, ancient process, and analyses of its highly conserved components have taught us much of what we know about the evolution and diversity of life on Earth. Despite that, out of ~300 participants (at a guess), I was the only person presenting evolution stories. I wasn't selected for a talk, but I presented two posters, one on the RelA/SpoT family of ribosome-associated starvation response enzymes and one on gain and loss of mitochondrial translation initiation factors. I'm also pretty sure I was the only blogger and tweeter there too... hmm, is that connected?

Anyway, that's not the point here. My point is that a little evolution goes a long way in predicting the functions and universality of mechanisms, and it seems to me that molecular biologists aren't taking full advantage of that. There were a few talks mentioning new factors, and my immediate thoughts were: what are they related to? What do sequence comparisons suggest about them? Are they actually present in any organisms other than yeast or humans? Model organisms are unfortunately not always representative, and we need evolutionary analyses to put results in perspective.

It's not just at this protein synthesis conference that I was the one and only molecular evolutionary biologist. It's been the same at the last few conferences I've been to, except for studies on ribosome origin, which are often low on evidence, high on speculation, and conducted by non-experts in molecular evolution (see my previous blog post). The protein synthesis field is a very active, exciting field with great people who are not afraid of bioinformatics or of collaborating with bioinformaticians. Just, usually not molecular evolutionary biologists. That doesn't mean that people can't be enlightened, though. After the first day, a co-attendee chatted with me about my research. He'd come across my paper on EF-G duplication and functional evolution and was excited by it. He said "I thought evolution was really boring, but actually it's interesting and very useful." He went on to propose an interesting collaborative project.

It's not that there are no published comparative genomics studies at all in the protein synthesis field; it's just that there are very few, and though some are good, more often they are done badly (for example, claiming orthology seemingly without making phylogenetic trees, and getting it wrong), and they focus on distribution without the site-specific or structural analyses needed to make functional inferences. Specific expertise is required to do it right: the kind of expertise found in evolutionary biologists. However, evolutionary biologists tend to stick to the evolution field, and it seems to me that important questions don't always get answered that way. I'm not sure, but I would hazard a guess that it's the same in other molecular mechanism fields (e.g. transcription, cell cycle/replication). There are lots of gaps in our knowledge about core cellular processes that need filling with the help of people competent in comparative genomics.

The scarcity of other molecular evolutionary biologists within the protein synthesis field could be considered an advantage for me. I have my own niche after all, and I have no shortage of work, because not all experimentalists overlook the importance of evolution. However, there are many, many more interesting proteins and questions than I have time to look at. And I don't have a lab of minions as yet.

Here's an example of how wrong conclusions can be propagated in the absence of molecular evolution analyses. It is discussed in a paper that I'm writing up at the moment. I'll blog the full story, with the real proteins named, in the fullness of time, but this is just to illustrate the idea.

So, in an original paper, it was claimed that an insertion in human mitochondrial protein 1 (figure A, above) compensates for the function of protein 2, which is present and essential in E. coli but lost in humans. The insertion is not at all homologous to protein 2. Nevertheless, if E. coli protein 1 is modified to include the insertion usually only found in humans, protein 2, which is usually essential, becomes dispensable. So the authors of the original study claimed that the insertion evolved in eukaryotes to replace the function of protein 2. This is all very interesting, but I immediately had a question. Since protein 2 is universally absent in eukaryotes (this is already known), is the insertion universally present in eukaryotes? I think this is a very important question, and it was not addressed in the original paper. Nevertheless, a whole slew of other papers has propagated the conclusion that the insertion evolved to take the place of protein 2.

I was not content with an evolutionary conclusion drawn from a comparison of three organisms (the third was an archaeon, also missing the insertion), so I decided to answer this question myself. The result? Oops. The insertion is actually only in vertebrates (see figure B, above). So for millions of years, eukaryotes had been (and many still are) doing fine without either the insertion or protein 2. Maybe the insertion does replace the function of protein 2 in vertebrates, but there must be some other, unknown, possibly more general mechanism(s) compensating. And this is an interesting avenue worth pursuing.

That's one angle to the story, and it's a bit negative, true, but there's also a very positive angle to this particular paper that I'm writing. There is a protein 3, also universal in bacteria and almost universal in mitochondria. It had never previously been found in yeast; however, with some sensitive sequence searching (PSI-BLAST) I found a homologue, which I confirmed as the orthologue with phylogenetic analysis. This yeast protein 3 is very divergent, though, with insertions and deletions relative to the well-known ones, so whether it was the functional equivalent of protein 3 was still unknown. But that's when it becomes wonderful to be an evolutionary biologist among experimentalists, because our collaborators have been able to confirm the function of my newly identified protein in vivo. So we have identified a new protein in yeast, essential for mitochondrial function, and a really nice evolutionary story to go along with it.

In conclusion, it's been said so many times that it's becoming a nasty cliché, but Dobzhansky's quote "nothing in biology makes sense except in the light of evolution" is spot on. I would encourage experimentalists to consider whether molecular evolution can help answer some of their questions, or even raise some exciting new ones. I would also urge molecular evolutionary biologists to collaborate with experimentalists whenever they can, because it's so exciting and satisfying to get your predictions tested.

OK, now to finish this paper about proteins 1, 2 and 3, which I will blog about when the paper is published and available!

Molecular evolution of RSH proteins, lookouts and messengers of stress signals

A few days ago (the day before my 30th birthday, actually), my most recent paper, with Vasili Hauryliuk and Tanel Tenson, "The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life", was published in PLoS ONE. Hurrah!

The RSH proteins comprise a superfamily of enzymes that synthesize and/or hydrolyze the alarmone ppGpp. ppGpp is a nucleotide that acts as an alarm signal, activating the “stringent” response in bacteria during starvation conditions and regulating various other aspects of cellular metabolism, often in response to stress. Vasya's blog has a wealth of information about the stringent response and the molecules involved.

Rel, RelA and SpoT are the classical, best-known "long" RSHs. They carry the ppGpp hydrolase, synthetase, TGS and ACT domain architecture, and have been found across diverse bacteria and in plant chloroplasts. Additionally, dedicated single-domain ppGpp-synthesizing and -hydrolyzing RSHs have been discovered in disparate bacteria and in animals, respectively. However, until now there has been considerable confusion over nomenclature, and no comprehensive phylogenetic and sequence analyses had been carried out to classify RSHs on a genomic scale.

To remedy the situation, I carried out high-throughput sensitive sequence searching of over 1000 genomes from across the tree of life, in conjunction with phylogenetic analyses, to identify and classify diverse RSHs in different organisms and unify the terminology for the field. We classify RSHs into 30 subgroups belonging to three groups: long RSHs, small alarmone synthetases (SASs), and small alarmone hydrolases (SAHs). That's 19 more subgroups than were previously known. Those previously unidentified RSH subgroups, which are mostly found in bacteria, but sometimes in archaea and eukaryotes, can now be studied experimentally.
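As a rough illustration of how the three top-level groups map onto domain architecture, here's a small Python sketch. The domain labels and rules are my simplification of the description above (long RSHs: hydrolase and synthetase plus C-terminal TGS/ACT; SASs and SAHs: a single catalytic domain), and they look only at domain presence. In the paper the 30 subgroups themselves were defined with phylogenetic analyses, which a rule like this can't replace, so anything that doesn't fit is flagged for a closer look.

```python
from typing import Iterable

CATALYTIC = {"hydrolase", "synthetase"}
REGULATORY = {"TGS", "ACT"}  # C-terminal domains of long RSHs


def rsh_group(domains: Iterable[str]) -> str:
    """Assign a protein to a top-level RSH group from its domain content.

    A simplified, hypothetical rule set; subgroup assignment needs phylogeny.
    """
    found = set(domains)
    catalytic = found & CATALYTIC
    regulatory = found & REGULATORY
    if not catalytic:
        return "not an RSH"
    if catalytic == CATALYTIC and regulatory:
        return "long RSH"
    if catalytic == {"synthetase"} and not regulatory:
        return "SAS"
    if catalytic == {"hydrolase"} and not regulatory:
        return "SAH"
    return "unclassified: check with phylogeny"


print(rsh_group(["hydrolase", "synthetase", "TGS", "ACT"]))  # long RSH
print(rsh_group(["synthetase"]))                             # SAS
print(rsh_group(["hydrolase"]))                              # SAH
```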

What I think is possibly the most interesting result came from comparative sequence analysis of long and small RSHs. I found exposed sites, conserved only in the long RSHs, that seem to be involved in transmitting regulatory signals. These signals may be transmitted via inter-domain interactions, or via inter-molecular interactions either among individual RSH molecules or between long RSHs and other binding partners such as the ribosome. These sites in RelA can now be directly targeted with mutagenesis in order to test these predictions.

I have to say I'm disappointed with how the figures look in the PDF version of the paper. Lines are really not as crisp as in my uploaded figures. Unfortunately, the tables also don't look how they're supposed to, because they were automatically reformatted into the PLoS format. I wasn't given the opportunity to check them at a proofing stage either. Oh well, I'm just happy this story is now out there!

Gemma C. Atkinson, Tanel Tenson, & Vasili Hauryliuk (2011). The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life PLoS ONE, 6 (8)

Drifting towards complexity, or complexity as a crutch

Finally, I will finish this blog post, which I started months ago! I've been sooo busy with various things, including writing a paper (now in submission) and finishing off work for a handful of side projects, that my blog has become seriously neglected. However, now I have a (relatively) spare afternoon that I can devote to a bit of reading and blogging.

The paper that I'm hastily refreshing my memory about is "Non-adaptive origins of interactome complexity". I'm not going to blog too much about it, because PsiWaveFunction has written a very detailed piece that I highly recommend checking out. However, I'm very interested in this paper, so I can't resist blogging just a little bit about it!

In the paper, Ariel Fernández and Michael Lynch consider the effect of population size on the evolution of complexity, as measured by the number of protein-protein interactions. Multicellular eukaryotes have small population sizes compared with microbes, which leaves them vulnerable to genetic drift, where changes get fixed in the population because they fail to get filtered out by efficient selection. These changes can sometimes be mildly deleterious. The type of deleterious mutations considered in this study are those that increase the area of the protein in contact with water (the protein-water interface, or PWI), and so reduce the stability of the protein in solution.
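To put a number on why population size matters, here's a sketch using Kimura's classic diffusion approximation for the probability that a new mutation with selection coefficient s eventually fixes in a diploid population of effective size N: u = (1 - e^(-2s)) / (1 - e^(-4Ns)). This is textbook population genetics, not anything taken from the Fernández and Lynch paper; it assumes genic selection with effective size equal to N, and the value of s is arbitrary. A mildly deleterious mutation (s = -0.0001) fixes almost as often as a neutral one when N = 100, but its chances collapse as N grows, because selection only "sees" it once 4N|s| is well above 1.

```python
import math


def fixation_probability(s: float, n: int) -> float:
    """Kimura's approximation for the fixation probability of a new mutation
    (initial frequency 1/(2N)) with selection coefficient s, diploid population
    of effective size N:  u = (1 - exp(-2s)) / (1 - exp(-4Ns))."""
    if s == 0:
        return 1.0 / (2 * n)  # neutral limit
    a, b = -2.0 * s, -4.0 * n * s
    if s > 0:
        # Both exponents are negative, so expm1 is accurate and cannot overflow.
        return math.expm1(a) / math.expm1(b)
    # For s < 0, rewrite (e^a - 1) / (e^b - 1) as e^(a-b) * (1 - e^-a) / (1 - e^-b)
    # to avoid overflow when 4N|s| is large.
    return math.exp(a - b) * math.expm1(-a) / math.expm1(-b)


s = -1e-4  # mildly deleterious (arbitrary value)
for n in (100, 10_000, 1_000_000):
    neutral = 1.0 / (2 * n)
    ratio = fixation_probability(s, n) / neutral
    print(f"N = {n:>9,}: fixation probability relative to neutral = {ratio:.3g}")
```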

The authors find a correlation between drift and protein structural integrity, and suggest "that the emergence of unfavourable PWIs promotes the secondary recruitment of novel protein–protein associations that restore structural stability by reducing PWI". So essentially, proteins are recruited into multi-subunit complexes not to explore some new functional space as is commonly thought, but rather to stabilise decrepit proteins that have evolved through drift, itself caused by small population sizes.

As I've blogged about previously, evolution does not always lead to the optimal solution. As long as a system is good enough to work, that's fine. And if that means employing some elaborate, hacky, complex solution, that's not a problem (as long as you can handle a flabby genome).

I like coming up with silly analogies, and in this case it's complexity as a crutch. Eukaryotic proteins are careless and clumsy. They end up lame, and although they can hobble around enough to get by, it's easier with molecular crutches. But the big question is: what is the order of events? Was the crutch being used before or after the protein became lame? Lukeš et al. argue that eukaryotic proteins were already messing about with crutches even before they needed them. This is so-called presuppression, or constructive neutral evolution (CNE).

Presuppression is a ratchet-like process, which Lukeš et al. explain as follows:

"A biochemical reaction under selection is catalyzed by a cellular component A (nucleic acid or protein) that fortuitously interacts with component B either directly, by binding, or indirectly, through the products of B’s own selected activity... The interaction, though not under selection, permits (suppresses) mutations in A that would otherwise inactivate it. Under these conditions, mutations will unavoidably occur, making A dependent on B."

But actually, these two models are not mutually exclusive. Whether in some cases complexity is a crutch that overcomes the limp of an already hobbling protein, or a fortuitous accessory which eventually becomes depended upon, we are beginning to understand that increasing complexity is probably largely a non-adaptive phenomenon, and not necessarily the function builder it was previously thought to be.
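The presuppression ratchet in the Lukeš et al. quote can be caricatured in a few lines of Python. This is my own toy model, not one from their paper, and the probabilities are arbitrary. A lineage starts with component A working alone but fortuitously bound by B. Each generation, A may suffer a mutation that would inactivate it on its own (probability mu), and the A-B interaction may be lost (probability nu). If B is still bound, the mutation is tolerated and A becomes dependent on B; if not, the mutant is purged. Before dependence, losing B is neutral; afterwards it is lethal, so the dependent state can't be undone.

```python
import random
from collections import Counter


def ratchet_lineage(generations: int, mu: float, nu: float, rng: random.Random) -> str:
    """Toy presuppression ratchet for one lineage (hypothetical model)."""
    b_bound = True      # A fortuitously interacts with B at the start
    dependent = False   # A does not yet need B
    for _ in range(generations):
        # A mutation in A that would inactivate it without B's help.
        if rng.random() < mu and b_bound:
            dependent = True  # suppressed by B, so tolerated; A now needs B
        # (Without B, the inactivating mutation is purged: nothing changes.)
        # Loss of the A-B interaction.
        if rng.random() < nu and not dependent:
            b_bound = False   # neutral while A is independent of B
        # (Once A depends on B, losing B is lethal and purged: nothing changes.)
    if dependent:
        return "A dependent on B"
    return "A independent, B still bound" if b_bound else "A independent, B lost"


rng = random.Random(1)
outcomes = Counter(ratchet_lineage(500, mu=0.001, nu=0.001, rng=rng) for _ in range(2_000))
print(outcomes)
```

The printout just counts how many simulated lineages end in each state; the point of the model is that "A dependent on B" is a one-way street.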

And now I'll direct you over to the fabulous Sceptic Wonder blog by PsiWaveFunction, who's done an astounding job covering the Fernández and Lynch paper.

Also, watch this blog space for further discussion of the Lukeš et al. paper, specifically their description of ribosome evolution under CNE.

Fernández A, & Lynch M (2011). Non-adaptive origins of interactome complexity. Nature, 474 (7352), 502-5 PMID: 21593762

Lukeš J, Archibald JM, Keeling PJ, Doolittle WF, & Gray MW (2011). How a neutral evolutionary ratchet can build cellular complexity. IUBMB life, 63 (7), 528-37 PMID: 21698757

Conference on antibiotics and protein synthesis

Next month in Tartu there will be a conference on antibiotics and protein synthesis organized by Tanel Tenson.

Registration is FREE and now open!

Confirmed speakers:

James Williamson (Scripps Research Institute)
Alexander Mankin (University of Illinois at Chicago)
Steven Douthwaite (University of Southern Denmark)
Daniel Wilson (University of Munich)
Karen Shaw (Trius Therapeutics)
Ada Yonath (Weizmann Institute of Science)
Birte Vester (University of Southern Denmark)
Joyce Sutcliffe (Tetraphase Pharmaceuticals)
Mans Ehrenberg (Uppsala University)
Chaitan Khosla (Stanford University)
Markus Zeitlinger (Medical University of Vienna)

It's also probable that Vasili Hauryliuk and I will be speaking.

The evolutionary rate of protein–protein interactions

This is my first post for a while, since I've been pretty busy - first I was writing a grant proposal for some research money, then I was on holiday in Bavaria and Austria, after which I was busy finishing off a manuscript on evolution of starvation response enzymes. As of yesterday, the manuscript is with my boss, so time to catch up with the world.

I noticed an interesting upcoming PNAS paper: "Measuring the evolutionary rate of protein–protein interaction". It tackles a subject close to my heart, the functional evolution of proteins. The authors set out to measure the rate at which functional changes happen, with function in this case measured by the gain and loss of PPIs (protein-protein interactions).

They start by comparing the yeast S. cerevisiae, which has abundant PPI data, with another yeast, Kluyveromyces waltii. These two diverged ∼150 MYA, and they are somewhat special relatives, since a whole-genome duplication occurred in the S. cerevisiae lineage after the divergence of K. waltii. I worried that this could affect the rate of PPI change, with the sudden influx of duplicates in the S. cerevisiae lineage inflating the PPI count. However, the problem of duplicates was surmounted by only considering one-to-one orthologues (i.e. proteins related by vertical descent rather than by gene duplication, which would be paralogues). In all, 43 PPIs passed the yeast two-hybrid test, and all of these were found to be conserved in both yeasts. From this, they estimated that the 95% confidence interval of the total rate of PPI evolution is between 0 and 4.6 × 10⁻¹⁰ per PPI per year.
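Where does an interval like "0 to 4.6 × 10⁻¹⁰" come from when no losses are seen at all? Here is a minimal sketch under my own assumption (not necessarily the authors' exact procedure): treat every change as the loss of an existing PPI, and assume each of n PPIs independently survives the T years since divergence with probability exp(-rT). All n survive with probability exp(-rnT), and the one-sided 95% upper bound is the rate at which that drops to 0.05, i.e. r = -ln(0.05)/(nT). With 43 PPIs over 150 million years this happens to reproduce the 4.6 × 10⁻¹⁰ above, and with the six mouse and human PPIs over 90 million years (next paragraph) it gives 5.5 × 10⁻⁹.

```python
import math


def zero_loss_upper_bound(n_conserved: int, years: float, alpha: float = 0.05) -> float:
    """One-sided upper confidence bound on the PPI change rate when no losses are seen.

    Assumes (hypothetically) that each of n independent PPIs survives `years`
    of divergence with probability exp(-rate * years). The chance that all n
    survive is exp(-rate * n * years); the bound is the rate at which that
    chance falls to alpha.
    """
    if n_conserved <= 0 or years <= 0:
        raise ValueError("need at least one PPI and a positive divergence time")
    return -math.log(alpha) / (n_conserved * years)


# 43 conserved PPIs, S. cerevisiae vs K. waltii, ~150 million years
print(f"{zero_loss_upper_bound(43, 150e6):.2g}")  # 4.6e-10
# 6 conserved PPIs, mouse vs human, ~90 million years
print(f"{zero_loss_upper_bound(6, 90e6):.2g}")    # 5.5e-09
```

The bound shrinks in proportion to 1/n, so doubling the number of tested PPIs halves it.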

They then went on to consider animals. Using PPI data from nematodes, they found that two of five confirmed S. cerevisiae PPIs are conserved in C. elegans. These two species diverged ~1,300 MYA, giving a 95% confidence interval of 1.6 × 10⁻¹⁰ to 2.0 × 10⁻⁹. Using transcription factor PPI data from humans and mice, which diverged 90 MYA, they found that six of six mouse PPIs are conserved in humans, giving a 95% confidence interval of 0 to 5.5 × 10⁻⁹. Combining all the datasets, they arrive at a final value for the rate of PPI change of (2.6 ± 1.6) × 10⁻¹⁰ per PPI per year.
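Out of curiosity, here is one way the three comparisons could be pooled into a single number. This is my own back-of-the-envelope sketch, not necessarily Qian et al.'s method: it assumes each PPI independently survives T years of divergence with probability exp(-rT), writes down the likelihood of the observed conserved and lost PPIs, and finds the maximum by bisection on its derivative. With the counts and divergence times quoted above, it lands on about 2.6 × 10⁻¹⁰ per PPI per year.

```python
import math


def score(rate, groups):
    """Derivative of the log-likelihood with respect to the rate.

    Each group is (n_conserved, n_lost, years). Under the assumed model a PPI
    is retained with probability exp(-rate * years), so each conserved PPI
    contributes -years and each lost PPI contributes years / (exp(rate * years) - 1).
    """
    total = 0.0
    for conserved, lost, years in groups:
        total -= conserved * years
        if lost:
            total += lost * years / math.expm1(rate * years)
    return total


def pooled_rate_mle(groups, lo=1e-14, hi=1e-8, iterations=100):
    """Maximum-likelihood rate, found by bisection on the decreasing score."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if score(mid, groups) > 0:
            lo = mid  # likelihood still rising, so the maximum lies at a higher rate
        else:
            hi = mid
    return (lo + hi) / 2


datasets = [
    (43, 0, 150e6),   # yeast vs K. waltii: 43 of 43 conserved
    (2, 3, 1300e6),   # yeast vs C. elegans: 2 of 5 conserved
    (6, 0, 90e6),     # mouse vs human: 6 of 6 conserved
]
print(f"{pooled_rate_mle(datasets):.2g}")  # 2.6e-10
```

All three losses come from the yeast vs worm comparison, so under this model that single dataset carries most of the information about the rate.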

It's great to have a value for the rate of this sort of rare evolutionary change, and the authors are certainly very rigorous in eliminating the possibility of false positives and false negatives. However, I'm left wondering whether, after all this filtering, they have enough data to be really sure of their estimates. I count 54 PPIs in total, of which only 3 are lost, and that's in one lineage. Is that really enough data to go on? Well, I'm certainly not a statistician, so I can only assume that this was checked thoroughly by folks much better informed on this kind of thing than I am.

An interesting future route would be to compare protein substitution rates and PPI rates between lineages. I'm wondering whether organisms with high amino acid substitution rates (like nematodes and other parasites) have a PPI rate that's (relatively) just as high, or whether this is dampened by compensatory mutations in binding interfaces. It'd also be interesting to compare the eukaryotic PPI rate with the bacterial one.

Qian W, He X, Chan E, Xu H, & Zhang J (2011). Measuring the evolutionary rate of protein-protein interaction. Proceedings of the National Academy of Sciences of the United States of America PMID: 21555556

Detecting mutual exclusivity of gene families

Gene networks are popularly used in systems biology to show functional associations among genes within a single genome, taking advantage of available experimental data on intermolecular interactions. Co-evolutionary networks are another way of showing functional associations among genes, in this case using presence/absence patterns of homologous genes across genomes to predict likely interaction partners. A nice example from proteins that I'm interested in is the machinery for incorporating the amino acid selenocysteine into growing peptides. Not all organisms utilise selenocysteine, but those that do encode a whole package of genes for its synthesis, charging onto tRNA and delivery to the ribosome. If a gene X were found only in that strange collection of (not always closely related) organisms with the selenocysteine machinery, chances are that X either uses selenocysteine or is also involved in its metabolism. As an aside, STRING is a really nice web application for visualising networks of functional associations compiled from various sources of evidence (co-occurrence, co-expression, gene neighbourhood, and experiments).
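The co-occurrence idea can be sketched with phylogenetic profiles: one presence/absence vector per gene family across a set of genomes, compared with a similarity measure. Here I use the Jaccard index, which is just one simple choice (I'm not claiming it's what STRING or anyone else uses), and the profiles for eight genomes are entirely made up; "sec_machinery", "gene_X", "gene_Y" and "gene_Z" are hypothetical labels.

```python
def jaccard(a, b):
    """Jaccard similarity of two presence/absence profiles (1 = present)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Toy profiles over eight hypothetical genomes; all names and values are invented.
sec_machinery = [1, 0, 1, 1, 0, 0, 1, 0]   # genomes encoding the full Sec toolkit
candidates = {
    "gene_X": [1, 0, 1, 1, 0, 0, 1, 0],    # same patchy distribution
    "gene_Y": [1, 1, 0, 1, 1, 0, 0, 1],    # unrelated pattern
    "gene_Z": [1, 1, 1, 1, 1, 1, 1, 1],    # present everywhere
}

ranked = sorted(candidates, key=lambda g: jaccard(candidates[g], sec_machinery), reverse=True)
for gene in ranked:
    print(gene, round(jaccard(candidates[gene], sec_machinery), 2))
# gene_X 1.0
# gene_Z 0.5
# gene_Y 0.29
```

Note that the ubiquitous gene_Z still scores 0.5, which hints at why a real analysis would need to account for how widespread each family is, and for shared ancestry among the genomes.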

A new paper by Zhang et al. in GBE presents an interesting approach for analysing co-evolutionary networks: detecting Mutually Exclusive Orthologous Modules (MEOMs). In their words: "A MEOM is composed of two sets of gene families, each including gene families that tend to appear in the same organisms, such that the two sets tend to mutually exclude each other (if one set appears in a certain organism the second set does not)."
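Zhang et al. work with whole sets of gene families. As a much simpler, hypothetical building block, mutual exclusion between just two families can be scored with a one-sided hypergeometric test: given how many organisms carry each family, how likely is it to see them together this rarely if they were scattered at random? This is my illustration of the general idea, not the authors' algorithm, and the counts below are invented.

```python
from math import comb


def exclusivity_p_value(n_organisms, n_with_a, n_with_b, n_with_both):
    """P(co-occurrence <= observed) if the two families were placed at random.

    Conditions on how many organisms carry each family and sums the
    hypergeometric distribution; a small value means the families turn up
    together less often than chance would predict.
    """
    lowest = max(0, n_with_a + n_with_b - n_organisms)
    total = comb(n_organisms, n_with_b)
    hits = sum(
        comb(n_with_a, i) * comb(n_organisms - n_with_a, n_with_b - i)
        for i in range(lowest, n_with_both + 1)
    )
    return hits / total


# Hypothetical: 20 genomes, each family present in 10 of them.
print(exclusivity_p_value(20, 10, 10, 0))  # never together: strongly exclusive
print(exclusivity_p_value(20, 10, 10, 5))  # together about as often as chance
```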

MEOMs are interesting because they reflect the replacement of one set of genes by another. This could be due to lineage-specific or environment-specific adaptations. The authors analyse a co-evolutionary network based on 383 organisms from across the tree of life and find that MEOMs most often include gene families involved in transport, energy production, metabolism and translation. They suggest that changes in the metabolic environment of an organism require adaptation to new sources of energy, and that this triggers the replacement of genes, complexes and pathways in individual lineages. They also find many outer membrane proteins in their MEOMs, suggesting that because these proteins interact with the extracellular environment, they are frequently replaced during adaptation.

It's all very interesting, and I hope the authors will consider making a searchable web interface to their database of MEOMs. Their supplementary data is a bit awkward to navigate, and this kind of data is just crying out for visualisation.  I would love to be able to scan proteins in my data sets for potential MEOM membership.

Zhang X, Kupiec M, Gophna U, & Tuller T (2011). Analysis of co-evolving gene families using mutually exclusive orthologous modules. Genome Biology and Evolution DOI: 10.1093/gbe/evr030

The ancestral ribosome: my reservations

I work on the deep evolution of ribosome-associated proteins, so of course I'm very excited by research on the deep evolution of ribosomal RNA. However, I have some concerns about some of the work in this field relating to the composition and structure of the ancestral ribosome, or, as it is sometimes called, the proto-ribosome. Actually, it’s less that I have concerns about the work itself, and more that I have some small but (at least to me) important concerns about the interpretations and subsequent speculations. Anyway, the other day, as I listened to Ada Yonath's and Loren Williams's talks at the Suddath Symposium on the Ribosome, I was reminded of these concerns, and decided it's probably a good idea to blog about them.

Before I start moaning, I want to stress that it's really exciting that people are trying to answer such deep evolutionary questions. I genuinely think they have made some interesting and important discoveries about the relative ages of parts of the ribosome, and about small catalytic RNAs that can behave like ribosomes. I just don't think those catalytic RNAs are ancestral ribosomes; rather, I think they are perhaps a shared component of ancestral and modern ribosomes.

What it comes down to is that, in general, the people working on proto-ribosomes assume that evolution proceeds from small and simple to large and complex. In fact, this is not necessarily the case, as I have blogged about previously. Small and perfectly formed is hard to evolve, while big and clumsy, with time for optimisation, is easier.

Figure 1. My crude drawing to demonstrate evolutionary progression. Green is the modern ribosome, with the predicted ancient parts in red. Black is sequence non-homologous to the modern ribosome. A) shows the idea that is often presented: a small protoribosome gets bigger over time. B) shows my hypothesis: continual loss and gain of sequence with some retention of a conserved core.

Ada Yonath’s talk at the symposium on the ancestral peptidyl transferase centre (PTC, the region where peptide bonds are formed between amino acids) really captured my imagination. The PTC is buried right in the middle of the ribosome and consists of two fragments of rRNA with rotational structural symmetry between the rRNA that binds the P-site (peptidyl) tRNA and the rRNA that binds the A-site (aminoacyl) tRNA. This symmetrical region is highly conserved in sequence (98% identity among organisms), although the two symmetrical units are not similar in sequence to each other. Ada proposes that this symmetrical region is the oldest part of the ribosome, and that this minimal region is a functional machine on its own. In support of this, the CCA end of tRNA fits in perfectly, and their structural studies indicate that the region could provide a rotary motion of tRNAs that is required for peptidyl transfer. This is a really nice story, and so far I’m totally in support. What I have problems accepting is that this minimal rRNA dimer IS all there was of the protoribosome (as in fig. 1A). Why could there not have been extra RNA around it that was replaced during evolution (as in fig. 1B)? Via the symposium's online participation (which was fantastic, by the way), I asked Ada about this:

Gem: The small symmetrical region might be the only region modern ribosomes have in common with the ancestral proto-ribosome. But it almost seems TOO streamlined. Could the protoribosome actually have been bigger than that core region, and there could have been loss as well as gain of sequence? 

Ada: we haven’t thought of that… but you can speculate anything.

That’s exactly my concern: you can speculate anything in this field. There are very few clues to go on, and they don’t give anything conclusive. Where evidence dries up, all you can do is thought experiments, based on examples we know of. And we know from extant ribosomes that there has been lineage-specific loss and gain of sequence. Good examples are mitochondrial ribosomes, which have lost a good deal of rRNA and replaced it with protein.

A model of ribosome evolution similar to Ada's, proposing the progressive addition of rRNA onto a minimal but functional PTC frame, is presented by Bokov and Steinberg (Nature, 2009). In this paper, the authors examine the inter-domain interactions and structural dependencies in the large subunit to infer relative age. Again, this is great, fascinating work, and it is also consistent with a model of replacement and optimisation, rather than the “aggrandizement” that they presume in their model.
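The logic of reading relative age from structural dependencies can be illustrated on a toy graph: if element X leans on element Y for its structure, Y is taken to be at least as old as X, so elements that nothing else depends on can be peeled away first, and whatever survives to the end is the candidate core. The element names and dependencies below are invented for illustration; this is neither Bokov and Steinberg's data nor their exact procedure, just a layer-by-layer topological peel.

```python
from collections import defaultdict


def peel_order(depends_on):
    """Group elements into layers, outermost (youngest, by this logic) first.

    `depends_on` maps each element to the elements it needs for its own
    structure. An element can be peeled off once no remaining element
    depends on it. Raises ValueError if a dependency cycle blocks peeling.
    """
    elements = set(depends_on)
    for needed in depends_on.values():
        elements.update(needed)

    # For each element, how many remaining elements lean on it.
    n_dependants = defaultdict(int)
    for needed in depends_on.values():
        for support in needed:
            n_dependants[support] += 1

    remaining = set(elements)
    layers = []
    while remaining:
        layer = sorted(e for e in remaining if n_dependants[e] == 0)
        if not layer:
            raise ValueError("cyclic dependencies: cannot peel further")
        for element in layer:
            remaining.discard(element)
            for support in depends_on.get(element, ()):
                n_dependants[support] -= 1
        layers.append(layer)
    return layers


# Invented example: two outer helices lean on a middle helix, which leans on the core.
toy_dependencies = {
    "helix_outer_1": ["helix_middle"],
    "helix_outer_2": ["helix_middle", "ptc_core"],
    "helix_middle": ["ptc_core"],
    "ptc_core": [],
}
print(peel_order(toy_dependencies))
# [['helix_outer_1', 'helix_outer_2'], ['helix_middle'], ['ptc_core']]
```

An ordering like this only says what the present-day structure could lose without collapsing; it is equally compatible with a core that once had other, since-replaced material around it.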

After Ada Yonath’s talk in the symposium came Loren Williams, who also works on figuring out the ancestral ribosome. Williams and colleagues compared the sequences and structures of the ribosomes of the archaeon H. marismortui and the bacterium T. thermophilus, and found that the sequence and conformational similarity of the rRNAs are greatest near the PTC and diverge smoothly with distance from it. They show a beautiful figure of the ribosome as an onion, which makes their point perfectly.
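The onion analysis boils down to grouping nucleotides by their distance from the PTC and asking how similar the two ribosomes are within each shell. Here is a minimal sketch with made-up distances and a made-up per-nucleotide identity flag (1 = the aligned nucleotide is the same in both species); the 20 Å shell width is my arbitrary choice, not the value Hsiao et al. used.

```python
from collections import defaultdict


def identity_by_shell(records, shell_width):
    """Average identity per concentric shell around the PTC.

    `records` is a list of (distance_from_ptc_in_angstroms, identical) pairs,
    where `identical` is 1 if the aligned nucleotide matches in both species.
    """
    counts = defaultdict(lambda: [0, 0])  # shell index -> [matches, total]
    for distance, identical in records:
        shell = int(distance // shell_width)
        counts[shell][0] += identical
        counts[shell][1] += 1
    return {shell: matches / total for shell, (matches, total) in sorted(counts.items())}


width = 20
toy = [(5, 1), (12, 1), (18, 1), (25, 1), (31, 0), (38, 1), (44, 0), (52, 0), (57, 1)]
for shell, identity in identity_by_shell(toy, width).items():
    print(f"{shell * width}-{(shell + 1) * width} Å: {identity:.2f}")
# 0-20 Å: 1.00
# 20-40 Å: 0.67
# 40-60 Å: 0.33
```

A falling profile like this is what the onion picture shows; as argued below, it records how strongly each shell is constrained, which need not be the same thing as how old it is.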

Again, these particular results are very clear and interesting; it’s just some of the assumptions about the evolutionary process that I have issues with. I noticed in the talk that Loren consistently equated “conserved” with “old.” In fact, “conserved” usually means “important”. Jamie Williamson, who was in the audience, also made this point during the talk, and Loren replied that he could not argue with that. In the case of the ribosome, the central parts are not only involved in catalysis, they are also important for maintaining the three-dimensional structure. So they are very important. It’s the same reason why proteins show strong conservation of buried amino acids.

Some other evolutionary statements and suppositions by Loren were also a bit iffy, such as: "mitochondrial ribosomes are running evolution backwards." Yikes. Drastically cutting down rRNA and replacing it with protein independently in multiple lineages is definitely not running evolution backwards… in fact, evolution never, ever runs backwards. I also have a problem with supposing things that it isn’t necessary to suppose: Loren hypothesises that the ribosome-binding tails of ribosomal proteins are older than the globular domains, that they were originally non-coded, and that they later became fused to globular domains. There really is no evidence for this as far as I can see. The tails and insertions that protrude into the ribosome are very biased in amino acid content, and if they’re anything like the ribosome-binding extensions of translation factors such as IF3, they readily appear and vary in length and primary sequence during evolution. These sorts of structures seem easy to add.

Maybe my complaints can be considered petty in a field that is necessarily rife with speculation, but I just think it’s important not to push the speculations too far, in order to keep our scientific integrity and not become like the cranks who publish their “evolutionary biology” in the Journal of Cosmology. For example, I loved the first half of Ada’s talk, but she finished it with a discussion of the ability of her two symmetrical fragments to dimerise, and suggested that in a population of these fragments, their non-uniform tendency to dimerise was a kind of “pre-Darwinian Darwinian” ribosome evolution that took place in the prebiotic world. She also suggested that these fragments could be proto-tRNAs. For me, this is too far removed from the evidence, and these are speculations too far.

BUT! Having said all that, wild speculation is bloody well fun, so I will offer my own hypothesis (see fig. 1B). I think the first ribosome could have been big, flabby and clumsy, an amalgamation of RNAs that were perhaps already involved in some other catalysis, such as nucleic acid polymerisation, and that, through chance flopping around, managed to catalyse (probably very inefficiently) peptide bond formation. The efficiency of bond formation between particular amino acids may have been influenced by the particular nucleic acids being polymerised in the active site, as in some primitive ‘code’. This protein-synthesising proto-machine may have had nothing recognisably in common with modern ribosomes, but it was subsequently fine-tuned through loss and gain of sequence until it became something resembling the ribosome that we know and love.

Refs and further reading

Belousoff MJ, Davidovich C, Zimmerman E, Caspi Y, Wekselman I, Rozenszajn L, Shapira T, Sade-Falk O, Taha L, Bashan A, Weiss MS, & Yonath A (2010). Ancient machinery embedded in the contemporary ribosome. Biochemical Society transactions, 38 (2), 422-7 PMID: 20298195

Bokov K, & Steinberg SV (2009). A hierarchical model for evolution of 23S ribosomal RNA. Nature, 457 (7232), 977-80 PMID: 19225518

Hsiao C, Mohan S, Kalahar BK, & Williams LD (2009). Peeling the onion: ribosomes are ancient molecular fossils. Molecular biology and evolution, 26 (11), 2415-25 PMID: 19628620

Promiscuous proteins

Gone are the days when the one protein, one function presumption prevailed. Many proteins are multifunctional and multispecific; that is, they have multiple binding partners for carrying out various roles in the cell. Here's a new review by Erijman et al. in Biochemistry about multispecificity, covering various examples of promiscuous proteins and the different ways in which they achieve their multispecificity.

Proteins can interact with multiple binding partners by having distinct binding interfaces or domains. By this route, it's possible for the protein to optimise each binding site for its specific partner as the interfaces are independent (although there may be some cross-talk). An example of this from the proteins that I'm interested in is the Rel protein of bacteria. This protein has a synthesis domain for producing the alarmone ppGpp, and a hydrolysis domain for degrading it. The interfaces are on different sides of the protein, so are in some sense independent, although binding of a molecule in one site may influence the function of the other site by switching the conformation of the protein.

As an alternative solution, a protein may bind through one interface that is able to interact with multiple partners. An example of this is the archaeal elongation factor EF1A, which delivers aminoacylated tRNA, release factor aRF1 and mRNA decay protein aDom34 to the ribosome, binding all three by overlapping binding sites.

My rather simplistic representation of how a protein's binding interfaces can be distributed. A: independent binding sites, e.g. Rel. B: overlapping binding sites, e.g. aEF1A.
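The trade-off between the two layouts can be made concrete with textbook equilibrium binding. With independent sites, each partner's occupancy depends only on its own concentration and Kd, [L]/Kd / (1 + [L]/Kd). With one overlapping site, the partners exclude each other and share a single denominator, 1 + Σ [L]/Kd. This sketch assumes equilibrium, no cooperativity, and free partner concentrations roughly equal to total (partners in excess of the protein). The partner names come from the aEF1A example above, but every number is hypothetical.

```python
def independent_occupancy(conc, kd):
    """Fraction of time a dedicated site is bound by its one partner (1:1 binding)."""
    ratio = conc / kd
    return ratio / (1 + ratio)


def competitive_occupancy(concs, kds):
    """Fraction of time one shared site is bound by each of several partners.

    Partners exclude one another, so they all share a single denominator.
    """
    ratios = [c / k for c, k in zip(concs, kds)]
    denominator = 1 + sum(ratios)
    return [r / denominator for r in ratios]


# Hypothetical numbers: each partner present at a concentration equal to its Kd.
partners = ["aa-tRNA", "aRF1", "aDom34"]
concs = [1.0, 1.0, 1.0]
kds = [1.0, 1.0, 1.0]
shared = competitive_occupancy(concs, kds)
for name, conc, kd, share in zip(partners, concs, kds, shared):
    print(f"{name}: own site {independent_occupancy(conc, kd):.2f}, shared site {share:.2f}")
# each line ends: own site 0.50, shared site 0.25

# Making the shared site ten times tighter for aa-tRNA squeezes out the other two.
print([round(x, 2) for x in competitive_occupancy(concs, [0.1, 1.0, 1.0])])
# [0.77, 0.08, 0.08]
```

On a shared site, improving affinity for one partner necessarily costs the others, which is one way to see the optimisation compromise that duplication and subfunctionalisation can resolve.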

Multispecificity is great for the cell (especially cells with reduced, streamlined genomes) in that from just one gene, you get a lot of functional value. However, it also introduces some compromises for the protein in terms of optimising its specificity for binding partners (especially true for proteins with overlapping binding sites), and brings about challenges in terms of regulating the different functions. A way to escape these problems is gene duplication followed by subfunctionalisation of the different binding functions of the protein. Indeed, this has occurred in some organisms for both of my examples above. In proteobacteria, Rel has been duplicated, resulting in RelA and SpoT, specialised for ppGpp synthesis and hydrolysis respectively. Similarly, in eukaryotes, two duplications of EF1A-like proteins have led to eEF1A, eRF3 and Hbs1, specialised for binding aa-tRNA, eRF1 and eDom34 respectively. However, it would be wrong to say that eEF1A now only has one function, as in fact it has many, many more functions... but that's another story!

For more info on these proteins, check out my other blog posts:

Erijman A, Aizner Y, & Shifman JM (2011). Multispecific recognition: mechanism, evolution, and design. Biochemistry, 50 (5), 602-11 PMID: 21229991

Hogg T, Mechold U, Malke H, Cashel M, & Hilgenfeld R (2004). Conformational antagonism between opposing active sites in a bifunctional RelA/SpoT homolog modulates (p)ppGpp metabolism during the stringent response [corrected]. Cell, 117 (1), 57-68 PMID: 15066282

Saito K, Kobayashi K, Wada M, Kikuno I, Takusagawa A, Mochizuki M, Uchiumi T, Ishitani R, Nureki O, & Ito K (2010). Omnipotent role of archaeal elongation factor 1 alpha (EF1α) in translational elongation and termination, and quality control of protein synthesis. Proceedings of the National Academy of Sciences of the United States of America, 107 (45), 19242-7 PMID: 20974926

Tsunami and earthquake crisis - Non-Believers Giving Aid

Because praying for Japan doesn't help anyone, but donations save lives.

"Non-Believers Giving Aid and the Richard Dawkins Foundation for Reason and Science are once more partnering with the International Committee of the Red Cross to bring much needed help to people whose lives have been torn apart by natural disaster. Every cent and penny of money donated via Non-Believers Giving Aid will be forwarded to the International Red Cross – and if you are in the UK and you complete the Gift Aid Declaration along with your donation, we will pass that on in its entirety too."

Well, this is unexpected! Drosophila mitochondrial translation elongation factor G1 contains a nuclear localization signal.

Most eukaryote genomes encode two mitochondrial translation elongation factor Gs. I recently had a paper in MBE about the origin and evolution of these factors, and I've blogged about it previously. I spotted a very surprising article in PLoS ONE today: "The Drosophila Mitochondrial Translation Elongation Factor G1 Contains a Nuclear Localization Signal and Inhibits Growth and DPP Signaling." For some reason, mtEFG1 is dual-targeted to the nucleus as well as to the mitochondrion. The nuclear localisation signal is proposed to lie at the C terminus, unlike the mitochondrial transit peptide, which is at the N terminus. The authors suggest a model in which "if mitochondrial ATP synthesis is low or EF-G1 is overexpressed and import of EF-G1 proteins into mitochondria is a limiting step, some EF-G1 proteins can accumulate outside of mitochondria and translocate into the nucleus, where they inhibit cellular growth and proliferation."
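For anyone wondering what looking for "a C-terminal NLS" might mean in practice, here is a toy scan for the classical monopartite NLS consensus, K-(K/R)-X-(K/R), restricted to the last residues of a protein. The sequence is invented and the 30-residue window is my arbitrary choice; real NLS prediction (and whatever signal Trivigno and Haerry mapped in EF-G1) is considerably more involved than a single regular expression.

```python
import re

# Classical monopartite NLS consensus: K, then K or R, then any residue, then K or R.
# The zero-width lookahead lets overlapping matches be reported.
MONOPARTITE_NLS = re.compile(r"(?=(K[KR].[KR]))")


def c_terminal_nls_hits(sequence, window=30):
    """Return (1-based start, motif) for consensus matches in the last `window` residues."""
    offset = max(0, len(sequence) - window)
    tail = sequence[offset:]
    return [(offset + m.start() + 1, m.group(1)) for m in MONOPARTITE_NLS.finditer(tail)]


# Invented sequence: an N-terminal stretch standing in for a transit peptide,
# a filler "body", and a lysine/arginine-rich C-terminal tail.
toy_protein = "MLRSAVRFLSTRAPSQ" + "GAVLDEIT" * 10 + "PEKKRKVSE"
print(c_terminal_nls_hits(toy_protein))  # [(99, 'KKRK')]
```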

The authors carry out mutagenesis and subcellular localization analysis of mtEFG1, and find that although the Drosophila mtEFG1 gene is essential, it's not required in every tissue. This leads them to suggest that in some tissues, mtEFG2 rather than mtEFG1 is the primary translocation factor. This would be very unexpected, as neither spirochete nor human spd/mtEFG2 can promote translocation; instead, spd/mtEFG2 is proposed to be specialised for EF-G's role in ribosome recycling. Additionally, the alignment in my paper shows that mtEFG2s don't have the conserved amino acids involved in translocation functions, such as interaction with peptidyl-tRNA. However, intramolecular and ribosome interaction sites are well conserved in mtEFG2, suggesting that it maintains EF-G-like structural integrity and ribosome-binding abilities. Maybe this is sufficient to promote translocation in some conditions? In fact, even the more distantly related EF-G2 of Thermus, which belongs to a whole other ancient subfamily, is capable of translocation, hinting that although classical EF-G is very well conserved at the primary sequence level, at least in some conditions the ribosome can accommodate and translocate with more divergent homologues that maintain an EF-G-like structure.

The model of mtEFG1 subcellular location being related to mitochondrial ATP synthesis proposed by Trivigno and Haerry presents a paradox, which they acknowledge: "If EF-G2 functioned as an elongation factors in tissues like the heart, mitochondrial translation and ATP synthesis would occur at normal levels, and EF-G1 would be imported into mitochondria and not accumulate in the nucleus. On the other hand, in tissues like the liver, where EF-G2 cannot function as an elongation factor, mitochondrial translation would decrease, ATP levels would drop, EF-G1 import into mitochondria would decrease and accumulation in the nucleus increase, which would further exacerbate the problem."

So, in conclusion, it's all rather surprising, and the model just doesn't seem quite right... It's all very well for a bioinformatician to say this, I know, but more experiments are needed!


Atkinson GC, & Baldauf SL (2011). Evolution of elongation factor g and the origins of mitochondrial and chloroplast forms. Molecular biology and evolution, 28 (3), 1281-92 PMID: 21097998

Trivigno C, & Haerry TE (2011). The Drosophila Mitochondrial Translation Elongation Factor G1 Contains a Nuclear Localization Signal and Inhibits Growth and DPP Signaling. PloS one, 6 (2) PMID: 21364917

Tsuboi M, Morita H, Nozaki Y, Akama K, Ueda T, Ito K, Nierhaus K, & Takeuchi N (2009). EF-G2mt Is an Exclusive Recycling Factor in Mammalian Mitochondrial Protein Synthesis. Molecular Cell, 35 (4), 502-510 DOI: 10.1016/j.molcel.2009.06.028

Connell S, Takemoto C, Wilson D, Wang H, Murayama K, Terada T, Shirouzu M, Rost M, Schüler M, & Giesebrecht J (2007). Structural Basis for Interaction of the Ribosome with the Switch Regions of GTP-Bound Elongation Factors. Molecular Cell, 25 (5), 751-764 DOI: 10.1016/j.molcel.2007.01.027