Field of Science

PhD position available in Molecular Evolution!

Having got my Estonian Science Foundation grant funded recently, I have an open PhD position available! See below.


-----------------
We are seeking a highly motivated PhD candidate to be supervised by Dr Gemma Atkinson within the group of Prof Tanel Tenson in the Institute of Technology, University of Tartu, Estonia.

Dr Atkinson’s research addresses protein functional evolution, using bioinformatic approaches and primarily focusing on the ancient families of proteins involved in translation of mRNA to protein. Members of these families are often essential for life and predate the last common ancestor of all life on earth. Thus by studying these proteins we can gain understanding of the fundamental processes of life, and how these processes have evolved over billions of years.

The PhD project will take advantage of the thousands of whole genome sequences now available for the study of evolution of protein families from the origin of life to the present day. Work will involve sensitive sequence searching to identify the presence and absence of particular proteins across genomes, phylogenetic analyses to reconstruct their emergence and evolution, and sequence analyses to link domain- and site-specific patterns of amino acid substitution with molecular function. Specifically, the proposed PhD project will target the ABC superfamily of ATP-binding enzymes found in all domains of life. This superfamily comprises enigmatic proteins of diverse and often unknown functions. Several ABC enzymes have recently been found to have important roles in regulation of translation, such as ribosome recycling protein Rli1/ABCE1, yeast-specific elongation factor eEF3 and starvation response enzymes Gcn1 and Gcn20.

From the results of the PhD, it is expected that enzymes with novel roles in protein synthesis will come to light as interesting targets for subsequent experimental study. Dr Atkinson collaborates with the lab of Dr Vasili Hauryliuk, also in Prof Tenson’s group, for biochemical and genetic validation of in silico results. If the candidate so wishes, there is an opportunity to gain practical lab experience in Dr Hauryliuk’s lab.

The candidate should have:
  • a Masters degree in a biological or computational discipline
  • a strong interest in, and enthusiasm for, molecular evolution
  • familiarity with basic sequence and phylogenetic analyses
  • experience in using a programming language such as Python, Perl, Java etc
  • fluency in spoken and written English

Estonia has a rich culture and beautiful natural environment, with unspoiled forests, meadows and coastlines. Enjoying warm summers and cold winters, the historical city of Tartu is the intellectual capital of Estonia, and its university is the leading research and development institution in the country. The Institute of Technology is a lively, modern centre for biological and technological research.

The PhD will be funded by a monthly stipend, with additional monies available for regular attendance at international conferences and workshops, and for visiting labs abroad. Information on funding is available by request.

Applications should contain:
  • a full CV with detailed description of previous relevant experience
  • a statement of academic interests
  • an electronic version of the Masters thesis
  • the names and contact details of at least 2 referees

The candidate is expected to start at the latest September 2012. Please send applications and informal enquiries to gemma.atkinson@ut.ee

Gemma Atkinson
University of Tartu,
Institute of Technology
Nooruse 1, 50411 Tartu, Estonia

More information about the research of Dr Atkinson can be found here:
http://lepo.it.da.ut.ee/~atkinson/gem_mac/gemma_c_atkinson.html



My creative contributions to the Festive Tree of Life

Last week I got a thick padded envelope from the Wellcome Trust. My colleagues were a bit surprised... I told them it was a grant, and well it kind of was, only not wads of cash, but lumps of modelling clay!

As part of their Festive Tree of Life project, the Wellcome Trust sent out free packs of colourful modelling clay in the run up to the festive season. The idea is that you make science-inspired decorations and either hang them on their physical Christmas tree if you're somewhere in the vicinity, or post them on the Festive Tree of Life Flickr page.

I had a fun afternoon the other day making my decorations. Here they are:

Mitochondrion

Chloroplast

Ribosome, with three tRNAs and EF-Tu. The pink thing is supposed to be mRNA, while the string is the nascent polypeptide chain coming out the exit tunnel... scale is overrated anyway.


The clay started to dry up by the time I got to the ribosome, and got less sticky and more tricky to deal with. After a few hours, the tRNAs fell off, and the subunits have now almost dissociated. All of this in the absence of termination and ribosome recycling factors too!

Happy holidays!

Bacterial genes in eukaryotes - function and phylogeny

There have been a couple of interesting papers recently on those eukaryotic genes that are more closely related to bacterial than archaeal homologues. Such proteins are often organellar (although they may be encoded in the nucleus), having entered eukaryotes with the bacterial endosymbiosis event that gave rise to the mitochondrion (or the event that gave rise to the chloroplast in the case of plants).

Giant, glowing mitochondria in the Deutsches Museum, Munich


The first paper, published in GBE, considers humans alone:

The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences.
David Alvarez-Ponce and James O. McInerney

This paper tests whether human genes of different ancestries (bacterial versus archaeal) have different effects on phenotype, essentiality of the gene (as judged by lethality in mice), function, selective constraint, expression and position in protein-protein interaction network (PIN). Proteins were classified as bacteria- or archaea-like based on best hit scores in Blast searches.
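As a rough illustration of what a best-hit classification of this kind involves, here is a minimal Python sketch. It is not the authors' pipeline: the input format (dictionaries of bit scores per query), the label names and the handling of ties are all assumptions made for the example, and a real analysis would also apply E-value or score thresholds.

```python
def classify_by_best_hit(bacterial_bits, archaeal_bits):
    """Label each query by the domain that gives the higher best-hit bit score.

    bacterial_bits / archaeal_bits: dicts mapping query ID -> list of bit scores
    (hypothetical input format; an empty or missing list means no hit).
    """
    labels = {}
    for query in set(bacterial_bits) | set(archaeal_bits):
        best_bac = max(bacterial_bits.get(query, []), default=None)
        best_arc = max(archaeal_bits.get(query, []), default=None)
        if best_bac is None and best_arc is None:
            labels[query] = "no_prokaryotic_hit"
        elif best_arc is None or (best_bac is not None and best_bac > best_arc):
            labels[query] = "bacteria-like"
        elif best_bac is None or best_arc > best_bac:
            labels[query] = "archaea-like"
        else:
            # Equal best scores: leave the call open rather than guess.
            labels[query] = "ambiguous"
    return labels


# Toy scores, invented purely for illustration.
bacterial = {"geneA": [250.0, 90.0], "geneB": [40.0], "geneD": [100.0]}
archaeal = {"geneA": [120.0], "geneB": [310.0], "geneC": [], "geneD": [100.0]}
for gene, label in sorted(classify_by_best_hit(bacterial, archaeal).items()):
    print(gene, label)
```

Note that a gene labelled archaea-like here can still have a weaker bacterial hit, which is exactly the situation discussed in the next paragraphs.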

They found that human genes of archaeal ancestry, although fewer in number, tend to have higher and broader expression levels, are more likely to be essential, are involved in core information processes, are under greater selection, and tend to be central in the PIN, as compared with bacteria-like genes.

I don't think they mention whether the archaea-like genes they identified have (more distant) homologues in bacteria too... if they do, then we're likely looking at the characteristics of universal, usually essential, core information processing genes. Whether archaeal-like genes that have been lost in bacteria are just as central in eukaryotes as universal genes, it isn't clear.

It's also not clear just how many of the bacteria-like genes are endosymbiotic in origin. 7,884 human genes were found to be bacteria-like, but the human mitochondrion is predicted to contain only 1000-1500 proteins. Of the remainder, some are likely to be endosymbiotic in origin but have acquired non-mitochondrial functions, while an unknown proportion may actually be of archaeal ancestry but have been lost in archaea, and so have nothing to do with mitochondria. As these proteins are not universally essential, it follows that they would have a less central role in the cell... maybe the two gene populations that are considered in this paper are more like essential for life versus non-essential for life.

Anyway, it's a very interesting paper, particularly the finding that archaeal-like genes are less likely to be involved in inherited diseases. It's also surprising just how many genes did not have an identifiable homologue in either bacteria or archaea (58%).

The second paper, published in MBE, addresses the evolutionary history of mitochondrial genes from a broad distribution of eukaryotes:

Rooting the eukaryotic tree with mitochondrial and bacterial proteins
Romain Derelle and B. Franz Lang

The idea here is that the endosymbiosis event happened more recently than the divergence of eukaryotes from archaea, and this can be exploited for rooting the eukaryotic tree of life with a less divergent outgroup. Usually eukaryotic phylogenies are made using archaea-like information processing genes, rooted with archaea. However, there is a problem of long branch attraction to the very distant outgroup. This is the phenomenon in molecular phylogenetics where fast evolving, and therefore long branched sequences that should be nested within the tree are pulled down to the base of the tree because of spurious similarities to the outgroup. Using mitochondrial genes to make trees rooted with bacteria theoretically reduces the distance to the outgroup and, therefore, the problem of LBA.

The idea is very neat and I like it in principle. There are a couple of issues though that I think might not help the LBA problem, and in fact might exacerbate the problem.

1. We don't know just how much more recently the mitochondrion was acquired after the divergence of eukaryotes from archaea. Some people might argue that this event was involved in the separation of the two lineages.
2. Mitochondrial genes have a faster rate of evolution than their cytoplasmic counterparts. 

Still, it's interesting to see the results of rooting the eukaryotic tree in this way. The paper doesn't use best hits as in the above paper, but specifically targets known mitochondrial and mitochondrially targeted genes, such as cytochromes and two of the three universal mitochondrial translational GTPases, mIF2 and mEF-Tu. The third, mEF-G, was likely excluded because it does not group with alpha-proteobacteria. Although... come to think of it, I don't see mEF-Tu or mIF2 grouping clearly with alphas in my trees... maybe EF-G was excluded because of its duplication early in eukaryotic evolution... though, mEF-Tu has also been duplicated in its history, and actually mEF-G1 is quite a conservative marker... anyway, this paper isn't about trGTPases specifically so I shouldn't drift off topic.

So, the root. They find the root between monophyletic unikonts (opisthokonts and amoebozoa) and bikonts (other eukaryotes), supporting one of the most popular hypotheses. There seems to be good statistical support for this topology using the Bayesian inference method; however, maximum likelihood support is only achieved with much filtering of the dataset. It's an interesting new take on rooting the eukaryotic tree, but not one that will convince everyone.

As is so often the conclusion, we're just going to need more eukaryotic protist genomes!
Refs

Alvarez-Ponce D, & McInerney JO (2011). The human genome retains relics of its prokaryotic ancestry: human genes of archaebacterial and eubacterial origin exhibit remarkable differences. Genome biology and evolution, 3, 782-90 PMID: 21795752

Derelle R, & Lang BF (2011). Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Molecular biology and evolution PMID: 22135192

Coevolution, from hummingbirds to proteins

Coevolution (two or more biological objects evolving together) is a common feature of the evolutionary process on all levels from the molecular to the organismal. One of the most beautiful examples is that of hummingbirds and ornithophilous flowers. Hummingbirds feed on the nectar from the flowers, pollinating them in the process. In this mutually beneficial relationship, the plants have evolved flowers that attract the birds with colours that are conspicuous to the bird, and are shaped to perfectly accommodate the bird's beak. This coevolution has happened in a number of hummingbird/plant pairs.

Pic from Wikipedia article on hummingbirds

For more information on hummingbird/plant coevolution, I direct you to the publications of Ethan Temeles. As usual though, this post will be about proteins, and not whole organisms... and it will include my own crude drawings as usual...

Fig 1. Ta-da! Hummingbird/plant coevolution is a nice analogy for protein receptor/ligand coevolution. Circles show residues directly involved in the interaction.

At the molecular level, an example of coevolution is in the establishment of receptor-ligand interactions (Fig 1). The receptor protein binding site has evolved in concert with the binding site of the ligand. In Fig 1, variation of the yellow residues in the receptor is correlated with that of the green residues in the ligand. The yellow sites are close together in the structure, but not necessarily neighboring in the sequence. For example, the amino acid sequence backbone of these imaginary proteins might be arranged like this:

Fig 2. Black lines show the amino acid sequence of the protein, within its structural density.


Thus, if the structure of the binding interface is known, it's possible to predict candidate coevolving sites. However from the sequence alone, it's not so straightforward. 

As discussed in a recent paper of Gloor et al in MBE (and references within), there are two explanations for how covarying positions come to be (and these are actually the extremes of the distribution of possible mutational effects):

1. Suppressor mutations. These arise when a mutation with a deleterious phenotype is suppressed by another mutation at a different position.
2. Covarions. These are cases when both the original residue and the mutated residue are functionally compatible, but mutation alters the spectrum of amino acids possible at another location.

Covarying sites may occur in the same protein or in different proteins (Figs 3-4).

Fig 3. Stars show between-protein correlated mutations at two interaction sites

In between-protein coevolution, green sites coevolve with yellow sites in our example. But there is also within-protein coevolution among yellow site residues and among green site residues. Imagine for instance a change of a green residue that multiple yellow residues interact with at different times (Fig 4). Or perhaps the middle yellow starred residue in Fig 4 mutating and causing different constraints in what residues the neighboring yellow sites can mutate to. Either way, the three yellow sites will covary. Remember that those sites are far away from each other in the sequence. So by showing that these sites covary, we can predict that they are functionally related, even if we don't have a structure.

Fig 4. Correlated mutations can also occur within one protein

Prediction of co-evolving sites can be useful for understanding cases when binding site residues are unconserved in a multiple sequence alignment. It can also be useful for predicting intermolecular interaction sites, and allosteric sites (for example Chen et al., 2006). An allosteric site can remotely affect the evolutionary pressures on a distant site by affecting the structural orientation of the protein (Fig 5).


Fig 5. Correlated mutations among binding site residues and an allosteric site.


Prediction of covarying sites is challenging, not only because they may not always be clustered together in sequence and structure, but because covariation is a combined result of structural and functional constraints and background noise from shared phylogenetic ancestry and random processes.

There are two classes of methods for predicting covarying sites: tree-aware and tree-unaware. Tree-aware methods search for sites whose covariation cannot be explained by phylogenetic relationships, while tree-unaware methods ignore phylogenetic relationships, instead searching for covarying sites with the strongest signal. The two classes of methods are discussed in Caporaso et al (2008), in which it is concluded that tree-unaware methods perform as well as tree-aware ones.
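To make the tree-unaware idea concrete, here is a minimal Python sketch that scores every pair of alignment columns by plain mutual information and ranks them. This is only the simplest metric of this kind, chosen for illustration; it makes no correction for shared phylogeny or column entropy, so it should not be read as the method of any of the papers cited here. The toy alignment and the function names are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Return column i of an alignment given as a list of equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two columns, skipping rows with a gap."""
    pairs = [(a, b) for a, b in zip(col_a, col_b) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    count_ab = Counter(pairs)
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), with all probabilities as counts / n
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def rank_covarying_pairs(alignment):
    """Score all column pairs and return them sorted from highest MI to lowest."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(column(alignment, i), column(alignment, j))
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy alignment: columns 1 and 3 change together, columns 0 and 2 are invariant.
toy_alignment = ["AKLD", "AKLD", "AELK", "AELK"]
for (i, j), score in rank_covarying_pairs(toy_alignment)[:3]:
    print(f"columns {i} and {j}: MI = {score:.2f} bits")
```

In this toy case the perfectly covarying pair could equally be explained by two clades that each fixed a different residue pair, which is the phylogenetic background noise that tree-aware methods, and corrected tree-unaware metrics, try to remove.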

Using a tree-unaware method, Gloor et al. (2010) examine covariation in phosphoglycerate kinase evolution. They identify nonconserved sites that covary, and through mutagenesis show that the sites are important for function and epistatic to each other (mutation in one affects the function of the other). They find that covarying positions are just as diverse within and between clades as are noncovarying positions, and suggest that most covarying positions arise from processes more like the covarion model than the suppressor mutation model.

The importance of covariation in sequence evolution is of interest to people like myself who use patterns of sequence variation to predict protein function. In studying molecular evolution of function, we largely rely on the assumption that the most functionally important positions are those that are conserved over time. Although this is generally the case, it seems that some important sites that are able to covary may slip through the net.
 
Recently, I've been experimenting with the tree-unaware code of Dunn et al. (2008) to find covarying sites. Preliminary results, based on the RelA family, are... confusing. Residues that would be predicted to be interacting from the structure are not flagged up as covarying, while there are many pairs of predicted covarying sites that are physically distant and don't seem likely to be allosteric sites from the structure. It seems that, as with many real-life case studies, real biology is a little bit more complicated than naive sketches like mine would have you believe! Oh well, time to delve a little deeper into the data set...

References and further reading:

Caporaso, J., Smit, S., Easton, B., Hunter, L., Huttley, G., & Knight, R. (2008). Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics BMC Evolutionary Biology, 8 (1) DOI: 10.1186/1471-2148-8-327

Codoñer FM, & Fares MA (2008). Why should we care about molecular coevolution? Evolutionary bioinformatics online, 4, 29-38 PMID: 19204805

Chen, Y. (2006). Evolutionarily Conserved Allosteric Network in the Cys Loop Family of Ligand-gated Ion Channels Revealed by Statistical Covariance Analyses Journal of Biological Chemistry, 281 (26), 18184-18192 DOI: 10.1074/jbc.M600349200

Dunn, S., Wahl, L., & Gloor, G. (2007). Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction Bioinformatics, 24 (3), 333-340 DOI: 10.1093/bioinformatics/btm604

Gloor, G., Tyagi, G., Abrassart, D., Kingston, A., Fernandes, A., Dunn, S., & Brandl, C. (2010). Functionally Compensating Coevolving Positions Are Neither Homoplasic Nor Conserved in Clades Molecular Biology and Evolution, 27 (5), 1181-1191 DOI: 10.1093/molbev/msq004

PVC bacteria and the prokaryote to eukaryote transition... maybe not.

It was an interesting hypothesis, but it seems the evidence for an origin of eukaryotes in the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum, as proposed by Devos and Reynaud in a Science article, doesn't hold up to scrutiny.

In a recent paper by James McInerney et al. in Bioessays, the authors address each of the claimed eukaryote-like features and show that they are all likely to be either analogous (the result of parallel evolution, not shared ancestry), or are the result of horizontal gene transfer (HGT) events. In the words of the authors:

PVC are no more intermediates in the prokaryote-to-eukaryote transition than dragonflies are intermediates in the evolutionary sequence linking bony fish and birds.

The Bioessays paper is an important reminder that for any grand hypotheses about evolution, distinguishing between homologous and analogous characters is critical, as is establishing the direction of inheritance. And by far the best way to address these points is by taking advantage of the mass of genomic data available.

Molecular biology in the light of comparative genomics.

The week before last, I was at a conference at EMBL Heidelberg on Protein Synthesis and Translational control. I found it to be a stimulating, very enjoyable meeting, despite being rather disappointed by it being dominated by eukaryotic mechanisms, with only a few talks devoted to bacterial translation (and from what I remember none on organellar or archaeal translation). Translation is a universal, ancient process, and analyses of its highly conserved components have taught us much of what we know about the evolution and diversity of life on Earth. Despite that, out of ~300 (I would guess) participants, I was the only person presenting evolution stories. I wasn't selected for a talk, but I presented two posters, one on the RelA/SpoT family of ribosome-associated starvation response enzymes and one on gain and loss of mitochondrial translational initiation factors. I'm also pretty sure I was the only blogger and tweeter there too... hmm is that connected? Anyway, that's not the point here. My point is that a little evolution goes a long way in terms of predicting functions and universality of mechanisms, and it seems to me that molecular biologists aren't taking full advantage of that. There were a few talks mentioning new factors. My immediate thoughts were what are they related to? What do sequence comparisons suggest about them? Are they actually in any other organisms than yeast or humans? Model organisms are unfortunately not always representative, and we need evolutionary analyses to put results in perspective.

It's not just at this protein synthesis conference that I was the one and only molecular evolutionary biologist. Except for studies on ribosome origin, which are often low on evidence, high on speculation, and conducted by non-experts in molecular evolution (see my previous blog post), it's been the same at the last few conferences I've been to. The protein synthesis field is a very active, exciting field with great people who are not afraid of bioinformatics and collaborating with bioinformaticians. Just, usually not molecular evolutionary biologists. That doesn't mean that people can't be enlightened though. After the first day, a co-attendee chatted with me about my research. He'd come across my paper on EF-G duplication and functional evolution and was excited by it. He said "I thought evolution was really boring, but actually it's interesting and very useful." He went on to propose an interesting collaborative project.

It's not that there are no published comparative genomics studies done at all in the protein synthesis field, just there are very few and though some are good, more often they are done badly (for example claiming orthology seemingly without making phylogenetic trees and getting it wrong), and with the focus on distribution without site-specific or structural analyses to make functional inferences. Specific expertise is required to do it right. The kind of expertise found in evolutionary biologists. However, the latter tend to stick to the evolution field, and it seems to me that important questions don't always get answered that way. I'm not sure, but I would hazard a guess that it's the same in other molecular mechanism fields (eg transcription, cell cycle/replication). There are lots of gaps in our knowledge about core cellular processes that need filling with the help of people competent in comparative genomics.

The scarcity of other molecular evolutionary biologists within the protein synthesis field could be considered to be an advantage for me. I have my own niche after all, and I have no shortage of work because not all experimentalists overlook the importance of evolution. However, there are many, many more interesting proteins and questions than I have time to look at. And I don't have a lab of minions as yet.

Here's an example of how wrong conclusions can be propagated in the absence of molecular evolution analyses. It is discussed in a paper that I'm writing up at the moment. I'll blog the full story with the real proteins named in the fullness of time, but this is just for the idea.




So, in an original paper, it was claimed that an insertion in human mitochondrial protein 1 (figure A, above) compensates for the function of protein 2, present and essential in E. coli, but lost in humans. The insertion is not at all homologous to protein 2. Nevertheless, if E. coli protein 1 is modified to include the insertion usually only found in humans, protein 2, which is usually essential, becomes dispensable. So the authors of the original study claimed that the insertion evolved in eukaryotes to replace the function of protein 2. This is all very interesting, but I immediately had a question. Since protein 2 is universally absent in eukaryotes (this is already known), is the insertion universally present in eukaryotes? I think this is a very important question, which was not addressed in the original paper. Nevertheless, there has been a whole slew of other papers propagating the conclusion that the insertion evolved to take the place of protein 2.

I was not content with an evolutionary conclusion drawn on the comparison of three organisms (the third was archaea, also missing the insertion), so I decided to answer this question myself. The result was oops. The insertion is actually only in vertebrates (see figure B, above). So for millions of years, eukaryotes had been (and many still are) doing fine without either the insertion or protein 2. Maybe the insertion does replace the function of protein 2 in vertebrates, but there must be some other, unknown, possibly more general mechanism(s) compensating. And this is an interesting avenue worth pursuing.
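The kind of check described here, asking which taxa actually carry an insertion, can be sketched in a few lines of Python once the homologues are aligned. This is only an illustration under stated assumptions: the taxon names, sequences, column coordinates and the `min_residues` cutoff are all invented for the example and are not the real proteins or the analysis from the paper in preparation.

```python
def insertion_presence(aligned, start, end, min_residues=5):
    """Report, per taxon, whether an aligned sequence has residues in the insertion columns.

    aligned: dict mapping taxon name -> aligned sequence (gaps as '-').
    start, end: half-open column range covering the insertion in the alignment.
    min_residues: hypothetical cutoff for calling the insertion present.
    """
    presence = {}
    for taxon, seq in aligned.items():
        residues = sum(1 for ch in seq[start:end] if ch != "-")
        presence[taxon] = residues >= min_residues
    return presence


# Made-up aligned fragments; columns 3-9 stand in for the insertion.
toy_alignment = {
    "human": "MKT" + "GSPLQRA" + "VLD",
    "zebrafish": "MKT" + "GSPIQKA" + "VLD",
    "yeast": "MRT" + "-------" + "VID",
    "E. coli": "MKS" + "-------" + "VLE",
}
for taxon, present in insertion_presence(toy_alignment, 3, 10).items():
    print(f"{taxon}: insertion {'present' if present else 'absent'}")
```

Mapping a presence/absence table like this onto a species tree is what shows whether a feature is universal in a group or restricted to one lineage.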

That's one angle to the story, and it's a bit negative, true, but there's also a very positive angle to this particular paper that I'm writing. There is a protein 3, also universal in bacteria, and almost universal in mitochondria. It had never previously been found in yeast; however, with some sensitive sequence searching (PSI-blast) I found a homologue, which I confirmed as the orthologue with phylogenetic analysis. This yeast protein 3 is very divergent though, with insertions and deletions relative to the well known ones, so whether it is the functional equivalent of protein 3 was still unknown. But that's when it becomes wonderful to be an evolutionary biologist among experimentalists, because our collaborators have been able to confirm the function of my newly identified protein in vivo. So we have identified a new protein in yeast, essential for mitochondrial function, and a really nice evolutionary story to go along with it.

In conclusion, it's said so many times that it's becoming a nasty cliche, but Dobzhansky's quote "nothing in biology makes sense except in the light of evolution" is spot on. I would encourage experimentalists to consider whether molecular evolution can help answer some of your questions, or even raise some exciting new ones. I would also urge molecular evolutionary biologists to consider collaborating with experimentalists whenever you can, because it's so exciting and satisfying to get your predictions tested.

OK, now to finish this paper about proteins 1, 2 and 3, which I will blog about when the paper is published and available!



Molecular evolution of RSH proteins, lookouts and messengers of stress signals

A few days ago (the day before my 30th birthday actually), my most recent paper, along with Vasili Hauryliuk and Tanel Tenson, "The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life" was published in PLoS ONE. Hurrah!

The RSH proteins comprise a superfamily of enzymes that synthesize and/or hydrolyze the alarmone ppGpp. ppGpp is a nucleotide that acts as an alarm signal, activating the “stringent” response in bacteria during starvation conditions and regulating various other aspects of cellular metabolism, often in response to stress. Vasya's blog has a wealth of information about the stringent response and the molecules involved.

Rel, RelA and SpoT are the classical, most well known “long” RSHs. They carry the ppGpp hydrolase, synthetase, TGS and ACT domain architecture. They have been found across diverse bacteria and plant chloroplasts. Additionally, dedicated single domain ppGpp-synthesizing and -hydrolyzing RSHs have also been discovered in disparate bacteria and animals respectively. However, until now there has been considerable confusion in terms of nomenclature, and no comprehensive phylogenetic and sequence analyses have previously been carried out to classify RSHs on a genomic scale.

To remedy the situation, I carried out high-throughput sensitive sequence searching of over 1000 genomes from across the tree of life, in conjunction with phylogenetic analyses, to identify and classify diverse RSHs in different organisms and unify the terminology for the field. We classify RSHs into 30 subgroups comprising three groups: long RSHs, small alarmone synthetases (SASs), and small alarmone hydrolases (SAHs). That's 19 more subgroups than were previously known. Those previously unidentified RSH subgroups, which are mostly found in bacteria, but sometimes in archaea and eukaryotes, can now be studied experimentally.


What I think is possibly the most interesting result came from comparative sequence analysis of long and small RSHs. I found exposed sites limited in conservation to the long RSHs that seem to be involved in transmitting regulatory signals. These signals may be transmitted via inter-domain interactions, or inter-molecular interactions either among individual RSH molecules or among long RSHs and other binding partners such as the ribosome. These sites in RelA can now be directly targeted with mutagenesis in order to test these predictions.


I have to say I'm disappointed with how the figures look in the PDF version of the paper. Lines are really not as crisp as in my uploaded figures. Unfortunately the tables also don't look how they're supposed to, due to them having been automatically formatted for the PLoS format. I wasn't given the opportunity to check them in a proofing stage either. Oh well, I'm just happy this story is now out there!

Gemma C. Atkinson, Tanel Tenson, & Vasili Hauryliuk (2011). The RelA/SpoT Homolog (RSH) Superfamily: Distribution and Functional Evolution of ppGpp Synthetases and Hydrolases across the Tree of Life PLoS ONE, 6 (8)