Roles of Solvent Accessibility and Gene Expression in Modeling Protein Sequence Evolution
by: Kuangyu Wang, Shuhui Yu, Xiang Ji, Clemens Lakner, Alexander Griffing, Jeffrey L. Thorne
Format: Article
Published: SAGE Publishing, 2015-01-01
Description
Models of protein evolution tend to ignore functional constraints, although structural constraints are sometimes incorporated. Here we propose a probabilistic framework for codon substitution that evaluates the joint effects of relative solvent accessibility (RSA), a structural constraint, and gene expression, a functional constraint. First, we explore the relationship between RSA and codon usage at the genomic scale as well as at the individual gene scale. Motivated by these results, we construct our framework by determining how probable an amino acid is, given RSA and gene expression, and then evaluating the relative probability of observing a codon compared to other synonymous codons. We come to the biologically plausible conclusion that both RSA and gene expression are related to amino acid frequencies, but that, among synonymous codons, the relative probability of a particular codon is more closely related to gene expression than to RSA. To illustrate the potential applications of our framework, we propose a new codon substitution model. Using this model, we obtain estimates of 2Ns, where N is the effective population size and s is the relative fitness difference of an allele. For a training data set consisting of human proteins with known structures and expression data, 2Ns is estimated separately for synonymous and nonsynonymous substitutions in each protein. We then contrast the patterns of synonymous and nonsynonymous 2Ns estimates across proteins while also taking gene expression levels of the proteins into account. We conclude that our 2Ns estimates are too concentrated around 0, and we discuss potential explanations for this lack of variability.
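The two-step construction described in the abstract (first the probability of an amino acid, then the relative probability of a codon among its synonyms) can be read as a factorization of the codon probability. The sketch below is only one way to write that reading down. The symbols c (codon), a(c) (the amino acid encoded by c), r (RSA), e (gene expression), and S(a) (the set of codons synonymous for amino acid a) are notation assumed here for illustration and are not taken from the article.

```latex
% Assumed notation, not the authors': c = codon, a(c) = amino acid encoded by c,
% r = relative solvent accessibility, e = gene expression level,
% S(a) = set of synonymous codons encoding amino acid a.
\[
  P(c \mid r, e)
  \;=\;
  \underbrace{P\bigl(a(c) \mid r, e\bigr)}_{\text{amino acid given RSA and expression}}
  \;\times\;
  \underbrace{P\bigl(c \mid a(c), r, e\bigr)}_{\text{codon among its synonyms}},
  \qquad
  \sum_{c' \in S(a)} P\bigl(c' \mid a, r, e\bigr) = 1 .
\]
```

Under this reading, the abstract's conclusion is that the first factor depends on both r and e, while the second factor is more closely related to e than to r.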