Where Proteins Go To Work
Predicting protein localization
Joe works in a factory; Jane works in a hospital; protein X works in the Golgi apparatus. Just as one might guess a worker’s job by knowing where he or she is employed, biologists can guess a protein’s function by knowing where in the cell it does its job: in or near the cell membrane, the endoplasmic reticulum, or the Golgi apparatus, to name a few of the important job sites inside a cell.
Determining thousands of proteins’ correct cellular addresses is a daunting task. But a new computational model of yeast takes a pretty good stab at predicting which of 18 possible destinations inside this single-celled organism each protein will wind up in. The model is described in the November 2005 issue of PLoS Computational Biology.
“The trafficking and localization of proteins are very fundamental questions in biology,” says Michael Hallett, PhD, a professor at the McGill Centre for Bioinformatics at McGill University in Montreal. But the places where 30 to 50 percent of all cellular proteins settle down to do their tasks are unknown. To get a better handle on this question, Hallett and his colleagues and graduate students at McGill created the Protein Subcellular Localization Tool 2, or PSLT2.
PSLT2 is composed of three modules that predict where a protein will go. The motif module makes predictions based on the presence of particular sequences of amino acids that suggest a protein’s function, which is a good indication of where it belongs in the cell. The targeting module relies on sequences that act like a known zip code, indicating where the protein should end up; examples include mitochondrial targeting peptides and transmembrane domains. And the interaction module concerns itself with the protein’s likely comrades: the other proteins it associates with when doing its task. If protein A always interacts with protein B, and B has a known location in the cell, then A is probably active in that vicinity as well.
Each module can individually predict the localization of a protein using Bayesian methods. The combination of the three modules improves the prediction when proteins lack motif and interaction data or traffic through multiple compartments.
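To make the idea of combining modules concrete, here is a minimal sketch in Python. It is not the PSLT2 implementation, whose details are not given here; it assumes a simple naive-Bayes-style rule in which each module supplies a likelihood for every compartment and the modules are treated as independent. The function name, the three compartments, and all the numbers are hypothetical, chosen only for illustration.

```python
def combine_modules(prior, module_likelihoods):
    """Combine per-module evidence with a naive Bayes rule (an assumption,
    not the published PSLT2 model).

    prior: dict mapping compartment -> prior probability.
    module_likelihoods: list of dicts mapping compartment -> likelihood of
        the module's evidence given that compartment. A module with no
        data for this protein is passed as None and is skipped.
    Returns a dict mapping compartment -> posterior probability.
    """
    unnormalized = dict(prior)
    for likelihood in module_likelihoods:
        if likelihood is None:
            continue  # this module has nothing to say about the protein
        for compartment in unnormalized:
            unnormalized[compartment] *= likelihood[compartment]

    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all compartments have zero probability")
    return {c: score / total for c, score in unnormalized.items()}


# Hypothetical protein: no motif data, but targeting and interaction
# evidence both point toward the mitochondrion.
prior = {"nucleus": 1 / 3, "mitochondrion": 1 / 3, "ER": 1 / 3}
motif = None
targeting = {"nucleus": 0.1, "mitochondrion": 0.8, "ER": 0.1}
interaction = {"nucleus": 0.2, "mitochondrion": 0.6, "ER": 0.2}

posterior = combine_modules(prior, [motif, targeting, interaction])
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy case the missing motif data is simply skipped, and the two remaining modules reinforce each other, so the mitochondrion ends up with a posterior of about 0.92. That is the general intuition behind combining modules: one can still make a call when another has no evidence to offer.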
For the entire yeast genome, the new tool predicts in which of nine compartments a protein is located with at least 72 percent accuracy. These compartments are mostly organelles but also include the cytosol and cell membrane. PSLT2 also predicts proteins’ sub-compartmentalization: whether they are inside the compartment, in its membrane, or associated with its surface. The model places the proteins into 18 sub-compartments correctly 83 percent of the time. The ability to determine sub-compartments is new to this model. “When we use classical techniques for finding the localization of a protein [in, for example, the endoplasmic reticulum (ER)], we can’t use them to tell if a protein is in the ER membrane, in the cytosol, or on the periphery,” Hallett says. “We need a computational method to pin down where the protein is in the organelle.”
The computational model’s predictions compared well with databases from two high-throughput laboratory experiments, but they didn’t always agree; Hallett and colleagues suggest that the model and two databases should be used in parallel as checks on each other.
According to Mark Gerstein, PhD, an associate professor of biomedical informatics at Yale, the paper goes beyond what has been done before. “In particular,” he says, “people have done a lot of analysis using protein subcellular localization to predict protein-protein interactions. This work turns that around to good effect.”