Boosting protein identification in shotgun proteomics experiments

Supplementary Data

Smriti R. Ramakrishnan*[1], Christine Vogel*[2], John T. Prince[2], Rong Wang [2], Zhihua Li[2], Luiz Penalva[3], Margaret Myers[1], Edward M. Marcotte[2], Daniel P. Miranker[1]
* - equally contributing authors
[1] Department of Computer Sciences, 1 University Station C0500, The University of Texas at Austin, Austin, TX 78712
[2] Center for Systems and Synthetic Biology, Department of Chemistry and Biochemistry & Institute for Cellular and Molecular Biology, 2500 Speedway, The University of Texas at Austin, Austin, TX 78712
[3] Children's Cancer Research Institute; The University of Texas Health Science Center at San Antonio; San Antonio, TX 78229
Contact: DPM (miranker at cs utexas edu) or EMM (marcotte at icmb utexas edu)

Abstract. Motivation: Tandem mass spectrometry (MS/MS) offers fast and reliable characterization of complex protein mixtures, but suffers from low sensitivity in protein identification. In a typical shotgun-proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other information available, e.g. the probability of a protein's presence is likely to correlate with its mRNA concentration. Results: We develop a Bayesian score that estimates the posterior probability of a protein's presence in the sample given its identification in an MS/MS experiment and its mRNA concentration measured under similar experimental conditions. Our method, MSpresso, substantially increases the number of proteins identified in an MS/MS experiment at the same error rate, e.g. in yeast, MSpresso increases the number of proteins identified by ~40%. We apply MSpresso to data from different MS/MS instruments, experimental conditions, and organisms (E. coli, human), and predict 19 to 63% more proteins across the different datasets. MSpresso demonstrates that incorporating prior knowledge of protein presence into shotgun-proteomics experiments can substantially improve protein identification scores.
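
The idea of replacing a flat prior with an mRNA-informed one can be illustrated with a small Python sketch. This is not the MSpresso implementation and may differ from the formula used in the paper; it is a generic Bayesian re-weighting under stated assumptions. The function name posterior_presence and its parameters are hypothetical: p_ms stands for an MS/MS identification probability computed under a flat prior (prior_flat), and prior_mrna stands for an estimate of P(K|M), the probability that a protein is present given its mRNA concentration.

```python
def posterior_presence(p_ms, prior_mrna, prior_flat=0.5):
    """Re-weight an MS/MS identification probability with an mRNA-based prior.

    Illustrative sketch only (hypothetical names, not the MSpresso code).
    p_ms       -- P(present | MS/MS evidence), assumed computed under prior_flat
    prior_mrna -- P(present | mRNA level), the informed prior P(K|M)
    prior_flat -- the uninformative prior assumed by the MS/MS pipeline
    """
    for name, p in (("p_ms", p_ms), ("prior_mrna", prior_mrna),
                    ("prior_flat", prior_flat)):
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    def odds(p):
        return p / (1.0 - p)

    # Divide out the flat prior odds, multiply in the informed prior odds.
    post_odds = odds(p_ms) * odds(prior_mrna) / odds(prior_flat)
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # A weakly identified protein with abundant mRNA gains confidence.
    print(round(posterior_presence(0.6, 0.9), 3))
```

When prior_mrna equals prior_flat, the function returns p_ms unchanged, which corresponds to the usual assumption that all proteins are equally likely to be present.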

This website provides supplementary postprocessed data used in MSpresso analyses.

Supplementary Notes (preliminary file) (PDF)

Software Page


Main paper, table 1: different experiments using self models for P(K|M), and cytosolic proteins only.

Description of directory content and file formats.

Gold standard of protein expression in yeast. We used the intersection of publicly available MS-based and non-MS-based experimental datasets as a reference set for the presence (expression) of proteins in wild-type yeast growing in rich medium in log phase. The file names list the first (and last) author, journal, and publication year.
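
As a rough illustration of how such a reference set could be assembled, the Python sketch below intersects several protein lists. It assumes that a protein must appear in every dataset to count as present; the exact rule used for the gold standard may differ. The function name reference_set and the identifiers in the example are hypothetical placeholders, not entries from the actual files.

```python
def reference_set(datasets):
    """Return the proteins found in every dataset (illustrative sketch).

    datasets -- iterable of iterables of protein identifiers, e.g. one
                list per published MS-based or non-MS-based experiment.
    """
    sets = [set(d) for d in datasets]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    ms_based = ["P1", "P2", "P3"]
    non_ms_based = ["P2", "P3", "P4"]
    print(sorted(reference_set([ms_based, non_ms_based])))
```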

Raw MS data is available at the MS Data Repository.

The yeast, YMD, LCQ data is provided at OPD.

Please contact Smriti (smriti at cs utexas edu) for questions about software. Please contact Smriti (smriti at cs utexas edu) or Christine (cvogel at mail utexas edu) for further information on data, calculations, or results.


Integrating Shotgun Proteomics and mRNA expression data to Improve Protein Identification, Smriti R. Ramakrishnan*, Christine Vogel*, John T. Prince, Rong Wang, Zhihua Li, Luiz O. Penalva, Margaret Myers, Edward M. Marcotte, and Daniel P. Miranker,


C. Vogel, cvogel at mail utexas edu
May 2009