MSblender is a statistical tool that merges peptide identifications from multiple database search engines using a multivariate modelling approach. We will present this work at RECOMB-CP 2011 in March 2011.
(We have tested our code on Mac OS X (10.5 Leopard) and Ubuntu Linux (10.04 and later). MS Windows is not supported yet.) To run MSblender, install the following programs and packages:
- python (2.5 or later)
- gcc (we used version 4.4.3, but our ANSI C code should not depend on a specific gcc version).
- GNU Scientific Library (version 1.13 or later)
- On Ubuntu (or Debian) Linux, install the 'gsl-bin' and 'libgsl0-*' packages.
- (Optional) matplotlib (Python plotting library), required only for the 'pre/plot-his_list.py' script.
- Download the source code from GitHub, or alternatively from http://www.marcottelab.org/users/MSblender/src/MSblender-current.tgz
- Change to the 'c/' directory and run the './compile' script (GNU Scientific Library must already be installed). It generates the 'msblender' and 'msblender.h.gch' files in the same directory.
- That's it. Now you are ready to run MSblender.
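Before compiling, it can help to confirm that gcc and a recent enough GSL are visible. The following is a minimal Python 3 sketch, not part of MSblender; it assumes that the GSL installation provides the standard 'gsl-config' helper on PATH, and the function names are hypothetical. Versions are compared as integer tuples so that "1.9" correctly sorts below "1.13".

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Turn a version string such as '1.13' or '2.7.1' into a tuple of ints."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_prerequisites():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if shutil.which("gcc") is None:
        problems.append("gcc not found on PATH")
    gsl_config = shutil.which("gsl-config")  # assumed to ship with the GSL install
    if gsl_config is None:
        problems.append("gsl-config not found; is GNU Scientific Library installed?")
    else:
        out = subprocess.check_output([gsl_config, "--version"]).decode()
        if parse_version(out) < (1, 13):
            problems.append("GSL 1.13 or later is required, found " + out.strip())
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("MISSING:", problem)
```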
How to use
MSblender works in three steps: pre-processing, modelling, and post-processing.
First, MSblender converts the results of various search engines into a unified tab-delimited text format called 'hit_list'. Then it converts the 'hit_list' files into an input file for the MSblender modelling program.
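The modelling step needs the scores from every engine side by side. The README does not document the modelling input format, so the following is only a hypothetical Python sketch of the alignment idea: per-engine scores are keyed by (spectrum_id, peptide), which is an assumption, and an engine that did not report a given peptide-spectrum match gets None in that row.

```python
def merge_hit_lists(engine_hits):
    """Align per-engine scores into one row per (spectrum_id, peptide).

    engine_hits maps an engine name to a dict {(spectrum_id, peptide): score}.
    Returns (engine_names, rows); each row is (key, [score or None per engine]).
    """
    engines = sorted(engine_hits)
    keys = set()
    for hits in engine_hits.values():
        keys.update(hits)  # union of all PSMs reported by any engine
    rows = []
    for key in sorted(keys):
        scores = [engine_hits[engine].get(key) for engine in engines]  # None if missing
        rows.append((key, scores))
    return engines, rows
```

For example, merging {"sequest": {("s1", "PEPTIDEK"): 2.5}, "tandem": {("s1", "PEPTIDEK"): 7.1, ("s2", "AAK"): 3.0}} yields one row with both scores and one row where the SEQUEST score is None.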
Currently, MSblender supports the following search engines (and scores):
- SEQUEST, Xcorr (if you have an SRF file from Thermo BioWorks, you can convert it directly to pepXML with Mspire, developed by John T. Prince).
- X!Tandem, -log(E-value) based on k-score (a.k.a. COMET)
- OMSSA, -log(E-value)
- InsPecT, MQscore
- MyriMatch, mvh
- MSGFDB, -log(SpecProb)
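All scores in this list are oriented so that larger values mean more confident identifications: Xcorr, MQscore and mvh are used directly, while E-values and SpecProb (smaller is better) become -log values. The following Python sketch shows that mapping under two assumptions not stated in the README: the logarithm is base 10, and a hypothetical MIN_PROBABILITY floor guards against an E-value of exactly 0. The engine keys are illustrative names, not MSblender identifiers.

```python
import math

# Hypothetical floor so that a reported value of 0 does not raise a math domain error.
MIN_PROBABILITY = 1e-300


def neg_log10(value):
    """Map an E-value or SpecProb (smaller is better) to -log10(value) (larger is better)."""
    return -math.log10(max(value, MIN_PROBABILITY))


# Sketch only: which raw field feeds each hit_list score is an assumption.
SCORE_TRANSFORMS = {
    "sequest": lambda xcorr: xcorr,       # Xcorr used as-is
    "tandem": neg_log10,                  # E-value -> -log(E-value)
    "omssa": neg_log10,                   # E-value -> -log(E-value)
    "inspect": lambda mqscore: mqscore,   # MQscore used as-is
    "myrimatch": lambda mvh: mvh,         # mvh used as-is
    "msgfdb": neg_log10,                  # SpecProb -> -log(SpecProb)
}
```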
For example, you can convert a SEQUEST pepXML file into an Xcorr hit_list as follows:
    $ ../src/MSblender-20110130/pre/sequest_pepxml-to-xcorr_hit_list.py test.sequest.pepxml
    Write test.sequest.xcorr_hit_list ...
    $
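The conversion itself is done by MSblender's own 'sequest_pepxml-to-xcorr_hit_list.py'. The following is not that script but a Python 3 sketch of the same idea, reading the standard pepXML elements spectrum_query, search_hit and search_score. Assumptions: only the rank-1 hit of each spectrum is kept, and PrecursorMz is recomputed from precursor_neutral_mass as (M + z * 1.007276) / z; MSblender may derive these fields differently. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

PROTON_MASS = 1.007276  # mass of a proton in Da


def local_name(tag):
    """Strip an XML namespace such as '{http://...}search_hit' down to 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def pepxml_to_hit_list(pepxml_text, score_name="xcorr"):
    """Yield one hit_list row (a tuple) for the top-ranked hit of every spectrum_query."""
    root = ET.fromstring(pepxml_text)
    for query in root.iter():
        if local_name(query.tag) != "spectrum_query":
            continue
        charge = int(query.get("assumed_charge"))
        neutral_mass = float(query.get("precursor_neutral_mass"))
        precursor_mz = (neutral_mass + charge * PROTON_MASS) / charge
        for hit in query.iter():
            if local_name(hit.tag) != "search_hit" or hit.get("hit_rank") != "1":
                continue
            # search_score elements are direct children of search_hit
            scores = {s.get("name"): float(s.get("value"))
                      for s in hit if local_name(s.tag) == "search_score"}
            yield (query.get("spectrum"), charge, round(precursor_mz, 6),
                   float(hit.get("massdiff")), hit.get("peptide"), hit.get("protein"),
                   int(hit.get("num_missed_cleavages", 0)), scores[score_name])
```

Each yielded tuple can be written as one tab-delimited line with "\t".join(str(field) for field in row).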
The resulting hit_list file looks like this:
    # pepxml: test.sequest.pepxml
    #Spectrum_id	Charge	PrecursorMz	MassDiff	Peptide	Protein	MissedCleavages	Score(Xcorr)
    MSups_5ul.04192.04194.2	2	577.843127	0.006395	MLVVLLQANR	ANXA5_HUMAN_UPS|P08758|5000|0.5|319	0	0.524123
    MSups_5ul.07228.07228.4	4	689.596178	0.002584	SLLSNVEGDNAVPMQHNNRPTQPLK	CAH1_HUMAN_UPS|P00915|5000|50000|260	1	2.518871
    MSups_5ul.11647.11647.2	2	592.839464	-0.000197	ADGLAVIGVLMK	CAH1_HUMAN_UPS|P00915|5000|50000|260	0	2.787324
    MSups_5ul.05651.05651.3	3	549.303576	-0.003018	VWPHKDYPLIPVGK	CATA_HUMAN_UPS|P04040|5000|5000|526	1	2.593570
    ...
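Because the hit_list is plain tab-delimited text with '#' header lines, it is easy to load for inspection. The following is a minimal Python 3 reader sketch, not an MSblender script; the function names and the generic "Score" column name (the file header says e.g. Score(Xcorr)) are assumptions. Quoting is disabled so that protein fields containing '|' or quote characters are read verbatim.

```python
import csv
import io

# Column order as shown in the hit_list header; "Score" stands for Score(Xcorr) etc.
HIT_LIST_COLUMNS = ["Spectrum_id", "Charge", "PrecursorMz", "MassDiff",
                    "Peptide", "Protein", "MissedCleavages", "Score"]


def read_hit_list(handle):
    """Parse a tab-delimited hit_list, skipping blank lines and '#' header lines."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields or fields[0].startswith("#"):
            continue
        record = dict(zip(HIT_LIST_COLUMNS, fields))
        record["Charge"] = int(record["Charge"])
        record["PrecursorMz"] = float(record["PrecursorMz"])
        record["MassDiff"] = float(record["MassDiff"])
        record["MissedCleavages"] = int(record["MissedCleavages"])
        record["Score"] = float(record["Score"])
        rows.append(record)
    return rows


def read_hit_list_text(text):
    """Convenience wrapper for hit_list content already held in a string."""
    return read_hit_list(io.StringIO(text))
```

To read a file from disk, pass an open handle, e.g. read_hit_list(open("test.sequest.xcorr_hit_list")).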
- T. Kwon*, H. Choi*, C. Vogel, A.I. Nesvizhskii, and E.M. Marcotte, MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. Submitted.
- https://github.com/MarcotteLabGit/MSblender (GitHub source repository)