MSblender is a statistical tool that merges peptide identification results from multiple database search engines using a multivariate modelling approach. We will present this work at RECOMB-CP 2011 in March 2011.
We tested our code on Mac OS X (10.5 Leopard) and Ubuntu Linux (10.04 and later). We do not support MS Windows yet. To run MSblender, install the following programs and packages on your machine.
- python (2.5 or later)
- gcc (we used version 4.4.3, but we believe our ANSI C code does not depend on a specific version of gcc).
- GNU Scientific Library (version 1.13 or later)
- If you use Ubuntu (or Debian) Linux, install the 'gsl-bin' and 'libgsl0-*' packages.
- (Optional) matplotlib (Python plotting library). Only required for the 'pre/plot-his_list.py' script.
- Download the source code from GitHub. Alternatively, you can download it from http://www.marcottelab.org/users/MSblender/src/MSblender-current.tgz .
- Enter the 'c/' directory and run the './compile' script. The GNU Scientific Library must be installed before you run this script. It generates the 'msblender' and 'msblender.h.gch' files in the same directory.
- That's it. Now you are ready to run MSblender.
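Before running './compile', it can help to confirm that the build prerequisites are visible on your PATH. The following is a minimal Python 3 sketch and is not part of MSblender. It assumes that your GSL development files provide the 'gsl-config' helper, which the instructions above do not mention, so adjust the list for your system.

```python
import shutil

# Tools to look for on PATH. 'gcc' comes from the requirements above;
# 'gsl-config' is an assumption (it is shipped with GSL development files).
BUILD_TOOLS = ["gcc", "gsl-config"]


def find_missing(tools, which=shutil.which):
    """Return the names in `tools` that cannot be located on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(BUILD_TOOLS)
    if missing:
        print("Missing before running ./compile: " + ", ".join(missing))
    else:
        print("All listed build tools were found.")
```

The `which` parameter exists only so that the lookup can be replaced in tests. In normal use, call `find_missing(BUILD_TOOLS)`.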
How to use
MSblender works in three steps: pre-processing, modelling, and post-processing.
First, MSblender converts the results of the various search engines into a unified tab-delimited text format called 'hit_list'. It then converts the 'hit_list' files into the input file for the MSblender modelling program.
Currently, MSblender supports results (and scores) from the following search engines:
- SEQUEST, Xcorr (if you have an SRF file from Thermo BioWorks, you can convert it to pepxml directly with Mspire, developed by John T. Prince).
- X!Tandem, k-score (a.k.a. COMET search engine) based -log(E-value)
- OMSSA, -log(E-value)
- InsPecT, MQscore
- MyriMatch, mvh
- MSGFDB, -log(SpecProb)
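Several of the scores listed above are negative logarithms of an E-value or spectral probability, so that larger scores mean better matches. The sketch below shows the transformation in Python. It assumes a base-10 logarithm, as suggested by the 'Score(-log10[E-value])' column header of the hit_list format. The function names and the E-value 0.016 are illustrative and do not come from the MSblender code.

```python
import math


def neg_log10(e_value):
    """Convert an E-value (or probability) into a -log10 score."""
    if e_value <= 0:
        raise ValueError("E-value must be positive")
    return -math.log10(e_value)


def e_value_from_score(score):
    """Invert the transformation: recover the E-value from a -log10 score."""
    return 10.0 ** (-score)


if __name__ == "__main__":
    # Illustrative E-value only; smaller E-values give larger scores.
    print("%.6f" % neg_log10(0.016))
```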
For example, you can convert an X!Tandem pepxml file to a logE_hit_list file as follows:
$ ../src/MSblender-20110130/pre/tandem_pepxml-to-logE_hit_list.py test.tandem_k.pepxml
Write test.tandem_k.logE_hit_list ...
The hit_list file generated by this command looks like this:
# pepxml: test.tandem_k.pepxml
#Spectrum_id	Charge	PrecursorMz	MassDiff	Peptide	Protein	MissedCleavages	Score(-log10[E-value])
MSups_5ul.07228.07228.4	4	689.596425	0.004000	SLLSNVEGDNAVPMQHNNRPTQPLK	CAH1_HUMAN_UPS|P00915|5000|50000|260	0	1.795880
MSups_5ul.11647.11647.2	2	592.839650	0.000000	ADGLAVIGVLMK	CAH1_HUMAN_UPS|P00915|5000|50000|260	0	1.148742
MSups_5ul.06405.06405.2	2	524.279350	0.003000	DLFNAIATGK	CATA_HUMAN_UPS|P04040|5000|5000|526	0	0.327902
....
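If you want to inspect a hit_list file in your own scripts, the layout shown above can be read with a few lines of Python. This is a minimal sketch, not the parser MSblender uses. It assumes eight tab-separated columns in the order given by the header line and treats every line starting with '#' as a comment. The function name and dictionary keys are hypothetical.

```python
# Keys chosen for this sketch, following the column order of the header line.
COLUMNS = ["spectrum_id", "charge", "precursor_mz", "mass_diff",
           "peptide", "protein", "missed_cleavages", "score"]


def parse_hit_list(lines):
    """Parse hit_list lines into a list of dicts, skipping '#' comment lines."""
    hits = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(COLUMNS):
            raise ValueError("expected %d columns: %r" % (len(COLUMNS), line))
        hit = dict(zip(COLUMNS, values))
        hit["charge"] = int(hit["charge"])
        hit["missed_cleavages"] = int(hit["missed_cleavages"])
        for key in ("precursor_mz", "mass_diff", "score"):
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits


# Two rows copied from the example file above, joined with explicit tabs.
SAMPLE = "\n".join([
    "# pepxml: test.tandem_k.pepxml",
    "\t".join(["MSups_5ul.07228.07228.4", "4", "689.596425", "0.004000",
               "SLLSNVEGDNAVPMQHNNRPTQPLK",
               "CAH1_HUMAN_UPS|P00915|5000|50000|260", "0", "1.795880"]),
    "\t".join(["MSups_5ul.11647.11647.2", "2", "592.839650", "0.000000",
               "ADGLAVIGVLMK",
               "CAH1_HUMAN_UPS|P00915|5000|50000|260", "0", "1.148742"]),
])

hits = parse_hit_list(SAMPLE.splitlines())
```

To read a real file, pass an open file object instead, for example `parse_hit_list(open("test.tandem_k.logE_hit_list"))`. A truncated line such as the '....' in the listing above would raise a ValueError, which is intended to make malformed input visible.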
- T. Kwon*, H. Choi*, C. Vogel, A.I. Nesvizhskii, and E.M. Marcotte, MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. Submitted.
- https://github.com/MarcotteLabGit/MSblender (GitHub source repository)