Difference between revisions of "MSblender"

Revision as of 13:37, 30 January 2011

MSblender is a statistical tool for peptide identification that merges results from multiple database search engines using a multivariate modelling approach. We will present this work at RECOMB-CP 2011 in March, 2011.

Authors

Prerequisites

We tested our code on Mac OS X (10.5 Leopard) and Ubuntu Linux (10.04 and later). MS Windows is not supported yet. To run MSblender, install the following programs/packages on your machine.

  • python (2.5 or later)
  • gcc (we used version 4.4.3, but we believe that our ANSI C code does not depend on a specific version of gcc).
  • GNU Scientific Library (version 1.13 or later)
    • If you use Ubuntu (or Debian) Linux, install the 'gsl-bin' and 'libgsl0-*' packages.
  • (Optional) matplotlib (python graph library). Only required for 'pre/plot-his_list.py' script.

Installation

  • Download the source code from GitHub. Alternatively, download http://www.marcottelab.org/users/MSblender/src/MSblender-current.tgz directly.
  • Change to the 'c/' directory and run the './compile' script. The GNU Scientific Library must be installed before you run this script. It generates the 'msblender' and 'msblender.h.gch' files in the same directory.
  • That's it. Now you are ready to run MSblender.

How to use

MSblender works in three steps: pre-processing, modelling, and post-processing.

Pre-processing

First, MSblender converts the results of each search engine into a unified tab-delimited text format called 'hit_list'. Then it combines the 'hit_list' files into a single input file for the MSblender modelling program. You can find a 'test' dataset and its output at http://www.marcottelab.org/users/MSblender/test/.

Currently, MSblender supports the following search engine results (and scores).

For example, you can convert an X!Tandem pepxml file to a logE_hit_list file as below:

$ ../src/MSblender-20110130/pre/tandem_pepxml-to-logE_hit_list.py test.tandem_k.pepxml 
Write test.tandem_k.logE_hit_list ... 

The resulting hit_list file looks like this:

# pepxml: test.tandem_k.pepxml
#Spectrum_id	Charge	PrecursorMz	MassDiff	Peptide	Protein	MissedCleavages	Score(-log10[E-value])
MSups_5ul.07228.07228.4	4	689.596425	0.004000	SLLSNVEGDNAVPMQHNNRPTQPLK	CAH1_HUMAN_UPS|P00915|5000|50000|260	0	1.795880
MSups_5ul.11647.11647.2	2	592.839650	0.000000	ADGLAVIGVLMK	CAH1_HUMAN_UPS|P00915|5000|50000|260	0	1.148742
MSups_5ul.06405.06405.2	2	524.279350	0.003000	DLFNAIATGK	CATA_HUMAN_UPS|P04040|5000|5000|526	0	0.327902
....
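The hit_list layout shown above can be read with a few lines of Python. This is a minimal sketch, not part of MSblender: the function name `parse_hit_list` and the field names are illustrative, and it assumes that every data line has exactly eight tab-separated columns, as in the header line of the example.

```python
from collections import namedtuple

# Field names are illustrative; they mirror the header line of the example.
Hit = namedtuple(
    "Hit",
    "spectrum_id charge precursor_mz mass_diff peptide protein "
    "missed_cleavages score",
)

def parse_hit_list(lines):
    """Yield one Hit per data line, skipping '#' comment and header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 8:
            continue  # ignore truncated or malformed lines
        yield Hit(f[0], int(f[1]), float(f[2]), float(f[3]),
                  f[4], f[5], int(f[6]), float(f[7]))

example = [
    "# pepxml: test.tandem_k.pepxml",
    "MSups_5ul.06405.06405.2\t2\t524.279350\t0.003000\tDLFNAIATGK"
    "\tCATA_HUMAN_UPS|P04040|5000|5000|526\t0\t0.327902",
]
hits = list(parse_hit_list(example))
print(hits[0].peptide, hits[0].score)
```

To read a real file, pass an open file object instead of the example list, for instance `parse_hit_list(open("test.tandem_k.logE_hit_list"))`.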

Some search engines report multiple PSMs (peptide-spectrum matches) for a single spectrum, mainly because of different charge state estimates. For example, with its default settings MyriMatch reports the best hits for both the +2 and +3 charge states, so it produces almost twice as many PSMs as the other search engines. To remove this imbalance, you can keep only the best PSM for each spectrum, based on the score you chose. The 'select-best-PSM.py' script does this.

$ wc test.myrimatch.mvh_hit_list
  10888   87099 1168772 test.myrimatch.mvh_hit_list
$ ../src/MSblender-20110130/pre/select-best-PSM.py test.myrimatch.mvh_hit_list
$ wc test.myrimatch.mvh_hit_list_best 
  5516  44123 598964 test.myrimatch.mvh_hit_list_best
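The idea behind this step can be sketched as follows. This is not the code of 'select-best-PSM.py'. It assumes that the last dot-separated field of the spectrum ID is the charge state, so that two IDs differing only in that field belong to the same spectrum, and that a higher score is better. The function name and the example values are invented for illustration.

```python
def select_best_psm(hits, higher_is_better=True):
    """Keep one PSM per spectrum.

    hits: iterable of (spectrum_id, peptide, score) tuples.
    """
    best = {}
    for spectrum_id, peptide, score in hits:
        # Assumption: the trailing '.N' of the ID is the charge state.
        key = spectrum_id.rsplit(".", 1)[0]
        current = best.get(key)
        if current is None:
            best[key] = (spectrum_id, peptide, score)
            continue
        if higher_is_better:
            better = score > current[2]
        else:
            better = score < current[2]
        if better:
            best[key] = (spectrum_id, peptide, score)
    return list(best.values())

# Invented example: one spectrum searched at +2 and +3, another at +2 only.
psms = [
    ("run.00010.00010.2", "PEPTIDEK", 12.5),
    ("run.00010.00010.3", "PEPTIDER", 20.1),
    ("run.00011.00011.2", "ANOTHERK", 7.0),
]
best_psms = select_best_psm(psms)
print(len(psms), "->", len(best_psms))
```

Ties keep the first PSM seen; a real script may break ties differently.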

Next, compile the 'hit_list' files into a single MSblender input file. You need a text configuration file that lists each search engine and its hit_list file, as below:

InsPect         test.inspect.MQscore_hit_list_best
MyriMatch       test.myrimatch.mvh_hit_list_best
SEQUEST         test.sequest.xcorr_hit_list_best
X!Tandem        test.tandem_k.logE_hit_list_best

Then, run the 'make-msblender_in.py' script.

$ ../src/MSblender-20110130/pre/make-msblender_in.py msblender.conf > test.msblender_in

Output looks like this:

sp_pep_id decoy InsPect_score MyriMatch_score SEQUEST_score X!Tandem_score
MSups_5ul.00439.00439.3.ASLSNTPSIGQ 0 0.031000  NA  NA  NA
MSups_5ul.00439.00439.3.LDELRDEGK 0 NA  18.090108 0.914975  -0.832509
MSups_5ul.00444.00444.1.GQFVK 1 NA  2.598828  NA  NA
MSups_5ul.00446.00446.3.LDELRDEGK 0 NA  13.341218 0.930569  -0.579784
MSups_5ul.00461.00461.3.ADDKETCFAEEGKK  0 NA  16.846330 1.834260  -0.770852
...
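The table above has one row per spectrum/peptide pair and one score column per search engine, with 'NA' where an engine did not report that pair. A minimal sketch of such a merge is shown here. It is not the code of 'make-msblender_in.py': the function name is hypothetical, and the decoy column is left out because this page does not say how decoys are recognized.

```python
def merge_hit_lists(engine_hits):
    """Merge per-engine hits into rows keyed by 'spectrum_id.peptide'.

    engine_hits: dict mapping engine name to a list of
    (spectrum_id, peptide, score) tuples.
    """
    engines = list(engine_hits)
    table = {}
    for engine in engines:
        for spectrum_id, peptide, score in engine_hits[engine]:
            key = spectrum_id + "." + peptide
            table.setdefault(key, {})[engine] = score
    rows = [["sp_pep_id"] + [e + "_score" for e in engines]]
    for key in sorted(table):
        scores = table[key]
        # 'NA' marks engines that did not report this spectrum/peptide pair.
        rows.append([key] + [("%f" % scores[e]) if e in scores else "NA"
                             for e in engines])
    return rows

# Scores taken from the example output above.
rows = merge_hit_lists({
    "InsPect": [("MSups_5ul.00439.00439.3", "ASLSNTPSIGQ", 0.031)],
    "MyriMatch": [("MSups_5ul.00439.00439.3", "LDELRDEGK", 18.090108)],
    "SEQUEST": [("MSups_5ul.00439.00439.3", "LDELRDEGK", 0.914975)],
})
for row in rows:
    print("\t".join(row))
```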

Multivariate Modeling

Feed the 'msblender_in' file to the 'msblender' executable under the 'c/' directory, as below:

$ ~/git/MSblender/c/msblender test.msblender_in 100
1	4469.537280	0.2852
2	67673.492372	0.4619
3	83020.543621	0.5275
4	82494.877698	0.5496
5	82243.485441	0.5601
6	81891.150707	0.5654
7	81745.917044	0.5676
8	81717.914272	0.5684
9	81732.128261	0.5686
10	81756.373959	0.5686
$

The program terminates when the model converges. If the number of iterations reaches the limit you set (100 here), run the program again with a larger number.
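If you capture the progress lines, a short check can tell you whether the run stopped at the iteration limit. This sketch assumes that the first column of each progress line is the iteration counter, which the example output suggests but this page does not state. The function name is hypothetical.

```python
def hit_iteration_limit(progress_lines, max_iterations):
    """Return True if the last reported iteration reached the limit."""
    last = None
    for line in progress_lines:
        fields = line.split()
        # Progress lines have three columns; the first is assumed to be
        # the iteration number.
        if len(fields) == 3 and fields[0].isdigit():
            last = int(fields[0])
    return last is not None and last >= max_iterations

# Last lines of the example run above, started with a limit of 100.
progress = [
    "8\t81717.914272\t0.5684",
    "9\t81732.128261\t0.5686",
    "10\t81756.373959\t0.5686",
]
print(hit_iteration_limit(progress, 100))
```

A result of True suggests rerunning with a larger limit.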

The output file, named 'test.msblender_in_msblender', is written to the same directory. It looks like this:

Spectrum  Decoy InsPect_score MyriMatch_score SEQUEST_score X!Tandem_score  mvScore
MSups_5ul.00439.00439.3.ASLSNTPSIGQ F 0.03        0.006
MSups_5ul.00439.00439.3.LDELRDEGK F   18.09 0.91  -0.83 1.000
MSups_5ul.00444.00444.1.GQFVK D   2.60      0.085
MSups_5ul.00446.00446.3.LDELRDEGK F   13.34 0.93  -0.58 1.000
MSups_5ul.00461.00461.3.ADDKETCFAEEGKK  F   16.85 1.83  -0.77 1.000
MSups_5ul.00590.00590.2.AAFTECCQAADK  F 4.80  34.62 3.17  1.39  1.000
...
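One simple way to inspect this file is to count target and decoy identifications above an mvScore cutoff. The sketch below is not MSblender's post-processing step. It assumes that 'D' in the Decoy column marks a decoy and 'F' a target (inferred from the input and output examples on this page), that mvScore is the last column, and that a higher mvScore means a more confident identification. The function name is hypothetical.

```python
def count_above_threshold(lines, threshold):
    """Return (targets, decoys) with mvScore >= threshold."""
    targets = decoys = 0
    for line in lines:
        fields = line.split()
        # Skip the header and anything without an F/D flag in column 2.
        if len(fields) < 3 or fields[1] not in ("F", "D"):
            continue
        try:
            mv_score = float(fields[-1])  # mvScore is the last column
        except ValueError:
            continue
        if mv_score >= threshold:
            if fields[1] == "D":
                decoys += 1
            else:
                targets += 1
    return targets, decoys

# Lines taken from the example output above.
output_lines = [
    "Spectrum\tDecoy\tInsPect_score\tMyriMatch_score\tSEQUEST_score\tX!Tandem_score\tmvScore",
    "MSups_5ul.00439.00439.3.ASLSNTPSIGQ\tF\t0.03\t\t\t\t0.006",
    "MSups_5ul.00439.00439.3.LDELRDEGK\tF\t\t18.09\t0.91\t-0.83\t1.000",
    "MSups_5ul.00444.00444.1.GQFVK\tD\t\t2.60\t\t\t0.085",
]
print(count_above_threshold(output_lines, 0.05))
```

Splitting on whitespace keeps the sketch independent of how empty score fields are delimited, because only the first, second, and last columns are used.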

Citation

  • T. Kwon*, H. Choi*, C. Vogel, A.I. Nesvizhskii, and E.M. Marcotte, MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines. Submitted.

See also