MSblender TACC
From Marcotte Lab
Before you start
- To use this setup, your TACC account must be allocated to our lab project ('A-cm10'). If you don't have an account, create one at https://portal.tacc.utexas.edu/. Then ask Edward to add your account to the lab project.
- This document is for 'stampede'.
- Always work in your $SCRATCH directory, not in /corral or your $HOME.
Install MSblender (and comet, MSGFDB, X!Tandem)
$ cd ~
$ mkdir git
$ cd git
$ git clone https://github.com/marcottelab/MSblender.git
Prepare a working space
$ module load python
$ cd $SCRATCH
$ mkdir myProject
$ cd myProject
$ mkdir mzXML
$ mkdir DB
Prepare database
- Copy your FASTA file to the 'myProject/DB' directory.
$ python ~/git/MS-toolbox/bin/fasta-reverse.py XENLA_prot_v4.fasta
$ mv XENLA_prot_v4.fasta.target XENLA_prot_v4_combined.fasta
$ cat XENLA_prot_v4.fasta.reverse >> XENLA_prot_v4_combined.fasta
$ head -n 1 XENLA_prot_v4.fasta
>10a1.1|XB-GENE-6077477|AAH55957|33416620
$ head -n 1 XENLA_prot_v4.fasta.reverse
>rv_nadkd1|XB-GENE-991229|AAI46629|148921623
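To make the target/decoy idea behind the commands above concrete, here is a minimal Python sketch of building a combined database with reversed-sequence decoys. It is not the code of fasta-reverse.py: the function names are hypothetical, and the only detail taken from this page is the 'rv_' header prefix seen in the output above. How the real script orders or wraps records is not assumed.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def with_reversed_decoys(records, prefix="rv_"):
    """Return the target records followed by reversed-sequence decoys.

    The 'rv_' prefix mirrors the decoy header shown on this page; the
    rest of this function is an illustrative assumption.
    """
    records = list(records)
    decoys = [(prefix + header, seq[::-1]) for header, seq in records]
    return records + decoys


def format_fasta(records, width=60):
    """Render records as FASTA text, wrapping sequences at `width` residues."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Toy input, not real Xenopus data.
    example = [">protA|demo", "MKTAYIAK", "QR", ">protB|demo", "MSPEK"]
    combined = with_reversed_decoys(read_fasta(example))
    print(format_fasta(combined), end="")
```

Running the sketch on the toy input prints two target records followed by two 'rv_' records, which is the same layout the mv and cat commands produce in XENLA_prot_v4_combined.fasta.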
DB setup for X!Tandem
$ ~/src.MS/local/bin/fasta_pro.exe (my combined fasta file)
It creates an index file named after your FASTA file, with a '.pro' suffix appended.
$ ~/src.MS/local/bin/fasta_pro.exe XENLA_prot_v4_combined.fasta
fasta_pro file conversion utility, v. 2006.09.15
input path = XENLA_prot_v4_combined.fasta
output path = XENLA_prot_v4_combined.fasta.pro
db type = plain
DB setup for Crux
$ ~/src.MS/local/bin/crux create-index --enzyme trypsin --missed-cleavages 2 --peptide-list T --decoys none (my combined fasta file) (my index name)
- If you want to use Crux on its own (or one of its embedded post-processing tools, e.g. percolator or q-ranker), use a FASTA file containing target sequences only, together with a decoy option (the default is protein-shuffle, but peptide-shuffle would be better).
- 'peptide-list' is optional.
- The trypsin digestion pattern in Crux is '[KR]|{P}', so it does not cleave after K/R when the next amino acid is P. To ignore this proline constraint, use '--custom-enzyme "[KR]|[X]"' instead of '--enzyme trypsin'.
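The difference between the two cleavage patterns can be illustrated with a short Python sketch. This is an illustration of the rules as described above, not Crux's implementation: the function names are hypothetical, and peptide length or mass filters that a real index builder may apply are left out.

```python
import re


def cleavage_sites(seq, proline_rule=True):
    """Return positions where the sequence is cut, i.e. right after K or R.

    With proline_rule=True this follows '[KR]|{P}' (no cut before P);
    with proline_rule=False it follows '[KR]|[X]' (cut after every K/R).
    """
    pattern = r"[KR](?!P)" if proline_rule else r"[KR]"
    return [m.end() for m in re.finditer(pattern, seq)]


def digest(seq, missed_cleavages=0, proline_rule=True):
    """List peptides, allowing up to `missed_cleavages` skipped cut sites."""
    inner = [c for c in cleavage_sites(seq, proline_rule) if c < len(seq)]
    cuts = [0] + inner + [len(seq)]
    peptides = []
    for i in range(len(cuts) - 1):
        last = min(i + 2 + missed_cleavages, len(cuts))
        for j in range(i + 1, last):
            peptides.append(seq[cuts[i]:cuts[j]])
    return peptides


if __name__ == "__main__":
    toy = "AKPMRGK"  # toy sequence with a K-P bond
    print(digest(toy))                      # proline rule applied
    print(digest(toy, proline_rule=False))  # proline rule ignored
```

On the toy sequence, the K followed by P stays uncut under the default rule and is cut under the custom-enzyme rule, so the second call yields one more peptide than the first.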
DB setup for InsPecT
$ ~/src.MS/inspect/current/PrepDB.py FASTA (my fasta file)
- It creates an index file named after your FASTA file, with a '.trie' suffix.
DB setup for MSGFDB
$ java -cp ~/src.MS/MSGFDB/current/MSGFDB.jar msdbsearch.BuildSA -d (my FASTA file) -tda 0
- It generates .canno, .cnlcp, .csarr & .cseq files.
- If you want to use the native MS-GFDB function instead, use -tda 2 (which generates target and combined databases) with a target-only FASTA file.
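Before moving on, it can help to confirm that each indexing step actually produced its files. The sketch below is a hypothetical helper, not part of MS-toolbox: it only looks for the suffixes named in this section ('.pro', '.trie', '.canno', '.cnlcp', '.csarr', '.cseq') and makes no assumption about the rest of the file names. The Crux index is skipped because its name is chosen by the user.

```python
import os
import tempfile

# Suffixes named in the DB setup steps on this page.
INDEX_SUFFIXES = {
    "X!Tandem": [".pro"],
    "InsPecT": [".trie"],
    "MSGFDB": [".canno", ".cnlcp", ".csarr", ".cseq"],
}


def missing_index_suffixes(db_dir):
    """Map each search engine to the index suffixes not found in db_dir."""
    names = os.listdir(db_dir)
    missing = {}
    for engine, suffixes in INDEX_SUFFIXES.items():
        absent = [s for s in suffixes
                  if not any(name.endswith(s) for name in names)]
        if absent:
            missing[engine] = absent
    return missing


if __name__ == "__main__":
    # Demo on a throwaway directory holding only an X!Tandem-style index.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "demo_combined.fasta.pro"), "w").close()
        for engine, absent in missing_index_suffixes(demo_dir).items():
            print(engine, "is missing", ", ".join(absent))
```

In practice you would call missing_index_suffixes on your own 'myProject/DB' directory; an empty result means every listed suffix was found at least once.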
Copy your mzXML files to this directory ($SCRATCH/myProject/mzXML).
Prepare search
$ python ~/git/MS-toolbox/bin/prepare-tandemK.py
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/tandem-taxonomy.xml.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/tandemK/20110713_XENLA_Egg1_1.tandemK.xml
...
TandemK is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-tandemK.sh.

$ python ~/git/MS-toolbox/bin/prepare-inspect.py
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_1.inspect_in.
Write /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/inspect/20110713_XENLA_Egg1_2.inspect_in.
...
InsPecT is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-inspect.sh.

$ python ~/git/MS-toolbox/bin/prepare-MSGFDB.py
Create /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/MSGFDB.
20110713_XENLA_Egg1_1.mzXML
20110713_XENLA_Egg1_2.mzXML
....
MSGFDB is ready. Run /scratch/00992/linusben/xenopus.prot/TXGP_XENLA_Prot_Kwon201109/scripts/run-MSGFDB.sh.
Run search
On a standalone workstation, you can run ./scripts/run-(search_engine).sh directly. Do not do this on a TACC login node. Instead, add the following directives to each run-*.sh script, then submit it as a job with qsub.
- If you use lonestar, replace '4way 8' with '8way 24'. See the Lonestar and Longhorn user guides for details.
- Don't forget to put your email address after -M.
- Use a short job name so that the job status is easy to check.
#!/bin/bash
#$ -V # Inherit the submission environment
#$ -cwd # Start job in submission directory
#$ -j y # Combine stderr and stdout
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe 4way 8
#$ -q long
#$ -l h_rt=24:00:00 # Run time (hh:mm:ss)
#$ -M (your email)
#$ -m be # Email at Begin and End of job
#$ -P hpc
set -x
#$ -N (job name)
(put the remaining part of run-* script after #!/bin/bash line)
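Editing three run-*.sh scripts by hand is error-prone, so the header insertion described above could be scripted. The following Python sketch is a hypothetical helper, not part of MS-toolbox. It copies the directives from the template on this page, drops the inline comments, and places '-N' before 'set -x' so that all directives precede the first command. The email address, job name, and '-pe' value are parameters you supply.

```python
# Directives copied from the job template on this page.
SGE_TEMPLATE = """\
#$ -V
#$ -cwd
#$ -j y
#$ -o $JOB_NAME.o$JOB_ID
#$ -pe {pe}
#$ -q long
#$ -l h_rt=24:00:00
#$ -M {email}
#$ -m be
#$ -P hpc
#$ -N {job_name}
set -x
"""


def add_sge_header(script_text, email, job_name, pe="4way 8"):
    """Insert the SGE directives right after the script's shebang line.

    If the script has no shebang, '#!/bin/bash' is added as the first line.
    """
    lines = script_text.splitlines()
    if lines and lines[0].startswith("#!"):
        shebang, body = lines[0], lines[1:]
    else:
        shebang, body = "#!/bin/bash", lines
    header = SGE_TEMPLATE.format(pe=pe, email=email, job_name=job_name)
    return "\n".join([shebang, header.rstrip("\n")] + body) + "\n"


if __name__ == "__main__":
    # Toy script body; a real run-*.sh comes from the prepare-*.py scripts.
    original = "#!/bin/bash\necho 'search commands go here'\n"
    print(add_sge_header(original, "user@example.org", "tandemK"), end="")
```

To use it on a real script, read the run-*.sh file, pass its text through add_sge_header, write the result back, and submit the file with qsub as described above.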