Texas Xenopus Genome Project/Species Identification

From Marcotte Lab
Revision as of 11:06, 9 December 2009

Selection procedure

  • Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
      • xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz (17 MB, gzipped)
  • Download CHORI-216 sequences (from XenBase) and CHORI-219 sequences (from NCBI GenBank).
      • xdata:ID/XENTR_CH216.fasta.gz (1.2 MB, gzipped): 160 CHORI-216 BAC sequences from the X. tropicalis genome
      • xdata:ID/XENLA_CH219.fasta.gz (6.5 MB, gzipped): 29 CHORI-219 BAC sequences from the X. laevis genome
  • Run BLAT (version 3.4, default options) of the mRNA sequences against the known CHORI BAC sequences, with PSLX output. For CHORI-216:
        blat XENTR_CH216.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENTR_CH216.blat_pslx -out=pslx
      • xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz (1.2 MB, gzipped)
      • xdata:XENTR_mRNA.XENTR_CH216.blat_pslx.gz (20 MB, gzipped)
  • Parse the two BLAT output files with the following criteria.
    1. Only RefSeq X. tropicalis mRNA sequences (accessions starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences that hit both CHORI-219 and CHORI-216. An alignment must have a match length of at least 200 bp to be called a 'hit'. For CHORI-219 hits, only the 10 BACs already known to be available are considered ('74I8', '204L9', '197E3', '71P23', '36I4', '35I18', '262A22', '20I13', '206K7', '166K18').
    3. Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from the X. tropicalis mRNA, the target sequence from the CHORI-219 BACs (X. laevis), and the target sequence from the CHORI-216 BACs (X. tropicalis). One hit block is reported.
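The parsing step can be sketched in Python. This is a minimal illustration under stated assumptions, not the script used for this page. It reads the standard PSL columns (matches, qName, qStart, qEnd, tName), which also begin each PSLX row. It makes four further assumptions:

* The `matches` column is treated as the "match length" of criterion 2.
* Query names keep the XenBase header layout (`gi|...|ref|NM_...|`).
* CHORI-219 target names contain the clone ID as a token such as `20I13`.
* The helper names (`read_psl`, `ch219_bac`, `select_candidates`, and so on) are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, NamedTuple, Optional, Tuple

# The ten CHORI-219 BACs known to be available (criterion 2).
CH219_BACS = {"74I8", "204L9", "197E3", "71P23", "36I4",
              "35I18", "262A22", "20I13", "206K7", "166K18"}
MIN_MATCH = 200  # minimum match length (bp) for an alignment to count as a 'hit'


class PslHit(NamedTuple):
    matches: int
    q_name: str
    q_start: int
    q_end: int
    t_name: str


def read_psl(lines: Iterable[str]) -> Iterator[PslHit]:
    """Yield alignments from PSL/PSLX text, skipping the psLayout header lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Data rows have >= 21 tab-separated columns and a numeric first column.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        yield PslHit(int(fields[0]), fields[9], int(fields[11]),
                     int(fields[12]), fields[13])


def is_refseq_mrna(q_name: str) -> bool:
    """Criterion 1: keep queries carrying an 'NM_' accession, assuming the
    XenBase header layout, e.g. 'gi|213983084|ref|NM_001142220|'."""
    return any(part.startswith("NM_") for part in q_name.split("|"))


def ch219_bac(t_name: str) -> Optional[str]:
    """Return the clone ID (e.g. '20I13') if t_name names one of the ten
    CHORI-219 BACs. The naming pattern is a hypothetical assumption."""
    for token in re.findall(r"[0-9]+[A-Z][0-9]+", t_name):
        if token in CH219_BACS:
            return token
    return None


def qualifying_hits(lines: Iterable[str],
                    keep_target: Callable[[str], bool] = lambda t: True
                    ) -> Dict[str, List[PslHit]]:
    """Apply criteria 1-2 to one BLAT output and group the hits by query."""
    by_query: Dict[str, List[PslHit]] = defaultdict(list)
    for hit in read_psl(lines):
        if (hit.matches >= MIN_MATCH and is_refseq_mrna(hit.q_name)
                and keep_target(hit.t_name)):
            by_query[hit.q_name].append(hit)
    return by_query


def shared_fragments(hits216: List[PslHit], hits219: List[PslHit]
                     ) -> Iterator[Tuple[int, int, PslHit, PslHit]]:
    """Criterion 3: pair CH216 and CH219 hits covering the same mRNA interval.
    Assumption: any overlap in query coordinates counts as the 'same fragment'."""
    for a in hits216:
        for b in hits219:
            start, end = max(a.q_start, b.q_start), min(a.q_end, b.q_end)
            if start < end:
                yield start, end, a, b


def select_candidates(ch216_lines: Iterable[str], ch219_lines: Iterable[str]
                      ) -> Dict[str, List[Tuple[int, int, PslHit, PslHit]]]:
    """Map each mRNA that hits both BAC libraries to its shared fragments."""
    h216 = qualifying_hits(ch216_lines)
    h219 = qualifying_hits(ch219_lines, lambda t: ch219_bac(t) is not None)
    result: Dict[str, List[Tuple[int, int, PslHit, PslHit]]] = {}
    for q in sorted(h216.keys() & h219.keys()):
        frags = list(shared_fragments(h216[q], h219[q]))
        if frags:
            result[q] = frags
    return result
```

The BLAT outputs listed above are gzipped. Each file can be opened in text mode with `gzip.open(path, "rt")`, and the two file objects passed to `select_candidates`. Treating any query-coordinate overlap as the "same fragment" is another assumption. A stricter version could require the overlap itself to reach 200 bp.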
Reported hit block (X. tropicalis mRNA query, CHORI-216 target, CHORI-219 target):

>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
ccctgggtacccctggaactatagcagggtgac
>XENTR_CH216-2E23_0
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
atc
>XENLA_CH219-20I13_0
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
ccctggttacccctggaactatagcagggtgac
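In the hit block above, the CHORI-216 target appears to be written on the opposite strand from the mRNA query. Reverse-complementing its first 12 bases, tcaccccaaatc, gives gatttggggtga, which occurs in the query. The CHORI-219 target reads on the same strand as the query. Below is a small Python sketch for checking orientation with an exact k-mer probe. The function names and the default k = 20 are illustrative assumptions.

```python
from typing import Optional

# Complement table; 'n' (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orientation(query: str, target: str, k: int = 20) -> Optional[str]:
    """Return '+' if the first k bases of target occur in query as written,
    '-' if they occur only after reverse-complementing, otherwise None."""
    probe = target[:k].lower()
    q = query.lower()
    if probe in q:
        return "+"
    if reverse_complement(probe) in q:
        return "-"
    return None
```

An exact probe is fragile across species. The X. laevis CHORI-219 sequence differs from the query at several positions (for example, ccctggatacc versus ccctgggtacc near the start). In practice, the strand column of the BLAT PSL output is a more reliable record of orientation.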