Difference between revisions of "Texas Xenopus Genome Project/Species Identification"

From Marcotte Lab
Line 1: Line 1:
== How to select ==
+
== Select candidate sequences ==
 
* Download ''X. tropicalis'' mRNA sequences from XenBase (Nov. 27, 2009 version).  
 
* Download ''X. tropicalis'' mRNA sequences from XenBase (Nov. 27, 2009 version).  
 
** [[:xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz]] 17 MB, gzipped.
 
** [[:xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz]] 17 MB, gzipped.
Line 8: Line 8:
 
** [[:xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz]] 1.2 MB, gzipped.  
 
** [[:xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz]] 1.2 MB, gzipped.  
 
** [[:xdata:XENTR_mRNA.XENTR_CH216.blat_pslx.gz]] 20 MB, gzipped.
 
** [[:xdata:XENTR_mRNA.XENTR_CH216.blat_pslx.gz]] 20 MB, gzipped.
 +
* Parse the two BLAT output files using the following criteria.
 +
*# From ''X. tropicalis'' mRNA, only RefSeq sequences (IDs starting with 'NM_') are considered.
 +
*# Select ''X. tropicalis'' mRNA sequences that hit both CHORI-219 and CHORI-216 (a match must be at least 200 bp long to count as a 'hit').
 +
*# Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from ''X. tropicalis'' mRNA, the target sequence from CHORI-219 BACs (''X. laevis''), and the target sequence from CHORI-216 BACs (''X. tropicalis''). ONE hit block is reported.
 +
<pre>
 +
>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
 +
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
 +
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
 +
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
 +
ccctgggtacccctggaactatagcagggtgac
 +
>XENTR_CH216-2E23_0
 +
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
 +
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
 +
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
 +
atc
 +
>XENLA_CH219-20I13_0
 +
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
 +
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
 +
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
 +
ccctggttacccctggaactatagcagggtgac
 +
</pre>
  
 
----
 
----
 
[[Category:XenopusGenome]]
 
[[Category:XenopusGenome]]

Revision as of 11:02, 9 December 2009

Select candidate sequences

  • Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
  • Download CHORI-216 sequences (from XenBase) and CHORI-219 sequences (from NCBI GenBank).
  • Run BLAT (with default options) against the known CHORI BAC sequences.
  • Parse the two BLAT output files using the following criteria.
    1. From X. tropicalis mRNA, only RefSeq sequences (IDs starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences that hit both CHORI-219 and CHORI-216 (a match must be at least 200 bp long to count as a 'hit').
    3. Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from X. tropicalis mRNA, the target sequence from CHORI-219 BACs (X. laevis), and the target sequence from CHORI-216 BACs (X. tropicalis). ONE hit block is reported.
>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
ccctgggtacccctggaactatagcagggtgac
>XENTR_CH216-2E23_0
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
atc
>XENLA_CH219-20I13_0
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
ccctggttacccctggaactatagcagggtgac
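
The parsing criteria above can be sketched in Python roughly as follows. This is a minimal illustration, not the script actually used for this project. It assumes the standard tab-separated PSL column layout (matches in column 1, query name, start and end in columns 10, 12 and 13, target name in column 14), takes "minimum match length" to mean the PSL matches count, and takes "the same mRNA fragment" to mean overlapping query intervals. The function names (is_refseq_mrna, hits_by_query, shared_fragments) are hypothetical.

```python
from collections import defaultdict

MIN_MATCH = 200  # assumed: minimum matched bases for an alignment to count as a 'hit'


def is_refseq_mrna(query_name):
    """True if any '|'-separated field of the FASTA ID starts with 'NM_'."""
    return any(part.startswith("NM_") for part in query_name.split("|"))


def hits_by_query(psl_lines, min_match=MIN_MATCH):
    """Collect qualifying hits per query as (q_start, q_end, target_name)."""
    hits = defaultdict(list)
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip PSL header lines and anything that is not an alignment row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, q_name, t_name = int(fields[0]), fields[9], fields[13]
        q_start, q_end = int(fields[11]), int(fields[12])
        if matches >= min_match and is_refseq_mrna(q_name):
            hits[q_name].append((q_start, q_end, t_name))
    return hits


def shared_fragments(hits_219, hits_216):
    """Yield (query, CH219 target, CH216 target, start, end) for each pair of
    hits whose query intervals overlap, i.e. the same mRNA fragment is
    covered in both BAC libraries."""
    for q_name in sorted(set(hits_219) & set(hits_216)):
        for s1, e1, t1 in hits_219[q_name]:
            for s2, e2, t2 in hits_216[q_name]:
                start, end = max(s1, s2), min(e1, e2)
                if start < end:
                    yield (q_name, t1, t2, start, end)
```

To try it on the gzipped BLAT outputs, each file can be opened with gzip.open(path, "rt") and the resulting file object passed to hits_by_query. Extracting the query and target sequences for each shared fragment (from the extra pslx columns or from the original FASTA files) is left out of this sketch.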