Texas Xenopus Genome Project/Species Identification

From Marcotte Lab

Revision as of 11:10, 9 December 2009

Selection procedure

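  • Download CHORI-216 BAC sequences (X. tropicalis; from XenBase) and CHORI-219 BAC sequences (X. laevis; from NCBI GenBank).
  • Align the X. tropicalis mRNA sequences (from XenBase) to each BAC set with BLAT, using pslx output. The CHORI-216 run:
 blat XENTR_CH216.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENTR_CH216.blat_pslx -out=pslx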
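BLAT's pslx output is the standard 21-column PSL format plus two extra columns that hold the aligned query and target sequence of each block. The Python sketch below reads such a file into records. It is not part of the original pipeline. The record type `PslxHit` and the helper names are illustrative. The sketch skips the psLayout header by keeping only lines whose first field is an integer.

```python
from typing import Iterable, Iterator, List, NamedTuple


class PslxHit(NamedTuple):
    matches: int            # PSL col 1: matching bases (excluding repeats)
    strand: str             # col 9: '+' or '-'
    q_name: str             # col 10: query (mRNA) name
    q_size: int             # col 11
    q_start: int            # col 12: 0-based, half-open query interval
    q_end: int              # col 13
    t_name: str             # col 14: target (BAC) name
    t_size: int             # col 15
    t_start: int            # col 16
    t_end: int              # col 17
    block_sizes: List[int]  # col 19
    q_starts: List[int]     # col 20
    t_starts: List[int]     # col 21
    q_seqs: List[str]       # pslx col 22: query sequence of each block
    t_seqs: List[str]       # pslx col 23: target sequence of each block


def _int_list(field: str) -> List[int]:
    # PSL list columns carry a trailing comma, e.g. "120,133,"
    return [int(x) for x in field.rstrip(",").split(",") if x]


def _str_list(field: str) -> List[str]:
    return [x for x in field.rstrip(",").split(",") if x]


def read_pslx(lines: Iterable[str]) -> Iterator[PslxHit]:
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        # Header and blank lines are skipped: data rows start with an integer
        if len(cols) < 23 or not cols[0].isdigit():
            continue
        yield PslxHit(
            matches=int(cols[0]),
            strand=cols[8],
            q_name=cols[9], q_size=int(cols[10]),
            q_start=int(cols[11]), q_end=int(cols[12]),
            t_name=cols[13], t_size=int(cols[14]),
            t_start=int(cols[15]), t_end=int(cols[16]),
            block_sizes=_int_list(cols[18]),
            q_starts=_int_list(cols[19]),
            t_starts=_int_list(cols[20]),
            q_seqs=_str_list(cols[21]),
            t_seqs=_str_list(cols[22]),
        )
```

For the CHORI-216 run above, this could be used as `with open("XENTR_mRNA.XENTR_CH216.blat_pslx") as fh: hits = list(read_pslx(fh))`.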
  • Parse the two BLAT output files (CHORI-216 and CHORI-219) using the following criteria.
    1. Of the X. tropicalis mRNAs, only RefSeq sequences (names starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences that hit both CHORI-219 and CHORI-216 (the match length must be at least 200 bp to count as a 'hit'). For CHORI-219 hits, I only consider the 10 BACs that we already knew to be available ('74I8', '204L9', '197E3', '71P23', '36I4', '35I18', '262A22', '20I13', '206K7', '166K18').
    3. Survey each hit block. If the same mRNA fragment hits both CHORI-219 and CHORI-216, report three sequences: the query sequence from the X. tropicalis mRNA, the target sequence from the CHORI-219 BAC (X. laevis) and the target sequence from the CHORI-216 BAC (X. tropicalis). One hit block is reported:
>XENTR_NM_001142220_0 gi|213983084|ref|NM_001142220|
ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaaggggg
cccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtg
ccctgggtacccctggaactatagcagggtgac
>XENTR_CH216-2E23_0
tcaccccaaatccccccctaactggccttcaggctgggcccccttagctcataacaaggttacagatatatagaaacattggggtaacagtca
ccccgctatagttccaggggtacccagggcacaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccccttagccca
taacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggcacaaataagcactcaccccaa
atc
>XENLA_CH219-20I13_0
ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttattagctaaggggg
cccagtctgaaggtcagttagggggagatttggggtgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttacccc
aatgtttctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtg
ccctggttacccctggaactatagcagggtgac
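The three parsing criteria can be sketched in Python as follows. This is an illustration, not the script that was actually used, and it rests on several assumptions. It starts from hits already read from the pslx files. It uses the PSL `matches` column as the "match length" for the 200 bp cutoff. It assumes query names begin with the RefSeq accession and that CHORI-219 target names end in the clone ID after a '-' (for example `CH219-20I13`). `Hit`, `clone_id`, `passes` and `shared_fragments` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple

MIN_MATCH = 200  # bp; minimum match length to call a 'hit' (criterion 2)

# The 10 CHORI-219 BACs already known to be available (criterion 2)
KNOWN_CH219_BACS = {"74I8", "204L9", "197E3", "71P23", "36I4",
                    "35I18", "262A22", "20I13", "206K7", "166K18"}


class Hit(NamedTuple):
    q_name: str   # X. tropicalis mRNA (BLAT query)
    t_name: str   # BAC sequence (BLAT target)
    matches: int  # PSL 'matches' column, assumed to be the match length
    q_start: int  # 0-based, half-open interval on the mRNA
    q_end: int


def clone_id(t_name: str) -> str:
    # Hypothetical naming scheme: "CH219-20I13" -> "20I13"
    return t_name.rsplit("-", 1)[-1]


def passes(hit: Hit, allowed_clones: Optional[Set[str]] = None) -> bool:
    if not hit.q_name.startswith("NM_"):  # criterion 1: RefSeq mRNAs only
        return False
    if hit.matches < MIN_MATCH:           # criterion 2: at least 200 bp
        return False
    return allowed_clones is None or clone_id(hit.t_name) in allowed_clones


def shared_fragments(ch216: List[Hit], ch219: List[Hit]) -> List[Tuple[str, int, int, Hit, Hit]]:
    """Criterion 3: mRNA fragments covered by both a CHORI-216 and a CHORI-219 hit."""
    ch219_by_mrna: Dict[str, List[Hit]] = {}
    for hit in ch219:
        if passes(hit, KNOWN_CH219_BACS):
            ch219_by_mrna.setdefault(hit.q_name, []).append(hit)

    shared = []
    for a in ch216:
        if not passes(a):
            continue
        # Only mRNAs that also hit one of the known CHORI-219 BACs get past this lookup
        for b in ch219_by_mrna.get(a.q_name, []):
            start, end = max(a.q_start, b.q_start), min(a.q_end, b.q_end)
            if start < end:  # the two hits overlap on the mRNA
                shared.append((a.q_name, start, end, a, b))
    return shared
```

Each returned interval marks the mRNA region from which the three sequences (mRNA, CHORI-219 target, CHORI-216 target) could then be extracted.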
  • Run MUSCLE (version 4.0, default options) to build a multiple sequence alignment of the three sequences.
 $ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle 
XENLA_CH219-20I1   1 + ttattt----------------------gtgccctggatacccctggaactatagcagggtgac 42
XENTR_NM_0011422   1 + ttattt----------------------gtgccctgggtacccctggaactatagcggggtgac 42
XENTR_CH216-2E23   1 + tcaccccaaatccccccctaactggccttcaggctgggcccccttag-ctcataacaaggttac 63
                       *.*...                      .....****...***.*.**...***.*..***.**

XENLA_CH219-20I1  43 + tgttaccccaatgtttctatatatctgtaaccttgttattagct-aagggggcccagtctgaag 105
XENTR_NM_0011422  43 + tgttaccccaatgtttctatatatctgtaaccttgttatgggct-aagggggcccagcctgaag 105
XENTR_CH216-2E23  64 + agatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggc---- 123
                       .*.**.....*....*.....**.*.**.***..*.*** .... *.***..***** ..****

XENLA_CH219-20I1 106 + gtcagttagggggagatttggggtgagggcttatttg-----taccctgggtacccctggaact 164
XENTR_NM_0011422 106 + gccagttagggggggatttggggtgagtgcttatttg-----tgccctgggtacccctggaact 164
XENTR_CH216-2E23 124 + -acaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccc-cttagccc 185
                       * **..**.*... .*.......*.*. .*.**..**     ....*****..*****....*.

XENLA_CH219-20I1 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgagctaa-gggg 227
XENTR_NM_0011422 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaa-gggg 227
XENTR_CH216-2E23 186 + ataacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggt 249
                       ***.**.***.**.*.**.....*....*.....**.*.**.***..*.***......* ***.

XENLA_CH219-20I1 228 + gcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtgccctggttaccc 291
XENTR_NM_0011422 228 + gcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtaccc 291
XENTR_CH216-2E23 250 + acccagggca---------------caaataagcact----------------------caccc 276
                       .***** ...***************...**..*...****** *************** .****

XENLA_CH219-20I1 292 + ctggaactatagcagggtgac 312(341)
XENTR_NM_0011422 292 + ctggaactatagcagggtgac 312(341)
XENTR_CH216-2E23 277 + c---------------aaatc 282(341)
                       ****************....*
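This page does not explain the symbol row under each block. The rule below is inferred from the blocks above and is not documented MUSCLE behaviour:

* `*` where at least two sequences have a residue and all of them agree.
* `.` where two of the three residues agree.
* A blank where no two residues agree, including columns that contain only one residue.

A Python sketch of that inferred rule (`consensus_line` is a hypothetical name):

```python
from collections import Counter
from typing import Sequence


def consensus_line(rows: Sequence[str]) -> str:
    """Symbol row for equal-length aligned rows ('-' marks a gap), per the inferred rule."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    symbols = []
    for column in zip(*rows):
        residues = [c.lower() for c in column if c != "-"]
        # Size of the largest group of identical residues in this column
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        if top < 2:
            symbols.append(" ")  # gaps only, a single residue, or all residues differ
        elif top == len(residues):
            symbols.append("*")  # every residue present is identical
        else:
            symbols.append(".")  # some, but not all, residues agree
    return "".join(symbols)
```

Applied to the last block (`ctggaactatagcagggtgac`, `ctggaactatagcagggtgac`, `c---------------aaatc`), it returns `****************....*`, which matches the row shown.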