Difference between revisions of "Texas Xenopus Genome Project/Species Identification"

From Marcotte Lab
Revision as of 14:28, 9 December 2009

Target fragments

XENTR_CHORI-219_ID.2 (hoxa5)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.2.fasta), MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.2.muscle)
  • PsrI (http://rebase.neb.com/rebase/enz/PsrI.html) cuts only the XENTR sequence (positions 119 and 151).
  • PCR primers (designed by Primer3)
OLIGO            start  len      tm     gc%   any    3' seq
LEFT PRIMER         24   20   59.99   45.00  2.00  0.00 caagagccacaaatcaagca
RIGHT PRIMER       221   20   59.92   50.00  5.00  1.00 agatccatgccattgtagcc
  • MUSCLE alignment
XENLA_CH219-206K   1 + gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga 64
XENTR_NM_0010114   1 + gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga 64
                       ****************************************************************
                                              >>>>>>>>>>>>>>>>>>>>
XENLA_CH219-206K  65 + gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa 128
XENTR_NM_0010114  65 + gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa 128
                       ****************************************************************

XENLA_CH219-206K 129 + ttatggagatcagagctcggtgagcgagcaatacagggatacaacgagcatgcattccagcagg 192
XENTR_NM_0010114 129 + ttatggagatcacagctcggtgaacgagcaatacagggatacaacgagcatgcattccagcagg 192
                       ************ ********** ****************************************

XENLA_CH219-206K 193 + tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg 256
XENTR_NM_0010114 193 + tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg 256
                       ****************************************************************
                                <<<<<<<<<<<<<<<<<<<<
XENLA_CH219-206K 257 + ccaatgacagagcaaggaattatccatccaatccgagctcctctacagagccaaggtataacca 320
XENTR_NM_0010114 257 + ccaatgacagagcaaggaattatccatccaatccgagctcctctacagagccaaggtataacca 320
                       ****************************************************************

XENLA_CH219-206K 321 + acctgcagcctct 333(1)
XENTR_NM_0010114 321 + acctgcagcgtct 333(1)
                       ********* ***
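
The primer and PsrI notes for this fragment can be checked with a small in-silico PCR and digest. The sketch below is illustrative only: the sequences are positions 1-256 copied from the alignment above, the function names are made up for this example, and the PsrI recognition sequence GAAC(N)6TAC is an assumption that should be confirmed against the REBASE entry. It reports where the recognition site starts within the PCR product, not the cut positions.

```python
import re

LEFT_PRIMER = "caagagccacaaatcaagca"
RIGHT_PRIMER = "agatccatgccattgtagcc"

# Assumed PsrI recognition sequence GAAC(N)6TAC plus its reverse complement.
# The lookahead lets overlapping sites be reported as well.
PSRI = re.compile(r"(?=(GAAC[ACGT]{6}TAC|GTA[ACGT]{6}GTTC))")

# Positions 1-256 of the alignment; the two species differ only in row 129-192.
SHARED_HEAD = (
    "gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga"
    "gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa"
)
SHARED_TAIL = "tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg"
XENLA = (SHARED_HEAD
         + "ttatggagatcagagctcggtgagcgagcaatacagggatacaacgagcatgcattccagcagg"
         + SHARED_TAIL)
XENTR = (SHARED_HEAD
         + "ttatggagatcacagctcggtgaacgagcaatacagggatacaacgagcatgcattccagcagg"
         + SHARED_TAIL)


def revcomp(seq):
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def amplicon(template):
    """Return the predicted PCR product, or None if a primer does not match exactly."""
    start = template.find(LEFT_PRIMER)
    end = template.find(revcomp(RIGHT_PRIMER))
    if start < 0 or end < 0:
        return None
    return template[start:end + len(RIGHT_PRIMER)]


def psri_sites(seq):
    """1-based start positions of the assumed PsrI recognition site on either strand."""
    return [m.start() + 1 for m in PSRI.finditer(seq.upper())]


for name, template in (("XENLA", XENLA), ("XENTR", XENTR)):
    product = amplicon(template)
    print(name, len(product), psri_sites(product))
# XENLA 198 []
# XENTR 198 [127]
```

Under these assumptions both species give a 198 bp product, and only the X. tropicalis product contains the recognition site, which is consistent with the note that PsrI cuts only the XENTR sequence.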

XENTR_CHORI-219_ID.3 (hoxa5)

  • MUSCLE alignment
XENLA_CH219-206K   1 + acaacataggagggccagaaggaaagcgagcccggacagcctacacccgctatcaaactctgga 64
XENTR_NM_0010114   1 + acaacataggagggccagaaggaaagcgagcccggacagcctacacccgctaccaaactctgga 64
                       **************************************************** ***********

XENLA_CH219-206K  65 + gctggaaaaggaatttcatttcaacagatacctgacccgcaggaggagaattgaaatagcccat 128
XENTR_NM_0010114  65 + gctggaaaaggaatttcatttcaacagatacctgacccgcaggaggaggattgaaatagcccat 128
                       ************************************************ ***************
                                                                               
XENLA_CH219-206K 129 + gcactctgtctctctgagaggcaaattaaaatctggttccaaaacaggagaatgaagtggaaaa 192
XENTR_NM_0010114 129 + gcactctgtctctctgagaggcaaattaaaatctggttccaaaacaggagaatgaagtggaaaa 192
                       ****************************************************************

XENLA_CH219-206K 193 + aagacaataaactaaagagtatgagcatggcagcggctgggggtgcatttcgtccttaaatttc 256
XENTR_NM_0010114 193 + aagacaataaactaaagagtatgagcatggcagcggctgggggtgcatttcgtccttaaatttc 256
                       ****************************************************************

XENLA_CH219-206K 257 + ccctccctgaagaatgtactgattcagtattagctaaacctgtatattgtcaccaccactttgt 320
XENTR_NM_0010114 257 + ccctccctaaagaatgtactgattcagtattagctaaacctgtatattgtcaccaccactttgt 320
                       ******** *******************************************************
                                                               
XENLA_CH219-206K 321 + ataactcttctctgtgacttctgtgaagataccctcctccccaatctcttgtccaccctttatc 384
XENTR_NM_0010114 321 + ataactcttctctgtgacttctgtgaagataccctcctccctaatatcttgtctaccctttatc 384
                       ***************************************** *** ******* **********

XENLA_CH219-206K 385 + ctagcaatgagccttt 400(1)
XENTR_NM_0010114 385 + ctagcaatgagccttt 400(1)
                       ****************

XENTR_CHORI-219_ID.24 (hoxa3)

  • MUSCLE alignment
XENLA_CH219-206K   1 + gtcggaccagggaggccattggaggaggaacgccacgtgacagaggggtgccaatgttattctt 64
XENTR_NM_0011274   1 + gtctgaccagcgaggccattggaggaggaacgccacgtgacagaggggtgccaatgttattctt 64
                       *** ****** *****************************************************

XENLA_CH219-206K  65 + tacgggtgtcaagaccctgtcagtttgtgaaataaatattgggaaacaacgaaatgcaaaaagc 128
XENTR_NM_0011274  65 + tacgggtgtcaagaccctgtcagtttgtgaaataaatattgggaaacaacgaaatgcaaaaagc 128
                       ****************************************************************

XENLA_CH219-206K 129 + gacctactacgacagcgctgcgatctatggcggatatccctaccaaggagcaaatggtttcact 192
XENTR_NM_0011274 129 + gacctactacgacagctctgcgatctatggtggatatccctaccaaggagcaaatggtttcact 192
                       **************** ************* *********************************

XENLA_CH219-206K 193 + tataatgcaagtcagcagcaatatcctccttcctcatctctgctggaaagtgaatatcatcgac 256
XENTR_NM_0011274 193 + tataatgcaagtcagcagcaatatcctccttcctcatctctgctggaaactgaatatcatcgac 256
                       ************************************************* **************

XENLA_CH219-206K 257 + ctgcctgctccctgcagtcacctggcagcgctgtgccccatctcaaggccaatgacatcaatga 320
XENTR_NM_0011274 257 + ctgcctgctccctgcagtcacctggcagcgcagtgccccatcacaaggccaatgacatcaacga 320
                       ******************************* ********** ****************** **

XENLA_CH219-206K 321 + aagttgtatgagaagtatcagcagtcaatcaagtcaagccccggtcattcccgagcagcagccc 384
XENTR_NM_0011274 321 + aagttgtatgagaaccattaacagtcaatcaaaccaagccccggtcattcccgagcagcagccc 384
                       **************  ** * ***********  ******************************

XENLA_CH219-206K 385 + acaccacaagggccgccaccctctgtgtccccaccacaaaccaccagcaatgcagccacagcct 448
XENTR_NM_0011274 385 + acaccgcaagggccgccaccctctgtgtccccaccacaaaccaccagcaatgcagccacagcct 448
                       ***** **********************************************************

XENLA_CH219-206K 449 + cctccaacaaggccacaggcatcaactcacctaccatgtcaaagcagattttcccttggatgaa 512
XENTR_NM_0011274 449 + cctccaacaaggccacaagcatcacctcacctaccatgtcaaagcagattttcccttggatgaa 512
                       ***************** ****** ***************************************

XENLA_CH219-206K 513 + agagtcccggcagaacacaa 532(1)
XENTR_NM_0011274 513 + agaatcccgacagaacacga 532(1)
                       *** ***** ******** *

XENTR_CHORI-219_ID.25 (hoxa3)

  • MUSCLE alignment
XENLA_CH219-206K   1 + gtgagagctgtgcaggagacaaaagtcccccggggcaatcctcttccaaaagggcccgcactgc 64
XENTR_NM_0011274   1 + gtgagagttgtgctggagacaaaagccccccggggcaatcctcttccaagagggcccgcactgc 64
                       ******* ***** *********** *********************** **************

XENLA_CH219-206K  65 + ttacacaagcgctcagctggtagaactggaaaaagagttccactttaacagatacctgtgcaga 128
XENTR_NM_0011274  65 + ttacacaagcgctcagctggtagaactggaaaaagagttccactttaacagatacctgtgcaga 128
                       ****************************************************************

XENLA_CH219-206K 129 + cccaggagggtggagatggccaatctactcaacctcaccgagaggcaaattaagatctggtttc 192
XENTR_NM_0011274 129 + cccaggagggtggagatggccaatctgctcaacctcaccgagaggcaaattaagatctggtttc 192
                       ************************** *************************************

XENLA_CH219-206K 193 + agaacaggcgaatgaaatacaaaaaggatcaaaaagggaaatccatgatgacctcttcaggagg 256
XENTR_NM_0011274 193 + agaacaggcgaatgaaatacaaaaaggatcaaaaagggaaatccatgatgacctcttcaggagg 256
                       ****************************************************************

XENLA_CH219-206K 257 + acagtcaccatgtaggagcccggtgccagctccatctgttggaggttacctaaactctatgcat 320
XENTR_NM_0011274 257 + gcagtcaccatgtaggagcccagtgccgactccatctgttggaggttacctaaactctatgcat 320
                        ******************** *****  ***********************************

XENLA_CH219-206K 321 + tctttggtaaacagtgtcccctatgagcctcagtctccccctg 363(1)
XENTR_NM_0011274 321 + tctttggtaaacagtgtcccctatgagcctcagtctcccccag 363(1)
                       ***************************************** *

XENTR_CHORI-219_ID.26 (hoxa3)

  • MUSCLE alignment
XENLA_CH219-206K   1 + aacaaacaccaccctggcgcctatggtgtgcctgcaccctacccaagcccccacaacagctgtc 64
XENTR_NM_0011274   1 + aacaaacaccaccctagcgcgtatggcgtgcctgcaccctacccaagcccccacaacagctgcc 64
                       *************** **** ***** *********************************** *

XENLA_CH219-206K  65 + ctccccaccaaaaaagatacagcggaactgctgcggtcaccccagaatatgagccccatcctct 128
XENTR_NM_0011274  65 + ctccccaccaaaagagatacagcgggactgctgcggtcacccctgaatatgagccacatcctct 128
                       ************* *********** ***************** *********** ********

XENLA_CH219-206K 129 + ccaacaaggcaacggagcttatgggaatcctcatgtacagggaagccccgtttatgtaggggga 192
XENTR_NM_0011274 129 + ccaacaaagcagcggagcttatgggaatccgcatgtacagggaagccccgtttatgtagggggg 192
                       ******* *** ****************** ********************************

XENLA_CH219-206K 193 + aactatgtggagaccatgactaattctggaccatccatgtttggtttgtctcatctctctcatt 256
XENTR_NM_0011274 193 + aactatgtggagaccatgactaattctggaccatccatgtttggtttgtctcatctctctcatt 256
                       ****************************************************************

XENLA_CH219-206K 257 + cctcatcgaacatggactacagtggagccggacccatgaacagtggtcaccaccatggaccttg 320
XENTR_NM_0011274 257 + cctcatcgaacatggactacagtggagccggacccatgaacagtggtcaccaccatggaccctg 320
                       ************************************************************* **

XENLA_CH219-206K 321 + tgactcccacactacatacacagacttatctgctcaccacaatcctcagggaagaattcaggaa 384
XENTR_NM_0011274 321 + tgactctcaccctacatacacggacttatctgctcaccacaatcctcagggaagaattcaggaa 384
                       ****** *** ********** ******************************************

XENLA_CH219-206K 385 + gcccccaaattaacacatttgtaatggccatggagacaaataatcccctcttttc 439(1)
XENTR_NM_0011274 385 + gcccccaaattaacacatttgtaatgatcgtggagacaaattttccccttttttc 439(1)
                       **************************  * ***********  ****** *****

XENTR_CHORI-219_ID.27 (pitx-1)

  • MUSCLE alignment
XENLA_CH219-166K   1 + agaaagaacgaaccggggagccaaaaggagaggacgggaatggggatgatcccagcaagaaaaa 64
XENTR_NM_0010074   1 + agaaagaacgaagtggggagccaaagggagaggacggcaatggggatgatcccacaaagaaaaa 64
                       ************  *********** *********** ****************  ********

XENLA_CH219-166K  65 + gaagcagaggagacaaaggactcactttaccagccagcagctgcaggagctggaggccactttc 128
XENTR_NM_0010074  65 + gaagcagaggagacaaaggactcactttaccagccagcagctgcaggagctggaggccactttc 128
                       ****************************************************************

XENLA_CH219-166K 129 + cagaggaaccgatatccagacatgagcatgagagaggagattgctgtatggaccaatctgactg 192
XENTR_NM_0010074 129 + cagaggaaccgctacccagacatgagcatgagagaggagatcgctgtatggaccaatctgactg 192
                       *********** ** ************************** **********************

XENLA_CH219-166K 193 + aggccagggtcagggt 208(1)
XENTR_NM_0010074 193 + aggccagggtcagggt 208(1)
                       ****************

XENTR_CHORI-219_ID.28 (pitx-1)

  • MUSCLE alignment
XENLA_CH219-166K   1 + tggttcaagaaccgccgagccaagtggaggaagagggagcggaaccagcagatggacctgtgta 64
XENTR_NM_0010074   1 + tggttcaagaaccgcagagccaagtggaggaagagggagcggaaccagcagatggacctgtgca 64
                       *************** ********************************************** *

XENLA_CH219-166K  65 + agaatggttacgtgccccagttcagcgggctcatgcagccgtacgacgagatgtacgcaggata 128
XENTR_NM_0010074  65 + agaatggctacgtgccccagttcagcggcctgatgcagccctacgatgagatgtacgctggcta 128
                       ******* ******************** ** ******** ***** *********** ** **

XENLA_CH219-166K 129 + cccctacaacaactgggccacaaaaagcctcacccctgcccccctgtccaccaagagcttcacc 192
XENTR_NM_0010074 129 + cccgtacaacaactgggccaccaaaagcctcacccctgcccccctgtccaccaagagcttcacc 192
                       *** ***************** ******************************************

XENLA_CH219-166K 193 + ttcttcaactccatgagtcccttgtcttcccagtccatgttctccggccccagctccatctctt 256
XENTR_NM_0010074 193 + ttcttcaactccatgagtccgctgtcctcccagtccatgttctctggccccagctccatctctt 256
                       ********************  **** ***************** *******************

XENLA_CH219-166K 257 + ccatgagcatgccctccagcatgggtcactctgcggtgccaggcatggccaactc 311(1)
XENTR_NM_0010074 257 + ccatgagcatgccctccagcatgggccactcggcggtgcccggcatgcccaactc 311(1)
                       ************************* ***** ******** ****** *******

XENTR_CHORI-219_ID.29 (pitx-1)

  • MUSCLE alignment
XENLA_CH219-166K   1 + atcagcggctcctccctcaactcagccatgtcttctactggttgtccctatggaccccctggtt 64
XENTR_NM_0010074   1 + atcagcagctcctccctcaactccgccatgtcttctactgcttgtccctatggcccccccggct 64
                       ****** **************** **************** ************ ***** ** *

XENLA_CH219-166K  65 + ccccctacacggtttaccgggacacttgtaactccagtctggcaagcctgagactgaaatccaa 128
XENTR_NM_0010074  65 + ccccatacacggtatacagggacacttgtaactcgagtttggccagcctgagactgaaatccaa 128
                       **** ******** *** **************** *** **** ********************

XENLA_CH219-166K 129 + gcagcactccaccttcggctacagcagcctccagagcccggcctccagcctcaatgcttgccaa 192
XENTR_NM_0010074 129 + gcagcactccacctttggctacagtagcctgcagagcccggcctccagcctcaatgcctgccag 192
                       *************** ******** ***** ************************** *****

XENLA_CH219-166K 193 + tataacagttgatagactcccc 214(1)
XENTR_NM_0010074 193 + tataacagttgatagactcccc 214(1)
                       **********************

Selection procedure

  • Download CHORI-219 sequences (from NCBI GenBank) and align the X. tropicalis mRNA sequences against them with BLAT.
 blat XENLA_CH219.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENLA_CH219.blat_pslx -out=pslx
  • Parse the two BLAT output files using the following criteria.
    1. From X. tropicalis mRNA, only RefSeq sequences (accessions starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences which hit both CHORI-219 (a match must be at least 200 bp long to count as a 'hit'). Only the 10 CHORI-219 BACs already known to be available are considered ('74I8','204L9','197E3','71P23','36I4','35I18','262A22','20I13','206K7','166K18').
    3. Survey each hit block and discard any block shorter than 200 bp. This leaves 42 hit blocks from 8 mRNAs:
      • NM_001004837 Unnamed, predicted gene MGC69309 NCBI XenBase
      • NM_001007499 paired-like homeodomain 1 (pitx-1) NCBI XenBase
      • NM_001011405 Homeobox A5 (hoxa5) NCBI XenBase
      • NM_001035121 CCAAT/enhancer binding protein (C/EBP), beta (cebpb) NCBI XenBase
      • NM_001113032 LY6/PLAUR domain containing 6 (lypd6) NCBI XenBase
      • NM_001127429 homeobox A3 (hoxa3) NCBI XenBase
      • NM_001129937 SRY (sex determining region Y)-box 18 (sox18) NCBI XenBase
      • NM_001142220 phosphoribosylformylglycinamidine synthase (pfas) NCBI XenBase
  • Run MUSCLE (version 4.0, default options) for multiple sequence alignment. Most of the candidate sequences contain duplications, so I filtered them out using the 'Self' mapping information in the MUSCLE output (a sequence is discarded if it maps to itself over more than 50 bp). Finally, 10 fragments are selected.
    • hoxa3 - XENTR_CHORI-219_ID.24.muscle, XENTR_CHORI-219_ID.25.muscle, XENTR_CHORI-219_ID.26.muscle
    • pitx-1 - XENTR_CHORI-219_ID.27.muscle, XENTR_CHORI-219_ID.28.muscle, XENTR_CHORI-219_ID.29.muscle
    • hoxa5 - XENTR_CHORI-219_ID.2.muscle, XENTR_CHORI-219_ID.3.muscle
 $ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle 
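
The BLAT filtering criteria listed above could be applied to each PSL/PSLX row roughly as follows. This is a sketch, not the script that was actually used: the function name is hypothetical, the column positions follow the standard PSL layout, and the regular expressions assume that query names contain the RefSeq accession and that target names contain the BAC ID after 'CH219-'.

```python
import re

AVAILABLE_BACS = {"74I8", "204L9", "197E3", "71P23", "36I4",
                  "35I18", "262A22", "20I13", "206K7", "166K18"}
MIN_LEN = 200  # minimum length for both a 'hit' and a hit block


def long_blocks(psl_line, bacs=AVAILABLE_BACS, min_len=MIN_LEN):
    """Return (accession, BAC, block sizes >= min_len), or None if the row is filtered out."""
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) < 21 or not fields[0].isdigit():
        return None  # header line or malformed row

    # Standard PSL columns: 0 = matches, 9 = qName, 13 = tName, 18 = blockSizes.
    matches = int(fields[0])
    q_name, t_name, block_sizes = fields[9], fields[13], fields[18]

    # Assumed name formats, e.g. "gi|...|ref|NM_001127429|" and "XENLA_CH219-206K7".
    accession = re.search(r"NM_\d+", q_name)
    bac = re.search(r"CH219-(\d+[A-Z]\d+)", t_name)
    if accession is None or bac is None or bac.group(1) not in bacs:
        return None
    if matches < min_len:
        return None

    blocks = [int(s) for s in block_sizes.split(",") if s and int(s) >= min_len]
    if not blocks:
        return None
    return accession.group(0), bac.group(1), blocks
```

Selecting mRNAs by how many BACs they hit would then be a matter of grouping the returned tuples by accession.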

Here is an example of a 'Self' duplication.

>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|

  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201

 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
      |||||||||||||||||||||||||||||||||  |||||||||||||
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
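
The 50 bp 'Self' filter can be sketched against the listing above. The parser below is a guess at the layout based only on this one example (coordinate rows of the form "start strand bases end", alternating between the two copies); the function names are hypothetical and the real MUSCLE 4.0 output may contain fields not handled here.

```python
import re

# A coordinate row looks like "  3 + ccagg... 66": start, strand, bases, end.
ROW = re.compile(r"^\s*(\d+)\s+[+-]\s+[acgtn-]+\s+(\d+)\s*$", re.IGNORECASE)
MAX_SELF_BP = 50  # threshold from the selection procedure

EXAMPLE = """\
>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|

  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201

 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
      |||||||||||||||||||||||||||||||||  |||||||||||||
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
"""


def self_hit_span(text):
    """Length in bp covered by the first copy of a self hit (0 if there is none)."""
    rows = [tuple(map(int, m.groups()))
            for m in map(ROW.match, text.splitlines()) if m]
    first_copy = rows[0::2]  # rows alternate: first copy, second copy, ...
    if not first_copy:
        return 0
    start = min(s for s, _ in first_copy)
    end = max(e for _, e in first_copy)
    return end - start + 1


def is_self_duplicated(text, max_bp=MAX_SELF_BP):
    """True if the sequence maps to itself over more than max_bp."""
    return self_hit_span(text) > max_bp


print(self_hit_span(EXAMPLE), is_self_duplicated(EXAMPLE))
# 112 True
```

In the example, positions 3-114 map onto positions 138-249 of the same sequence, a 112 bp self hit, so this fragment would be filtered out.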