Texas Xenopus Genome Project/Species Identification

From Marcotte Lab

Latest revision as of 16:43, 9 December 2009

Designed

XENTR_CHORI-219_ID.2 (hoxa5)

  • Only cut XENTR - PsrI(119, 151).
  • Only cut XENLA - None.
  • PCR primer (designed by Primer3)
OLIGO            start  len      tm     gc%   any    3' seq 
LEFT PRIMER         24   20   59.99   45.00  2.00  0.00 caagagccacaaatcaagca
RIGHT PRIMER       221   20   59.92   50.00  5.00  1.00 agatccatgccattgtagcc
XENLA_CH219-206K   1 + gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga 64
XENTR_NM_0010114   1 + gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga 64
                       ****************************************************************
                                              >>>>>>>>>>>>>>>>>>>>
XENLA_CH219-206K  65 + gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa 128
XENTR_NM_0010114  65 + gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa 128
                       ****************************************************************

XENLA_CH219-206K 129 + ttatggagatcagagctcggtgagcgagcaatacagggatacaacgagcatgcattccagcagg 192
XENTR_NM_0010114 129 + ttatggagatcacagctcggtgaacgagcaatacagggatacaacgagcatgcattccagcagg 192
                       ************ ********** ****************************************

XENLA_CH219-206K 193 + tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg 256
XENTR_NM_0010114 193 + tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg 256
                       ****************************************************************
                                <<<<<<<<<<<<<<<<<<<<
XENLA_CH219-206K 257 + ccaatgacagagcaaggaattatccatccaatccgagctcctctacagagccaaggtataacca 320
XENTR_NM_0010114 257 + ccaatgacagagcaaggaattatccatccaatccgagctcctctacagagccaaggtataacca 320
                       ****************************************************************

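XENLA_CH219-206K 321 + acctgcagcctct 333(1)
XENTR_NM_0010114 321 + acctgcagcgtct 333(1)
                       ********* ***

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.2.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.2.muscle)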
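
The primer coordinates in the table above can be cross-checked against the aligned X. tropicalis sequence. The following is a minimal Python sketch, not part of the original pipeline. It assumes the `start` column is 1-based and that, for the right primer, it gives the rightmost template base (the primer's 5' end), which is how the numbers line up with the alignment. The names `reverse_complement` and `locate_amplicon` are illustrative only.

```python
# Sketch: locate the ID.2 (hoxa5) primers on the X. tropicalis fragment
# and derive the expected amplicon length.

def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    comp = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(comp[base] for base in reversed(seq))

def locate_amplicon(template, left, right):
    """Return (left_start, right_start, amplicon_length), 1-based.

    left_start  : position of the first base of the left primer
    right_start : position of the rightmost template base covered by
                  the right primer (assumed Primer3-style convention)
    """
    left_idx = template.find(left)
    right_idx = template.find(reverse_complement(right))
    if left_idx < 0 or right_idx < 0:
        raise ValueError("primer not found on template")
    left_start = left_idx + 1
    right_start = right_idx + len(right)
    return left_start, right_start, right_start - left_start + 1

# Bases 1-256 of XENTR_NM_0010114, copied from the alignment above.
TEMPLATE = (
    "gtgctatagacgcgcaaaacgaccaagagccacaaatcaagcacacatatcaaaaaacaaatga"
    "gctcttattttgtaaactcattttgcggtcgctatccaaatggcccggactaccagttacataa"
    "ttatggagatcacagctcggtgaacgagcaatacagggatacaacgagcatgcattccagcagg"
    "tacggatacggctacaatggcatggatctcagcgttgggcgctcagcttccaaccactttagtg"
)
LEFT_PRIMER = "caagagccacaaatcaagca"
RIGHT_PRIMER = "agatccatgccattgtagcc"

left_start, right_start, amplicon_length = locate_amplicon(
    TEMPLATE, LEFT_PRIMER, RIGHT_PRIMER
)
print(left_start, right_start, amplicon_length)
```

Under those assumptions the positions found agree with the `start` values in the primer table, and the amplicon length follows as `right_start - left_start + 1`.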

XENTR_CHORI-219_ID.25 (hoxa3)

  • Only cut XENTR - BccI(136), BfiI(110), BglII(21), BstEII(140), Eco57I(72), Eco57MI(72), MboII(78), MmeI(115), PflMI(134), PleI(118), Tsp45I(99), TspDTI(59), TspRI(123,178)
  • Only cut XENLA - BciVI(136,167), BspMI(156), EcoP15I(193), HphI(136), MslI(105), NspI(157), SacI(44,122), SphI(157), TspGWI(157)
  • PCR primer
OLIGO            start  len      tm     gc%   any    3' seq 
LEFT PRIMER        161   20   60.21   50.00  7.00  2.00 cctcaccgagaggcaaatta
RIGHT PRIMER       351   20   59.68   60.00  5.00  3.00 gaggctcataggggacactg
XENLA_CH219-206K   1 + gtgagagctgtgcaggagacaaaagtcccccggggcaatcctcttccaaaagggcccgcactgc 64
XENTR_NM_0011274   1 + gtgagagttgtgctggagacaaaagccccccggggcaatcctcttccaagagggcccgcactgc 64
                       ******* ***** *********** *********************** **************

XENLA_CH219-206K  65 + ttacacaagcgctcagctggtagaactggaaaaagagttccactttaacagatacctgtgcaga 128
XENTR_NM_0011274  65 + ttacacaagcgctcagctggtagaactggaaaaagagttccactttaacagatacctgtgcaga 128
                       ****************************************************************

XENLA_CH219-206K 129 + cccaggagggtggagatggccaatctactcaacctcaccgagaggcaaattaagatctggtttc 192
XENTR_NM_0011274 129 + cccaggagggtggagatggccaatctgctcaacctcaccgagaggcaaattaagatctggtttc 192
                       ************************** *************************************
                                                       >>>>>>>>>>>>>>>>>>>>
XENLA_CH219-206K 193 + agaacaggcgaatgaaatacaaaaaggatcaaaaagggaaatccatgatgacctcttcaggagg 256
XENTR_NM_0011274 193 + agaacaggcgaatgaaatacaaaaaggatcaaaaagggaaatccatgatgacctcttcaggagg 256
                       ****************************************************************

XENLA_CH219-206K 257 + acagtcaccatgtaggagcccggtgccagctccatctgttggaggttacctaaactctatgcat 320
XENTR_NM_0011274 257 + gcagtcaccatgtaggagcccagtgccgactccatctgttggaggttacctaaactctatgcat 320
                        ******************** *****  ***********************************

XENLA_CH219-206K 321 + tctttggtaaacagtgtcccctatgagcctcagtctccccctg 363(1)
XENTR_NM_0011274 321 + tctttggtaaacagtgtcccctatgagcctcagtctcccccag 363(1)
                       ***************************************** *
                                  <<<<<<<<<<<<<<<<<<<<

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.25.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.25.muscle)

XENTR_CHORI-219_ID.28 (pitx-1)

  • Only cut XENTR - SfaNI(46), TauI(52)
  • Only cut XENLA - BciVI(78), FauI(42), Hpy99I(72)
  • PCR primer
OLIGO            start  len      tm     gc%   any    3' seq 
LEFT PRIMER         42   20   60.12   55.00  5.00  2.00 gaaccagcagatggacctgt
RIGHT PRIMER       194   20   59.84   50.00  4.00  0.00 aaggtgaagctcttggtgga
XENLA_CH219-166K   1 + tggttcaagaaccgccgagccaagtggaggaagagggagcggaaccagcagatggacctgtgta 64
XENTR_NM_0010074   1 + tggttcaagaaccgcagagccaagtggaggaagagggagcggaaccagcagatggacctgtgca 64
                       *************** ********************************************** *
                                                               >>>>>>>>>>>>>>>>>>>
XENLA_CH219-166K  65 + agaatggttacgtgccccagttcagcgggctcatgcagccgtacgacgagatgtacgcaggata 128
XENTR_NM_0010074  65 + agaatggctacgtgccccagttcagcggcctgatgcagccctacgatgagatgtacgctggcta 128
                       ******* ******************** ** ******** ***** *********** ** **

XENLA_CH219-166K 129 + cccctacaacaactgggccacaaaaagcctcacccctgcccccctgtccaccaagagcttcacc 192
XENTR_NM_0010074 129 + cccgtacaacaactgggccaccaaaagcctcacccctgcccccctgtccaccaagagcttcacc 192
                       *** ***************** ******************************************
                                                                     <<<<<<<<<<<<<<<<<<
XENLA_CH219-166K 193 + ttcttcaactccatgagtcccttgtcttcccagtccatgttctccggccccagctccatctctt 256
XENTR_NM_0010074 193 + ttcttcaactccatgagtccgctgtcctcccagtccatgttctctggccccagctccatctctt 256
                       ********************  **** ***************** *******************
                       <<
XENLA_CH219-166K 257 + ccatgagcatgccctccagcatgggtcactctgcggtgccaggcatggccaactc 311(1)
XENTR_NM_0010074 257 + ccatgagcatgccctccagcatgggccactcggcggtgcccggcatgcccaactc 311(1)
                       ************************* ***** ******** ****** *******

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.28.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.28.muscle)
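
The `*` lines in the alignments mark identical columns, and the gaps in those lines are the species-specific differences that make identification possible. As a rough illustration, the Python sketch below lists the mismatch positions for the first 64 columns of the ID.28 (pitx-1) alignment. The helper name `mismatch_positions` is hypothetical, and the sketch assumes both rows are already aligned to equal length.

```python
# Sketch: list the columns where two aligned sequences differ.

def mismatch_positions(seq_a, seq_b, offset=1):
    """Return positions (starting at `offset`) where the aligned rows differ.

    Gap characters are compared like any other character, so a gap
    opposite a base is reported as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return [i + offset for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]

# Columns 1-64 of the ID.28 alignment, copied from above.
XENLA = "tggttcaagaaccgccgagccaagtggaggaagagggagcggaaccagcagatggacctgtgta"
XENTR = "tggttcaagaaccgcagagccaagtggaggaagagggagcggaaccagcagatggacctgtgca"

positions = mismatch_positions(XENLA, XENTR)
identity = 100.0 * (len(XENLA) - len(positions)) / len(XENLA)
print(positions, round(identity, 2))
```

The reported positions correspond to the blanks in the `*` line of that block. Passing `offset=65` (and so on) would give coordinates for the later blocks.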

Candidates

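XENTR_CHORI-219_ID.3 (hoxa5)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.3.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.3.muscle)

XENTR_CHORI-219_ID.24 (hoxa3)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.24.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.24.muscle)

XENTR_CHORI-219_ID.26 (hoxa3)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.26.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.26.muscle)

XENTR_CHORI-219_ID.27 (pitx-1)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.27.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.27.muscle)

XENTR_CHORI-219_ID.29 (pitx-1)

  • FASTA file (xdata:ID/XENTR_CHORI-219_ID.29.fasta)
  • MUSCLE file (xdata:ID/XENTR_CHORI-219_ID.29.muscle)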

Selection procedure

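  • Download X. tropicalis mRNA sequences from XenBase (Nov. 27, 2009 version).
    • xdata:ID/XENTR_mRNA.xenbase20091127.fasta.gz, 17 MB, gzipped.
  • Download CHORI-219 sequences (from NCBI GenBank).
    • xdata:ID/XENLA_CH219.fasta.gz, 6.5 MB, gzipped. (CHORI-219 sequences: 29 BAC sequences from the X. laevis genome.)
  • Run BLAT (version 3.4, with default options) against the known CHORI BAC sequences.
    • xdata:ID/XENTR_mRNA.XENLA_CH219.blat_pslx.gz, 1.2 MB, gzipped.
 blat XENLA_CH219.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENLA_CH219.blat_pslx -out=pslx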
  • Parse the BLAT output file with the following criteria.
    1. From the X. tropicalis mRNAs, only RefSeq sequences (accessions starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences that hit CHORI-219 (a match must be at least 200 bp long to be called a 'hit'). Only the 10 CHORI-219 BACs that we already know to be available are considered ('74I8','204L9','197E3','71P23','36I4','35I18','262A22','20I13','206K7','166K18').
    3. Survey each hit block. If a hit block is shorter than 200 bp, discard it. 42 hit blocks from 8 mRNAs are selected.
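      • NM_001004837 Unnamed, predicted gene MGC69309 (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/52345577, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=5903347)
      • NM_001007499 paired-like homeodomain 1 (pitx-1) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/55926079, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=485440)
      • NM_001011405 Homeobox A5 (hoxa5) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/58332665, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=486060)
      • NM_001035121 CCAAT/enhancer binding protein (C/EBP), beta (cebpb) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/78042600, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=479778)
      • NM_001113032 LY6/PLAUR domain containing 6 (lypd6) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/163915026, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=954350)
      • NM_001127429 homeobox A3 (hoxa3) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/188528950, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=482266)
      • NM_001129937 SRY (sex determining region Y)-box 18 (sox18) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/194018645, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=483232)
      • NM_001142220 phosphoribosylformylglycinamidine synthase (pfas) (NCBI: http://www.ncbi.nlm.nih.gov/nuccore/213983084, XenBase: http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=6001437)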
  • Run MUSCLE (version 4.0, with default options) for multiple sequence alignment. I found that most of the candidate sequences contain internal duplications, so I filtered them out based on the 'Self' mapping information in the MUSCLE output (a sequence is discarded if it maps to itself over more than 50 bp). Finally, 10 fragments are selected.
    • hoxa3 - XENTR_CHORI-219_ID.24.muscle, XENTR_CHORI-219_ID.25.muscle, XENTR_CHORI-219_ID.26.muscle
    • pitx-1 - XENTR_CHORI-219_ID.27.muscle, XENTR_CHORI-219_ID.28.muscle, XENTR_CHORI-219_ID.29.muscle
    • hoxa5 - XENTR_CHORI-219_ID.2.muscle, XENTR_CHORI-219_ID.3.muscle
 $ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle 
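
The BLAT parsing criteria listed above (RefSeq queries only, available BACs only, 200 bp minimum) can be sketched in Python as follows. This is a hedged illustration, not the script actually used: it relies only on the standard tab-separated PSL column order (matches in column 1, query name in column 10, target name in column 14, block sizes in column 19). It also assumes that query names contain the 'NM_' accession and that target names end in '-' followed by the BAC identifier (for example `XENLA_CH219-206K7`). Both naming conventions are guesses based on the truncated labels shown in the alignments.

```python
# Sketch: apply the three selection criteria to BLAT PSL/PSLX lines.

AVAILABLE_BACS = {
    "74I8", "204L9", "197E3", "71P23", "36I4",
    "35I18", "262A22", "20I13", "206K7", "166K18",
}
MIN_LENGTH = 200  # bp, used for both hits and individual hit blocks

def parse_psl_line(line):
    """Extract the fields needed for filtering from one PSL data line."""
    fields = line.rstrip("\n").split("\t")
    return {
        "matches": int(fields[0]),
        "q_name": fields[9],
        "t_name": fields[13],
        # blockSizes is a comma-terminated list, e.g. "333,117,"
        "block_sizes": [int(x) for x in fields[18].rstrip(",").split(",")],
    }

def select_hit_blocks(lines):
    """Return (query, target, block_size) for every block passing the criteria."""
    selected = []
    for line in lines:
        if not line[:1].isdigit():      # skip PSL header and blank lines
            continue
        hit = parse_psl_line(line)
        if "NM_" not in hit["q_name"]:  # criterion 1: RefSeq mRNAs only
            continue
        bac_id = hit["t_name"].rsplit("-", 1)[-1]  # assumed naming scheme
        if bac_id not in AVAILABLE_BACS or hit["matches"] < MIN_LENGTH:
            continue                    # criterion 2: available BAC, >= 200 bp
        for size in hit["block_sizes"]:
            if size >= MIN_LENGTH:      # criterion 3: drop short blocks
                selected.append((hit["q_name"], hit["t_name"], size))
    return selected
```

A typical call would pass an open file handle, for example `select_hit_blocks(open("XENTR_mRNA.XENLA_CH219.blat_pslx"))` after unzipping the BLAT output.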

Here is an example of self-duplication.

>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|

  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201

 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
      |||||||||||||||||||||||||||||||||  |||||||||||||
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
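
The filter described above used the 'Self' mapping reported in the MUSCLE output. As a simplified stand-in, and not the method actually used, the Python sketch below scans a sequence for its longest exact internal repeat and flags it when the repeat is longer than the 50 bp threshold. Unlike the MUSCLE mapping shown above, it does not tolerate mismatches, and it lets the two copies overlap. The function names are hypothetical.

```python
# Sketch: flag sequences that contain a long exact internal repeat.

MIN_SELF_MATCH = 50  # bp threshold taken from the text

def longest_self_repeat(seq):
    """Return (length, start_a, start_b), 1-based, of the longest exact
    repeat between two different positions of the same sequence."""
    n = len(seq)
    best = (0, 0, 0)
    for shift in range(1, n):            # compare seq with itself shifted
        run = 0
        for i in range(n - shift):
            if seq[i] == seq[i + shift]:
                run += 1
                if run > best[0]:
                    start = i - run + 1
                    best = (run, start + 1, start + shift + 1)
            else:
                run = 0
    return best

def is_self_duplicated(seq, min_len=MIN_SELF_MATCH):
    """True if the longest exact self-repeat is longer than min_len."""
    return longest_self_repeat(seq)[0] > min_len
```

The scan is quadratic in sequence length, which is acceptable for fragments of a few hundred bases like the ones listed here.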