Difference between revisions of "Texas Xenopus Genome Project/Species Identification"

From Marcotte Lab
Line 28: Line 28:
 
*#* NM_001142220 phosphoribosylformylglycinamidine synthase (pfas) [http://www.ncbi.nlm.nih.gov/nuccore/213983084 NCBI][http://www.xenbase.org/gene/showgene.do?method=displayGeneSummary&geneId=6001437 XenBase]
 
  
 
+
* Run MUSCLE (version 4.0, with default options) for multiple sequence alignment. Most of the candidate sequences turned out to contain duplications, so I filtered them out based on the 'Self' mapping information in the MUSCLE output (a sequence is discarded if it maps to itself over more than 50 bp). Examples of self-duplication:
 
+
** hoxa3 - XENTR_CHORI-219_ID.24.muscle, XENTR_CHORI-219_ID.25.muscle, XENTR_CHORI-219_ID.26.muscle
 
+
** pitx-1 - XENTR_CHORI-219_ID.27.muscle, XENTR_CHORI-219_ID.28.muscle, XENTR_CHORI-219_ID.29.muscle
* Run MUSCLE (version 4.0, with default options) for multiple sequence alignment. Interestingly, the sequence from CHORI-216 differs somewhat from both the XENTR_mRNA and the CHORI-219 fragment.
+
** hoxa5 - XENTR_CHORI-219_ID.2.muscle, XENTR_CHORI-219_ID.3.muscle
 
<pre> $ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle </pre>
 
 
<pre>
 
<pre>
XENLA_CH219-20I1  1 + ttattt----------------------gtgccctggatacccctggaactatagcagggtgac 42
+
>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|
XENTR_NM_0011422  1 + ttattt----------------------gtgccctgggtacccctggaactatagcggggtgac 42
+
XENTR_CH216-2E23  1 + tcaccccaaatccccccctaactggccttcaggctgggcccccttag-ctcataacaaggttac 63
+
                      *.*...                      .....****...***.*.**...***.*..***.**
+
 
+
XENLA_CH219-20I1  43 + tgttaccccaatgtttctatatatctgtaaccttgttattagct-aagggggcccagtctgaag 105
+
XENTR_NM_0011422  43 + tgttaccccaatgtttctatatatctgtaaccttgttatgggct-aagggggcccagcctgaag 105
+
XENTR_CH216-2E23  64 + agatatatagaaacattggggtaacagtcaccccgctatagttccaggggtacccagggc---- 123
+
                      .*.**.....*....*.....**.*.**.***..*.*** .... *.***..***** ..****
+
 
+
XENLA_CH219-20I1 106 + gtcagttagggggagatttggggtgagggcttatttg-----taccctgggtacccctggaact 164
+
XENTR_NM_0011422 106 + gccagttagggggggatttggggtgagtgcttatttg-----tgccctgggtacccctggaact 164
+
XENTR_CH216-2E23 124 + -acaaataagcactcaccccaaatcatcccctaactggccttcaggctgggcccc-cttagccc 185
+
                      * **..**.*... .*.......*.*. .*.**..**    ....*****..*****....*.
+
 
+
XENLA_CH219-20I1 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgagctaa-gggg 227
+
XENTR_NM_0011422 165 + atagcagggtgactgttaccccaatgtttctatatatctgtaaccttgttatgggctaa-gggg 227
+
XENTR_CH216-2E23 186 + ataacaaggttacagatatatagaaacattggggtaacagtcaccccgctatagttccaggggt 249
+
                      ***.**.***.**.*.**.....*....*.....**.*.**.***..*.***......* ***.
+
 
+
XENLA_CH219-20I1 228 + gcccagtctgaaggccagttagggggagatatggggtgagtgtttatttgtgccctggttaccc 291
+
XENTR_NM_0011422 228 + gcccagcctgaaggccagttagggggggatttggggtgagtgcttatttgtgccctgggtaccc 291
+
XENTR_CH216-2E23 250 + acccagggca---------------caaataagcact----------------------caccc 276
+
                      .***** ...***************...**..*...****** *************** .****
+
 
+
XENLA_CH219-20I1 292 + ctggaactatagcagggtgac 312(341)
+
XENTR_NM_0011422 292 + ctggaactatagcagggtgac 312(341)
+
XENTR_CH216-2E23 277 + c---------------aaatc 282(341)
+
                      ****************....*
+
</pre>
+
 
+
* Run MUSCLE again, with only the XENTR_mRNA and CHORI-219 sequences.
+
<pre>
+
XENLA_CH219-20I1  1 + ttatttgtgccctggatacccctggaactatagcagggtgactgttaccccaatgtttctatat 64
+
XENTR_NM_0011422  1 + ttatttgtgccctgggtacccctggaactatagcggggtgactgttaccccaatgtttctatat 64
+
                      *************** ****************** *****************************
+
 
+
XENLA_CH219-20I1  65 + atctgtaaccttgttattagctaagggggcccagtctgaaggtcagttagggggagatttgggg 128
+
XENTR_NM_0011422  65 + atctgtaaccttgttatgggctaagggggcccagcctgaaggccagttagggggggatttgggg 128
+
                      *****************  *************** ******* *********** *********
+
 
+
XENLA_CH219-20I1 129 + tgagggcttatttgtaccctgggtacccctggaactatagcagggtgactgttaccccaatgtt 192
+
XENTR_NM_0011422 129 + tgagtgcttatttgtgccctgggtacccctggaactatagcagggtgactgttaccccaatgtt 192
+
                      **** ********** ************************************************
+
  
XENLA_CH219-20I1 193 + tctatatatctgtaaccttgttatgagctaagggggcccagtctgaaggccagttagggggaga 256
+
  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
XENTR_NM_0011422 193 + tctatatatctgtaaccttgttatgggctaagggggcccagcctgaaggccagttaggggggga 256
+
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
                      ************************* *************** ******************* **
+
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201
  
XENLA_CH219-20I1 257 + tatggggtgagtgtttatttgtgccctggttacccctggaactatagcagggtgac 312(1)
+
67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
XENTR_NM_0011422 257 + tttggggtgagtgcttatttgtgccctgggtacccctggaactatagcagggtgac 312(1)
+
      |||||||||||||||||||||||||||||||||  |||||||||||||
                      * *********** *************** **************************
+
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
 
</pre>
 
</pre>
* However, it turns out that they are highly repetitive (~135 bp unit). See the 1st, 3rd and 5th lines (or the 2nd and 4th lines) of each sequence.
 
  
 
----
 
 
[[Category:XenopusGenome]]
 

Revision as of 14:40, 9 December 2009

Target gene

Selection procedure

  • Download CHORI-219 sequences (from NCBI GenBank).
 blat XENLA_CH219.fasta XENTR_mRNA.xenbase20091127.fasta XENTR_mRNA.XENLA_CH219.blat_pslx -out=pslx
  • Parse two BLAT output files with the following criteria.
    1. Among X. tropicalis mRNAs, only RefSeq sequences (IDs starting with 'NM_') are considered.
    2. Select X. tropicalis mRNA sequences which hit both CHORI-219 (a 'hit' requires a minimum match length of 200 bp). Only the 10 CHORI-219 BACs already known to be available are considered ('74I8','204L9','197E3','71P23','36I4','35I18','262A22','20I13','206K7','166K18').
    3. Survey each hit block and discard any block shorter than 200 bp. This leaves 42 hit blocks from 8 mRNAs:
      • NM_001004837 Unnamed, predicted gene MGC69309 NCBI XenBase
      • NM_001007499 paired-like homeodomain 1 (pitx-1) NCBI XenBase
      • NM_001011405 Homeobox A5 (hoxa5) NCBI XenBase
      • NM_001035121 CCAAT/enhancer binding protein (C/EBP), beta (cebpb) NCBI XenBase
      • NM_001113032 LY6/PLAUR domain containing 6 (lypd6) NCBI XenBase
      • NM_001127429 homeobox A3 (hoxa3) NCBI XenBase
      • NM_001129937 SRY (sex determining region Y)-box 18 (sox18) NCBI XenBase
      • NM_001142220 phosphoribosylformylglycinamidine synthase (pfas) NCBI XenBase
  • Run MUSCLE (version 4.0, with default options) for multiple sequence alignment. Most of the candidate sequences turned out to contain duplications, so I filtered them out based on the 'Self' mapping information in the MUSCLE output (a sequence is discarded if it maps to itself over more than 50 bp). Examples of self-duplication:
    • hoxa3 - XENTR_CHORI-219_ID.24.muscle, XENTR_CHORI-219_ID.25.muscle, XENTR_CHORI-219_ID.26.muscle
    • pitx-1 - XENTR_CHORI-219_ID.27.muscle, XENTR_CHORI-219_ID.28.muscle, XENTR_CHORI-219_ID.29.muscle
    • hoxa5 - XENTR_CHORI-219_ID.2.muscle, XENTR_CHORI-219_ID.3.muscle
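The per-hit BLAT filters in the selection procedure above (RefSeq only, hits of at least 200 bp, the 10 available BACs, blocks of at least 200 bp) could be sketched roughly as follows. This is not the script actually used for this page. It assumes the standard PSL column layout, that the mRNA is the BLAT query and the BAC is the target (as in the blat command above), that 'match length' means the PSL matches column, and that BAC names look like 'XENLA_CH219-20I13'. The helper names bac_id and select_blocks are hypothetical.

```python
import re

# Standard PSL column indices (PSLX only appends two sequence columns).
MATCHES, Q_NAME, T_NAME, BLOCK_SIZES = 0, 9, 13, 18

MIN_LEN = 200  # minimum length for a hit and for each hit block
KNOWN_BACS = {'74I8', '204L9', '197E3', '71P23', '36I4',
              '35I18', '262A22', '20I13', '206K7', '166K18'}


def bac_id(t_name):
    """Hypothetical helper: extract '20I13' from a name like 'XENLA_CH219-20I13'.

    The naming scheme is an assumption, not taken from the real FASTA files.
    """
    m = re.search(r'CH219-(\d+[A-Z]\d+)', t_name)
    return m.group(1) if m else None


def select_blocks(psl_lines):
    """Return {mRNA name: [sizes of blocks >= MIN_LEN]} for hits passing the filters."""
    selected = {}
    for line in psl_lines:
        cols = line.rstrip('\n').split('\t')
        # Skip the PSL header and any line that is not an alignment record.
        if len(cols) < 21 or not cols[MATCHES].isdigit():
            continue
        # Criterion 1: RefSeq mRNAs only (assumes 'NM_' appears in the query name).
        if 'NM_' not in cols[Q_NAME]:
            continue
        # Criterion 2: one of the 10 available BACs, and at least MIN_LEN matching bases.
        if bac_id(cols[T_NAME]) not in KNOWN_BACS:
            continue
        if int(cols[MATCHES]) < MIN_LEN:
            continue
        # Criterion 3: keep only blocks of at least MIN_LEN bp.
        blocks = [int(s) for s in cols[BLOCK_SIZES].split(',') if s]
        kept = [b for b in blocks if b >= MIN_LEN]
        if kept:
            selected.setdefault(cols[Q_NAME], []).extend(kept)
    return selected
```

Usage would be along the lines of select_blocks(open('XENTR_mRNA.XENLA_CH219.blat_pslx')). The sketch does not implement the 'hit both' requirement of criterion 2, which needs the results of the two BLAT output files to be intersected afterwards.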
 $ mus4 -i XENTR_CHORI.fasta -o XENTR_CHORI.muscle 
>XENTR_NM_001113032_18 gi|163915026|ref|NM_001113032|

  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66
      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201

 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114
      |||||||||||||||||||||||||||||||||  |||||||||||||
202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249
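
As a rough check on the self-hit shown above, the offset between the two copies and their identity can be computed directly from the alignment rows. This is only a sketch: the row format (start, strand, sequence, end) is read off the output above, the two rows of each block are assumed to be gap-free and of equal length as they are here, and the names parse_rows and self_hit_summary are hypothetical.

```python
import re

# One alignment row: start coordinate, strand, sequence, end coordinate.
ROW = re.compile(r'^\s*(\d+)\s+[+-]\s+([A-Za-z-]+)\s+(\d+)')

# The self-hit for XENTR_NM_001113032_18 shown above.
EXAMPLE = "\n".join([
    "  3 + ccaggggtacccagggcacaaataagcactcaccccaaatccccccctaactggccttcaggct 66",
    "      ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||",
    "138 + ccaggggtacccagggcacaaataagcactcaccccaaatctccccctaactggccttcaggct 201",
    "",
    " 67 + gggcccccttagcccataacaaggttacagatagttagaaacattggg 114",
    "      |||||||||||||||||||||||||||||||||  |||||||||||||",
    "202 + gggcccccttagcccataacaaggttacagatatatagaaacattggg 249",
])


def parse_rows(text):
    """Return (start, sequence, end) for every sequence row; match lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return rows


def self_hit_summary(text):
    """Return (set of copy offsets, aligned length, identical positions)."""
    rows = parse_rows(text)
    # Rows alternate: upstream copy, then downstream copy of the same block.
    pairs = list(zip(rows[0::2], rows[1::2]))
    offsets = {down[0] - up[0] for up, down in pairs}
    aligned = sum(len(up[1]) for up, _ in pairs)
    identical = sum(a == b for up, down in pairs for a, b in zip(up[1], down[1]))
    return offsets, aligned, identical


print(self_hit_summary(EXAMPLE))
```

For the two blocks above, both copies are offset by 135 bp, and 109 of the 112 aligned positions are identical. The self-hit is therefore well over the 50 bp cutoff used for filtering.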