TXGP ens63 reference

From Marcotte Lab

Revision as of 10:33, 13 October 2011

Overview

One of the most interesting questions we can ask about the X. laevis genome is how many genes it has. To construct gene models, we are mainly using a de novo transcriptome assembly approach on our RNA-seq data. However, de novo transcriptome assembly programs generate many 'false positive' transcripts. Also, because X. laevis is allotetraploid, the transcriptome data may contain many transcript variants for each gene. So, to estimate gene models from transcriptome data accurately, we would like to group all transcript candidates for each gene together and analyze each group separately. Sequence-based clustering is a natural way to do this, but we need to optimize its parameters, such as the %identity threshold used to define a cluster. To get some idea of suitable values, we have looked at the genes and transcripts of several well-studied organisms.

Genes & Transcripts

This figure shows the total number of genes and transcripts in each organism, based on the EnsEMBL v.63 annotation. The number on top of each green bar is the total number of transcripts, and the number on top of each blue bar is the total number of genes. The number on top of each cyan bar is the number of genes that have only one transcript.

http://www.marcottelab.org/users/XenopusData/ens63/ens63_gene_tx.small.png
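
The three quantities in this figure (transcripts, genes, and single-transcript genes) can be tallied from a gene-to-transcript table. The sketch below assumes the annotation has already been reduced to (gene ID, transcript ID) pairs, for example exported from an EnsEMBL annotation; the input format and the function name `summarize_annotation` are hypothetical and not part of the original analysis.

```python
from collections import defaultdict


def summarize_annotation(pairs):
    """Count genes, transcripts, and single-transcript genes.

    `pairs` is an iterable of (gene_id, transcript_id) tuples, one per
    annotated transcript (hypothetical input format).
    """
    # Group transcript IDs by the gene they belong to.
    tx_per_gene = defaultdict(set)
    for gene_id, tx_id in pairs:
        tx_per_gene[gene_id].add(tx_id)

    n_genes = len(tx_per_gene)
    n_transcripts = sum(len(txs) for txs in tx_per_gene.values())
    # Genes with exactly one annotated transcript (the cyan bars).
    n_single = sum(1 for txs in tx_per_gene.values() if len(txs) == 1)
    return {
        "genes": n_genes,
        "transcripts": n_transcripts,
        "single_transcript_genes": n_single,
    }


if __name__ == "__main__":
    # Toy annotation (hypothetical IDs): G1 has three transcripts,
    # G2 has one, G3 has two.
    toy = [
        ("G1", "T1"), ("G1", "T2"), ("G1", "T3"),
        ("G2", "T4"),
        ("G3", "T5"), ("G3", "T6"),
    ]
    print(summarize_annotation(toy))
```

Running the same tally per organism gives one set of green, blue, and cyan bar heights.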

Clustering of transcripts

ens63.gene_vs_clusters.uc090.small.png
ens63.gene_vs_clusters.uc080.small.png
ens63.gene_vs_clusters.uc070.small.png
ens63.gene_vs_clusters.uc060.small.png
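
As described in the overview, the key parameter for sequence-based clustering is the %identity threshold that defines a cluster. The suffixes uc090 to uc060 in the figure names appear to correspond to thresholds from 90% down to 60%, though that reading is an assumption. The sketch below illustrates greedy incremental clustering in the spirit of tools such as CD-HIT or UCLUST: sequences are visited longest-first, and each one joins the first cluster whose representative (centroid) it matches at or above the threshold, otherwise it starts a new cluster. It is not the pipeline used here. In particular, `identity()` uses Python's `difflib.SequenceMatcher.ratio()` as a crude stand-in for alignment-based %identity, and the function names and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity in [0, 1]; a stand-in, NOT alignment-based %identity."""
    # autojunk=False stops difflib from discarding frequent characters,
    # which would otherwise hit A/C/G/T in sequences of 200+ bases.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs, threshold):
    """Greedy incremental clustering (hypothetical helper).

    seqs: dict mapping transcript_id -> sequence.
    Returns a list of clusters, each a list of transcript IDs; the first
    ID in each list is the cluster's centroid.
    """
    clusters = []  # list of (centroid_sequence, member_ids)
    # Longest sequences first; ties broken by ID for a deterministic order.
    order = sorted(seqs, key=lambda tid: (-len(seqs[tid]), tid))
    for tid in order:
        seq = seqs[tid]
        for centroid_seq, members in clusters:
            # Compare only against centroids, not every member.
            if identity(seq, centroid_seq) >= threshold:
                members.append(tid)
                break
        else:
            # No centroid was similar enough: seed a new cluster.
            clusters.append((seq, [tid]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    # Toy transcripts (hypothetical); txA, txB and txD share a long prefix.
    toy = {
        "txA": "ACGTACGTAC",
        "txB": "ACGTACGTAA",
        "txC": "TTTTTTTTTT",
        "txD": "ACGTACGTTT",
    }
    for threshold in (0.9, 0.8, 0.7, 0.6):
        clusters = greedy_cluster(toy, threshold)
        print(f"{threshold:.0%} identity: {len(clusters)} clusters {clusters}")
```

In the toy run, lowering the threshold from 90% to 80% merges txD into txA's cluster. Comparing cluster counts with annotated gene counts at several thresholds is one way to judge which threshold best reproduces known gene boundaries.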