All of these files are derived from XenBase (downloaded on May 1, 2011).
Version 1. RefSeq of cDNA & protein
XENLA_cDNA_ref.v1.fasta (8,879 sequences)
- Used XenBase files: NcbiMrnaXenbaseGene_laevis.txt, xlaevisMRNA.fasta
XENLA_prot_ref.v1.fasta (8,878 sequences; the 'taf5' protein is not annotated as RefSeq, although its corresponding mRNA sequence is.)
- Used XenBase files: NcbiProteinXenbaseGene_laevis.txt, xlaevisProtein.fasta
- Read the gene name for each NCBI id from the 'Ncbi...' file. Filter out genes with 'unnamed' in the gene name field.
- Read all sequences from the '.fasta' file. Convert all sequence characters to upper case.
- If a sequence has a '>gi|<gi number>|ref|<genbank accession>' header (meaning it is a RefSeq entry), write it to the output file.
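
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the script that produced the files. It assumes the 'Ncbi...' file is tab-separated with the NCBI id in the first column and the gene name in the second, that the id may match either the gi number or the accession in the FASTA header, and that only sequences with a retained (non-'unnamed') gene name are kept. The function names are hypothetical.

```python
import re

# Matches RefSeq-style headers such as '>gi|12345|ref|NM_000001.1| ...'.
REFSEQ_HEADER = re.compile(r"^>gi\|(\d+)\|ref\|([^|\s]+)")


def read_gene_names(lines, id_col=0, name_col=1):
    """Map NCBI id -> gene name, dropping names that contain 'unnamed'.

    The column positions are assumptions about the 'Ncbi...' file layout.
    """
    names = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(id_col, name_col):
            continue  # skip short or blank lines
        ncbi_id = fields[id_col].strip()
        gene = fields[name_col].strip()
        if not ncbi_id or "unnamed" in gene:
            continue
        names[ncbi_id] = gene
    return names


def read_fasta(lines):
    """Yield (header, sequence) pairs with the sequence in upper case."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).upper()
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).upper()


def select_refseq(fasta_lines, gene_names):
    """Return (gene, accession, sequence) for RefSeq entries with a kept gene name."""
    selected = []
    for header, seq in read_fasta(fasta_lines):
        match = REFSEQ_HEADER.match(header)
        if match is None:
            continue  # not a '>gi|...|ref|...' header, so not RefSeq
        gi, accession = match.groups()
        # Assumption: the NCBI id is either the gi number or the accession.
        gene = gene_names.get(gi) or gene_names.get(accession)
        if gene is None:
            continue  # gene was filtered out or has no name entry
        selected.append((gene, accession, seq))
    return selected
```

To apply it to the real files, pass open file handles in place of the line lists, for example `select_refseq(open('xlaevisMRNA.fasta'), read_gene_names(open('NcbiMrnaXenbaseGene_laevis.txt')))`, after checking that the column layout matches the assumption above.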