File Formats
Data files should be in text format, with all data fields separated by
tabs. (The program is sensitive to white space or tabs at the ends of lines,
so be sure these are not present.) The file format
is essentially what Microsoft Excel or other spreadsheets produce
when you save to text (tab-delimited) format. The first column of the data is
reserved for gene names, and the first line is reserved for basis set or experiment names.
The gene names (or numbers, etc.) must be present, must be unique to each gene,
and must match between the basis sets and experiments to be analyzed, because the
names are used to index the comparison between the data sets. Because data sets are compared by
gene name, the program tolerates missing genes (e.g. you can fit entire data
sets with basis sets derived from subsets of genes), and the gene names can be listed
in a different order in the basis set and experiment files. The basis set or experiment
names don't have to be unique; they are used only to help interpret the output.
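As a rough illustration of matching by gene name, the sketch below finds the genes shared by a basis set and an experiment when the two list their genes in different orders and do not cover the same genes. It is not the program's own code: the gene names, the values, and the helper shared_genes are all hypothetical.

```python
def shared_genes(basis_rows, experiment_rows):
    """Return the gene names present in both data sets, in basis-set order."""
    return [gene for gene in basis_rows if gene in experiment_rows]


# Hypothetical data: gene name -> expression values, one value per column.
basis = {
    "YAL001C": [0.5, 1.2],
    "YAL002W": [-0.3, 0.8],
    "YAL003W": [1.1, 0.0],
}
# The experiment lists genes in a different order and covers a different subset.
experiment = {
    "YAL003W": [0.9],
    "YAL001C": [0.4],
    "YBR100W": [2.0],
}

common = shared_genes(basis, experiment)
print(common)  # ['YAL001C', 'YAL003W']
```

Only the genes found in both files take part in the comparison, which is why the names must be unique and spelled identically in every file.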
The format should be compatible with files formatted for the program Cluster (M. Eisen), provided they
don't have the optional column after the gene names or the optional row after the experiment names. To
indicate missing data, enter -999 instead of the gene expression level.
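The rules above can be checked before a file is given to the program. The following is a minimal sketch of such a check, not the program's own reader: the function read_table and the sample data are hypothetical, and the sketch assumes that every field after the gene name is numeric and that the top-left header cell may hold any label.

```python
import io

MISSING = -999.0  # value the text above reserves for missing data


def read_table(handle):
    """Parse a tab-delimited table into (column_names, {gene: values}).

    Missing values (-999) are returned as None.
    """
    lines = handle.read().split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece left by the final newline
    if not lines:
        raise ValueError("empty file")
    for number, line in enumerate(lines, start=1):
        if line != line.rstrip(" \t"):
            raise ValueError(f"line {number}: trailing white space or tab")
    # The first line holds the basis set or experiment names.
    column_names = lines[0].split("\t")[1:]
    rows = {}
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        gene = fields[0]  # the first column holds the gene name
        if not gene:
            raise ValueError(f"line {number}: missing gene name")
        if gene in rows:
            raise ValueError(f"line {number}: duplicate gene name {gene!r}")
        values = [float(field) for field in fields[1:]]
        rows[gene] = [None if value == MISSING else value for value in values]
    return column_names, rows


# Hypothetical two-gene, two-experiment file.
sample = "Gene\tExp1\tExp2\nYAL001C\t0.5\t-999\nYAL002W\t-0.3\t0.8\n"
names, rows = read_table(io.StringIO(sample))
print(names)            # ['Exp1', 'Exp2']
print(rows["YAL001C"])  # [0.5, None]
```

Rejecting a file with a clear message is a design choice for this sketch; it makes a stray trailing tab easy to locate by line number.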