File Formats

Data files must be plain text, with all data fields separated by tabs. The program is sensitive to spaces or tabs at the ends of lines, so make sure none are present. This is essentially the format that Microsoft Excel and other spreadsheets produce when you save to text (tab-delimited) format.

The first column is reserved for gene names, and the first line is reserved for basis set or experiment names. The gene names (or numbers, etc.) must be present, must be unique to each gene, and must match between the basis set and experiment files being analyzed, because the names are used to index the comparison between the data sets. Because data sets are compared by gene name, the program tolerates missing genes (e.g. you can fit entire data sets with basis sets derived from subsets of genes), and the genes can be listed in a different order in the basis set and experiment files. The basis set and experiment names do not have to be unique; they are used only to help interpret the output.
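The rules above can be sketched as a small Python reader. This is a hypothetical illustration, not the program's own code: `read_table` and `align` are made-up names, the top-left cell of the first line is assumed to be a label and is ignored, and the -999 missing-value marker is not treated specially here (values are read as plain numbers).

```python
import csv


def read_table(text):
    """Parse tab-delimited text in the layout described above.

    Returns (column_names, {gene_name: [values]}). Assumption: the
    top-left header cell is only a label and is discarded.
    """
    lines = text.splitlines()
    # Reject trailing spaces/tabs, which the file format forbids.
    for n, line in enumerate(lines, start=1):
        if line != line.rstrip(" \t"):
            raise ValueError(f"line {n} has trailing whitespace")
    rows = list(csv.reader(lines, delimiter="\t"))
    header, body = rows[0], rows[1:]
    columns = header[1:]  # basis set / experiment names; need not be unique
    data = {}
    for row in body:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], [float(v) for v in row[1:]]
        if gene in data:
            raise ValueError(f"duplicate gene name: {gene}")
        data[gene] = values
    return columns, data


def align(basis, experiment):
    """Pair rows by gene name, in the experiment file's order.

    Genes present in only one of the two tables are skipped, so the
    files may list genes in different orders or cover different subsets.
    """
    shared = [g for g in experiment if g in basis]
    return shared, [basis[g] for g in shared], [experiment[g] for g in shared]
```

Keying each table by gene name is what makes row order irrelevant: matching is a dictionary lookup rather than a comparison of line positions.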
The format should be compatible with files formatted for the program Cluster (M. Eisen), provided they don't have the optional column after the gene names or the optional row after the experiment names. To indicate missing data, enter -999 instead of the gene expression level.
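A reader could map the -999 marker to an explicit "missing" value as it converts each field. The sketch below assumes Python, uses a hypothetical helper name `parse_values`, and compares numerically so that `-999` and `-999.0` are both recognized; how the program itself treats missing values during fitting is not described here.

```python
MISSING = -999  # missing-data marker from the file-format description


def parse_values(tokens):
    """Convert expression-level fields to floats, mapping -999 to None."""
    values = []
    for tok in tokens:
        x = float(tok)
        # Numeric comparison catches both "-999" and "-999.0".
        values.append(None if x == MISSING else x)
    return values
```

For example, `parse_values(["1.5", "-999", "0"])` gives `[1.5, None, 0.0]`.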
