CSHL Computational Genomics - Edward Marcotte - November 2013
This is the supporting web page for the interactive portion of the November 8, 2013 gene network lectures (full schedule). The tools linked here allow you to visualize and explore gene networks, and to use gene networks to form hypotheses about new genes in biological processes of interest.
A few papers relevant to the first lecture:
- A probabilistic view of gene function, by Fraser & Marcotte
- It's the machine that matters: predicting gene function and phenotype from protein networks, by Wang & Marcotte
- How to visually interpret biological data using networks, by Merico, Gfeller & Bader
- Systematic discovery of nonobvious human disease models through orthologous phenotypes, by McGary, Park, Woods, Cha, Wallingford, & Marcotte
A few supporting web links:
- Metabolic networks: the wall chart and the current state of the human metabolic reaction network. For other network types, see an older but still relevant review of transcriptional networks (the current record holder in this regard is ENCODE) and an early review of protein interaction extent and quality whose lessons still hold.
- Functional networks for many organisms have been pre-calculated and can be searched interactively for genes of interest. As an exercise, pick a gene or group of genes related to a pathway or disease in which you are currently interested. (Note that most web tools suggest a default set of genes to get you started if you prefer to use those instead.) Then, explore the web-based gene networks to predict new candidate genes for this pathway. The links below take you to several available web-based network tools.
- FunctionalNet, which links to human, worm, Arabidopsis, mouse and yeast gene networks. Not the prettiest website, but useful; it has helped my own group find genes for a wide variety of biological processes. A newer version of the yeast network has just been made available here. The key with each of these tools is to examine the cross-validated predictive performance in the ROC curve, which assesses how well the network captures your biological process of interest. The program withholds each of your input genes in turn and tries to predict it back from its network connections to the remaining genes. If the network recovers these hidden but known genes poorly, it is also unlikely to predict useful new candidate genes correctly. An area under the ROC curve (AUC) greater than 0.7 is worth paying attention to. An AUC of 0.5 is no better than random.
- STRING is available for many organisms, including large numbers of prokaryotes, and provides useful links to supporting datasets. Start a STRING search with a gene of interest, and interactively walk through the network. It is also quite useful for comparing gene synteny in prokaryotic chromosomes, centered on orthologs of a conserved gene of interest.
- GeneMania, which aggregates many individual gene networks.
- MouseFunc, a collection of network and classifier-based predictions of gene function from an open contest to predict gene function in the mouse.
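The leave-one-out check described for FunctionalNet can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the code behind any of these tools. Each gene is scored by a simple guilt-by-association rule: the sum of its edge weights to the seed genes. Each seed gene is scored with itself withheld, and every non-seed gene in the network is treated as a negative. The AUC is then the probability that a withheld seed outscores a non-seed gene, with ties counted as half. The function names build_network, score and loo_auc are hypothetical. FunctionalNet and the other tools may use different scores and negative sets.

```python
from itertools import product


def build_network(edges):
    """Symmetric weighted adjacency map from (gene_a, gene_b, weight) tuples."""
    net = {}
    for a, b, w in edges:
        net.setdefault(a, {})[b] = w
        net.setdefault(b, {})[a] = w
    return net


def score(net, gene, seeds):
    """Assumed guilt-by-association score: sum of edge weights from gene to seeds."""
    return sum(w for nbr, w in net.get(gene, {}).items() if nbr in seeds and nbr != gene)


def loo_auc(net, seeds):
    """Leave-one-out ROC AUC for recovering seed genes from network connectivity."""
    seeds = set(seeds)
    # Positives: each seed gene scored against the *other* seeds (it is held out).
    pos = [score(net, g, seeds - {g}) for g in seeds]
    # Negatives (assumption): every non-seed gene in the network, scored against all seeds.
    neg = [score(net, g, seeds) for g in net if g not in seeds]
    if not pos or not neg:
        raise ValueError("need at least one seed gene and one non-seed gene")
    # AUC = P(positive outscores negative) + 0.5 * P(tie)  (Mann-Whitney form).
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))
```

On a toy network in which genes A, B and C are densely linked to one another, loo_auc(net, {"A", "B", "C"}) returns 1.0. A scattered seed set such as {"A", "F"} scores far below 0.5, which is the pattern the 0.7 rule of thumb is meant to flag.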
- Network visualization
- By far the best interactive tool for network visualization is Cytoscape. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own tools (e.g., as used in YeastNet).
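To bring a network from one of these tools into Cytoscape, one simple route is a tab-delimited text file. Cytoscape can read SIF (simple interaction format) files, in which each line gives a source node, an interaction type and a target node. SIF carries no edge weights. For that reason, the sketch also writes a plain tab-separated table with a header row, and its weight column can be mapped to an edge attribute during Cytoscape's network-from-file import. The function names and the "fn" interaction label are hypothetical placeholders.

```python
def to_sif(edges, interaction="fn"):
    """Serialize (gene_a, gene_b) pairs as tab-delimited SIF: source, interaction, target."""
    lines = [f"{a}\t{interaction}\t{b}" for a, b in edges]
    return "\n".join(lines) + "\n"


def to_edge_table(weighted_edges):
    """Tab-separated edge table with a header; 'weight' can become an edge attribute on import."""
    rows = ["source\ttarget\tweight"]
    rows += [f"{a}\t{b}\t{w}" for a, b, w in weighted_edges]
    return "\n".join(rows) + "\n"
```

For example, write the output of to_sif(...) to a file named network.sif and open it in Cytoscape. The menu path for importing network files differs somewhat between Cytoscape versions.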
- Given a starting set of genes linked to some trait or biological process of interest, phenologs can be used to suggest additional candidate genes by leveraging phenotypic data in other organisms. You can search for phenologs here. I suggest getting started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.
- Alternatively, try searching for a phenotype of interest using the "phenotype finder". For example, try reconstructing the yeast model of angiogenesis by searching mouse phenotypes for "angiogenesis", then predicting phenologs for "abnormal angiogenesis".
- Phenologs rely, as do the network approaches linked above, on inferences from observations across organisms, and thus on proper calculation of gene orthologs. In general, these approaches benefit from orthology calculations with higher recall, even if precision suffers a bit. One good tool for discovering orthologs is InParanoid. As an exercise, try hunting for the ortholog of a human gene (perhaps MYRF, a recent favorite of mine) in a distant species (e.g., the nematode C. elegans). Hint: InParanoid annotation lags a bit, so you'll need to find the Ensembl protein ID, or try a text search for the older common name (e.g., at the time of writing, MYRF was still called C11orf9 there). In spite of the lag, it's a very useful program, and it can be downloaded and run locally on any genomes of interest.
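The phenolog idea ties these pieces together. Genes linked to a phenotype in a model organism are mapped to their human orthologs, and the result is tested for significant overlap with genes linked to a human disease. Overlapping genes support the phenolog, and the non-overlapping orthologs become the new candidates. The Python sketch below makes several simplifying assumptions. Orthologs are called as reciprocal best hits (RBH) from precomputed similarity scores. This is the seed step that InParanoid builds on, but InParanoid also clusters in-paralogs, so strict one-to-one RBH has lower recall. The overlap is scored with a hypergeometric tail probability, a standard choice for gene-set overlap. This is not the phenolog site's actual pipeline, and all function names are hypothetical.

```python
from math import comb


def best_hits(scores):
    """For {(query, subject): score}, keep each query's top-scoring subject (first wins on ties)."""
    best = {}
    for (q, s), sc in scores.items():
        if q not in best or sc > best[q][1]:
            best[q] = (s, sc)
    return {q: s for q, (s, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """One-to-one ortholog map a -> b where each gene is the other's best hit."""
    fwd, rev = best_hits(a_vs_b), best_hits(b_vs_a)
    return {a: b for a, b in fwd.items() if rev.get(b) == a}


def overlap_pvalue(universe_size, set_a, set_b):
    """Hypergeometric P(overlap >= observed) for two gene sets drawn from universe_size genes."""
    k = len(set_a & set_b)
    m, n = len(set_a), len(set_b)
    tail = sum(comb(m, i) * comb(universe_size - m, n - i) for i in range(k, min(m, n) + 1))
    return tail / comb(universe_size, n)


def phenolog(disease_genes, model_genes, orthologs):
    """Score a human disease against a model-organism phenotype via an ortholog map.

    orthologs maps model-organism gene -> human gene (e.g. from reciprocal_best_hits).
    Only genes with an ortholog in both species form the universe for the test.
    """
    universe = set(orthologs.values())
    human_disease = set(disease_genes) & universe
    model_as_human = {orthologs[g] for g in model_genes if g in orthologs}
    p = overlap_pvalue(len(universe), human_disease, model_as_human)
    # Candidate disease genes: phenotype orthologs not yet linked to the disease.
    candidates = sorted(model_as_human - human_disease)
    return p, candidates
```

Restricting the universe to genes with orthologs in both species keeps the test honest. Genes without an ortholog cannot contribute to the overlap, so including them would inflate the apparent significance. The same restriction shows why higher-recall orthology helps: every missed ortholog removes a gene that could have been tested or proposed as a candidate.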