CSHL YeastGenetics 2016
CSHL Yeast Genetics - Edward Marcotte - August 2016
Supporting web pages for the August 8, 2016 phenologs and humanizing yeast lectures. The tools linked here allow you to form hypotheses about new genes in biological processes of interest.
- The Saccharomyces Genome Database has set up a nice site (YeastMine) collating all of the yeast-human gene complementation studies. Search here for the set of human genes complementing or complemented by yeast genes. (Note that in our paper on humanizing yeast, we provide our complementation data for each strain/promoter/conditional allele tested.) SGD also maintains a list of Human genes with yeast homologs.
- Given a starting set of genes linked to some trait or biological process of interest, phenologs can be used to suggest additional candidate genes by leveraging phenotypic data in other organisms. You can search for phenologs here. I suggest getting started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.
- Alternatively, try searching for a phenotype of interest using the "phenotype finder". For example, try reconstructing the yeast model of angiogenesis by searching mouse phenotypes for "angiogenesis", then predicting phenologs for "abnormal angiogenesis".
- Phenologs rely, as do the network approaches linked below, on inferences from observations across organisms, and thus on proper calculation of gene orthologs. In general, these approaches benefit from orthology calculations that provide higher recall, even if precision suffers a bit. One good tool for discovering orthologs is InParanoid. As an exercise, try hunting for the ortholog of a human gene (perhaps MYRF, to pick a recent favorite of mine) in a distant species (e.g., the nematode C. elegans). Hint: InParanoid annotation lags a bit, so you'll need to find the Ensembl protein ID, or try a text search for the common name (e.g., at that time MYRF was still called C11orf9). In spite of the lag, it's a very useful program, and can be downloaded and run locally on any genomes of interest.
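The idea behind the phenolog search above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the phenologs server's code: it assumes genes have already been mapped to shared ortholog groups, it scores the overlap between a human disease gene set and a model-organism phenotype gene set with a hypergeometric test (one common choice for this kind of overlap), and the function names and ortholog-group IDs are hypothetical.

```python
from math import comb


def hypergeom_pvalue(overlap, n_disease, n_phenotype, n_total):
    """Probability of seeing at least `overlap` shared ortholog groups by chance."""
    upper = min(n_disease, n_phenotype)
    favorable = sum(
        comb(n_disease, k) * comb(n_total - n_disease, n_phenotype - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / comb(n_total, n_phenotype)


def phenolog_candidates(disease_genes, phenotype_genes, n_total):
    """Return (p-value, candidate ortholog groups) for one disease/phenotype pair."""
    disease_genes, phenotype_genes = set(disease_genes), set(phenotype_genes)
    shared = disease_genes & phenotype_genes
    p = hypergeom_pvalue(
        len(shared), len(disease_genes), len(phenotype_genes), n_total
    )
    # Genes behind the model-organism phenotype that are not yet linked to the
    # disease become the new candidates.
    candidates = sorted(phenotype_genes - disease_genes)
    return p, candidates


# Hypothetical ortholog-group IDs, not real data.
human_disease = {"og1", "og2", "og3", "og4"}
plant_phenotype = {"og2", "og3", "og4", "og9", "og10"}
p_value, new_candidates = phenolog_candidates(
    human_disease, plant_phenotype, n_total=1000
)
print(p_value, new_candidates)
```

A small p-value suggests the two phenotypes share more orthologs than expected by chance, in which case the remaining genes from the model-organism phenotype are reasonable candidates to test. In practice many phenotype pairs are compared, so some multiple-testing correction would be needed on top of this sketch.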
Some other useful web sites for yeast (and other species) gene function prediction/discovery
- Functional networks for many organisms, including yeast, have been pre-calculated and can be searched interactively for genes of interest. To try these out, pick a gene or group of genes related to a pathway or disease in which you are currently interested. (Note that most web tools suggest a default set of genes to get you started if you prefer to use those instead.) Then, explore the web-based gene networks to predict new candidate genes for this pathway. The links below take you to several available web-based network tools.
- FunctionalNet, which links to human, worm, Arabidopsis, mouse and yeast gene networks. Not the prettiest web site, but useful, and it has helped my own group find genes for a wide variety of biological processes. A newer version of the yeast network is available here. The key with each of these tools is to examine the cross-validated predictive performance in the ROC curve. This assesses how well the network captures your biological process of interest: the program withholds each of your input genes in turn and tries to predict it back based on network connectivity to the remaining genes. Poor performance on recovering the hidden, but known, genes means that the network is also unlikely to correctly predict useful new candidate genes. An area under the ROC curve (AUC) greater than 0.7 is worth paying attention to.
- STRING is available for many organisms, including many other fungi, and provides useful links to supporting datasets. Start a STRING search with a gene of interest, and interactively walk through the network.
- GeneMania, which aggregates many individual gene networks.
- Quite a few large-scale protein interaction mapping datasets are now available, and can be useful for building hypotheses about protein function based on the interactions observed. BioGRID is an excellent site to get you started exploring these data. Our own large-scale maps of metazoan protein complexes derived from ~6,000 co-fractionation/mass spectrometry experiments are available here. A projection of the data into yeast (and many other species) can be found under the PPI projections tab. Also, limited fractionation data for yeast are available in the download tab.
- By far the best interactive tool for network visualization is Cytoscape. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own tools (e.g., as used in YeastNet).
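The cross-validation described for the network tools above (withhold each input gene in turn, then try to predict it back from its connections to the remaining genes) can be sketched as follows. This is a minimal illustration, not the scoring used by FunctionalNet or any other tool: it assumes a weighted, undirected network, scores a gene by the sum of its edge weights to the seed genes, and uses a hypothetical toy network with made-up gene names.

```python
from collections import defaultdict


def build_network(edges):
    """Build a symmetric adjacency map from (gene_a, gene_b, weight) tuples."""
    network = defaultdict(dict)
    for a, b, weight in edges:
        network[a][b] = weight
        network[b][a] = weight
    return network


def neighbor_score(network, gene, seeds):
    """Sum of edge weights linking `gene` to the seed genes."""
    return sum(w for partner, w in network.get(gene, {}).items() if partner in seeds)


def leave_one_out_auc(network, seeds):
    """AUC for recovering each withheld seed gene ahead of non-seed genes."""
    seeds = set(seeds)
    non_seeds = set(network) - seeds
    if not seeds or not non_seeds:
        raise ValueError("need at least one seed gene and one non-seed gene")
    # Each seed is scored with itself withheld from the seed set.
    positives = [neighbor_score(network, g, seeds - {g}) for g in seeds]
    # Simplification: non-seed genes are scored against the full seed set.
    negatives = [neighbor_score(network, g, seeds) for g in non_seeds]
    # AUC is the chance that a withheld seed outranks a non-seed (ties count half).
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical toy network: genes A, B, C form a tightly linked module.
toy_edges = [
    ("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
    ("C", "D", 0.5), ("D", "E", 1.0), ("E", "F", 1.0),
]
toy_network = build_network(toy_edges)
auc_module = leave_one_out_auc(toy_network, {"A", "B", "C"})
auc_scattered = leave_one_out_auc(toy_network, {"A", "F"})
print(auc_module, auc_scattered)
```

A seed set that forms a connected module should score well, while seeds scattered across the network should not, which is the same signal the ROC curve gives you before you trust a tool's new predictions.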