BCH394P BCH364C 2024 (http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024), last revised 2024-03-27T23:37:41Z by Marcotte<br />
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 AM – 12:30 PM in WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<br />
<br />
'''Mar 28, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
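For a feel of what PCA computes, here is a minimal standard-library sketch (not the code from the linked tutorial). It centers a toy data set, builds the covariance matrix, and finds the first principal component by power iteration. Power iteration is swapped in here for the eigendecomposition/SVD that numerical libraries normally use, and the data values and function names are made up for illustration.<br />

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n = len(centered)
    d = len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalize.

    Converges to the eigenvector with the largest eigenvalue, i.e. the
    direction of greatest variance (assumes the start vector is not
    orthogonal to it).
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T cov v gives the variance along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

# Hypothetical toy data: two strongly correlated measurements per sample.
data = [[2.0, 1.9], [0.5, 0.7], [3.1, 3.0], [1.2, 1.0], [4.0, 4.2]]
centered = center(data)
cov = covariance(centered)
pc1, var1 = first_component(cov)

# Fraction of total variance captured by the first component.
explained = var1 / sum(cov[i][i] for i in range(len(cov)))

# Coordinates of each sample along the first component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print("PC1 direction:", pc1)
print("Fraction of variance explained:", explained)
```

Because the two toy measurements rise and fall together, nearly all of the variance ends up on the first component. For real data sets, a library routine (for example via numpy, as in the linked tutorial) is the more practical route.<br />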
<br />
<br />
<br />
'''Mar 26, 2024 - Classifiers'''<br />
* Science news of the day: [https://www.nytimes.com/2024/03/21/health/pig-kidney-organ-transplant.html Surgeons Transplant Pig Kidney Into a Patient A Medical Milestone] ([http://www.marcottelab.org/users/BCH394P_364C_2024/SurgeonsTransplantPigKidneyIntoaPatientAMedicalMilestone-TheNewYorkTimes.pdf pdf version])<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
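To make the k-NN idea from the review above concrete, here is a minimal standard-library sketch of a k-nearest-neighbor classifier. The expression values and the ALL/AML labels are invented toy data (loosely in the spirit of the leukemia example), and <code>knn_predict</code> is a hypothetical helper name, not part of any library. For real work, scikit-learn's <code>KNeighborsClassifier</code> is the more practical choice.<br />

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest training points.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: expression levels of two genes in six samples.
train_points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.1),
                (3.0, 3.2), (3.4, 2.9), (2.9, 3.5)]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))
print(knn_predict(train_points, train_labels, (3.1, 3.0)))
```

Using an odd k with two classes avoids tied votes. In practice, features should usually be put on comparable scales before computing distances.<br />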
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP], and [https://pair-code.github.io/understanding-umap/ an intuitive explanation of the methods]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with strong criticisms and very compelling reasons against relying exclusively on these methods for drawing conclusions about your data.<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins] (Note: a few of these proteins have "U" amino acids, which indicates selenocysteine. You can count it or ignore it, your choice.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12 & 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
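As a small illustration of how a Burrows–Wheeler index supports read mapping, here is a standard-library sketch that builds the BWT of a short text and counts exact pattern matches by backward search. It is deliberately naive: the transform sorts all rotations, and occurrence counts are recomputed by scanning the string, whereas real mappers such as BWA rely on suffix arrays and precomputed, sampled occurrence tables. The function names are hypothetical.<br />

```python
from collections import Counter

def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short texts)."""
    text += "$"  # end marker; sorts before letters in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_matches(bwt_string, pattern):
    """Count exact occurrences of pattern using backward search."""
    # first[c]: index in the sorted first column where character c starts.
    counts = Counter(bwt_string)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]

    def occ(c, i):
        # Occurrences of c in bwt_string[:i]; naive O(n) scan.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of matching rotations
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

genome = "GATTACAGATTACA"  # toy sequence for illustration
index = bwt(genome)
print(index)
print(count_matches(index, "TTAC"))
```

Each pattern character narrows a range of rows in the sorted rotation table, so the work per read depends on the read length rather than on the genome length (given fast occurrence lookups).<br />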
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
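The two meanings of "specificity" mentioned above are easy to mix up, so here is a short sketch computing both from a confusion matrix. The counts are invented for illustration (imagine nucleotides predicted as coding versus not coding), and <code>classification_metrics</code> is a hypothetical helper name.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive/negative counts."""
    return {
        # Sensitivity (recall): fraction of real positives that were found.
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate.
        "specificity": tn / (tn + fp),
        # Precision (PPV): fraction of predictions that are correct.
        # This is what the gene finding literature often calls "specificity".
        "precision": tp / (tp + fp),
    }

# Hypothetical counts: positives are rare, as coding bases are in a genome.
metrics = classification_metrics(tp=80, fp=40, tn=9860, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the true negative rate is above 0.99 while the precision is only about 0.67, which shows why it matters which definition a paper is using when negatives vastly outnumber positives.<br />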
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
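To connect position weight matrices with the entropy and sequence logo ideas above, here is a minimal standard-library sketch. It builds per-position base frequencies from a handful of aligned sites, computes the information content of each column (the quantity plotted as stack height in a sequence logo), and scores candidate sequences by log-odds against a uniform background. The sites, the pseudocount of 1, and the function names are illustrative assumptions. This is not a Gibbs sampler, only the matrix that such a sampler would be estimating.<br />

```python
import math

def column_frequencies(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    freqs = []
    for pos in range(len(sites[0])):
        # Pseudocounts keep unseen bases from getting zero probability.
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        freqs.append({base: counts[base] / total for base in alphabet})
    return freqs

def information_content(column):
    """Bits of information in one column: log2(alphabet size) - entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

def log_odds_score(freqs, sequence, background=0.25):
    """Sum of per-position log2(frequency / background) for a sequence."""
    return sum(math.log2(col[base] / background)
               for col, base in zip(freqs, sequence))

# Hypothetical aligned binding sites (invented for illustration).
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "TATACT"]
freqs = column_frequencies(sites)
bits = [information_content(col) for col in freqs]

print("Information per position (bits):", [round(b, 2) for b in bits])
print("Score TATAAT:", round(log_odds_score(freqs, "TATAAT"), 2))
print("Score GCGCGC:", round(log_odds_score(freqs, "GCGCGC"), 2))
```

Fully conserved positions carry the most information, and a sequence resembling the sites scores well above an unrelated one.<br />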
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
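The log-odds scoring mentioned in the last item can be sketched in a few lines of Python. This is only an illustrative toy, not the method used in either study: the two training strings, the k-mer length of 2, and the pseudocount of 1 are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds_score(query, counts_a, counts_b, k, pseudocount=1.0):
    """Sum of log2(P(kmer | A) / P(kmer | B)) over the k-mers in query.

    Positive scores favor model A, negative scores favor model B.
    The pseudocount (an assumption of this sketch) avoids log(0)
    for k-mers never seen in one of the training texts.
    """
    vocab = set(counts_a) | set(counts_b) | set(kmer_counts(query, k))
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log2(p_a / p_b)
    return score

# Toy "training" texts (hypothetical): one AT-rich, one GC-rich.
at_rich = "ATATTAATATAATTTATAAT"
gc_rich = "GCGGCCGCGCCGGGCGCCGC"
k = 2
counts_at = kmer_counts(at_rich, k)
counts_gc = kmer_counts(gc_rich, k)

score_at_query = log_odds_score("TATATA", counts_at, counts_gc, k)
score_gc_query = log_odds_score("GCGCGC", counts_at, counts_gc, k)
print(score_at_query > 0, score_gc_query < 0)
```

A query that resembles the first training text gets a positive score and one that resembles the second gets a negative score. The same idea works for deciding between two candidate authors or two candidate decipherments, given suitable training text.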
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use Python's string count() method to count dinucleotides, be aware that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2 copies of ''AA'', not 3. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
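The overlapping-count warning above can be illustrated with a short sketch. It assumes the sequence has already been read into a single uppercase string with line breaks removed, and it handles ambiguous characters by skipping any window that contains one, which is just one reasonable reading of "ignore anything other than A, C, T, and G". The function name and toy sequences are made up for illustration.

```python
from collections import Counter

def count_overlapping(sequence, size):
    """Count every overlapping window of length `size`, keeping only
    windows made up entirely of A, C, G, and T."""
    allowed = set("ACGT")
    counts = Counter()
    for start in range(len(sequence) - size + 1):
        window = sequence[start:start + size]
        if set(window) <= allowed:
            counts[window] += 1
    return counts

toy = "AAAA"
print(toy.count("AA"))                  # non-overlapping count: 2
print(count_overlapping(toy, 2)["AA"])  # overlapping count: 3

# Ambiguity codes such as N are skipped (toy sequence, not real data).
mixed = "ACNGT"
dinucs = count_overlapping(mixed, 2)
total = sum(dinucs.values())
for dinuc, n in sorted(dinucs.items()):
    # Report both the absolute count and the percentage of the total.
    print(dinuc, n, round(100 * n / total, 1))
```

Calling the same function with a window size of 1 gives single-nucleotide counts, so one helper covers both cases.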
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
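As a small illustration of version-agnostic code, the sketch below sidesteps the two differences beginners hit most often, the print statement and integer division. The function name and test sequence are made up for the example. The linked cheat sheet covers the more standard route of importing print_function and division from __future__ at the top of the file.

```python
def gc_fraction(sequence):
    """Fraction of G and C characters in a sequence string."""
    gc = sequence.count("G") + sequence.count("C")
    # float() forces true division under Python 2, where 2 / 4 would be 0;
    # under Python 3 it is harmless.
    return gc / float(len(sequence))

# print(...) with a single argument behaves the same in Python 2.7 and 3.
print("GC fraction: {:.2f}".format(gc_fraction("GATC")))
```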
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
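The late-time arithmetic above can be written out as a short sketch. It is an illustration only, not an official grading tool. Two details are assumptions not stated in the policy: when an assignment is later than the free days remaining, the free days are spent first and only the leftover days are penalized, and the penalty is capped at 100%.

```python
import math

def apply_late_policy(days_late, free_days_left, points_possible):
    """Return (free days remaining, points deducted) for one assignment.

    Lateness is rounded up to whole days, free late days are spent
    first, and each remaining day costs 10% of the assignment's points.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    days_covered = min(whole_days, free_days_left)
    penalized_days = whole_days - days_covered
    # Assumption of this sketch: the penalty cannot exceed 100%.
    penalty_percent = min(100, 10 * penalized_days)
    points_deducted = points_possible * penalty_percent / 100
    return free_days_left - days_covered, points_deducted

# 10 minutes late with all 5 free days left: one free day is used up.
print(apply_late_policy(10 / (24 * 60), 5, 50))

# 1.5 days late with no free days left: 2 days x 10% of 50 points.
print(apply_late_policy(1.5, 0, 50))
```

The first call returns 4 free days remaining and no deduction; the second reproduces the 10-point penalty from the example in the policy.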
<br />
Homework, problem sets, and the project total a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-25T21:23:31Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<br />
'''Apr 2, 2024 - Classifiers'''<br />
* Science news of the day: [https://www.nytimes.com/2024/03/21/health/pig-kidney-organ-transplant.html Surgeons Transplant Pig Kidney Into a Patient, a Medical Milestone] ([http://www.marcottelab.org/users/BCH394P_364C_2024/SurgeonsTransplantPigKidneyIntoaPatientAMedicalMilestone-TheNewYorkTimes.pdf pdf version])<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach].<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
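<br />
To build some intuition for the k-NN classifiers described in the review above, here is a minimal pure-Python sketch. The two-gene expression values, the class labels, and the function name <code>knn_predict</code> are invented for illustration only; for real analyses, scikit-learn's <code>KNeighborsClassifier</code> is the better choice.<br />

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical two-gene expression profiles for two leukemia classes
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_points, train_labels, (1.0, 1.0)))
print(knn_predict(train_points, train_labels, (2.9, 3.1)))
```

An odd k avoids tied votes when there are only two classes; in practice k is usually chosen by cross-validation.<br />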
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP], and [https://pair-code.github.io/understanding-umap/ an intuitive explanation of the methods]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with strong criticisms and very compelling reasons against relying exclusively on these methods for drawing conclusions about your data.<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins] (Note: a few of these proteins have "U" amino acids, which indicates selenocysteine. You can count it or ignore it, your choice.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
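<br />
For anyone who wants to see the two alternating steps of k-means spelled out, here is a minimal pure-Python sketch of Lloyd's algorithm. The toy points and the function name <code>kmeans</code> are made up for illustration, and seeding the centroids with the first k points is a simplifying assumption (real implementations typically use random or k-means++ initialization and several restarts).<br />

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids

# Hypothetical 2D data with two well-separated groups
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

Swapping <code>math.dist</code> for another distance measure (e.g., one minus the Pearson correlation) changes which points count as "near", which is often the most consequential choice when clustering expression data.<br />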
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
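<br />
To make the Burrows–Wheeler indexing idea from the tutorial and BWA paper linked above more concrete, here is a toy pure-Python sketch. It builds the BWT by sorting rotations and counts exact pattern matches by backward search; the function names and the example strings are illustrative assumptions, and none of this reflects how BWA is actually implemented (real indexes use suffix arrays and precomputed rank tables rather than the linear scans used here).<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings)."""
    text += "$"  # sentinel that sorts before the letters A-Z and a-z
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last_column):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(char + row for char, row in zip(last_column, table))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]

def count_occurrences(last_column, pattern):
    """Count exact matches of `pattern` using FM-index style backward search."""
    first_column = sorted(last_column)
    # starts[c]: row where the block of rotations beginning with c starts
    starts = {}
    for i, c in enumerate(first_column):
        starts.setdefault(c, i)

    def occ(c, i):
        # Number of c's in last_column[:i]; O(n) here, precomputed in real indexes
        return last_column[:i].count(c)

    lo, hi = 0, len(last_column)
    for c in reversed(pattern):  # extend the match one base to the left
        if c not in starts:
            return 0
        lo = starts[c] + occ(c, lo)
        hi = starts[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

genome = "GATTACAGATTACA"  # hypothetical toy genome
index = bwt(genome)
print(index)
print(count_occurrences(index, "TTAC"))
```

Each step of the backward search narrows a range of rows in the sorted rotation table, so the cost of a lookup depends on the read length rather than the genome length.<br />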
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
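<br />
Since the two meanings of "specificity" noted above are easy to mix up, here is a small sketch that computes each measure from confusion-matrix counts. The counts and the function name <code>classification_metrics</code> are hypothetical, chosen only to show how the definitions differ.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common performance measures from confusion-matrix counts."""
    return {
        # Fraction of real positives that were found
        "sensitivity (recall)": tp / (tp + fn),
        # Fraction of real negatives correctly rejected
        "specificity (true negative rate)": tn / (tn + fp),
        # Fraction of predictions that were right; this is what the
        # gene finding literature traditionally calls "specificity"
        "precision (PPV)": tp / (tp + fp),
    }

# Hypothetical gene-finder results scored per nucleotide
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

When true negatives vastly outnumber everything else (as with non-coding nucleotides in a genome), the true negative rate stays close to 1 almost regardless of performance, which is one reason precision is the more informative number for gene finders.<br />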
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
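<br />
As a companion to the position weight matrix and sequence logo material above, here is a minimal pure-Python sketch that turns a set of aligned sites into per-position frequencies, computes the information content (the column height in a sequence logo), and scores a candidate site by its log odds against a uniform background. The aligned sites, the pseudocount of 1, the uniform background of 0.25, and all function names are illustrative assumptions; the small-sample correction used by some logo programs is omitted.<br />

```python
import math

def position_frequencies(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    matrix = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(alphabet)
        matrix.append({b: (column.count(b) + pseudocount) / total for b in alphabet})
    return matrix

def information_content(freqs):
    """Bits of information at one position: log2(alphabet size) minus entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy

def log_odds_score(matrix, sequence, background=0.25):
    """Sum of log2(motif frequency / background frequency) over positions."""
    return sum(math.log2(col[base] / background) for col, base in zip(matrix, sequence))

# Hypothetical aligned binding sites
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT", "GATAAT"]
pwm = position_frequencies(sites)

for pos, column in enumerate(pwm, start=1):
    print(pos, f"{information_content(column):.2f} bits")
print(log_odds_score(pwm, "TATAAT"), log_odds_score(pwm, "GCGCGC"))
```

A Gibbs sampler repeatedly rebuilds a matrix like <code>pwm</code> from all but one sequence and then uses scores like <code>log_odds_score</code> to resample where the motif sits in the held-out sequence.<br />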
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in volcanic ash and debris by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, with possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
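<br />
To illustrate the point above about random walks spending more time on some pages than others, here is a minimal sketch that finds the long-run occupancy of a small Markov chain by power iteration. The three-page "web" and its transition probabilities are invented for illustration, and the real PageRank algorithm additionally includes a damping (random jump) term that is left out here.<br />

```python
def stationary_distribution(transitions, n_steps=1000):
    """Power iteration: repeatedly push probability mass along the transitions."""
    states = list(transitions)
    probs = {s: 1.0 / len(states) for s in states}  # start from a uniform guess
    for _ in range(n_steps):
        new_probs = {s: 0.0 for s in states}
        for source, outgoing in transitions.items():
            for target, p in outgoing.items():
                new_probs[target] += probs[source] * p
        probs = new_probs
    return probs

# Hypothetical three-page web; each page's outgoing probabilities sum to 1
transitions = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}

ranking = stationary_distribution(transitions)
for page, prob in sorted(ranking.items(), key=lambda item: -item[1]):
    print(page, round(prob, 3))
```

Pages A and C end up with equal, higher occupancy than B, because every walk must pass through both of them while only half the walks leaving A visit B.<br />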
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
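<br />
For a quick taste of regular expressions on sequence data, here is a small sketch using Python's built-in <code>re</code> module. The degenerate TATA-box-like pattern and the DNA string are made up for illustration; they are not taken from any of the course problem sets.<br />

```python
import re

# Degenerate TATA-box-like pattern: TATA, then A or T, then A, then A or T
tata_pattern = re.compile(r"TATA[AT]A[AT]")

def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in pattern.finditer(sequence)]

sequence = "GGCTATAAAAGGCGCTATATATGC"  # hypothetical promoter fragment
hits = find_motif(sequence, tata_pattern)
print(hits)
```

Note that <code>finditer</code> reports non-overlapping matches only and uses 0-based start positions; overlapping matches need a lookahead such as <code>(?=(TATA[AT]A[AT]))</code>.<br />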
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string count() method to count dinucleotides, be aware that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, copies of the dinucleotide ''AA''. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python string slicing (string[counter:counter+2]), as explained in the Rosalind homework assignment on strings.<br />
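To make the overlapping vs. non-overlapping point concrete, here is a minimal sketch of the slicing approach. The function name and the toy sequence are made up for illustration, and it assumes that "ignoring" ambiguous characters means skipping any 2-letter window that contains one (removing them first would be a different, equally arguable choice).

```python
from collections import Counter


def count_overlapping_dinucleotides(sequence):
    """Count every overlapping 2-letter window made only of A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 1):
        dinucleotide = sequence[i:i + 2]
        # Assumption: a window containing any other character is skipped.
        if set(dinucleotide) <= set("ACGT"):
            counts[dinucleotide] += 1
    return counts


# Hypothetical toy sequence; the trailing N stands in for an ambiguous call.
toy_counts = count_overlapping_dinucleotides("AAAACGN")

print("AAAA".count("AA"))                             # non-overlapping: 2
print(count_overlapping_dinucleotides("AAAA")["AA"])  # overlapping: 3
print(dict(toy_counts))
```

Dividing each count by the sum of all counts would then give the corresponding percentages.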
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
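As a rough illustration of what "version agnostic" can look like in practice, the sketch below sticks to idioms that behave the same in Python 2.7 and Python 3. The function names and the example sequence are invented for this example.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C letters in a DNA string."""
    gc_count = sequence.count("G") + sequence.count("C")
    # float() forces true division in Python 2, where int / int truncates.
    return float(gc_count) / len(sequence)


def number_of_codons(sequence):
    """Return how many complete 3-letter codons fit in the sequence."""
    # // is floor division in both Python 2 and Python 3.
    return len(sequence) // 3


# print with parentheses and a single argument works in both versions.
print("GC fraction: {0:.2f}".format(gc_fraction("GATTACA")))
print("Complete codons: {0}".format(number_of_codons("GATTACA")))
```

The two habits that matter most for beginners are being explicit about division and always writing print with parentheses.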
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
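For anyone who wants to see how the weights above combine, here is a small sketch. The component names and the example scores are hypothetical; only the weights (30% homework, 15% for each of 3 problem sets, 25% project) come from the paragraph above.

```python
# Weights in course points, taken from the grading description above.
WEIGHTS = {
    "homework": 30.0,
    "problem_set_1": 15.0,
    "problem_set_2": 15.0,
    "problem_set_3": 15.0,
    "project": 25.0,  # includes the 5 points for the presentation
}


def course_total(fraction_earned):
    """Combine per-component fractions (0 to 1) into a 100-point total."""
    return sum(WEIGHTS[name] * fraction_earned[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example_scores = {
    "homework": 0.90,
    "problem_set_1": 1.00,
    "problem_set_2": 0.80,
    "problem_set_3": 0.90,
    "project": 0.92,
}

print(sum(WEIGHTS.values()))
print(round(course_total(example_scores), 2))
```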
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, e.g. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
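The late-time arithmetic can be sketched as follows. This is only an illustration of the rule as written, not an official calculator: the function name is made up, and it assumes that when a late submission is only partly covered by the remaining free days, just the uncovered days are penalized, and that the penalty cannot exceed 100%.

```python
import math


def apply_late_policy(hours_late, late_days_remaining):
    """Return (late days left afterwards, penalty in percent).

    Assumption: free late days are used up first, and only the days
    they do not cover are penalized at 10 percent per day.
    """
    # Any fraction of a day counts as a full day (rounding up).
    days_late = int(math.ceil(hours_late / 24.0)) if hours_late > 0 else 0
    free_days_used = min(days_late, late_days_remaining)
    penalized_days = days_late - free_days_used
    penalty_percent = min(100, 10 * penalized_days)
    return late_days_remaining - free_days_used, penalty_percent


# 10 minutes late with all 5 free days left: 1 day deducted, no penalty.
print(apply_late_policy(10 / 60.0, 5))

# 1.5 days late with no free days left: 20% penalty.
days_left, penalty = apply_late_policy(36, 0)
print(days_left, penalty, 50 * penalty / 100.0)
```

The second call reproduces the example in the text: 20% of a 50-point assignment is 10 points.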
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
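The cutoffs above amount to a simple lookup. Here is one way to sketch it, with a made-up function name; note that no rounding is applied, in line with the no-rounding rule.

```python
# (minimum percent, letter) pairs, highest first, from the cutoffs above.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter for a course percentage, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


for score in (92, 91.99, 87.99, 59.9):
    print(score, letter_grade(score))
```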
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>
Marcotte http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024 BCH394P BCH364C 2024 2024-03-25T21:17:01Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record holder in this regard probably being [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP], and [https://pair-code.github.io/understanding-umap/ an intuitive explanation of the methods]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with strong criticisms and very compelling reasons against relying exclusively on these methods for drawing conclusions about your data.<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins] (Note: a few of these proteins have "U" amino acids, which indicates selenocysteine. You can count it or ignore it, your choice.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
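To build some intuition for the Burrows–Wheeler transform discussed in the links above, here is a minimal sketch that computes the transform by sorting all rotations of a string, and then inverts it. This is a teaching toy under stated assumptions: the function names and the "$" terminator are illustrative choices, and real read mappers such as BWA use far more efficient index structures than this.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (only practical for short strings)."""
    # The terminator is assumed not to occur in the text and to sort before all other characters.
    text = text + terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by repeatedly prepending the transform and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    # The row ending in the terminator is the original text plus terminator.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("banana")
    print(encoded)
    print(inverse_bwt(encoded))
```

Note how the transform tends to group identical characters together, which is part of what makes the resulting index compact and searchable.<br />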
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
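The sensitivity, specificity, and precision definitions mentioned above can be written out directly from the four confusion-matrix counts. The following is a minimal sketch; the function name and the example counts are made up for illustration, and it assumes none of the denominators is zero.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    return {
        # Sensitivity (= recall, true positive rate): fraction of real positives that were found.
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it (true negative rate).
        "specificity": tn / (tn + fp),
        # Precision (= PPV): the quantity the gene finding community has called "specificity".
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, e.g. from comparing predicted coding bases to an annotation.
    for name, value in classification_metrics(tp=90, fp=10, tn=80, fn=10).items():
        print(name, round(value, 3))
```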
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
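To connect the information theory and sequence logo links above to something concrete, here is a minimal sketch that computes the information content (in bits) of each column of a set of aligned DNA sites, as log2(4) minus the Shannon entropy of the column. The function names and the toy sites are hypothetical, each column is assumed to contain at least one A/C/G/T, and the small-sample correction that sequence logo tools often apply is deliberately left out.<br />

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Information content in bits of one alignment column: log2(alphabet size) - entropy."""
    counts = Counter(column)
    total = sum(counts[base] for base in alphabet)
    entropy = 0.0
    for base in alphabet:
        if counts[base] > 0:
            p = counts[base] / total
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy


def motif_information(sites):
    """Per-column information content for a list of equal-length aligned sites."""
    return [column_information(column) for column in zip(*sites)]


if __name__ == "__main__":
    # Toy aligned sites (made up): column 1 is invariant, column 3 is uniform.
    sites = ["TAA", "TAC", "TCG", "TCT"]
    print(motif_information(sites))
```

A perfectly conserved DNA column carries 2 bits, and a column with all four bases equally represented carries 0 bits, which is why conserved positions appear tall in a sequence logo.<br />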
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, with possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography), and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
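The general idea of scoring a text by log odds ratios of k-mer frequencies, as in the examples above, can be sketched as follows. This is only an illustration of the concept and not the method used in either linked study: the function names, the add-one pseudocounts, and the toy strings are all assumptions, and the toy call uses 2-mers rather than 5-mers to keep the example tiny.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count all overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, reference_a, reference_b, k=5, pseudocount=1.0):
    """Sum over the query's k-mers of log(P(k-mer | A) / P(k-mer | B)).

    Positive scores mean the query looks more like reference A, negative more like B.
    Pseudocounts keep unseen k-mers from producing a zero probability.
    """
    counts_a = kmer_counts(reference_a, k)
    counts_b = kmer_counts(reference_b, k)
    vocabulary = set(counts_a) | set(counts_b) | set(kmer_counts(query, k))
    total_a = sum(counts_a.values()) + pseudocount * len(vocabulary)
    total_b = sum(counts_b.values()) + pseudocount * len(vocabulary)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log(p_a / p_b)
    return score


if __name__ == "__main__":
    # Toy strings (made up); a positive score means the query resembles the first reference.
    print(log_odds_score("ababab", "abababab", "cdcdcdcd", k=2))
```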
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
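For regular expression practice, here is a minimal sketch using Python's built-in re module to find every occurrence of a protein motif, including overlapping ones. The default pattern is the N-glycosylation motif N{P}[ST]{P} written as a regular expression; the function name and test sequences are made up for illustration. The lookahead wrapper is needed because a plain search would skip matches that overlap an earlier match.<br />

```python
import re


def find_motif_positions(protein, pattern=r"N[^P][ST][^P]"):
    """Return 1-based start positions of all matches, including overlapping ones."""
    # A zero-width lookahead lets the search restart at every position in the sequence.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [match.start() + 1 for match in lookahead.finditer(protein)]


if __name__ == "__main__":
    print(find_motif_positions("NNTSA"))  # two overlapping sites
    print(find_motif_positions("NPSA"))   # proline at position 2 blocks the motif
```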
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
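As a companion to the dynamic programming primer above, here is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, linear gap -1) are arbitrary assumptions chosen for illustration rather than the values used in the slides or problem sets, and the function only returns the optimal score, not the alignment itself.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell takes the best of: align two characters, or gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Recovering the alignment itself would require also recording which of the three choices won at each cell and tracing back from the bottom-right corner.<br />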
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python's string[counter:counter+2] slicing, as explained in the Rosalind homework assignment on strings.<br />
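The difference between the two ways of counting can be seen in a minimal sketch like the one below. The function name is hypothetical, and skipping any pair that contains a character other than A, C, G, or T is one possible way to handle the ambiguous sequence calls, not a required approach.<br />

```python
from collections import Counter


def count_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping pairs containing non-ACGT characters."""
    valid = set("ACGT")
    counts = Counter()
    seq = sequence.upper()
    # Slide a window of width 2 one position at a time, so windows overlap.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in valid and pair[1] in valid:
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    print("AAAA".count("AA"))                  # non-overlapping count from str.count
    print(count_dinucleotides("AAAA")["AA"])   # overlapping count from the sliding window
```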
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
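As a rough illustration of the late-time arithmetic above, here is a small Python sketch. It assumes that any remaining free late days are used up first and that only the days beyond them are penalized; the function name <code>late_penalty</code> and the 100% cap are assumptions made for this example, not official policy.

```python
import math

# Sketch of the late-day arithmetic, under the assumptions stated above.
# Percentages are kept as integers to avoid floating-point rounding.

def late_penalty(days_late, free_days_left):
    """Return (penalty_percent, free_days_left_afterward) for one assignment."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day).
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    # Assumption: remaining free late days are spent before any penalty applies.
    days_covered = min(whole_days, free_days_left)
    penalized_days = whole_days - days_covered
    # 10 percent per penalized day; capping at 100 is an assumption.
    penalty_percent = min(100, 10 * penalized_days)
    return penalty_percent, free_days_left - days_covered


# The example from the text: 1.5 days late with no free days left, 50 points.
percent, remaining = late_penalty(1.5, 0)
print(percent, 50 * percent // 100)  # 20 percent, i.e. 10 points
```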
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution, and their use should be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-21T14:46:44Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP], and [https://pair-code.github.io/understanding-umap/ an intuitive explanation of the methods]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with strong criticisms and very compelling reasons against relying exclusively on these methods for drawing conclusions about your data.<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
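Since the choice of distance measure strongly affects clustering results, here is a small sketch of three common measures applied to toy profiles. The function names and the example vectors are made up for illustration; they are not taken from the problem set data.<br />

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones.
    Undefined (division by zero) if either profile is constant."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = sum((x - mean_a) ** 2 for x in a)
    spread_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - covariance / math.sqrt(spread_a * spread_b)


rising = [1.0, 2.0, 3.0, 4.0]
rising_doubled = [2.0, 4.0, 6.0, 8.0]
falling = [4.0, 3.0, 2.0, 1.0]

for name, other in [("rising_doubled", rising_doubled), ("falling", falling)]:
    print(name,
          euclidean_distance(rising, other),
          manhattan_distance(rising, other),
          pearson_distance(rising, other))
```

Note that the two rising profiles are far apart by Euclidean distance but essentially identical by correlation distance, which is why correlation-based measures are often preferred when the shape of a profile matters more than its magnitude.<br />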
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
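To make the Burrows–Wheeler transform mentioned above more concrete, here is a deliberately naive sketch that builds it by sorting all rotations of the text, plus an equally naive inverse. This is for intuition only: real read mappers such as BWA build the transform from a suffix array and add an index on top, and the function names here are hypothetical.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (quadratic memory)."""
    if terminator in text:
        raise ValueError("text must not contain the terminator character")
    text = text + terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    # The row ending in the terminator is the original text.
    for row in table:
        if row.endswith(terminator):
            return row[:-1]
    raise ValueError("terminator not found in transformed text")


encoded = bwt("GATTACA")
print(encoded, inverse_bwt(encoded))
```

The transform tends to group identical characters together, and it is reversible, which is what lets an index built on it support fast exact matching of reads against a genome.<br />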
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
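As a rough illustration of the k-mer and De Bruijn graph ideas in the readings above, this sketch splits reads into k-mers and links each k-mer's prefix to its suffix. It is a toy under simplifying assumptions (error-free reads, one strand, no coverage tracking), and the function names are invented for the example.<br />

```python
from collections import defaultdict


def kmers(read, k):
    """All overlapping substrings of length k, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mers it points to.

    Every k-mer contributes one edge: prefix -> suffix.
    Repeated k-mers produce repeated edges (a multigraph).
    """
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


for node, successors in sorted(de_bruijn_graph(["ACGTAC"], 3).items()):
    print(node, "->", successors)
```

An assembler would then look for a path through this graph that uses every edge; here the edges AC to CG to GT to TA and back to AC spell out the original read.<br />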
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
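The sensitivity, specificity, and precision definitions noted above can be written out directly from the four cells of a confusion matrix. This is a small sketch with made-up counts; the function name is just for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive/negative counts."""
    return {
        # Fraction of real positives that were found (also called recall).
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity": tn / (tn + fp),
        # Fraction of predictions that were right (PPV). This is the
        # quantity the gene finding literature has often called "specificity".
        "precision": tp / (tp + fp),
    }


# Hypothetical gene finder: 80 real exons found, 20 missed,
# 20 false predictions, 880 non-exon regions correctly left alone.
for name, value in classification_metrics(tp=80, fp=20, tn=880, fn=20).items():
    print(name, round(value, 3))
```

With many more true negatives than positives, as is typical in a genome, the true negative rate stays high even when a fifth of the predictions are wrong, which is one reason precision is the more informative number for gene finding.<br />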
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
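A position weight matrix is the object that motif finders such as Gibbs samplers repeatedly re-estimate, so it may help to see one built from a handful of aligned sites. This sketch is not a Gibbs sampler; it only covers the counting, log-odds, and information content steps. The function names, the toy sites, the uniform background, and the pseudocount of 1 are all assumptions made for the example.<br />

```python
import math
from collections import Counter


def position_frequencies(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position letter frequencies from equal-length aligned sites."""
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return matrix


def log_odds_matrix(frequencies, background=0.25):
    """log2(frequency / background) for every letter at every position."""
    return [{b: math.log2(p / background) for b, p in column.items()}
            for column in frequencies]


def information_content(column):
    """Bits of information in one column: log2(alphabet size) - entropy.
    (No small-sample correction is applied here.)"""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


def score_sequence(pwm, sequence):
    """Sum of log-odds scores for a sequence the same length as the PWM."""
    return sum(column[base] for column, base in zip(pwm, sequence))


sites = ["ACGT", "ACGA", "ACGT", "ACGC"]
frequencies = position_frequencies(sites)
pwm = log_odds_matrix(frequencies)
print([round(information_content(column), 2) for column in frequencies])
print(round(score_sequence(pwm, "ACGT"), 2), round(score_sequence(pwm, "TTTT"), 2))
```

The information content per column is what sets the letter heights in a sequence logo: the three invariant positions carry more bits than the variable last position.<br />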
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
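The log odds idea in the examples above can be sketched in a few lines: count k-mers in two reference texts, then ask which reference makes a query more probable. This is a toy illustration, not the method used in either linked study; the function names, the toy strings, k = 2, and the pseudocount of 1 are all assumptions.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Counts of all overlapping substrings of length k."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, text_a, text_b, k=2, pseudocount=1.0):
    """Sum over query k-mers of log2(P(k-mer | A) / P(k-mer | B)).

    Positive scores favor source A, negative scores favor source B.
    Pseudocounts keep unseen k-mers from producing log(0).
    """
    counts_a = kmer_counts(text_a, k)
    counts_b = kmer_counts(text_b, k)
    counts_q = kmer_counts(query, k)
    vocabulary = set(counts_a) | set(counts_b) | set(counts_q)
    total_a = sum(counts_a.values()) + pseudocount * len(vocabulary)
    total_b = sum(counts_b.values()) + pseudocount * len(vocabulary)
    score = 0.0
    for kmer, n in counts_q.items():
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += n * math.log2(p_a / p_b)
    return score


at_rich = "ATATATATATATAT"
gc_rich = "GCGCGCGCGCGCGC"
print(log_odds_score("ATATAT", at_rich, gc_rich))  # positive: looks like at_rich
print(log_odds_score("GCGC", at_rich, gc_rich))    # negative: looks like gc_rich
```

The same scoring scheme, with emission and transition probabilities in place of raw k-mer frequencies, underlies many of the sequence models covered in the HMM lectures.<br />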
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
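If you'd like a biological pattern to practice regular expressions on, here is one possible sketch that uses Python's built-in re module to pull simple open reading frames out of a DNA string. The pattern and the function name are illustrative assumptions, and the approach is intentionally crude: it scans one strand only and reports non-overlapping matches.<br />

```python
import re

# Start codon, then as few whole codons as possible, then a stop codon.
# The lazy quantifier (*?) makes the match end at the first in-frame stop.
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")


def find_orfs(dna):
    """Return (start index, ORF sequence) for each non-overlapping match."""
    return [(match.start(), match.group()) for match in ORF_PATTERN.finditer(dna)]


print(find_orfs("CCATGAAATGATAGCC"))
```

In this toy sequence the match starts at index 2 and reads ATG AAA TGA. The second ATG further along never reaches an in-frame stop codon, so it is not reported.<br />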
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
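For those who prefer code to spreadsheets, the dynamic programming recurrence for global alignment can be sketched as follows. This version returns only the optimal score (no traceback), and the scoring values (match +1, mismatch -1, linear gap -1) are placeholder assumptions rather than the scheme used in the slides or the Excel example.<br />

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of: align the two characters, or gap in either string.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))  # 4: four matches
print(needleman_wunsch_score("ACGT", "AGT"))   # 2: three matches, one gap
```

Swapping the match/mismatch test for a lookup in a substitution matrix such as BLOSUM62 turns this into a protein aligner, and clamping each cell at zero gives the local (Smith-Waterman) variant.<br />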
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
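The difference between the two counting behaviors described in the heads-up is easy to see on a toy string. This is only a sketch of the slicing idea; the function name is made up, and you would still need to handle file reading and the non-ACGT characters yourself.<br />

```python
from collections import Counter


def count_overlapping(sequence, k=2):
    """Count every window of length k, sliding one position at a time."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


toy = "AAAA"
print(toy.count("AA"))                # 2: str.count skips past each match
print(count_overlapping(toy)["AA"])   # 3: windows start at positions 0, 1, 2
```

The sliding-window version visits every start position, so a sequence of length n always yields n - k + 1 windows in total.<br />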
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
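As a small illustration of what "version agnostic" can look like in practice, the snippet below sticks to constructs that behave the same in Python 2.7 and Python 3. It is only a sketch of two common pitfalls (print and division), and the function names are invented for the example.<br />

```python
def gc_fraction(sequence):
    """Fraction of G and C characters in a sequence."""
    gc = sequence.count("G") + sequence.count("C")
    # float() forces true division; a bare gc / len(sequence)
    # would truncate to an integer in Python 2.7.
    return float(gc) / len(sequence)


def full_windows(length, window):
    """How many whole windows fit; // is floor division in both versions."""
    return length // window


# print with parentheses and a single argument works in both versions.
print("GC fraction: {:.2f}".format(gc_fraction("GATTACA")))
print("Windows: {}".format(full_windows(4600000, 1000)))
```

The first line prints 0.29 (2 of 7 bases) and the second prints 4600 under either interpreter.<br />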
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
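The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name <code>late_penalty</code> and its arguments are hypothetical (not part of any course tooling), and it assumes that late time is rounded up to whole days, that any remaining free days are used first, and that the penalty is capped at 100% of the assignment.

```python
import math

FREE_LATE_DAYS = 5       # semester-wide allowance stated in the policy
PENALTY_PER_DAY = 0.10   # 10 percent per day once the allowance is used up


def late_penalty(days_late, points_possible, free_days_left=FREE_LATE_DAYS):
    """Return (points_deducted, free_days_remaining) for one late submission.

    Hypothetical helper: assumes free days are consumed before any penalty
    applies and that the penalty never exceeds the assignment's value.
    """
    # Any fraction of a day counts as a full day (10 minutes late = 1 day).
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    fraction = min(1.0, PENALTY_PER_DAY * penalized_days)
    return points_possible * fraction, free_days_left - free_used
```

For the example in the text (a 50-point assignment, 1.5 days late, no free days left), <code>late_penalty(1.5, 50, free_days_left=0)</code> deducts 10 points.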
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
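If it helps to see the cutoffs as a lookup, here is a minimal Python sketch. The names <code>CUTOFFS</code> and <code>letter_grade</code> are made up for illustration. The thresholds are copied from the list above, and no rounding is applied, in keeping with the no-rounding rule.

```python
# (minimum percent, letter) pairs, highest first, copied from the stated cutoffs
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade without rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"
```

So a 91.99 maps to an A- and a 92 maps to an A.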
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution, and any use of AI must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-21T14:45:52Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP], and [https://pair-code.github.io/understanding-umap/ an intuitive explanation of the methods]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with very strong criticisms against relying exclusively on tSNE or UMAP<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
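To make the idea of a clustering distance measure concrete, here is a small Python sketch of two commonly used choices: Euclidean distance and a Pearson-correlation distance (1 minus the correlation coefficient). The function names are illustrative only and are not taken from any of the clustering packages linked above. The sketch assumes two equal-length numeric profiles, and the Pearson version assumes neither profile is constant.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated.

    Raises ZeroDivisionError if either profile has zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1 - covariance / (spread_x * spread_y)
```

The two measures can disagree sharply. The profiles <code>[1, 2, 3]</code> and <code>[2, 4, 6]</code> are far apart by Euclidean distance but essentially identical by Pearson distance, because only the shape of the profile matters for the latter.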
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
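For anyone who wants to poke at the Burrows–Wheeler transform mentioned above, here is a toy Python sketch. It is a teaching illustration under simplifying assumptions, not how BWA is implemented: the transform is built by sorting all rotations (fine only for short strings), the terminator <code>$</code> is assumed not to occur in the text, and the function names are made up for this example. The backward search counts exact occurrences of a pattern using only the transformed string.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


def count_occurrences(transformed, pattern):
    """Count exact matches of pattern by backward search over the BWT."""
    first_column = sorted(transformed)
    # Row index at which each character first appears in the sorted first column.
    first_index = {}
    for i, ch in enumerate(first_column):
        first_index.setdefault(ch, i)
    lo, hi = 0, len(transformed)  # half-open range of rows matching the suffix so far
    for ch in reversed(pattern):
        if ch not in first_index:
            return 0
        lo = first_index[ch] + transformed[:lo].count(ch)
        hi = first_index[ch] + transformed[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, <code>bwt("banana")</code> gives <code>"annb$aa"</code>, and <code>count_occurrences("annb$aa", "ana")</code> returns 2. Real read mappers replace the repeated <code>count</code> calls with precomputed rank tables, which is what makes the search fast.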
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision (positive predictive value, PPV) used in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
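Since the two meanings of "specificity" are easy to mix up, here is a short Python sketch that computes each quantity from confusion-matrix counts. The function name and the dictionary keys are made up for illustration; the formulas are the standard definitions of sensitivity (recall), specificity (true negative rate), and precision (PPV).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts.

    Returns NaN for any metric whose denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        # Fraction of real positives that were found (a.k.a. recall).
        "sensitivity": ratio(tp, tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity": ratio(tn, tn + fp),
        # Fraction of predictions that are correct (PPV); this is what
        # gene-finding papers have traditionally called "specificity".
        "precision": ratio(tp, tp + fp),
    }
```

With invented counts such as <code>tp=80, fp=20, tn=880, fn=20</code>, sensitivity and precision are both 0.8 while the true negative rate is about 0.98, which shows how much the choice of definition can change the number you report.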
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
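To make the entropy and sequence logo links above a bit more concrete, here is a minimal sketch (not taken from the slides) that tabulates per-column base frequencies from a few aligned DNA sites and computes each column's information content in bits. It assumes a uniform background of 0.25 per base and applies no small-sample correction; the example sites and the function names are made up for illustration.<br />

```python
import math

def column_frequencies(sites, alphabet="ACGT"):
    """Return one {base: frequency} dict per column of equal-length aligned sites."""
    length = len(sites[0])
    freqs = []
    for i in range(length):
        column = [site[i] for site in sites]
        freqs.append({base: column.count(base) / len(column) for base in alphabet})
    return freqs

def information_content(freq):
    """Information (bits) of one column relative to a uniform 4-letter background."""
    entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)
    return 2.0 - entropy  # log2(4) = 2 bits is the maximum for DNA

sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]  # hypothetical aligned sites
for position, freq in enumerate(column_frequencies(sites), start=1):
    print(position, round(information_content(freq), 2))
```

Perfectly conserved columns score the full 2 bits, while mixed columns score less; those per-column values are what set the letter stack heights in a sequence logo.<br />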
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
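For intuition about the random-walk view of page ranking mentioned above, here is a small sketch that repeatedly applies a transition table until the share of time spent in each state settles down. The three-page "web" and its link probabilities are invented for illustration, and real PageRank adds a damping (random jump) term that is left out here.<br />

```python
def stationary_distribution(transitions, steps=1000):
    """Approximate the long-run state occupancy of a Markov chain by repeated updates."""
    states = list(transitions)
    probs = {s: 1.0 / len(states) for s in states}  # start from a uniform guess
    for _ in range(steps):
        updated = {s: 0.0 for s in states}
        for src, row in transitions.items():
            for dst, p in row.items():
                updated[dst] += probs[src] * p  # probability mass flowing src -> dst
        probs = updated
    return probs

# Hypothetical 3-page web: each row gives the probability of following a link.
links = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
for page, share in sorted(stationary_distribution(links).items()):
    print(page, round(share, 3))
```

Pages A and C each end up with about 40% of the walker's time and page B with about 20%, which is the sense in which some pages "rank" higher than others.<br />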
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
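If you'd like a concrete regular expression to practice on, here is a small sketch using Python's built-in <code>re</code> module to find every occurrence of the N-glycosylation motif (N, then anything except P, then S or T, then anything except P) in a protein sequence. The test sequence is made up, and the function name is just for illustration. Wrapping the pattern in a lookahead lets matches overlap, which a plain search would miss.<br />

```python
import re

# N-glycosylation motif: N, anything but P, S or T, anything but P.
# The lookahead (?=...) matches without consuming characters, so hits may overlap.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

def motif_positions(protein):
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    return [m.start() + 1 for m in MOTIF.finditer(protein)]

print(motif_positions("MKNNTSAPNGTQ"))  # hypothetical test sequence
```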
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use Python's built-in string count method (e.g. sequence.count("AA")) to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence one position at a time with the Python string[counter:counter+2] slice, as explained in the Rosalind homework assignment on strings.<br />
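The difference between the two ways of counting can be sketched as follows. This is only one possible approach, not the required solution: the function names are made up, and skipping any 2-letter window that contains a character other than A, C, G, or T is an assumption about how to "ignore" ambiguous calls.<br />

```python
from collections import Counter

def count_nucleotides(seq):
    """Count A, C, G and T, skipping any other character (e.g. ambiguity codes)."""
    return Counter(base for base in seq.upper() if base in "ACGT")

def count_dinucleotides(seq):
    """Count overlapping dinucleotides by sliding a 2-letter window one base at a time."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: a window containing a non-ACGT character is simply skipped.
        if pair[0] in "ACGT" and pair[1] in "ACGT":
            counts[pair] += 1
    return counts

print("AAAA".count("AA"))                 # non-overlapping count: 2
print(count_dinucleotides("AAAA")["AA"])  # overlapping count: 3
```

Dividing each count by the total of all counts gives the corresponding frequency as a fraction of the total.<br />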
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
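The late-time arithmetic above can be sketched as a short calculation. This is only one reading of the policy, not an official grading script: the function name and arguments are hypothetical, and it assumes that any remaining free days are used up first and that only the days beyond them are penalized at 10% each.<br />

```python
import math

def late_penalty(hours_late, free_days_left=5, points=100):
    """Return (points_deducted, free_days_remaining) under one reading of the late policy."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day).
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    # Assumption: remaining free late days absorb the lateness first.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    # 10 percent per penalized day, capped at the full value of the assignment.
    fraction = min(1.0, 0.10 * penalized_days)
    return points * fraction, free_days_left - free_used

# The example from the text: a 50-point assignment, 1.5 days late, no free days left.
print(late_penalty(36, free_days_left=0, points=50))
```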
<br />
Homework, problem sets, and the project total a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI tools should be used with caution, and their use must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-21T14:33:12Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10 PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record holder in this regard is probably [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]. BUT: [https://twitter.com/lpachter/status/1431325969411821572?lang=en here's an X thread you should read] with very strong criticisms against relying exclusively on tSNE or UMAP<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
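To build some intuition for how fuzzy k-means differs from ordinary k-means, here is a minimal sketch of the membership step only. It assumes the standard fuzzy c-means membership formula with fuzzifier m (which may differ in detail from the linked paper), uses 1-D points for brevity, and the function name is made up for illustration.

```python
def fuzzy_memberships(point, centers, m=2.0):
    """Soft assignment of one 1-D point to every cluster center.

    Standard fuzzy c-means rule (an assumption here):
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with m > 1.
    """
    dists = [abs(point - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:
        # Point sits exactly on one or more centers: share membership among them
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# A point at 1.0 between centers at 0.0 and 4.0 belongs mostly, but not
# entirely, to the nearer center; the memberships always sum to 1.
print(fuzzy_memberships(1.0, [0.0, 4.0]))
```

In the full algorithm, this step alternates with recomputing each center as a membership-weighted mean until the assignments stop changing.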
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
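The choice of distance measure matters a lot for clustering, so here is a small sketch contrasting two common ones: Euclidean distance and a Pearson correlation distance (1 - r). The function names and the toy profiles are made up for illustration; real clustering packages offer these and many more.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson r: 0 for perfectly correlated profiles, 2 for anti-correlated.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


# Two toy profiles with the same shape but a 10-fold difference in scale
low = [1.0, 2.0, 3.0, 4.0]
high = [10.0, 20.0, 30.0, 40.0]
print(euclidean(low, high))         # large: the magnitudes differ
print(pearson_distance(low, high))  # near zero: the shapes are identical
```

Which behavior you want depends on the question: correlation-based distances group profiles by shape, while Euclidean distance also separates them by magnitude.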
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
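As a toy companion to the BWT explanations linked above, here is a minimal sketch of the transform and of counting exact pattern matches by backward search. It is deliberately naive (sorted rotations, no sampled suffix array, no mismatches) and is not how BWA is implemented; the function names are made up for illustration.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (only sensible for toy inputs)."""
    text += "$"  # sentinel; sorts before the letters A-Z and a-z
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text using backward search."""
    # c_table[ch] = index of the first row starting with ch in the sorted rotations
    c_table = {}
    for i, ch in enumerate(sorted(bwt_string)):
        c_table.setdefault(ch, i)

    def occ(ch, i):
        # Occurrences of ch in bwt_string[:i]; real indexes precompute this
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)  # current range of matching rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ(ch, lo)
        hi = c_table[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome = "GATTACAGATTACA"
index = bwt(genome)
print(index)
print(count_occurrences(index, "TTAC"))
```

The pattern is consumed from its last character to its first, and each step narrows a range of rows in the sorted rotation table, which is why the search time depends on the read length rather than the genome length.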
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
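For a concrete picture of the De Bruijn graphs mentioned above, here is a minimal sketch that builds one from reads: each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and toy reads are made up for illustration, and real assemblers add error correction, reverse complements, and graph simplification on top of this.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer node to the list of (k-1)-mer nodes it points to."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix -> suffix edge
    return graph


# Two overlapping toy reads; repeated k-mers show up as repeated edges
reads = ["ACGTAC", "GTACGG"]
for node, successors in sorted(de_bruijn_edges(reads, 3).items()):
    print(node, "->", successors)
```

Assembly then amounts to finding a path that uses the edges of this graph, which is why repeats longer than k-1 bases create ambiguous branches.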
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to precision (PPV) in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
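To make the sensitivity/specificity versus precision/recall definitions in this list concrete, here is a small sketch that computes them from the four confusion-matrix counts. The function name and the example counts are made up for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive/negative counts.

    No guard against empty classes: a zero denominator raises ZeroDivisionError.
    """
    return {
        "sensitivity": tp / (tp + fn),  # = recall = true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # = PPV; the gene-finding "specificity"
    }


# Toy example: 100 real genes among 1,020 candidates, 80 of them predicted
metrics = classification_metrics(tp=80, fp=20, tn=900, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note how the true-negative-rate specificity stays high simply because negatives are abundant, which is one reason precision is often the more informative number when true positives are rare.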
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
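To connect the entropy and sequence logo links above with position weight matrices, here is a minimal sketch that turns a set of aligned sites into per-position frequencies and then into information content in bits (the letter-stack height in a sequence logo). The function names and the toy sites are made up for illustration, and the small-sample correction used by some logo tools is left out.

```python
import math
from collections import Counter


def position_frequencies(sites, alphabet="ACGT", pseudocount=0.0):
    """Per-position letter frequencies for equal-length aligned sites."""
    matrix = []
    total = len(sites) + pseudocount * len(alphabet)
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        matrix.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return matrix


def information_content(column):
    """Bits of information: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


sites = ["ACGT", "ACGA", "ACCT", "ATGT"]  # toy aligned binding sites
for pos, column in enumerate(position_frequencies(sites), start=1):
    print(pos, round(information_content(column), 3))
```

A perfectly conserved DNA position scores 2 bits and a uniformly random one scores 0, so motif finders like the Gibbs sampler are in effect searching for alignments that maximize this kind of score.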
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
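The log odds idea in the last example can be sketched in a few lines: score a query text by summing, over its k-mers, the log ratio of how likely each k-mer is under one reference versus another. This is only an illustration of the general approach, not the researchers' actual method; the function names, the pseudocount scheme, and the toy references are all assumptions.

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count every overlapping k-mer in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, ref_a, ref_b, k=2, pseudocount=1.0):
    """Sum over the query's k-mers of log2(P(kmer | A) / P(kmer | B)).

    Positive scores mean the query looks more like reference A.
    """
    counts_a, counts_b = kmer_counts(ref_a, k), kmer_counts(ref_b, k)
    vocab = set(counts_a) | set(counts_b) | set(kmer_counts(query, k))
    # Pseudocounts keep unseen k-mers from producing log(0)
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log2(p_a / p_b)
    return score


at_rich = "ATATATTATAATATTA"
gc_rich = "GCGCGGCGCCGCGGCG"
print(log_odds_score("TATATA", at_rich, gc_rich))
print(log_odds_score("CGCGCG", at_rich, gc_rich))
```

The same scoring scheme underlies many sequence classifiers, including the soluble versus transmembrane style of problem where two frequency tables compete to explain a sequence.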
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.&lt;br /&gt;
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.&lt;br /&gt;
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
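For those who'd like to see the dynamic programming idea from the primer and the Excel example in Python, here is a minimal sketch of global alignment scoring. The function name and the scoring values (match = +1, mismatch = -1, gap = -2) are assumptions picked for illustration, not the values used in the slides, and the sketch only returns the best score rather than the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of strings a and b.

    score[i][j] holds the best score for aligning the first i letters of a
    with the first j letters of b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per letter.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: diagonal (match/mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Recovering the actual alignment would additionally require remembering which of the three choices won in each cell and tracing back from the bottom-right corner.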
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a "&gt;" character, e.g. "&gt; Hinfluenzae genome file".&lt;br /&gt;
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence one position at a time with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.&lt;br /&gt;
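To make the overlapping vs. non-overlapping point concrete, here is a small sketch. The function name is made up, and skipping any dinucleotide that contains a character other than A, C, G, or T is just one assumed way of handling the ambiguous calls; you may prefer a different approach in your own solution.

```python
from collections import Counter


def count_overlapping_kmers(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping any k-mer with a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for counter in range(len(seq) - k + 1):
        kmer = seq[counter:counter + k]  # slide a window one position at a time
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


print("AAAA".count("AA"))                     # non-overlapping count
print(count_overlapping_kmers("AAAA")["AA"])  # overlapping count

# Absolute counts plus percentages of the total, on a tiny made-up sequence.
counts = count_overlapping_kmers("GATTACA")
total = sum(counts.values())
for kmer in sorted(counts):
    print(kmer, counts[kmer], round(100.0 * counts[kmer] / total, 1))
```

The first two print statements show the difference directly: the built-in count reports 2 for ''AAAA'', while the sliding window finds all 3 overlapping copies.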
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).&lt;br /&gt;
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
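If the late-day arithmetic is easier to follow as code, here is a rough sketch of the rule as described above. The function name is made up, and two details are assumptions rather than stated policy: that any remaining free days are used up first before a penalty applies, and that the penalty is capped at 100%.

```python
import math


def late_penalty(days_late, free_days_left):
    """Return (penalty_fraction, free_days_remaining) for one assignment.

    Lateness is rounded up to whole days, free late days are spent first
    (an assumption), and each remaining day costs 10% of the assignment.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_days_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_days_used
    penalty_fraction = min(1.0, 0.10 * penalized_days)  # cap is an assumption
    return penalty_fraction, free_days_left - free_days_used


# 10 minutes late with all 5 free days available: 1 free day is deducted, no penalty.
print(late_penalty(10 / (24 * 60), 5))

# 1.5 days late with no free days left: rounds up to 2 days, a 20% penalty.
penalty, remaining = late_penalty(1.5, 0)
print(penalty * 50)  # points lost on a 50-point assignment
```

The second case reproduces the example in the text: 20% of a 50-point assignment is 10 points.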
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: 92% ≤ A, 90% ≤ A- &lt; 92%, 88% ≤ B+ &lt; 90%, 82% ≤ B &lt; 88%, 80% ≤ B- &lt; 82%, 78% ≤ C+ &lt; 80%, 72% ≤ C &lt; 78%, 70% ≤ C- &lt; 72%, 68% ≤ D+ &lt; 70%, 62% ≤ D &lt; 68%, 60% ≤ D- &lt; 62%, F &lt; 60%.&lt;br /&gt;
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, &amp; reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.&lt;br /&gt;
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.&lt;br /&gt;
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly directly from Python&lt;/div&gt;
Marcotte http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024 BCH394P BCH364C 2024 2024-03-20T20:36:23Z &lt;p&gt;Marcotte: &lt;/p&gt;
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. &lt;br /&gt;
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.&lt;br /&gt;
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.&lt;br /&gt;
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].&lt;br /&gt;
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
--><br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: '''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
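To build some intuition for the distance measures mentioned above, here's a minimal sketch of three common ones in plain Python. The two profiles are made-up numbers chosen only for illustration, and the function names are my own, not taken from any of the clustering packages linked above.<br />

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

# Hypothetical expression profiles: same shape, different magnitude
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 4.0, 6.0, 8.0]

print(euclidean(profile_a, profile_b))
print(manhattan(profile_a, profile_b))
print(pearson_distance(profile_a, profile_b))
```

Note that the two profiles are far apart by Euclidean and Manhattan distance but essentially identical by correlation distance, which is why the choice of measure can change which genes end up clustered together.<br />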
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
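If you'd like to see the Burrows–Wheeler transform itself in action, here's a deliberately naive sketch that builds it from sorted rotations and then inverts it. This is only meant for toy strings: real read mappers such as BWA build the index far more efficiently, and the "$" terminator and function names here are my own assumptions for illustration.<br />

```python
def bwt(text, terminator="$"):
    """Naive BWT: sort all rotations of text+terminator, keep the last column."""
    text = text + terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last_column, terminator="$"):
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    # The row that ends with the terminator is the original text
    for row in table:
        if row.endswith(terminator):
            return row[:-1]

print(bwt("banana"))                   # annb$aa
print(inverse_bwt(bwt("GATTACA")))     # GATTACA
```

The transform tends to group identical characters together (note the runs of "a" and "n"), and it is fully reversible, which is what makes it useful as a compressed, searchable index.<br />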
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
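To make the sensitivity/specificity/precision definitions above concrete, here's a small sketch that computes them from confusion-matrix counts. The counts are invented for illustration, and the function name is just a placeholder.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from true/false positive/negative counts."""
    return {
        # Sensitivity (recall): fraction of real positives that were found
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate
        "specificity": tn / (tn + fp),
        # Precision (PPV): what the gene finding literature often calls "specificity"
        "precision": tp / (tp + fp),
    }

# Hypothetical counts: 100 real positives and 900 real negatives
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, the true negative rate (about 0.98) and the precision (0.80) differ quite a bit, so it's worth checking which definition a paper is using before comparing numbers.<br />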
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
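For a feel of what a position weight matrix looks like once the motif sites are known, here's a rough sketch that builds one from a handful of aligned sites and computes the per-column information content used in sequence logos. It does not implement Gibbs sampling itself (that's the part that finds the sites); the example sites, the pseudocount of 1, and the uniform 0.25 background are all assumptions chosen for illustration.<br />

```python
import math

ALPHABET = "ACGT"

def column_frequencies(sites, pseudocount=1.0):
    """Per-position base frequencies from equal-length aligned sites."""
    freqs = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        freqs.append({base: counts[base] / total for base in ALPHABET})
    return freqs

def log_odds_matrix(freqs, background=0.25):
    """Position weight matrix: log2(observed frequency / background frequency)."""
    return [{b: math.log2(col[b] / background) for b in ALPHABET} for col in freqs]

def information_content(column):
    """Bits of information in one column (0 = uniform, 2 = fully conserved)."""
    return 2.0 + sum(p * math.log2(p) for p in column.values() if p > 0)

def score(sequence, pwm):
    """Sum of log-odds scores for a sequence the same length as the motif."""
    return sum(pwm[i][base] for i, base in enumerate(sequence))

# Hypothetical aligned binding sites
sites = ["TATAAT", "TATAAT", "TACAAT", "TATACT"]
freqs = column_frequencies(sites)
pwm = log_odds_matrix(freqs)

print([round(information_content(col), 2) for col in freqs])
print(score("TATAAT", pwm), score("GCGCGC", pwm))
```

A sequence resembling the motif gets a positive log-odds score, while an unrelated sequence scores negative. A Gibbs sampler repeatedly rebuilds a matrix like this while re-sampling where the site sits in each input sequence.<br />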
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
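Here's a tiny sketch of the random-walk idea behind the PageRank comment above: for a Markov chain, repeatedly applying the transition probabilities tells you what fraction of time a long random walk spends in each state. The two-state chain and its probabilities are invented for illustration, and this is not the full PageRank algorithm (there's no damping factor, for instance).<br />

```python
def stationary_distribution(transitions, steps=200):
    """Approximate the long-run state occupancy by repeated transition updates."""
    states = sorted(transitions)
    # Start from a uniform guess over all states
    dist = {state: 1.0 / len(states) for state in states}
    for _ in range(steps):
        new_dist = {state: 0.0 for state in states}
        for src, p_src in dist.items():
            for dst, p_move in transitions[src].items():
                new_dist[dst] += p_src * p_move
        dist = new_dist
    return dist

# Hypothetical 2-state chain: each row gives P(next state | current state)
transitions = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.5, "B": 0.5},
}

dist = stationary_distribution(transitions)
print(dist)
```

For this chain the walk spends about 5/6 of its time in state A and 1/6 in state B, regardless of where it starts. In the web search analogy, A would be the higher-ranked page.<br />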
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
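If you want something biological to practice regular expressions on, here's a small sketch that uses Python's re module to pull out simple open reading frames (a start codon, then whole codons, then the first in-frame stop). It's a toy: it only looks at the forward strand, returns non-overlapping matches, and the test sequence is made up.<br />

```python
import re

# ATG, then as few whole codons as possible, then the first in-frame stop codon
ORF_PATTERN = re.compile(r"ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)")

def find_orfs(dna):
    """Return (start index, ORF sequence) for each non-overlapping match."""
    return [(match.start(), match.group()) for match in ORF_PATTERN.finditer(dna)]

print(find_orfs("CCATGAAATGACC"))   # [(2, 'ATGAAATGA')]
```

Note how the pattern skips the out-of-frame TGA that overlaps the start codon and only stops at the in-frame one. The lazy quantifier (*?) is what makes it stop at the first in-frame stop codon rather than the last.<br />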
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
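If you'd rather see the dynamic programming idea in Python than in Excel, here's a minimal sketch of global alignment scoring (the Needleman–Wunsch recurrence). It only returns the best score, not the alignment itself, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary defaults chosen for illustration, not a substitution matrix like BLOSUM.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the dynamic programming table and return the best global alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row/column: aligning a prefix against nothing costs one gap per character
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]

print(global_alignment_score("ACGT", "AGT"))        # 2: three matches, one gap
print(global_alignment_score("GATTACA", "GATTACA")) # 7: seven matches
```

Each cell depends only on its three neighbors (diagonal, up, left), so the whole table is filled in time proportional to the product of the two sequence lengths.<br />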
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
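To see the difference described in the heads-up above, here's a short sketch comparing string.count with a sliding window of width 2. Skipping any pair that contains a character other than A, C, G, or T is just one reasonable way to handle the ambiguous calls, and the function name is my own.<br />

```python
from collections import Counter

def count_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping pairs with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]          # sliding window of width 2
        if all(base in "ACGT" for base in pair):
            counts[pair] += 1
    return counts

print("AAAA".count("AA"))                 # 2 (non-overlapping)
print(count_dinucleotides("AAAA")["AA"])  # 3 (overlapping)
print(count_dinucleotides("ACNGT"))       # the pairs containing N are skipped
```

The sliding window moves one position at a time, so every adjacent pair is counted exactly once, including pairs that overlap the previous one.<br />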
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
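
For anyone who wants to check the late-time arithmetic, here is an illustrative sketch of the policy above. The function name and the bookkeeping are hypothetical, not an official grading script, and it assumes that any remaining free days are spent before a percentage penalty applies.

```python
import math

def late_penalty(points, days_late, free_days_left):
    """Return (points_deducted, free_days_remaining) for one late assignment."""
    # Lateness is rounded up to whole days, so 10 minutes late counts as 1 day.
    whole_days = math.ceil(days_late)
    # Assumption: remaining free "late time" is used up first.
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered
    # 10 percent per penalized day, capped at the full value of the assignment.
    penalty_percent = min(100, 10 * penalized_days)
    return points * penalty_percent / 100, free_days_left - covered

# The example from the text: 50 points, 1.5 days late, no free days left.
print(late_penalty(50, 1.5, 0))

# 10 minutes late with all 5 free days still available.
print(late_penalty(50, 10 / (24 * 60), 5))
```

The first call reproduces the 10-point deduction from the example above; the second costs one free day and no points.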
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
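
The cutoffs above amount to a simple lookup. The sketch below is illustrative only (the names <code>CUTOFFS</code> and <code>letter_grade</code> are made up for this example); it shows that a score just below a cutoff is not rounded up.

```python
# Cutoffs copied from the list above, highest first: (minimum percent, letter).
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Return the letter grade for a final percentage, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"

print(letter_grade(91.99))
print(letter_grade(92))
```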
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-03-20T16:01:39Z<p>Marcotte: /* 2024 */</p>
<hr />
<div>== 2024 ==<br />
<ol><br />
<li value="250"> {{Paper<br />
|title=What can recent methodological advances help us understand about protein and genome evolution?<br />
|authors=Orengo C, Ehrenreich IM, Marcotte EM, Kolodny R, Ben-Tal N, de Boer CG, McWhite CD, Ranganathan R, Honig B, Bromberg Y, Thornton JW<br />
|journal=Cell Systems<br />
|pub_year=2024<br />
|volume=15(3)<br />
|page=205-210<br />
|link=https://www.sciencedirect.com/science/article/pii/S2405471224000607?dgcid=coauthor<br />
|pubmed=<br />
|pdf=CellSystems_Voices_2024.pdf<br />
}} <br />
<li value="249"> {{Paper<br />
|title=DeepSLICEM: Clustering CryoEM particles using deep image and similarity graph representations<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2024<br />
|volume=Deposited Feb 8<br />
|page=<br />
|link=https://doi.org/10.1101/2024.02.04.578778 <br />
|pubmed=38370702<br />
}} <br />
<li value="248"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2024<br />
|volume=Jan 3<br />
|page=mbcE23030084<br />
|link=https://doi.org/10.1091/mbc.E23-03-0084<br />
|comment=[https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 bioRxiv preprint] (deposited Mar 9, 2023)<br />
|pubmed=38170584<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2023 ==<br />
<ol><br />
<li value="247"> {{Paper<br />
|title=SARS-CoV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
|pdf=CommunicationsBiology_OmicronAntibody_2023.pdf<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="245"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=jkad293<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="243"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="242"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|pdf=NatureCommunications_NDRC_Structure_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="240"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Yan<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="238"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|pdf=FrontiersInPlantScience_ProteinAggregation_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
|pdf=PLoSComputationalBiology_Whatprot_2023.pdf<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
|pdf=eLife_IFTAStructure_2023.pdf<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|pdf=CellReportsMethods_MERGE_2023.pdf<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022) [http://www.marcottelab.org/paper-pdfs/CellReportsMethods_MERGE_2023_Supplement.pdf Supplement]<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595–2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner EC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium- resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title=Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|link=https://doi.org/10.1021/acs.orglett.9b03981<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link= https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint (deposited Nov 3, 2017)]<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=Posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47 (D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors=Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM<br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131 (3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=S1674-2052(17)<br />
|page=30108-9<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News & Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=Posted April 27<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1, 2016)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' P. aeruginosa<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, McKay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=Article #8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in E. coli Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between <i>Pseudomonas aeruginosa</i> strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, DeKosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=<i>Pseudomonas aeruginosa</i> enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of Pseudomonas aeruginosa peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist(blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow ]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover: a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg|100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.04.048<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code was written for an older version of the igraph library (0.4.2). The latest version that we have tested (0.5.2) gives somewhat fewer large clusters than our published clusters because of changes in the function "G.community_fastgreedy()", possibly resulting from modifications to the handling of ties in the community merging process. The older igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg|100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg|100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site]. For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project].<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on a genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.taf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein c-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>
Marcotte
http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024
BCH394P BCH364C 2024
2024-03-19T13:37:54Z
<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record holder in this regard probably being [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
<br />
<br />
--><br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* & the final problem set of the semester: [http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf '''Problem Set 3'''], '''due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
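To make the k-means and distance-measure readings above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not the algorithm used by the clustering software linked above: the function names, the toy 2D "profiles", and the choice to seed the centroids with the first k points are all assumptions made for this example.<br />

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1 - cov / math.sqrt(var_a * var_b)


def kmeans(points, k, distance=euclidean, max_iter=100):
    """Toy k-means. Assumption: centroids are seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: distance(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical two-dimensional profiles forming two obvious groups.
toy_points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8)]
labels, centers = kmeans(toy_points, k=2)
print(labels)
```

Real clustering packages typically restart k-means from several random seeds, because the result depends on the starting centroids.<br />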
<br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because MEME is taking too long, you can paste the input sequences + MEME output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
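For intuition about how a Burrows–Wheeler index supports fast exact matching, here is a deliberately naive sketch. It is not how BWA is implemented: real mappers build the transform from a suffix array and store sampled occurrence tables, whereas this toy version sorts all rotations and recounts characters on every step. The function names are made up for illustration.<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for toy inputs only)."""
    text += "$"  # unique end marker that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern by backward search over the BWT."""
    # first_col_start[c] = number of characters in the text that sort before c,
    # i.e. where the block of c's begins in the sorted first column.
    first_col_start = {}
    total = 0
    for c in sorted(set(bwt_string)):
        first_col_start[c] = total
        total += bwt_string.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_string[:i]; real indexes precompute this table.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of matching sorted rotations
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first_col_start:
            return 0
        lo = first_col_start[c] + occ(c, lo)
        hi = first_col_start[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome = "GATTACAGATTA"  # hypothetical toy "genome"
index = bwt(genome)
print(index, count_occurrences(index, "ATTA"))
```

Note that the search cost depends on the length of the read, not on the length of the genome, which is what makes this family of indexes attractive for mapping millions of short reads.<br />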
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision (PPV) in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
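Since the two meanings of "specificity" mentioned above are easy to mix up, here is a small worked example. The counts are invented purely for illustration (imagine nucleotide-level calls from a hypothetical gene finder), and the function name is an assumption.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix ratios from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),  # a.k.a. recall, true positive rate
        "specificity": tn / (tn + fp),  # true negative rate (the usual definition)
        "precision": tp / (tp + fp),    # a.k.a. PPV; gene-finding papers often call THIS "specificity"
    }


# Made-up counts: 80 coding bases found, 20 missed, 20 false calls, 880 true non-coding.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the true negative rate is about 0.978 while the precision is 0.8, so which definition a paper uses can change the reported "specificity" quite a bit.<br />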
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
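To connect position weight matrices with the entropy and sequence-logo ideas above, here is a minimal sketch. The aligned sites are made up (loosely resembling a -10 promoter box), the function names are assumptions, the background is taken to be uniform, and the small-sample correction that logo software usually applies is omitted. This is only the scoring part, not a Gibbs sampler.<br />

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(sites, pseudocount=0.0):
    """Per-column base frequencies from equal-length aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = len(sites) + pseudocount * len(ALPHABET)
        matrix.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return matrix


def information_content(column):
    """Column height in a sequence logo, in bits: log2(4) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(ALPHABET)) - entropy


def log_odds_score(seq, matrix, background=0.25):
    """Sum of log2(motif probability / background probability) over positions."""
    return sum(math.log2(matrix[i][base] / background) for i, base in enumerate(seq))


# Hypothetical aligned binding sites.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
freqs = position_frequencies(sites)
print([round(information_content(col), 2) for col in freqs])

# A pseudocount keeps unseen bases from producing log2(0) when scoring.
pwm = position_frequencies(sites, pseudocount=1.0)
print(log_odds_score("TATAAT", pwm), log_odds_score("GGGGGG", pwm))
```

A perfectly conserved column carries the full 2 bits, while a column that is uniform across the four bases carries 0 bits.<br />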
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in volcanic ash and pyroclastic flows by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
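To see why a random walk "spends more time on some pages than on others", here is a tiny sketch that iterates a Markov chain until the visit probabilities stop changing. The three-page web and its links are invented for illustration, and the function name is an assumption. Note that real PageRank also adds a damping ("random jump") term, which is left out here.<br />

```python
def stationary_distribution(transition, n_steps=200):
    """Repeatedly apply pi <- pi * P, starting from a uniform distribution."""
    states = list(transition)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(n_steps):
        new_pi = {s: 0.0 for s in states}
        for src in states:
            for dst, prob in transition[src].items():
                new_pi[dst] += pi[src] * prob
        pi = new_pi
    return pi


# Hypothetical web: A links to B and C, B links to C, C links back to A.
# Each row gives the probability of following each outgoing link.
web = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(web)
print({page: round(p, 3) for page, p in ranks.items()})
```

Solving pi = pi P by hand for this toy web gives A = 0.4, B = 0.2, C = 0.4, so the walker visits B only half as often as the other two pages.<br />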
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
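As a minimal sketch of the updated call described in the note above, the snippet below builds a Seq with no alphabet argument and takes its reverse complement. The helper name reverse_complement and the plain-Python fallback (used only if Biopython is not installed) are illustrative additions, not part of the Rosalind shortcut.<br />

```python
def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    try:
        # Current Biopython: Seq takes just the sequence string, no alphabet.
        from Bio.Seq import Seq
        return str(Seq(dna).reverse_complement())
    except ImportError:
        # Fallback (an assumption of this sketch) if Biopython is missing:
        # complement each base with str.translate, then reverse the string.
        return dna.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```

Applying the function twice should return the original sequence, which is a quick sanity check for your own solution.<br />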
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
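The counting pitfall above can be sketched in a few lines of Python: slide a window of width 2 one position at a time, after dropping anything other than A, C, G, and T. The helper names and the toy sequence are hypothetical, and this is only one of several reasonable ways to do it.<br />

```python
from collections import Counter

def clean(seq):
    """Uppercase the sequence and drop anything other than A, C, G, T."""
    return "".join(base for base in seq.upper() if base in "ACGT")

def overlapping_counts(seq, k):
    """Count overlapping k-mers by sliding a window one position at a time."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def frequencies(counts):
    """Convert absolute counts to percentages of the total."""
    total = sum(counts.values())
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}

# str.count skips overlapping matches; the sliding window does not.
print("AAAA".count("AA"), overlapping_counts("AAAA", 2)["AA"])  # 2 vs 3

toy = clean("AAAANACGT\n")  # hypothetical toy input; N and the newline are dropped
dinucs = overlapping_counts(toy, 2)
for kmer, pct in sorted(frequencies(dinucs).items()):
    print(kmer, dinucs[kmer], round(pct, 1))
```

Calling overlapping_counts with k=1 gives the single-nucleotide counts, so the same helpers report both the absolute counts and the percentages.<br />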
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
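The late-time arithmetic above can be sketched in Python as follows. The function name, the hours-based input, the way a partly used bank is split between free and penalized days, and the cap at 100% are assumptions of this sketch rather than official policy.<br />

```python
import math

FREE_LATE_DAYS = 5        # per semester, from the policy above
PENALTY_PER_DAY = 0.10    # 10 percent per penalized day

def late_penalty(hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (free days used, penalty fraction) for one assignment."""
    if hours_late <= 0:
        return 0, 0.0
    # Any part of a day counts as a full day (10 minutes late = 1 day).
    days_late = math.ceil(hours_late / 24)
    # Assumption: remaining free days are spent first, the rest are penalized.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    return free_used, min(1.0, PENALTY_PER_DAY * penalized_days)

# The example from the text: 1.5 days late with no free days left.
used, fraction = late_penalty(36, free_days_left=0)
print(used, fraction, 50 * fraction)
```

With no free days remaining, 36 hours late rounds up to 2 days, giving a 20% penalty, or 10 points on a 50-point assignment.<br />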
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
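For anyone who wants to check where a score falls, the cutoffs above translate into a simple lookup. This is an illustrative sketch; the names CUTOFFS and letter_grade are hypothetical, and no rounding is applied, in keeping with the policy.<br />

```python
# (minimum percentage, letter) pairs, highest first, copied from the cutoffs above
CUTOFFS = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Return the letter grade for a final percentage, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"

print(letter_grade(91.99), letter_grade(92), letter_grade(59.99))
```

Because the comparison is a plain greater-than-or-equal test, a 91.99 stays an A- rather than being rounded up to an A.<br />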
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI tools should be used with caution, and their use must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-18T21:27:42Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* I'm also posting the final problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
'''Mar 18, 2024'''<br />
* For those of you struggling with the Rosalind New Motif Discovery problem because of Meme taking too long, you can paste the input sequences + meme output into a single file and submit that through Canvas, and we'll give you credit for it.<br />
<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
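To build some intuition for the Burrows–Wheeler transform discussed in the links above, here is a minimal sketch in Python. It uses the naive "sort all rotations" construction, which is fine for toy strings but far too slow and memory-hungry for genomes (real read mappers such as BWA build the index far more efficiently). The function names and the "$" terminator are illustrative choices, not taken from the BWA paper.<br />

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Assumes the terminator does not occur in the text.
    """
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Naive inverse: repeatedly prepend the BWT column and re-sort the rows."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row ending with the terminator is the original text plus terminator.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))                   # annb$aa
print(inverse_bwt(bwt("GATTACA")))     # GATTACA
```

Note how the transform groups identical characters together ("nn", "aa"), and that it is fully reversible. Those two properties are what make it useful for compressed indexes.<br />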
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
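The sensitivity/specificity/precision definitions mentioned above can be written out explicitly. This is a minimal sketch; the function name and the example counts are made up for illustration, and it does not guard against empty denominators.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    return {
        # Fraction of real positives that were found (a.k.a. recall, TPR).
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity": tn / (tn + fp),
        # Fraction of predictions that are correct (a.k.a. PPV). This is
        # what the gene-finding literature has traditionally called
        # "specificity".
        "precision": tp / (tp + fp),
    }


# Hypothetical example: 100 real exons, 80 found, plus 20 false predictions.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this example the precision is 0.800 while the true negative rate is about 0.978, which shows why it matters which "specificity" a paper is reporting.<br />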
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
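To connect the information theory and sequence logo links above, here is a small sketch that computes the per-column information content (in bits) of a set of aligned sites, which is the quantity plotted as stack height in a sequence logo. The example sites are invented, and the sketch deliberately omits pseudocounts and small-sample corrections that real tools often apply.<br />

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    info = []
    for column in zip(*sites):
        total = len(column)
        counts = Counter(column)
        # Shannon entropy of the observed letter frequencies in this column.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(max_bits - entropy)
    return info


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
for position, bits in enumerate(column_information(sites), start=1):
    print(f"position {position}: {bits:.2f} bits")
```

Perfectly conserved columns score the full 2 bits, while a column split 3:1 between two letters scores about 1.19 bits.<br />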
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
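As a companion to the Markov chain visualization and the PageRank remark above, here is a minimal sketch of how a random walk over linked pages settles into a long-run distribution. The three-page "web" and its transition probabilities are invented for illustration, and the sketch leaves out the damping factor that the actual PageRank algorithm adds.<br />

```python
def stationary_distribution(transitions, steps=1000):
    """Approximate the long-run state probabilities of a Markov chain.

    transitions maps each state to a dict of {next_state: probability}.
    Starts from a uniform distribution and repeatedly applies one step.
    """
    states = list(transitions)
    prob = {state: 1.0 / len(states) for state in states}
    for _ in range(steps):
        updated = {state: 0.0 for state in states}
        for source, p in prob.items():
            for target, t in transitions[source].items():
                updated[target] += p * t
        prob = updated
    return prob


# Hypothetical web: A links to B and C, B links to C, C links back to A.
links = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(links)
for page, rank in sorted(ranks.items()):
    print(f"{page}: {rank:.3f}")
```

The walk spends about 40% of its time on each of A and C and only 20% on B, so B would rank lowest.<br />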
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
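If you'd like a quick regular expression warm-up in Python, here is a small sketch. It searches a protein sequence for the N-glycosylation pattern N{P}[ST]{P} (N, then anything but P, then S or T, then anything but P). The protein string is made up for illustration. The lookahead "(?=...)" is one way to report overlapping matches, which a plain pattern would skip.<br />

```python
import re

# Lookahead wrapper so that overlapping motif occurrences are all reported.
motif = re.compile(r"(?=(N[^P][ST][^P]))")

protein = "MKNNSTANPSAQNGTW"  # hypothetical sequence

# Report 1-based start positions along with the matched 4-mers.
hits = [(match.start() + 1, match.group(1)) for match in motif.finditer(protein)]
print(hits)  # [(3, 'NNST'), (4, 'NSTA'), (13, 'NGTW')]
```

Without the lookahead, the match at position 4 would be missed because it overlaps the one starting at position 3.<br />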
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
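For those who prefer code to spreadsheets, the dynamic programming idea from the primer above can be sketched in a few lines of Python. This computes only the optimal global alignment score (no traceback), and the match/mismatch/gap values are arbitrary illustrative defaults rather than the scoring scheme used in the slides or problem set.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global (Needleman-Wunsch style) alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column/row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: align two letters, or gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

The second example scores 1 because the best alignment has three matches (+3) and one gap (-2).<br />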
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slice, as explained in the Rosalind homework assignment on strings.<br />
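To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch. The function names are just illustrative, and one choice in it is an assumption rather than a requirement of the problem set: any dinucleotide that contains a character other than A, C, G, or T is skipped (whitespace such as line breaks is removed first).<br />

```python
from collections import Counter


def count_nucleotides(seq):
    """Count A, C, G and T, ignoring every other character."""
    return Counter(base for base in seq.upper() if base in "ACGT")


def count_dinucleotides(seq):
    """Count overlapping dinucleotides made only of A, C, G and T."""
    seq = "".join(seq.upper().split())  # drop line breaks and other whitespace
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in "ACGT" and pair[1] in "ACGT":
            counts[pair] += 1
    return counts


print("AAAA".count("AA"))                 # 2 (non-overlapping)
print(count_dinucleotides("AAAA")["AA"])  # 3 (overlapping)
```

Dividing each count by the sum of all counts then gives the corresponding frequency.<br />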
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press)<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
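As a worked illustration of the late policy above, here is a minimal sketch of the arithmetic. The function names are hypothetical, and the code assumes that any remaining free late days are applied first and only the leftover days are penalized; it is not an official grading script.<br />

```python
import math

FREE_LATE_DAYS = 5          # free late days for the entire semester
PENALTY_PERCENT_PER_DAY = 10

def days_late(hours_late):
    """Round lateness up to whole days (10 minutes late counts as 1 day)."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)

def apply_late_policy(points, hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (points after any penalty, free late days remaining)."""
    late = days_late(hours_late)
    # Assumption: free days cover as many late days as they can.
    covered = min(late, free_days_left)
    penalized_days = late - covered
    penalty_percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
    return points * (100 - penalty_percent) / 100, free_days_left - covered

# The example from the text: 50 points, 1.5 days late, no free days left.
print(apply_late_policy(50, 36, free_days_left=0))   # prints (40.0, 0)
```

The 1.5 days round up to 2 days, so the penalty is 20% of 50 points, or 10 points, matching the example in the policy.<br />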
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
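For anyone who wants to check their own standing, the weights and cutoffs described in this syllabus can be sketched as follows. The function and variable names are illustrative assumptions, and scores are entered as fractions between 0 and 1; this is a sketch, not the official grade calculation.<br />

```python
# Lower bound (in percent) for each letter grade, highest first.
CUTOFFS = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (62, "D"), (60, "D-"),
]

def course_total(homework, problem_sets, project):
    """Combine fractional scores: homework 30%, 3 problem sets at 15% each, project 25%."""
    return 30 * homework + 15 * sum(problem_sets) + 25 * project

def letter_grade(percent):
    """Map a percentage to a letter grade with no rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(91.99))   # prints A-
print(letter_grade(92))      # prints A
```

Because the comparison uses the unrounded percentage, a total of 91.99 stays an A-, consistent with the no-rounding rule.<br />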
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-18T01:33:02Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
* I'm also posting the last problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 27, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
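To build some intuition for the Burrows–Wheeler material above, here's a minimal toy sketch (not how BWA is actually implemented). It builds the transform by sorting rotations and then counts pattern matches by backward search. The function names and the "$" terminator are assumptions chosen for illustration, and the naive sorting and counting here would be far too slow for a real genome.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (naive; fine for toy inputs)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Sort all cyclic rotations; the terminator sorts before letters in ASCII.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def count_matches(pattern, last_column):
    """Count occurrences of pattern in the original text by backward search."""
    first_column = sorted(last_column)
    # first_row[c] = index of the first row whose rotation starts with c
    first_row = {}
    for i, ch in enumerate(first_column):
        first_row.setdefault(ch, i)

    def occ(ch, i):
        # Number of times ch appears in last_column[:i] (naive rank query).
        return last_column[:i].count(ch)

    lo, hi = 0, len(last_column)  # current half-open row interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                        # annb$aa
print(count_matches("ana", transformed))  # 2
```

Real read mappers replace the naive occ() scan with precomputed rank tables so each pattern character costs constant time.<br />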
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
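As a rough illustration of the starting point for the De Bruijn graph assemblers mentioned above, here's a toy sketch that turns reads into a graph whose nodes are (k-1)-mers and whose edges are k-mers. The function name and the dictionary-of-lists representation are assumptions for illustration only; real assemblers also handle sequencing errors, reverse complements, and repeats.<br />

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read; each k-mer is one edge.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
```

Walking a path that uses every edge of such a graph (an Eulerian path) is one way to spell out a candidate assembly.<br />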
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
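To make the sensitivity/specificity/precision definitions above concrete, here's a small sketch that computes each from the four cells of a confusion matrix. The function name and the example counts are made up for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard metrics from true/false positive/negative counts."""

    def ratio(numerator, denominator):
        # Guard against empty classes rather than dividing by zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # a.k.a. recall, true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        # Precision = PPV; this is what the gene finding literature
        # has traditionally called "specificity".
        "precision": ratio(tp, tp + fp),
    }


# Hypothetical counts, e.g. per-nucleotide coding vs. non-coding calls.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Note how the true negative rate can look excellent (most of a genome is non-coding) even when a fifth of the predicted positives are wrong, which is one reason precision is often the more informative number here.<br />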
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
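To connect the entropy and sequence logo links above, here's a minimal sketch of the per-column information content that sets the letter heights in a logo. It assumes a uniform background over A, C, G, T and applies no small-sample correction; the function name and the example sites are invented for illustration.<br />

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (in bits) of each column of aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    information = []
    for col in range(length):
        counts = Counter(site[col] for site in sites)
        total = sum(counts[letter] for letter in alphabet)
        entropy = 0.0
        for letter in alphabet:
            p = counts[letter] / total
            if p > 0:
                entropy -= p * math.log2(p)  # Shannon entropy of the column
        information.append(max_bits - entropy)
    return information


# Hypothetical aligned binding sites.
sites = ["ACGT", "ACGA", "ACCT", "ATGT"]
print([round(bits, 2) for bits in column_information(sites)])
```

A perfectly conserved column scores the full 2 bits, while a column with all four bases equally represented scores 0.<br />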
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
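To make the log odds idea above concrete, here's a toy sketch that scores a query string by summing log2 ratios of k-mer frequencies under two models. This is only an illustration of the general technique, not the method from either linked article; the function names, the add-one pseudocount, and the tiny training strings are all assumptions.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, model_counts, background_counts, k, pseudocount=1.0):
    """Sum of log2(P(kmer | model) / P(kmer | background)) over the query's k-mers."""
    query_counts = kmer_counts(query, k)
    vocabulary = set(model_counts) | set(background_counts) | set(query_counts)
    # Pseudocounts keep unseen k-mers from producing log(0).
    model_total = sum(model_counts.values()) + pseudocount * len(vocabulary)
    background_total = sum(background_counts.values()) + pseudocount * len(vocabulary)
    score = 0.0
    for kmer, n in query_counts.items():
        p_model = (model_counts[kmer] + pseudocount) / model_total
        p_background = (background_counts[kmer] + pseudocount) / background_total
        score += n * math.log2(p_model / p_background)
    return score


# Tiny made-up training texts; real applications need far more data.
model = kmer_counts("the cat sat on the mat", k=2)
background = kmer_counts("zxq jvk qzx vjk zqx kvj", k=2)
print(log_odds_score("the cat", model, background, k=2) > 0)  # True
print(log_odds_score("zxq jvk", model, background, k=2) > 0)  # False
```

A positive score means the query looks more like the model text than the background; the same scoring applies to nucleotide or amino acid k-mers.<br />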
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
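If you'd like to see the dynamic programming recurrence outside of a spreadsheet, here's a minimal sketch of global (Needleman–Wunsch style) alignment scoring. The function name and the match/mismatch/gap values are arbitrary assumptions for illustration; it returns only the optimal score and uses a simple linear gap penalty, with no traceback.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill each cell from its three neighbors: diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

Swapping the match/mismatch constants for a lookup into a substitution matrix such as BLOSUM62 gives the protein version of the same recurrence.<br />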
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
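Here's a tiny demonstration of the overlapping vs. non-overlapping issue described above, using a toy string rather than the genome files. It's a sketch of one possible approach (a sliding window with slicing), not the required solution; the function name and its default arguments are made up for illustration.<br />

```python
from collections import Counter


def count_overlapping(sequence, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping any window with a non-ACGT character."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]  # same idea as string[counter:counter+2]
        if all(base in alphabet for base in window):
            counts[window] += 1
    return counts


print("AAAA".count("AA"))               # 2 (non-overlapping)
print(count_overlapping("AAAA")["AA"])  # 3 (overlapping)
print(count_overlapping("ACNGT"))       # windows containing N are skipped
```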
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, e.g. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
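As a rough sketch of the late-time arithmetic above (not an official grading script): the function name <code>late_penalty</code>, its arguments, and the cap of the penalty at 100% are all assumptions made for illustration. It also assumes that free late days are spent first and that only the remaining days are penalized.

```python
import math

FREE_LATE_DAYS = 5        # semester-wide allowance stated in the policy
PENALTY_PER_DAY = 0.10    # 10 percent per penalized day


def late_penalty(hours_late, max_points, free_days_left=FREE_LATE_DAYS):
    """Return (points_deducted, free_days_remaining) for one late assignment.

    Hypothetical helper: lateness is rounded up to whole days, free late
    days are used first, and any remaining days cost 10% each.
    """
    if hours_late <= 0:
        return 0.0, free_days_left
    days_late = math.ceil(hours_late / 24)       # 10 minutes late counts as 1 day
    days_covered = min(days_late, free_days_left)
    penalized_days = days_late - days_covered
    # Assumption: the penalty cannot exceed the value of the assignment.
    fraction = min(1.0, PENALTY_PER_DAY * penalized_days)
    return max_points * fraction, free_days_left - days_covered
```

With no free days left, <code>late_penalty(36, 50, free_days_left=0)</code> follows the example in the policy: 1.5 days rounds up to 2 days, so the deduction is 20% of 50 points.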
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: 92% ≤ A, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
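If it helps to see the cutoffs as a lookup, here is a minimal sketch. The names <code>CUTOFFS</code> and <code>letter_grade</code> are made up for illustration; the thresholds are the ones listed above, applied without any rounding.

```python
# (lower bound in percent, letter), highest first; anything below 60 is an F.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade; no rounding is applied."""
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"
```

Because grades are not rounded up, a score of 91.99 maps to A- rather than A.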
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-18T01:02:45Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record in this regard is probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
<br />
<br />
<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
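To make the Burrows–Wheeler idea above a bit more concrete, here is a toy sketch in Python. It is not how BWA is implemented: the function names are made up for illustration, the transform is built naively by sorting all rotations, and the occurrence counts are recomputed on the fly rather than stored in a precomputed index. It only shows that the BWT of a text is enough to count exact matches of a read by backward search.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Fine for toy inputs only; real tools build this from a suffix array.
    """
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text via backward search."""
    # first_row[c] = number of characters in the text that sort before c,
    # i.e. the first row of the sorted rotations that starts with c.
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    first_row, total = {}, 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_string[:i]; a real index precomputes these.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)      # half-open range of matching rotations
    for ch in reversed(pattern):     # extend the match one character to the left
        if ch not in first_row:
            return 0
        lo = first_row[ch] + occ(ch, lo)
        hi = first_row[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, <code>bwt("banana")</code> gives <code>annb$aa</code>, and searching that string for <code>ana</code> finds 2 matches without ever rescanning the original text.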
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
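Since the two meanings of "specificity" mentioned above are easy to mix up, here is a small sketch that computes them side by side from a confusion matrix. The function name and the dictionary keys are hypothetical, chosen only for illustration; the formulas are the standard ones from the linked definitions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard rates from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives. A rate with a zero denominator is returned as NaN.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        # Fraction of real positives that were found (sensitivity = recall).
        "sensitivity": ratio(tp, tp + fn),
        # True negative rate: "specificity" as most fields define it.
        "specificity": ratio(tn, tn + fp),
        # Fraction of predictions that are correct (precision = PPV);
        # this is what the gene finding literature has called "specificity".
        "precision": ratio(tp, tp + fp),
    }
```

With 90 true positives, 30 false positives, 870 true negatives and 10 false negatives, sensitivity is 0.9 and precision is 0.75, while the true negative rate is 870/900, so the two "specificities" can differ substantially on the same predictions.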
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
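As a small companion to the entropy and sequence logo links above, here is a sketch of how the per-position information content (the total letter height in a sequence logo) can be computed from a set of aligned sites. The four binding sites are made up for illustration, and the sketch omits the small-sample correction that logo software often applies.

```python
import math

def column_information(column, alphabet="ACGT"):
    """Information content in bits of one alignment column:
    log2(alphabet size) minus the Shannon entropy of the column."""
    counts = {base: column.count(base) for base in alphabet}
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        if n:  # skip absent bases, since 0 * log2(0) is taken as 0
            p = n / total
            entropy -= p * math.log2(p)
    return math.log2(len(alphabet)) - entropy

# Hypothetical aligned sites, all the same length
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
columns = ["".join(site[i] for site in sites) for i in range(len(sites[0]))]
info = [column_information(col) for col in columns]
print([round(bits, 2) for bits in info])
```

Perfectly conserved columns reach the 2-bit maximum for DNA, while a column that is 3/4 one base and 1/4 another carries about 1.19 bits.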
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
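The random-walk idea behind PageRank mentioned above can be sketched in a few lines. This is a simplified illustration, not Google's actual algorithm (which, among other things, adds a damping factor): it repeatedly applies the transition probabilities of a tiny, made-up three-page web until the fraction of time spent on each page settles down.

```python
def stationary_distribution(transitions, steps=200):
    """Approximate the long-run fraction of time a random walk spends in
    each state by repeatedly applying the transition probabilities."""
    states = list(transitions)
    probs = {state: 1.0 / len(states) for state in states}  # start uniform
    for _ in range(steps):
        updated = {state: 0.0 for state in states}
        for source, p_source in probs.items():
            for target, p_move in transitions[source].items():
                updated[target] += p_source * p_move
        probs = updated
    return probs

# Hypothetical web: page A links to B and C, B links to C, C links back to A
web = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(web)
print({page: round(p, 3) for page, p in ranks.items()})
```

Pages A and C each end up with about 40% of the walker's time and B with about 20%, so A and C would rank above B.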
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
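If you'd like a quick regular expression warm-up, here is a small sketch using Python's built-in re module. It searches a made-up protein sequence for the N-glycosylation pattern N{P}[ST]{P} (N, then anything but P, then S or T, then anything but P). The lookahead form "(?=...)" is one way to also report overlapping matches.

```python
import re

# Made-up protein sequence for illustration
protein = "MNNTSAKNGSP"

# A plain pattern only finds non-overlapping matches
plain_matches = re.findall(r"N[^P][ST][^P]", protein)

# Wrapping the pattern in a lookahead lets matches overlap; the captured
# group holds the matched text and m.start() gives its 0-based position
overlapping = [(m.start() + 1, m.group(1))
               for m in re.finditer(r"(?=(N[^P][ST][^P]))", protein)]

print(plain_matches)  # ['NNTS']
print(overlapping)    # [(2, 'NNTS'), (3, 'NTSA')]
```

The candidate site starting at the second-to-last N is rejected because its fourth position is a P.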
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
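The difference between the two ways of counting can be seen in a short sketch. The helper name count_kmers is made up for illustration, and skipping any window that contains a character other than A, C, G, or T is just one possible way to handle the ambiguous calls in the genome files.

```python
from collections import Counter

def count_kmers(sequence, k=2, alphabet="ACGT"):
    """Count overlapping k-mers by sliding a window one position at a time,
    skipping windows that contain characters outside the alphabet."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts

print("AAAA".count("AA"))         # 2 (non-overlapping)
print(count_kmers("AAAA")["AA"])  # 3 (overlapping)

# Absolute counts plus percentages of the total, on a toy sequence
dinucs = count_kmers("ACGTACGN")
total = sum(dinucs.values())
for kmer, n in sorted(dinucs.items()):
    print(kmer, n, f"{100 * n / total:.1f}%")
```

Setting k=1 gives single-nucleotide counts with the same function.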
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
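The late-day arithmetic above can be sketched as follows. The function and parameter names are made up for illustration, and how a single submission is split between remaining free days and penalized days is an assumption of this sketch, so treat the syllabus wording as authoritative.

```python
import math

def apply_late_policy(hours_late, free_days_left, points_possible,
                      penalty_per_day=0.10):
    """Return (free_days_remaining, points_deducted) for one late assignment.
    Lateness is rounded up to whole days; free days are spent first."""
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    free_days_used = min(days_late, free_days_left)
    penalized_days = days_late - free_days_used
    penalty_fraction = min(1.0, penalized_days * penalty_per_day)
    return free_days_left - free_days_used, penalty_fraction * points_possible

# 10 minutes late with all 5 free days available: 1 free day is used up
ten_minutes_late = apply_late_policy(10 / 60, free_days_left=5, points_possible=50)

# 1.5 days late with no free days left: rounds up to 2 days, so 20% of 50 points
no_free_days_left = apply_late_policy(36, free_days_left=0, points_possible=50)

print(ten_minutes_late, no_free_days_left)
```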
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
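For anyone who wants to check their own standing, the cutoffs above translate into a simple lookup. This is only an illustrative sketch (the function name is made up), with no rounding applied, in keeping with the policy.

```python
# (minimum percent, letter) pairs, highest cutoff first
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage to a letter grade without rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(91.99))  # A-
print(letter_grade(92))     # A
```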
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-18T00:51:37Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
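To build some intuition for what PCA computes, here is a minimal sketch that uses only the Python standard library (the tutorial linked above uses numpy and pandas, which is what you'd want for real data). It centers the data, builds the covariance matrix, and finds the first principal component by power iteration. The function name <code>first_principal_component</code> and the toy data are made up for illustration, and power iteration is just one of several ways to get the leading eigenvector.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (unit vector, variance explained) for the first principal component.

    rows is a list of equal-length numeric tuples (samples x features).
    """
    n, d = len(rows), len(rows[0])
    # Center each feature on its mean
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d)
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v is the quadratic form v^T cov v
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance


# Toy data lying exactly on the line y = 2x (hypothetical values)
pc1, var1 = first_principal_component([(0, 0), (1, 2), (2, 4), (3, 6)])
print(pc1, var1)
```

Because the toy points fall on a single line, all of the variance ends up on the first component, which points along (1, 2).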
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
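If you'd like to see how little is going on inside a k-NN classifier before reaching for a library, here is a rough sketch in plain Python. The function name <code>knn_predict</code>, the two-feature "expression" values, and the labels are all invented for illustration; for real work you would more likely use scikit-learn's <code>KNeighborsClassifier</code>.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for two leukemia classes
training_set = [
    ((1.0, 1.2), "ALL"), ((0.8, 1.0), "ALL"), ((1.1, 0.9), "ALL"),
    ((3.0, 3.2), "AML"), ((3.1, 2.9), "AML"), ((2.8, 3.0), "AML"),
]
print(knn_predict(training_set, (1.0, 1.0)))
print(knn_predict(training_set, (3.0, 3.0)))
```

Note that k-NN has no training step at all: the "model" is just the labeled data plus a distance measure, which is why the choice of distance and of k matters so much.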
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
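To make the k-means idea from the reading concrete, here is a bare-bones sketch in plain Python (Lloyd's algorithm with Euclidean distance). The function name <code>kmeans</code>, the starting centroids, and the toy points are assumptions for illustration only. Real tools pick starting centroids more carefully and check for convergence rather than running a fixed number of iterations.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Two obvious groups of hypothetical 2-D points
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_centroids, final_clusters = kmeans(data, centroids=[(0, 0), (10, 10)])
print(final_centroids)
```

Swapping <code>math.dist</code> for a different distance measure (e.g., one based on correlation) changes which points count as "near", and so can change the clusters you get.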
<br />
--><br />
'''Mar 12 & 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
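For anyone who wants to poke at the Burrows–Wheeler idea directly, here is a small sketch of building a BWT and using backward search to count exact matches of a read. This is a toy under stated assumptions: the function names are invented, the transform is built by sorting all rotations (far too slow for a genome), and the rank lookups are recomputed on the fly, whereas real mappers such as BWA precompute compact index tables and also handle mismatches.

```python
def bwt(text):
    """Burrows-Wheeler transform built from sorted rotations (short strings only)."""
    text += "$"  # unique end marker; sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Count exact occurrences of pattern using backward search over the BWT."""
    # first_row[c]: first row of the sorted rotation matrix that starts with c
    first_row = {}
    for i, c in enumerate(sorted(bwt_text)):
        first_row.setdefault(c, i)

    def rank(c, i):
        # Occurrences of c in bwt_text[:i]; real indexes precompute this table
        return bwt_text[:i].count(c)

    lo, hi = 0, len(bwt_text)  # half-open range of rows matching the suffix so far
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + rank(c, lo)
        hi = first_row[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


genome_index = bwt("GATTACAGATTACA")
print(count_occurrences(genome_index, "TTAC"))
```

The search walks the read from its last base to its first, narrowing a range of rows at each step, so the work scales with the read length rather than the genome length.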
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, Rosalind has sometimes timed out before MEME completes (although MEME usually finishes eventually), so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields, TP/(TP+FP). Other fields define specificity as the true negative rate, TN/(TN+FP).<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
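As a companion to the De Bruijn primer, here is a toy sketch of the core idea: chop reads into k-mers, treat each distinct k-mer as an edge between its (k-1)-mer prefix and suffix, then walk an Eulerian path to spell out the sequence. Everything here is simplified for illustration. The function names and reads are made up, and the sketch assumes error-free reads, a single strand, and that every k-mer occurs exactly once in the genome, none of which holds for real data.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a graph whose edges are the distinct k-mers found in the reads."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_degree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_degree[target] += 1
    nodes = set(graph) | set(in_degree)
    # Start where out-degree exceeds in-degree by one, if such a node exists
    start = next((n for n in sorted(nodes)
                  if len(graph.get(n, [])) - in_degree.get(n, 0) == 1),
                 min(graph))
    remaining = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path through the k-mer graph."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


# Three hypothetical overlapping reads
print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], k=4))
```

Repeats longer than k-1 bases make the path ambiguous, which is a big part of why the choice of k and the read length matter so much in practice.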
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. The authors particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
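To connect the entropy and sequence logo material above to actual numbers, here is a short sketch that computes the information content (in bits) of each column in a set of aligned binding sites, which is what sets the letter stack heights in a logo. The function name and the example sites are invented for illustration, and the sketch assumes a uniform background and skips the small-sample correction that logo software usually applies.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content in bits for each column of equal-length aligned sites."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    information = []
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column)
        # Shannon entropy of the observed base frequencies in this column
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        information.append(max_bits - entropy)
    return information


# Hypothetical aligned sites resembling a -10 promoter element
aligned_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
for position, bits in enumerate(column_information(aligned_sites), start=1):
    print(position, round(bits, 3))
```

A perfectly conserved column carries the full 2 bits, while a column with all four bases at equal frequency carries none.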
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
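To build some intuition for the log odds idea mentioned above, here is a minimal sketch in plain Python. It is not the method from either linked article: the function names, the pseudocount, and the toy texts are all assumptions chosen for illustration. It scores a candidate string by summing log2 ratios of k-mer probabilities under a "reference" text versus a "background" text.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(candidate, reference, background, k=2, pseudocount=1.0):
    """Sum log2(P(kmer | reference) / P(kmer | background)) over the candidate.

    A pseudocount (an illustrative choice) keeps unseen k-mers from giving
    zero probabilities.
    """
    ref = kmer_counts(reference, k)
    bg = kmer_counts(background, k)
    vocab = set(ref) | set(bg) | set(kmer_counts(candidate, k))
    ref_total = sum(ref.values()) + pseudocount * len(vocab)
    bg_total = sum(bg.values()) + pseudocount * len(vocab)

    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        p_ref = (ref[kmer] + pseudocount) / ref_total
        p_bg = (bg[kmer] + pseudocount) / bg_total
        score += math.log2(p_ref / p_bg)
    return score


# Toy data (hypothetical): English-like reference vs. a nonsense background.
reference_text = "the quick brown fox jumps over the lazy dog the end"
background_text = "xqzjkvw" * 8

print(log_odds_score("the dog", reference_text, background_text))
print(log_odds_score("xqzjkv", reference_text, background_text))
```

A positive score means the candidate looks more like the reference than the background; a negative score means the opposite. Real applications would use much larger training texts and longer k-mers.<br />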
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
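If you would like to see what the reverse complement operation does under the hood, here is a minimal sketch in plain Python (no BioPython needed). The helper name and translation table are illustrative assumptions, and it only handles the unambiguous bases A, C, G, and T. In BioPython, the corresponding call on a Seq object is reverse_complement().<br />

```python
# Translation table mapping each unambiguous DNA base to its complement
# (upper and lower case). Ambiguity codes such as N are left unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```

Applying the function twice should give back the original sequence, which is a handy sanity check.<br />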
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use Python's str.count() method to count dinucleotides, note that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python string slicing (string[counter:counter+2]), as explained in the Rosalind homework assignment on strings.<br />
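The difference between non-overlapping and overlapping counts can be seen in a short sketch like the one below. The function name and the choice to skip any window containing a character other than A, C, G, or T are assumptions made for illustration; you may prefer to handle ambiguous characters differently.<br />

```python
from collections import Counter


def count_overlapping_kmers(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers by sliding a window one position at a time.

    Windows containing any character outside `alphabet` are skipped
    (an illustrative way to ignore ambiguous sequence calls).
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


print("AAAA".count("AA"))                     # non-overlapping count: 2
print(count_overlapping_kmers("AAAA")["AA"])  # overlapping count: 3
```

Dividing each count by the sum of all counts gives the percentages of the total.<br />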
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
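As a small illustration of version-agnostic habits, the sketch below sticks to idioms that should behave the same way in Python 2.7 and Python 3: print called with parentheses and a single argument, explicit floor division, and an explicit float conversion before true division. The variable names and numbers are made up for the example.<br />

```python
# Idioms that behave identically in Python 2.7 and Python 3.
total_bases = 20
gc_count = 10

# Convert to float explicitly so division is never truncated.
gc_fraction = gc_count / float(total_bases)

# Use // whenever integer (floor) division is what you actually want.
full_codons = total_bases // 3

# A single parenthesized argument prints the same way in both versions.
print("GC fraction: {:.2f}".format(gc_fraction))
print("Complete codons: {}".format(full_codons))
```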
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
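The late policy arithmetic can be sketched as follows. This is only an illustration, not an official grading tool: the function name is hypothetical, and two details are assumptions not spelled out above, namely that any remaining free days are applied before a penalty starts and that the penalty is capped at 100%.<br />

```python
import math


def late_penalty(days_late, free_days_left=5, penalty_per_day=0.10):
    """Return (penalty_fraction, free_days_remaining) for one assignment.

    Lateness is rounded up to whole days. Remaining free days are used
    first (assumption); each further day costs `penalty_per_day`, capped
    at 100% (assumption).
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    penalty = min(1.0, penalized_days * penalty_per_day)
    return penalty, free_days_left - free_used


# The example from the text: no free days left, 50 points, 1.5 days late.
penalty, remaining = late_penalty(1.5, free_days_left=0)
print(penalty * 50)  # points deducted
```

With all 5 free days still available, an assignment turned in 10 minutes late costs no points but uses up 1 free day.<br />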
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
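<br />
For anyone who wants to check where a score falls, the cutoffs above can be written as a simple lookup. The names below are hypothetical and this is only a sketch of the stated cutoffs, with no rounding applied.<br />

```python
# (minimum percent, letter) pairs, highest cutoff first.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade without rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


print(letter_grade(91.99))
```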
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-03-07T22:58:13Z<p>Marcotte: </p>
<hr />
<div>== 2024 ==<br />
<ol><br />
<li value="249"> {{Paper<br />
|title=DeepSLICEM: Clustering CryoEM particles using deep image and similarity graph representations<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2024<br />
|volume=Deposited Feb 8<br />
|page=<br />
|link=https://doi.org/10.1101/2024.02.04.578778 <br />
|pubmed=38370702<br />
}} <br />
<li value="248"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2024<br />
|volume=Jan 3<br />
|page=mbcE23030084<br />
|link=https://doi.org/10.1091/mbc.E23-03-0084<br />
|comment=[https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 bioRxiv preprint] (deposited Mar 9, 2023)<br />
|pubmed=38170584<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2023 ==<br />
<ol><br />
<li value="247"> {{Paper<br />
|title=SARS-COV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
|pdf=CommunicationsBiology_OmicronAntibody_2023.pdf<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="245"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=jkad293<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="243"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="242"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|pdf=NatureCommunications_NDRC_Structure_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="240"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Yan<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="238"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|pdf=FrontiersInPlantScience_ProteinAggregation_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
|pdf=PLoSComputationalBiology_Whatprot_2023.pdf<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Frietas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
|pdf=eLife_IFTAStructure_2023.pdf<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|pdf=CellReportsMethods_MERGE_2023.pdf<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022) [http://www.marcottelab.org/paper-pdfs/CellReportsMethods_MERGE_2023_Supplement.pdf Supplement]<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595–2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner AC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SJ, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium- resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title=Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|link=https://doi.org/10.1021/acs.orglett.9b03981<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint] (deposited Nov 3, 2017)<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47 (D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9<br />
|authors=Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM<br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131 (3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint] (deposited December 26, 2016) [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=10(8)<br />
|page=1133-1136<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News&Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=deposited April 27<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1, 2016)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' P. aeruginosa<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, Mckay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=Article #8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in ''E. coli'' Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between <i>Pseudomonas aeruginosa</i> strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, DeKosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=<i>Pseudomonas aeruginosa</i> enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of ''Pseudomonas aeruginosa'' peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist (blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg||100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.04.048 <br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code used an older version of the igraph library (0.4.2); the latest version that we've tested (0.5.2) gives somewhat fewer large clusters than we published, due to changes in the function "G.community_fastgreedy()" that possibly result from modifications to the handling of ties in the community merging process. The previous igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on a genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein c-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-03-07T22:57:48Z<p>Marcotte: /* 2023 */</p>
<hr />
<div>== 2024 ==<br />
<ol><br />
<li value="249"> {{Paper<br />
|title=DeepSLICEM: Clustering CryoEM particles using deep image and similarity graph representations<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2024<br />
|volume=Deposited Feb 8<br />
|page=<br />
|link=https://doi.org/10.1101/2024.02.04.578778 <br />
|pubmed=38370702<br />
}} <br />
<li value="248"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2024<br />
|volume=Jan 3<br />
|page=mbcE23030084<br />
|link=https://doi.org/10.1091/mbc.E23-03-0084<br />
|comment=[https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 bioRxiv preprint] (deposited Mar 9, 2023)<br />
|pubmed=38170584<br />
}}<br />
</li><br />
</ol><br />
== 2023 ==<br />
<ol><br />
<li value="247"> {{Paper<br />
|title=SARS-CoV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
|pdf=CommunicationsBiology_OmicronAntibody_2023.pdf<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="245"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=jkad293<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="243"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="242"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|pdf=NatureCommunications_NDRC_Structure_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="240"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Y<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="238"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|pdf=FrontiersInPlantScience_ProteinAggregation_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
|pdf=PLoSComputationalBiology_Whatprot_2023.pdf<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Frietas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
|pdf=eLife_IFTAStructure_2023.pdf<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|pdf=CellReportsMethods_MERGE_2023.pdf<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022) [http://www.marcottelab.org/paper-pdfs/CellReportsMethods_MERGE_2023_Supplement.pdf Supplement]<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595−2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner EC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium-resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title= Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acs.orglett.9b03981<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link= https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint (deposited Nov 3, 2017)]<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47(D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors= Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM <br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131(3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=S1674-2052(17)<br />
|page=30108-9<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News&Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=deposited April 27<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' ''P. aeruginosa''<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, Mckay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in ''E. coli'' Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between <i>Pseudomonas aeruginosa</i> strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, Dekosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=<i>Pseudomonas aeruginosa</i> enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of <i>Pseudomonas aeruginosa</i> peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci USA<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist (blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg||100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.04.048<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code used an older version of the igraph library (0.4.2); the latest version we have tested (0.5.2) gives somewhat fewer large clusters than our published clusters because of changes in the function "G.community_fastgreedy()", possibly resulting from modifications to how ties are handled in the community merging process. The older igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=12967956<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.taf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein c-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-07T14:49:08Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein ID, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ UniProt]. InParanoid tends towards higher recall and lower precision when finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms make different trade-offs between precision and recall, and differ in ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models can easily be downloaded and re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
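To make the precision vs. recall trade-off among the ortholog tools above concrete, here is a minimal sketch in plain Python. The gene pairs and the two sets of "calls" are hypothetical toy data invented for illustration; they are not output from InParanoid, OMA, or any other tool. The function name <code>precision_recall</code> is likewise just an assumption for this example.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted ortholog pairs vs. a trusted reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: what fraction of the calls we made are correct?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: what fraction of the true orthologs did we find?
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical reference set of true human/yeast ortholog pairs.
reference_pairs = {
    ("hsA", "scA"), ("hsB", "scB"), ("hsC", "scC"), ("hsD", "scD"),
}

# A permissive caller: finds every true pair, but also reports four wrong ones.
permissive_calls = reference_pairs | {
    ("hsA", "scX"), ("hsB", "scY"), ("hsC", "scZ"), ("hsD", "scW"),
}

# A strict caller: reports only two pairs, both correct.
strict_calls = {("hsA", "scA"), ("hsB", "scB")}

print("permissive:", precision_recall(permissive_calls, reference_pairs))
print("strict:    ", precision_recall(strict_calls, reference_pairs))
```

In this toy case the permissive caller has perfect recall but only half of its calls are right, while the strict caller is always right but misses half of the true pairs. Which behavior you want depends on whether missing an ortholog or chasing a false one is more costly for your analysis.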
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
--><br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
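To build some intuition for the BWT indexing idea in the links above, here is a toy sketch in plain Python. It is not how BWA is implemented: the function names (<code>bwt</code>, <code>inverse_bwt</code>, <code>count_matches</code>) are made up for illustration, the transform is built by naively sorting all rotations, and the rank lookups rescan the string each time, where real mappers use precomputed tables and a suffix array.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (toy version).

    Assumes `terminator` does not occur in `text` and sorts before
    every other character (true for "$" versus letters).
    """
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


def count_matches(transformed, pattern):
    """Count exact occurrences of `pattern` by backward search on the BWT."""
    first_column = sorted(transformed)
    # Index in the first column where each character's block starts.
    block_start = {}
    for index, char in enumerate(first_column):
        block_start.setdefault(char, index)
    low, high = 0, len(transformed)
    # Extend the match one character at a time, right to left.
    for char in reversed(pattern):
        if char not in block_start:
            return 0
        low = block_start[char] + transformed[:low].count(char)
        high = block_start[char] + transformed[:high].count(char)
        if low >= high:
            return 0
    return high - low


genome = "GATTACAGATTACA"           # made-up toy "genome"
index = bwt(genome)
print(index)
print(inverse_bwt(index) == genome)  # the transform is reversible
print(count_matches(index, "TTAC"))  # number of places the read matches
```

The point to notice is that <code>count_matches</code> never looks at the original genome, only at the transformed string, and does one step per character of the read regardless of genome length (once the rank lookups are precomputed).<br />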
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes. It usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
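To make the two meanings of "specificity" concrete, here is a small sketch that computes both from the four cells of a confusion matrix. The function name and the example counts are invented for illustration; the formulas are the standard ones from the Wikipedia pages linked above.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common performance measures from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives, true negatives,
    false negatives. A measure is None if its denominator is zero.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        # Fraction of real positives that were found (a.k.a. recall).
        "sensitivity": ratio(tp, tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity_tnr": ratio(tn, tn + fp),
        # Fraction of predictions that were right (PPV); this is what
        # the gene finding literature often calls "specificity".
        "precision_ppv": ratio(tp, tp + fp),
    }


# Made-up example: 100 real exons, 900 non-exon regions.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts the true negative rate (about 0.978) looks much better than the precision (0.800), simply because negatives vastly outnumber positives. That imbalance is typical when scanning a genome, which is one reason to check which definition a paper is using.<br />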
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. The authors particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
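As a companion to the entropy and sequence logo links above, here is a minimal sketch of how a set of aligned sites becomes a position frequency matrix, and how the height of each logo column (its information content in bits) is computed. The function names and the toy sites are made up for illustration. It assumes a uniform background over A/C/G/T and applies no small-sample correction, which real logo software usually does.<br />

```python
import math
from collections import Counter


def position_frequencies(sites, alphabet="ACGT", pseudocount=0.0):
    """Per-position letter frequencies for equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for position in range(length):
        counts = Counter(site[position] for site in sites)
        matrix.append(
            {letter: (counts[letter] + pseudocount) / total for letter in alphabet}
        )
    return matrix


def information_content(column):
    """Bits at one position: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


# Invented toy alignment, loosely resembling a -10 promoter element.
toy_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
matrix = position_frequencies(toy_sites)
for position, column in enumerate(matrix):
    print(position, column, round(information_content(column), 3))
```

A perfectly conserved DNA position carries 2 bits, and a position where all four bases are equally likely carries 0 bits. Passing a small <code>pseudocount</code> avoids zero frequencies, which matters once you take logs to build a log-odds weight matrix.<br />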
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
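Here is a small sketch of the random-walk idea behind the Markov chain and PageRank links above: given transition probabilities between states, repeatedly pushing probability along the links converges to the long-run fraction of time a random walker spends in each state. The three-page "web" and the function name are invented for illustration, and the sketch leaves out the damping (random jump) term that PageRank adds.<br />

```python
def stationary_distribution(transitions, n_steps=1000):
    """Approximate the long-run state occupancy of a Markov chain.

    `transitions[a][b]` is the probability of moving from state a to
    state b; each row is assumed to sum to 1.
    """
    states = list(transitions)
    # Start with the walker equally likely to be anywhere.
    probabilities = {state: 1.0 / len(states) for state in states}
    for _ in range(n_steps):
        updated = {state: 0.0 for state in states}
        for source, p_source in probabilities.items():
            for target, p_move in transitions[source].items():
                updated[target] += p_source * p_move
        probabilities = updated
    return probabilities


# Toy web: page A links to B and C, B links to C, C links back to A.
toy_web = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(toy_web)
for page, share in sorted(ranks.items()):
    print(page, round(share, 3))
```

For this toy chain the walker ends up spending 40% of its time on A, 20% on B, and 40% on C, no matter where it starts. The same bookkeeping (probabilities flowing along transitions) is what the forward algorithm does in an HMM, with emission probabilities layered on top.<br />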
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
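To illustrate the overlapping vs. non-overlapping issue, here is a minimal sketch of the slicing approach, using a made-up toy string rather than the real genome files. The function names are invented for illustration, and it is one possible way to do the counting, not the required solution. One assumption to note: a dinucleotide is skipped if either of its letters is not A, C, G, or T, rather than deleting the ambiguous character and joining its neighbors.<br />

```python
from collections import Counter


def count_overlapping_kmers(sequence, k=2, alphabet="ACGT"):
    """Count every overlapping k-mer made only of letters in `alphabet`."""
    sequence = sequence.upper()
    allowed = set(alphabet)
    counts = Counter()
    # Slide a window of width k one position at a time.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= allowed:
            counts[kmer] += 1
    return counts


def to_percentages(counts):
    """Convert raw counts to percentages of the total counted."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}


print("AAAA".count("AA"))                     # 2 (non-overlapping)
print(count_overlapping_kmers("AAAA")["AA"])  # 3 (overlapping)

toy = "AAAANACGT"  # the N stands in for an ambiguous base call
dinucleotides = count_overlapping_kmers(toy, k=2)
print(dinucleotides)
print(to_percentages(dinucleotides))
```

Calling the same function with <code>k=1</code> gives the single-nucleotide counts, so one function covers both parts of the frequency questions.<br />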
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
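Since the two E. coli files above differ only in the FASTA header, one loader can handle both. The following is a minimal sketch, written so it should run unchanged under Python 2.7 or 3; the function name read_sequence and the tiny example inputs are made up for illustration.

```python
def read_sequence(lines):
    """Return one uppercase sequence string from plain-text or FASTA-style lines."""
    pieces = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):   # skip blank lines and FASTA headers
            continue
        pieces.append(line.upper())
    return "".join(pieces)

# Tiny made-up inputs standing in for the two genome files
plain_text = ["agct\n", "TTGA\n"]
fasta = ["> made-up header\n"] + plain_text

print(read_sequence(plain_text) == read_sequence(fasta))
print(len(read_sequence(fasta)))
```

For a real file, assuming you saved it as EcoliGenome.txt in your working directory, you could pass the open file handle directly: with open("EcoliGenome.txt") as handle: genome = read_sequence(handle).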
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
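The late-day arithmetic above can be sketched in a few lines. This is only an illustration, not the official grading code: the function name late_penalty is made up, and it assumes that when an assignment is later than the free days you have left, only the days beyond the free pool are penalized.

```python
import math

def late_penalty(points, days_late, free_days_left):
    """Return (free days remaining, points deducted) for one late assignment."""
    days_charged = math.ceil(days_late)           # round up to whole days
    free_used = min(days_charged, free_days_left)
    # ASSUMPTION: only days beyond the remaining free pool are penalized
    penalized_days = days_charged - free_used
    penalty_percent = min(100, 10 * penalized_days)
    return free_days_left - free_used, points * penalty_percent / 100

# The example from the text: free days exhausted, 50 points, 1.5 days late
print(late_penalty(50, 1.5, 0))             # (0, 10.0)
# Ten minutes late with all 5 free days left: one free day is used, no penalty
print(late_penalty(50, 10 / (24 * 60), 5))  # (4, 0.0)
```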
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-05T15:56:02Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17, 2024'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
--><br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to precision (positive predictive value, PPV) in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
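To make the sensitivity/specificity/precision definitions above concrete, here is a minimal Python sketch. The function name and the example counts are made up for illustration, and it assumes none of the denominators are zero.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common evaluation metrics from confusion-matrix counts.

    tp, fp, tn, fn = true positives, false positives,
    true negatives, false negatives (hypothetical inputs).
    """
    sensitivity = tp / (tp + fn)   # a.k.a. recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate (the usual definition)
    precision = tp / (tp + fp)     # a.k.a. PPV; "specificity" in gene finding
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


# Made-up example: 100 real coding bases, 900 non-coding bases
metrics = classification_metrics(tp=80, fp=40, tn=860, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note how the same predictions can look good by the true negative rate and much worse by precision when negatives greatly outnumber positives, which is typical for genomes.<br />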
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. The authors particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
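If you'd like to see the log odds idea from the last bullet in code, here is a rough sketch. It is not the method from the paper: it scores a candidate text by summing log2 ratios of k-mer probabilities under a reference text versus a uniform background. The function names, the 26-letter alphabet, the pseudocount, and the toy reference text are all assumptions for illustration.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(candidate, reference, k=2, alphabet_size=26, pseudocount=1.0):
    """Sum of log2(P_reference(kmer) / P_uniform(kmer)) over the candidate.

    Positive scores mean the candidate looks more like the reference text
    than like random letters. Pseudocounts avoid taking log of zero.
    """
    ref = kmer_counts(reference, k)
    total = sum(ref.values())
    n_possible = alphabet_size ** k
    p_uniform = 1.0 / n_possible
    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        p_ref = (ref[kmer] + pseudocount) / (total + pseudocount * n_possible)
        score += math.log2(p_ref / p_uniform)
    return score


# Toy reference text (an assumption; a real analysis would use a large corpus)
reference = "thequickbrownfoxjumpsoverthelazydog" * 3
print(log_odds_score("thefox", reference))   # resembles the reference
print(log_odds_score("xqzjkv", reference))   # does not
```

The same scoring scheme, with nucleotides or amino acids in place of letters, is what underlies comparing a sequence against two competing models.<br />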
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as slicing with the Python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
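To illustrate the overlapping vs. non-overlapping issue above, here is a small sketch using the slicing approach. The function name is made up, and skipping any pair that contains a character other than A, C, G, or T is just one reasonable reading of "ignore anything else"; it is not a model solution to the problem set.<br />

```python
from collections import Counter


def count_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping pairs with non-ACGT characters."""
    seq = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    for counter in range(len(seq) - 1):
        pair = seq[counter:counter + 2]   # slide one position at a time
        if set(pair) <= valid:            # skip pairs containing e.g. N
            counts[pair] += 1
    return counts


print("AAAA".count("AA"))                 # non-overlapping: 2
print(count_dinucleotides("AAAA")["AA"])  # overlapping: 3
```

To turn the counts into frequencies, divide each count by the total number of counted pairs (the sum of the Counter's values).<br />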
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
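<br />
To make the late-day arithmetic above concrete, here is a minimal Python sketch. The function name <code>apply_late_policy</code> and its parameters are hypothetical, and it assumes that any remaining free days are used up first, with only the leftover days penalized. It does not model the final project, for which late days can't be used.<br />

```python
import math

def apply_late_policy(hours_late, free_days_left, max_points, percent_per_day=10):
    """Return (points_deducted, free_days_left_afterwards) for one assignment.

    Assumption (not spelled out in the syllabus): remaining free late days
    are consumed first, and only the days beyond them are penalized.
    """
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day).
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    # 10 percent per penalized day, capped at the full value of the assignment.
    penalty_percent = min(100, percent_per_day * penalized_days)
    return max_points * penalty_percent / 100, free_days_left - free_used

# The syllabus example: 50 points, 1.5 days late, no free days left.
deducted, remaining = apply_late_policy(hours_late=36, free_days_left=0, max_points=50)
print(deducted, remaining)
```

With all 5 free days still available, the same call would deduct nothing and leave 3 free days for the rest of the semester.<br />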
<br />
Homework, problem sets, and the project total a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
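<br />
If it helps to see the grading scheme as a calculation, here is a small Python sketch of the weights and cutoffs listed above. The names <code>course_total</code> and <code>letter_grade</code> are made up for illustration, and the fractional-score inputs are an assumption about how you might track your own progress.<br />

```python
# Point weights from the syllabus: homework 30, three problem sets at 15 each, project 25.
HOMEWORK_POINTS = 30
PROBLEM_SET_POINTS = 15
PROJECT_POINTS = 25

# (minimum total, letter) pairs, highest first; anything below 60 is an F.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def course_total(homework_frac, problem_set_fracs, project_frac):
    """Combine fractional scores (0.0 to 1.0) into a total out of 100 points."""
    return (HOMEWORK_POINTS * homework_frac
            + PROBLEM_SET_POINTS * sum(problem_set_fracs)
            + PROJECT_POINTS * project_frac)

def letter_grade(total):
    """Map a point total to a letter grade, with no rounding up."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"

print(letter_grade(course_total(0.95, [0.90, 0.85, 1.00], 0.88)))
```

Because totals are compared directly against the cutoffs, a 91.99 stays an A-, matching the no-rounding rule.<br />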
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-05T15:26:49Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
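<br />
For a feel of what PCA is doing under the hood, here is a minimal pure-Python sketch that finds only the first principal component by power iteration on the covariance matrix. This is a swapped-in teaching technique, not the method used in the tutorial linked above; in practice you would use numpy or scikit-learn. The function name and the starting vector are arbitrary choices for illustration.<br />

```python
import math

def first_principal_component(rows, n_iter=200):
    """Return (unit direction, variance along it) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Arbitrary normalized starting vector; power iteration assumes it is not
    # orthogonal to the leading eigenvector.
    start = [float(j + 1) for j in range(d)]
    start_norm = math.sqrt(sum(x * x for x in start))
    v = [x / start_norm for x in start]
    for _ in range(n_iter):
        # Multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in this direction; give up
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Points lying exactly on the line y = 2x: PC1 should point along (1, 2).
direction, variance = first_principal_component([(1, 2), (2, 4), (3, 6), (4, 8)])
print(direction, variance)
```

Projecting each centered data point onto the returned direction gives its PC1 coordinate.<br />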
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
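<br />
As a companion to the k-means reading above, here is a bare-bones Python sketch of the standard assign-then-update loop (Lloyd's algorithm) with Euclidean distance. Seeding the centroids with the first k points is a simplifying assumption made here so the result is deterministic; real implementations usually pick random or k-means++ seeds and try several restarts.<br />

```python
import math

def kmeans(points, k, n_iter=100):
    """Cluster points (tuples of floats) into k groups; return (labels, centroids)."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

data = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

Swapping <code>math.dist</code> for another distance measure changes which points count as "near", which is why the choice of distance matters so much for clustering.<br />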
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
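<br />
To see how the Burrows–Wheeler transform can act as a search index, here is a toy Python sketch: a naive BWT built by sorting all rotations, plus a backward search that counts how many times a pattern occurs. This is an illustration only, not the BWA implementation; real mappers build the transform from a suffix array and precompute the occurrence counts. It assumes the text contains no "$" character, and the function names are made up.<br />

```python
def bwt(text):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    text += "$"  # unique end marker that sorts before the letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_string, pattern):
    """Count occurrences of pattern in the original text via backward search."""
    # For each character, the row where its block starts in the sorted first column.
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    first_row = {}
    total = 0
    for ch in sorted(counts):
        first_row[ch] = total
        total += counts[ch]
    # Rows [lo, hi) are the rotations that start with the suffix matched so far.
    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first_row:
            return 0
        # str.count over a prefix is slow; real indexes precompute these ranks.
        lo = first_row[ch] + bwt_string[:lo].count(ch)
        hi = first_row[ch] + bwt_string[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo

index = bwt("banana")
print(index, count_occurrences(index, "ana"))
```

The search walks the pattern from its last character to its first, narrowing a range of rows at each step, so the work scales with the read length rather than the genome length once the ranks are precomputed.<br />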
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
--><br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes although it usually runs eventually, so be warned you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* '''Due March 18 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? We'll spend a few minutes at the start of class asking around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/view/subcellularloc/projects 4] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 5] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 6] [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 7] [https://sites.google.com/view/bch-364c-final-project/home?authuser=0 8] [https://metabolicnetworkpathways.wordpress.com/ 9] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 10]. Remember that the project itself will ultimately be due one month later on April 17 (& late days can't be used for the final project.)<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
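To make the sensitivity/specificity/precision note above concrete, here is a minimal sketch that computes all three from confusion-matrix counts. The function name and the counts are hypothetical, chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common accuracy measures from confusion-matrix counts."""
    return {
        # Sensitivity (recall, true positive rate): fraction of real positives found
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate
        "specificity": tn / (tn + fp),
        # Precision (PPV): what the gene finding literature often calls "specificity"
        "precision": tp / (tp + fp),
    }

# Hypothetical counts, e.g. predicted vs. annotated coding nucleotides
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With these made-up counts, the true negative rate and the precision differ noticeably, which is why it matters which definition of "specificity" a paper is using.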
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
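As a small companion to the entropy and sequence logo links above, here is a rough sketch of how the per-position information content (the height of a logo column, in bits) can be computed from a set of aligned sites. The function name and the toy sites are hypothetical, and the sketch omits the small-sample correction that logo software usually applies.

```python
import math
from collections import Counter

def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites."""
    n = len(sites)
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    info = []
    for column in zip(*sites):
        counts = Counter(column)
        # Shannon entropy of the observed letter frequencies in this column
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(max_bits - entropy)
    return info

# Toy aligned sites (made up for illustration)
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
info = column_information(sites)
print([round(bits, 2) for bits in info])
```

Perfectly conserved columns reach the 2-bit maximum for DNA, while columns with mixed letters score lower.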
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
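To illustrate the log odds idea in the last item above, here is a toy sketch that scores a candidate string by the summed log odds of its k-mers under a "model" text versus a "background" text. It is not the researchers' actual method. The function names, pseudocount scheme, and toy strings are all assumptions made for illustration, and it uses 2-mers rather than 5-mers to keep the example small.

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds_score(candidate, model_text, background_text, k=2, pseudocount=1.0):
    """Sum of log2(P_model / P_background) over the candidate's k-mers."""
    model = kmer_counts(model_text, k)
    background = kmer_counts(background_text, k)
    alphabet = set(model_text) | set(background_text) | set(candidate)
    n_kmers = len(alphabet) ** k  # number of possible k-mers, for the pseudocounts
    model_total = sum(model.values()) + pseudocount * n_kmers
    background_total = sum(background.values()) + pseudocount * n_kmers
    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        p_model = (model[kmer] + pseudocount) / model_total
        p_background = (background[kmer] + pseudocount) / background_total
        score += math.log2(p_model / p_background)
    return score

# Toy data: the "model" text alternates A and C, the background has doubled letters
model_text = "ACACACACACAC"
background_text = "AACCAACCAACC"
score_alternating = log_odds_score("ACAC", model_text, background_text)
score_doubled = log_odds_score("AACC", model_text, background_text)
print(round(score_alternating, 2), round(score_doubled, 2))
```

A positive score means the candidate looks more like the model text than the background, and a negative score means the opposite. The same logic underlies the log odds scores we use for sequence models in this part of the course.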
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
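If you'd like a way to sanity-check what BioPython returns for that problem, here is a minimal plain-Python sketch of a reverse complement. The function name is hypothetical, and it assumes an uppercase sequence containing only A, C, G, and T.

```python
# Translation table mapping each base to its complement
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Complement each base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))

# With BioPython installed, the comparable call on a Seq object would be:
#   from Bio.Seq import Seq
#   Seq(my_seq).reverse_complement()
```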
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as slicing with the Python string[counter:counter+2] syntax, as explained in the Rosalind homework assignment on strings.<br />
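The difference between the two kinds of counting is easy to see on a toy string. The sketch below is one possible approach, not the required solution. The function name is hypothetical, it assumes an uppercase sequence, and it handles ambiguous characters by skipping any window that contains one, which is only one reasonable way to "ignore" them.

```python
from collections import Counter

def count_overlapping_kmers(sequence, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping windows with characters outside alphabet."""
    allowed = set(alphabet)
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]  # slice out the window starting at position i
        if set(kmer) <= allowed:
            counts[kmer] += 1
    return counts

toy = "AAAA"
print(toy.count("AA"))                     # non-overlapping: 2
print(count_overlapping_kmers(toy)["AA"])  # overlapping: 3
```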
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
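The late-day arithmetic above can be sketched as a short calculation. This is only an illustration of the stated rules, not an official grading script. The function name is hypothetical. The sketch also assumes that a late submission first uses up any remaining free days and is then penalized for the rest, and that the penalty is capped at 100%; neither of those details is spelled out above.

```python
import math

FREE_LATE_DAYS = 5        # per student, for the entire semester
PENALTY_PER_DAY = 0.10    # applied once the free days are used up

def apply_late_policy(days_late, free_days_remaining=FREE_LATE_DAYS):
    """Return (penalty_fraction, free_days_left) for one late assignment."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day)
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    used_free = min(whole_days, free_days_remaining)
    penalized_days = whole_days - used_free
    penalty = min(1.0, PENALTY_PER_DAY * penalized_days)  # cap is an assumption
    return penalty, free_days_remaining - used_free

# The example from the text: 1.5 days late with no free days left
penalty, days_left = apply_late_policy(1.5, free_days_remaining=0)
print(penalty * 50)  # points lost on a 50-point assignment
```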
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution, and their use should be properly cited and attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-05T15:06:09Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
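To make the k-means reading above a bit more concrete, here is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) in plain Python.<br />
It is not the code used by any of the clustering packages linked on this page. The function name <code>kmeans</code>, the toy 2D points, and the hand-picked starting centroids are all assumptions made for illustration, and Euclidean distance is assumed as the distance measure.<br />

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm with caller-supplied starting centroids.

    points and centroids are sequences of equal-length numeric tuples.
    Returns (final_centroids, clusters), where clusters[j] lists the
    points assigned to centroid j in the last iteration.
    """
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster simply keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data: two visually obvious groups of points.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```

Starting centroids are passed in explicitly so the result is reproducible. In practice they are usually chosen at random and the algorithm is run several times, because k-means only finds a local optimum.<br />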
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
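For intuition about the Burrows–Wheeler transform discussed above, here is a small, deliberately naive sketch in plain Python.<br />
It is not how BWA or any real read mapper is implemented: the function names (<code>bwt</code>, <code>inverse_bwt</code>, <code>count_matches</code>) are made up for this example, the transform is built by sorting all rotations (far too slow for a genome), and the rank lookups in the backward search use string slicing where a real index would use precomputed tables.<br />

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if terminator in text:
        raise ValueError("text must not contain the terminator character")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(last_column, terminator="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]

def count_matches(pattern, last_column):
    """Backward search: count exact occurrences of pattern using only the BWT."""
    # first_index[c] = row of the first rotation that starts with character c.
    first_index = {}
    for i, c in enumerate(sorted(last_column)):
        first_index.setdefault(c, i)
    lo, hi = 0, len(last_column)  # half-open range of matching rotations
    for c in reversed(pattern):
        if c not in first_index:
            return 0
        # Rank of c in the BWT prefix; O(n) here, O(1) in a real FM-index.
        lo = first_index[c] + last_column[:lo].count(c)
        hi = first_index[c] + last_column[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
restored = inverse_bwt(transformed)
hits = count_matches("ana", transformed)
```

The backward search walks the pattern from its last character to its first, narrowing a range of sorted rotations at each step, which is why lookup time depends on the read length rather than on the genome length.<br />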
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
--><br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
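Since the two meanings of "specificity" noted above are easy to mix up, here is a small sketch that computes them side by side from confusion-matrix counts.<br />
The function name <code>confusion_metrics</code> and the counts in the example are made up for illustration; only the formulas are standard.<br />

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return common classifier metrics from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Avoid a ZeroDivisionError when a class is absent from the data.
        return numerator / denominator if denominator else float("nan")

    return {
        # Fraction of real positives that were found (recall, true positive rate).
        "sensitivity": ratio(tp, tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity": ratio(tn, tn + fp),
        # Fraction of predictions that were right (PPV); this is what the
        # gene finding literature has traditionally called "specificity".
        "precision": ratio(tp, tp + fp),
    }

# Hypothetical gene-finder evaluation at the nucleotide level.
metrics = confusion_metrics(tp=80, fp=10, tn=890, fn=20)
```

With these invented counts, the true negative rate (890/900) is far higher than the precision (80/90) because non-coding positions vastly outnumber coding ones, which is one reason the choice of definition matters when reading gene finding papers.<br />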
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
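As a companion to the entropy and sequence logo links above, here is a minimal sketch of how per-position information content can be computed from a set of already-aligned sites.<br />
This is only the scoring step, not Gibbs sampling itself. The function names and the six-letter example sites are invented for illustration, and the small-sample correction that logo software often applies is left out.<br />

```python
import math
from collections import Counter

def position_frequencies(sites, alphabet="ACGT", pseudocount=0.0):
    """Column-wise letter frequencies for equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    columns = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        columns.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return columns

def information_content(column):
    """Bits of information: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

# Made-up aligned binding sites, loosely resembling a -10 promoter element.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
freqs = position_frequencies(sites)
bits = [information_content(column) for column in freqs]
```

A perfectly conserved DNA position scores the maximum of 2 bits, and a position with all four bases equally likely scores 0, which is exactly the letter-stack height plotted in a sequence logo.<br />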
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
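To make the log odds idea in the last item concrete, here is a minimal sketch of scoring a candidate text by the log odds ratio of its k-mer frequencies under two reference texts. The function names, the toy reference strings, the default k = 2, and the add-one pseudocount are all assumptions chosen for illustration; the code-breaking paper used 5-mers and its own scoring details.

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(candidate, model_text, background_text, k=2, pseudocount=1.0):
    """Sum of log2(P(kmer | model) / P(kmer | background)) over the candidate's k-mers.

    Positive scores mean the candidate looks more like model_text,
    negative scores mean it looks more like background_text.
    """
    model = kmer_counts(model_text, k)
    background = kmer_counts(background_text, k)
    observed = kmer_counts(candidate, k)
    # Pseudocounts keep unseen k-mers from producing log(0).
    vocabulary = set(model) | set(background) | set(observed)
    model_total = sum(model.values()) + pseudocount * len(vocabulary)
    background_total = sum(background.values()) + pseudocount * len(vocabulary)
    score = 0.0
    for kmer, n in observed.items():
        p_model = (model[kmer] + pseudocount) / model_total
        p_background = (background[kmer] + pseudocount) / background_total
        score += n * math.log2(p_model / p_background)
    return score


reference = "thethethethe"   # toy stand-in for "known free text"
noise = "xzxzxzxzxzxz"       # toy stand-in for a poorly decoded text
print(log_odds_score("the", reference, noise))   # positive
print(log_odds_score("xzx", reference, noise))   # negative
```

The same pattern applies to sequence analysis: swap the two reference texts for, say, coding and non-coding DNA.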
<br />
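The random-walk view of Markov chains mentioned in the list above can be sketched in a few lines: repeatedly push a probability distribution through the transition table until it settles on the long-run fraction of time spent in each state. The three-page link graph and the function name are made up for illustration, and this is not the full PageRank algorithm, which also adds a damping (random jump) term.

```python
def stationary_distribution(transitions, steps=200):
    """Approximate the long-run state probabilities of a Markov chain.

    transitions maps each state to a dict of {next_state: probability}.
    """
    states = list(transitions)
    # Start from a uniform guess and repeatedly apply one step of the chain.
    dist = {state: 1.0 / len(states) for state in states}
    for _ in range(steps):
        updated = {state: 0.0 for state in states}
        for src, p_src in dist.items():
            for dst, p_move in transitions[src].items():
                updated[dst] += p_src * p_move
        dist = updated
    return dist


# Hypothetical web of three pages: A links to B and C, B links to C, C links to A.
links = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Pages A and C end up with more of the walker's time than B, because more of the random walk's paths lead into them.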
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
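For a quick taste of regular expressions on sequence data, here is a small sketch using Python's built-in <code>re</code> module. The motif (the N-glycosylation pattern: N, then anything but P, then S or T, then anything but P) and the toy protein string are illustrative choices, not part of the assigned reading. The lookahead <code>(?=...)</code> is used so that overlapping matches are not skipped.

```python
import re

# N-glycosylation motif: N, not P, S or T, not P.
# Wrapping it in a lookahead lets re.finditer report overlapping hits.
MOTIF = re.compile(r"(?=N[^P][ST][^P])")


def motif_positions(protein):
    """Return 1-based start positions of every motif occurrence."""
    return [match.start() + 1 for match in MOTIF.finditer(protein)]


toy_protein = "MKNLSAPNGTPWNKTE"   # made-up sequence for illustration
print(motif_positions(toy_protein))
```

The middle candidate (NGTP) is rejected because its fourth residue is a proline.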
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
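As a sketch of the fix described in the note above, the snippet below builds a <code>Seq</code> without any alphabet argument and takes its reverse complement. The <code>try/except</code> fallback and the helper name <code>reverse_complement</code> are additions for illustration only, so the example still runs on a machine where BioPython has not been installed yet.

```python
try:
    # Current BioPython: no Bio.Alphabet import and no alphabet argument.
    from Bio.Seq import Seq

    def reverse_complement(dna):
        return str(Seq(dna).reverse_complement())

except ImportError:
    # Plain-Python stand-in used only when BioPython is unavailable.
    _COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

    def reverse_complement(dna):
        return dna.translate(_COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```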
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
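The difference between the two kinds of counting is easiest to see side by side. This is only a sketch of the slicing idea on a toy string: the function name is made up, and skipping any window that contains a character other than A, C, G, or T is one reasonable reading of the "ignore ambiguous characters" note, not the only one.

```python
from collections import Counter


def count_overlapping(seq, k=2, alphabet="ACGT"):
    """Count every overlapping k-mer, skipping windows with other characters."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]            # the string[counter:counter+k] slice
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


toy = "AAAA"
print(toy.count("AA"))                 # non-overlapping count
print(count_overlapping(toy)["AA"])    # overlapping count
```

On the toy string, <code>str.count</code> reports 2 while the sliding window reports 3.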
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
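The late-time arithmetic above can be sketched as a small function. The function name and arguments are hypothetical, and one detail is an assumption the policy text does not spell out: when an assignment is later than the free days remaining, the sketch spends the remaining free days first and penalizes only the leftover days.

```python
import math


def apply_late_policy(days_late, free_days_left, points_possible):
    """Return (free days remaining, points lost) for one assignment.

    Lateness is rounded up to whole days, so 10 minutes late counts as 1 day.
    """
    whole_days = max(0, math.ceil(days_late))
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used          # assumption: free days are spent first
    penalty_percent = min(100, 10 * penalized_days)  # 10 percent per penalized day
    points_lost = points_possible * penalty_percent / 100
    return free_days_left - free_used, points_lost


# The example from the policy: 50 points, 1.5 days late, no free days left.
print(apply_late_policy(1.5, 0, 50))
# Ten minutes late with all 5 free days still available.
print(apply_late_policy(10 / (24 * 60), 5, 50))
```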
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
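For anyone who wants to check where a score falls, the cutoffs listed above translate directly into a lookup. This is just an illustrative sketch (the names are made up); note that it does no rounding, matching the statement that grades are not rounded up.

```python
# (minimum percent, letter) pairs, highest first, copied from the cutoffs above.
GRADE_CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter grade for a percentage, with no rounding."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


for score in (95, 91.99, 87.9, 59.9):
    print(score, letter_grade(score))
```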
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-04T20:32:46Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record in this regard is probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
* k-mer-based RNA quantification offers [https://www.nature.com/articles/nbt.3519 near-optimal probabilistic RNA-seq quantification]. Here's [https://bioinfo.iric.ca/understanding-how-kallisto-works/ how the program kallisto works]<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
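To make the link between position weight matrices and the entropy behind sequence logos concrete, here is a minimal Python sketch. It assumes you already have a set of aligned, equal-length sites; the example sites, the function names, the uniform background of 0.25, and the pseudocount are all illustrative assumptions rather than anything from the lecture. It also leaves out the small-sample correction that logo software usually applies, and it does not perform the Gibbs sampling search itself.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

# Hypothetical aligned sites, made up for illustration (not real data).
sites = ["ACGT", "ACGA", "ACCC", "AGTG"]

def column_frequencies(sites, pseudocount=0.0):
    """Return one dict per alignment column with the frequency of each base."""
    total = len(sites) + pseudocount * len(ALPHABET)
    freqs = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return freqs

def information_content(column):
    """Bits of information in one column: log2(4) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(ALPHABET)) - entropy

def log_odds_pwm(freqs, background=0.25):
    """Log2 odds of each base versus a uniform background (needs nonzero freqs)."""
    return [{b: math.log2(col[b] / background) for b in ALPHABET} for col in freqs]

freqs = column_frequencies(sites)
for i, col in enumerate(freqs, start=1):
    print(f"column {i}: {information_content(col):.2f} bits")

# A pseudocount keeps unseen bases from producing log2(0).
pwm = log_odds_pwm(column_frequencies(sites, pseudocount=1.0))
```

A perfectly conserved column carries the full 2 bits, while a column with all four bases equally represented carries none, which is what the letter heights in a sequence logo reflect.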
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
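The random-walk idea behind the Markov chain visualization above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-page "web" and its transition probabilities are invented, the function name is hypothetical, and the real PageRank algorithm additionally includes a damping factor that is left out here.

```python
# Hypothetical 3-page web. Each row gives the probability of following a link
# from that page to each other page (every row sums to 1).
transitions = {
    "A": {"A": 0.0, "B": 0.5, "C": 0.5},
    "B": {"A": 1.0, "B": 0.0, "C": 0.0},
    "C": {"A": 0.5, "B": 0.5, "C": 0.0},
}

def stationary_distribution(transitions, n_steps=200):
    """Repeatedly push probability mass along the links (power iteration)."""
    states = list(transitions)
    dist = {s: 1.0 / len(states) for s in states}  # start from a uniform guess
    for _ in range(n_steps):
        new = {s: 0.0 for s in states}
        for src, p_src in dist.items():
            for dst, p_move in transitions[src].items():
                new[dst] += p_src * p_move
        dist = new
    return dist

pi = stationary_distribution(transitions)
for page, prob in sorted(pi.items(), key=lambda kv: -kv[1]):
    print(page, round(prob, 3))
```

The resulting probabilities are the long-run fraction of time a random walker spends on each page, which is the quantity used to rank them.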
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
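If you want a quick warm-up with regular expressions in Python, here is a small sketch using the standard `re` module. The DNA string is made up for illustration and the function name is hypothetical; the pattern encodes the degenerate restriction site GANTC (N = any base), and the lookahead is one common way to report matches that overlap.

```python
import re

# Hypothetical DNA sequence, invented for this example.
dna = "TTGAATCGGACTCAGATTCTT"

# GANTC with N = any base. Wrapping the pattern in a lookahead (?=...) means
# the regex engine consumes no characters, so overlapping sites are also found.
site = re.compile(r"(?=(GA[ACGT]TC))")

def find_sites(sequence, pattern):
    """Return (1-based start position, matched text) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]

for position, text in find_sites(dna, site):
    print(position, text)
```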
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
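As a sketch of what the corrected shortcut looks like, the snippet below builds a `Seq` without any alphabet argument and takes its complement. The `complement` wrapper and the plain-Python fallback (used only if BioPython is not installed) are illustrative additions, not part of the Rosalind problem.

```python
try:
    # Current BioPython: no "from Bio.Alphabet import IUPAC" line and no
    # alphabet argument passed to Seq().
    from Bio.Seq import Seq

    def complement(dna):
        return str(Seq(dna).complement())
except ImportError:
    # Fallback when BioPython is missing: the same mapping in plain Python.
    def complement(dna):
        return dna.translate(str.maketrans("ACGT", "TGCA"))

my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(complement(my_seq))
```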
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
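For anyone who would rather read the dynamic programming recurrence as code than as a spreadsheet, here is a minimal sketch of global (Needleman-Wunsch style) alignment scoring. The function name and the match/mismatch/gap values are illustrative assumptions, not the scoring scheme used in the slides or the Excel example.<br />

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of seq_a against seq_b.

    The scoring values are placeholder assumptions; a real protein
    alignment would look up substitution scores in a matrix such as BLOSUM62.
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * cols for _ in range(rows)]

    # First column and row: a prefix aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Fill each cell from its three neighbors
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq_a[i - 1] == seq_b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in seq_b
            left = score[i][j - 1] + gap    # gap inserted in seq_a
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

This returns only the score in the bottom-right cell. Recovering the alignment itself would additionally require a traceback through the matrix, which is omitted here to keep the sketch short.<br />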
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, ''AA'' dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python string slicing (string[counter:counter+2]), as explained in the Rosalind homework assignment on strings.<br />
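To make the overlapping vs. non-overlapping distinction concrete, here is a small sketch on a toy string. The function name and the choice to skip any window containing a character other than A, C, G, or T are assumptions made for illustration; this is one reasonable approach, not the required solution.<br />

```python
from collections import Counter


def count_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping windows with ambiguous bases."""
    # Remove line breaks and other whitespace, then normalize case
    sequence = "".join(sequence.split()).upper()
    counts = Counter()
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]          # sliding window of width 2
        if set(pair) <= set("ACGT"):      # ignore windows containing e.g. N
            counts[pair] += 1
    return counts


toy = "AAAA"
print(toy.count("AA"))                    # non-overlapping count from str.count
print(count_dinucleotides(toy)["AA"])     # overlapping count from the sliding window
```

Skipping a window that contains an ambiguous character avoids inventing a dinucleotide that spans the gap, which is what would happen if the ambiguous characters were deleted before counting.<br />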
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
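As a worked illustration of the late-time arithmetic, here is a minimal sketch. The function name is hypothetical, and two details are assumptions not stated in the policy: remaining free days are applied first with only the leftover days penalized, and the penalty is capped at 100%.<br />

```python
import math


def late_penalty(days_late, free_days_left=5, penalty_per_day=0.10):
    """Return (penalty as a fraction of the assignment, free days remaining).

    Assumptions for illustration only: free days are used up first, any
    remaining late days are penalized, and the penalty cannot exceed 100%.
    """
    whole_days = math.ceil(days_late)            # 10 minutes late counts as 1 day
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    penalty = min(1.0, penalized_days * penalty_per_day)
    return penalty, free_days_left - free_used


# The example from the text: no free days left, 50-point assignment, 1.5 days late
penalty, remaining = late_penalty(1.5, free_days_left=0)
print(50 * penalty, remaining)
```

With no free days remaining, 1.5 days rounds up to 2 days, giving a 20% penalty, or 10 points on a 50-point assignment, matching the example above.<br />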
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-04T20:28:21Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf '''Problem Set 3'''], '''due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
* The importance of getting mapping correct: [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7500457/ Prominent analyses of cancer microbiomes] may suffer from [https://www.biorxiv.org/content/10.1101/2023.07.28.550993v1 "major, fatal errors in the data and methods"]<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
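To make the ideas of position weight matrices and information content a bit more concrete, here is a minimal Python sketch (not taken from the slides). It builds a position frequency matrix from a handful of aligned sites and computes the information content of each column in bits, which is the quantity plotted as letter height in a sequence logo. The example sites, the pseudocount of 1, and the function names are all made up for illustration, and the sketch assumes a uniform background over A, C, G, and T.<br />

```python
import math

def position_frequencies(sites, alphabet="ACGT", pseudocount=1.0):
    """Return one {base: frequency} dict per column of the aligned sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(alphabet)
    matrix = []
    for i in range(length):
        column = [site[i] for site in sites]
        matrix.append({base: (column.count(base) + pseudocount) / total
                       for base in alphabet})
    return matrix

def information_content(freqs):
    """Information content of one column in bits: log2(alphabet size) - entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy

# Hypothetical aligned sites (equal length, no gaps)
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pfm = position_frequencies(sites)

for position, freqs in enumerate(pfm, start=1):
    print(position, round(information_content(freqs), 3), freqs)
```

Columns where one base dominates score close to 2 bits, and columns with no preference score close to 0. With only four sites, the pseudocount pulls every column well below 2 bits.<br />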
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in volcanic material by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
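If you'd like to see what a log odds score over k-mer frequencies looks like in code, here is a toy Python sketch. It is not the method from either of the papers above. It scores a candidate text by summing, over each k-mer, the log ratio of its (pseudocounted) frequency in a reference text to a uniform background. The reference text, the choice of k = 2 rather than 5, the 26-letter alphabet, and the function names are all assumptions made to keep the example small.<br />

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds_score(candidate, reference, k=2, alphabet_size=26, pseudocount=1.0):
    """Sum of log2(P(k-mer | reference model) / P(k-mer | uniform background))."""
    ref_counts = kmer_counts(reference, k)
    total = sum(ref_counts.values())
    n_possible = alphabet_size ** k
    background = 1.0 / n_possible
    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        # The pseudocount keeps unseen k-mers from producing log(0)
        p_model = (ref_counts[kmer] + pseudocount) / (total + pseudocount * n_possible)
        score += math.log2(p_model / background)
    return score

# Hypothetical "known free text" and two candidate decodings
reference = "thequickbrownfoxjumpsoverthelazydogandthecatsatonthemat"
english_like = "thecatandthedog"
scrambled = "xqzjvkwpfgbmyhu"

print(log_odds_score(english_like, reference))
print(log_odds_score(scrambled, reference))
```

The candidate that shares k-mers with the reference gets the higher score. A real application would use a much larger reference text and longer k-mers.<br />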
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
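As a small warm-up for regular expressions, here is a Python sketch that uses the standard <code>re</code> module to find every occurrence of an ambiguous DNA pattern. The pattern GANTC (the HinfI restriction site, where N is any base) is written as the character class <code>[ACGT]</code>. The DNA string is made up for illustration.<br />

```python
import re

# Hypothetical DNA sequence for illustration
dna = "TTGAATCCGACTCGAGTCAA"

# GANTC: G, A, any one base, then T, C
pattern = re.compile(r"GA[ACGT]TC")

# finditer yields one match object per non-overlapping match, left to right
hits = [(match.start(), match.group()) for match in pattern.finditer(dna)]

for start, site in hits:
    print(start, site)
```

Note that <code>match.start()</code> is 0-based, whereas Rosalind generally expects 1-based positions. Like <code>str.count</code>, <code>finditer</code> reports non-overlapping matches only.<br />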
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
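For those who would rather see the dynamic programming idea from the primer above in code than in a spreadsheet, here is a minimal Python sketch of global alignment scoring (in the style of Needleman-Wunsch). It fills in the score table and returns only the best score, without a traceback. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration, not the ones used in the slides or the problem set.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: aligning a prefix against nothing costs only gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: align two characters, or put a gap in either sequence
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = score[i - 1][j] + gap
            gap_in_a = score[i][j - 1] + gap
            score[i][j] = max(diagonal, gap_in_b, gap_in_a)

    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Recovering the alignment itself would additionally require remembering which of the three choices won in each cell and tracing back from the bottom-right corner.<br />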
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
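Here is a minimal Python sketch of the difference, using a sliding window of width 2. The function name is made up, and the choice to skip any window containing a character other than A, C, G, or T is just one reasonable reading of "ignore anything other than A, C, T, and G". Stripping those characters out before counting would give slightly different dinucleotide counts.<br />

```python
from collections import Counter

def count_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping windows with non-ACGT characters."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]          # sliding window of width 2
        if set(pair) <= set("ACGT"):
            counts[pair] += 1
    return counts

# str.count only finds non-overlapping matches
print("AAAA".count("AA"))

# The sliding window finds every overlapping pair
counts = count_dinucleotides("AAAA")
print(counts["AA"])

# Report absolute counts alongside percentages of the total
total = sum(counts.values())
for pair, n in sorted(counts.items()):
    print(pair, n, round(100.0 * n / total, 2))
```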
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
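
As a rough illustration of the late-day arithmetic described above, here is a minimal Python sketch. The function name `late_penalty`, the constants, and the cap of the penalty at 100% are assumptions made for this example, not part of the official policy. It also assumes that only the days not covered by your remaining free late time are penalized.

```python
import math

FREE_LATE_DAYS = 5            # free late days for the entire semester
PENALTY_PERCENT_PER_DAY = 10  # applied once the free days are used up


def late_penalty(points_possible, days_late, free_days_left=FREE_LATE_DAYS):
    """Return (points deducted, free late days remaining) for one assignment.

    Hypothetical helper: lateness is rounded up to whole days, so
    10 minutes late counts as 1 day.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    covered = min(whole_days, free_days_left)   # days absorbed by free late time
    penalized_days = whole_days - covered       # days that cost points
    # Assumption: the penalty cannot exceed the value of the assignment.
    percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
    return points_possible * percent / 100, free_days_left - covered


# The example from the text: free days exhausted, 50 points, 1.5 days late.
print(late_penalty(50, 1.5, free_days_left=0))   # (10.0, 0)
```

With all 5 free days still available, the same call with `days_late=10/1440` (10 minutes) deducts no points but uses up one free day.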
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
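
The cutoffs above can be read as a simple lookup with no rounding. The following Python sketch is only an illustration; the names `CUTOFFS` and `letter_grade` are made up for this example.

```python
# (minimum percentage, letter grade), listed from the highest cutoff down
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a course percentage to a letter grade, without rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # A-  (not rounded up to an A)
print(letter_grade(92))     # A
```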
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-04T20:08:45Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
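
To make the distance measures mentioned in the reading list above a bit more concrete, here is a small, standard-library-only Python sketch of three measures commonly used when clustering expression profiles. The function names and the toy profiles are invented for this illustration; in practice you would more likely use numpy or scipy.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Undefined (division by zero)
    if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return 1 - cov / math.sqrt(ss_x * ss_y)


# Two toy profiles with the same shape but very different magnitudes
gene_a = [1, 2, 3, 4]
gene_b = [10, 20, 30, 40]
print(euclidean(gene_a, gene_b), manhattan(gene_a, gene_b))
print(pearson_distance(gene_a, gene_b))
```

The two toy profiles are far apart by Euclidean or Manhattan distance but essentially identical by correlation distance, which is one reason the choice of measure matters so much for clustering.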
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
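
For a feel of how the Burrows–Wheeler transform supports pattern lookup, here is a deliberately naive Python sketch. It sorts all rotations and recounts characters on every step, so it is only suitable for short strings. Real read mappers such as BWA use suffix arrays and precomputed occurrence tables instead. The function names and the `$` end marker are choices made for this example.

```python
def bwt(text, end="$"):
    """Burrows-Wheeler transform: last column of the sorted rotations."""
    assert end not in text
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last, end="$"):
    """Rebuild the original text by repeatedly prepending and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


def count_matches(bwt_string, pattern):
    """Count occurrences of pattern using backward search on the BWT."""
    first_column = sorted(bwt_string)
    # smaller[c] = number of characters in the text that sort before c
    smaller = {c: first_column.index(c) for c in set(bwt_string)}
    lo, hi = 0, len(bwt_string)          # current range of sorted rotations
    for c in reversed(pattern):
        if c not in smaller:
            return 0
        lo = smaller[c] + bwt_string[:lo].count(c)
        hi = smaller[c] + bwt_string[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


print(bwt("banana"))                          # annb$aa
print(inverse_bwt(bwt("GATTACA")))            # GATTACA
print(count_matches(bwt("banana"), "ana"))    # 2
```

The key point is that `count_matches` never looks at the original text: each pattern character just narrows a range of rows in the sorted rotation table.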
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 18'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Regarding the difficulties finding short genes: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* Science news of the day: [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
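To make the position weight matrix and sequence logo ideas above a bit more concrete, here's a minimal sketch, assuming a toy set of already-aligned DNA sites. The function names, the toy sites, the uniform 0.25 background, and the pseudocount of 1 are all choices made for illustration. It does not implement Gibbs sampling itself, and the information content omits the small-sample correction that logo software often applies.<br />

```python
import math
from collections import Counter

def column_frequencies(sites, alphabet="ACGT", pseudocount=1.0):
    """Per-position letter frequencies for equal-length aligned sites."""
    total = len(sites) + pseudocount * len(alphabet)
    freqs = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in alphabet})
    return freqs

def information_content(column):
    """Bits in one column: log2(alphabet size) minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

def log_odds_score(window, freqs, background=0.25):
    """Sum of log2(frequency / background) across a window (assumed uniform background)."""
    return sum(math.log2(freqs[i][base] / background)
               for i, base in enumerate(window))

sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]  # toy aligned sites, made up

# Raw frequencies (no pseudocount) give the column heights of a sequence logo.
logo_freqs = column_frequencies(sites, pseudocount=0.0)
print([round(information_content(col), 2) for col in logo_freqs])

# Pseudocounts keep log2() finite when scoring letters never seen at a position.
pwm_freqs = column_frequencies(sites)
print(log_odds_score("TATAAT", pwm_freqs), log_odds_score("GCGCGC", pwm_freqs))
```

Perfectly conserved columns carry the full 2 bits, and columns with some variation carry less. A window resembling the sites scores higher than an unrelated one.<br />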
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in lava by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
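If you'd like to see what a k-mer log odds score looks like in code, here's a minimal sketch. It is not the procedure from either article above. It's a toy, assuming two short made-up reference texts, overlapping k-mers, and add-one pseudocounts so that unseen k-mers don't produce a log of zero. All names are hypothetical.<br />

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds(candidate, reference_a, reference_b, k=3, pseudocount=1.0):
    """Sum of log2[P(kmer | A) / P(kmer | B)] over the candidate's k-mers."""
    counts_a = kmer_counts(reference_a, k)
    counts_b = kmer_counts(reference_b, k)
    vocab = set(counts_a) | set(counts_b) | set(kmer_counts(candidate, k))
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)
    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log2(p_a / p_b)
    return score

# Two made-up "training" texts of equal length: one English-like, one gibberish.
english = "the cat sat on the mat and the hat"
gibberish = "xqzvkjwpxqzvkjwpxqzvkjwpxqzvkjwpxq"

print(log_odds("the mat", english, gibberish))  # positive: looks more like text A
print(log_odds("xqzv", english, gibberish))     # negative: looks more like text B
```

A positive score says the candidate's k-mers are more typical of the first reference than the second, and a negative score says the reverse. With real data you would want much larger reference texts.<br />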
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
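For regular expression practice, here's a small sketch using Python's built-in re module. The protein sequence is made up. The pattern is the PROSITE-style N-glycosylation motif N{P}[ST]{P}, written as a Python regex. It also shows one way to find overlapping matches, using a lookahead.<br />

```python
import re

protein = "MKNNTSAPNGTLNPSQ"   # made-up sequence for illustration
motif = r"N[^P][ST][^P]"       # N, then anything but P, then S or T, then anything but P

# Plain finditer() resumes searching after the end of each match,
# so a match starting inside a previous match is missed.
non_overlapping = [m.start() + 1 for m in re.finditer(motif, protein)]

# Wrapping the pattern in a zero-width lookahead (?=...) lets the search
# restart at every position, so overlapping matches are reported too.
overlapping = [m.start() + 1 for m in re.finditer("(?=(" + motif + "))", protein)]

print(non_overlapping)  # [3, 9]
print(overlapping)      # [3, 4, 9]
```

Positions are reported 1-based here, which is the convention Rosalind uses for answers.<br />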
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
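If you'd rather see what is going on under the hood, here's a minimal pure-Python sketch of a reverse complement that doesn't depend on BioPython at all. The names COMPLEMENT and reverse_complement are made up for illustration, and the sketch only translates unambiguous A/C/G/T, leaving any other character unchanged.<br />

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]

print(reverse_complement("AAAACCCGGT"))  # ACCGGGTTTT
```

BioPython's Seq objects offer the same operation as a reverse_complement() method, so the two approaches should agree on unambiguous DNA.<br />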
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
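For those who prefer code to spreadsheets, here's a minimal sketch of the dynamic programming recurrence for global alignment (Needleman-Wunsch style). It only computes the optimal score, with no traceback. The scoring values (match +1, mismatch -1, linear gap penalty -2) are arbitrary assumptions for illustration and are not taken from the slides or the Excel example.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Fill the dynamic programming table and return the best global score."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell is the best of: diagonal (match/mismatch), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 4
```

The second example has to open one gap (6 matches at +1 each, one gap at -2). Recovering the alignment itself would require a traceback through the table.<br />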
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
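To illustrate the overlapping vs. non-overlapping issue from the heads-up above, here's a minimal sketch on a toy string. The function name is made up. Skipping any window that contains a character other than A, C, G, or T is just one reasonable reading of "ignore anything other than A, C, T, and G", so treat that choice as an assumption.<br />

```python
from collections import Counter

def count_overlapping_kmers(seq, k=2, alphabet="ACGT"):
    """Count every overlapping k-mer made only of letters in `alphabet`."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]   # same idea as string[counter:counter+2]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts

example = "AAAA"
print(example.count("AA"))                     # 2 (non-overlapping)
print(count_overlapping_kmers(example)["AA"])  # 3 (overlapping)
```

Sliding the window one position at a time is what makes the count overlapping. Setting k=1 gives single-nucleotide counts with the same function.<br />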
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
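As a worked illustration of the late-time arithmetic, here's a small sketch of one reading of the policy above. The function name and arguments are hypothetical. It assumes that when an assignment is later than the free days remaining, only the excess days are penalized, and that the penalty is capped at 100%. Neither detail is spelled out in the policy text, so the written policy is what counts.<br />

```python
import math

def apply_late_policy(days_late, free_days_left=5, penalty_per_day=0.10):
    """Return (free days remaining, penalty fraction) for one submission."""
    # Lateness is charged in whole days, rounding up (10 minutes late = 1 day).
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    # Assumption: free days are spent first; only the excess is penalized.
    used = min(whole_days, free_days_left)
    penalized_days = whole_days - used
    # Assumption: the penalty cannot exceed the full value of the assignment.
    penalty = min(1.0, penalized_days * penalty_per_day)
    return free_days_left - used, penalty

# 10 minutes late with all 5 free days left: one free day is used, no penalty.
print(apply_late_policy(10 / (24 * 60), free_days_left=5))

# The example from the text: 1.5 days late with no free days left.
remaining, penalty = apply_late_policy(1.5, free_days_left=0)
print(penalty * 50)  # points lost on a 50-point assignment
```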
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: 92% ≤ A, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
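For anyone who wants to check a score against these cutoffs, here's a minimal sketch of the lookup. The names are made up for illustration. Each cutoff is treated as a lower bound with no rounding, matching the "no rounding up" rule above.<br />

```python
# (minimum percent, letter) pairs, highest first, copied from the cutoffs above.
GRADE_CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Return the letter grade for a percentage, with no rounding."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(92))     # A
print(letter_grade(91.99))  # A-
print(letter_grade(59.9))   # F
```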
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution, and any use of AI must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-03-04T19:47:46Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
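As a rough illustration of what PCA computes, here is a minimal, dependency-free sketch (the tutorial linked above uses numpy and pandas instead, which is what you would normally do). It centers the data, builds the sample covariance matrix, and finds the first principal component by power iteration. The function names and the toy data are made up for illustration.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(rows, n_iter=200):
    """Return (unit vector, variance) for the direction of greatest variance,
    found by power iteration on the covariance matrix."""
    cov = covariance_matrix(rows)
    d = len(cov)
    # Arbitrary starting vector; it must not be orthogonal to the answer.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                   for i in range(d))
    return v, variance

# Toy data (hypothetical): points lying exactly on the line y = 2x
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1, variance = first_principal_component(data)
print(pc1, variance)
```

For these toy points, the first component points along the line the data fall on, and it captures all of the variance. Real datasets would need the remaining components as well, which is where a library eigendecomposition or SVD is the better tool.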
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
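To give a feel for how simple a k-NN classifier is under the hood, here is a minimal sketch in plain Python. The two-gene "expression" values and the ALL/AML labels are invented toy data, and <code>knn_predict</code> is a hypothetical helper name; in practice you would likely reach for scikit-learn's <code>KNeighborsClassifier</code> instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points
    (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy training set (hypothetical): two expression measurements per sample
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
                (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

prediction = knn_predict(train_points, train_labels, (2.9, 3.1))
print(prediction)
```

The choice of k and of the distance measure are the main knobs. An odd k avoids tied votes in two-class problems.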
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
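To make the k-means reading above concrete, here is a minimal sketch of Lloyd's algorithm in plain Python using Euclidean distance. The naive initialization (taking the first k points as starting centroids) is an assumption made to keep the example deterministic; real implementations use random or k-means++ seeding. The function name and toy points are hypothetical.

```python
import math

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm. Returns (centroids, assignments)."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    assignments = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
    return centroids, assignments

# Toy data (hypothetical): two well-separated groups
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

Swapping <code>math.dist</code> for another distance function (e.g. one minus Pearson correlation) is the usual way to adapt this to expression profiles, although the mean-based update step is only strictly justified for Euclidean distance.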
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
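For intuition about how the Burrows-Wheeler transform supports fast read mapping, here is a toy sketch in plain Python. It builds the BWT by sorting rotations (fine for tiny strings, far too slow for genomes) and counts exact matches with the backward search used by FM-index style aligners. The function names are hypothetical, and this is not how BWA itself is implemented: real tools precompute the rank tables and use suffix arrays to report positions.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel: assumed absent from text, sorts before letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_string, pattern):
    """Count exact matches of `pattern` in the original text by backward search."""
    # first[c] = number of characters in the text strictly smaller than c
    first, total = {}, 0
    for c in sorted(set(bwt_string)):
        first[c] = total
        total += bwt_string.count(c)

    def rank(c, i):
        # Occurrences of c in bwt_string[:i]; a real index precomputes this.
        return bwt_string[:i].count(c)

    lo, hi = 0, len(bwt_string)  # current range of sorted rotations
    for c in reversed(pattern):  # extend the match one character leftwards
        if c not in first:
            return 0
        lo = first[c] + rank(c, lo)
        hi = first[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
print(transformed, count_occurrences(transformed, "ana"))
```

Each pattern character narrows the range of matching rotations with two rank lookups, so the search cost depends on the read length rather than the genome length.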
<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, Rosalind has sometimes timed out before MEME completes, although it usually runs eventually, so be warned that you may have to try a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
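Since the two meanings of "specificity" noted above are easy to mix up, here is a small sketch that computes each quantity from confusion-matrix counts. The counts are invented for illustration, and <code>confusion_metrics</code> is a hypothetical helper name.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard rates from true/false positive/negative counts."""
    return {
        # Fraction of real positives that were found (recall, true positive rate)
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate)
        "specificity": tn / (tn + fp),
        # Fraction of predictions that were right (PPV); this is what older
        # gene-finding papers often call "specificity"
        "precision": tp / (tp + fp),
    }

# Hypothetical gene-prediction counts
metrics = confusion_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With many true negatives (as with non-coding nucleotides in a genome), the true negative rate stays high even when a fair share of predictions are wrong, which is one reason precision is often the more informative number.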
--><br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT. Here, she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
<!--<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review].<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
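To connect position weight matrices with the entropy ideas behind sequence logos, here is a minimal sketch that builds a per-position frequency matrix from already-aligned sites and computes the information content of each column in bits. The binding sites are invented, the function names are hypothetical, and no small-sample correction is applied. Gibbs sampling is the search procedure that finds such an alignment in the first place; this sketch only covers the scoring side.

```python
import math
from collections import Counter

def position_frequencies(sites, alphabet="ACGT", pseudocount=0.0):
    """Per-position letter frequencies for equal-length aligned sites."""
    matrix = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        total = len(sites) + pseudocount * len(alphabet)
        matrix.append({base: (counts[base] + pseudocount) / total
                       for base in alphabet})
    return matrix

def information_content(column):
    """Bits of information in one column: maximum entropy minus observed entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy

# Hypothetical aligned sites resembling a -10 promoter element
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
matrix = position_frequencies(sites)
bits = [information_content(column) for column in matrix]
print([round(b, 3) for b in bits])
```

A perfectly conserved DNA position carries 2 bits and a uniformly random one carries 0. In a sequence logo, the total height of the letter stack at each position is this value.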
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
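To make the log odds idea concrete, here's a minimal Python sketch (not the method used in either article) that scores a string by summing log odds ratios of overlapping k-mer frequencies under two competing models. The function names, the toy training strings, the choice of k = 2, and the add-one pseudocounts are all illustrative assumptions.<br />

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds_score(query, counts_a, counts_b, k, alphabet_size):
    """Sum log2(P(kmer | A) / P(kmer | B)) over the query's overlapping k-mers.

    Add-one pseudocounts (an assumption made for this sketch) keep unseen
    k-mers from producing zero probabilities.
    """
    n_possible = alphabet_size ** k
    total_a = sum(counts_a.values()) + n_possible
    total_b = sum(counts_b.values()) + n_possible
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + 1) / total_a
        p_b = (counts_b[kmer] + 1) / total_b
        score += math.log2(p_a / p_b)
    return score

# Toy "training" data, invented purely for illustration.
model_a = kmer_counts("ATATATATATATATAT", 2)   # AT-rich source
model_b = kmer_counts("GCGCGCGCGCGCGCGC", 2)   # GC-rich source

at_like = log_odds_score("TATATA", model_a, model_b, 2, 4)
gc_like = log_odds_score("CGCGCG", model_a, model_b, 2, 4)
```

A positive score means the query looks more like source A, a negative score more like source B. The same bookkeeping carries over to 5-mers of letters, or to emission probabilities in an HMM.<br />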
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
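If you'd like a biological warm-up for regular expressions, here's a short sketch using Python's built-in re module. The N-glycosylation motif (N, then anything but P, then S or T, then anything but P) is a standard textbook example, but the toy peptide and the function name are made up for illustration. The lookahead is there because re.finditer otherwise reports only non-overlapping matches.<br />

```python
import re

# N-glycosylation motif: N, then anything but P, then S or T, then anything but P.
# Wrapping it in a lookahead (?=...) lets us find overlapping occurrences too.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif_positions(protein):
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    return [m.start() + 1 for m in MOTIF.finditer(protein)]

toy_peptide = "MKNNTSAANPSGNGTL"   # hypothetical sequence, for illustration only
positions = find_motif_positions(toy_peptide)
matches = [toy_peptide[p - 1:p + 3] for p in positions]
```

Here positions comes out as [3, 4, 13]. The matches at positions 3 and 4 overlap, so a plain pattern without the lookahead would miss the second one.<br />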
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use Python's string count method (e.g. sequence.count("AA")) to count dinucleotides, be aware that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as slicing with string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
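Here's a minimal sketch of the overlapping count described above, assuming the sequence has already been read into a Python string. The function name and the toy sequence are illustrative and not part of the problem set. Windows containing anything other than A, C, G, or T are simply skipped, which is one reasonable reading of "ignore anything other than A, C, T, and G". Deleting those characters first would instead join their flanking bases into a new dinucleotide.<br />

```python
from collections import Counter

def count_overlapping_kmers(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping windows with non-alphabet characters."""
    seq = seq.upper()
    allowed = set(alphabet)
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= allowed:
            counts[window] += 1
    return counts

toy = "AAAANACG"                       # toy sequence; N is an ambiguous base call
dinucs = count_overlapping_kmers(toy)  # AA:3, AC:1, CG:1 (AN and NA are skipped)
total = sum(dinucs.values())
percents = {kmer: 100.0 * n / total for kmer, n in dinucs.items()}
```

Calling the same function with k=1 gives the single-nucleotide counts, and dividing by the total gives the percentages.<br />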
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
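For anyone who wants to check the late-day arithmetic, here's a small sketch of the policy as a Python function. The function name is hypothetical, and two details are assumptions the policy doesn't spell out: remaining free days are used up first with only the leftover days penalized, and the penalty is capped at 100% of the assignment.<br />

```python
import math

PENALTY_PERCENT_PER_DAY = 10   # applies once the free late days are gone

def late_penalty(days_late, max_points, free_days_left):
    """Return (penalty_points, free_days_left_after) for one late submission.

    days_late may be fractional; it is rounded up to whole days, so
    10 minutes late counts as 1 day. Assumptions for this sketch: free
    days are spent before any penalty applies, and the penalty is capped
    at 100% of the assignment's points.
    """
    whole_days = max(0, math.ceil(days_late))
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    percent = min(PENALTY_PERCENT_PER_DAY * penalized_days, 100)
    return max_points * percent / 100, free_days_left - free_used

# The example from the text: 50 points, 1.5 days late, no free days left.
example = late_penalty(1.5, 50, 0)                  # (10.0, 0)
# Ten minutes late with all 5 free days still available.
ten_minutes = late_penalty(10 / (60 * 24), 50, 5)   # (0.0, 4)
```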
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
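The cutoffs above amount to a simple lookup. As a sketch (the names here are illustrative), each grade's lower bound is inclusive and nothing is rounded before the comparison:<br />

```python
# (minimum percent, letter) pairs taken from the cutoffs listed above,
# ordered from highest to lowest.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage to a letter grade; no rounding up is applied."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"

borderline = letter_grade(91.99)   # "A-", since 91.99 is below the 92 cutoff
```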
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-26T21:45:54Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record holder in this regard is probably [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
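To build some intuition for why random walks reveal network clusters, here is a minimal pure-Python sketch of the node-to-node distance that Walktrap is built on: two nodes are close if short random walks started from each of them end up in similar places. This is an illustration only, not the full algorithm (it omits the hierarchical merging step and the self-loops used in the paper); the toy graph and the function names are made up for this example. In practice you would call python-igraph's <code>Graph.community_walktrap()</code> instead.<br />

```python
import math


def walk_distribution(adjacency, start, steps):
    """Probability of finding a random walker at each node after `steps` steps."""
    probs = {node: 0.0 for node in adjacency}
    probs[start] = 1.0
    for _ in range(steps):
        next_probs = {node: 0.0 for node in adjacency}
        for node, p in probs.items():
            if p == 0.0:
                continue
            share = p / len(adjacency[node])  # move to each neighbor with equal probability
            for neighbor in adjacency[node]:
                next_probs[neighbor] += share
        probs = next_probs
    return probs


def walk_distance(adjacency, i, j, steps=3):
    """Degree-weighted distance between the walk distributions started at i and j."""
    p_i = walk_distribution(adjacency, i, steps)
    p_j = walk_distribution(adjacency, j, steps)
    return math.sqrt(
        sum((p_i[k] - p_j[k]) ** 2 / len(adjacency[k]) for k in adjacency)
    )


# Hypothetical toy network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
within = walk_distance(toy_graph, 0, 1)   # same triangle
between = walk_distance(toy_graph, 0, 4)  # different triangles
print(f"within-cluster distance:  {within:.3f}")
print(f"between-cluster distance: {between:.3f}")
```

Nodes in the same triangle come out much closer than nodes in different triangles, which is the signal a random-walk clustering method exploits when deciding which groups of nodes to merge.<br />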
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12, 14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
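As a small companion to the Burrows–Wheeler links above, here is a minimal sketch of the transform and its inverse using the naive sorted-rotations construction. It is meant only to show what the BWT is; real read mappers build it from a suffix array and add the index structures described in the BWA paper. The function names are made up for this example, and "$" is assumed to be an end-of-text marker that does not occur in the input.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Recover the original text by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(char + row for char, row in zip(transformed, table))
    # The row that ends with the terminator is the original text.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))                # the classic textbook example
print(inverse_bwt(bwt("GATTACA")))  # round trip on a DNA string
```

The transform tends to group identical characters into runs, and it is fully reversible, which is what makes it useful both for compression and as the basis of a compact genome index.<br />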
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
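To make the sensitivity/specificity vs. precision/recall definitions above concrete, here is a minimal sketch that computes them from the four cells of a confusion matrix. The counts are invented for illustration and the function names are not from any library.<br />

```python
def safe_ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        # Sensitivity = recall = true positive rate
        "sensitivity": safe_ratio(tp, tp + fn),
        # Specificity as most fields define it: the true negative rate
        "specificity": safe_ratio(tn, tn + fp),
        # Precision = PPV; this is what the gene finding literature often calls "specificity"
        "precision": safe_ratio(tp, tp + fp),
    }


# Hypothetical gene finder: 100 real exons, 900 non-exon regions.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With many more negatives than positives, the true negative rate stays high even when a fifth of the predictions are wrong, which is one reason precision is often the more informative number for gene finding.<br />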
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT, where she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
<br />
--><br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
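To connect the position weight matrix and sequence logo ideas above, here is a minimal sketch that turns a set of already-aligned sites into per-column base frequencies and then into information content in bits (the height of each logo column). The toy sites and function names are invented for illustration, and the sketch ignores the small-sample correction that logo software usually applies. Finding the alignment in the first place is the hard part that Gibbs sampling addresses.<br />

```python
import math
from collections import Counter


def column_frequencies(sites, alphabet="ACGT", pseudocount=0.0):
    """Per-position base frequencies for a list of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    frequencies = []
    for position in range(length):
        counts = Counter(site[position] for site in sites)
        frequencies.append(
            {base: (counts[base] + pseudocount) / total for base in alphabet}
        )
    return frequencies


def information_content(column):
    """Bits of information in one column: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(len(column)) - entropy


# Hypothetical aligned sites: positions 1, 3 and 4 are invariant, position 2 is random.
toy_sites = ["TAAT", "TCAT", "TGAT", "TTAT"]
bits = [information_content(col) for col in column_frequencies(toy_sites)]
print([round(b, 3) for b in bits])
```

A perfectly conserved DNA position carries 2 bits and a position where all four bases are equally likely carries 0 bits, which is why conserved positions stand tall in a sequence logo.<br />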
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
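The log odds idea in the last bullet can be sketched in a few lines: learn k-mer frequencies from known text, then score a candidate string by summing log2 ratios of those frequencies against a uniform background. This is only an illustration of the general technique, not the method from the paper; the reference text, the k value, and the function names are all made up for this example.<br />

```python
import math
from collections import Counter


def train_kmer_model(reference, k, alphabet, pseudocount=1.0):
    """Return a function giving the smoothed probability of any k-mer."""
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    n_possible = len(alphabet) ** k
    total = sum(counts.values()) + pseudocount * n_possible

    def probability(kmer):
        return (counts[kmer] + pseudocount) / total

    return probability


def log_odds_score(candidate, probability, k, alphabet):
    """Sum of log2(P(k-mer | reference model) / P(k-mer | uniform background))."""
    background = 1.0 / len(alphabet) ** k
    return sum(
        math.log2(probability(candidate[i:i + k]) / background)
        for i in range(len(candidate) - k + 1)
    )


# Hypothetical training text and candidates; k=2 keeps the toy example small.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
reference_text = "the quick brown fox jumps over the lazy dog and then the other dog " * 3
model = train_kmer_model(reference_text, k=2, alphabet=ALPHABET)

plausible = log_odds_score("the dog and the fox", model, k=2, alphabet=ALPHABET)
gibberish = log_odds_score("xqz kvj wpq zzx qjk", model, k=2, alphabet=ALPHABET)
print(f"plausible decoding: {plausible:.2f} bits")
print(f"gibberish decoding: {gibberish:.2f} bits")
```

A positive score means the candidate looks more like the reference text than like random letters. The same arithmetic underlies scoring a DNA sequence against a coding vs. non-coding model.<br />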
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
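If you'd like to sanity-check what BioPython gives you (with BioPython installed, the analogous call is <code>Seq("...").reverse_complement()</code>), here is a minimal pure-Python sketch that needs no libraries at all. The function name is made up for this example, and it assumes the input contains only A, C, G, and T.<br />

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```

Applying the function twice should return the original sequence, which makes for an easy self-test.<br />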
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with a Python slice like string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
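To make the heads-up above concrete, here is a minimal sketch on a toy string (not the genome files). The function name <code>count_overlapping</code> is just an illustrative choice, and skipping any window that contains a non-ACGT character is only one reasonable reading of "ignore anything other than A, C, T, and G".<br />

```python
from collections import Counter

def count_overlapping(seq, k=2):
    """Count overlapping k-mers by sliding a window one base at a time.

    Assumption: a window containing any character other than A, C, G, or T
    is skipped rather than counted.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]          # same idea as string[counter:counter+2]
        if all(base in "ACGT" for base in window):
            counts[window] += 1
    return counts

# str.count only finds non-overlapping matches
print("AAAA".count("AA"))              # 2

# the sliding window finds every overlapping dinucleotide
print(count_overlapping("AAAA")["AA"]) # 3
```

Note that deleting the ambiguous characters first and then counting would give slightly different numbers, because the bases on either side of a deleted character would become neighbors.<br />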
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
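As a small illustration of what "version agnostic" can look like in practice, here is a sketch that behaves the same under Python 2.7 and Python 3. The function names and the toy sequence are made up for this example, and it covers only two of the idioms from the linked cheat sheet.<br />

```python
def gc_percent(seq):
    """Percentage of G or C bases in a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    # Multiplying by the float 100.0 forces true division in Python 2.7,
    # where int / int would otherwise truncate.
    return 100.0 * gc / len(seq)

def whole_codons(seq):
    """Number of complete codons in a sequence."""
    # // is floor division in both Python 2.7 and Python 3.
    return len(seq) // 3

# print(...) with a single, pre-formatted string works in both versions.
print("GC content: {:.1f}%".format(gc_percent("ATGCGC")))
print("Whole codons: {}".format(whole_codons("ATGCGCA")))
```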
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day) will be deducted from the 5-day total (or what remains of it). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
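For anyone who wants to check the late-time arithmetic, here is a rough sketch of the policy above. The function name and arguments are illustrative only, and two details are assumptions rather than stated policy: that remaining free days are applied before any penalty on the same assignment, and that the penalty is capped at 100%.<br />

```python
import math

def late_penalty(days_late, free_days_left, points_possible, percent_per_day=10):
    """Return (free_days_used, points_deducted) for one late assignment.

    Assumptions (not spelled out in the policy): leftover free days are
    spent first, and the penalty cannot exceed the assignment's value.
    """
    # Any fraction of a day counts as a whole day (10 minutes late = 1 day).
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_days_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_days_used
    percent = min(100, percent_per_day * penalized_days)
    return free_days_used, points_possible * percent / 100

# The example from the policy: no free days left, 50 points, 1.5 days late
print(late_penalty(1.5, 0, 50))        # (0, 10.0)

# Ten minutes late with all 5 free days still available
print(late_penalty(10 / 1440, 5, 50))  # (1, 0.0)
```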
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
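The same cutoffs can be written as a short lookup, which may be handy for checking where a score falls. This is only an illustrative sketch (the names <code>CUTOFFS</code> and <code>letter_grade</code> are made up here), not the official grade calculation.<br />

```python
# (minimum percent, letter), highest first, copied from the cutoffs above
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage to a letter grade, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"

print(letter_grade(91.99))  # A-
print(letter_grade(92))     # A
```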
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI tools should be used with caution, and their use must be properly cited and attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-22T17:54:42Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record holder in this regard is probably [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
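If you'd like to see the k-means idea from the reading above in code, here is a minimal sketch of the standard alternating assign/update loop. The function name <code>kmeans</code>, the Euclidean distance, the caller-supplied starting centroids, and the toy points are all assumptions for illustration, not the method used by any particular clustering package.<br />

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Alternate assignment and centroid-update steps (Lloyd's algorithm)."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return assignments, centroids


# Toy, made-up 2D data with two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assignments, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(assignments)
print(centroids)
```

Real implementations usually try several random starting centroids, since k-means can converge to different answers depending on where it starts.<br />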
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
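To make the Burrows–Wheeler transform discussed above more concrete, here is a naive sketch that builds it from sorted rotations and then inverts it. This is only for intuition: the function names and the <code>$</code> terminator are assumptions, and real read mappers build their indexes far more efficiently than this.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, for intuition only)."""
    assert terminator not in text
    text += terminator
    # Sort all cyclic rotations and read off the last column
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Invert the transform by repeatedly prepending the BWT column and re-sorting."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original text is the row that ends with the terminator
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("banana"))                # annb$aa
print(inverse_bwt(bwt("banana")))   # banana
```

Note how the transform groups identical letters together, which is what makes it both compressible and searchable.<br />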
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
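As a small illustration of the De Bruijn graphs mentioned above, here is a sketch that builds one from a set of reads: each k-mer becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name and the toy read are made up, and real assemblers add error correction, graph simplification, and much more.<br />

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix (repeats add parallel edges)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


print(de_bruijn_graph(["ACGT"], k=3))   # {'AC': ['CG'], 'CG': ['GT']}
```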
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
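The sensitivity/specificity/precision definitions linked above are easy to mix up, so here they are written out as a short sketch. The function name and the example counts are made up purely for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # a.k.a. recall or true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # a.k.a. PPV; gene finders historically called this "specificity"
    }


# Hypothetical counts, chosen only to show the arithmetic
metrics = classification_metrics(tp=8, fp=2, tn=85, fn=5)
for name, value in metrics.items():
    print(name, round(value, 3))
```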
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT, where she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
'''Feb 22, 2024 - Hot off the presses update!'''<br />
* I was poking around in recent literature after class and ran across the following [https://www.biorxiv.org/content/10.1101/2024.01.12.574168v2.full bioRxiv preprint] (posted 3 days ago!) benchmarking the major motif-finding algorithms. They particularly recommended DEME, Opal, and SLiMFinder. DEME and Opal seem a bit harder to access, but SLiMFinder can be run through a [http://www.slimsuite.unsw.edu.au/servers/slimfinder.php web server] (also accessible [http://slim.icr.ac.uk/tools/peptools/input here]). <br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
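As a companion to the position weight matrix material above, here is a minimal sketch of building a log-odds PWM from a set of already-aligned sites and scoring a candidate sequence with it. This is just the scoring model, not Gibbs sampling itself (which would repeatedly re-estimate such a matrix while resampling site positions). The function names, the pseudocount of 1, the uniform background, and the toy sites are all assumptions for illustration.<br />

```python
import math


def position_weight_matrix(sites, alphabet="ACGT", pseudocount=1.0, background=None):
    """Build a log2-odds PWM (one dict per column) from equal-length aligned sites."""
    if background is None:
        # Assume a uniform background unless told otherwise
        background = {letter: 1.0 / len(alphabet) for letter in alphabet}
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + pseudocount * len(alphabet)
        pwm.append({
            letter: math.log2(
                (column.count(letter) + pseudocount) / total / background[letter]
            )
            for letter in alphabet
        })
    return pwm


def score(pwm, sequence):
    """Sum the per-position log-odds for a sequence the same length as the PWM."""
    return sum(column[letter] for column, letter in zip(pwm, sequence))


# Toy, made-up aligned sites
sites = ["ACG", "ACG", "ACG", "ACT"]
pwm = position_weight_matrix(sites)
print(score(pwm, "ACG"), score(pwm, "TTT"))
```

Positive scores mean a sequence looks more like the motif than like background; the pseudocount keeps unseen letters from producing a log of zero.<br />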
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in volcanic ash and pyroclastic debris by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
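The log odds idea in the last item above can be sketched in a few lines: count overlapping k-mers in two reference texts, then score a query by summing the log ratio of its k-mer frequencies under each model. The function names, the pseudocount of 1, the DNA alphabet, and the toy reference strings are assumptions for illustration, and this is not the code used in either study.<br />

```python
import math
from collections import Counter


def kmer_model(text, k, alphabet="ACGT", pseudocount=1.0):
    """Count overlapping k-mers; return the counts and a smoothed total for frequencies."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    total = sum(counts.values()) + pseudocount * len(alphabet) ** k
    return counts, total


def log_odds(query, model_a, model_b, k, pseudocount=1.0):
    """Sum of log2(P(kmer | A) / P(kmer | B)) over the overlapping k-mers of the query."""
    counts_a, total_a = model_a
    counts_b, total_b = model_b
    total_score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        total_score += math.log2(p_a / p_b)
    return total_score


# Toy, made-up reference texts: one AT-rich, one GC-rich
model_a = kmer_model("ATATATATAT", k=2)
model_b = kmer_model("GCGCGCGCGC", k=2)
print(log_odds("ATAT", model_a, model_b, k=2))   # positive: looks like model A
print(log_odds("GCGC", model_a, model_b, k=2))   # negative: looks like model B
```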
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
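If you'd like a quick regular expression warm-up, here is a small sketch using Python's <code>re</code> module to find a protein motif, including overlapping matches (a lookahead is one common way to get those, since ordinary matching skips past each hit). The N-glycosylation-style pattern, the function name, and the toy sequences are chosen just for illustration.<br />

```python
import re

# Motif: N, then anything but P, then S or T, then anything but P.
# Wrapping the pattern in a lookahead (?=...) lets matches overlap.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif_positions(protein):
    """Return 1-based start positions of all (possibly overlapping) motif matches."""
    return [match.start() + 1 for match in MOTIF.finditer(protein)]


print(find_motif_positions("NNTSY"))   # [1, 2]
print(find_motif_positions("NPSA"))    # []
```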
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
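To go along with the dynamic programming primer and the Excel example above, here is a minimal sketch of filling in a global alignment (Needleman-Wunsch) score table in Python. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are arbitrary choices for illustration; the sketch returns only the optimal score, not the alignment itself.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch optimal global alignment score via a full DP table."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    # Each cell takes the best of: diagonal (match/mismatch), gap in b, gap in a
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(diagonal, table[i - 1][j] + gap, table[i][j - 1] + gap)
    return table[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))   # 7
print(global_alignment_score("ACGT", "AGT"))          # 2
```

Recovering the actual alignment just requires remembering which of the three choices won in each cell and tracing back from the bottom-right corner.<br />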
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
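To illustrate the heads-up above, here is a minimal sketch of one way to count overlapping k-mers with a sliding window while ignoring anything other than A, C, G, and T. The function names and the decision to upper-case the input are assumptions for illustration; there are many other reasonable ways to do this.<br />

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT character."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]   # slide a window of width k one base at a time
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def frequencies(counts):
    """Convert absolute counts into fractions of the total."""
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


print("AAAA".count("AA"))       # 2 (non-overlapping)
print(count_kmers("AAAA", 2))   # Counter({'AA': 3}) (overlapping)
print(frequencies(count_kmers("ACNGT", 1)))
```

Calling <code>count_kmers</code> with k=1 gives the single-nucleotide counts, and k=2 gives the dinucleotide counts.<br />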
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
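<br />
Here's a minimal Python sketch of the late-time arithmetic described above. The function name and arguments are hypothetical, and two details are assumptions rather than stated policy: any remaining free days are applied before a penalty is charged, and the penalty is capped at 100%.<br />

```python
import math

def apply_late_policy(days_late, free_days_left, points_possible):
    """Return (free_days_remaining, points_deducted) for one assignment.

    Lateness is rounded UP to whole days, so 10 minutes late counts as 1 day.
    Assumption: remaining free days are spent first; only the days beyond
    them are penalized at 10% of the assignment's points per day.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_days_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_days_used
    # Assumption: the penalty cannot exceed the full value of the assignment.
    penalty_percent = min(100, 10 * penalized_days)
    points_deducted = points_possible * penalty_percent / 100
    return free_days_left - free_days_used, points_deducted

# The syllabus example: no free days left, 50-point assignment, 1.5 days late.
remaining, deducted = apply_late_policy(1.5, 0, 50)
print(remaining, deducted)
```

With no free days left, 1.5 days late rounds up to 2 days, so the deduction is 20% of 50 points, matching the example in the policy.<br />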
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
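<br />
If it helps to see the cutoffs as a lookup, here's a small sketch in Python. The names CUTOFFS and letter_grade are made up for illustration; the thresholds are the ones listed above, and nothing is rounded.<br />

```python
# Lower bound (in percent) for each letter grade, highest first.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage to a letter grade without rounding."""
    for lower_bound, letter in CUTOFFS:
        if percent >= lower_bound:
            return letter
    return "F"

print(letter_grade(91.99), letter_grade(92))
```

Because grades are not rounded up, 91.99 stays an A-.<br />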
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-22T15:43:12Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
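To make the precision vs. recall trade-off mentioned above concrete, here's a minimal Python sketch that scores two sets of predicted ortholog pairs against a reference set. The gene names, the two "methods", and the reference set are all made up for illustration; they are not output from InParanoid, OMA, or any other real tool.<br />

```python
def precision_recall(predicted, reference):
    """Precision = TP / (all predictions); recall = TP / (all true pairs)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical "true" human-yeast ortholog pairs.
reference = {("hA", "yA"), ("hB", "yB"), ("hC", "yC"), ("hD", "yD")}

# A permissive method: finds 3 of the 4 true pairs, plus 3 wrong ones.
permissive = {("hA", "yA"), ("hB", "yB"), ("hC", "yC"),
              ("hA", "yB"), ("hC", "yD"), ("hD", "yA")}

# A strict method: finds only 2 true pairs, but makes no wrong calls.
strict = {("hA", "yA"), ("hB", "yB")}

print(precision_recall(permissive, reference))
print(precision_recall(strict, reference))
```

The permissive set has higher recall but lower precision, and the strict set has the reverse, which is the kind of trade-off to keep in mind when picking an ortholog tool.<br />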
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
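If you'd like to see what a k-NN classifier is doing under the hood before reaching for a library, here's a bare-bones sketch using only the Python standard library (math.dist needs Python 3.8 or later). The two-gene expression values and the ALL/AML labels below are invented toy data, not values from the leukemia paper, and a real analysis would more likely use scikit-learn's KNeighborsClassifier.<br />

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Rank training samples by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy training set: (expression of gene 1, expression of gene 2) per sample.
train_points = [(1.0, 1.2), (0.8, 0.9), (1.1, 0.7),
                (3.0, 3.1), (2.8, 3.3), (3.2, 2.9)]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_points, train_labels, (1.0, 1.0)))
print(knn_predict(train_points, train_labels, (3.0, 3.0)))
```

Using an odd k avoids tied votes when there are only two classes.<br />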
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf '''Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
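<br />
As a companion to the k-means example in the reading, here's a minimal sketch of the algorithm in plain Python (math.dist needs Python 3.8 or later). The points and starting centroids are toy values chosen for illustration, and the starting centroids are passed in explicitly rather than picked at random, which is a simplification compared to typical implementations.<br />

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 2), (2, 0), (10, 10), (10, 12), (12, 10)]
centroids, clusters = kmeans(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

Different starting centroids can give different final clusters, which is why k-means is usually run several times.<br />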
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT, where she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
--><br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* We're introducing methods focused on discovering position weight matrices using Gibbs Sampling, but there are interesting developments using deep neural networks too. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/DeepNN-MotifFinders-2020Review.pdf recent review]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
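To connect the entropy and sequence logo links above, here is a minimal Python sketch (not taken from the slides) of how the information content of each motif position can be computed from a set of aligned sites. The function name and the example sites are made up for illustration, and the sketch assumes a uniform background with no small-sample correction.<br />

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Return the information content (in bits) of each column of aligned sites.

    Assumes all sites have the same length and a uniform background,
    so each column can carry at most log2(len(alphabet)) bits.
    """
    length = len(sites[0])
    max_bits = math.log2(len(alphabet))
    info = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        total = sum(counts[base] for base in alphabet)
        entropy = 0.0
        for base in alphabet:
            p = counts[base] / total
            if p > 0:
                entropy -= p * math.log2(p)
        info.append(max_bits - entropy)
    return info


# Hypothetical aligned sites, loosely resembling a -10 promoter element
example_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
bits = column_information(example_sites)
print([round(b, 2) for b in bits])
```

Fully conserved positions score the maximum of 2 bits, and positions with mixed bases score less, which is what the letter heights in a sequence logo represent.<br />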
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by pyroclastic flows from Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and using sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
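For intuition about the log odds idea in the last bullet, here is a minimal Python sketch that scores a query string by summing the log odds of its k-mers under two frequency models. This is only an illustration of the general approach, not the method from the paper: the function names, the toy texts, the choice of k = 2, and the add-one pseudocount are all assumptions.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, model_text, background_text, k=2, pseudocount=1.0):
    """Sum of log2(P(kmer | model) / P(kmer | background)) over the query's k-mers.

    A pseudocount keeps unseen k-mers from producing a zero probability.
    """
    model = kmer_counts(model_text, k)
    background = kmer_counts(background_text, k)
    vocab = set(model) | set(background) | set(kmer_counts(query, k))
    model_total = sum(model.values()) + pseudocount * len(vocab)
    background_total = sum(background.values()) + pseudocount * len(vocab)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_model = (model[kmer] + pseudocount) / model_total
        p_background = (background[kmer] + pseudocount) / background_total
        score += math.log2(p_model / p_background)
    return score


# Toy "known free text" and toy "gibberish" models (both hypothetical)
english = "the quick brown fox jumps over the lazy dog and then the other fox"
gibberish = "xq zvk jwq pzx kkq vzj qxw zzk"

print(log_odds_score("the fox", english, gibberish))   # positive: looks like the English model
print(log_odds_score("zvk qxw", english, gibberish))   # negative: looks like the gibberish model
```

A positive score means the query's k-mers are more typical of the first model than of the second. The same scoring idea carries over to HMMs, where we compare the likelihood of a sequence under competing models.<br />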
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
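To make the heads-up above concrete, here is a minimal Python sketch of the sliding-window idea on a toy sequence. It is not a solution to the problem set: the function names and the toy input are made up, and skipping any dinucleotide window that contains a non-ACGT character is just one reasonable reading of "ignore anything other than A, C, T, and G".<br />

```python
from collections import Counter

VALID = set("ACGT")


def load_sequence(raw_text):
    """Drop FASTA header lines (starting with '>') and whitespace, then uppercase."""
    lines = [line.strip() for line in raw_text.splitlines()
             if not line.startswith(">")]
    return "".join(lines).upper()


def nucleotide_counts(seq):
    """Count A, C, G, T and ignore every other character."""
    return Counter(base for base in seq if base in VALID)


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides with a sliding window of width 2."""
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        # Assumption: skip windows that include an ambiguous character
        if pair[0] in VALID and pair[1] in VALID:
            counts[pair] += 1
    return counts


def as_percentages(counts):
    """Convert absolute counts to percentages of the total."""
    total = sum(counts.values())
    return {key: 100.0 * value / total for key, value in counts.items()}


seq = load_sequence("> toy example\nAAAA\nNCGT\n")
print("AAAA".count("AA"))                    # non-overlapping count from str.count
print(dinucleotide_counts("AAAA")["AA"])     # overlapping count from the sliding window
print(nucleotide_counts(seq))
print(as_percentages(nucleotide_counts(seq)))
```

The first two print statements show the difference directly: the built-in count skips ahead after each match, while the sliding window looks at every position.<br />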
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
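If it helps to see the late-day arithmetic spelled out, here is a minimal Python sketch of the rule as described above. The function name is made up, and two details are assumptions rather than stated policy: that remaining free days are applied first when an assignment is later than the free days left, and that the penalty is capped at 100%.<br />

```python
import math

FREE_LATE_DAYS = 5        # free late days for the entire semester
PENALTY_PER_DAY = 0.10    # 10 percent per day once the free days are gone


def apply_late_policy(days_late, free_days_left=FREE_LATE_DAYS):
    """Return (free days remaining, penalty fraction) for one assignment.

    Lateness is rounded up to whole days, so 10 minutes late counts as 1 day.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    covered = min(whole_days, free_days_left)     # assumption: free days used first
    penalized_days = whole_days - covered
    penalty = min(1.0, PENALTY_PER_DAY * penalized_days)  # assumption: capped at 100%
    return free_days_left - covered, penalty


# 10 minutes late with all 5 free days left: uses 1 free day, no penalty
print(apply_late_policy(10 / (24 * 60), free_days_left=5))

# 1.5 days late with no free days left: 2 days x 10% = 20% penalty
remaining, penalty = apply_late_policy(1.5, free_days_left=0)
print(remaining, penalty, 50 * penalty)
```

The second case reproduces the example in the policy: a 50-point assignment turned in 1.5 days late, with no free days remaining, loses 10 points.<br />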
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system with the following cutoffs: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-20T15:09:22Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT, where she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
<br />
--><br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners' YouTube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried in volcanic ash by Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
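To make the log odds idea in the last bullet concrete, here's a minimal sketch that scores a query string by summing the log odds of its k-mers under two competing frequency models. Everything here is an illustrative assumption rather than the researchers' actual method: the function names, the toy training strings, the use of 2-mers instead of 5-mers, and the add-one pseudocounts. It also treats k-mers as independent, which is a simplification.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds(query, model_a_text, model_b_text, k=2, pseudocount=1.0):
    """Sum of log(P(kmer | model A) / P(kmer | model B)) over the query's k-mers.

    Positive scores favor model A, negative scores favor model B.
    Pseudocounts keep unseen k-mers from producing log(0).
    """
    counts_a = kmer_counts(model_a_text, k)
    counts_b = kmer_counts(model_b_text, k)
    vocab = set(counts_a) | set(counts_b) | set(kmer_counts(query, k))
    total_a = sum(counts_a.values()) + pseudocount * len(vocab)
    total_b = sum(counts_b.values()) + pseudocount * len(vocab)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += math.log(p_a / p_b)
    return score


# Toy (made-up) training texts standing in for two different "languages"
AT_RICH = "ATATATATATAT"
GC_RICH = "GCGCGCGCGCGC"

print(log_odds("TATATA", AT_RICH, GC_RICH))  # positive: looks like AT_RICH
print(log_odds("CGCGCG", AT_RICH, GC_RICH))  # negative: looks like GC_RICH
```

The same pattern carries over to DNA: swap in k-mer counts from, say, coding vs. non-coding sequence and the sign of the score tells you which model the query resembles more.<br />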
<br />
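For intuition on the Markov chain/PageRank remark above, here's a minimal sketch of a random walk on a tiny, made-up 3-page web. The link structure, page names, and function names are all hypothetical, and real PageRank adds a damping factor that is left out here. The point is just that repeatedly applying the transition probabilities settles on the fraction of time the walk spends on each page.<br />

```python
def step(dist, transitions):
    """Advance a probability distribution over pages by one random-walk step."""
    new_dist = {page: 0.0 for page in transitions}
    for page, prob in dist.items():
        for target, p_move in transitions[page].items():
            new_dist[target] += prob * p_move
    return new_dist


def stationary(transitions, n_steps=200):
    """Approximate the long-run distribution by iterating from a uniform start."""
    pages = sorted(transitions)
    dist = {page: 1.0 / len(pages) for page in pages}
    for _ in range(n_steps):
        dist = step(dist, transitions)
    return dist


# Hypothetical web: A links to B and C, B links to C, C links back to A.
# A surfer on a page picks one of its outgoing links uniformly at random.
TRANSITIONS = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}

ranking = stationary(TRANSITIONS)
for page in sorted(ranking, key=ranking.get, reverse=True):
    print(page, round(ranking[page], 3))
```

In this toy web, pages A and C end up with twice the long-run probability of page B, because every path through the web funnels through them.<br />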
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
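If you'd like a quick regular expression warm-up, here's a small sketch using Python's built-in re module. The motif (N, then anything but P, then S or T, then anything but P) and the toy protein strings are just illustrative choices, and find_motif is a hypothetical helper name. The one subtlety worth noticing is that a plain pattern skips overlapping matches, while wrapping it in a lookahead reports them all.<br />

```python
import re

# Character classes: [^P] means "any character except P", [ST] means "S or T".
# The (?=(...)) lookahead matches without consuming text, so overlapping
# occurrences are all reported.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein):
    """Return (1-based start position, matched text) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]


print(find_motif("MNASNGTPNPSA"))
print(find_motif("NNTSA"))

# Without the lookahead, the second (overlapping) hit in "NNTSA" is missed
print(re.findall(r"N[^P][ST][^P]", "NNTSA"))
```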
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
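As a sanity check on the note above, here's a minimal sketch of a reverse complement written in plain Python 3, with an optional cross-check against BioPython's Seq object created without any alphabet argument. The helper name reverse_complement and the COMPLEMENT table are my own illustrative choices, and the BioPython part is skipped if the library isn't installed.<br />

```python
# Map each unambiguous DNA base to its complement (str.maketrans is Python 3)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


my_dna = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_dna))

try:
    from Bio.Seq import Seq  # no "from Bio.Alphabet import IUPAC" needed
except ImportError:
    Seq = None  # BioPython not installed; skip the cross-check

if Seq is not None:
    my_seq = Seq(my_dna)  # no alphabet argument
    assert str(my_seq.reverse_complement()) == reverse_complement(my_dna)
```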
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
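To go with the dynamic programming primer and the Excel example above, here's a minimal sketch of the Needleman-Wunsch recurrence for a global alignment score. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are arbitrary assumptions for illustration, not the scheme used in the slides, and the traceback step that recovers the actual alignment is left out.<br />

```python
def global_alignment_score(seq1, seq2, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n_rows, n_cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * n_cols for _ in range(n_rows)]

    # First column/row: aligning a prefix against nothing costs one gap per base
    for i in range(1, n_rows):
        score[i][0] = i * gap
    for j in range(1, n_cols):
        score[0][j] = j * gap

    # Each cell is the best of: align the two residues, or gap in either sequence
    for i in range(1, n_rows):
        for j in range(1, n_cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)

    # Bottom-right cell holds the score of the best end-to-end alignment
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Switching to local (Smith-Waterman) alignment only takes three changes: initialize the first row and column to 0, add 0 as a fourth option in the max, and report the largest cell anywhere in the table instead of the bottom-right one.<br />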
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slice, as explained in the Rosalind homework assignment on strings.<br />
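Here's a minimal sketch of the overlapping-window idea from the heads-up above, run on toy strings rather than the real genomes. The function names are hypothetical, and treating "ignore anything other than A, C, T, and G" as "skip any window containing another character" is my own reading of that note. Whitespace is stripped first so that line breaks in a file don't split a dinucleotide.<br />

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = "".join(seq.split()).upper()  # drop line breaks and spaces
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]  # same idea as string[counter:counter+2]
        if set(kmer) <= VALID_BASES:
            counts[kmer] += 1
    return counts


def percentages(counts):
    """Convert absolute counts to percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}


print("AAAA".count("AA"))           # non-overlapping count from str.count
print(kmer_counts("AAAA", 2))       # overlapping count from the sliding window
print(percentages(kmer_counts("AACG", 1)))
```

Calling kmer_counts with k=1 gives the single-nucleotide counts and k=2 gives the dinucleotide counts, so one function covers both parts of the problem.<br />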
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
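For the version-agnostic point above, here's a tiny sketch of three habits that behave the same under Python 2.7 and Python 3. The variable names are made up for illustration, and this is nowhere near a complete compatibility guide (the cheat sheet linked above is far more thorough).<br />

```python
import sys

a, b = 7, 2

# "/" on two integers truncates in Python 2 but not in Python 3,
# so say explicitly which kind of division you want
true_quotient = a / float(b)   # real-valued division in both versions
floor_quotient = a // b        # integer (floor) division in both versions

# print with parentheses around ONE pre-formatted string works in both versions
print("Python {}.{}: {} and {}".format(
    sys.version_info[0], sys.version_info[1], true_quotient, floor_quotient))

# dict.keys() is a list in Python 2 but a view in Python 3;
# wrapping it in list() gives a real list either way
counts = {"A": 2, "C": 1}
bases = sorted(list(counts.keys()))
print(bases)
```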
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
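The late-time arithmetic above can be sketched as a short function. This is only an illustration of the stated policy, not the official grading code: the function name and arguments are hypothetical, and the handling of an assignment that is only partly covered by the remaining free days (free days are used first, the rest is penalized) is my own assumption.<br />

```python
import math


def late_penalty(days_late, free_days_left, points_possible):
    """Return (points deducted, free late days remaining) for one assignment."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day)
    whole_days = math.ceil(days_late) if days_late > 0 else 0

    # Assumption: remaining free days are spent first
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered

    # 10 percent per penalized day, capped at the full value of the assignment
    percent_off = min(100, 10 * penalized_days)
    return points_possible * percent_off / 100.0, free_days_left - covered


# The example from the policy: free days used up, 50 points, 1.5 days late
print(late_penalty(1.5, 0, 50))

# 10 minutes late with all 5 free days still available
print(late_penalty(10 / (60 * 24), 5, 50))
```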
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
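If you want to check where you stand, the weights and cutoffs described above can be sketched in a few lines. The function names and the convention of passing each score as a fraction between 0 and 1 are my own assumptions; the weights (30% homework, three problem sets at 15% each, 25% project) and the cutoffs are taken from the text above, with no rounding up.<br />

```python
# (minimum percent, letter) pairs, checked from the top down
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def course_percent(homework, problem_sets, project):
    """Combine component scores (each a fraction from 0 to 1) into a percent.

    homework: overall fraction earned on the online homework (30 points)
    problem_sets: list of three fractions, one per problem set (15 points each)
    project: fraction earned on the course project (25 points)
    """
    return 30 * homework + 15 * sum(problem_sets) + 25 * project


def letter_grade(percent):
    """Map a final percent to a letter grade, with no rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


total = course_percent(homework=1.0, problem_sets=[0.9, 0.8, 1.0], project=0.9)
print(total, letter_grade(total))
```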
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-19T17:11:09Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10 PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
<br />
<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 18'''.<br />
<br />
<br />
'''Mar 7, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
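As a rough illustration of the idea in the links above, here is a minimal sketch of the Burrows–Wheeler transform and of counting pattern occurrences by backward search. It is a teaching toy, not how BWA is implemented: the function names (<code>bwt</code>, <code>inverse_bwt</code>, <code>count_matches</code>) are made up for this example, the transform is built naively by sorting all rotations, and the occurrence counts are recomputed on the fly rather than stored in an index.<br />

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    assert terminator not in text, "terminator must not occur in the text"
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, terminator="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


def count_matches(pattern, last_column):
    """Count occurrences of pattern in the text using backward search.

    The pattern is consumed right to left; [lo, hi) is the range of sorted
    rotations that start with the part of the pattern seen so far.
    """
    first_column = sorted(last_column)
    lo, hi = 0, len(last_column)
    for ch in reversed(pattern):
        if ch not in first_column:
            return 0
        start = first_column.index(ch)      # rank of the first rotation starting with ch
        lo = start + last_column[:lo].count(ch)
        hi = start + last_column[:hi].count(ch)
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("banana")
print(transformed)                          # annb$aa
print(inverse_bwt(transformed))             # banana
print(count_matches("ana", transformed))    # 2
```

Real read mappers precompute the character ranks and occurrence counts so that each step of the backward search takes constant time, which is what makes the approach fast on a whole genome.<br />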
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
<br />
<br />
'''Mar 5, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]. [https://www.nature.com/nbt/volumes/42/issues/2 The latest issue of Nature Biotechnology] focuses extensively on new AI-guided protein engineering methods. We'll go into these methods extensively in the last portion of the course.<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
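To make the sensitivity/specificity and precision/recall definitions above concrete, here is a small sketch that computes them from confusion-matrix counts. The function name and the example counts are invented for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and precision from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Avoid dividing by zero when a class is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # recall, true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),     # PPV; what gene finders often call "specificity"
    }


# Made-up counts: 100 real positives, 900 real negatives.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the true negative rate is high even though one in five predictions is wrong, which is one reason the two meanings of "specificity" are worth keeping apart.<br />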
<br />
<br />
'''Feb 29, 2024 - Intro to Proteomics'''<br />
* Guest speaker: Vy Dang, who earned her B.S. and subsequently worked in genomics at the University of Washington, Seattle, where she was a major contributor to [https://www.science.org/doi/full/10.1126/science.aax2083 the sequencing of the Melanesian genome] before joining us at UT, where she has performed >2,000 mass spectrometry proteomics experiments to map brain protein-protein interactions conserved across vertebrates.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
<br />
<br />
'''Feb 27, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 6'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
<br />
<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
<br />
<br />
'''Feb 22, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
--><br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection. Note the use of AUGUSTUS to annotate genes, relevant to the Feb 20 lecture.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography), then applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
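For a feel of how log odds ratios of k-mer frequencies can tell two sources of text (or sequence) apart, here is a minimal sketch. It is not the method from either linked article: the function names, the add-one pseudocount, and the toy sequences are all assumptions chosen to keep the example short.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(query, text_a, text_b, k=2, pseudocount=1.0):
    """Sum of log(P_a(kmer) / P_b(kmer)) over the k-mers of the query.

    Positive scores mean the query looks more like text_a, negative scores
    more like text_b. The pseudocount keeps unseen k-mers from giving log(0).
    """
    counts_a, counts_b = kmer_counts(text_a, k), kmer_counts(text_b, k)
    query_counts = kmer_counts(query, k)
    vocabulary = set(counts_a) | set(counts_b) | set(query_counts)
    total_a = sum(counts_a.values()) + pseudocount * len(vocabulary)
    total_b = sum(counts_b.values()) + pseudocount * len(vocabulary)

    score = 0.0
    for kmer, n in query_counts.items():
        p_a = (counts_a[kmer] + pseudocount) / total_a
        p_b = (counts_b[kmer] + pseudocount) / total_b
        score += n * math.log(p_a / p_b)
    return score


at_rich = "ATATATATATAT"
gc_rich = "GCGCGCGCGCGC"
print(log_odds_score("ATATAT", at_rich, gc_rich) > 0)   # True
print(log_odds_score("GCGCGC", at_rich, gc_rich) < 0)   # True
```

The same scoring works with longer k-mers (such as the 5-mers mentioned above) and with real training texts in place of the toy strings; only <code>k</code> and the inputs change.<br />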
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
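If you'd like something to practice on, here is a small sketch using Python's built-in <code>re</code> module to find the N-glycosylation motif N{P}[ST]{P} in a protein sequence. The function name and the test peptide are invented for this example. The lookahead <code>(?=...)</code> is there so that overlapping matches are reported, since a plain pattern would skip a match that starts inside the previous one.<br />

```python
import re

# N, then anything but P, then S or T, then anything but P.
# Wrapping the motif in a lookahead lets matches overlap.
NGLYC_MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_nglyc_sites(protein):
    """Return 1-based start positions of every (possibly overlapping) motif match."""
    return [match.start() + 1 for match in NGLYC_MOTIF.finditer(protein)]


peptide = "NNTSYS"   # made-up sequence with two overlapping sites
print(find_nglyc_sites(peptide))                                   # [1, 2]
print([m.start() + 1 for m in re.finditer(r"N[^P][ST][^P]", peptide)])  # [1]
```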
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
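If you want to double-check your answer to that problem without BioPython, here is a minimal plain-Python sketch of a reverse complement. The helper name is made up for this example; with a BioPython <code>Seq</code> object, the corresponding call is <code>my_seq.reverse_complement()</code>.<br />

```python
# Translation table mapping each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```

Characters outside A, C, G, and T (for example N) pass through unchanged in this sketch.<br />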
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
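To go with the dynamic programming primer above, here is a minimal sketch of global (Needleman–Wunsch) alignment. The scoring values (match +1, mismatch -1, linear gap penalty -1) are arbitrary assumptions for illustration; real alignments typically use a substitution matrix such as BLOSUM and affine gap penalties.<br />

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def substitution(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, and left moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + substitution(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + substitution(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


alignment_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "ACT")
print(alignment_score)
print(aligned_a)
print(aligned_b)
```

Ties between equally good moves are broken in favor of the diagonal here, so other optimal alignments with the same score may exist.<br />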
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python string slicing (string[counter:counter+2]), as explained in the Rosalind homework assignment on strings.<br />
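To make the heads-up above concrete, here is a minimal sketch on a toy string (not the genome files). The function name <code>count_overlapping_dinucleotides</code> is made up for illustration, and the sketch deliberately does not deal with the ambiguous (non-A/C/G/T) characters mentioned above.

```python
from collections import Counter


def count_overlapping_dinucleotides(seq):
    """Count every 2-letter window in seq, moving one position at a time.

    Hypothetical helper for illustration only; it does not filter out
    ambiguous characters.
    """
    counts = Counter()
    for counter in range(len(seq) - 1):
        # The slice seq[counter:counter+2] is the dinucleotide at this position.
        counts[seq[counter:counter + 2]] += 1
    return counts


toy = "AAAA"
# str.count only finds non-overlapping matches.
print(toy.count("AA"))
# The sliding window sees positions 0-1, 1-2, and 2-3.
print(count_overlapping_dinucleotides(toy)["AA"])
```

On the toy string, the first print reports 2 matches and the second reports 3, which is the difference described above.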
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
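As a small illustration of what "version agnostic" can look like in practice, the sketch below uses only idioms that behave the same way in Python 2.7 and Python 3. The variable names are arbitrary, and this is just one possible style, not a rule for the class.

```python
# Idioms that behave the same under Python 2.7 and Python 3.

total = 7
parts = 2

# Floor division: '//' truncates toward negative infinity in both versions.
whole = total // parts

# True division: make one operand a float so Python 2 does not truncate.
ratio = total / float(parts)

# print with a single parenthesized argument works in both versions.
print("whole = {0}, ratio = {1}".format(whole, ratio))
```

The main traps for beginners are the print statement and integer division, and both are sidestepped here.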
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
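For anyone who wants to check the late-time arithmetic, here is a rough sketch of the policy as described above. The function name and its arguments are invented for illustration. Two points are assumptions on my part rather than stated policy: that any remaining free days are applied before the penalty starts, and that the penalty is capped at 100%.

```python
import math


def apply_late_policy(days_late, free_days_left, points_possible):
    """Return (free_days_left_after, penalty_points) for one assignment.

    Hypothetical helper. Assumes remaining free days are used first and
    that the penalty never exceeds the full value of the assignment.
    """
    # Lateness is counted in whole days, rounding up (10 minutes = 1 day).
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered
    # 10 percent per penalized day.
    penalty_percent = min(100, 10 * penalized_days)
    penalty_points = points_possible * penalty_percent / 100
    return free_days_left - covered, penalty_points


# The example from the text: 50 points, 1.5 days late, no free days left.
print(apply_late_policy(1.5, 0, 50))
```

With no free days remaining, 1.5 days late rounds up to 2 days, which gives the 20% (10 point) penalty quoted above.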
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
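The cutoffs above amount to a simple lookup table. The sketch below is only an illustration of how they apply (the names <code>CUTOFFS</code> and <code>letter_grade</code> are made up); note that there is no rounding, so 91.99 stays an A-.

```python
# (minimum percent, letter) pairs, highest first, copied from the cutoffs above.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter grade for a final percentage, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))
print(letter_grade(92))
```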
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-19T16:31:55Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<br />
<br />
<!--<br />
'''Feb 22, 2024 - Genome Assembly - I'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on to a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
<br />
<br />
<br />
<br />
'''Feb 20, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
* If you would like a few examples of proteins with their transmembrane and soluble regions annotated (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
Reading:<br><br />
* Re-posting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
--><br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
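For intuition about the log odds scoring mentioned above, here is a minimal sketch. It is not the method used in either linked study: the two toy "training" texts, the helper names (<code>kmer_counts</code>, <code>log_odds</code>), and the pseudocount of 1 are all assumptions made for illustration. It scores a query by summing, over its overlapping k-mers, the log of the ratio of each k-mer's frequency in text A versus text B.<br />

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count overlapping k-mers in a string."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds(query, text_a, text_b, k=2, pseudo=1.0):
    """Sum of log2(P(kmer|A) / P(kmer|B)) over the query's overlapping k-mers.
    Positive scores favor text A, negative scores favor text B."""
    ca, cb = kmer_counts(text_a, k), kmer_counts(text_b, k)
    # Shared vocabulary so both models use the same pseudocount normalization
    vocab = set(ca) | set(cb) | set(kmer_counts(query, k))
    total_a = sum(ca.values()) + pseudo * len(vocab)
    total_b = sum(cb.values()) + pseudo * len(vocab)
    score = 0.0
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        p_a = (ca[kmer] + pseudo) / total_a   # pseudocount avoids log(0)
        p_b = (cb[kmer] + pseudo) / total_b
        score += math.log(p_a / p_b, 2)
    return score

# Toy "known" texts (illustrative only)
text_a = "ATATATATATATATAT"
text_b = "GCGCGCGCGCGCGCGC"
print(log_odds("ATATAT", text_a, text_b) > 0)   # query resembles text A
print(log_odds("GCGCGC", text_a, text_b) < 0)   # query resembles text B
```

With real data you would train on much longer texts and choose k to suit the problem (the decoding study described above used 5-mers).<br />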
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
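To make the heads-up above concrete, here is a minimal sketch of the difference between non-overlapping and overlapping counts. The function name <code>count_overlapping</code> and the toy sequence are illustrative assumptions, not part of the problem set, and this is only one of several ways to do it. It slides a window of length 2 along the sequence, skips any window containing characters other than A, C, G, or T, and reports both absolute counts and percentages.<br />

```python
from collections import Counter

def count_overlapping(seq, k=2):
    """Count overlapping k-mers (k=2 gives dinucleotides) made only of A, C, G, T."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]             # same idea as string[counter:counter+2]
        if set(kmer) <= set("ACGT"):    # ignore windows with ambiguous characters
            counts[kmer] += 1
    return counts

toy = "AAAA"
print(toy.count("AA"))        # str.count is non-overlapping: prints 2
counts = count_overlapping(toy)
print(counts["AA"])           # the sliding window is overlapping: prints 3

# Report absolute counts alongside percentages of the total
total = sum(counts.values())
for kmer, n in sorted(counts.items()):
    print(kmer, n, "{:.1f}%".format(100.0 * n / total))
```

Setting <code>k=1</code> gives single-nucleotide counts with the same function.<br />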
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
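As a small illustration of what "version agnostic" can look like in practice, here is a sketch (the sequence and variable names are made up for the example) that avoids the two differences beginners hit most often, <code>print</code> and integer division, so that it behaves the same in Python 2.7 and Python 3.<br />

```python
seq = "GATTACA"
gc = seq.count("G") + seq.count("C")

# float() forces true division in Python 2 as well as Python 3
gc_fraction = float(gc) / len(seq)

# // is integer (floor) division in both versions
half = len(seq) // 2

# print with parentheses and a single pre-formatted string works in both versions
print("GC fraction: {:.3f}".format(gc_fraction))
print("Midpoint index: {}".format(half))
```

If you only ever run Python 3, plain <code>/</code> already gives true division and none of this extra care is needed.<br />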
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
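To make the late-time arithmetic concrete, here is a minimal sketch of the policy as described above. The function names are made up for illustration, and two details are assumptions rather than stated policy: that a submission only partly covered by your remaining free days is penalized for just the uncovered days, and that the penalty is capped at 100%.

```python
import math

FREE_LATE_DAYS = 5        # free late days for the entire semester
PENALTY_PER_DAY = 0.10    # 10 percent per day once the free days are used up


def days_late(hours_late):
    """Round lateness up to whole days (10 minutes late counts as 1 day)."""
    return math.ceil(hours_late / 24) if hours_late > 0 else 0


def apply_late_policy(points_possible, hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (points_deducted, free_days_remaining) for one assignment."""
    late = days_late(hours_late)
    covered = min(late, free_days_left)    # days absorbed by free late time
    penalized = late - covered             # assumption: only uncovered days cost 10 percent
    fraction = min(1.0, PENALTY_PER_DAY * penalized)   # assumption: capped at 100 percent
    return points_possible * fraction, free_days_left - covered


# The example from the policy: 50 points, 1.5 days late, no free days left
deducted, remaining = apply_late_policy(50, 36, free_days_left=0)
print(deducted, remaining)
```

With all 5 free days still available, the same function returns no deduction for a submission that is 10 minutes late, and 4 free days remaining.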
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
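If it helps, the cutoffs can be written as a simple lookup. This is only an illustrative sketch (the names CUTOFFS and letter_grade are hypothetical, not part of any course software), and it applies no rounding, matching the statement that grades are not rounded up.

```python
# (minimum percent, letter) pairs, highest cutoff first
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade, with no rounding."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


for score in (92, 91.99, 87.5, 59.9):
    print(score, letter_grade(score))
```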
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-19T15:56:58Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit to no more than 3 per group. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
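Since the choice of distance measure matters so much for clustering expression profiles, here is a small sketch comparing two common ones, Euclidean distance and Pearson correlation distance (1 minus the correlation). The two toy "gene" profiles are made up for illustration; they have the same shape but differ 10-fold in magnitude.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: near 0 for profiles with the same shape."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


gene_a = [1.0, 2.0, 3.0, 4.0]        # hypothetical expression profile
gene_b = [10.0, 20.0, 30.0, 40.0]    # same shape as gene_a, 10x the magnitude

print(euclidean_distance(gene_a, gene_b))   # large: magnitudes differ
print(pearson_distance(gene_a, gene_b))     # essentially zero: shapes agree
```

Euclidean distance would put these two genes far apart, while correlation distance would cluster them together, which is often what you want for co-expression.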
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
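To connect the entropy and sequence logo material, here is a minimal sketch that computes the per-position information content (in bits) of a set of aligned DNA sites, which is the quantity that sets the letter stack height in a sequence logo. The example sites are made up, and the sketch assumes a uniform background and omits the small-sample correction that logo software typically applies.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content of one column: log2(alphabet size) minus Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def motif_information(sites):
    """Information content of every column in a list of equal-length sites."""
    return [column_information(col) for col in zip(*sites)]


# Hypothetical aligned binding sites, for illustration only
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]

for position, bits in enumerate(motif_information(sites), start=1):
    print(position, round(bits, 3))
```

Perfectly conserved positions carry the full 2 bits, while positions with variation carry less.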
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes although it usually runs eventually, so be warned you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to what other fields call precision or PPV. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] will be offered starting in March in programming, bioinformatics, genome sequencing, and cryoEM<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
--><br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
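To build some intuition for the log odds idea in the last bullet, here is a minimal sketch that scores a candidate text by summing the log odds of its k-mers under a "reference" model versus a "background" model. It is only an illustration, not the method from the paper: the function names, the add-one pseudocounts, and the toy strings are all assumptions made for this example.<br />

```python
import math
from collections import Counter


def kmer_counts(text, k):
    """Count every overlapping k-mer in text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def log_odds_score(candidate, reference, background, k=5, pseudocount=1.0):
    """Sum of log(P(kmer | reference) / P(kmer | background)) over the candidate.

    Hypothetical helper: pseudocounts keep unseen k-mers from giving log(0).
    """
    ref = kmer_counts(reference, k)
    bg = kmer_counts(background, k)
    cand = kmer_counts(candidate, k)
    vocabulary_size = len(set(ref) | set(bg) | set(cand))
    ref_total = sum(ref.values()) + pseudocount * vocabulary_size
    bg_total = sum(bg.values()) + pseudocount * vocabulary_size

    score = 0.0
    for kmer, n in cand.items():
        p_ref = (ref[kmer] + pseudocount) / ref_total
        p_bg = (bg[kmer] + pseudocount) / bg_total
        score += n * math.log(p_ref / p_bg)
    return score


# Toy data (assumed): the reference "language" alternates a and b.
reference_text = "abababab"
background_text = "aabbaabb"
print(log_odds_score("abab", reference_text, background_text, k=2))
print(log_odds_score("aabb", reference_text, background_text, k=2))
```

A positive score means the candidate looks more like the reference text than the background; a negative score means the opposite.<br />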
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slicing syntax, as explained in the Rosalind homework assignment on strings.<br />
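To make the heads-up above concrete, here is a minimal sketch of the difference between the two kinds of counting. The function name is made up for this example, and skipping any window that contains a character other than A, C, G, or T is just one reasonable way to handle the ambiguous calls, not a required one.<br />

```python
from collections import Counter


def count_overlapping_kmers(sequence, k=2):
    """Count overlapping k-mers, keeping only windows made of A, C, G, T."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]  # the string[counter:counter+k] slice
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


example = "AAAA"
print(example.count("AA"))                      # non-overlapping count
print(count_overlapping_kmers(example)["AA"])   # overlapping count
```

The slice moves forward one position at a time, so every dinucleotide start position is visited exactly once.<br />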
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
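As a worked illustration of the late policy above, here is a small sketch of the arithmetic. It is not an official grading tool: the function name is hypothetical, and two details are assumptions, namely that remaining free days are used up before any penalty applies and that the penalty is capped at 100% of the assignment.<br />

```python
import math

PENALTY_PERCENT_PER_DAY = 10  # from the policy: 10 percent per day late


def late_penalty(days_late, free_days_remaining, assignment_points):
    """Return (points_deducted, free_days_left) for one assignment.

    Late time is rounded up to whole days (10 minutes late counts as 1 day).
    Assumption: free days absorb lateness first; only the remainder is penalized.
    """
    whole_days_late = math.ceil(days_late) if days_late > 0 else 0
    days_covered = min(whole_days_late, free_days_remaining)
    penalized_days = whole_days_late - days_covered
    penalty_percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
    points_deducted = assignment_points * penalty_percent / 100
    return points_deducted, free_days_remaining - days_covered


# The example from the policy: no free days left, 50 points, 1.5 days late.
print(late_penalty(1.5, 0, 50))
# Ten minutes late with all 5 free days still available.
print(late_penalty(10 / (24 * 60), 5, 50))
```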
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: 92% ≤ A, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
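For anyone who wants to check their standing, the cutoffs above can be sketched as a simple lookup. This is only an illustration with made-up names; the cutoff values are copied from the list above, and no rounding is applied, in keeping with the no-rounding rule.<br />

```python
# (minimum percent, letter grade), highest first; values from the cutoffs above.
GRADE_CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter grade for a final percentage, with no rounding up."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


print(letter_grade(91.99))
print(letter_grade(92))
```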
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-15T14:24:54Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
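To build some intuition for the k-NN classifier described in the review above, here is a minimal sketch in plain Python (no scikit-learn needed). The function name <code>knn_predict</code> and the two-gene expression values are hypothetical, invented only for illustration; real data would have many more genes and samples.<br />

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical expression values (gene 1, gene 2) for two leukemia classes
train_points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_points, train_labels, (1.0, 1.0)))
print(knn_predict(train_points, train_labels, (2.9, 3.1)))
```

Using an odd k avoids tied votes when there are only two classes.<br />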
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
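The choice of distance measure (see the explanation linked above) changes which genes end up clustered together. Here is a small sketch comparing three common measures on made-up expression profiles; the function names and the profiles <code>gene_a</code>, <code>gene_b</code> and <code>gene_c</code> are hypothetical.<br />

```python
import math

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated, 2 for anti-correlated profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1 - cov / math.sqrt(var_x * var_y)

# Hypothetical expression profiles across four conditions
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, 10x the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]      # opposite trend to gene_a

for name, other in [("gene_b", gene_b), ("gene_c", gene_c)]:
    print(name, euclidean(gene_a, other), manhattan(gene_a, other),
          pearson_distance(gene_a, other))
```

By Euclidean distance gene_a is much closer to gene_c than to gene_b, whereas the correlation-based distance groups gene_a with gene_b because it ignores magnitude and compares only the shape of the profiles.<br />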
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
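To connect the entropy and sequence logo videos above to something concrete, here is a rough sketch that computes the information content of each column of a set of aligned DNA sites. The sites are invented for illustration, the function name <code>column_information</code> is hypothetical, and no small-sample correction is applied (logo software often includes one).<br />

```python
import math
from collections import Counter

def column_information(column, alphabet_size=4):
    """Information content in bits of one alignment column: log2(alphabet size) - Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy

# Hypothetical aligned binding sites (one site per row)
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]

# zip(*sites) walks down the alignment column by column
for position, column in enumerate(zip(*sites), start=1):
    print(position, "".join(column), round(column_information(column), 3))
```

A perfectly conserved DNA column carries 2 bits and a column with all four bases equally frequent carries 0 bits, which is what sets the letter heights in a sequence logo.<br />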
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes. It usually runs eventually, but be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
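To make the Burrows–Wheeler transform discussed above less abstract, here is a naive sketch that builds it from sorted rotations and then inverts it. This is for intuition only and is not how BWA constructs its index; real read mappers use suffix arrays and compressed indexes. The function names are hypothetical.<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings only)."""
    assert "$" not in text, "'$' is reserved as the end-of-text sentinel"
    text += "$"
    # All cyclic rotations, sorted; '$' sorts before letters in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed):
    """Recover the original text by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    # The row that ends with the sentinel is the original text
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]

print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

Notice how the transform groups identical characters into runs, which is what makes the indexed text compressible while remaining searchable.<br />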
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
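Since the sensitivity/specificity/precision definitions above are easy to mix up, here is a short sketch that computes all three from the four confusion-matrix counts. The function name and the counts are hypothetical, chosen only to show how the gene-finding "specificity" (precision) can differ from the true negative rate.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from true/false positive and negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # also called recall or true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # PPV; what gene finders historically called specificity
    }

# Hypothetical counts for a gene predictor scored per nucleotide
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With many true negatives, as in a mostly non-coding genome, the true negative rate stays high even when a fifth of the predictions are wrong, which is one reason precision is often the more informative number.<br />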
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
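HMM-based gene finders label each position of a sequence with its most probable hidden state, typically using the Viterbi algorithm. Here is a minimal log-space sketch on a toy two-state model. The state names ("AT" and "GC" for AT-rich and GC-rich regions) and every probability are invented for illustration and are not taken from the problem set or from any of the papers above.<br />

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (log-space Viterbi).

    Assumes a non-empty observation sequence and no zero probabilities.
    """
    # best[t][s] = log-probability of the best path ending in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: AT-rich vs GC-rich regions
states = ["AT", "GC"]
start_p = {"AT": 0.5, "GC": 0.5}
trans_p = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
emit_p = {
    "AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

print(viterbi("AATTGGCC", states, start_p, trans_p, emit_p))
```

Working with sums of logs rather than products of probabilities avoids numerical underflow on long sequences. A single out-of-place base does not flip the state, because two state switches cost more than one unlikely emission.<br />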
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' Nearly 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
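To illustrate the general idea of scoring text by log odds ratios of k-mer frequencies, here is a rough sketch. It is not the method from either linked article. It assumes a 27-symbol alphabet (lowercase letters plus space), a uniform background model, and add-one pseudocounts, and the function names and example texts are hypothetical.<br />

```python
import math
from collections import Counter

def kmer_counts(text, k):
    """Count every overlapping k-mer in the text."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))

def log_odds_score(candidate, reference, k=2, alphabet_size=27, pseudocount=1.0):
    """Sum over k-mers in `candidate` of log2( P(k-mer | reference) / P(k-mer | uniform) )."""
    ref_counts = kmer_counts(reference, k)
    n_possible = alphabet_size ** k
    # Pseudocounts keep unseen k-mers from having zero probability
    ref_total = sum(ref_counts.values()) + pseudocount * n_possible
    uniform_p = 1.0 / n_possible
    score = 0.0
    for i in range(len(candidate) - k + 1):
        kmer = candidate[i:i + k]
        ref_p = (ref_counts[kmer] + pseudocount) / ref_total
        score += math.log2(ref_p / uniform_p)
    return score

# Hypothetical "known free text" and two putative decodings
reference = "the quick brown fox jumps over the lazy dog and then the other dog ran to the fox"
plausible = "the dog and the fox"
gibberish = "xqz jvk wpb zzq kxj"

print(round(log_odds_score(plausible, reference), 2))
print(round(log_odds_score(gibberish, reference), 2))
```

A decoding that resembles the reference language gets a positive score, while gibberish scores below zero, so candidate ciphers can be ranked by this number. The same log-odds logic underlies scoring a sequence against a model versus a background in HMMs.<br />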
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm Python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
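As a small regular expression warm-up in Python, here is a sketch that finds N-glycosylation sequons (N, then any residue except P, then S or T, then any residue except P) in a protein sequence. The example sequence is made up and the function name is hypothetical. The lookahead <code>(?=...)</code> is used so that overlapping matches are also reported.<br />

```python
import re

# N, then anything but P, then S or T, then anything but P.
# Wrapping the pattern in a lookahead makes each match zero-width,
# so the scan advances one residue at a time and overlaps are found.
SEQUON = re.compile(r"(?=(N[^P][ST][^P]))")

def find_sequons(protein):
    """Return (1-based start position, matched text) for every sequon, including overlaps."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]

# Hypothetical peptide for illustration
print(find_sequons("MKNGSANNTSANPTQ"))
```

Without the lookahead, <code>finditer</code> would skip the match starting at position 8 because it overlaps the one starting at position 7.<br />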
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slicing command, as explained in the Rosalind homework assignment on strings.<br />
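Here's a tiny sketch of that heads-up on a toy string, contrasting <code>str.count</code> with a sliding window of slices. The function name is hypothetical, and skipping any window that contains a character other than A, C, G, or T is just one assumed way of handling the ambiguous calls mentioned above.<br />

```python
from collections import Counter


def count_overlapping(seq, k=2):
    """Count every overlapping k-mer with a sliding window of slices.

    Windows containing anything other than A, C, G, T are skipped
    (an assumed way of ignoring ambiguous sequence calls).
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[window] += 1
    return counts


toy = "AAAA"
non_overlapping = toy.count("AA")           # str.count skips past each match
overlapping = count_overlapping(toy)["AA"]  # sliding window sees every position

print(non_overlapping, overlapping)  # 2 3
```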
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
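For anyone who wants to check the late-day arithmetic, here's a minimal sketch of the policy as a Python function. The function name and arguments are hypothetical, and two details are assumptions not spelled out above: that any remaining free days are applied before the penalty starts, and that the penalty is capped at 100%.<br />

```python
import math


def late_penalty(hours_late, free_days_left, points_possible):
    """Return (points_deducted, free_days_remaining) for one assignment.

    Lateness is rounded up to whole days. Remaining free days are used
    first (assumption); each uncovered day costs 10%, capped at 100%
    (the cap is also an assumption).
    """
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    covered = min(days_late, free_days_left)
    penalized_days = days_late - covered
    penalty_percent = min(100, 10 * penalized_days)
    return points_possible * penalty_percent / 100, free_days_left - covered


# The syllabus example: 50 points, 1.5 days late, no free days left
print(late_penalty(36, 0, 50))       # (10.0, 0)
# 10 minutes late with all 5 free days available: no penalty, 1 day used
print(late_penalty(10 / 60, 5, 50))  # (0.0, 4)
```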
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
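The cutoffs above amount to a simple lookup with no rounding. Here's one way to sketch it in Python with the standard <code>bisect</code> module; the names are hypothetical, and this is only an illustration, not the official grade calculation.<br />

```python
from bisect import bisect_right

# Lower bounds (in percent) for each grade above F, in increasing order
CUTOFFS = [60, 62, 68, 70, 72, 78, 80, 82, 88, 90, 92]
LETTERS = ["F", "D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A"]


def letter_grade(percent):
    """Map a percentage to a letter grade; no rounding is applied."""
    return LETTERS[bisect_right(CUTOFFS, percent)]


print(letter_grade(91.99), letter_grade(92))  # A- A
```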
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-13T15:07:37Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf '''Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
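To make the Burrows–Wheeler transform mentioned above a little more concrete, here's a minimal sketch that builds it the naive way, by sorting all rotations of a string. The function name <code>bwt</code> and the "$" end marker are conventions assumed for this illustration only; real read mappers such as BWA construct their index far more efficiently than by sorting full rotations.<br />

```python
def bwt(text, end_marker="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if end_marker in text:
        raise ValueError("text must not already contain the end marker")
    text += end_marker  # unique terminator; "$" sorts before letters in ASCII
    # Build every rotation of the terminated text and sort them lexicographically.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the final character of each sorted rotation.
    return "".join(rotation[-1] for rotation in rotations)


print(bwt("banana"))  # annb$aa
```

Note how identical letters tend to end up next to each other in the output, which is part of what makes the transform useful for compression and indexing.<br />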
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
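As a quick illustration of the sensitivity/specificity/precision definitions noted above, here's a small sketch that computes all three from the counts of true/false positives and negatives. The function name <code>classification_metrics</code> and the example counts are made up for this illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and precision from confusion-matrix counts."""
    return {
        # Sensitivity (recall): fraction of real positives that were found.
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate.
        "specificity": tn / (tn + fp),
        # Precision (PPV): fraction of predictions that are real. This is the
        # quantity the gene finding community has historically called "specificity".
        "precision": tp / (tp + fp),
    }


# Hypothetical example: 80 true positives, 20 false positives,
# 880 true negatives, 20 false negatives.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With many more negatives than positives (as is typical for genomes), the true negative rate stays high even when a fair share of the predictions are wrong, which is one reason precision is often the more informative number.<br />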
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
--><br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day! We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''have finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography) and sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
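To build a bit more intuition for the random-walk idea mentioned above, here's a minimal sketch that repeatedly applies a Markov chain's transition probabilities until the fraction of time spent on each state settles down. The three-page "web" and the function name <code>stationary_distribution</code> are invented for illustration, and this is only the bare random walk, not the full PageRank algorithm.<br />

```python
def stationary_distribution(transitions, steps=1000):
    """Approximate the long-run state probabilities of a Markov chain.

    `transitions` maps each state to a dict of {next_state: probability}.
    """
    states = sorted(transitions)
    # Start the walker with equal probability on every state.
    probs = {state: 1.0 / len(states) for state in states}
    for _ in range(steps):
        updated = {state: 0.0 for state in states}
        for source in states:
            for target, p in transitions[source].items():
                # Probability mass flows from source to target along each link.
                updated[target] += probs[source] * p
        probs = updated
    return probs


# Hypothetical toy web: A links to B and C, B links to C, C links back to A.
pages = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
ranks = stationary_distribution(pages)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))  # A 0.4, B 0.2, C 0.4
```

Pages A and C end up with twice the long-run probability of page B, simply because more of the walk's paths lead through them.<br />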
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
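If you'd like a starting point for regular expression practice, here's a small sketch using Python's built-in <code>re</code> module to find a protein sequence motif. The motif chosen here (the N-glycosylation pattern N-{P}-[ST]-{P}) and the function name <code>find_motif_positions</code> are just illustrative choices. The lookahead <code>(?=...)</code> is there so that overlapping matches are all reported.<br />

```python
import re

# N, then anything but P, then S or T, then anything but P.
# Wrapping the pattern in a lookahead lets matches overlap.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif_positions(protein):
    """Return the 1-based start positions of every (possibly overlapping) match."""
    return [match.start() + 1 for match in MOTIF.finditer(protein)]


print(find_motif_positions("NNSTA"))  # [1, 2]
print(find_motif_positions("NPSTA"))  # []
```

Without the lookahead, <code>finditer</code> would report only the first of the two overlapping matches in the first example.<br />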
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
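For those who'd like to see the dynamic programming idea from the primer and the Excel example in code form, here's a minimal sketch that fills in a global (Needleman-Wunsch style) alignment score table. The function name <code>global_alignment_score</code> and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions for illustration, not necessarily the ones used in the slides or problem sets.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and row: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell is the best of three moves: diagonal (align two letters),
    # up (gap in b), or left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("ACGT", "AGT"))         # 1
```

This returns only the score; recovering the alignment itself requires tracing back through the table from the bottom-right cell, as in the spreadsheet example.<br />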
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
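Here's a minimal sketch of the overlapping-window idea from the heads-up above, assuming the sequence has already been read into a single Python string. The function name <code>count_kmers</code> is made up for this illustration. Skipping any window that contains a character other than A, C, G, or T is one reasonable reading of "ignore anything other than A, C, T, and G", not the only one.<br />

```python
from collections import Counter


def count_kmers(sequence, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping windows with characters outside `alphabet`."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]  # window slides forward one position at a time
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


print("AAAA".count("AA"))         # 2 (str.count skips overlapping matches)
print(count_kmers("AAAA")["AA"])  # 3 (the sliding window counts every one)

# Turning absolute counts into percentages of the total:
counts = count_kmers("AANAC")     # the windows "AN" and "NA" are skipped
total = sum(counts.values())
for kmer in sorted(counts):
    print(kmer, counts[kmer], round(100 * counts[kmer] / total, 1))
```

Setting <code>k=1</code> gives the single-nucleotide counts with the same function.<br />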
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
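Since the two E. coli files above differ only in the fasta header line, one small loader can handle both. This is just a sketch under a few assumptions: read_sequence is our own hypothetical helper, it expects a single sequence per file, and it sticks to syntax that behaves the same in Python 2.7 and 3.<br />

```python
def read_sequence(lines):
    """Join sequence lines into one string, skipping blank lines and
    FASTA header lines (those starting with '>')."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)

# Works on an open file handle or on any list of lines.
example = [">Example header", "acgt", "TTGA", ""]
print(read_sequence(example))  # print() with one argument runs on 2.7 and 3
```

To use it on a downloaded file, pass an open file handle, e.g. read_sequence(open("EcoliGenome.txt")), assuming the file is saved in your working directory.<br />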
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, which is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day) will be deducted from the 5-day total (or what remains of it). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
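The late-time arithmetic can be sketched in a few lines of Python. Two details here are our assumptions rather than stated policy: when a submission is only partly covered by the remaining free days, the uncovered days are the ones penalized, and the penalty is capped at 100%. The function name late_penalty is hypothetical.<br />

```python
import math

def late_penalty(days_late, free_days_left, points_possible):
    """Return (points_deducted, free_days_remaining) for one late submission."""
    # Any fraction of a day counts as a full day (10 minutes late = 1 day).
    whole_days = int(math.ceil(days_late)) if days_late > 0 else 0
    # Spend free late days first (assumption: partial coverage is allowed).
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered
    # 10 percent per penalized day (assumption: capped at 100 percent).
    percent = min(100, 10 * penalized_days)
    return points_possible * percent / 100.0, free_days_left - covered

# The example from the text: free days used up, 50 points, 1.5 days late.
print(late_penalty(1.5, 0, 50))  # (10.0, 0)
```
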
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so A = 92 and above, A- = 90 to 91.99, etc. For clarity's sake, here are the cutoffs for all grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
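For anyone who wants to check a running total against these cutoffs, here is a small sketch that uses the thresholds listed above and does no rounding. The names CUTOFFS and letter_grade are ours, purely for illustration.<br />

```python
# (minimum percent, letter), highest first, copied from the cutoffs above.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a course percentage to a letter grade with no rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(91.99))  # A-
print(letter_grade(92))     # A
```
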
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-13T15:07:18Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
--><br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day!<br />
* We'll be finishing up slides from last time. <br />
* ''Science news of the day:'' 2000 years after they were buried by the eruption of Mt. Vesuvius, and 275 years after they were unearthed by archeologists, the first significant portion of the Herculaneum Papyri (from a neighboring town to Pompeii) [https://scrollprize.org/grandprize '''has finally been read''']. There are about a thousand of these scrolls, possibly thousands more still to be unearthed, in the only known intact library from the ancient world. They've been unreadable until now because they're all in the form of charred, cemented remains. The breakthrough comes from X-ray imaging the scrolls with a particle accelerator, then computationally unwrapping the layers (somewhat analogous to segmenting images in cryotomography), and applying sophisticated image analysis + machine learning to read the characters from the subtle differences in X-ray densities due to the ink.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
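To build some intuition for the random-walk idea behind Markov chains and PageRank mentioned above, here is a minimal sketch. The three "pages" and their links are made up for illustration, and this is plain power iteration on a toy chain, not Google's actual algorithm (which, among other things, adds a damping factor).<br />

```python
# Toy illustration only: the three "pages" and their links are hypothetical.
links = {
    "A": ["B", "C"],  # page A links to B and C
    "B": ["C"],       # page B links to C
    "C": ["A"],       # page C links back to A
}


def stationary_distribution(links, n_steps=200):
    """Estimate how much time a random surfer spends on each page.

    At every step the surfer follows one outgoing link chosen uniformly
    at random. We track the probability of being on each page and push
    that probability along the links until it stops changing.
    """
    pages = sorted(links)
    prob = {page: 1.0 / len(pages) for page in pages}  # start uniform
    for _ in range(n_steps):
        new_prob = {page: 0.0 for page in pages}
        for page, targets in links.items():
            share = prob[page] / len(targets)  # split evenly over out-links
            for target in targets:
                new_prob[target] += share
        prob = new_prob
    return prob


dist = stationary_distribution(links)
for page in sorted(dist):
    print(page, round(dist[page], 3))
```

Pages that receive more of the random walker's visits end up with a larger share of the probability, which is the sense in which some pages "rank" higher than others.<br />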
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (Here's a [https://www.tutorialspoint.com/python3/python_reg_expressions.htm Python regex tutorial] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
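If you'd like a quick regular expression warm-up, here is a small sketch using Python's built-in re module. The pattern (N, then anything but P, then S or T, then anything but P) and the test peptide are chosen purely for illustration. The lookahead trick, (?=...), is one common way to report overlapping matches.<br />

```python
import re

# Illustrative pattern: N, then any residue except P, then S or T,
# then any residue except P. Wrapping it in a lookahead (?=(...)) lets
# finditer report matches that overlap one another.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein):
    """Return (0-based start, matched text) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein)]


# Made-up peptide, not a real protein.
hits = find_motif("MNGTANASPNPSA")
print(hits)
```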
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
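If you want to sanity-check your answer to that problem without BioPython, here is a minimal pure-Python sketch of reverse-complementing a DNA string. The function name is just for illustration. With BioPython installed, Seq(my_seq).reverse_complement() should give the same letters.<br />

```python
# Translation table mapping each base to its complement (both cases).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Complement every base, then reverse the string."""
    return dna.translate(COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
rc = reverse_complement(my_seq)
print(rc)
```

Note that this sketch leaves any character other than A, C, G, or T unchanged, so it is only meant for unambiguous DNA.<br />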
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
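If the dynamic programming primer and the Excel example above feel abstract, here is a minimal Python sketch of the same table-filling idea for global alignment. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not the scheme from the slides, and the function only returns the best score rather than the alignment itself.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b (linear gap cost)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))
```

Each cell depends only on its three neighbors (diagonal, up, left), which is exactly what makes the spreadsheet version possible.<br />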
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
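To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch of the slicing approach. The function name is made up, and skipping any pair that contains a character other than A, C, G, or T is just one reasonable reading of "ignore anything else", so treat that as an assumption.<br />

```python
from collections import Counter


def count_dinucleotides(seq):
    """Count overlapping dinucleotides, skipping pairs with non-ACGT letters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # slide a 2-letter window one base at a time
        if set(pair) <= set("ACGT"):
            counts[pair] += 1
    return counts


print("AAAA".count("AA"))                 # str.count skips overlapping hits
print(count_dinucleotides("AAAA")["AA"])  # the sliding window does not
```

Dividing each count by the total of all counts then gives the percentages.<br />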
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted) will be deducted from the 5-day total (or what remains of it). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up); e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
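As a rough illustration of the late-time arithmetic, here is a minimal Python sketch. The function name `late_penalty` and its arguments are made up for this example, and it assumes that any late days not covered by the remaining free allowance are the ones penalized at 10 percent each; check with the instructor if your situation is ambiguous.

```python
import math

FREE_LATE_DAYS = 5        # semester-wide allowance stated in the policy
PENALTY_PER_DAY = 0.10    # 10 percent per day once the allowance is used up


def late_penalty(hours_late, free_days_left=FREE_LATE_DAYS):
    """Return (fraction_of_points_lost, free_days_remaining).

    Assumption (not spelled out in the policy): late days beyond the
    remaining free allowance are the ones that get penalized.
    """
    if hours_late <= 0:
        return 0.0, free_days_left
    # Lateness is counted in whole days, rounding up (10 minutes = 1 day).
    days_late = math.ceil(hours_late / 24)
    covered = min(days_late, free_days_left)
    penalized_days = days_late - covered
    penalty = min(1.0, penalized_days * PENALTY_PER_DAY)
    return penalty, free_days_left - covered
```

For instance, `late_penalty(36, free_days_left=0)` corresponds to the 1.5-days-late example in the policy: 2 days are charged, for a 20% penalty (10 points of a 50-point assignment).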
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
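The weighting and cutoffs can be summarized in a short Python sketch. The names `WEIGHTS`, `course_total`, and `letter_grade` are illustrative only; the sketch assumes each component is supplied as the fraction of its points earned, and it applies no rounding, in keeping with the policy.

```python
# Points available per component (30 + 3 x 15 + 25 = 100).
WEIGHTS = {
    "homework": 30,
    "problem_set_1": 15,
    "problem_set_2": 15,
    "problem_set_3": 15,
    "project": 25,
}

# (lower bound in percent, letter), checked from the top down.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def course_total(fractions):
    """Weighted total out of 100, given the fraction earned per component."""
    return sum(WEIGHTS[name] * fractions.get(name, 0.0) for name in WEIGHTS)


def letter_grade(percent):
    """Map a course total to a letter grade without rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"
```

Note that `letter_grade(91.99)` gives an A-, not an A, because totals are not rounded up.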
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI tools should be used with caution, and their use must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-12T16:24:52Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
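To make the idea concrete, here is a minimal Python sketch of the transform and of counting exact matches by backward search. It is a toy illustration, not how BWA is implemented: it builds the BWT by sorting all rotations and recounts characters at every step, whereas real read mappers use suffix arrays and precomputed occurrence tables. The function names `bwt` and `count_occurrences` are made up for this example, and the text is assumed not to contain the `$` sentinel.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # sentinel; sorts before letters in ASCII
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text by backward search."""
    # first[c] = number of characters in the text that sort before c,
    # i.e. the row where the block of rotations starting with c begins.
    counts = Counter(bwt_string)
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch among the first i characters of the BWT.
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)  # half-open range of matching rows
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ(ch, lo)
        hi = first[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

The pattern is consumed from its last character to its first, and each step narrows the range of sorted rotations that begin with the suffix matched so far; the width of the final range is the number of matches.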
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
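As a small worked illustration of the sensitivity/specificity and precision/recall definitions linked above, here is a Python sketch that computes them from confusion-matrix counts. The function name `classification_metrics` and the example counts are invented for illustration; the key names follow the general (non-gene-finding) usage, in which specificity is the true negative rate.

```python
def _ratio(numerator, denominator):
    # Undefined ratios (empty denominator) are reported as NaN.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard rates from true/false positive and negative counts."""
    return {
        # Sensitivity = recall = true positive rate.
        "sensitivity": _ratio(tp, tp + fn),
        # Specificity as most fields define it: the true negative rate.
        "specificity": _ratio(tn, tn + fp),
        # Precision = positive predictive value (PPV); this is the quantity
        # the gene finding literature has traditionally called specificity.
        "precision": _ratio(tp, tp + fp),
    }
```

With hypothetical counts of 80 true positives, 20 false positives, 880 true negatives, and 20 false negatives, sensitivity and precision both come out to 0.8 while the true negative rate is about 0.98, which shows how far apart the two meanings of specificity can be when negatives greatly outnumber positives.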
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - Gene finding'''<br />
* Happy day-after-Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Happy day-before-Valentine's Day!<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you use Python's string count() method to count dinucleotides, note that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, the ''AA'' count for ''AAAA'' comes out as 2, not 3. You want '''overlapping''' dinucleotides, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
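To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch of the sliding-window idea. The function name and its arguments are made up for illustration, and skipping any window that contains a character other than A, C, G, or T is just one reasonable reading of "ignore anything else", so treat this as a starting point rather than the expected solution.<br />

```python
from collections import Counter


def count_overlapping_kmers(sequence, k=2, alphabet="ACGT"):
    """Count every overlapping k-mer made up only of letters in `alphabet`.

    Hypothetical helper for illustration: windows containing any other
    character (e.g. an ambiguous base call such as N) are skipped.
    """
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k along the sequence one position at a time.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


# The built-in count() method only finds non-overlapping matches: prints 2
print("AAAA".count("AA"))
# The sliding window visits positions 0, 1, and 2: prints 3
print(count_overlapping_kmers("AAAA")["AA"])
```

Dividing each count by the sum of all the counts then gives the corresponding frequencies.<br />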
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
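<br />
For anyone who wants to check the late-time arithmetic, here is a rough sketch in Python. The function and constant names are invented for illustration, and two details are assumptions rather than stated policy: any remaining free days are applied first with only the leftover days penalized, and the penalty is capped at 100 percent. The policy text above is what counts.<br />

```python
import math

FREE_LATE_DAYS = 5             # semester-wide allowance stated in the syllabus
PENALTY_PERCENT_PER_DAY = 10   # applied once the allowance is used up


def late_penalty(points_possible, days_late, free_days_left=FREE_LATE_DAYS):
    """Return (points deducted, free late days remaining).

    Hypothetical helper: lateness is rounded up to whole days, free days
    are spent first (an assumption), and the penalty is capped at 100%
    (also an assumption).
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    days_covered = min(whole_days, free_days_left)
    penalized_days = whole_days - days_covered
    penalty_percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
    points_deducted = points_possible * penalty_percent / 100
    return points_deducted, free_days_left - days_covered


# Ten minutes late with all 5 free days left: no points lost, 4 days remain.
print(late_penalty(50, 10 / (24 * 60)))
# The syllabus example: 50 points, 1.5 days late, no free days left.
# 1.5 days rounds up to 2 days, so 20% of 50 points = 10 points deducted.
print(late_penalty(50, 1.5, free_days_left=0))
```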
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: 92% ≤ A, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/Main_PageMain Page2024-02-09T22:57:26Z<p>Marcotte: /* Contact */</p>
<hr />
<div><big> '''Welcome to the Marcotte Lab''' </big><br />
<br />
[[Image:LabSplash1.png||800px]]<br />
<br />
Feel free to explore the website, and don't hesitate to contact us with questions/comments. - Members of the Marcotte Lab :-)<br />
<br />
== Research == <br />
Our group studies the large-scale organization of proteins, essentially trying to reconstruct the 'wiring diagrams' of cells by learning how all of the proteins encoded by a genome are associated into functional pathways, systems, and networks. Such models let us better define the functions of genes, and to link genes to traits and diseases. See more on the [[Research]] page.<br />
<br />
== Contact ==<br />
See the [[People]] page for individual contact info. <br />
<br />
* Physical location : MBB 3.128/3.148 [http://www.utexas.edu/maps/main/buildings/mbb.html Where is MBB?]<br />
* Address for Correspondence:<br />
<pre><br />
Edward Marcotte<br />
2500 Speedway, MBB 3.148BA<br />
The University of Texas at Austin<br />
Austin, Texas 78712<br />
</pre><br />
<br />
* Phone: 512-232-3919 (General Lab Phone)<br />
* Fax: 512-232-3472<br />
* Campus Mail Code: A4800<br />
<br />
== Quick links ==<br />
See [[Links]] for more useful links.<br />
* [https://wikis.utexas.edu/pages/viewpage.action?spaceKey=marcottelab&title=Marcotte+Lab+Internal+Home internal wiki]<br />
* [http://www.marcottelab.org/local/index.php/Main_Page prior (out-of-date) internal wiki]<br />
* Additional information as supplements to papers submitted to various scientific journals, hosted on the [http://bioinformatics.icmb.utexas.edu central bioinformatics server].<br />
* You will need to [[Special:RequestAccount|Request a Wiki Account]] in order to edit this wiki.</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-08T15:14:16Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
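If you'd like a concrete starting point for the regular expression practice, here is a minimal sketch using Python's built-in <code>re</code> module. It searches a protein sequence for the N-glycosylation motif N{P}[ST]{P} (N, then any residue except P, then S or T, then any residue except P). The helper name <code>find_motif</code> and the toy sequence are made up for illustration and are not taken from the course materials.

```python
import re

# N-glycosylation motif N{P}[ST]{P}:
# N, then anything but P, then S or T, then anything but P.
# Wrapping the pattern in a lookahead (?=...) lets us report overlapping matches,
# which a plain pattern would skip over.
MOTIF = re.compile(r"(?=(N[^P][ST][^P]))")

def find_motif(protein):
    """Return (1-based start position, matched text) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(protein)]

# Hypothetical toy sequence, for illustration only.
toy = "MKNLSANPTGNNTSW"
print(find_motif(toy))  # [(3, 'NLSA'), (11, 'NNTS'), (12, 'NTSW')]
```

Note that the N at position 7 is skipped because it is followed by P, and that the matches at positions 11 and 12 overlap, which is why the lookahead is used here.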
<br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, be aware that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2 ''AA'' dinucleotides, not 3. You want '''overlapping''' dinucleotides, so you will have to try something else, such as slicing with the Python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
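
To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch of a sliding-window counter built on the string[counter:counter+2] idea. The function names are illustrative, not part of the problem set. One assumption to flag: windows containing anything other than A, C, G, or T are skipped, rather than deleting those characters first, since deleting them would join bases that are not really adjacent. Treat this as one possible approach, not the expected solution.

```python
from collections import Counter


def count_kmers(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers, skipping any window with a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]  # slide the window one base at a time
        if all(base in alphabet for base in kmer):
            counts[kmer] += 1
    return counts


def to_percentages(counts):
    """Convert raw counts into percentages of the total number of counted k-mers."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: 100.0 * n / total for kmer, n in counts.items()}


print("AAAA".count("AA"))          # str.count: non-overlapping matches only
print(count_kmers("AAAA")["AA"])   # sliding window: overlapping matches
```

Calling count_kmers(genome, k=1) gives the single-nucleotide counts with the same code, and to_percentages turns either result into the percentages requested alongside the absolute counts.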
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
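
For a feel of what "version agnostic" means in practice, here is a small sketch that should behave the same under Python 2.7 and Python 3. The function name and sequences are invented for illustration. It relies on three habits: calling print with a single parenthesized string, forcing true division with float(), and using // whenever integer division is what you want.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    # float() forces true division under Python 2 and is harmless under Python 3
    return 100 * float(gc) / len(seq)


# print with one parenthesized string works identically in both versions
print("GC content: {:.1f}%".format(gc_percent("ATGCGC")))

# // is floor division in both versions, e.g. the number of complete codons
codons = len("ATGCGCA") // 3
print("Complete codons: {}".format(codons))
```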
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
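
The late-day arithmetic above can be sketched as a short function. This is only an illustration of the stated rules, not an official calculator. The function name is made up, and two details are assumptions not spelled out in the policy: remaining free days are applied first when a submission is later than the free days left, and the penalty is capped at 100%.

```python
import math


def apply_late_policy(days_late, free_days_left=5):
    """Return (penalty_fraction, free_days_remaining) for one late submission."""
    # Lateness is charged in whole days, rounding up (10 minutes late = 1 day)
    whole_days = max(0, math.ceil(days_late))
    # Assumption: any remaining free late days are used before penalties start
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    # 10 percent per penalized day; the 100% cap is an assumption
    penalty = min(1.0, 0.10 * penalized_days)
    return penalty, free_days_left - free_used


# The example from the policy: free days already used up, 1.5 days late, 50 points
penalty, remaining = apply_late_policy(1.5, free_days_left=0)
print("Points lost: {:.0f}".format(50 * penalty))
```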
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
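
For anyone who wants to check where a point total lands, the cutoffs above translate into a simple lookup. This is a sketch for illustration only; the names are invented. Because grades are not rounded up, the comparison is made on the raw percentage.

```python
# (minimum percentage, letter) pairs, highest first, copied from the cutoffs above
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a raw course percentage to a letter grade, with no rounding."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # just under the A cutoff, so not rounded up
```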
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution, and any use of AI should be properly cited and attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-07T00:21:48Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
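To make the idea of a distance measure concrete, here's a minimal sketch of three measures often used when clustering expression-style profiles: Euclidean, Manhattan, and a Pearson correlation distance (1 - r). The function names and the toy profiles are made up for illustration, and the sketch assumes equal-length profiles with non-zero variance.<br />

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones. Assumes non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)

# Toy profiles (hypothetical): gene_b is gene_a scaled by 2
gene_a = [1, 2, 3, 4]
gene_b = [2, 4, 6, 8]
print(euclidean(gene_a, gene_b))
print(manhattan(gene_a, gene_b))
print(pearson_distance(gene_a, gene_b))
```

Note how the two toy profiles are far apart by Euclidean or Manhattan distance but essentially identical by correlation distance, which is why the choice of measure can change which genes cluster together.<br />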
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
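For intuition, here's a minimal sketch of the Burrows–Wheeler transform and of backward search, the trick that lets an index built on the BWT count exact matches of a read. It's a toy version with made-up function names: it builds the BWT by sorting all rotations and recounts characters at every step, whereas real read mappers precompute those counts and never sort full rotations of a genome.<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    text += "$"  # end marker, assumed absent from the text and smallest in sort order
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern in the original text by backward search."""
    # first[c] = row index of the first rotation that starts with character c
    first = {}
    for row, c in enumerate(sorted(bwt_str)):
        first.setdefault(c, row)

    def occ(c, i):
        # Number of times c appears in bwt_str[:i] (recounted here for clarity)
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # current range of matching rows, [lo, hi)
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

transformed = bwt("banana")
print(transformed)
print(count_occurrences(transformed, "ana"))
```

Each step of the loop only touches the BWT string and two small lookup tables, which is why the search time depends on the read length rather than the genome length once those tables are precomputed.<br />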
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
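To keep the sensitivity/specificity/precision definitions above straight, here's a minimal sketch that computes all three from the four confusion-matrix counts. The function name and the example counts are made up for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard definitions from true/false positive/negative counts."""
    return {
        # Sensitivity (recall): fraction of real positives that were found
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate
        "specificity": tn / (tn + fp),
        # Precision (PPV): what the gene finding literature has called "specificity"
        "precision": tp / (tp + fp),
    }

# Hypothetical counts: 100 real positives and 900 real negatives
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With many more negatives than positives, the true negative rate stays high even when a fifth of the predictions are wrong, which is one reason to be clear about which definition of specificity a paper is using.<br />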
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
<br />
<br />
<br />
'''Feb 13, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 26, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
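To build a little more intuition for the random-walk picture of Markov chains mentioned above, here's a minimal sketch that repeatedly applies a transition matrix until the distribution over states settles down. The three "pages" and their link probabilities are invented for illustration, and the sketch leaves out the damping factor that PageRank adds on top of the plain random walk.<br />

```python
def step(distribution, transitions):
    """One step of the chain: new[j] = sum over i of old[i] * P(i to j)."""
    n = len(distribution)
    return [
        sum(distribution[i] * transitions[i][j] for i in range(n))
        for j in range(n)
    ]

def stationary_distribution(transitions, iterations=200):
    """Approximate the long-run fraction of time spent in each state."""
    n = len(transitions)
    distribution = [1.0 / n] * n  # start the walker uniformly at random
    for _ in range(iterations):
        distribution = step(distribution, transitions)
    return distribution

# Hypothetical 3-page web; row i gives the probabilities of leaving page i
pages = ["A", "B", "C"]
transitions = [
    [0.0, 0.5, 0.5],  # A links to B and C
    [1.0, 0.0, 0.0],  # B links only to A
    [0.5, 0.5, 0.0],  # C links to A and B
]

for page, probability in zip(pages, stationary_distribution(transitions)):
    print(page, round(probability, 3))
```

Page A ends up with the largest share because both other pages send walkers to it, which is the same reasoning that ranks heavily linked pages higher.<br />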
<br />
<br />
<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] command, as explained in the Rosalind homework assignment on strings.<br />
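To see the difference concretely, here's a minimal sketch of the sliding-window idea. The function name is made up, and skipping any window that contains a character other than A, C, G, or T is just one reasonable way to handle the ambiguous calls; treat it as an illustration rather than the required approach.<br />

```python
from collections import Counter

def count_overlapping_dinucleotides(sequence):
    """Count every overlapping 2-letter window made only of A, C, G, T."""
    valid = set("ACGT")
    counts = Counter()
    sequence = sequence.upper()
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]  # window starting at position i
        if pair[0] in valid and pair[1] in valid:
            counts[pair] += 1
    return counts

example = "AAAA"
print(example.count("AA"))                              # non-overlapping count
print(count_overlapping_dinucleotides(example)["AA"])  # overlapping count
```

The same loop works on a whole genome string once you've read the file and stripped out line breaks.<br />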
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
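As a small illustration of what "version agnostic" can look like in practice, here's a hedged sketch that should behave the same under Python 2.7 and Python 3. The function names gc_fraction and midpoint are made up for this example:

```python
def gc_fraction(sequence):
    """Return the fraction of G and C characters in a DNA string."""
    gc = sequence.count("G") + sequence.count("C")
    # float() forces true division in Python 2; Python 3 does this already.
    return gc / float(len(sequence))


def midpoint(sequence):
    """Return the index of the middle of the sequence."""
    # // is integer (floor) division in both versions.
    return len(sequence) // 2


# print with parentheses around a single pre-formatted string
# gives the same output in both versions.
print("GC fraction: {:.2f}".format(gc_fraction("ATGCGC")))
print("Midpoint index: {}".format(midpoint("ATGCGC")))
```

The two habits shown here (explicit float or floor division, and print called with one formatted string) cover most of the differences a beginner is likely to run into.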
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, the number of days late will be deducted from the 5-day total (or what remains of it), in 1-day increments, rounding up (i.e., 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up); e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br>
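For anyone who wants to check the late-time arithmetic, here's a rough sketch of the rule as code. The function name late_penalty is invented for this example, and two details are assumptions rather than stated policy: that days beyond the remaining free days are the only ones penalized, and that the penalty is capped at 100%.

```python
import math


def late_penalty(days_late, free_days_left, points_possible):
    """Return (points deducted, free late days remaining) for one assignment."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day).
    whole_days = int(math.ceil(days_late)) if days_late > 0 else 0
    # Free late days are used first.
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered
    # 10 percent per penalized day; the 100% cap is an assumption.
    percent = min(100, 10 * penalized_days)
    return points_possible * percent / 100.0, free_days_left - covered


# The example from the policy: 50 points, 1.5 days late, no free days left.
print(late_penalty(1.5, 0, 50))   # (10.0, 0)
# Ten minutes late with all 5 free days available: no penalty, 4 days left.
print(late_penalty(10.0 / (60 * 24), 5, 50))   # (0.0, 4)
```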
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
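The cutoffs above translate directly into a lookup table. Here's a minimal sketch (the names CUTOFFS and letter_grade are made up for illustration) that applies them with no rounding up, as stated:

```python
# (minimum percent, letter) pairs, highest first, taken from the cutoffs above.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter grade for a final percentage, with no rounding."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # A-
print(letter_grade(92))     # A
print(letter_grade(59.9))   # F
```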
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-06T23:46:32Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
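Since the ortholog tools above are compared in terms of precision vs recall, here's a minimal sketch of how those two numbers are computed for a set of predicted ortholog pairs against a trusted reference set. The function name and the toy gene labels are invented for illustration and don't describe any real tool's output:

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted items against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Precision: what fraction of the predictions are correct?
    precision = true_positives / float(len(predicted)) if predicted else 0.0
    # Recall: what fraction of the reference set was recovered?
    recall = true_positives / float(len(reference)) if reference else 0.0
    return precision, recall


reference = {"a", "b", "c", "d"}                       # toy "true" orthologs
permissive = {"a", "b", "c", "d", "w", "x", "y", "z"}  # finds all, plus extras
strict = {"a", "b"}                                    # finds few, all correct

print(precision_recall(permissive, reference))  # (0.5, 1.0)
print(precision_recall(strict, reference))      # (1.0, 0.5)
```

The permissive toy predictor shows the "higher recall, lower precision" pattern, and the strict one shows the reverse.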
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
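To make the k-NN idea from the review above concrete, here's a minimal sketch using only the Python standard library. The function names, the toy 2D points, and the class labels are all invented for illustration; for real data, the scikit-learn classifiers mentioned above are the better choice.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated groups of made-up 2D points.
train_points = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0),
                (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
train_labels = ["class_a", "class_a", "class_a",
                "class_b", "class_b", "class_b"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # class_a
print(knn_predict(train_points, train_labels, (4.7, 5.3)))  # class_b
```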
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the class Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
'''Just a reminder about the mechanics of this class:''' ''Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slice notation, as explained in the Rosalind homework assignment on strings.<br />
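To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch. The helper name count_overlapping_kmers is hypothetical (it is not from the problem set or from Rosalind). It also assumes that "ignoring" ambiguous characters means skipping any window that contains something other than A, C, G, or T, which is only one reasonable reading of the note above.

```python
from collections import Counter

def count_overlapping_kmers(seq, k=2, alphabet="ACGT"):
    """Count overlapping k-mers by sliding a window one position at a time.

    Assumption: windows containing any character outside `alphabet`
    (e.g. ambiguous base calls such as N) are skipped entirely.
    """
    seq = seq.upper()
    allowed = set(alphabet)
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]          # the string[counter:counter+2] idea, for k = 2
        if set(kmer) <= allowed:     # keep only windows made of A, C, G, T
            counts[kmer] += 1
    return counts

example = "AAAA"
print(example.count("AA"))                     # built-in count: non-overlapping matches
print(count_overlapping_kmers(example)["AA"])  # sliding window: overlapping matches
```

For the ''AAAA'' example, the built-in count reports 2 while the sliding window reports 3, matching the heads-up above. Dividing each count by the total number of counted windows would give the corresponding percentages.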
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
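As a worked illustration of the late-time arithmetic above, here is a minimal sketch. The function name apply_late_policy and its parameters are hypothetical. Two details are assumptions rather than stated policy: any remaining free days are used up before the 10% per day penalty starts on the same assignment, and the penalty is capped at 100%.

```python
import math

def apply_late_policy(days_late, free_days_left=5, penalty_per_day=0.10):
    """Return (penalty_fraction, free_days_remaining) for one late assignment.

    Lateness is rounded up to whole days, so 10 minutes late counts as 1 day.
    Assumptions: free days are consumed first, and the penalty is capped at 100%.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    used_free = min(whole_days, free_days_left)
    penalized_days = whole_days - used_free
    penalty = min(1.0, penalty_per_day * penalized_days)
    return penalty, free_days_left - used_free

# The example from the text: a 50-point assignment, 1.5 days late, no free days left.
penalty, remaining = apply_late_policy(1.5, free_days_left=0)
print(round(50 * penalty, 2), "points deducted;", remaining, "free days remain")

# Ten minutes late with all 5 free days available: no penalty, one free day used.
print(apply_late_policy(10 / (60 * 24)))
```

The first call reproduces the 20% (10 point) penalty described above, since 1.5 days rounds up to 2 days.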
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system. Just for clarity's sake, here are the cutoffs for the grades: A = 92% and above, A- = 90% to 91.99%, B+ = 88% to 89.99%, B = 82% to 87.99%, B- = 80% to 81.99%, C+ = 78% to 79.99%, C = 72% to 77.99%, C- = 70% to 71.99%, D+ = 68% to 69.99%, D = 62% to 67.99%, D- = 60% to 61.99%, F = below 60%.<br />
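The cutoffs above amount to a simple threshold lookup with no rounding. A minimal sketch follows; the names GRADE_CUTOFFS and letter_grade are hypothetical and not part of any course software.

```python
# (lower cutoff in percent, letter), listed from highest to lowest
GRADE_CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Map a final percentage to a letter grade without rounding up."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

for score in (92, 91.99, 87.5, 59.9):
    print(score, letter_grade(score))
```

Because nothing is rounded, a 91.99 stays an A- rather than becoming an A.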
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-06T17:17:34Z<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein ID, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ UniProt]. InParanoid tends towards higher recall but lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easy to download and re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
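The precision vs recall trade-off mentioned above can be made concrete with a small sketch. The gene pairs and the two prediction sets below are invented purely for illustration (they are not output from InParanoid, OMA, or any other tool), and <code>precision_recall</code> is a hypothetical helper name:

```python
def precision_recall(predicted, reference):
    """Score predicted ortholog pairs against a trusted reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up "true" ortholog pairs (human gene, yeast gene); all names are hypothetical.
reference = {("hA", "yA"), ("hB", "yB"), ("hC", "yC"), ("hD", "yD")}

# A permissive method: recovers every true pair but also reports false ones.
permissive = reference | {("hA", "yX"), ("hB", "yY"), ("hC", "yZ"), ("hD", "yW")}

# A strict method: reports only two pairs, both of them correct.
strict = {("hA", "yA"), ("hB", "yB")}

print("permissive (precision, recall):", precision_recall(permissive, reference))
print("strict     (precision, recall):", precision_recall(strict, reference))
```

In this toy case the permissive set has perfect recall but only half of its calls are right, while the strict set is always right but misses half of the true pairs.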
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
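To build some intuition for why short random walks reveal network clusters, here is a minimal plain-Python sketch of the random-walk distance that (as I understand it) Walktrap is built on: two nodes are close if walks of a few steps starting from each of them end up in similar places, with each destination weighted by the inverse of its degree. This is not the full algorithm (the agglomerative merging of communities is omitted), the function names are hypothetical, and the toy network of two triangles joined by one edge is invented for illustration. For real analyses, use a library implementation.

```python
import math


def walk_distribution(adjacency, start, steps):
    """Probability of sitting at each node after `steps` random-walk steps."""
    prob = {node: 0.0 for node in adjacency}
    prob[start] = 1.0
    for _ in range(steps):
        following = {node: 0.0 for node in adjacency}
        for node, p in prob.items():
            if p == 0.0:
                continue
            share = p / len(adjacency[node])  # move to a uniformly chosen neighbor
            for neighbor in adjacency[node]:
                following[neighbor] += share
        prob = following
    return prob


def walk_distance(adjacency, a, b, steps=3):
    """Degree-weighted distance between the walk distributions of a and b."""
    pa = walk_distribution(adjacency, a, steps)
    pb = walk_distribution(adjacency, b, steps)
    return math.sqrt(sum((pa[k] - pb[k]) ** 2 / len(adjacency[k])
                         for k in adjacency))


# Toy network: triangle 0-1-2 and triangle 3-4-5, joined by the edge 2-3.
network = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
           3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

print("same triangle (0 vs 1):     ", walk_distance(network, 0, 1))
print("different triangle (0 vs 5):", walk_distance(network, 0, 5))
```

Nodes in the same triangle come out closer than nodes in different triangles, which is the signal a clustering step can then exploit.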
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
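If you would like to see the steps of PCA without any libraries, here is a minimal sketch in plain Python: center the data, build the covariance matrix, find its leading eigenvector, and project. It uses power iteration to find only the first principal component, which is a simplification chosen for brevity (numpy or scikit-learn would normally compute all components at once). The function name and the small dataset are made up for illustration.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, fraction_of_variance, scores) for the first PC."""
    n, dim = len(rows), len(rows[0])
    # Step 1: center each column on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in rows]
    # Step 2: sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Step 3: power iteration converges to the eigenvector with the largest
    # eigenvalue (assuming the start vector is not orthogonal to it).
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        product = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # no variance at all, so there is nothing to find
        vec = [x / norm for x in product]
    # The variance along the PC is the Rayleigh quotient v . C . v
    eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                     for i in range(dim) for j in range(dim))
    total_variance = sum(cov[i][i] for i in range(dim))
    explained = eigenvalue / total_variance if total_variance else 0.0
    # Step 4: project each centered sample onto the PC.
    scores = [sum(c[j] * vec[j] for j in range(dim)) for c in centered]
    return vec, explained, scores


# Invented 2-D measurements that fall roughly along a diagonal line.
data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
direction, explained, scores = first_principal_component(data)
print("PC1 direction:", direction)
print("fraction of variance explained:", round(explained, 3))
```

Because the points lie close to a line, a single component captures most of the variance, which is the same reason two components can summarize many thousands of genotype measurements in the European genomes example.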
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
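For a feel of how simple a classifier can be, here is a minimal k-nearest-neighbors sketch in plain Python (for real work, scikit-learn's implementations are the better choice). The two-gene "expression" values and their ALL/AML labels are invented for illustration and are not taken from the leukemia paper; <code>knn_predict</code> is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    `training` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up samples: (expression of gene 1, expression of gene 2), class label.
training = [
    ((1.0, 1.2), "ALL"), ((0.8, 0.9), "ALL"), ((1.1, 0.7), "ALL"),
    ((3.0, 3.2), "AML"), ((3.3, 2.9), "AML"), ((2.8, 3.1), "AML"),
]

print(knn_predict(training, (1.0, 1.0)))
print(knn_predict(training, (3.0, 3.0)))
```

Choosing an odd k with two classes avoids tied votes; with more classes or an even k, a tie-breaking rule would be needed.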
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
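As a companion to the k-means reading above, here is a minimal plain-Python sketch of the basic (hard) k-means loop: assign each point to its nearest center, move each center to the mean of its points, and repeat until nothing changes. The starting centers are passed in explicitly so the result is reproducible; real implementations usually pick them randomly and run several restarts. The function name and the six toy points are invented for illustration.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Return (centers, clusters) after running basic k-means."""
    centers = [tuple(c) for c in initial_centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else old
            for old, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break  # converged: assignments can no longer change
        centers = new_centers
    return centers, clusters


# Two visually obvious groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.6),
          (5.0, 7.0), (6.0, 8.0), (5.5, 6.5)]
centers, clusters = kmeans(points, initial_centers=[points[0], points[3]])
for center, members in zip(centers, clusters):
    print(center, members)
```

Fuzzy k-means differs in that each point gets a graded membership in every cluster rather than a single hard assignment.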
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the class Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
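The choice of distance measure matters a great deal for clustering expression data, and a tiny example shows why. The sketch below implements three common measures in plain Python; the three "gene" profiles are made up so that two genes share a shape but differ 10-fold in scale, and a third is the mirror image of the first.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson_distance(a, b):
    """1 - Pearson correlation; assumes neither profile is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


# Hypothetical expression profiles across four conditions.
gene1 = [1, 2, 3, 4]
gene2 = [10, 20, 30, 40]   # same shape as gene1, 10x the magnitude
gene3 = [4, 3, 2, 1]       # similar magnitude, opposite trend

for name, other in (("gene2", gene2), ("gene3", gene3)):
    print("gene1 vs", name,
          "euclidean:", round(euclidean(gene1, other), 2),
          "manhattan:", manhattan(gene1, other),
          "pearson distance:", round(pearson_distance(gene1, other), 2))
```

Euclidean and Manhattan distance put gene1 closer to the anticorrelated gene3, while the correlation-based distance groups gene1 with gene2, which is usually what you want when co-regulation matters more than absolute expression level.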
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
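To connect the entropy idea to sequence logos, here is a minimal sketch that computes the information content (in bits) of each column of a set of aligned DNA sites, which corresponds to the total letter height at each position of a logo. It assumes a uniform background and omits the small-sample correction that logo software often applies. The four aligned sites are invented for illustration, and <code>column_information</code> is a hypothetical helper name.

```python
import math
from collections import Counter


def column_information(sites, alphabet="ACGT"):
    """Information content (bits) of each column of equal-length aligned sites."""
    max_bits = math.log2(len(alphabet))  # 2 bits for DNA
    information = []
    for position in range(len(sites[0])):
        counts = Counter(site[position] for site in sites)
        entropy = 0.0
        for base in alphabet:
            p = counts[base] / len(sites)
            if p > 0:
                entropy -= p * math.log2(p)  # Shannon entropy of the column
        information.append(max_bits - entropy)
    return information


# Made-up aligned binding sites.
sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
for position, bits in enumerate(column_information(sites), start=1):
    print(position, round(bits, 2))
```

Perfectly conserved columns carry the full 2 bits, and columns where the base varies carry less.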
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
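To see the idea behind BWT-based read mapping in miniature, here is a plain-Python sketch that builds the Burrows–Wheeler transform of a short text from its sorted suffixes and then finds exact matches of a pattern by backward search. It is deliberately naive (real mappers such as BWA use compact precomputed tables and never sort whole suffixes or recount characters this way), the function names are hypothetical, and the toy "genome" is invented.

```python
from collections import Counter


def bwt_and_suffix_array(text):
    """Return the BWT of text + '$' and the matching suffix array."""
    text += "$"  # unique end marker that sorts before A, C, G, T
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)
    return bwt, suffix_array


def find_occurrences(pattern, bwt, suffix_array):
    """Exact-match positions of `pattern`, found by backward search."""
    # first[c] = number of characters in the text that sort before c
    first, total = {}, 0
    for char, count in sorted(Counter(bwt).items()):
        first[char] = total
        total += count
    low, high = 0, len(bwt)  # half-open range of matching sorted suffixes
    for char in reversed(pattern):
        if char not in first:
            return []
        # Naive rank query: how many times char occurs in bwt[:i]
        low = first[char] + bwt[:low].count(char)
        high = first[char] + bwt[:high].count(char)
        if low >= high:
            return []
    return sorted(suffix_array[low:high])


genome = "GATTACAGATTACA"
bwt, suffix_array = bwt_and_suffix_array(genome)
print("BWT:", bwt)
print("TTAC found at:", find_occurrences("TTAC", bwt, suffix_array))
```

The pattern is consumed from its last character to its first, and each step only narrows a range of rows, which is why the search time depends on the read length rather than on the genome size once the index is built.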
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
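As a companion to the De Bruijn primer, here is a minimal plain-Python sketch of the core idea: break reads into k-mers, draw an edge from each k-mer's prefix to its suffix, and spell out a contig by walking the edges. The walk below handles only the easy case of a single unbranched path, so it raises an error on repeats, branches, or cycles rather than attempting the Eulerian-path and error-correction machinery of a real assembler. The reads are invented (error-free and overlapping), and both function names are hypothetical.

```python
def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = {}
    for kmer in sorted(kmers):  # distinct k-mers only: one edge per k-mer
        graph.setdefault(kmer[:-1], []).append(kmer[1:])
    return graph


def unbranched_contig(graph):
    """Spell the sequence along a graph that is one simple, unbranched path."""
    targets = {node for successors in graph.values() for node in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one node with no incoming edge")
    node = starts[0]
    contig, visited = node, {node}
    while node in graph:
        if len(graph[node]) != 1:
            raise ValueError("branch at " + node + ": a simple walk is not enough")
        node = graph[node][0]
        if node in visited:
            raise ValueError("cycle at " + node + ": a simple walk is not enough")
        visited.add(node)
        contig += node[-1]  # each step adds one new base
    return contig


# Made-up overlapping reads sampled from one short sequence.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = de_bruijn_graph(reads, k=4)
print(unbranched_contig(graph))
```

Note that the reads are never aligned to one another: the overlaps are implicit in the shared k-mers, which is what makes the approach scale to very large numbers of short reads.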
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Feb 8, 2024 - <br />
--><br />
<br />
* ''Just a reminder about the mechanics of this class: Lectures will generally be about algorithms and concepts, while the coding help hours (or my office hours) are for you to get individual coding help and feedback. Please plan to go to coding help hours if you need that support!''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
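If you want to sanity-check your answer to that problem without relying on any library, here is a minimal pure-Python sketch of a reverse complement. The function name and the A/C/G/T-only assumption are illustrative choices, not part of the Rosalind shortcut; Biopython's Seq objects offer an equivalent reverse_complement() method.<br />

```python
# Translation table mapping each unambiguous DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string containing only A, C, G, T."""
    # Complement every base, then reverse the string with a slice.
    return dna.translate(COMPLEMENT)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(reverse_complement(my_seq))
```

Applying the function twice should return the original sequence, which makes for an easy self-test.<br />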
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
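To make the overlapping vs. non-overlapping distinction concrete, here is a small sketch of the sliding-window idea from the heads-up above. The function name is hypothetical, and skipping any window that contains a character other than A, C, G, or T is just one reasonable reading of "ignore anything other than A, C, T, and G"; treat it as an assumption rather than the required approach.<br />

```python
from collections import Counter

VALID_BASES = set("ACGT")


def count_dinucleotides(seq):
    """Count overlapping dinucleotides, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    # Slide a 2-letter window along the sequence one position at a time.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair[0] in VALID_BASES and pair[1] in VALID_BASES:
            counts[pair] += 1
    return counts


print("AAAA".count("AA"))                 # built-in count, non-overlapping: 2
print(count_dinucleotides("AAAA")["AA"])  # sliding window, overlapping: 3
```

Dividing each count by the total number of counted windows then gives the percentages.<br />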
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
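As a rough illustration of the late-time arithmetic above, here is a short sketch. The function and parameter names are hypothetical, and two details are assumptions that the policy does not spell out: remaining free days are applied first to a late assignment, and the penalty is capped at 100%.<br />

```python
import math

FREE_LATE_DAYS = 5          # free late days for the entire semester
PENALTY_PERCENT_PER_DAY = 10


def late_penalty(points, days_late, free_days_left=FREE_LATE_DAYS):
    """Return (points deducted, free late days remaining) for one assignment."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day).
    whole_days = math.ceil(days_late)
    # Assumption: remaining free days absorb lateness before any penalty applies.
    covered = min(whole_days, free_days_left)
    penalized_days = whole_days - covered
    # Assumption: the penalty cannot exceed the assignment's full value.
    percent = min(100, PENALTY_PERCENT_PER_DAY * penalized_days)
    return points * percent / 100, free_days_left - covered


# The example from the text: 50 points, 1.5 days late, no free days left.
print(late_penalty(50, 1.5, free_days_left=0))  # (10.0, 0)
```

With all 5 free days still available, the same submission would cost 2 free days and no points.<br />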
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
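For anyone who wants to check where a point total lands, the cutoffs above can be sketched as a simple lookup. The names here are illustrative only; the comparison uses the raw percentage with no rounding, matching the no-rounding rule.<br />

```python
# (minimum percentage, letter grade) pairs, highest cutoff first.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Return the letter grade for a course percentage, without rounding up."""
    # Walk down the cutoffs and return the first one the score reaches.
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # A-
print(letter_grade(59.9))   # F
```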
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-06T14:41:09Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the class Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
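To see why the choice of distance measure matters for clustering, here is a minimal sketch comparing Euclidean distance with a correlation-based distance on two made-up expression profiles. The function names and the toy profiles are assumptions for illustration only; they don't come from any of the readings above.<br />

```python
import math

def euclidean(x, y):
    """Straight-line distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation; 0 means perfectly correlated profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

# Hypothetical profiles: same shape, but gene_b is 10x more abundant.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]

print(euclidean(gene_a, gene_b))         # large: magnitudes differ
print(pearson_distance(gene_a, gene_b))  # essentially 0: shapes match
```

The two genes would land far apart under Euclidean distance but in the same cluster under the correlation-based distance, which is often the behavior you want for co-expressed genes.<br />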
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
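If it helps to see the transform itself, here is a deliberately naive sketch of the Burrows–Wheeler transform and its inverse, built from sorted rotations. This is only an illustration under simple assumptions (a "$" terminator that sorts before every other character, and short inputs); real read mappers build their indexes far more efficiently, so don't take this as how BWA does it. The function names are my own.<br />

```python
def bwt(text, terminator="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, terminator="$"):
    """Invert the transform by repeatedly prepending the column and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row that ends with the terminator is the original text.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]

print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

Note how the transform tends to group identical characters together, and that it is fully reversible, which is what makes it useful as the basis for a compact, searchable index.<br />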
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
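To keep the sensitivity/specificity and precision/recall definitions from the note above straight, here is a small sketch that computes them from confusion-matrix counts. The function name and the counts are hypothetical, chosen only to make the arithmetic easy to follow.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Summaries of a confusion matrix (true/false positives and negatives)."""
    return {
        # Fraction of real positives that were found (a.k.a. recall, TPR).
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate).
        "specificity": tn / (tn + fp),
        # Fraction of predictions that were right (a.k.a. PPV); this is what
        # the gene finding literature has historically called "specificity".
        "precision": tp / (tp + fp),
    }

# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With many true negatives, as is typical for genomes that are mostly non-coding, the true negative rate stays high almost regardless of performance, which is one reason precision is the more informative number for gene finding.<br />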
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
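For those who are curious what the Viterbi algorithm looks like in code (again, you are not required to program it), here is a minimal log-space sketch on a toy two-state model. Everything about the model is made up for illustration: the state names, the probabilities, and the sequence are assumptions and are unrelated to the problem set. The sketch also assumes no probability is exactly zero, since it takes logs directly.<br />

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path, computed in log space."""
    # best[t][s] = log-probability of the best path ending in state s at time t
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: GC-rich versus AT-rich regions of a DNA sequence.
states = ("GC", "AT")
start_p = {"GC": 0.5, "AT": 0.5}
trans_p = {"GC": {"GC": 0.9, "AT": 0.1}, "AT": {"GC": 0.1, "AT": 0.9}}
emit_p = {
    "GC": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1},
    "AT": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4},
}
print(viterbi("GGCCAATT", states, start_p, trans_p, emit_p))
```

Working in log space turns the products of many small probabilities into sums, which avoids numerical underflow on longer sequences; the same table of scores is what you would fill in by hand in a spreadsheet.<br />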
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
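To build a bit more intuition for the Markov chain idea behind PageRank mentioned above, here is a minimal sketch that finds where a random walk spends its time on a tiny, made-up web of three pages. The page names, link probabilities, and function name are all assumptions for illustration, and the sketch leaves out the damping factor used in the real algorithm.<br />

```python
def stationary_distribution(transitions, steps=200):
    """Power iteration: repeatedly push probability mass along the chain."""
    states = list(transitions)
    probs = {s: 1.0 / len(states) for s in states}  # start uniform
    for _ in range(steps):
        nxt = {s: 0.0 for s in states}
        for src, row in transitions.items():
            for dst, p in row.items():
                nxt[dst] += probs[src] * p
        probs = nxt
    return probs

# Hypothetical three-page web; each row of outgoing probabilities sums to 1.
toy_web = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}

ranks = stationary_distribution(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Pages A and C end up with more of the probability mass than page B because more of the walk's paths lead into them, which is the sense in which a random walk "ranks" the pages.<br />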
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 14'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
<br />
--><br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for Biopython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use Python's string count() method to count dinucleotides, note that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, AA dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with Python string slices of the form string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
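To illustrate the overlapping vs. non-overlapping distinction, here is a minimal sketch of counting overlapping dinucleotides with string slices. The function name is hypothetical, and skipping any pair that contains a character other than A, C, G, or T is one assumed way of "ignoring" the ambiguous calls; it is not the official problem set solution.<br />

```python
from collections import Counter


def count_dinucleotides(sequence):
    """Count overlapping dinucleotides in a DNA string.

    Assumption: any pair containing a non-ACGT character is skipped.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    counts = Counter()
    # Slide a 2-character window along the sequence one position at a time
    for counter in range(len(sequence) - 1):
        pair = sequence[counter:counter + 2]
        if pair[0] in valid and pair[1] in valid:
            counts[pair] += 1
    return counts


print("AAAA".count("AA"))                 # non-overlapping: prints 2
print(count_dinucleotides("AAAA")["AA"])  # overlapping: prints 3
```

To report percentages as well as absolute counts, divide each count by the sum of all counts in the returned Counter.<br />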
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
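As a small illustration of version-agnostic code, the sketch below uses two idioms that behave the same in Python 2.7 and Python 3: print with a single parenthesized argument, and an explicit float() before division. The function name and example sequence are made up for illustration. Adding "from __future__ import print_function, division" as the first line of a script is another common way to get Python 3 behavior in Python 2.7.<br />

```python
def gc_fraction(sequence):
    """Return the fraction of G and C characters in a sequence (0.0 if empty)."""
    sequence = sequence.upper()
    if len(sequence) == 0:
        return 0.0
    gc_count = sequence.count("G") + sequence.count("C")
    # float() forces true division in Python 2, where 1 / 4 would otherwise be 0
    return float(gc_count) / len(sequence)


# A single parenthesized argument prints identically in Python 2 and 3
print("GC fraction: %.2f" % gc_fraction("ATGCGC"))
```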
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If, at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
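The late-time arithmetic above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, any remaining free late days are assumed to be applied before a penalty starts, and the penalty is assumed to be capped at 100%. The policy text above, not this sketch, is authoritative.<br />

```python
import math


def late_penalty(days_late, free_days_left, points_possible):
    """Return (points deducted, free late days remaining) for one assignment.

    Assumptions: remaining free days are used first, and the
    10%-per-day penalty is capped at 100% of the assignment.
    """
    # Lateness is charged in whole days, rounding up (10 minutes late = 1 day)
    days_charged = math.ceil(days_late) if days_late > 0 else 0
    days_from_bank = min(days_charged, free_days_left)
    penalized_days = days_charged - days_from_bank
    penalty_percent = min(100, 10 * penalized_days)
    points_deducted = points_possible * penalty_percent / 100
    return points_deducted, free_days_left - days_from_bank


# The example from the policy: 50 points, 1.5 days late, no free days left
print(late_penalty(1.5, 0, 50))  # 2 days charged, 20% penalty, 10 points
```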
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
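For anyone who wants to check where a point total falls, here is a minimal sketch of the cutoff table as a lookup. The names are hypothetical; the cutoffs are copied from the list above, and no rounding is applied, consistent with the no-rounding rule.<br />

```python
# (minimum points, letter) pairs from the cutoff list, highest first
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(points):
    """Map a course total (out of 100) to a letter grade without rounding."""
    for minimum, letter in CUTOFFS:
        if points >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # prints A-, since grades are not rounded up
```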
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-05T23:55:14Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10 PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on to a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
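To make the sensitivity/specificity/precision definitions above concrete, here is a minimal Python sketch. The function name and the counts are made up purely for illustration; the formulas are the standard ones from the linked Wikipedia pages.<br />
<pre>
def classification_metrics(tp, fp, tn, fn):
    """Compute three common metrics from confusion-matrix counts.

    tp, fp, tn, fn = true positives, false positives,
    true negatives, false negatives (hypothetical counts).
    """
    sensitivity = tp / (tp + fn)  # also called recall or true positive rate
    specificity = tn / (tn + fp)  # true negative rate (the usual definition)
    precision = tp / (tp + fp)    # PPV; what gene finders historically called "specificity"
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision}

# Illustrative numbers only: 100 real genes, 900 non-genes
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
</pre>
Note how the same predictions give a standard specificity near 0.98 but a gene-finding "specificity" (precision) of 0.8, which is why it pays to check which definition a paper is using.<br />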
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
<br />
<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 14''':<br />
* Besides giving a bit more programming experience, these questions will also give you some more practice with the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you have yet to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
* NOTE: The problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it will work fine. e.g. all you need is something like this: my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC")<br />
<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
<br />
<br />
--><br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
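To illustrate the overlapping vs. non-overlapping distinction in the heads-up above, here is a minimal sketch on a toy string (not a problem set solution). The function name count_kmers and the toy sequence are made up for illustration, and skipping any window that contains a non-ACGT character is just one reasonable way to "ignore" ambiguous calls.<br />
<pre>
from collections import Counter

def count_kmers(sequence, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]          # slide one position at a time
        if set(kmer).issubset("ACGT"):    # assumption: drop ambiguous windows
            counts[kmer] += 1
    return counts

# str.count only finds non-overlapping matches
print("AAAA".count("AA"))

# The sliding window finds every overlapping dinucleotide
toy = "AAAANACGT"                         # hypothetical sequence with one ambiguous call
dinucs = count_kmers(toy, 2)
total = sum(dinucs.values())
for kmer, n in sorted(dinucs.items()):
    print(kmer, n, round(100 * n / total, 1))
</pre>
The same function with k=1 gives single-nucleotide counts, so the absolute counts and the percentages of the total can both be reported from one pass over the sequence.<br />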
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
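As a small, hedged illustration of what "version agnostic" can look like in practice: the two __future__ imports below are standard Python and do nothing in Python 3, but make print and division behave the Python 3 way if the same file is run under 2.7. The toy sequence is made up for the example.<br />
<pre>
# Must be the first statements in the file; harmless no-ops in Python 3
from __future__ import print_function, division

seq = "GATTACA"  # hypothetical toy sequence

# With the division import, / is true division in both Python 2.7 and 3
gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)

# With the print_function import, print() is a function in both versions
print("GC fraction:", round(gc_fraction, 3))
</pre>
Without the division import, Python 2.7 would evaluate 2 / 7 as integer division and silently report a GC fraction of 0, which is probably the most common surprise when running older code.<br />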
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
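
As an unofficial illustration of the late policy above, here is a minimal Python sketch of the arithmetic. It reflects our reading of the rules, not the actual grading script, and the function name and arguments are made up for this example. It also assumes the penalty is not capped, which the policy does not specify.

```python
import math


def late_penalty(days_late, free_days_left=5):
    """Return (free_days_used, percent_penalty) for one late assignment.

    Assumptions (hypothetical helper, not an official tool):
    lateness is rounded up to whole days, remaining free late days
    are spent first, and every day beyond that costs 10 percent.
    """
    whole_days = math.ceil(days_late) if days_late > 0 else 0
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    return free_used, 10 * penalized_days


# Example from the text: free days already used up,
# 50-point assignment turned in 1.5 days late.
used, percent = late_penalty(1.5, free_days_left=0)
points_lost = 50 * percent / 100  # 20 percent of 50 points = 10 points

# Ten minutes late with all 5 free days left: 1 free day used, no penalty.
ten_minutes = 10 / (24 * 60)
print(late_penalty(ten_minutes, free_days_left=5))
```

The same call can be repeated across assignments by passing in whatever remains of the 5-day total each time.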
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
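
For anyone who prefers to check the cutoffs programmatically, the table above can be written as a small lookup. This is a sketch for illustration only; the names are hypothetical, and it assumes each cutoff is inclusive at its lower bound with no rounding, as stated above.

```python
# Lower bounds (in percent) for each letter grade, highest first.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade without rounding up."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


print(letter_grade(91.99))  # A-  (not rounded up to an A)
print(letter_grade(92))     # A
```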
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>
Marcotte
http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024
BCH394P BCH364C 2024
2024-02-04T20:16:46Z
<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
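
If you'd like to see the core idea of PCA without any libraries, here is a minimal sketch that finds only the first principal component by power iteration on the sample covariance matrix. This is a stand-in for illustration, not the approach of the tutorial linked above (which uses numpy and pandas); the function name and the toy data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) for a list of points.

    Power iteration repeatedly multiplies a vector by the covariance
    matrix; the vector converges to the top eigenvector, i.e. the
    direction of greatest variance.
    """
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]

    # Sample covariance matrix (dims x dims).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]

    # Arbitrary asymmetric starting vector.
    v = [1.0 / (j + 1) for j in range(dims)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance along the starting direction")
        v = [x / norm for x in w]

    # Variance explained by this component: v^T C v.
    cv = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
    variance = sum(v[i] * cv[i] for i in range(dims))
    return v, variance


# Toy data lying exactly on the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
direction, variance = first_principal_component(points)
print(direction, variance)
```

For these points the direction comes out along the diagonal, as expected. For real datasets you would use a library eigendecomposition or SVD instead of this loop.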
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf '''Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 students. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> 
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
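
To make the k-means idea from the reading list concrete, here is a minimal sketch of its two alternating steps (assign each point to the nearest centroid, then move each centroid to the mean of its members). It is only an illustration: the function name and toy data are hypothetical, it uses squared Euclidean distance, and it seeds the centroids with the first k points so the result is reproducible, whereas real implementations usually use random or k-means++ initialization.

```python
def kmeans(points, k, iterations=20):
    """Return (assignments, centroids) after a fixed number of rounds."""
    # Assumption: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)

    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return assignments, centroids


# Two well-separated groups of toy "expression profiles".
data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(data, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Swapping `sq_dist` for a correlation-based distance is the usual change when clustering expression data.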
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
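To make the indexing idea above a bit more concrete, here is a minimal sketch (not BWA's actual implementation) that builds a Burrows–Wheeler transform by sorting rotations and then counts exact matches by backward search. The function names and the "$" end marker are illustrative assumptions; real read mappers use far more memory-efficient constructions and precomputed occurrence tables.<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings only)."""
    text += "$"  # assumed end marker; must not occur in text and sorts before A/C/G/T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_string, pattern):
    """Count exact matches of pattern in the original text by backward search."""
    # first_row[c]: index of the first sorted rotation that starts with character c
    first_row = {}
    for index, char in enumerate(sorted(bwt_string)):
        first_row.setdefault(char, index)

    def occ(char, end):
        # Occurrences of char in bwt_string[:end]; real indexes precompute this table
        return bwt_string[:end].count(char)

    low, high = 0, len(bwt_string)  # half-open range of matching sorted rotations
    for char in reversed(pattern):
        if char not in first_row:
            return 0
        low = first_row[char] + occ(char, low)
        high = first_row[char] + occ(char, high)
        if low >= high:
            return 0
    return high - low


print(bwt("banana"))                            # annb$aa
print(count_occurrences(bwt("banana"), "ana"))  # 2
```

Note that the search never rescans the text: each pattern character only narrows a range of sorted rotations, which is why lookups stay fast even for genome-sized references.<br />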
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene-finding community settled early on a different definition of specificity, one that corresponds to precision (positive predictive value, PPV) in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
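Since the two meanings of "specificity" mentioned above are easy to mix up, here is a small sketch that computes the standard quantities from confusion-matrix counts. The function name and the example counts are made up for illustration, and the code assumes none of the denominators is zero.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate, also called recall
        "specificity": tn / (tn + fp),  # true negative rate (the usual definition)
        "precision": tp / (tp + fp),    # PPV; what gene finders historically called specificity
    }


# Hypothetical counts: 8 true positives, 2 false positives, 18 true negatives, 2 false negatives
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
for name, value in metrics.items():
    print(name, round(value, 3))
```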
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
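The random-walk intuition behind the PageRank remark above can be sketched in a few lines: repeatedly pushing probability along the transition arrows of a Markov chain converges to the fraction of time a random walker spends in each state. The three-page link structure below is a made-up example, and this is only the bare Markov chain idea; the real PageRank algorithm adds a damping factor and handles pages with no outgoing links.<br />

```python
def stationary_distribution(transitions, steps=1000):
    """Approximate long-run state occupancy of a Markov chain by power iteration."""
    states = list(transitions)
    probs = {state: 1.0 / len(states) for state in states}  # start from a uniform guess
    for _ in range(steps):
        new_probs = {state: 0.0 for state in states}
        for source in states:
            for target, p in transitions[source].items():
                new_probs[target] += probs[source] * p  # probability mass flowing source -> target
        probs = new_probs
    return probs


# Hypothetical web: page A links to B and C, B links to C, C links back to A
links = {
    "A": {"B": 0.5, "C": 0.5},
    "B": {"C": 1.0},
    "C": {"A": 1.0},
}
for page, share in stationary_distribution(links).items():
    print(page, round(share, 3))
```

For this toy chain the walker ends up spending about 40% of its time on each of A and C and about 20% on B, so A and C would rank above B.<br />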
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
<br />
<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
<br />
<br />
<br />
<br />
'''Feb 6, 2024 - Biological databases'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [http://www.bio.utexas.edu/research/meyers/LaurenM/index.html Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, copies of the dinucleotide ''AA''. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as slicing with the Python string[counter:counter+2] expression, as explained in the Rosalind homework assignment on strings.<br />
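Here is a quick illustration of the overlapping vs. non-overlapping issue from the heads-up above. The helper name count_overlapping is just an illustrative choice, and this is one of several reasonable ways to do it: it slides a window along the sequence one position at a time and compares each slice to the pattern.<br />

```python
def count_overlapping(sequence, pattern):
    """Count occurrences of pattern in sequence, allowing matches to overlap."""
    width = len(pattern)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == pattern  # compare the window at this start position
    )


print("AAAA".count("AA"))               # 2: str.count skips past each match it finds
print(count_overlapping("AAAA", "AA"))  # 3: every start position is checked
```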
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
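<br />
For anyone who wants to double-check the late-time arithmetic above, here is a small sketch of one reading of the policy. The function name is made up, and two details are assumptions rather than stated policy: that a partly used-up bank of free days covers as many late days as it can before the penalty starts, and that the penalty is capped at 100%. When in doubt, ask the instructor.<br />

```python
import math


def apply_late_policy(points, days_late, free_days_left=5):
    """Return (points_deducted, free_days_remaining) under one reading of the late policy."""
    whole_days = math.ceil(days_late) if days_late > 0 else 0  # any fraction of a day counts as a full day
    covered = min(whole_days, free_days_left)     # late days absorbed by the semester-long bank
    penalized_days = whole_days - covered         # assumption: only the uncovered days are penalized
    percent_off = min(100, 10 * penalized_days)   # 10 percent per penalized day (the cap is an assumption)
    return points * percent_off / 100, free_days_left - covered


# The syllabus example: 50 points, 1.5 days late, no free late days remaining
print(apply_late_policy(50, 1.5, free_days_left=0))   # (10.0, 0)
# Ten minutes late with the full bank: no penalty, but one free day is used up
print(apply_late_policy(50, 10 / (24 * 60)))          # (0.0, 4)
```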
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
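<br />
The cutoffs above amount to a simple lookup with no rounding. As a sketch (the names CUTOFFS and letter_grade are illustrative, not part of any official grading tool), each letter is listed with its minimum percentage and the first one that the score reaches is returned:<br />

```python
# (minimum percentage, letter grade), listed from highest to lowest
CUTOFFS = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(percent):
    """Map a final percentage to a letter grade, with no rounding up."""
    for minimum, letter in CUTOFFS:
        if percent >= minimum:
            return letter
    return "F"


print(letter_grade(91.99))  # A-
print(letter_grade(92))     # A
print(letter_grade(59.9))   # F
```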
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-02-01T14:33:47Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein ID, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ UniProt]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easy to download and re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the Slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
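To make the entropy and sequence logo links above a bit more concrete, here is a minimal sketch of the per-column information content that sets letter heights in a logo. The aligned sites are made up for illustration, the function name is my own, and no small-sample correction is applied, so treat it as a starting point rather than a reference implementation.<br />

```python
import math
from collections import Counter

def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column: log2(N) minus entropy."""
    counts = Counter(column)
    n = len(column)
    # Shannon entropy of the observed letter frequencies in this column
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy

# Hypothetical aligned binding sites (not real data)
sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]

# zip(*sites) walks the alignment column by column
info = [column_information(col) for col in zip(*sites)]
for position, bits in enumerate(info, start=1):
    print(position, round(bits, 3))
```

A perfectly conserved DNA column scores the maximum of 2 bits, and a column with all four bases equally represented scores 0.<br />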
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned: you may have to try it a couple of times. MEME also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
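For intuition about the transform itself, here is a small sketch that builds the BWT of a short string by sorting its rotations and then inverts it. This is the naive textbook construction, fine for toy strings only. Real read mappers such as BWA build the index far more efficiently, and the function names and the "$" sentinel are my own choices for illustration.<br />

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sort every rotation of the string, then read off the last column
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, sentinel="$"):
    """Recover the original text by repeatedly prepending the BWT column and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(char + row for char, row in zip(transformed, table))
    # The row ending in the sentinel is the original string plus sentinel
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

print(bwt("banana"))
print(inverse_bwt(bwt("GATTACA")))
```

Note how the transform tends to group identical letters together, which is part of what makes the indexed genome compressible and quick to search.<br />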
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
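To keep the sensitivity/specificity/precision definitions above straight, here is a short sketch that computes each from confusion-matrix counts. The counts are invented for illustration and the function name is my own. The point to notice is that the gene-finding "specificity" corresponds to the value labeled precision here, not to the true negative rate.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Return common performance measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate, also called recall
        "specificity": tn / (tn + fp),  # true negative rate (the usual definition)
        "precision": tp / (tp + fp),    # positive predictive value (PPV)
    }

# Hypothetical per-nucleotide counts for a gene finder (not real data)
metrics = classification_metrics(tp=80, fp=20, tn=890, fn=10)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With a large pool of true negatives, as in most genomes, the true negative rate stays high even when many predictions are wrong, which is one reason precision is often the more informative number for gene finding.<br />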
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
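The log odds idea in the examples above can be sketched in a few lines: estimate k-mer frequencies from two reference collections, then sum the log ratio of those frequencies over the k-mers of a query. Everything here is illustrative, including the toy two-letter sequences, the function names, and the add-one pseudocount. It is not the method used in either linked article, just the general shape of the calculation.<br />

```python
import math
from collections import Counter

def kmer_log_prob(sequences, k, alphabet_size, pseudocount=1.0):
    """Return a function giving the smoothed log probability of any k-mer."""
    counts = Counter(
        seq[i:i + k] for seq in sequences for i in range(len(seq) - k + 1)
    )
    total = sum(counts.values())
    possible_kmers = alphabet_size ** k

    def log_prob(kmer):
        # The pseudocount keeps unseen k-mers from getting probability zero
        return math.log(
            (counts[kmer] + pseudocount) / (total + pseudocount * possible_kmers)
        )

    return log_prob

def log_odds(query, log_prob_a, log_prob_b, k):
    """Sum of log(P_a / P_b) over all k-mers in the query; positive favors model a."""
    return sum(
        log_prob_a(query[i:i + k]) - log_prob_b(query[i:i + k])
        for i in range(len(query) - k + 1)
    )

# Toy training sets over a two-letter alphabet (hypothetical data)
model_a = kmer_log_prob(["ABABABAB"], k=2, alphabet_size=2)
model_b = kmer_log_prob(["AABBAABB"], k=2, alphabet_size=2)

score_alternating = log_odds("ABAB", model_a, model_b, k=2)
score_paired = log_odds("AABB", model_a, model_b, k=2)
print(score_alternating, score_paired)
```

A positive score means the query looks more like the first collection and a negative score more like the second. Summing logs rather than multiplying raw probabilities avoids numerical underflow on long sequences.<br />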
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
* '''WEATHER WARNING #2: Change of plans!''' UT has now officially canceled in-person classes, but more to the point, >100,000 people have lost power in Austin today. We're going to cancel the live zoom class tomorrow, and Matt will instead record the lecture and upload it to Canvas for viewing.<br />
* Matt is an expert in the bioinformatic analyses of plasmid sequences and developed the popular [http://plannotate.barricklab.org/ pLannotate tool] to annotate and visualize plasmid features, based on a large database of genetic parts and protein sequences. Funny enough, he first described an early version of pLannotate as his project for this class back in 2019. He'll be introducing several useful Python libraries, including the Pandas package for handling large tables and a data visualization library for plotting data.<br />
--><br />
<br />
<!--<br />
'''Feb 6, 2024 - Biological databases'''<br />
* WEATHER WARNING: UT just announced a campus closure for the morning, so for those of you that are able to attend online, I'll plan to hold it at the normal time on the class zoom channel (link available on Canvas). However, for those that can't make it, don't stress! We'll record the lecture and post the video to Canvas so that you can watch it later. Note: the next Rosalind homework is assigned below.<br />
* Science news of the day: [https://www.theguardian.com/science/2024/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers Cell, Nature, Science, eLife, and the Lancet ban listing ChatGPT as a co-author]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [https://stat.utexas.edu/people/lauren-ancel-meyers Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
<br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
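To make the dynamic programming idea from the primer above a bit more concrete, here's a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. The function name and the scoring values (match = 1, mismatch = -1, linear gap penalty = -2) are illustrative assumptions, not the scheme used in the slides or the problem sets, and the sketch only returns the best score rather than the alignment itself.<br />

```python
def global_alignment_score(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of two sequences (assumed toy scoring)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # score[i][j] = best score for aligning seq1[:i] with seq2[:j]
    score = [[0] * cols for _ in range(rows)]

    # First column and row: a prefix aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell depends only on its three already-computed neighbors
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[i - 1] == seq2[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in seq2
            left = score[i][j - 1] + gap    # gap in seq1
            score[i][j] = max(diagonal, up, left)

    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # prints 7
```

Recovering the alignment itself would require also recording which of the three neighbors produced each maximum and tracing back from the bottom-right cell.<br />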
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, ''AA'' dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with Python string slices, string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
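The difference between non-overlapping and overlapping counts can be seen on a toy string. This is a minimal sketch, not a worked solution: the function name count_overlapping is made up for illustration, and it assumes an uppercase sequence and that any window containing a character other than A, C, G, or T should simply be skipped.<br />

```python
def count_overlapping(sequence, k=2):
    """Count every overlapping k-mer made up only of A, C, G, and T."""
    counts = {}
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]  # slide a window of width k one base at a time
        # Assumption: skip windows that contain ambiguous calls (e.g. N)
        if all(base in "ACGT" for base in kmer):
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


toy = "AAAA"
print(toy.count("AA"))         # prints 2 (non-overlapping matches)
print(count_overlapping(toy))  # prints {'AA': 3} (overlapping windows)
```

Skipping windows with ambiguous characters, rather than deleting those characters first, avoids counting dinucleotides between bases that aren't actually adjacent in the genome. Dividing each count by the sum of all counts gives the corresponding percentages.<br />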
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
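As a small illustration of what "version agnostic" can look like in practice, here's a sketch that behaves the same in Python 2.7 and Python 3. The function name gc_percent and the toy sequence are made up for this example. The two habits it shows are calling print with parentheses and a single argument, and forcing floating-point division explicitly, since int / int truncates in Python 2.<br />

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a DNA string."""
    gc = sequence.count("G") + sequence.count("C")
    # Multiplying by 100.0 first makes this float division in both versions
    return 100.0 * gc / len(sequence)


# A single parenthesized argument prints identically in Python 2.7 and 3
print("GC content: {:.1f}%".format(gc_percent("ATGCGC")))  # prints GC content: 66.7%
```

Another common option is to start a script with "from __future__ import division, print_function", which makes Python 2.7 adopt the Python 3 behavior for both.<br />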
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-01-31T17:35:43Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculated functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
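To make the distance measures mentioned in the reading list above a bit more concrete, here is a minimal sketch of three of them applied to expression-style vectors. The function names are made up for illustration, and real clustering packages offer many more options than these.<br />

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pearson_distance(x, y):
    """1 minus the Pearson correlation: 0 for perfectly correlated
    profiles, 2 for perfectly anti-correlated profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - covariance / (spread_x * spread_y)

# Two hypothetical expression profiles with the same shape but different scale
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
print(euclidean_distance(gene_a, gene_b))
print(pearson_distance(gene_a, gene_b))
```

Note how the two profiles are far apart by Euclidean distance but essentially identical by correlation distance, which is one reason the choice of measure matters when clustering expression data.<br />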
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
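If you'd like to connect the entropy and sequence logo videos above to code, here is a minimal sketch that computes the information content of one column of a DNA motif alignment as log2(alphabet size) minus the Shannon entropy. The function name is made up for illustration, and this simplified version assumes a uniform background and leaves out the small-sample correction that logo tools often apply.<br />

```python
import math
from collections import Counter

def column_information(column, alphabet_size=4):
    """Information content (in bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed letter frequencies (no small-sample correction).
    """
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy

# Hypothetical aligned binding sites, one string per site
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
for position, column in enumerate(zip(*sites), start=1):
    print(position, round(column_information(column), 3))
```

A perfectly conserved DNA column scores 2 bits and a column with all four bases equally represented scores 0 bits, which is what sets the letter stack heights in a sequence logo.<br />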
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before Meme completes although it usually runs eventually, so be warned you may have to try it a couple of times. Meme also runs faster using the "zero to one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
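For a hands-on feel for the Burrows–Wheeler transform discussed above, here is a minimal sketch that builds it by sorting all rotations of a string and then inverts it again. The function names are made up for illustration, and this naive approach is only sensible for short strings; it is not how BWA or other read mappers actually construct their indexes.<br />

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (short strings only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    text = text + terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation table
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, terminator="$"):
    """Rebuild the original text by repeatedly prepending and re-sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(transformed[i] + table[i] for i in range(len(transformed)))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]

print(bwt("ACAACG"))               # GC$AAAC
print(inverse_bwt(bwt("ACAACG")))  # ACAACG
```

Notice how the transform tends to group identical letters together, and that no information is lost since the original sequence can be recovered.<br />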
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on to a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
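To keep the sensitivity/specificity and precision/recall definitions linked above straight, here is a minimal sketch that computes them from the four confusion-matrix counts. The function name and the example counts are made up for illustration.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    return {
        # Fraction of real positives that were found (sensitivity = recall)
        "recall": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate)
        "specificity": tn / (tn + fp),
        # Fraction of predictions that were correct (precision = PPV);
        # this is the quantity the gene finding literature often calls
        # "specificity"
        "precision": tp / (tp + fp),
    }

# Hypothetical gene finder: 80 real genes found, 20 missed, 20 false calls
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```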
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
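To build a little more intuition for the random-walk idea behind PageRank mentioned above, here is a minimal sketch that finds where a random walker spends its time on a tiny, made-up three-page web. The function name and the link probabilities are hypothetical, and the real PageRank algorithm adds a damping factor that is left out here for simplicity.<br />

```python
def stationary_distribution(transition, steps=200):
    """Repeatedly apply a row-stochastic transition matrix to a uniform
    starting distribution until it settles down."""
    n = len(transition)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [
            sum(dist[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return dist

# Row i gives the probability of following a link from page i to each page
links = [
    [0.0, 0.5, 0.5],  # page 0 links to pages 1 and 2
    [1.0, 0.0, 0.0],  # page 1 links only to page 0
    [0.5, 0.5, 0.0],  # page 2 links to pages 0 and 1
]
for page, share in enumerate(stationary_distribution(links)):
    print(page, round(share, 3))
```

Page 0 ends up with the largest share of the walker's time (4/9), followed by page 1 (1/3) and page 2 (2/9), which would be their ranking in this toy example.<br />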
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
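As a small warm-up for the regular expression links above, here is a minimal sketch using Python's built-in re module to locate a motif in a DNA string. The function name, the example sequences, and the patterns are made up for illustration. Note that re.finditer reports non-overlapping matches only.<br />

```python
import re

def find_motif_positions(sequence, pattern):
    """Return 1-based start positions of non-overlapping pattern matches."""
    return [match.start() + 1 for match in re.finditer(pattern, sequence)]

# An exact pattern (the EcoRI recognition site)
print(find_motif_positions("AAGAATTCGGGAATTC", "GAATTC"))

# A degenerate pattern: the character class [AT] matches either A or T
print(find_motif_positions("GGTATAAAGGTATATA", "TATA[AT]A"))
```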
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
* '''WEATHER WARNING #2: Change of plans!''' UT has now officially canceled in-person classes, but more to the point, >100,000 people have lost power in Austin today. We're going to cancel the live zoom class tomorrow, and Matt will instead record the lecture and upload it to Canvas for viewing.<br />
* Matt is an expert in the bioinformatic analyses of plasmid sequences and developed the popular [http://plannotate.barricklab.org/ pLannotate tool] to annotate and visualize plasmid features, based on a large database of genetic parts and protein sequences. Funny enough, he first described an early version of pLannotate as his project for this class back in 2019. He'll be introducing several useful Python libraries, including the Pandas package for handling large tables and a data visualization library for plotting data.<br />
--><br />
<br />
<!--<br />
'''Feb 6, 2024 - Biological databases'''<br />
* WEATHER WARNING: UT just announced a campus closure for the morning, so for those of you that are able to attend online, I'll plan to hold it at the normal time on the class zoom channel (link available on Canvas). However, for those that can't make it, don't stress! We'll record the lecture and post the video to Canvas so that you can watch it later. Note: the next Rosalind homework is assigned below.<br />
* Science news of the day: [https://www.theguardian.com/science/2024/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers Cell, Nature, Science, eLife, and the Lancet ban listing ChatGPT as a co-author]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [https://stat.utexas.edu/people/lauren-ancel-meyers Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<!--<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/users/BCH394P_364C_2024/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
--><br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total <br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
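If you'd like to see the dynamic programming idea from the primer above in code rather than in a spreadsheet, here is a minimal sketch that fills in a global alignment score table. The function name and the scoring values (+1 match, -1 mismatch, -1 per gap) are assumptions chosen for illustration, not the scoring scheme used in the slides or the problem set.<br />

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of two sequences with a linear gap
    penalty, computed by filling in a dynamic programming table."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing but gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell takes the best of three moves: diagonal (match/mismatch),
    # up (gap in b), or left (gap in a)
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]

print(global_alignment_score("ACGT", "AGT"))
```

Each cell depends only on its three neighbors, which is why the same calculation works in a spreadsheet like the Excel example above.<br />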
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for Biopython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python str.count method to count dinucleotides, be aware that it counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, copies of ''AA''. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
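To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch (not the required solution). The function name count_overlapping_dinucleotides and the toy sequence are made up for illustration, and skipping any two-letter window that contains a character other than A, C, G, or T is just one possible reading of the "ignore anything else" note above.<br />

```python
from collections import Counter


def count_overlapping_dinucleotides(sequence):
    """Count overlapping dinucleotides, skipping windows with non-ACGT characters.

    Hypothetical helper for illustration only; not part of the course materials.
    """
    valid = set("ACGT")
    counts = Counter()
    sequence = sequence.upper()
    # Slide a 2-letter window along the sequence one position at a time.
    for counter in range(len(sequence) - 1):
        dinucleotide = sequence[counter:counter + 2]
        # Assumption: a window is ignored if either character is not A, C, G, or T.
        if set(dinucleotide) <= valid:
            counts[dinucleotide] += 1
    return counts


toy = "AAAA"
print(toy.count("AA"))                             # 2 (non-overlapping)
print(count_overlapping_dinucleotides(toy)["AA"])  # 3 (overlapping)
```

Dividing each count by the total number of counted windows would then give the percentages requested in the problem set.<br />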
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
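As a small illustration of version-agnostic code (one common idiom, not a course requirement), the __future__ imports below make print and division behave the same way under Python 2.7 and Python 3. The example sequence and variable names are invented for this sketch.<br />

```python
# These imports are no-ops in Python 3 and switch Python 2.7 to Python 3 behavior.
from __future__ import division, print_function

sequence = "ATGCGC"  # made-up toy sequence

# Count G and C characters, then compute the GC fraction with true division.
gc_count = sequence.count("G") + sequence.count("C")
gc_fraction = gc_count / len(sequence)

print("GC fraction:", gc_fraction)
```

Without the division import, Python 2.7 would perform integer division here and report a GC fraction of 0.<br />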
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google Colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
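The late policy above can be sketched as a short calculation. This is only an illustration under one reading of the stated rules (free late time is used up first, and only the remaining days are penalized). The function name late_penalty, its arguments, and the cap at 100 percent are assumptions for this example, not part of any course software.<br />

```python
import math


def late_penalty(days_late, free_days_left, points_possible):
    """Return (points deducted, free late days remaining) for one assignment.

    Hypothetical helper illustrating the late policy; names are invented.
    """
    # Lateness is rounded up to whole days (e.g. 10 minutes late = 1 day).
    whole_days = int(math.ceil(days_late)) if days_late > 0 else 0
    # Assumption: any remaining free late days are spent before penalties apply.
    free_used = min(whole_days, free_days_left)
    penalized_days = whole_days - free_used
    # 10 percent per penalized day; capping at 100 percent is an assumption.
    percent = min(10 * penalized_days, 100)
    return points_possible * percent / 100.0, free_days_left - free_used


# The example from the text: 50 points, 1.5 days late, no free days left.
print(late_penalty(1.5, 0, 50))  # (10.0, 0)
```

With all 5 free days still available, the same call with free_days_left=5 would deduct no points and leave 3 free days.<br />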
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-01-29T19:36:44Z<p>Marcotte: /* 2022 */</p>
<hr />
<div>== 2023 ==<br />
<ol><br />
<li value="248"> {{Paper<br />
|title=SARS-COV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
|pdf=CommunicationsBiology_OmicronAntibody_2023.pdf<br />
}} <br />
<li value="247"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=jkad293<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="245"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="243"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="242"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|pdf=NatureCommunications_NDRC_Structure_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Yan<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="240"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="238"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2024<br />
|volume=Jan 3<br />
|page=mbcE23030084<br />
|link=https://doi.org/10.1091/mbc.E23-03-0084<br />
|comment=[https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 bioRxiv preprint] (deposited Mar 9, 2023)<br />
|pubmed=38170584<br />
}}<br />
</li><br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|pdf=FrontiersInPlantScience_ProteinAggregation_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
|pdf=PLoSComputationalBiology_Whatprot_2023.pdf<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
|pdf=eLife_IFTAStructure_2023.pdf<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|pdf=CellReportsMethods_MERGE_2023.pdf<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022) [http://www.marcottelab.org/paper-pdfs/CellReportsMethods_MERGE_2023_Supplement.pdf Supplement]<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595−2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner AC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SJ, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium- resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title= Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acs.orglett.9b03981<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link= https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint (deposited Nov 3, 2017)]<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47 (D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors= Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM <br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131 (3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=10(8)<br />
|page=1133-1136<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News&Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=deposited April 27<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ERM, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' P. aeruginosa<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, Mckay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=Article #8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ERM, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in E. coli Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between <i>Pseudomonas aeruginosa</i> strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, Dekosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=<i>Pseudomonas aeruginosa</i> enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of <i>Pseudomonas aeruginosa</i> peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist (blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg||100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/doi:10.1016/j.cell.2009.04.048 <br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res. <br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py]. Note that this code used an older version of the igraph library (0.4.2). The latest version that we've tested (0.5.2) gives somewhat fewer large clusters than our published clusters because of changes in the function "G.community_fastgreedy()", possibly resulting from modifications to the handling of ties in the community merging process. The older igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.taf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein C-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510488] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-01-29T19:23:57Z<p>Marcotte: /* 2023 */</p>
<hr />
<div>== 2023 ==<br />
<ol><br />
<li value="248"> {{Paper<br />
|title=SARS-CoV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
|pdf=CommunicationsBiology_OmicronAntibody_2023.pdf<br />
}} <br />
<li value="247"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=jkad293<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="245"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="243"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="242"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|pdf=NatureCommunications_NDRC_Structure_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Y<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="240"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="238"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2024<br />
|volume=Jan 3<br />
|page=mbcE23030084<br />
|link=https://doi.org/10.1091/mbc.E23-03-0084<br />
|comment=[https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 bioRxiv preprint] (deposited Mar 9, 2023)<br />
|pubmed=38170584<br />
}}<br />
</li><br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|pdf=FrontiersInPlantScience_ProteinAggregation_2023.pdf<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022)<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595–2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner AC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SJ, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium-resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title= Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acs.orglett.9b03981<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=P1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link= https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint] (deposited Nov 3, 2017)<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47 (D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors=Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM<br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131 (3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=S1674-2052(17)<br />
|page=30108-9<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News & Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=deposited April 27<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' P. aeruginosa<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, McKay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=Article #8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Eiru K, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in ''E. coli'' Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between ''Pseudomonas aeruginosa'' strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, DeKosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=''Pseudomonas aeruginosa'' enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of <i>Pseudomonas aeruginosa</i> peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBIO<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist(blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow ]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg|100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.04.048<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code used an older version of the igraph library (0.4.2); the latest version that we have tested (0.5.2) gives somewhat fewer large clusters than our published clusters because of changes to the function "G.community_fastgreedy()", possibly resulting from modifications to the handling of ties in the community merging process. The previous igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=12967956<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.taf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein c-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-01-29T19:13:48Z<p>Marcotte: /* 2012 */</p>
<hr />
<div>== 2023 ==<br />
<ol><br />
<li value="248"> {{Paper<br />
|title=SARS-CoV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
}} <br />
<li value="247"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}} <br />
<li value="246"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
<li value="245"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
<li value="244"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
<li value="243"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
<li value="242"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
<li value="241"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Y<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="240"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
<li value="238"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=bioRxiv<br />
|pub_year=2023<br />
|volume=Deposited Mar 9<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 <br />
|pubmed=36945534<br />
}}<br />
</li><br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022)<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595–2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner EC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium-resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title= Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on GitHub] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|link=https://doi.org/10.1021/acs.orglett.9b03981<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link= https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint (deposited Nov 3, 2017)]<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47(D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ Code on GitHub] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors= Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM <br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131(3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=10(8)<br />
|page=1133-1136<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News & Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=deposited April 27<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1, 2016)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' ''P. aeruginosa''<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, Mckay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ER, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in E. coli Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between <i>Pseudomonas aeruginosa</i> strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, Dekosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=<i>Pseudomonas aeruginosa</i> enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23650495<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pubmed=23403814<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of Pseudomonas aeruginosa peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=Journal of Bacteriology<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnRevCellDevBiol_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver disease-related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist (blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg||100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/doi:10.1016/j.cell.2009.04.048 <br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code used an older version of the igraph library (0.4.2). The latest version that we've tested (0.5.2) gives somewhat fewer large clusters than our published clusters because of changes in the function "G.community_fastgreedy()", possibly resulting from modifications to the handling of ties in the community merging process. The previous igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.</li><br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein C-terminus labeling'''. PCT filed Feb 18, 2021.</li><br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.</li><br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.</li><br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.</li><br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.</li><br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George.</li><br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.</li><br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.</li><br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.</li><br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George.</li><br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>Marcottehttp://www.marcottelab.org/index.php/PublicationPublication2024-01-29T19:11:24Z<p>Marcotte: /* 2012 */</p>
<hr />
<div>== 2023 ==<br />
<ol><br />
<li value="248"> {{Paper<br />
|title=SARS-COV-2 Omicron variants conformationally escape a rare quaternary antibody binding mode<br />
|authors=Goike J, Hsieh CL, Horton AP, Gardner EC, Zhou L, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Xie X, Xia H, Shi PY, Renberg R, Segall-Shapiro TH, Terrace CI, Wu W, Shroff R, Byrom M, Ellington AD, Marcotte EM, Musser JM, Kuchipudi SV, Kapur V, Georgiou G, Weaver SC, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|page=1250<br />
|volume=6(1)<br />
|link=https://doi.org/10.1038/s42003-023-05649-6<br />
|pubmed=38082099<br />
}}<br />
</li><br />
<li value="247"> {{Paper<br />
|title=Robust and scalable single-molecule protein sequencing with fluorosequencing<br />
|authors=Mapes JH, Stover J, Stout HD, Folsom TM, Babcock E, Loudwig S, Martin C, Austin MJ, Tu F, Howdieshell CJ, Simpson ZB, Blom T, Weaver D, Winkler D, Vander Velden K, Ossareh PM, Beierle JM, Somekh T, Bardo AM, Anslyn EV, Marcotte EM, Swaminathan J<br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 16<br />
|link=https://doi.org/10.1101/2023.09.15.558007 <br />
|pubmed=37745461<br />
}}<br />
</li><br />
<li value="246"> {{Paper<br />
|title=Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching<br />
|authors=Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ<br />
|journal=G3<br />
|pub_year=2023<br />
|page=<br />
|volume=<br />
|link=https://doi.org/10.1093/g3journal/jkad293<br />
|comment=[https://doi.org/10.1101/2023.09.21.558736 bioRxiv preprint] (deposited Sept 22, 2023)<br />
|pubmed=38135291<br />
}}<br />
</li><br />
<li value="245"> {{Paper<br />
|title=Mining Patents with Large Language Models Demonstrates Congruence of Functional Labels and Chemical Structures<br />
|authors=Kosonocky CW, Wilke CO, Marcotte EM, Ellington AD<br />
|journal=arXiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited Sept 15<br />
|link=https://arxiv.org/abs/2309.08765<br />
|pubmed=<br />
}}<br />
</li><br />
<li value="244"> {{Paper<br />
|title=Estimating error rates for single molecule protein sequencing experiments<br />
|authors=Smith MB, VanderVelden K, Blom T, Stout HD, Mapes JH, Folsom TM, Martin C, Bardo AM, Marcotte EM <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 19<br />
|link=https://doi.org/10.1101/2023.07.18.549591<br />
|pubmed=37502879<br />
}}<br />
</li><br />
<li value="243"> {{Paper<br />
|title=An amino acid-resolution interactome for motile cilia illuminates the structure and function of ciliopathy protein complexes<br />
|authors=McCafferty CL, Papoulas O, Lee C, Bui KH, Taylor DW, Marcotte EM, Wallingford JB <br />
|journal=bioRxiv <br />
|pub_year=2023<br />
|page=<br />
|volume=Deposited July 10<br />
|link=https://doi.org/10.1101/2023.07.09.548259 <br />
|pubmed=37781579<br />
}}<br />
</li><br />
<li value="242"> {{Paper<br />
|title=Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism<br />
|authors=Ghanaeian A, Majhi S, McCafferty CL, Nami B, Black CS, Yang SK, Legal T, Papoulas O, Janowska M, Valente-Paterno M, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications<br />
|pub_year=2023<br />
|page=5741<br />
|volume=14<br />
|link=https://www.nature.com/articles/s41467-023-41480-7<br />
|pubmed=37398254<br />
|comment=[https://doi.org/10.1101/2023.05.31.543107 bioRxiv preprint] (deposited June 01, 2023)<br />
}}<br />
</li><br />
<li value="241"> {{Paper<br />
|title=Distinctive interactomes of RNA polymerase II phosphorylation during different stages of transcription<br />
|authors=Moreno RY, Juetten KJ, Panina SB, Butalewicz JP, Floyd BM, Ramani MKV, Marcotte EM, Brodbelt JS, Zhang Y<br />
|journal=iScience<br />
|pub_year=2023<br />
|page=107581<br />
|pdf=SSRN-id4449188.pdf<br />
|volume=26(9)<br />
|link=https://ssrn.com/abstract=4449188 <br />
|comment=[https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4449188&download=yes&redirectFrom=true SSRN preprint] (deposited May 17, 2023)<br />
|pubmed=37664589<br />
}}<br />
</li><br />
<li value="240"> {{Paper<br />
|title=Native doublet microtubules from Tetrahymena thermophila reveal the importance of outer junction proteins<br />
|authors=Kubo S, Black CS, Joachimiak E, Yang SK, Legal T, Peri K, Khalifa AAZ, Ghanaeian A, McCafferty CL, Valente-Paterno M, De Bellis C, Huynh PM, Fan Z, Marcotte EM, Wloga D, Bui KH<br />
|journal=Nature Communications <br />
|pub_year=2023<br />
|volume=14<br />
|page=Article number: 2168<br />
|link=https://www.nature.com/articles/s41467-023-37868-0 <br />
|pubmed=37061538<br />
|pdf=NatureCommunications_MTDoubletStructure_2023.pdf<br />
}}<br />
</li><br />
<li value="239"> {{Paper<br />
|title=Does AlphaFold2 model proteins' intracellular conformations? An experimental test using cross-linking mass spectrometry of endogenous ciliary proteins<br />
|authors=McCafferty CL, Pennington EL, Papoulas O, Taylor DW, Marcotte EM<br />
|journal=Communications Biology<br />
|pub_year=2023<br />
|volume=6<br />
|page=Article number: 421<br />
|link=https://www.nature.com/articles/s42003-023-04773-7<br />
|pdf=CommunicationsBiology_XLTestOfAF2_2023.pdf<br />
|pubmed=37061613<br />
|comment=[https://doi.org/10.1101/2022.08.25.505345 bioRxiv preprint] (deposited Aug 26, 2022)<br />
}}<br />
</li><br />
<li value="238"> {{Paper<br />
|title=Label-free proteomic comparison reveals ciliary and non-ciliary phenotypes of IFT-A mutants<br />
|authors=Leggere J, Hibbard J, Papoulas O, Lee C, Pearson CG, Marcotte EM, Wallingford JB<br />
|journal=bioRxiv<br />
|pub_year=2023<br />
|volume=Deposited Mar 9<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/2023.03.08.531778v1 <br />
|pubmed=36945534<br />
}}<br />
</li><br />
<li value="237"> {{Paper<br />
|title=Protein nonadditive expression and solubility contribute to heterosis in ''Arabidopsis'' hybrids and allotetraploids<br />
|authors=June V, Xu D, Papoulas O, Boutz D, Marcotte EM, Chen ZJ<br />
|journal=Frontiers in Plant Science<br />
|pub_year=2023<br />
|volume=14<br />
|page=1252564<br />
|link=https://doi.org/10.3389/fpls.2023.1252564<br />
|pubmed=37780492<br />
|comment=[https://doi.org/10.1101/2023.03.01.530688 bioRxiv preprint] (deposited Mar 2, 2023)<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2022 ==<br />
<ol> <br />
<li value="236"> {{Paper<br />
|title=Humanized CB1R and CB2R yeast biosensors enable facile screening of cannabinoid compounds<br />
|authors=Mulvihill CJ, Lutgens J, Gollihar JD, Bachanová P, Marcotte EM, Ellington AD, Gardner EC<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Oct 12<br />
|page=<br />
|link=https://doi.org/10.1101/2022.10.12.511978 <br />
|pubmed=<br />
}}<br />
<li value="235"> {{Paper<br />
|title=Amino acid sequence assignment from single molecule peptide sequencing data using a two-stage classifier<br />
|authors=Smith MB, Simpson ZB, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2023<br />
|volume=19(5)<br />
|page=e1011157<br />
|comment=[https://doi.org/10.1101/2022.09.23.509260 bioRxiv preprint] (deposited Sept 26, 2022)<br />
|link=https://doi.org/10.1371/journal.pcbi.1011157<br />
|pubmed=37253025<br />
}}<br />
<li value="234"> {{Paper<br />
|title=Alternative proteoforms and proteoform-dependent assemblies in humans and plants<br />
|authors=McWhite CD, Sae-Lee W, Yuan Y, Mallam A, Gort-Freitas NA, Ramundo S, Onishi M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2022<br />
|volume=Deposited Sept 22<br />
|page=<br />
|link=https://doi.org/10.1101/2022.09.21.508930 <br />
|pubmed=<br />
}}<br />
<li value="233"> {{Paper<br />
|title=The protein organization of a red blood cell<br />
|authors=Sae-Lee W, McCafferty CL, Verbeke EJ, Havugimana PC, Papoulas O, McWhite CD, Houser JR, Vanuytsel K, Murphy G, Drew K, Emili A, Taylor DW, Marcotte EM<br />
|journal=Cell Reports<br />
|pub_year=2022<br />
|volume=40(3)<br />
|page=111103<br />
|pdf=CellReports_RBCs_2022.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2022.111103<br />
|comment=[https://doi.org/10.1101/2021.12.10.472004 bioRxiv preprint] (deposited Dec 11, 2021)<br />
|pubmed=35858567<br />
}}<br />
<li value="232"> {{Paper<br />
|title=Integrative modeling reveals the molecular architecture of the Intraflagellar Transport A (IFT-A) complex<br />
|authors=McCafferty CL, Papoulas O, Jordan MA, Hoogerbrugge G, Nichols C, Pigino G, Taylor DW, Wallingford JB, Marcotte EM<br />
|journal=eLife<br />
|pub_year=2022<br />
|page=e81977<br />
|pubmed=36346217<br />
|volume=11<br />
|link=https://elifesciences.org/articles/81977<br />
|comment=[https://doi.org/10.1101/2022.07.05.498886 bioRxiv preprint] (deposited Jul 5, 2022)<br />
}}<br />
<li value="231"> {{Paper<br />
|title=Rapid, scalable, combinatorial genome engineering by Marker-less Enrichment and Recombination of Genetically Engineered loci (MERGE)<br />
|authors=Abdullah M, Greco BM, Laurent JM, Garge RK, Boutz DR, Vandeloo M, Marcotte EM, Kachroo AH<br />
|journal=Cell Reports Methods<br />
|pub_year=2023<br />
|page=100464<br />
|pubmed=37323580<br />
|volume=3<br />
|link=https://doi.org/10.1016/j.crmeth.2023.100464<br />
|comment=[https://doi.org/10.1101/2022.06.17.496490 bioRxiv preprint] (deposited Jun 21, 2022)<br />
}}<br />
<li value="230"> {{Paper<br />
|title=Molecular complex detection in protein interaction networks through reinforcement learning<br />
|authors=Palukuri MV, Patil RS, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2023<br />
|page=306<br />
|pubmed=37532987<br />
|volume=24<br />
|link=https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05425-7<br />
|comment=[https://doi.org/10.1101/2022.06.20.496772 bioRxiv preprint] (deposited Jun 21, 2022) [https://rdcu.be/dipi4 pdf available here]<br />
}}<br />
<li value="229"> {{Paper<br />
|title=Evaluating the Effect of Dye–Dye Interactions of Xanthene-Based Fluorophores in the Fluorosequencing of Peptides<br />
|authors=Bachman JL, Wight CD, Bardo AM, Johnson AM, Pavlich CI, Boley AJ, Wagner HR, Swaminathan J, Iverson BL, Marcotte EM, Anslyn EV<br />
|journal=Bioconjugate Chemistry<br />
|pub_year=2022<br />
|page=1156-1165<br />
|pubmed=35622964<br />
|volume=33(6)<br />
|pdf=BioconjugateChemistry_DyeDyeInteractions_2022.pdf<br />
|link=https://doi.org/10.1021/acs.bioconjchem.2c00103<br />
}}<br />
<li value="228"> {{Paper<br />
|title=An invitation to help define the challenge and goals for an understudied proteins initiative<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Biotechnology<br />
|pub_year=2022<br />
|page=815-817<br />
|pubmed=35534555<br />
|volume=40(6)<br />
|pdf=NatureBiotechnology_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41587-022-01316-z <br />
}}<br />
<li value="227"> {{Paper<br />
|title=ARVCF catenin controls force production during vertebrate convergent extension<br />
|authors=Huebner RJ, Weng S, Lee C, Sarıkaya S, Papoulas O, Cox RM, Marcotte EM, Wallingford JB<br />
|journal=Developmental Cell<br />
|pub_year=2022<br />
|volume=57<br />
|link=https://doi.org/10.1016/j.devcel.2022.04.001<br />
|page=1-13<br />
|comment=[https://doi.org/10.1101/2021.06.21.449290 bioRxiv preprint] (deposited June 22, 2021, under the title '''Cell adhesions link subcellular actomyosin dynamics to tissue scale force production during vertebrate convergent extension''') [[File:DevCellHuebnerCover_2022b.jpg|100px|right]]<br />
|pubmed=35476939<br />
|pdf=DevelopmentalCell_ARVCF_2022.pdf<br />
}}<br />
<li value="226"> {{Paper<br />
|title=Understudied proteins: Opportunities and challenges for functional proteomics<br />
|authors=Kustatscher G, Collins T, Gingras AC, Guo T, Hermjakob H, Ideker T, Lilley KS, Lundberg E, Marcotte EM, Ralser M, Rappsilber J<br />
|journal=Nature Methods<br />
|pub_year=2022<br />
|page=774–779<br />
|pubmed=35534633<br />
|volume=19<br />
|pdf=NatureMethods_UnderstudiedProteins_2022.pdf<br />
|link=https://doi.org/10.1038/s41592-022-01454-x <br />
}}<br />
</li><br />
<li value="225"> {{Paper<br />
|title=Protein sequencing, one molecule at a time<br />
|authors=Floyd BM, Marcotte EM<br />
|journal=Annual Review of Biophysics<br />
|pub_year=2022<br />
|volume=51<br />
|link=https://doi.org/10.1146/annurev-biophys-102121-103615<br />
|page=181-200<br />
|pubmed=34985940<br />
|pdf=AnnRevBiophysics_Floyd_2022.pdf<br />
|comment = [http://www.annualreviews.org/eprint/5KI4GZAHTDXJH6UZM6GX/full/10.1146/annurev-biophys-102121-103615 Author's free reprint access link]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2021 ==<br />
<ol> <br />
<li value="224"> {{Paper<br />
|title=Studies of Surface Preparation for the Fluorosequencing of Peptides<br />
|authors=Hinson CM, Bardo AM, Shannon CE, Rivera S, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=Langmuir<br />
|pub_year=2021<br />
|volume=37(51) <br />
|page=14856–14865<br />
|pdf=Langmuir_SurfacePrep_2021.pdf<br />
|link=https://doi.org/10.1021/acs.langmuir.1c02644<br />
|pubmed=34904833<br />
}}<br />
<li value="223"> {{Paper<br />
|title=HumanNet v3: an improved database of human gene networks for disease research<br />
|authors=Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, Hart T, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2021<br />
|volume=Nov 8:gkab1048<br />
|page=<br />
|pdf=NAR_HumanNet3_2021.pdf<br />
|link=https://doi.org/10.1093/nar/gkab1048<br />
|pubmed=34747468<br />
}}<br />
<li value="222"> {{Paper<br />
|title=Photoredox-catalyzed decarboxylative C-terminal differentiation for bulk and single molecule proteomics<br />
|authors=Zhang L, Floyd BM, Chilamari M, Mapes J, Swaminathan J, Bloom S, Marcotte EM, Anslyn EV<br />
|link=https://pubs.acs.org/doi/10.1021/acschembio.1c00631<br />
|journal=ACS Chem Biol<br />
|pub_year=2021<br />
|volume=16<br />
|page=2595–2603<br />
|pdf=ACSChemBio_Cterm_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.07.08.451692 bioRxiv preprint] (deposited July 9, 2021)<br />
|pubmed=34734691<br />
}}<br />
<li value="221"> {{Paper<br />
|title=Super.Complex: A supervised machine learning pipeline for molecular complex detection in protein-interaction networks<br />
|authors=Palukuri MV, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2021<br />
|volume=16(12)<br />
|page=e0262056<br />
|pdf=PLoSOne_SuperComplex_2021.pdf<br />
|comment=[https://doi.org/10.1101/2021.06.22.449395 bioRxiv preprint] (deposited October 11, 2021)<br />
|link=https://doi.org/10.1371/journal.pone.0262056<br />
|pubmed=34972161<br />
}}<br />
</li><br />
<li value="220"> {{Paper<br />
|title=Discovery of new vascular disrupting agents based on evolutionarily conserved drug action, pesticide resistance mutations, and humanized yeast<br />
|authors=Garge RK, Cha HJ, Lee C, Gollihar JD, Kachroo AH, Wallingford JB, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2021<br />
|volume=219(1)<br />
|pdf=Genetics_VDAs_2021.pdf<br />
|link=https://doi.org/10.1093/genetics/iyab101<br />
|page=iyab101<br />
|comment=[https://doi.org/10.1101/2020.09.15.298828 bioRxiv preprint] (deposited Sept 15, 2020 under the title '''Antifungal benzimidazoles disrupt vasculature by targeting one of nine β-tubulins''') [https://genestogenomes.org/how-an-anti-fungal-medication-can-stop-new-blood-vessel-formation/ Commentary] [[File:GeneticsVDACover2021.jpg|100px|right]]<br />
|pubmed=34849907<br />
}}<br />
<li value="219"> {{Paper<br />
|title=Functional expression of opioid receptors and other human GPCRs in yeast engineered to produce human sterols<br />
|authors=Bean BDM, Mulvihill C, Garge RK, Boutz DR, Rousseau O, Floyd BM, Cheney W, Gardner EC, Ellington AD, Marcotte EM, Gollihar JD, Whiteway M, Martin VJJ<br />
|journal=Nature Communications<br />
|pub_year=2022<br />
|volume=13(1)<br />
|page=2882<br />
|pdf=NatureCommunications_OpioidReceptorStrains_2022.pdf<br />
|comment=[https://doi.org/10.1101/2021.05.12.443385 bioRxiv preprint] (deposited May 14, 2021)<br />
|pubmed=35610225<br />
}}<br />
</li><br />
<li value="218"> {{Paper<br />
|title=The emerging landscape of single-molecule protein sequencing technologies<br />
|authors=Alfaro J, Bohländer P, Dai M, Filius M, Howard CJ, van Kooten XF, Ohayon S, Pomorski A, Schmid S, Aksimentiev A, Anslyn EV, Bedran G, Chan C, Chinappi M, Coyaud E, Dekker C, Dittmar G, Drachman N, Eelkema R, Goodlett D, Hentz S, Kalathiya U, Kelleher NL, Kelly RT, Kelman Z, Kim SH, Kuster B, Rodriguez-Larrea D, Lindsey S, Maglia G, Marcotte EM, Marino JP, Masselon C, Mayer M, Samaras P, Sarthak K, Sepiashvili L, Stein D, Wanunu M, Wilhelm M, Yin P, Meller A, Joo C<br />
|journal=Nature Methods<br />
|pub_year=2021<br />
|volume=18(6)<br />
|page=604-617<br />
|pdf=NatureMethods_SMPSreview_2021.pdf<br />
|link=https://doi.org/10.1038/s41592-021-01143-1<br />
|pubmed=34099939<br />
}}<br />
</li><br />
<li value="217"> {{Paper<br />
|title=Synthetic repertoires derived from convalescent COVID-19 patients enable discovery of SARS-CoV-2 neutralizing antibodies and a novel quaternary binding modality<br />
|authors=Goike J, Hsieh C-L, Horton A, Gardner AC, Bartzoka F, Wang N, Javanmardi K, Herbert A, Abbassi S, Renberg R, Johanson MJ, Cardona JA, Segall-Shapiro T, Zhou L, Nissly RH, Gontu A, Byrom M, Maranhao AC, Battenhouse AM, Gejji V, Soto-Sierra L, Foster ER, Woodard SL, Nikolov ZL, Lavinder J, Voss WN, Annapareddy A, Ippolito GC, Ellington AD, Marcotte EM, Finkelstein IJ, Hughes RA, Musser JM, Kuchipudi SJ, Kapur V, Georgiou G, Dye JM, Boutz DR, McLellan JS, Gollihar JD<br />
|journal=bioRxiv<br />
|pub_year=2021<br />
|volume=Posted April 9<br />
|page=<br />
|link=https://doi.org/10.1101/2021.04.07.438849<br />
|pubmed=33851158<br />
}}<br />
</li><br />
<li value="216"> {{Paper<br />
|title=Co-fractionation/mass spectrometry to identify protein complexes<br />
|authors=McWhite CD, Papoulas O, Drew K, Dang V, Leggere JC, Sae-Lee W, Marcotte EM<br />
|journal=STAR Protocols<br />
|pub_year=2021<br />
|volume=2(1)<br />
|page=100370<br />
|pdf=STARProtocols_cfms_2021.pdf<br />
|link=https://www.sciencedirect.com/science/article/pii/S2666166721000770<br />
|pubmed=33748783<br />
}}<br />
</li><br />
<li value="215"> {{Paper<br />
|title=Spatiotemporal transcriptional dynamics of the cycling mouse oviduct<br />
|authors=Roberson E, Battenhouse A, Garge RK, Tran NK, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2021<br />
|volume=476<br />
|page=240–248<br />
|comment=[https://doi.org/10.1101/2021.01.15.426867 bioRxiv preprint] (deposited Jan 15, 2021) [[File:DevBioCover_2021_small.jpg||100px|right]]<br />
|link=https://doi.org/10.1016/j.ydbio.2021.03.018<br />
|pubmed=33864778<br />
|pdf=DevelopmentalBiology_mouseoviduct_2021.pdf<br />
}}<br />
</li><br />
<li value="214"> {{Paper<br />
|title=Improving integrative 3D modeling into low- to medium-resolution EM structures with evolutionary couplings<br />
|authors=McCafferty CL, Taylor DW, Marcotte EM<br />
|journal=Protein Science<br />
|pub_year=2021<br />
|volume=30<br />
|page=1006–1021<br />
|pubmed=33759266<br />
|comment=[https://doi.org/10.1101/2021.01.14.426447 bioRxiv preprint] (deposited January 14, 2021)<br />
|link=https://doi.org/10.1002/pro.4067<br />
|pdf=ProteinScience_ECinIMP_2021.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2020 ==<br />
<ol> <br />
<li value="213"> {{Paper<br />
|title=Systematic Identification of Protein Phosphorylation-Mediated Interactions<br />
|authors=Floyd BM, Drew K, Marcotte EM<br />
|journal=J Proteome Research<br />
|pub_year=2021<br />
|volume=20(2)<br />
|page=1359-1370<br />
|pdf=JProteomeResearch_PhosphoDIFFRAC_2021.pdf<br />
|link=https://doi.org/10.1021/acs.jproteome.0c00750<br />
|comment=[https://doi.org/10.1101/2020.09.18.304121 bioRxiv preprint] (deposited Sept 19, 2020)<br />
|pubmed=33476154<br />
}}<br />
<li value="212"> {{Paper<br />
|title=hu.MAP 2.0: Integration of over 15,000 proteomic experiments builds a global compendium of human multiprotein assemblies<br />
|authors=Drew K, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|pub_year=2021<br />
|volume=17<br />
|pdf=MolecularSystemsBiology_HuMap2_2021.pdf<br />
|link=https://doi.org/10.15252/msb.202010016<br />
|page=e10016<br />
|comment=[https://doi.org/10.1101/2020.09.15.298216 bioRxiv preprint] (deposited Sept 16, 2020)<br />
|pubmed=33973408<br />
}}<br />
<li value="211"> {{Paper<br />
|title=Twinfilin1 controls lamellipodial protrusive activity and actin turnover during vertebrate gastrulation<br />
|authors=Devitt C, Lee C, Cox R, Papoulas O, Alvarado J, Marcotte EM, Wallingford JB<br />
|journal=J Cell Science<br />
|pub_year=2021<br />
|volume=134(14)<br />
|link=https://doi.org/10.1242/jcs.254011<br />
|pdf=JCellSci_Twinfilin_2021.pdf<br />
|page=jcs254011<br />
|comment=[https://doi.org/10.1101/2020.09.03.281659 bioRxiv preprint] (deposited September 3, 2020) [https://journals.biologists.com/jcs/article/134/14/e134_e1401/270993/Linking-actin-regulatory-machinery-to-vertebrate Research Highlight]<br />
|pubmed=34060614<br />
}}<br />
<li value="210"> {{Paper<br />
|title=Next-Generation TLC: A Quantitative Platform for Parallel Spotting and Imaging<br />
|authors=Boulgakov AA, Moor SR, Jo HH, Metola P, Joyce LA, Marcotte EM, Welch CJ, Anslyn EV<br />
|journal=J Org Chem<br />
|pub_year=2020<br />
|volume=85(15) <br />
|page=9447–9453<br />
|pdf=JOrgChem_NextGenTLC_2020.pdf<br />
|link=https://doi.org/10.1021/acs.joc.0c00349<br />
|comment=[[File:JOC-TLCCover2020.jpg||100px|right]]<br />
|pubmed=32559382<br />
}}<br />
<li value="209"> {{Paper<br />
|title=Systematic humanization of the yeast cytoskeleton discerns functionally replaceable from divergent human genes<br />
|authors=Garge RK, Laurent JM, Kachroo AH, Marcotte EM<br />
|journal=Genetics<br />
|pub_year=2020<br />
|volume=215(4)<br />
|pubmed=32522745<br />
|page=1153-1169<br />
|pdf=Genetics_HumanizingCytoskeleton_2020.pdf<br />
|comment=[https://doi.org/10.1101/2019.12.16.878751 bioRxiv preprint] (deposited December 17, 2019) [[File:GeneticsHumanizedYeastCover2020.jpg||100px|right]]<br />
}}<br />
<li value="208"> {{Paper<br />
|title=Humanization of yeast genes with multiple human orthologs reveals principles of functional divergence between paralogs<br />
|authors=Laurent J, Garge RK, Teufel AI, Wilke CO, Kachroo AH, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2020<br />
|volume=18(5)<br />
|page=e3000627<br />
|pdf=PLoSBiology_1tomany_2020.pdf<br />
|link=https://doi.org/10.1371/journal.pbio.3000627<br />
|pubmed=32421706<br />
|comment=[https://www.biorxiv.org/content/10.1101/668335v1 bioRxiv preprint] (deposited June 13, 2019) <br />
}}<br />
<li value="207"> {{Paper<br />
|title=Functional partitioning of a liquid-like organelle during assembly of axonemal dyneins<br />
|authors=Lee C, Cox RM, Papoulas O, Horani A, Drew K, Devitt CC, Brody SL, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2020<br />
|volume=9<br />
|pubmed=33263282<br />
|page=e58662<br />
|link=https://doi.org/10.7554/eLife.58662<br />
|pdf=eLife_DynAP_Partitioning_2020.pdf<br />
|comment=[https://doi.org/10.1101/2020.04.21.052837 bioRxiv preprint] (deposited April 21, 2020) <br />
}}<br />
<li value="206"> {{Paper<br />
|title=A pan-plant protein complex map reveals deep conservation and novel assemblies<br />
|authors=McWhite CD, Papoulas O, Drew K, Cox RM, June V, Dong OX, Kwon T, Wan C, Salmi ML, Roux SJ Jr, Browning KS, Chen ZJ, Ronald PC, Marcotte EM<br />
|journal=Cell<br />
|pub_year=2020<br />
|volume=181(2)<br />
|pubmed=32191846<br />
|page=460-474.e14<br />
|comment=[https://doi.org/10.1101/815837 bioRxiv preprint] (deposited October 24, 2019) [http://plants.proteincomplexes.org/ plant.MAP supporting web site] [https://doi.org/10.5281/zenodo.4451263 Protein elution profile data repository on Zenodo]<br />
|link=https://doi.org/10.1016/j.cell.2020.02.049<br />
|pdf=Cell_PlantComplexes_2020.pdf<br />
}}<br />
<li value="205"> {{Paper<br />
|title=Structural Biology in the Multi-Omics Era<br />
|authors=McCafferty C, Verbeke EJ, Marcotte EM, Taylor DW<br />
|journal=Journal of Chemical Information and Modeling<br />
|pub_year=2020<br />
|volume=60(5)<br />
|pubmed=32129623<br />
|page=2424-2429<br />
|link=https://doi.org/10.1021/acs.jcim.9b01164<br />
|comment=[[File:JCIMShotgunEMCover2020.jpg||100px|right]]<br />
|pdf=JChemInfModel_Structural-Omics_2020.pdf<br />
}}<br />
<li value="204"> {{Paper<br />
|title=Abundances of transcripts, proteins, and metabolites in the cell cycle of budding yeast reveals coordinate control of lipid metabolism<br />
|authors=Blank HM, Papoulas O, Maitra N, Garge RK, Kennedy BK, Schilling B, Marcotte EM, Polymenis M<br />
|journal=Molecular Biology of the Cell<br />
|pub_year=2020<br />
|volume=31<br />
|pubmed=32129706<br />
|page=1061-1084<br />
|link=https://www.molbiolcell.org/doi/abs/10.1091/mbc.E19-12-0708<br />
|comment=[https://doi.org/10.1101/2019.12.17.880252 bioRxiv preprint] (deposited Dec 18, 2019)<br />
|pdf=MolBiolCell_YeastCellCycle_2020.pdf<br />
}}<br />
<li value="203"> {{Paper<br />
|title=A systematic, label-free method for identifying RNA-associated proteins in vivo provides insights into vertebrate ciliary beating<br />
|authors=Drew K, Lee C, Cox RM, Dang V, Devitt CC, Papoulas O, Huizar RL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pub_year=2020<br />
|volume=467(1-2)<br />
|comment=[https://doi.org/10.1101/2020.02.26.966754 bioRxiv preprint] (deposited Feb 27, 2020)<br />
|link=https://www.sciencedirect.com/science/article/abs/pii/S0012160620302293<br />
|pdf=DevelopmentalBiology_DIFFRAC-DynAPs_2020.pdf<br />
|pubmed=32898505<br />
|page=108-117<br />
}}<br />
</li><br />
<li value="202"> {{Paper<br />
|title=Mapping functional protein neighborhoods in the mouse brain<br />
|authors=Liebeskind BJ, Young RL, Halling DB, Aldrich RW, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2020<br />
|volume=Posted January 27<br />
|link=https://doi.org/10.1101/2020.01.26.920447 <br />
|pubmed=<br />
|page=<br />
}}<br />
</li><br />
<li value="201"> {{Paper<br />
|title=Solid-phase peptide capture and release for bulk and single-molecule proteomics<br />
|authors=Howard CJ, Floyd BM, Bardo AM, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=ACS Chemical Biology<br />
|pub_year=2020<br />
|volume=15(6)<br />
|link=https://doi.org/10.1021/acschembio.0c00040<br />
|comment=[http://www.marcottelab.org/paper-pdfs/ACSChemBio_Marbles_2020_supplement.pdf Supplement] [https://doi.org/10.1101/2020.01.13.904540 bioRxiv preprint] (deposited January 14, 2020)<br />
|pdf=ACSChemBio_Marbles_2020.pdf<br />
|pubmed=32363853<br />
|page=1401-1407<br />
}}<br />
</li><br />
<li value="200"> {{Paper<br />
|title=Separating distinct structures of multiple macromolecular assemblies from cryo-EM projections<br />
|authors=Verbeke E, Zhou Y, Horton AP, Mallam AL, Taylor DW, Marcotte EM<br />
|journal=Journal of Structural Biology<br />
|pub_year=2020<br />
|volume=209(1)<br />
|link=https://doi.org/10.1016/j.jsb.2019.107416<br />
|pubmed=31726096<br />
|page=107416<br />
|pdf=JStructBiol_SLICEM_2019.pdf<br />
|comment=[https://github.com/marcottelab/SLICEM SLICEM code on Github] [https://www.biorxiv.org/content/10.1101/611566v1 bioRxiv preprint] (deposited Apr 20, 2019)<br />
}}<br />
<li value="199"> {{Paper<br />
|title=Synthesis of Carboxy ATTO 647N Using Redox Cycling for Xanthone Access<br />
|authors=Bachman JL, Pavlich CI, Boley AJ, Marcotte EM, Anslyn EV<br />
|journal=Org Lett<br />
|pub_year=2020<br />
|volume=22(2)<br />
|link=https://doi.org/10.1021/acs.orglett.9b03981<br />
|pubmed=31825225<br />
|page=381-385<br />
|pdf=OrganicLetters_Atto647N_2020.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2019 ==<br />
<ol><br />
<li value="198"> {{Paper<br />
|title=Simplified geometric representations of protein structures identify complementary interaction interfaces<br />
|authors=McCafferty CL, Marcotte EM, Taylor DW<br />
|journal=Proteins: Structure, Function, and Bioinformatics<br />
|pub_year=2021<br />
|volume=89(3)<br />
|page=348-360<br />
|pubmed=33140424<br />
|link=https://doi.org/10.1002/prot.26020<br />
|comment=[https://doi.org/10.1101/2019.12.18.880575 bioRxiv preprint] (deposited Dec 23, 2019)<br />
|pdf=Proteins_SimplifiedRepresentation_2020.pdf<br />
}}<br />
<li value="197"> {{Paper<br />
|title=Systematic bromodomain protein screens identify homologous recombination and R-loop suppression pathways involved in genome integrity<br />
|authors=Kim JJ, Lee SY, Gong F, Battenhouse AM, Boutz DR, Bashyal A, Refvik ST, Chiang CM, Xhemalce B, Paull TT, Brodbelt JS, Marcotte EM, Miller KM<br />
|journal=Genes and Development<br />
|pub_year=2019<br />
|volume=33(23-24)<br />
|pubmed=31753913<br />
|page=1751-1774<br />
|pdf=GenesDev_Bromodomains_2019.pdf<br />
|link=https://doi.org/10.1101/gad.331231.119<br />
}}<br />
<li value="196"> {{Paper<br />
|title=Systematic discovery of endogenous human ribonucleoprotein complexes<br />
|authors=Mallam AL, Sae-Lee W, Schaub JM, Tu F, Battenhouse A, Jang YJ, Kim J, Finkelstein IJ, Marcotte EM, Drew K<br />
|journal=Cell Reports<br />
|pub_year=2019<br />
|volume=29(5)<br />
|page=P1351-1368.e5<br />
|pubmed=31665645<br />
|pdf=CellReports_DIFFRAC_2019.pdf<br />
|link=https://doi.org/10.1016/j.celrep.2019.09.060<br />
|comment=[https://www.biorxiv.org/content/early/2018/11/27/480061 bioRxiv preprint] (deposited Nov 27, 2018)<br />
}}<br />
<li value="195"> {{Paper<br />
|title=Ancestral Reconstruction of Protein Interaction Networks<br />
|authors=Liebeskind B, Aldrich RW, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pub_year=2019<br />
|volume=15(10)<br />
|page=e1007396<br />
|pubmed=31658251<br />
|pdf=PLoSComputationalBiology_AncestralPPIs_2019.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1007396<br />
|comment=[https://doi.org/10.1101/408773 bioRxiv preprint] (deposited September 9, 2018) <br />
}}<br />
<li value="194"> {{Paper<br />
|title=Advances and Applications in the Quest for Orthologs<br />
|authors=Glover N, Dessimoz C, Ebersberger I, Forslund SK, Gabaldón T, Huerta-Cepas J, Martin MJ, Muffato M, Patricio M, Pereira C, Sousa da Silva A, Wang Y, Sonnhammer E, Thomas PD; Quest for Orthologs Consortium<br />
|journal=Mol Biol Evol<br />
|pub_year=2019<br />
|volume=36(10)<br />
|page=2157-2164<br />
|pdf=MolBiolEvol_QfO_2019.pdf<br />
|link=https://doi.org/10.1093/molbev/msz150<br />
|pubmed=31241141<br />
}}<br />
<li value="193"> {{Paper<br />
|title=Bringing Microscopy-By-Sequencing into View<br />
|authors=Boulgakov AA, Ellington AD, Marcotte EM<br />
|journal=Trends in Biotechnology<br />
|pub_year=available online 2019, published 2020<br />
|volume=38(2)<br />
|page=154-162<br />
|pubmed=31416630<br />
|pdf=TIBTech_DNAmicroscopy_2020.pdf<br />
|link=https://doi.org/10.1016/j.tibtech.2019.06.001<br />
|comment=[[File:TIBTechCover2020.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2018 ==<br />
<ol><br />
<li value="192"> {{Paper<br />
|title=Paternal chromosome loss and metabolic crisis contribute to hybrid inviability in ''Xenopus''<br />
|authors=Gibeaux R, Acker R, Kitaoka M, Georgiou G, van Kruijsbergen I, Ford B, Marcotte EM, Nomura DK, Kwon T, Veenstra GJC, Heald R<br />
|journal=Nature<br />
|volume=553<br />
|page=337–341<br />
|pubmed=29320479<br />
|pub_year=2018<br />
|pdf=Nature_XenopusHybridInviability_2017.pdf<br />
|link=http://dx.doi.org/10.1038/nature25188<br />
}}<br />
<li value="191"> {{Paper<br />
|title=A liquid-like organelle at the root of motile ciliopathy<br />
|authors=Huizar RL, Lee C, Boulgakov AA, Horani A, Tu F, Marcotte EM, Brody SL, Wallingford JB<br />
|journal=eLife<br />
|pub_year=2018<br />
|comment=[https://doi.org/10.1101/213793 bioRxiv preprint (deposited Nov 3, 2017)]<br />
|volume=7<br />
|pubmed=30561330<br />
|page=e38497<br />
|pdf=eLife_DynAPs_2018.pdf<br />
|link=https://doi.org/10.7554/eLife.38497<br />
}}<br />
<li value="190"> {{Paper<br />
|title=From Space to Sequence and Back Again: Iterative DNA Proximity Ligation and its Applications to DNA-Based Imaging<br />
|authors=Boulgakov AA, Xiong E, Bhadra S, Ellington AD, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2018<br />
|volume=posted November 14<br />
|page=<br />
|link=https://doi.org/10.1101/470211 <br />
}}<br />
<li value="189"> {{Paper<br />
|title=HumanNet v2: human gene networks for disease research<br />
|authors=Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Res<br />
|pub_year=2018,2019<br />
|volume=47(D1)<br />
|page=D573–D580<br />
|pdf=NAR_HumanNet2_2018.pdf<br />
|link=https://doi.org/10.1093/nar/gky1126 <br />
|pubmed=30418591<br />
}}<br />
<li value="188"> {{Paper<br />
|title=Highly parallel single-molecule identification of proteins in zeptomole-scale mixtures<br />
|authors=Swaminathan J, Boulgakov AA, Hernandez ET, Bardo AM, Bachman JL, Marotta J, Johnson AM, Anslyn EV, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2018<br />
|volume=36<br />
|page=1076–1082<br />
|pubmed=30346938<br />
|pdf=NatureBiotechnology_Fluorosequencing_2018.pdf<br />
|link=https://doi.org/10.1038/nbt.4278 <br />
|comment=[https://rdcu.be/9Pjj Free access authors' view-only version at NBT] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_SupplementaryTables.pdf Supplementary Tables] [https://github.com/marcottelab/FluorosequencingImageAnalysis/ github with code] [http://doi.org/10.5281/zenodo.782860 Data repository (Zenodo)] [http://www.marcottelab.org/paper-pdfs/NatureBiotechnology_Fluorosequencing_2018_NewsAndViews-CollinsAebsersold.pdf News & Views] Commentary in [https://phys.org/news/2018-10-protein-sequencing-method-biological.html Phys.org] <br />
}}<br />
<li value="187"> {{Paper<br />
|title=The many nuanced evolutionary consequences of duplicated genes<br />
|authors=Teufel AI, Johnson MM, Laurent JM, Kachroo AH, Marcotte EM, Wilke CO<br />
|journal=Mol Biol Evol<br />
|pub_year=2018<br />
|volume=36(2)<br />
|page=304-314<br />
|pdf=MolBiolEvol_Teufel_2018.pdf<br />
|link=https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msy210 <br />
|comment = [https://doi.org/10.1101/366971 bioRxiv preprint] (deposited July 10, 2018)<br />
|pubmed=30428072<br />
}}<br />
<li value="186"> {{Paper<br />
|title=Photography Coupled with Self-Propagating Chemical Cascades. The Differentiation and Quantitation of G and V Nerve Agent Mimics via Chromaticity<br />
|authors=Sun X, Boulgakov AA, Smith L, Metola P, Marcotte EM, Anslyn EV<br />
|journal=ACS Central Science<br />
|volume=4(7)<br />
|page=854-861<br />
|pubmed=30062113<br />
|pub_year=2018<br />
|pdf=ACSCentralScience_LegoNerveGas_2018.pdf<br />
|link=https://pubs.acs.org/doi/10.1021/acscentsci.8b00193<br />
}}<br />
<li value="185"> {{Paper<br />
|title=Classification of single particles from human cell extract reveals distinct structures <br />
|authors=Verbeke EJ, Mallam AL, Drew K, Marcotte EM, Taylor DW<br />
|journal=Cell Reports<br />
|volume=24(1)<br />
|page=259–268.e3<br />
|link=https://doi.org/10.1016/j.celrep.2018.06.022<br />
|pubmed=29972786<br />
|pdf=CellReports_ShotgunEM_2018.pdf<br />
|pub_year=2018<br />
|comment = [https://www.biorxiv.org/content/early/2018/01/14/247254 bioRxiv preprint] (deposited January 14, 2018)<br />
}}<br />
<li value="184"> {{Paper<br />
|title=Single-step precision genome editing in yeast using CRISPR-Cas9 <br />
|authors=Akhmetov A, Laurent JM, Gollihar J, Gardner EC, Garge RK, Ellington AD, Kachroo AH, Marcotte EM<br />
|journal=Bio-protocol<br />
|volume=8(6)<br />
|page=e2765<br />
|pubmed=29770349<br />
|pub_year=2018<br />
|pdf=Bio-protocol_YeastCRISPR_2018.pdf<br />
|link=http://dx.doi.org/10.21769/BioProtoc.2765<br />
}}<br />
</li><br />
<li value="183"> {{Paper<br />
|title=Protein localization screening in vivo reveals novel regulators of multiciliated cell development and function<br />
|authors=Tu F, Sedzinski J, Ma Y, Marcotte EM, Wallingford JB<br />
|journal=J Cell Sci<br />
|volume=131(3)<br />
|page=jcs206565<br />
|pubmed=29180514<br />
|pub_year=2018<br />
|pdf=JCellSci_CiliaScreen_2018.pdf<br />
|link=http://jcs.biologists.org/content/131/3/jcs206565<br />
|comment=[[File:JCSCiliaCover2018.jpg||100px|right]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2017 ==<br />
<ol><br />
<li value="182"> {{Paper<br />
|title=Solution-phase and solid-phase sequential, selective modification of side chains in KDYWEC and KDYWE as models for usage in single-molecule protein sequencing<br />
|authors=Hernandez ET, Swaminathan J, Marcotte EM, Anslyn EV<br />
|journal=New Journal of Chemistry<br />
|volume=41<br />
|pubmed=28983186<br />
|page=462-469<br />
|link=http://dx.doi.org/10.1039/C6NJ02932A<br />
|pub_year=2017<br />
|pdf=NewJChem_PeptideLabeling_2017.pdf<br />
|comment=[[File:NJCPeptideLabelingCover2017.jpg||100px|right]]<br />
}}<br />
<li value="181"> {{Paper<br />
|title=Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets<br />
|authors=Drew K, Müller CL, Bonneau R, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|volume=13(10)<br />
|page=e1005625<br />
|pubmed=29023445<br />
|pub_year=2017<br />
|pdf=PLoSComputationalBiology-ConditionalDependencePPIs-2017.pdf<br />
|link=https://doi.org/10.1371/journal.pcbi.1005625<br />
}}<br />
<li value="180"> {{Paper<br />
|title=Metabolic crosstalk regulates ''Porphyromonas gingivalis'' colonization and virulence during oral polymicrobial infection<br />
|authors=Kuboniwa M, Houser JR, Hendrickson EL, Wang Q, Alghamdi SA, Sakanaka A, Miller DP, Hutcherson JA, Wang T, Beck DAC, Whiteley M, Amano A, Wang H, Marcotte EM, Hackett M, Lamont RJ<br />
|journal=Nature Microbiology<br />
|volume=2<br />
|page=1493–1499<br />
|pubmed=28924191<br />
|pub_year=2017<br />
|pdf=NatureMicrobiology_PolymicrobialInfection_2017.pdf<br />
|link=https://doi.org/10.1038/s41564-017-0021-6<br />
}}<br />
<li value="179"> {{Paper<br />
|title=Systematic bacterialization of yeast genes identifies a near-universally swappable pathway<br />
|authors=Kachroo AH, Laurent JM, Akhmetov A, Szilagyi-Jones M, McWhite CD, Zhao A, Marcotte EM<br />
|journal=eLife<br />
|volume=6<br />
|page=e25093<br />
|pubmed=28661399<br />
|pub_year=2017<br />
|pdf=eLife_BacterializedYeast_2017.pdf<br />
|link=https://doi.org/10.7554/eLife.25093<br />
}}<br />
<li value="178"> {{Paper<br />
|title=A highly parallel strategy for storage of digital information in living cells<br />
|authors=Akhmetov A, Ellington A, Marcotte E<br />
|journal=BMC Biotechnology<br />
|volume=18<br />
|page=64<br />
|pubmed=30333005<br />
|pdf=bioRxiv_DigitalDNAStorage_2016.pdf<br />
|pub_year=2018<br />
|comment = [https://doi.org/10.1101/096792 bioRxiv preprint (deposited December 26, 2016)] [https://rdcu.be/9u6Y Open access pdf version of the article]<br />
|link=https://doi.org/10.1186/s12896-018-0476-4<br />
}}<br />
<li value="177"> {{Paper<br />
|title=Systems-wide studies uncover Commander, a multiprotein complex essential to human development<br />
|authors=Mallam A, Marcotte EM<br />
|journal=Cell Systems<br />
|volume=4<br />
|page=483-494<br />
|pubmed=28544880<br />
|link=http://www.cell.com/cell-systems/abstract/S2405-4712(17)30138-2<br />
|pdf=CellSystems_Commander_2017.pdf<br />
|pub_year=2017<br />
}}<br />
<li value="176"> {{Paper<br />
|title=Integration of over 9,000 mass spectrometry experiments builds a global map of human protein complexes<br />
|authors=Drew K, Lee C, Huizar RL, Tu F, Borgeson B, McWhite CD, Ma Y, Wallingford JB, Marcotte EM<br />
|journal=Molecular Systems Biology<br />
|page=932<br />
|volume=13<br />
|pubmed=28596423<br />
|link=http://msb.embopress.org/content/13/6/932<br />
|pdf=MolecularSystemsBiology_2017_HuMap.pdf<br />
|comment = [https://doi.org/10.1101/092361 bioRxiv preprint (deposited December 7, 2016)] [[File:MSBHuMAPCover2018.jpg||100px|right]]<br />
|pub_year=2017<br />
}}<br />
<li value="175"> {{Paper<br />
|title=GWAB: a web server for the network-based boosting of human genome-wide association data<br />
|authors=Shim JE, Bang C, Yang S, Lee T, Hwang S, Kim CY, Singh-Blom UM, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=28449091<br />
|volume=45(W1)<br />
|page=W154–W161<br />
|link=http://dx.doi.org/10.1093/nar/gkx284<br />
|pub_year=2017<br />
|pdf=NAR_GWAB_2017.pdf<br />
}}<br />
<li value="174"> {{Paper<br />
|title=The ''E. coli'' molecular phenotype under different growth conditions<br />
|authors=Caglar MU, Houser JR, Barnhart CS, Boutz DR, Carroll SM, Dasgupta A, Lenoir WF, Smith BL, Sridhara V, Sydykova DK, Vander Wood D, Marx CJ, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=Scientific Reports<br />
|pubmed=28417974<br />
|volume=7<br />
|page=45303<br />
|link=http://dx.doi.org/10.1038/srep45303<br />
|pub_year=2017<br />
|pdf=ScientificReports_EcoliMolecularPhenotype_2017.pdf<br />
}}<br />
<li value="173"> {{Paper<br />
|title=Large-scale analysis of post-translational modifications in ''E. coli'' under glucose-limiting conditions<br />
|authors=Brown CW, Sridhara V, Boutz DR, Person MD, Marcotte EM, Barrick JE, Wilke CO<br />
|journal=BMC Genomics<br />
|pubmed=28412930<br />
|volume=18(1)<br />
|page=301<br />
|link=http://dx.doi.org/10.1186/s12864-017-3676-8<br />
|pub_year=2017<br />
|pdf=BMCGenomics_EcoliPTMs_2017.pdf<br />
}}<br />
<li value="172"> {{Paper<br />
|title=Comprehensive de novo peptide sequencing from MS/MS pairs generated through complementary collision induced dissociation and 351 nm ultraviolet photodissociation<br />
|authors=Horton AP, Robotham SA, Cannon JR, Holden DD, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=28234449<br />
|volume=89(6)<br />
|page=3747–3753 <br />
|link=http://dx.doi.org/10.1021/acs.analchem.7b00130<br />
|pub_year=2017<br />
|pdf=AnalyticalChemistry_UVnovo2_2017.pdf<br />
}}<br />
<li value="171"> {{Paper<br />
|title=WheatNet: A genome-scale functional network for hexaploid bread wheat, ''Triticum aestivum''<br />
|authors=Lee T, Hwang S, Kim CY, Shim H, Kim H, Ronald PC, Marcotte EM, Lee I<br />
|journal=Molecular Plant<br />
|pubmed=28450181<br />
|volume=S1674-2052(17)<br />
|page=30108-9<br />
|link=http://dx.doi.org/10.1016/j.molp.2017.04.006<br />
|pdf=MolPlant_WheatNet_2017.pdf<br />
|pub_year=2017<br />
|comment = [http://dx.doi.org/10.1101/105098 bioRxiv preprint (deposited February 6, 2017)]<br />
}}<br />
<li value="170"> {{Paper<br />
|title=Murine Cytomegalovirus Deubiquitinase Regulates Viral Chemokine Levels To Control Inflammation and Pathogenesis<br />
|authors=Hilterbrand AT, Boutz DR, Marcotte EM, Upton JW<br />
|journal=mBio<br />
|pubmed=28096485<br />
|volume=8<br />
|page=e01864-16 <br />
|link=http://dx.doi.org/10.1128/mBio.01864-16 <br />
|pub_year=2017<br />
|pdf=mBio_CMBdeubiquitinase_2017.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2016 ==<br />
<ol><br />
<li value="169"> {{Paper<br />
|title=Computational Discovery of Pathway-Level Genetic Vulnerabilities in Non-Small-Cell Lung Cancer<br />
|authors=Young JH, Peyton M, Kim HS, McMillan E, Minna JD, White MA, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=26755624<br />
|volume=32(9)<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btw010<br />
|page=1373-9<br />
|pdf=Bioinformatics_LungCancer_2016.pdf<br />
|comment = [https://bitbucket.org/youngjh/nsclc_paper Supporting code]<br />
|pub_year=2016<br />
}}<br />
<li value="168"> {{Paper<br />
|title=Molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination<br />
|authors=Lee J, Boutz DR, Chromikova V, Joyce MG, Vollmers C, Leung K, Horton AP, DeKosky BJ, Lee CH, Lavinder JJ, Murrin EM, Chrysostomou C, Hoi KH, Tsybovsky Y, Thomas PV, Druz A, Zhang B, Zhang Y, Wang L, Kong WP, Park D, Popova LI, Dekker CL, Davis MM, Carter CE, Ross TM, Ellington AD, Wilson PC, Marcotte EM, Mascola JR, Ippolito GC, Krammer F, Quake SR, Kwong PD, Georgiou G<br />
|journal=Nature Medicine<br />
|pubmed=27820605<br />
|volume=22(12)<br />
|page=1456-1464<br />
|pdf=NatureMedicine_FluIgGSeq_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nm.4224<br />
|comment=[[File:NatureMedicineIgSeqCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="167"> {{Paper<br />
|title=Genome evolution in the allotetraploid frog ''Xenopus laevis''<br />
|authors=Session AM*, Uno Y*, Kwon T*, et al.<br />
|journal=Nature<br />
|pubmed=27762356<br />
|volume=538<br />
|page=336–343<br />
|pdf=Nature_XenopusGenome_2016.pdf<br />
|link=http://dx.doi.org/10.1038/nature19840<br />
|comment=[http://www.nature.com/nature/journal/v538/n7625/full/538320a.html News&Views] and [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_NewsAndViews_2016.pdf pdf]; [http://www.marcottelab.org/paper-pdfs/Nature_XenopusGenome_2016_SupplementIncludesFunding.pdf Supplementary Information] [[File:NatureXenopusCover2016.jpg||100px|right]]<br />
|pub_year=2016<br />
}}<br />
<li value="166"> {{Paper<br />
|title=Temporal Stability and Molecular Persistence of the Bone Marrow Plasma Cell Antibody Repertoire<br />
|authors=Wu GC, Cheung NV, Georgiou G, Marcotte EM, Ippolito GC<br />
|journal=Nature Communications<br />
|pubmed=28000661<br />
|volume=7<br />
|pdf=NatureCommunications_BoneMarrow_2016.pdf<br />
|link=http://dx.doi.org/10.1038/ncomms13838<br />
|page=13838<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/066878 bioRxiv preprint (deposited August 2, 2016)]<br />
}}<br />
<li value="165"> {{Paper<br />
|title=The ciliopathy-associated CPLANE proteins direct basal body recruitment of intraflagellar transport machinery<br />
|authors=Toriyama M, Lee C, Taylor SP, Duran I, Cohn DH, Bruel AL, Tabler JM, Drew K, Kelly MR, Kim S, Park TJ, Braun D, Pierquin G, Biver A, Wagner K, Malfroot A, Panigrahi I, Franco B, Al-Lami HA, Yeung Y, Choi YJ; University of Washington Center for Mendelian Genomics, Duffourd Y, Faivre L, Rivière JB, Chen J, Liu KJ, Marcotte EM, Hildebrandt F, Thauvin-Robinet C, Krakow D, Jackson PK, Wallingford JB<br />
|journal=Nature Genetics<br />
|pubmed=27158779<br />
|volume=48(6)<br />
|link=http://dx.doi.org/10.1038/ng.3558<br />
|page=648-56<br />
|pub_year=2016<br />
|pdf=NatureGenetics_CPLANE_2016.pdf<br />
}}<br />
<li value="164"> {{Paper<br />
|title=Predicting Drug Synergy and Antagonism from Genetic Interaction Neighborhoods<br />
|authors=Young JH, Marcotte EM<br />
|journal=bioRxiv<br />
|pubmed=<br />
|volume=<br />
|link=http://dx.doi.org/10.1101/050567<br />
|page=deposited April 27<br />
|pub_year=2016<br />
}}<br />
<li value="163"> {{Paper<br />
|title=Predictability of Genetic Interactions from Functional Gene Modules<br />
|authors=Young JH, Marcotte EM<br />
|journal=G3<br />
|pubmed=28007839<br />
|volume=7<br />
|pdf=G3_GeneticInteractions_2017.pdf<br />
|link=http://www.g3journal.org/content/early/2016/12/19/g3.116.035915.abstract<br />
|page=617-624<br />
|pub_year=2016<br />
|comment = [http://dx.doi.org/10.1101/049627 bioRxiv preprint (deposited April 25, 2016)]<br />
}}<br />
<li value="162"> {{Paper<br />
|title=Sperm is epigenetically programmed to regulate gene transcription in embryos<br />
|authors=Teperek M, Simeone A, Gaggioli V, Miyamoto K, Allen G, Erkek S, Peters A, Kwon T, Marcotte E, Zegerman P, Bradshaw C, Gurdon J, Jullien J<br />
|journal=Genome Research <br />
|pubmed=27034506<br />
|volume=26<br />
|pdf=GenomeResearch_SpermEpigenetics_2016.pdf<br />
|page=1034-1046<br />
|link=http://dx.doi.org/10.1101/gr.201541.115 <br />
|pub_year=2016<br />
}}<br />
<li value="161"> {{Paper<br />
|title=Towards Consensus Gene Ages<br />
|authors=Liebeskind BJ, McWhite CD, Marcotte EM<br />
|journal=Genome Biology and Evolution<br />
|pubmed=27259914<br />
|volume=8(6)<br />
|pdf=GenomeBiolEvol_ConsensusGeneAges_2016.pdf<br />
|link=http://dx.doi.org/10.1093/gbe/evw113<br />
|page=1812-23<br />
|comment = [http://biorxiv.org/content/early/2016/03/01/042036 bioRxiv preprint (deposited March 1, 2016)] [https://github.com/marcottelab/Gene-Ages Supporting code and datasets]<br />
|pub_year=2016<br />
}}<br />
<li value="160"> {{Paper<br />
|title=UVnovo: A de Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry<br />
|authors=Robotham SA, Horton AP, Cannon JR, Cotham VC, Marcotte EM, Brodbelt JS<br />
|journal=Analytical Chemistry<br />
|pubmed=26938041<br />
|volume=88(7)<br />
|pdf=AnalyticalChemistry_UVnovo_2016.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/acs.analchem.6b00261<br />
|page=3990-7<br />
|comment = [https://github.com/marcottelab/UVnovo Supporting code]<br />
|pub_year=2016<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2015 ==<br />
<ol><br />
<li value="159"> {{Paper<br />
|title=Interrogating conserved elements of diseases using Boolean combinations of orthologous phenotypes<br />
|authors=Woods JO, Tien M, Marcotte EM<br />
|journal=bioRxiv<br />
|pub_year=2015<br />
|volume=posted April 13<br />
|page=<br />
|link=https://www.biorxiv.org/content/10.1101/017947v1<br />
}}<br />
<li value="158"> {{Paper<br />
|title=Proteome-wide dataset supporting the study of ancient metazoan macromolecular complexes<br />
|authors=Phanse S, Wan C, Borgeson B, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ERM, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Data in Brief<br />
|pubmed=26870755<br />
|volume=6<br />
|link=http://dx.doi.org/10.1016/j.dib.2015.11.062<br />
|page=715-21<br />
|pub_year=2015<br />
|pdf=Data_In_Brief_AnimalComplexes_2016.pdf<br />
}}<br />
<li value="157"> {{Paper<br />
|title=MouseNet v2: A database of gene networks for studying the laboratory mouse and eight other model vertebrates<br />
|authors=Kim E, Hwang S, Kim H, Shim H, Kang B, Yang S, Shim JH, Shin SY, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=26527726<br />
|volume=44(D1)<br />
|link=http://dx.doi.org/10.1093/nar/gkv1155<br />
|page=D848-54<br />
|pdf=NAR_MouseNet2_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="156"> {{Paper<br />
|title=Intrinsic antimicrobial resistance determinants in the 'superbug' P. aeruginosa<br />
|authors=Murray J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=mBio<br />
|pubmed=26507235<br />
|volume=6(6)<br />
|link=http://dx.doi.org/10.1128/mBio.01603-15 <br />
|page=e01603-15<br />
|pdf=mBio_Murray_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="155"> {{Paper<br />
|title=Long-term neural and physiological phenotyping of a single human<br />
|authors=Poldrack RA, Laumann T, Koyejo O, Gregory B, Hover A, Chen M-Y, Luci J, Huk A, Joo S-J, Boyd R, Hunicke-Smith S, Simpson ZB, Caven T, Sochat V, Shine JM, Gordon E, Snyder AZ, Adeyemo B, Petersen SE, Glahn D, Mckay DR, Blangero J, Frick L, Marcotte EM, Mumford JA<br />
|journal=Nature Communications<br />
|pubmed=26648521<br />
|pdf=NatureCommunications_Poldrackome_2015.pdf<br />
|volume=6<br />
|link=http://dx.doi.org/10.1038/ncomms9885<br />
|page=Article #8885<br />
|pub_year=2015<br />
}}<br />
<li value="154"> {{Paper<br />
|title=Systematic comparison of variant calling pipelines using gold standard personal exome variants<br />
|authors=Hwang S, Kim E, Lee I, Marcotte EM<br />
|journal=Scientific Reports<br />
|pubmed=26639839<br />
|volume=5<br />
|link=http://dx.doi.org/10.1038/srep17875<br />
|comment=[http://www.marcottelab.org/paper-pdfs/VariantCallingParameterSettings.txt Example variant calling parameters] [http://www.marcottelab.org/paper-pdfs/BEDsandGoldstandardVCFs.zip Gold standard vcf and exome capture region bed files]<br />
|page=17875<br />
|pdf=ScientificReports_Variants_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="153"> {{Paper<br />
|title=Efforts to make and apply humanized yeast<br />
|authors=Laurent JM, Young JH, Kachroo AH, Marcotte EM<br />
|journal=Briefings in Functional Genomics<br />
|pubmed=26462863<br />
|volume=15(2)<br />
|link=http://dx.doi.org/10.1093/bfgp/elv041<br />
|page=155-63<br />
|pdf=BriefingsInFunctionalGenomics_HumanizedYeast_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="152"> {{Paper<br />
|title=Panorama of ancient metazoan macromolecular complexes<br />
|authors=Wan C, Borgeson B, Phanse S, Tu F, Drew K, Clark G, Xiong X, Kagan O, Kwan J, Bezginov A, Chessman K, Pal S, Cromar G, Papoulas O, Ni Z, Boutz DR, Stoilova S, Havugimana PC, Guo X, Malty RH, Sarov M, Greenblatt J, Babu M, Derry WB, Tillier ERM, Wallingford JB, Parkinson J, Marcotte EM, Emili A<br />
|journal=Nature<br />
|pubmed=26344197<br />
|volume=525<br />
|page=339–344<br />
|link=http://dx.doi.org/10.1038/nature14877<br />
|pdf=Nature_AnimalComplexes_2015.pdf<br />
|comment=Supplementary data is available [http://www.nature.com/nature/journal/vaop/ncurrent/full/nature14877.html#supplementary-information here]. [http://metazoa.med.utoronto.ca/ Supporting web site]<br />
|pub_year=2015<br />
}}<br />
<li value="151"> {{Paper<br />
|title=Applications of comparative evolution to human disease genetics<br />
|authors=McWhite CD, Liebeskind BJ, Marcotte EM<br />
|journal=Current Opinion in Genetics & Development<br />
|pubmed=26338499<br />
|volume=35<br />
|page=16–24<br />
|link=http://dx.doi.org/10.1016/j.gde.2015.08.004<br />
|pdf=COGD_comparativeevolution_2015.pdf<br />
|comment=COGD supplies a direct link around their paywall for [http://authors.elsevier.com/a/1ReqI,LqAZ3H8k free access to the paper]<br />
|pub_year=2015<br />
}}<br />
<li value="150"> {{Paper<br />
|title=Controlled Measurement and Comparative Analysis of Cellular Components in ''E. coli'' Reveals Broad Regulatory Changes in Response to Glucose Starvation<br />
|authors=Houser JR, Barnhart C, Boutz DR, Carroll SM, Dasgupta A, Michener JK, Needham BD, Papoulas O, Sridhara V, Sydykova DK, Marx CJ, Trent MS, Barrick JE, Marcotte EM, Wilke CO<br />
|journal=PLoS Computational Biology<br />
|pubmed=26275208<br />
|volume=11(8)<br />
|page=e1004400<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004400<br />
|pdf=PLoSComputationalBiology_GlucoseStarvation_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="149"> {{Paper<br />
|title=Systematic humanization of yeast genes reveals conserved functions and genetic modularity<br />
|authors=Kachroo AH, Laurent JM, Yellman CM, Meyer AG, Wilke CO, Marcotte EM <br />
|journal=Science<br />
|pubmed=25999509<br />
|volume=348(6237)<br />
|page=921-925<br />
|link=http://www.sciencemag.org/content/348/6237/921.abstract.html<br />
|pdf=Science_HumanizedYeast_2015.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Science_HumanizedYeast_2015_SupplementaryMaterials.pdf Supplement] [http://www.sciencemag.org/content/348/6237/921/suppl/DC1 Supplementary Tables and Files] Science magazine supplies a direct link around their paywall for free access to the [http://www.sciencemag.org/cgi/content/full/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci manuscript] and [http://www.sciencemag.org/cgi/rapidpdf/348/6237/921?ijkey=Bbngd7YBvhX9s&keytype=ref&siteid=sci pdf reprint]. Code and data for protein interaction evolution simulations are [https://github.com/wilkelab/complex_divergence_simul here]<br />
|pub_year=2015<br />
}}<br />
<li value="148"> {{Paper<br />
|title=Modes of Interaction between Individuals Dominate the Topologies of Real World Networks<br />
|authors=Lee I, Kim E, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=25793969<br />
|volume=10(3)<br />
|page=e0121248<br />
|link=http://dx.doi.org/10.1371/journal.pone.0121248<br />
|pdf=PLoSOne_NetworkTopology_2015.pdf<br />
|pub_year=2015<br />
}}<br />
<li value="147"> {{Paper<br />
|title=The DEAH-box helicase Dhr1 dissociates U3 from the pre-rRNA to promote folding the central pseudoknot<br />
|authors=Sardana R, Liu X, Granneman S, Zhu J, Gill M, Papoulas O, Marcotte EM, Tollervey D, Correll CC, Johnson AW<br />
|journal=PLoS Biology<br />
|pubmed=25710520<br />
|volume=13(2)<br />
|page=e1002083<br />
|pdf=PLoSBiology_DHR1_2015.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1002083<br />
|pub_year=2015<br />
}}<br />
<li value="146"> {{Paper<br />
|title=A self-assembling lanthanide molecular nanoparticle for optical imaging<br />
|authors=Brown KA, Yang X, Schipper D, Hall JW, DePue LJ, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Jones RA<br />
|journal=Dalton Transactions<br />
|pubmed=25512085<br />
|volume=44(6)<br />
|page=2667-75<br />
|pub_year=2015<br />
|link=http://dx.doi.org/10.1039/c4dt02646b<br />
|pdf=DaltonTransactions_LanthanideNanoparticle_2015.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2014 ==<br />
<ol><br />
<li value="145"> {{Paper<br />
|title=A theoretical justification for single molecule peptide sequencing<br />
|authors=Swaminathan J, Boulgakov AA, Marcotte EM<br />
|journal=PLoS Computational Biology<br />
|pubmed=25714988<br />
|volume=11(2)<br />
|page=e1004080<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1004080<br />
|pdf=PLoSComputationalBiology_SingleMoleculeProteomics_2015.pdf<br />
|comment=[http://dx.doi.org/10.1101/010587 bioRxiv preprint]<br />
|pub_year=2014 bioRxiv, 2015 PLoS CB<br />
}}<br />
<li value="144"> {{Paper<br />
|title=Lanthanide nano-drums: A new class of molecular nanoparticles for potential biomedical applications<br />
|authors=Jones RA, Gnanam AJ, Arambula JF, Jones JN, Swaminathan J, Yang X, Schipper D, Hall JW, DePue LJ, Dieye Y, Vadivelu J, Chandler DJ, Marcotte EM, Sessler JL, Ehrlich LIR, Brown KA<br />
|journal=Faraday Discussions<br />
|pubmed=25284181<br />
|volume=175<br />
|page=241-55<br />
|link=http://dx.doi.org/10.1039/C4FD00117F<br />
|pub_year=2014<br />
|pdf=FaradayDiscussions_LanthanideNanodrums_2014.pdf<br />
}}<br />
<li value="143"> {{Paper<br />
|title=Identifying direct targets of transcription factor Rfx2 that coordinate ciliogenesis and cell movement<br />
|authors=Kwon T, Chung M-I, Gupta R, Baker JC, Wallingford JB, Marcotte EM<br />
|journal=Genomics Data<br />
|pubmed=25419512<br />
|volume=2<br />
|page=192-194<br />
|link=http://www.sciencedirect.com/science/article/pii/S2213596014000488<br />
|pub_year=2014<br />
|pdf=GenomicsData_RFX2_2014.pdf<br />
}}<br />
<li value="142"> {{Paper<br />
|title=MORPHIN: a web tool for human disease research by projecting model organism biology onto a human integrated gene network<br />
|authors=Hwang S, Kim E, Yang S, Marcotte EM, Lee I<br />
|journal=Nucleic Acids Research<br />
|pubmed=24861622<br />
|volume=42(Web Server issue)<br />
|page=W147-53<br />
|link=http://dx.doi.org/10.1093/nar/gku434<br />
|pub_year=2014<br />
|pdf=NAR_MORPHIN_2014.pdf<br />
}}<br />
<li value="141"> {{Paper<br />
|title=Protein-to-mRNA ratios are conserved between ''Pseudomonas aeruginosa'' strains<br />
|authors=Kwon T, Huse HK, Vogel C, Whiteley M, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=24742327<br />
|pdf=JProteomeResearch_Pseudomonas_2014.pdf<br />
|volume=13(5)<br />
|page=2370-80<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr4011684<br />
|pub_year=2014<br />
}}<br />
<li value="140"> {{Paper<br />
|title=Proteomic identification of monoclonal antibodies from serum<br />
|authors=Boutz DR, Horton AP, Wine Y, Lavinder JJ, Georgiou G, Marcotte EM<br />
|journal=Analytical Chemistry<br />
|pubmed=24684310<br />
|volume=86(10)<br />
|page=4758-66<br />
|pdf=AnalyticalChemistry_IgGProteomics_2014.pdf<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac4037679<br />
|pub_year=2014<br />
}}<br />
<li value="139"> {{Paper<br />
|title=Formation of intracellular glutamine synthetase bodies depends strongly upon cellular age and glucose availability<br />
|authors=O’Connell JD, Tsechansky M, West-Driga M, Marcotte EM<br />
|journal=PeerJ PrePrints<br />
|pubmed=<br />
|pdf=PeerJPreprints_GSBodies_2014.pdf<br />
|volume=2<br />
|page=e270v1<br />
|link=http://dx.doi.org/10.7287/peerj.preprints.270v1<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="138"> {{Paper<br />
|title=A proteomic survey of widespread protein aggregation in yeast<br />
|authors=O’Connell JD, Tsechansky M, Royall A, Boutz DR, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24488121<br />
|volume=10<br />
|pdf=MolecularBioSystems_Aggregates_2014.pdf<br />
|page=851-861<br />
|link=http://dx.doi.org/10.1039/C3MB70508K<br />
|pub_year=2014<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularBioSystems_Aggregates_2014_SupplementalTables.pdf Supplement] [http://marcottelab.org/index.php/Widespreadaggregation.2013 Supporting Datasets]<br />
}}<br />
</li><br />
<li value="137"> {{Paper<br />
|title=Bacteriophages use an expanded genetic code on evolutionary paths to higher fitness<br />
|authors=Hammerling MJ, Ellefson JW, Boutz DR, Marcotte EM, Ellington AD, Barrick JE<br />
|journal=Nature Chemical Biology<br />
|pubmed=24487692<br />
|volume=10(3)<br />
|link=http://www.nature.com/nchembio/journal/vaop/ncurrent/full/nchembio.1450.html<br />
|pdf=NatureChemBio_Phage_2014.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S1.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S2.xlsx Supplemental Data 1] [http://www.marcottelab.org/paper-pdfs/NatureChemBio_Phage_2014-S3.xlsx Supplemental Data 2]<br />
|page=178-80<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="136"> {{Paper<br />
|title=Yeast cells expressing the human mitochondrial DNA polymerase reveal correlations between polymerase fidelity and human disease progression<br />
|authors=Qian Y, Kachroo A, Yellman CM, Marcotte EM, Johnson KA<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=24398692<br />
|volume=289<br />
|pdf=JBiolChem_hPOLG_2014.pdf<br />
|page=5970-5985<br />
|link=http://dx.doi.org/10.1074/jbc.M113.526418<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="135"> {{Paper<br />
|title=Identification and characterization of the constituent human serum antibodies elicited by vaccination<br />
|authors=Lavinder JJ, Wine Y, Giesecke C, Ippolito GC, Horton AP, Lungu OI, Hoi KH, Dekosky BJ, Murrin EM, Wirth MM, Ellington AD, Dörner T, Marcotte EM, Boutz DR, Georgiou G<br />
|journal=Proc Natl Acad Sci USA<br />
|pubmed=24469811<br />
|volume=111(6)<br />
|page=2259-64<br />
|pdf=PNAS_Tetanus_2014.pdf<br />
|pub_year=2014<br />
|link=http://www.pnas.org/content/early/2014/01/23/1317793111.abstract<br />
}}<br />
</li><br />
<li value="134"> {{Paper<br />
|title=Revisiting and revising the purinosome<br />
|authors=Zhao A, Tsechansky M, Ellington AD, Marcotte EM<br />
|journal=Molecular BioSystems<br />
|pubmed=24413256<br />
|volume=10(3)<br />
|link=http://dx.doi.org/10.1039/C3MB70397E <br />
|page=369-74<br />
|pdf=MolecularBioSystems_RevisitingPurinosome_2013.pdf<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="133"> {{Paper<br />
|title=Coordinated genomic control of ciliogenesis and cell movement by Rfx2<br />
|authors=Chung MI*, Kwon T*, Tu F, Brooks ER, Gupta R, Meyer M, Baker JC, Marcotte EM, Wallingford JB<br />
|journal=eLife<br />
|pubmed=24424412<br />
|pdf=eLife_RFX2_2014.pdf<br />
|volume=3<br />
|page=e01439<br />
|link=http://dx.doi.org/10.7554/eLife.01439<br />
|pub_year=2014<br />
|comment=[[ChungKwon2013_RFX2|Supplement]]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2013 ==<br />
<ol><br />
<li value="132"> {{Paper<br />
|title=Statistical approach to protein quantification<br />
|authors=Gerster S, Kwon T, Ludwig C, Matondo M, Vogel C, Marcotte E, Aebersold R, Bühlmann P<br />
|journal=Mol Cell Proteomics<br />
|pubmed=24255132<br />
|volume=13(2)<br />
|link=http://dx.doi.org/10.1074/mcp.M112.025445<br />
|pdf=MolecularCellularProteomics_Gerster_2014.pdf<br />
|page=666-77<br />
|pub_year=2014<br />
}}<br />
</li><br />
<li value="131"> {{Paper<br />
|title=''Pseudomonas aeruginosa'' enhances production of a non-alginate exopolysaccharide during long-term colonization of the cystic fibrosis lung<br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=PLoS One<br />
|pubmed=24324811<br />
|volume=8(12)<br />
|page=e82621<br />
|pdf=PLoSOne_PsI_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0082621<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="130"> {{Paper<br />
|title=A bacteriophage tailspike domain promotes self-cleavage of a human membrane-bound transcription factor, the myelin regulatory factor MYRF<br />
|authors=Li Z*, Park Y*, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=<br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001624<br />
|page=e1001624<br />
|volume=11(8)<br />
|pub_year=2013<br />
|pdf=PLoSBiology_MYRF_2013.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001626 Commentary]<br />
}}<br />
</li><br />
<li value="129"> {{Paper<br />
|title=Prediction of gene-phenotype associations in humans, mice, and plants using phenologs<br />
|authors=Woods JO, Singh-Blom UM, Laurent JM, McGary KL, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pubmed=23800157<br />
|page=203<br />
|volume=14<br />
|link=http://dx.doi.org/10.1186/1471-2105-14-203<br />
|pub_year=2013<br />
|pdf=BMCBioinformatics_Phenologs_2013.pdf<br />
}}<br />
</li><br />
<li value="128"> {{Paper<br />
|title=Prediction and validation of gene-disease associations using methods inspired by social network analyses<br />
|authors=Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=PLoS One<br />
|volume=8(5)<br />
|page=e58977<br />
|pub_year=2013<br />
|pubmed=23650495<br />
|pdf=PLoSOne_Catapult_2013.pdf<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0058977<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_Catapult_2013_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="127"> {{Paper<br />
|title=The proteomic response to mutants of the ''Escherichia coli'' RNA degradosome<br />
|authors=Zhou L, Zhang AB, Wang R, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|link=http://dx.doi.org/10.1039/C3MB25513A<br />
|volume=9<br />
|page=750-757<br />
|pdf=MolecularBioSystems_RNADegradosome_2013.pdf<br />
|pubmed=23403814<br />
|pub_year=2013<br />
}}<br />
</li><br />
<li value="126"> {{Paper<br />
|title=Molecular deconvolution of the monoclonal antibodies that comprise the polyclonal serum response<br />
|authors=Wine Y, Boutz DR, Lavinder JJ, Miklos AE, Hughes RA, Hoi KH, Jung ST, Horton AP, Murrin EM, Ellington AD, Marcotte EM, Georgiou G <br />
|journal=Proc Natl Acad Sci USA <br />
|pubmed=23382245<br />
|volume=110(8)<br />
|page=2993–2998<br />
|pdf=PNAS_IgGProfiling_2013.pdf<br />
|pub_year=2013<br />
|link=http://www.pnas.org/content/early/2013/02/01/1213737110.abstract <br />
}}<br />
</li><br />
<li value="125"> {{Paper<br />
|title=Transiently transfected purine biosynthetic enzymes form stress bodies<br />
|authors=Zhao A, Tsechansky M, Swaminathan J, Cook L, Ellington AD, Marcotte EM<br />
|journal=PLoS One<br />
|pubmed=23405267<br />
|volume=8(2)<br />
|page=e56203<br />
|pdf=PLoSOne_PurinosomeAggregation_2013.pdf<br />
|link=http://dx.plos.org/10.1371/journal.pone.0056203<br />
|pub_year=2013<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2012 ==<br />
<ol><br />
<li value="124"> {{Paper<br />
|title=RIDDLE: Reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network<br />
|authors=Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM<br />
|journal=Genome Biology<br />
|pubmed=23268829<br />
|volume=13(12)<br />
|page=R125<br />
|link=http://genomebiology.com/2012/13/12/R125/abstract<br />
|pdf=GenomeBiology_RIDDLE_2012.pdf<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="123"> {{Paper<br />
|title=The role of ''Pseudomonas aeruginosa'' peptidoglycan-associated outer membrane proteins in vesicle formation<br />
|authors=Wessel AK, Liew J, Kwon T, Marcotte EM, Whiteley M<br />
|journal=J Bacteriol<br />
|pubmed=23123904<br />
|page=213-9<br />
|volume=195(2)<br />
|link=http://jb.asm.org/content/early/2012/10/30/JB.01253-12.abstract<br />
|pdf=JBacteriol_Wessel_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/index.php/PSEAE_oprF.2012 Supplemental data]<br />
}}<br />
</li><br />
<li value="122"> {{Paper<br />
|title=Flaws in evaluation schemes for pair-input computational predictions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Nature Methods<br />
|pubmed=23223166<br />
|pdf=NatureMethods_FlawedPPICrossValidation_2012.pdf<br />
|volume=9(12)<br />
|page=1134–1136<br />
|link=http://dx.doi.org/10.1038/nmeth.2259<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureMethods_FlawedPPICrossValidation_2012_Supplement.pdf Supplement]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="121"> {{Paper<br />
|title=Census of human soluble protein complexes<br />
|authors=Havugimana PC, Hart GT, Nepusz T, Yang H, Turinsky AL, Li Z, Wang P, Boutz DR, Fong V, Babu M, Craig SA, Hu P, Phanse S, Wan C, Vlasblom J, Dar V, Bezginov A, Wu GC, Wodak SJ, Tillier ERM, Paccanaro A, Marcotte EM, Emili A<br />
|journal=Cell<br />
|pubmed=22939629<br />
|volume=150<br />
|page=1068-1081<br />
|link=http://www.cell.com/abstract/S0092-8674%2812%2901006-9<br />
|pdf=Cell_HumanProteinComplexes_2012.pdf<br />
|comment=[http://human.med.utoronto.ca/ Supporting web site] [http://www.marcottelab.org/paper-pdfs/Cell_HumanProteinComplexes_2012_ResearchHighlight.pdf Research highlight]<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="120"> {{Paper<br />
|title=Id2a functions to limit Notch pathway activity and thereby influence the transition from proliferation to differentiation of retinoblasts during zebrafish retinogenesis<br />
|authors=Uribe RA, Kwon T, Marcotte EM, Gross JM<br />
|journal=Developmental Biology<br />
|pubmed=22981606<br />
|page=280–292<br />
|volume=371<br />
|pdf=DevelopmentalBiology_Id2a_2012.pdf<br />
|link=http://www.sciencedirect.com/science/article/pii/S0012160612004915<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="119"> {{Paper<br />
|title=Evolutionarily repurposed networks reveal the well-known antifungal drug thiabendazole to be a novel vascular disrupting agent<br />
|authors=Cha HJ, Byrom M, Mead PE, Ellington AD, Wallingford JB, Marcotte EM<br />
|journal=PLoS Biology<br />
|pubmed=22927795<br />
|volume=10(8)<br />
|link=http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379<br />
|pdf=PLoSBiology_TBZ_2012.pdf<br />
|page=e1001379<br />
|pub_year=2012<br />
|comment=[http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001379#s4 Supplemental Material] [http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001380 Synopsis] [http://www.nytimes.com/2012/08/21/health/research/clues-to-fighting-cancer-are-found-in-the-genes-of-yeast.html NY Times] [http://publications.nigms.nih.gov/multimedia/repurposing-genes-drugs.html NIGMS video]<br />
}}<br />
</li><br />
<li value="118"> {{Paper<br />
|title=Dynamic reorganization of metabolic enzymes into intracellular bodies <br />
|authors=O'Connell JD, Zhao A, Ellington AD, Marcotte EM<br />
|journal=Annual Review of Cell and Developmental Biology<br />
|pubmed=23057741<br />
|volume=28 <br />
|link=http://www.annualreviews.org/doi/abs/10.1146/annurev-cellbio-101011-155841<br />
|page=89-111<br />
|pub_year=2012<br />
|pdf=AnnualReview_OConnell_2012.pdf<br />
}}<br />
</li><br />
<li value="117"> {{Paper<br />
|title=Insights into the regulation of protein abundance from proteomic and transcriptomic analyses <br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Reviews Genetics<br />
|pubmed=22411467<br />
|volume=13<br />
|link=http://dx.doi.org/10.1038/nrg3185<br />
|pdf=NatureReviewsGenetics_ProteinAbundanceRegulation_2012.pdf<br />
|page=227-232<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="116"> {{Paper<br />
|title=Proteomic and protein interaction network analysis of human T lymphocytes during cell-cycle entry <br />
|authors=Orr SJ, Boutz DR, Wang R, Chronis C, Lea NC, Thayaparan T, Hamilton E, Milewicz H, Blanc E, Mufti GJ, Marcotte EM, Thomas NSB <br />
|journal=Molecular Systems Biology<br />
|pubmed=22415777<br />
|volume=8<br />
|pdf=MolecularSystemsBiology_TCellCycleEntry_2012.pdf<br />
|link=http://www.nature.com/msb/journal/v8/n1/full/msb20125.html<br />
|comment=[http://www.nature.com/msb/journal/v8/n1/suppinfo/msb20125_S1.html Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_TCellCycleEntry_2012_Reviews.pdf Reviewer comments]<br />
|page=573<br />
|pub_year=2012<br />
}}<br />
</li><br />
<li value="115"> {{Paper<br />
|title=RFX2 is broadly required for ciliogenesis during vertebrate development<br />
|authors=Chung M-I, Peyrot S, LeBoeuf S, Park TJ, McGary KL, Marcotte EM, Wallingford JB<br />
|journal=Developmental Biology<br />
|pubmed=22227339<br />
|volume=363(1)<br />
|page=155-165<br />
|link=http://dx.doi.org/10.1016/j.ydbio.2011.12.029<br />
|pdf=DevelopmentalBiology_RFX2_2012.pdf<br />
|pub_year=2012<br />
|comment=[http://www.marcottelab.org/paper-pdfs/DevelopmentalBiology_RFX2_2011_SOM.pdf Supplement]<br />
}}<br />
</li><br />
<li value="114"> {{Paper<br />
|title=Label-free quantitation using weighted spectral counting<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Methods in Molecular Biology: Quantitative Methods in Proteomics<br />
|pubmed=22665309<br />
|pub_year=2012<br />
|volume=Marcus, K., ed., Humana Press, vol. 893(3)<br />
|page=321-341 <br />
|link=http://www.springerlink.com/content/ll221655443866x8/#section=1079488&page=1<br />
|pdf=MethodsMolBioProteomics_VogelMarcotte_2012.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2011 ==<br />
<ol><br />
<li value="113"> {{Paper<br />
|title=Genetic dissection of the biotic stress response using a genome-scale gene network for rice<br />
|authors=Lee I, Seo Y-S, Coltrane D, Hwang S, Oh T, Marcotte EM, Ronald PC<br />
|journal=Proc Natl Acad Sci U S A<br />
|pubmed=22042862<br />
|page=18548-18553<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.1110384108<br />
|pdf=PNAS_RiceNet_2011_withSupplement.pdf<br />
|pub_year=2011<br />
|volume=108(45)<br />
|comment=[http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1110384108/-/DCSupplemental Supplement]<br />
}}<br />
</li><br />
<li value="112"> {{Paper<br />
|title=Predicting gene-disease associations using multiple species data<br />
|authors=Natarajan N, Blom UM, Tewari A, Woods JO, Dhillon IS, Marcotte EM<br />
|journal=UTCS Technical Report<br />
|pubmed=<br />
|page=<br />
|pdf=TechnicalReport-PhenoNets-TR-2053.pdf<br />
|link=http://apps.cs.utexas.edu/tech_reports/ncstrl/ncstrl2html.php?what=TR%20Abstracts&when=2011#UTEXAS.CS//CS-TR-11-37<br />
|pub_year=2011<br />
|volume=TR-11-37<br />
}}<br />
</li><br />
<li value="111"> {{Paper<br />
|title=Global protein expression regulation under oxidative stress<br />
|authors=Vogel C, Silva GM, Marcotte EM<br />
|journal=Molecular and Cellular Proteomics<br />
|pubmed=21933953<br />
|page=M111.009217 <br />
|link=http://dx.doi.org/10.1074/mcp.M111.009217<br />
|pdf=MolecularCellularProteomics_OxidativeProteomics_2011.pdf<br />
|pub_year=2011<br />
|volume=10(12)<br />
|comment=[http://www.mcponline.org/content/early/2011/09/20/mcp.M111.009217/suppl/DC1 Supplement]<br />
}}<br />
</li><br />
<li value="110"> {{Paper<br />
|title=Revisiting the negative example sampling problem for predicting protein-protein interactions<br />
|authors=Park Y, Marcotte EM<br />
|journal=Bioinformatics<br />
|pubmed=21908540<br />
|page=3024-3028<br />
|pub_year=2011<br />
|volume=27(21)<br />
|pdf=Bioinformatics_NegativePPISampling_2011.pdf<br />
|link=http://dx.doi.org/10.1093/bioinformatics/btr514<br />
|comment=[http://www.marcottelab.org/PPINegativeDataSampling/ Supplemental Data]<br />
}}<br />
</li><br />
<li value="109"> {{Paper<br />
|title=Systematic prediction of gene function using a probabilistic functional gene network for <i>Arabidopsis thaliana</i><br />
|authors=Hwang S, Rhee SY, Marcotte EM, Lee I<br />
|journal=Nature Protocols<br />
|pubmed=21886106<br />
|pub_year=2011<br />
|volume=6<br />
|pdf=NatureProtocols_AraNet_2011.pdf<br />
|page=1429–1442<br />
|link=http://dx.doi.org/10.1038/nprot.2011.372<br />
}}<br />
</li><br />
<li value="108"> {{Paper<br />
|title=Prioritizing candidate disease genes by network-based boosting of genome-wide association data<br />
|authors=Lee I, Blom UM, Wang PI, Shim JE, Marcotte EM<br />
|journal=Genome Research<br />
|pubmed=21536720<br />
|pub_year=2011<br />
|volume=21(7)<br />
|pdf=GenomeResearch_HumanNet_2011.pdf<br />
|page=1109-21<br />
|link=http://dx.doi.org/10.1101/gr.118992.110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_HumanNet_2011_SOM.pdf Supplement] [http://www.functionalnet.org/humannet/ HumanNet web site]<br />
}}<br />
</li><br />
<li value="107"> {{Paper<br />
|title=MSblender: a probabilistic approach for integrating peptide identifications from multiple database search engines<br />
|authors=Kwon T, Choi H, Vogel C, Nesvizhskii AI, Marcotte EM<br />
|journal=Journal of Proteome Research<br />
|pubmed=21488652<br />
|pub_year=2011<br />
|volume=10(7)<br />
|pdf=JProteomeResearch_MSBlender_2011.pdf<br />
|page=2949-58<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr2002116<br />
|comment=Supplemental Figures [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S1.pdf 1] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S2.pdf 2] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S3.pdf 3] [http://www.marcottelab.org/paper-pdfs/JProteomeResearch_MSBlender_2011_S4.pdf 4] [http://www.marcottelab.org/index.php/MSblender Supporting web site]<br />
}}<br />
</li><br />
<li value="106"> {{Paper<br />
|title=A two-tiered approach identifies a network of cancer and liver diseases related genes regulated by miR-122<br />
|authors=Boutz DR, Collins P, Suresh U, Lu M, Ramírez CM, Fernández-Hernando C, Huang Y, de Sousa Abreu R, Le SY, Shapiro BA, Liu AM, Luk JM, Aldred SF, Trinklein N, Marcotte EM, Penalva LO<br />
|journal=Journal of Biological Chemistry<br />
|pubmed=21402708<br />
|pub_year=2011<br />
|volume=286(20)<br />
|pdf=JBC_miR-122_2011.pdf<br />
|page=18066-78<br />
|link=http://www.jbc.org/content/early/2011/03/14/jbc.M110.196451<br />
}}<br />
</li><br />
<li value="105"> {{Paper<br />
|title=High-throughput immunofluorescence microscopy using yeast spheroplast microarrays<br />
|authors=Niu W, Hart GT, Marcotte EM<br />
|journal=Methods in Molecular Biology: Cell-Based Microarrays<br />
|pub_year=2011<br />
|volume=Palmer, E., ed., Humana Press, vol. 706<br />
|page=83-95<br />
|pubmed=21104056<br />
|pdf=MethodsMolBioCellBasedMicroarrays_Niu_2010.pdf<br />
}}<br />
</li><br />
<li value="104"> {{Paper<br />
|title=A role for central spindle proteins in cilia structure and function<br />
|authors=Smith KR, Kieserman EK, Wang PI, Basten SG, Giles RH, Marcotte EM, Wallingford JB<br />
|journal=Cytoskeleton<br />
|pubmed=21140514<br />
|pub_year=2011<br />
|volume=68(2)<br />
|pdf=Cytoskeleton_ciliamidbody_2011.pdf<br />
|page=112-24<br />
|link=http://dx.doi.org/10.1002/cm.20498<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2010 ==<br />
<ol><br />
<br />
<li value="103"> {{Paper<br />
|title=Parallel evolution in <i>Pseudomonas aeruginosa</i> over 39,000 generations <i>in vivo</i><br />
|authors=Huse HK, Kwon T, Zlosnik JEA, Speert DP, Marcotte EM, Whiteley M<br />
|journal=mBIO<br />
|pub_year=2010<br />
|volume=1(4)<br />
|pubmed=20856824<br />
|pdf=mBIO_CFPseudomonas_2010.pdf<br />
|link=http://mbio.asm.org/content/1/4/e00199-10<br />
|page=e00199-10<br />
|comment=[http://www.sciencenews.org/view/generic/id/63939/title/To_researchers%E2%80%99_surprise,_one_Pseudomonas_infection_is_much_like_the_next ScienceNews] [http://www.marcottelab.org/index.php/PSEAE_CF.2010 Supplement] <br />
}}<br />
</li><br />
<li value="102"> {{Paper<br />
|title=Characterising and predicting haploinsufficiency in the human genome<br />
|authors=Huang N, Lee I, Marcotte EM, Hurles M<br />
|journal=PLoS Genetics<br />
|pub_year=2010<br />
|volume=6(10)<br />
|pdf=PLoSGenetics_Haploinsufficiency_2010.pdf<br />
|link=http://dx.doi.org/10.1371/journal.pgen.1001154 <br />
|page=e1001154<br />
|pubmed=20976243<br />
}}<br />
</li><br />
<li value="101"> {{Paper<br />
|title=Protein abundances are more conserved than mRNA abundances across diverse taxa<br />
|authors=Laurent J, Vogel C, Kwon T, Craig SA, Boutz DR, Huse HK, Nozue K, Walia H, Whiteley M, Ronald PC, Marcotte EM<br />
|journal=Proteomics<br />
|pub_year=2010<br />
|volume=10<br />
|pubmed=21089048<br />
|pdf=Proteomics_ProteinVersusRNAConservation_2010.pdf<br />
|link=http://onlinelibrary.wiley.com/doi/10.1002/pmic.201000327/abstract<br />
|page=4209–4212<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MProteomics_ProteinVersusRNAConservation_2010_Supplement.zip Supplement]<br />
}}<br />
</li><br />
<li value="100"> {{Paper<br />
|title=It's the machine that matters: predicting gene function and phenotype from protein networks<br />
|authors=Wang PI, Marcotte EM<br />
|journal=Journal of Proteomics<br />
|pub_year=2010<br />
|volume=73(11)<br />
|pubmed=20637909<br />
|pdf=JProteomics_GBAReview_2010.pdf<br />
|link=http://dx.doi.org/10.1016/j.jprot.2010.07.005<br />
|page=2277-89<br />
}}<br />
</li><br />
<li value="99"> {{Paper<br />
|title=Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line<br />
|authors=Vogel C, de Sousa Abreu R, Ko D, Le S-Y, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO<br />
|journal=Molecular Systems Biology<br />
|pub_year=2010<br />
|pubmed=20739923<br />
|volume=6<br />
|page=article 400<br />
|pdf=MolecularSystemsBiology_2010_HumanProteomics.pdf<br />
|link=http://www.nature.com/msb/journal/v6/n1/full/msb201059.html<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_S1.xls Supplemental Data (Excel format)] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 2 source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3A source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_Fig2SourceData.txt Fig 3B source data] [http://www.marcottelab.org/paper-pdfs/MolecularSystemsBiology_2010_HumanProteomics_NewsAndViews.pdf News and Views]<br />
}}<br />
</li><br />
<li value="98"> {{Paper<br />
|title=Defining the pathway of cytoplasmic maturation of the 60S ribosomal subunit<br />
|authors=Lo K-Y, Li Z, Bussiere C, Bresson S, Marcotte EM, Johnson AW<br />
|journal=Molecular Cell<br />
|pub_year=2010<br />
|volume=39(2)<br />
|page=196-208<br />
|pubmed=20670889<br />
|pdf=MolecularCell_60SBiogenesis_2010.pdf<br />
|link=http://www.cell.com/molecular-cell/fulltext/S1097-2765(10)00459-4<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MolecularCell_60SBiogenesis_2010_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="97"> {{Paper<br />
|title=Predicting genetic modifier loci using functional gene networks<br />
|authors=Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2010<br />
|volume=20<br />
|page=1143-1153<br />
|pubmed=20538624<br />
|pdf=GenomeResearch_GeneticModifiers_2010.pdf<br />
|link=http://dx.doi.org/10.1101/gr.102749.109<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeResearch_GeneticModifiers_2010_SOM.pdf Supplement] [http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg2836.html Nature Reviews Genetics]<br />
}}<br />
</li><br />
<li value="96"> {{Paper<br />
|title=Systematic discovery of nonobvious human disease models through orthologous phenotypes<br />
|authors=McGary KL, Park TJ, Woods JO, Cha HJ, Wallingford JB, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2010<br />
|volume=107(14)<br />
|page=6544-9<br />
|pubmed=20308572<br />
|link=http://www.pnas.org/cgi/doi/10.1073/pnas.0910200107<br />
|pdf=PNAS_Phenologs_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010_Supplement.pdf Supplement] [http://www.nature.com/news/2010/100322/full/news.2010.140.html Nature News] [http://www.the-scientist.com/blog/display/57252/ The Scientist (blog)] [http://www.nytimes.com/2010/04/27/science/27gene.html NY Times] [http://genomebiology.com/2010/11/4/116 Genome Biology]<br />
}}<br />
</li><br />
<li value="95"> {{Paper<br />
|title=Reducing MCM levels in human primary T cells during the G0->G1 transition causes genomic instability during the first cell cycle<br />
|authors=Orr SJ, Gaymes T, Ladon D, Chronis C, Czepulkowski B, Wang R, Mufti GJ, Marcotte EM, Thomas NSB<br />
|journal=Oncogene<br />
|pub_year=2010<br />
|volume=29(26)<br />
|page=3803-14<br />
|link=http://www.nature.com/onc/journal/vaop/ncurrent/abs/onc2010138a.html<br />
|pdf=Oncogene_MCM_2010.pdf<br />
|pubmed=20440261 <br />
}}<br />
</li><br />
<li value="94"> {{Paper<br />
|title=Rational association of genes with traits using a genome-scale gene network for <i>Arabidopsis thaliana</i><br />
|authors=Lee I, Ambaru B, Thakkar P, Marcotte EM, Rhee SY<br />
|journal=Nature Biotechnology<br />
|pub_year=2010<br />
|volume=28(2)<br />
|page=149-156<br />
|pubmed=20118918<br />
|link=https://www.nature.com/articles/nbt.1603<br />
|pdf=NatureBiotech_AraNet_2010.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_AraNet_2010_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/848.full.pdf Honorable Mention in the 2010 Science Visualization Challenge] [http://www.nytimes.com/slideshow/2011/02/17/science/20110217-visualize-6.html New York Times slideshow]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2009 ==<br />
<ol><br />
<br />
<li value="93"> {{Paper<br />
|title=Rational extension of the ribosome biogenesis pathway using network-guided genetics<br />
|authors=Li Z, Lee I, Moradi E, Hung NJ, White J, Johnson AW, Marcotte EM<br />
|journal=PLoS Biology<br />
|pub_year=2009<br />
|volume=7(10) <br />
|page=e1000213<br />
|pubmed=19806183<br />
|link=http://dx.doi.org/10.1371/journal.pbio.1000213<br />
|pdf=PLoSBiology_RibosomeBiogenesis_2009.pdf<br />
|comment=[http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000213#s5 Supplemental Figures and Tables]<br />
}}<br />
</li><br />
<li value="92"> {{Paper<br />
|title=Human cell chips: adapting DNA microarray spotting technology to cell-based imaging assays<br />
|authors=Hart GT, Zhao A, Garg A, Bolusani S, Marcotte EM<br />
|journal=PLoS One<br />
|pub_year=2009<br />
|volume=4(10)<br />
|page=e7088<br />
|pubmed=19862318<br />
|link=http://dx.doi.org/10.1371/journal.pone.0007088<br />
|pdf=PLoSOne_HumanCellChips_2009.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PLoSOne_HumanCellChips_2009_TableS1.xls Table S1]<br />
}}<br />
</li><br />
<li value="91"> {{Paper<br />
|title=Ribosome stalk assembly requires the dual specificity phosphatase Yvh1 for the exchange of Mrt4 with P0<br />
|authors=Lo KY, Li Z, Wang F, Marcotte EM, Johnson AW<br />
|journal=J. Cell Biology<br />
|pub_year=2009<br />
|volume=186(6)<br />
|page=849-62<br />
|pubmed=19797078<br />
|link=http://dx.doi.org/10.1083/jcb.200904110<br />
|comment=[http://www.marcottelab.org/paper-pdfs/JCellBiol_Yvh1_2009_Supplement.pdf Supplemental material]<br />
|pdf=JCellBiol_Yvh1_2009.pdf<br />
}}<br />
</li><br />
<li value="90"> {{Paper<br />
|title=Absolute abundance for the masses<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2009<br />
|volume=27(9)<br />
|page=825-6<br />
|pubmed=19741640<br />
|link=http://dx.doi.org/10.1038/nbt0909-825<br />
|pdf=NatureBiotech_MSNewsAndViews_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="89"> {{Paper<br />
|title=Global signatures of protein and mRNA expression levels<br />
|authors=de Sousa Abreu R, Penalva LO, Marcotte EM, Vogel C<br />
|journal=Molecular BioSystems<br />
|pub_year=2009<br />
|volume=5<br />
|page=1512–1526<br />
|pubmed=20023718<br />
|link=http://www.rsc.org/Publishing/Journals/MB/article.asp?doi=b908315d<br />
|pdf=MolecularBioSystems_ProteinRNA_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="88"> {{Paper<br />
|title=The planar cell polarity effector protein Fuzzy is essential for targeted membrane trafficking, ciliogenesis, and mouse embryonic development<br />
|authors=Gray RS, Abitua PB, Wlodarczyk BJ, Blanchard O, Lee I, Weiss G, Marcotte EM, Wallingford JB, Finnell RH<br />
|journal=Nature Cell Biology<br />
|pub_year=2009<br />
|volume=11(10)<br />
|page=1225-32<br />
|pubmed=19767740<br />
|link=http://dx.doi.org/10.1038/ncb1966<br />
|comment=[http://www.nature.com/ncb/journal/v11/n10/covers/index.html Journal cover--a beautiful electron micrograph by Phil Abitua] [http://www.marcottelab.org/paper-pdfs/NatureCellBiology_Fuzzy_2009_supplement.pdf Supplemental Figures] [[File:NatureCellBiologyFuzCover2009.jpg||100px|right]]<br />
|pdf=NatureCellBiology_Fuzzy_2009.pdf<br />
}}<br />
</li><br />
<li value="87"> {{Paper<br />
|title=Disorder, promiscuity, and toxic partnerships<br />
|authors=Marcotte EM, Tsechansky M<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=138(1)<br />
|page=16-18<br />
|pubmed=19596229 <br />
|link=http://dx.doi.org/10.1016/j.cell.2009.06.024 <br />
|comment=<br />
|pdf=Cell_LehnerPreview_2009.pdf<br />
}}<br />
</li><br />
<li value="86"> {{Paper<br />
|title=Mining gene functional networks to improve mass-spectrometry based protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Kwon T, Penalva LO, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(22)<br />
|page=2955-2961<br />
|pubmed=19633097 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/reprint/btp461<br />
|pdf=Bioinformatics_MSNet_2009.pdf<br />
|comment=[http://aug.csres.utexas.edu/msnet/ Supplemental Website]<br />
}}<br />
</li><br />
<li value="85"> {{Paper<br />
|title=Widespread reorganization of metabolic enzymes into reversible assemblies upon nutrient starvation<br />
|authors=Narayanaswamy R, Levy M, Tsechansky M, Stovall GM, O'Connell J, Mirrielees J, Ellington AD, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2009<br />
|volume=106(25)<br />
|page=10147-52<br />
|pubmed=19502427 <br />
|link=http://www.pnas.org/content/106/25/10147.long<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_Supplement.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_SupplementalDataset.xls Supplemental Dataset] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS1.pdf Table S1] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS2.pdf Table S2] [http://www.marcottelab.org/paper-pdfs/PNAS_punctatebodies_2009_TableS3.pdf Table S3]<br />
|pdf=PNAS_punctatebodies_2009.pdf<br />
}}<br />
</li><br />
<li value="84"> {{Paper<br />
|title=A synthetic genetic edge detection program<br />
|authors=Tabor JJ, Salis H, Simpson ZB, Chevalier AA, Levskaya A, Marcotte EM, Voigt CA, Ellington AD<br />
|journal=Cell<br />
|pub_year=2009<br />
|volume=137(7)<br />
|page=1272-1281<br />
|pubmed=19563759 <br />
|link=http://dx.doi.org/doi:10.1016/j.cell.2009.04.048 <br />
|comment=[http://www.marcottelab.org/paper-pdfs/Cell_EdgeDetector_2009_Supplement.pdf Supplemental methods]<br />
|pdf=Cell_EdgeDetector_2009.pdf <br />
}}<br />
</li><br />
<li value="83"> {{Paper<br />
|title=Effects of functional bias on supervised learning of a gene network model<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2009<br />
|volume=541<br />
|page=463-75<br />
|pubmed=19381535<br />
|link=http://www.springerlink.com/content/j1726u1h54440624/<br />
|comment=<br />
|pdf=MethodsMolBioCompSysBio_Lee_2009_printersproofs.pdf<br />
}}<br />
</li><br />
<li value="82"> {{Paper<br />
|title=Integrating shotgun proteomics and mRNA expression data to improve protein identification<br />
|authors=Ramakrishnan SR, Vogel C, Prince JT, Wang R, Li Z, Penalva LO, Myers M, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2009<br />
|volume=25(11)<br />
|page=1397-403<br />
|pubmed=19318424 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/25/11/1397<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Bioinformatics_mspresso_2009_Supplement.pdf Supplement] [http://www.marcottelab.org/MSpresso/ Supplemental website]<br />
|pdf=Bioinformatics_mspresso_2009.pdf<br />
}}<br />
</li><br />
<li value="81"> {{Paper<br />
|title=Systematic definition of protein constituents along the major polarization axis reveals an adaptive reuse of the polarization machinery in pheromone-treated budding yeast<br />
|authors=Narayanaswamy R, Moradi EK, Niu W, Hart GT, Davis M, McGary KL, Ellington AD, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2009<br />
|volume=8(1)<br />
|page=6-19<br />
|pubmed=19053807<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr800524g<br />
|comment=<br />
|pdf=JProteomeResearch_Shmoo_2008.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2008 ==<br />
<ol><br />
<li value="80"> {{Paper<br />
|authors=Hannay K, Marcotte EM, Vogel C<br />
|title=Buffering by gene duplicates: an analysis of molecular correlates and evolutionary conservation<br />
|journal=BMC Genomics<br />
|pub_year=2008<br />
|volume=9<br />
|page=609<br />
|pubmed=19087332<br />
|link=http://www.biomedcentral.com/1471-2164/9/609<br />
|pdf=BMCGenomics_Buffering_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalNotes.pdf Supplemental Notes] [http://www.marcottelab.org/paper-pdfs/BMCGenomics_Buffering_2008_SupplementalData.xls Supplemental Data]<br />
}}<br />
</li><br />
<li value="79"> {{Paper<br />
|title=The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results<br />
|authors=Braisted JC, Kuntumalla S, Vogel C, Marcotte EM, Rodrigues AR, Wang R, Huang ST, Ferlanti ES, Saeed AI, Fleischmann RD, Peterson SN, Pieper R<br />
|journal=BMC Bioinformatics<br />
|pub_year=2008<br />
|volume=9<br />
|page=529<br />
|pubmed=19068132<br />
|link=http://www.biomedcentral.com/1471-2105/9/529<br />
|pdf=BMCBioinformatics_APEXTool_2009.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="78"> {{Paper<br />
|title=Age-dependent evolution of the yeast protein interaction network suggests a limited role of gene duplication and divergence<br />
|authors=Kim WK, Marcotte EM<br />
|journal=PLoS Comput Biol<br />
|pub_year=2008<br />
|volume=4(11)<br />
|page=e1000232<br />
|pubmed=19043579<br />
|link=http://dx.doi.org/10.1371/journal.pcbi.1000232<br />
|pdf=PLoSComputationalBiology_PPINetworkEvolution_2008.pdf<br />
|comment=Supporting Python code: [http://www.marcottelab.org/paper-pdfs/network_growth_functions_fixed_module.py network_growth_functions_fixed_module.py] Note that this code used an older version of the igraph library (0.4.2); the latest version that we've tested (0.5.2) gives somewhat fewer large clusters than we published, owing to changes in the function "G.community_fastgreedy()" that possibly result from modifications to the handling of ties in the community merging process. The older igraph library (0.4.2) is linked here: [http://www.marcottelab.org/paper-pdfs/python-igraph-0.4.2.tar.gz python-igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph-0.4.2.tar.gz igraph-0.4.2.tar.gz] [http://www.marcottelab.org/paper-pdfs/igraph_base.py igraph_base.py]<br />
}}<br />
</li><br />
<li value="77"> {{Paper<br />
|title=mspire: mass spectrometry proteomics in Ruby<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2008<br />
|volume=24(23)<br />
|page=2796-7<br />
|pubmed=18930952<br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/24/23/2796<br />
|pdf=Bioinformatics_mspire_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="76"> {{Paper<br />
|title=Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data<br />
|authors=Vogel C, Marcotte EM<br />
|journal=Nat Protoc<br />
|pub_year=2008<br />
|volume=3(9)<br />
|page=1444-51<br />
|pubmed=18772871<br />
|link=http://www.nature.com/nprot/journal/v3/n9/abs/nprot.2008.132.html<br />
|pdf=NatureProtocols_APEX_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureProtocols_APEX_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="75"> {{Paper<br />
|title=Integrating functional genomics data<br />
|authors=Lee I, Marcotte EM<br />
|journal=Methods Mol Biol<br />
|pub_year=2008<br />
|volume=453<br />
|page=267-78<br />
|pubmed=18712309<br />
|link=http://www.springerlink.com/content/h21044190m77k274/<br />
|pdf=MethodsMolBioBioinformatics_LeeMarcotte_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="74"> {{Paper<br />
|title=Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy<br />
|authors=Kim WK, Krumpelman C, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S5<br />
|pubmed=18613949<br />
|link=http://genomebiology.com/2008/9/S1/S5<br />
|pdf=GenomeBiology_MouseNet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_MouseNet_2008_Supplement.pdf Supplement]<br />
}}<br />
</li><br />
<li value="73"> {{Paper<br />
|title=A critical assessment of <i>Mus musculus</i> gene function prediction using integrated genomic evidence<br />
|authors=Peña-Castillo L, Tasan M, Myers CL, Lee H, Joshi T, Zhang C, Guan Y, Leone M, Pagnani A, Kim WK, Krumpelman C, Tian W, Obozinski G, Qi Y, Mostafavi S, Lin GN, Berriz GF, Gibbons FD, Lanckriet G, Qiu J, Grant C, Barutcuoglu Z, Hill DP, Warde-Farley D, Grouios C, Ray D, Blake JA, Deng M, Jordan MI, Noble WS, Morris Q, Klein-Seetharaman J, Bar-Joseph Z, Chen T, Sun F, Troyanskaya OG, Marcotte EM, Xu D, Hughes TR, Roth FP<br />
|journal=Genome Biol<br />
|pub_year=2008<br />
|volume=9 Suppl 1<br />
|page=S2<br />
|pubmed=18613946 <br />
|link=http://genomebiology.com/2008/9/S1/S2<br />
|pdf=GenomeBiology_MouseFunc_2008.pdf<br />
|comment=[http://func.med.harvard.edu/ MouseFunc predictions]<br />
}}<br />
</li><br />
<li value="72"> {{Paper<br />
|title=Mechanisms of cell cycle control revealed by a systematic and quantitative overexpression screen in <i>S. cerevisiae</i><br />
|authors=Niu W, Li Z, Zhan W, Iyer VR, Marcotte EM<br />
|journal=PLoS Genet<br />
|pub_year=2008<br />
|volume=4(7)<br />
|page=e1000120<br />
|pubmed=18617996<br />
|link=http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000120<br />
|pdf=PLoSGenetics_CellCycleScreen_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/Niu_et_al_MORF_strains_cell_cnt_gt5000_Z_scores.xls Supplemental File of All ORF FACS Defects] <br />
}}<br />
</li><br />
<li value="71"> {{Paper<br />
|title=Group II intron protein localization and insertion sites are affected by polyphosphate<br />
|authors=Zhao J, Niu W, Yao J, Mohr S, Marcotte EM, Lambowitz AM<br />
|journal=PLoS Biol<br />
|pub_year=2008<br />
|volume=6(6)<br />
|page=e150<br />
|pubmed=18593213 <br />
|link=http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0060150<br />
|pdf=PLoSBiology_IntronLocalization_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="70"> {{Paper<br />
|title=A map of human protein interactions derived from co-expression of human mRNAs and their orthologs<br />
|authors=Ramani AK, Li Z, Hart GT, Carlson MW, Boutz DR, Marcotte EM<br />
|journal=Mol Syst Biol<br />
|pub_year=2008<br />
|volume=4<br />
|page=180<br />
|pubmed=18414481<br />
|link=http://dx.doi.org/10.1038/msb.2008.19<br />
|pdf=MolSysBiol_CCE_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="69"> {{Paper<br />
|title=Bud23 methylates G1575 of 18S rRNA and is required for efficient nuclear export of pre-40S subunits<br />
|authors=White J, Li Z, Sardana R, Bujnicki JM, Marcotte EM, Johnson AW<br />
|journal=Mol Cell Biol<br />
|pub_year=2008<br />
|volume=28(10)<br />
|page=3151-61<br />
|pubmed=18332120<br />
|link=http://mcb.asm.org/cgi/content/full/28/10/3151<br />
|pdf=MolCellBiol_Bud23_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="68"> {{Paper<br />
|title=The proteomic response of <i>Mycobacterium smegmatis</i> to anti-tuberculosis drugs suggests targeted pathways<br />
|authors=Wang R, Marcotte EM<br />
|journal=J Proteome Res<br />
|pub_year=2008<br />
|volume=7(3)<br />
|page=855-65<br />
|pubmed=18275136<br />
|link=http://pubs.acs.org/doi/abs/10.1021/pr0703066<br />
|pdf=JProteomeResearch_TBDrug_2008.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="67"> {{Paper<br />
|title=A single gene network accurately predicts phenotypic effects of gene perturbation in <i>Caenorhabditis elegans</i><br />
|authors=Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM<br />
|journal=Nat Genet<br />
|pub_year=2008<br />
|volume=40(2)<br />
|page=181-8<br />
|pubmed=18223650<br />
|link=http://www.nature.com/ng/journal/v40/n2/abs/ng.2007.70.html<br />
|pdf=NatureGenetics_Wormnet_2008.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureGenetics_Wormnet_2008_Supplement.pdf Supplement] [http://www.functionalnet.org/wormnet Supplemental Web Site] [[File:NatureGeneticsWormNetCover2008.jpg||100px|right]]<br />
<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2007 ==<br />
<ol><br />
<li value="66"> {{Paper<br />
|title=Broad network-based predictability of <i>Saccharomyces cerevisiae</i> gene loss-of-function phenotypes<br />
|authors=McGary KL, Lee I, Marcotte EM<br />
|journal=Genome Biol<br />
|pub_year=2007<br />
|volume=8(12)<br />
|page=R258<br />
|pubmed=18053250 <br />
|link=http://genomebiology.com/2007/8/12/R258<br />
|pdf=GenomeBiology_YeastPhenoPred_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="65"> {{Paper<br />
|title=An improved, bias-reduced probabilistic functional gene network of baker's yeast, <i>Saccharomyces cerevisiae</i><br />
|authors=Lee I, Li Z, Marcotte EM<br />
|journal=PLoS ONE<br />
|pub_year=2007<br />
|volume=2(10)<br />
|page=e988<br />
|pubmed=17912365<br />
|link=http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000988<br />
|pdf=PLOS1_YeastNet2_2007.pdf<br />
|comment=[http://www.yeastnet.org Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="64"> {{Paper<br />
|title=How do shotgun proteomics algorithms identify proteins?<br />
|authors=Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(7)<br />
|page=755-7<br />
|pubmed=17621303<br />
|link=http://www.nature.com/nbt/journal/v25/n7/abs/nbt0707-755.html<br />
|pdf=NatureBiotech_ShotgunProteomicsPrimer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="63"> {{Paper<br />
|title=Quantitative gene expression assessment identifies appropriate cell line models for individual cervical cancer pathways<br />
|authors=Carlson MW, Iyer VR, Marcotte EM<br />
|journal=BMC Genomics<br />
|pub_year=2007<br />
|volume=8<br />
|page=117<br />
|pubmed=17493265<br />
|link=http://www.biomedcentral.com/1471-2164/8/117<br />
|pdf=BMCGenomics_CervicalCancer_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="62"> {{Paper<br />
|title=Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation<br />
|authors=Lu P, Vogel C, Wang R, Yao X, Marcotte EM<br />
|journal=Nat Biotechnol<br />
|pub_year=2007<br />
|volume=25(1)<br />
|page=117-24<br />
|pubmed=17187058<br />
|link=http://www.nature.com/nbt/journal/v25/n1/abs/nbt1270.html<br />
|pdf=NatureBiotech_APEX_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_supplement.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_SupplementaryData.zip Supplemental Data (zipped folder)] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews.pdf News & Views 1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews2.pdf News & Views 2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_2007_newsandviews3.pdf News & Views 3] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_APEX_NBTretrospective_2011.pdf 2011 NBT Retrospective on APEX]<br />
}}<br />
</li><br />
<li value="61"> {{Paper<br />
|title=Global metabolic changes following loss of a feedback loop reveal dynamic steady states of the yeast metabolome<br />
|authors=Lu P, Rangan A, Chan SY, Appling DR, Hoffman DW, Marcotte EM<br />
|journal=Metab Eng<br />
|pub_year=2007<br />
|volume=9(1)<br />
|page=8-20<br />
|pubmed=17049899 <br />
|link=http://dx.doi.org/10.1016/j.ymben.2006.06.003<br />
|pdf=MetabolicEngineering_OneCarbonMetab_2007.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile1.xls Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile2.xls Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/MetabolicEngineering_OneCarbonMetab_2007_SupplementalFile3.xls Supplemental File 3]<br />
}}<br />
</li><br />
<li value="60"> {{Paper<br />
|title=A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality<br />
|authors=Hart GT, Lee I, Marcotte EM<br />
|journal=BMC Bioinformatics<br />
|pub_year=2007<br />
|volume=8<br />
|page=236<br />
|pubmed=17605818 <br />
|link=http://www.biomedcentral.com/1471-2105/8/236<br />
|pdf=BMCBioinformatics_YeastComplexEssentiality_2007.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2006 ==<br />
<ol><br />
<li value="59"> {{Paper<br />
|title=How complete are current yeast and human protein-interaction networks?<br />
|authors=Hart GT, Ramani AK, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(11)<br />
|page=120<br />
|pubmed=17147767<br />
|link=http://genomebiology.com/2006/7/11/120<br />
|pdf=GenomeBiology_HumanPPIOverview_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_HumanPPIOverview_2006_AdditionalDataFile1.pdf Additional Data File 1]<br />
}}<br />
</li><br />
<li value="58"> {{Paper<br />
|title=Chromatographic alignment of ESI-LC-MS proteomics datasets by ordered bijective interpolated warping<br />
|authors=Prince JT, Marcotte EM<br />
|journal=Anal. Chem. <br />
|pub_year=2006<br />
|volume=78(17)<br />
|page=6140-52<br />
|pubmed=16944896<br />
|link=http://pubs.acs.org/doi/abs/10.1021/ac0605344<br />
|pdf=AnalyticalChemistry_OBIWarp_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="57"> {{Paper<br />
|title=A fast coarse filtering method for peptide identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=Bioinformatics<br />
|pub_year=2006<br />
|volume=22(12)<br />
|page=1524-31<br />
|pubmed=16585069 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/22/12/1524<br />
|pdf=Bioinformatics_MoBIoSCoarseFilter_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="56"> {{Paper<br />
|title=Systematic profiling of cellular phenotypes with spotted cell microarrays reveals new pheromone response genes<br />
|authors=Narayanaswamy R, Niu W, Scouras A, Hart GT, Davies J, Ellington AD, Iyer VR, Marcotte EM<br />
|journal=Genome Biol. <br />
|pub_year=2006<br />
|volume=7(1)<br />
|page=R6<br />
|pubmed=16507139 <br />
|link=http://genomebiology.com/2006/7/1/R6<br />
|pdf=GenomeBiology_CellChips_2006.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/GenomeBiology_CellChips_Supplement_2006.pdf Supplement] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable1.xls Supplemental Table 1] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable2.xls Supplemental Table 2] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable3.xls Supplemental Table 3] [http://www.marcottelab.org/paper-pdfs/NarayanaswamySupplementalTable4.xls Supplemental Table 4]<br />
}}<br />
</li><br />
<li value="55"> {{Paper<br />
|title=Bioinformatic prediction of yeast gene function<br />
|authors=Lee I, Narayanaswamy R, Marcotte EM<br />
|journal=Yeast Gene Analysis<br />
|pub_year=2006<br />
|volume=Stansfield, I., ed., Elsevier Press<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=LeeNarayanaswamyMarcotteManuscript.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="54"> {{Paper<br />
|title=Bioinformatic challenges for the next decade(s)<br />
|authors=Eisenberg D, Marcotte E, McLachlan AD, Pellegrini M<br />
|journal=Philos Trans R Soc Lond B Biol Sci.<br />
|pub_year=2006<br />
|volume=361(1467)<br />
|page=525-7<br />
|pubmed=16524841<br />
|link=http://rstb.royalsocietypublishing.org/content/361/1467/525.long<br />
|pdf=PhilTransactionsRoyalSocB_BioinformaticChallenges_2006.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2005 ==<br />
<ol><br />
<li value="53"> {{Paper<br />
|title=Synthetic biology: Engineering ''Escherichia coli'' to see light<br />
|authors=Levskaya A, Chevalier AA, Tabor JJ, Simpson ZB, Lavery LA, Levy M, Davidson EA, Scouras A, Ellington AD, Marcotte EM, Voigt CA<br />
|journal=Nature<br />
|pub_year=2005 <br />
|volume=438(7067)<br />
|page=441-2<br />
|pubmed=16306980 <br />
|link=http://dx.doi.org/10.1038/nature04405<br />
|pdf=Nature_BacterialPhotography_2005.pdf<br />
|comment=[http://www.sciencedaily.com/releases/2005/11/051123171556.htm the Science Daily press release] [http://dx.doi.org/10.1038/4381064a <i>Nature</i> 2005 Gallery "First Glimpse"] [http://dx.doi.org/10.1038/438417a <i>Nature</i> feature on the iGEM competition featuring a bacterial portrait] [http://www.utexas.edu/features/2005/bacteria/ UT press release] [http://www.nytimes.com/2005/11/24/national/24film.html New York Times feature]<br />
}}<br />
</li><br />
<li value="52"> {{Paper<br />
|title=A fast coarse filtering method for protein identification by mass spectrometry<br />
|authors=Ramakrishnan SR, Mao R, Nakorchevskiy AA, Prince JT, Willard WS, Xu W, Marcotte EM, Miranker DP<br />
|journal=University of Texas Dept. of Computer Sciences, Technical Report<br />
|pub_year=2005 <br />
|volume=TR-05-06<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=TechnicalReport-MoBIoS-TR-05-06.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="51"> {{Paper<br />
|title=Mass spectrometry of the <i>M. smegmatis</i> proteome: Protein expression levels correlate with function, operons, and codon bias<br />
|authors=Wang R, Prince JT, Marcotte EM<br />
|journal=Genome Res.<br />
|pub_year=2005 <br />
|volume=15(8)<br />
|page=1118-26<br />
|pubmed=16077011 <br />
|link=http://genome.cshlp.org/content/15/8/1118.long <br />
|pdf=rong_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="50"> {{Paper<br />
|title=Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome<br />
|authors=Ramani AK, Bunescu RC, Mooney RJ, Marcotte EM<br />
|journal=Genome Biology<br />
|pub_year=2005 <br />
|volume=6(5)<br />
|page=R40<br />
|pubmed=15892868 <br />
|link=http://genomebiology.com/2005/6/5/R40<br />
|pdf=Arun-consolidate-human.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="49"> {{Paper<br />
|title=Comparative experiments on learning information extractors for proteins and their interactions<br />
|authors=Bunescu R, Ge R, Kate RJ, Marcotte EM, Mooney RJ, Ramani AK, Wong YW<br />
|journal=Artif Intell Med.<br />
|pub_year=2005 <br />
|volume=33(2)<br />
|page=139-55<br />
|pubmed=15811782 <br />
|link=http://dx.doi.org/10.1016/j.artmed.2004.07.016<br />
|pdf=bionlp-aimed-04.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="48"> {{Paper<br />
|title=Using biomedical literature mining to consolidate the set of known human protein-protein interactions<br />
|authors=Ramani AK, Marcotte EM, Bunescu RC, Mooney RJ<br />
|journal=Intelligent Systems in Molecular Biology-ACL Workshop<br />
|pub_year=2005 <br />
|volume=<br />
|page=<br />
|pubmed= <br />
|link=<br />
|pdf=ISMB-ACLworkshop_LitMining_2005.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="47"> {{Paper<br />
|title=Protein function prediction using the Protein Link Explorer (PLEX)<br />
|authors=Date SV, Marcotte EM<br />
|journal=Bioinformatics<br />
|pub_year=2005 <br />
|volume=21(10)<br />
|page=2558-9<br />
|pubmed=15701682 <br />
|link=http://bioinformatics.oxfordjournals.org/cgi/content/full/21/10/2558<br />
|pdf=Plex.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/plex/plex.html Supplemental Web Site]<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2004 ==<br />
<ol><br />
<li value="46"> {{Paper<br />
|title=A probabilistic functional network of yeast genes<br />
|authors=Lee I, Date SV, Adai AT, Marcotte EM<br />
|journal=Science<br />
|pub_year=2004<br />
|volume=306(5701)<br />
|page=1555-8<br />
|pubmed=15567862<br />
|pdf=Science_Lee_YeastNet.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/1099511v2s.pdf Supplemental methods] [http://www.marcottelab.org/paper-pdfs/1099511v2s_list.txt Supplemental README] [http://www.marcottelab.org/paper-pdfs/1099511v2s1.zip Supplemental File 1] [http://www.marcottelab.org/paper-pdfs/1099511v2s2.txt Supplemental File 2] [http://www.marcottelab.org/paper-pdfs/1099511v2s3 Supplemental File 3] [http://www.marcottelab.org/paper-pdfs/1099511v2s4.wrl Supplemental File 4] [http://www.marcottelab.org/paper-pdfs/1099511v2s5.wrl Supplemental File 5] (Files 4 & 5 require a VRML viewer)<br />
}}<br />
</li><br />
<li value="45"> {{Paper<br />
|authors= Baliga NS, Bonneau R, Facciotti MT, Pan M, Glusman G, Deutsch EW, Shannon P, Chiu Y, Weng RS, Gan RR, Hung P, Date SV, Marcotte E, Hood L, Ng WV<br />
|title=Genome sequence of <i>Haloarcula marismortui</i>: a halophilic archaeon from the Dead Sea <br />
|journal=Genome Res. <br />
|volume=14(11)<br />
|page=2221-34<br />
|pub_year=2004<br />
|pubmed=15520287<br />
|pdf=GenomeResearch_HaloarculumGenome.pdf<br />
|comment=[[File:GenomeResearchHaloarculaCover2004.jpg||100px|right]]<br />
}}<br />
</li><br />
<li value="44"> {{Paper<br />
|title=Development through the eyes of functional genomics<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Curr Opin Genet Dev.<br />
|pub_year=2004<br />
|volume=14(4)<br />
|page=336-42<br />
|pubmed=15261648 <br />
|link=http://dx.doi.org/10.1016/j.gde.2004.06.015 <br />
|pdf=COGD_FraserMarcotte_2004.pdf <br />
|comment=<br />
}}<br />
</li><br />
<li value="43"> {{Paper<br />
|title=Protein interaction networks from yeast to human<br />
|authors=Bork P, Jensen LJ, Von Mering C, Ramani AK, Lee I, Marcotte EM<br />
|journal=Curr Opin Struct Biol<br />
|pub_year=2004<br />
|volume=14(3)<br />
|page=292-9<br />
|pubmed=15193308 <br />
|link=http://dx.doi.org/10.1016/j.sbi.2004.05.003 <br />
|pdf=cosb-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="42"> {{Paper<br />
|title=LGL: Creating a map of protein function with an algorithm for visualizing very large biological networks<br />
|authors=Adai AT, Date SV, Wieland S, Marcotte EM<br />
|journal=J Mol Biol<br />
|pub_year=2004<br />
|volume=340(1)<br />
|page=179-90<br />
|pubmed=15184029 <br />
|link=http://dx.doi.org/10.1016/j.jmb.2004.04.047 <br />
|pdf=jmb-lgl.pdf <br />
|comment=[http://bioinformatics.icmb.utexas.edu/lgl/index.html Supplemental Web Site] [http://sourceforge.net/projects/lgl/ Sourceforge Site] For more recent support of LGL, see the LGL guide by [http://clairemcwhite.github.io/lgl-guide/ Claire McWhite] and the latest updates from [http://www.opte.org/lgl/ the Opte Project]<br />
}}<br />
</li><br />
<li value="41"> {{Paper<br />
|title=A probabilistic view of gene function<br />
|authors=Fraser AG, Marcotte EM<br />
|journal=Nature Genetics<br />
|pub_year=2004<br />
|volume=36(6)<br />
|page=559-64<br />
|pubmed=15167932 <br />
|link=http://dx.doi.org/10.1038/ng1370 <br />
|pdf=ng-fraser-review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="40"> {{Paper<br />
|title=Practical computational approaches to infer protein function<br />
|authors=Marcotte EM<br />
|journal=Biosilico<br />
|pub_year=2004<br />
|volume=2<br />
|page=24-29<br />
|pubmed=<br />
|link= <br />
|pdf=Biosilico_Marcotte_2004_proofs.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="39"> {{Paper<br />
|title=The need for a public proteomics repository<br />
|authors=Prince JT, Carlson MW, Wang R, Lu P, Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2004<br />
|volume=22(4)<br />
|page=471-472<br />
|pubmed=15085804 <br />
|link=http://dx.doi.org/10.1038/nbt0404-471<br />
|pdf=nbt-MS-review.pdf<br />
|comment=[http://bioinformatics.icmb.utexas.edu/OPD/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="38"> {{Paper<br />
|title=Response to McDermott and Samudrala: Enhanced functional information from predicted protein networks<br />
|authors=Date SV, Marcotte EM<br />
|journal=TRENDS in Biotechnology<br />
|pub_year=2004<br />
|volume=22(2)<br />
|page=62-63<br />
|pubmed=<br />
|link=http://dx.doi.org/10.1016/j.tibtech.2003.11.008 <br />
|pdf=trends-biotech.pdf <br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2003 ==<br />
<ol><br />
<li value="37"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=Bioinformatics<br />
|pub_year=2003<br />
|volume=19(13)<br />
|pubmed=12967956<br />
|page=1612-9<br />
|pdf=diametrical.pdf<br />
}}<br />
</li><br />
<li value="36"> {{Paper<br />
|title=Expression deconvolution: a reinterpretation of DNA microarray data reveals dynamic changes in cell populations<br />
|authors=Lu P, Nakorchevskiy A, Marcotte EM<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=2003<br />
|volume=100(18)<br />
|page=10370-5<br />
|pubmed=12934019<br />
|pdf=peng-pnas.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/PNAS_deconvolution_2003-supplementalfiles.zip Supplemental files] (zipped folder containing executable .jar file, yeast test data and cell cycle basis vectors) <br />
}}<br />
</li><br />
<li value="35"> {{Paper<br />
|title=Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages<br />
|authors=Date SV, Marcotte EM<br />
|journal=Nat Biotechnol.<br />
|pub_year=2003<br />
|volume=21(9)<br />
|page=1055-62<br />
|pubmed=12923548<br />
|pdf=shailesh-natbt.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS1.pdf Fig S1] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_FigS2.gif Fig S2] [http://www.marcottelab.org/paper-pdfs/NatureBiotech_SystematicNewPathways_TableS1.pdf Table S1] <br />
}}<br />
</li><br />
<li value="34"> {{Paper<br />
|title=Assembling a jigsaw puzzle with 20,000 parts<br />
|authors=Marcotte EM<br />
|journal=Genome Biol.<br />
|pub_year=2003<br />
|volume=4(6)<br />
|page=323<br />
|pubmed=12801408<br />
|pdf=genome-biology.pdf<br />
}}<br />
</li><br />
<li value="33"> {{Paper<br />
|title=Exploiting the co-evolution of interacting proteins to discover interaction specificity<br />
|authors=Ramani AK, Marcotte EM<br />
|journal=J Mol Biol.<br />
|pub_year=2003<br />
|volume=327(1)<br />
|page=273-84<br />
|pubmed=12614624<br />
|pdf=jmb_2003.pdf<br />
|comment=[http://orion.icmb.utexas.edu/matrix/ Supplemental Web Site]<br />
}}<br />
</li><br />
<li value="32"> {{Paper<br />
|title=The genome sequence of the filamentous fungus <i>Neurospora crassa</i><br />
|authors=Galagan JE, Calvo SE, Borkovich KA, Selker EU, Read ND, Jaffe D, FitzHugh W, Ma LJ, Smirnov S, Purcell S, Rehman B, Elkins T, Engels R, Wang S, Nielsen CB, Butler J, Endrizzi M, Qui D, Ianakiev P, Bell-Pedersen D, Nelson MA, Werner-Washburne M, Selitrennikoff CP, Kinsey JA, Braun EL, Zelter A, Schulte U, Kothe GO, Jedd G, Mewes W, Staben C, Marcotte E, Greenberg D, Roy A, Foley K, Naylor J, Stange-Thomann N, Barrett R, Gnerre S, Kamal M, Kamvysselis M, Mauceli E, Bielke C, Rudd S, Frishman D, Krystofova S, Rasmussen C, Metzenberg RL, Perkins DD, Kroken S, Cogoni C, Macino G, Catcheside D, Li W, Pratt RJ, Osmani SA, DeSouza CP, Glass L, Orbach MJ, Berglund JA, Voelker R, Yarden O, Plamann M, Seiler S, Dunlap J, Radford A, Aramayo R, Natvig DO, Alex LA, Mannhaupt G, Ebbole DJ, Freitag M, Paulsen I, Sachs MS, Lander ES, Nusbaum C, Birren B<br />
|journal=Nature<br />
|pub_year=2003<br />
|volume=422(6934)<br />
|page=859-68<br />
|pubmed=12712197<br />
|pdf=Ncrassa.pdf<br />
}}<br />
</li><br />
<li value="31"> {{Paper<br />
|authors=Bunescu R, Ge R, Kate R, Mooney R, Wong Y, Marcotte E, Ramani A<br />
|title=Learning to extract proteins and their interactions from Medline abstracts<br />
|journal=ICML Workshop<br />
|pub_year=2003<br />
|volume=<br />
|page=<br />
|pdf=icmlws.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2002 ==<br />
<ol><br />
<li value="30"> {{Paper<br />
|title=Making sense of proteomics: Using bioinformatics to discover a protein's structure, functions, and interactions<br />
|authors=Mallick P, Marcotte EM<br />
|journal=Proteins and Proteomics: A Laboratory Manual<br />
|pub_year=2002<br />
|volume=Simpson RJ, ed., Cold Spring Harbor Press<br />
|page=<br />
|link=<br />
|comment= <br />
}}<br />
</li><br />
<li value="29"> {{Paper<br />
|title=Diametrical clustering for identifying anti-correlated gene clusters<br />
|authors=Dhillon IS, Marcotte EM, Roshan U<br />
|journal=The University of Texas at Austin, Department of Computer Sciences<br />
|pub_year=2002<br />
|volume=Technical Report TR-02-49<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=TechnicalReport_DiametricClustering_tr02-49.pdf<br />
}}<br />
</li><br />
<li value="28"> {{Paper<br />
|title=Predicting protein function and networks on genome-wide scale<br />
|authors=Marcotte EM<br />
|journal=Gene Regulation and Metabolism: Post-Genomic Computational Approaches<br />
|pub_year=2002<br />
|volume=Collado-Vides J, Hofestädt R, eds., MIT Press<br />
|pubmed=<br />
|page=<br />
|link=<br />
|comment=<br />
|pdf=Marcotte-ColladoVidesChapter-2002.pdf<br />
}}<br />
</li><br />
<li value="27"> {{Paper<br />
|title=Predicting functional linkages from gene fusions with confidence<br />
|authors=Verjovsky Marcotte CJ, Marcotte EM<br />
|journal=Applied Bioinformatics<br />
|pub_year=2002<br />
|volume=1(2)<br />
|pubmed=<br />
|page=1-8<br />
|link=<br />
|comment=<br />
|pdf=RS_statistics.pdf<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2001 ==<br />
<ol><br />
<li value="26"> {{Paper<br />
|title=Exploiting big biology: Integrating large-scale biological data for functional inference<br />
|authors=Marcotte EM, Date SV<br />
|journal=Brief Bioinform<br />
|pub_year=2001<br />
|volume=2(4)<br />
|page=363-74<br />
|pubmed=11808748<br />
|link=<br />
|pdf=BIB_review.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="25"> {{Paper<br />
|title=The path not taken<br />
|authors=Marcotte EM<br />
|journal=Nature Biotechnology<br />
|pub_year=2001<br />
|volume=19(7)<br />
|page=626-7<br />
|pubmed=11433271<br />
|link=<br />
|pdf=path-not-taken.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="24"> {{Paper<br />
|title=Measuring the dynamics of the proteome<br />
|authors=Marcotte EM<br />
|journal=Genome Research<br />
|pub_year=2001<br />
|volume=11(2)<br />
|page=191-3<br />
|pubmed=11157781<br />
|link=<br />
|pdf=measuring-dynamics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="23"> {{Paper<br />
|title=Mining literature for protein interactions<br />
|authors=Marcotte EM, Xenarios I, Eisenberg D<br />
|journal=Bioinformatics <br />
|pub_year=2001<br />
|volume=17(4)<br />
|page=359-63<br />
|pubmed=11301305<br />
|link=<br />
|pdf=Bioinformatics_lit_mining.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/README README] [http://www.marcottelab.org/paper-pdfs/500_abstracts_with_PMID 500_abstracts_with_PMID] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions Discriminating_words_for_interactions] [http://www.marcottelab.org/paper-pdfs/Discriminating_words_for_interactions_edited Discriminating_words_for_interactions_edited] [http://www.marcottelab.org/paper-pdfs/score_abstracts score_abstracts Perl script]<br />
}}<br />
</li><br />
<li value="22"> {{Paper<br />
|title=From genome sequences to protein interactions<br />
|authors=Eisenberg D, Marcotte E, Pellegrini M, Thompson M, Xenarios I, Yeates T<br />
|journal=FASEB J<br />
|pub_year=2001<br />
|volume=15<br />
|page=A724-A724<br />
|pubmed= <br />
|link=<br />
|pdf=<br />
|comment=<br />
}}<br />
</li><br />
<li value="21"> {{Paper<br />
|title=DIP: the database of interacting proteins: 2001 update<br />
|authors=Xenarios I, Fernandez E, Salwinski L, Duan XJ, Thompson MJ, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res<br />
|pub_year=2001<br />
|volume=29(1)<br />
|page=239-41<br />
|pubmed=11125102<br />
|link=<br />
|pdf=NAR_DIP_2001.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 2000 ==<br />
<ol><br />
<li value="20"> {{Paper<br />
|title=Protein function in the post-genomic era<br />
|authors=Eisenberg D, Marcotte EM, Xenarios I, Yeates TO<br />
|journal=Nature<br />
|pub_year=2000<br />
|volume=405(6788)<br />
|page=823-6 <br />
|pubmed=10866208 <br />
|link=http://dx.doi.org/10.1038/35015694<br />
|pdf=Nature_Review_2000.taf<br />
|comment=<br />
}}<br />
</li><br />
<li value="19"> {{Paper<br />
|title=Localizing proteins in the cell from their phylogenetic profiles<br />
|authors=Marcotte EM, Xenarios I, van der Bliek A, Eisenberg D<br />
|journal=Proc Natl Acad Sci U S A.<br />
|pub_year=2000<br />
|volume=97(22)<br />
|page=12115-20<br />
|pubmed=11035803 <br />
|link=http://www.pnas.org/content/97/22/12115.long<br />
|pdf=PNAS_mito_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="18"> {{Paper<br />
|title=Computational genetics: Finding function by non-homology methods<br />
|authors=Marcotte EM<br />
|journal=Curr Opin Struct Biol. <br />
|pub_year=2000<br />
|volume=10(3)<br />
|page=359-65<br />
|pubmed=10851184 <br />
|link=http://dx.doi.org/10.1016/S0959-440X(00)00097-X <br />
|pdf=cosb_compgenetics_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="17"> {{Paper<br />
|title=Characterization of a thermostable DNA glycosylase specific for U/G and T/G mismatches from the hyperthermophilic archaeon <i>Pyrobaculum aerophilum</i><br />
|authors=Yang H, Fitz-Gibbon S, Marcotte EM, Tai JH, Hyman EC, Miller JH<br />
|journal=J Bacteriol.<br />
|pub_year=2000<br />
|volume=182(5)<br />
|page=1272-9<br />
|pubmed=10671447 <br />
|link=http://jb.asm.org/cgi/content/full/182/5/1272?view=long&pmid=10671447<br />
|pdf=JBacti_Pyrobaculum_glycosylase.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="16"> {{Paper<br />
|title=Increasing the specificity of protein functional inference by the Rosetta Stone method<br />
|authors=Thompson M, Marcotte E, Pellegrini M, Yeates T, Eisenberg D<br />
|journal=Currents in Computational Molecular Biology <br />
|pub_year=2000<br />
|volume=Miyano S, Shamir R, Takagi T, eds., Universal Academy Press, Inc.<br />
|page=<br />
|pubmed=<br />
|link=<br />
|pdf=CurrentsinCompMolBio_Thompson_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="15"> {{Paper<br />
|title=DIP: the database of interacting proteins<br />
|authors=Xenarios I, Rice DW, Salwinski L, Baron MK, Marcotte EM, Eisenberg D<br />
|journal=Nucleic Acids Res.<br />
|pub_year=2000<br />
|volume=28(1)<br />
|page=289-91<br />
|pubmed=10592249 <br />
|link=http://nar.oxfordjournals.org/cgi/content/full/28/1/289<br />
|pdf=NAR_DIP_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1999 ==<br />
<ol><br />
<li value="14"> {{Paper<br />
|title=A combined algorithm for genome-wide prediction of protein function<br />
|authors=Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg D<br />
|journal=Nature<br />
|pub_year=1999<br />
|volume=402(6757)<br />
|page=83-6<br />
|pubmed=10573421 <br />
|link=http://www.nature.com/nature/journal/v402/n6757/full/402083a0.html<br />
|pdf=nature_genomewidepred.pdf<br />
|comment=See also Sali, A. Genomics: Functional links between proteins. Nature 402, 23-26 (1999), Boston Globe (Nov. 3, 1999), Los Angeles Times (Nov. 4, 1999).<br />
}}<br />
</li><br />
<li value="13"> {{Paper<br />
|title=Detecting protein function and protein-protein interactions from genome sequences<br />
|authors=Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D<br />
|journal=Science<br />
|pub_year=1999<br />
|volume=285(5428)<br />
|page=751-3<br />
|pubmed=10427000 <br />
|link=http://dx.doi.org/10.1126/science.285.5428.751<br />
|pdf=RS_science.pdf<br />
|comment=See also Doolittle, R. F. Do you dig my groove? Nature Genetics 23, 6-8 (1999).<br />
}}<br />
</li><br />
<li value="12"> {{Paper<br />
|title=A census of protein repeats<br />
|authors=Marcotte EM, Pellegrini M, Yeates TO, Eisenberg D<br />
|journal=J Mol Biol.<br />
|pub_year=1999<br />
|volume=293(1)<br />
|page=151-60<br />
|pubmed=10512723 <br />
|link=http://dx.doi.org/10.1006/jmbi.1999.3136 <br />
|pdf=JMB_Census_2000.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="11"> {{Paper<br />
|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles<br />
|authors=Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO<br />
|journal=Proc Natl Acad Sci U S A<br />
|pub_year=1999<br />
|volume=96(8)<br />
|page=4285-8<br />
|pubmed=10200254 <br />
|link=http://www.pnas.org/content/96/8/4285.long<br />
|pdf=PNAS_phylogenetic_profiles.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="10"> {{Paper<br />
|title=A fast algorithm for genome-wide analysis of proteins with repeated sequences<br />
|authors=Pellegrini M, Marcotte EM, Yeates TO<br />
|journal=Proteins: Struct. Funct. Genet.<br />
|pub_year=1999<br />
|volume=35(4)<br />
|page=440-6<br />
|pubmed=10382671 <br />
|link=http://www3.interscience.wiley.com/journal/65000326/abstract?CRETRY=1&SRETRY=0<br />
|pdf=Proteins_repeats_in_proteins.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== 1998 ==<br />
<ol><br />
<li value="9"> {{Paper<br />
|title=Chicken prion tandem repeats form a stable, protease-resistant domain<br />
|authors=Marcotte EM, Eisenberg D<br />
|journal=Biochemistry<br />
|pub_year=1998<br />
|volume=38(2)<br />
|page=667-76<br />
|pubmed=9888807 <br />
|link=http://dx.doi.org/10.1021/bi981487f<br />
|pdf=chickenprion.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="8"> {{Paper<br />
|title=A look at the future of macromolecular structure determination<br />
|authors=Cascio D, Goodwill K, Marcotte E<br />
|journal=Rigaku J.<br />
|pub_year=1998<br />
|volume=15<br />
|page=1-5<br />
|pubmed=<br />
|link=<br />
|pdf=RigakuJournal_look_at_xtal_future.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="7"> {{Paper<br />
|title=Structural analysis shows five glycohydrolase families diverged from a common ancestor<br />
|authors=Robertus JD, Monzingo AF, Marcotte EM, Hart PJ<br />
|journal=J Exp Zool.<br />
|pub_year=1998<br />
|volume=282(1-2)<br />
|page=127-32<br />
|pubmed=9723170 <br />
|link=http://www3.interscience.wiley.com/journal/75837/abstract<br />
|pdf=JExpZool_chitinase_evolution.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Pre - 1998 ==<br />
<ol><br />
<br />
<li value="6"> {{Paper<br />
|title=Kinetic analysis of barley chitinase<br />
|authors=Hollis T, Honda Y, Fukamizo T, Marcotte E, Day PJ, Robertus JD<br />
|journal=Arch Biochem Biophys.<br />
|pub_year=1997 <br />
|volume=344(2)<br />
|page=335-42<br />
|pubmed=9264547 <br />
|link=http://dx.doi.org/10.1006/abbi.1997.0225 <br />
|pdf=ArchBiochemBiophys_chitinase_kinetics.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="5"> {{Paper<br />
|title=X-ray structure of an anti-fungal chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte EM, Monzingo AF, Ernst SR, Brzezinski R, Robertus JD<br />
|journal=Nat Struct Biol.<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=155-62<br />
|pubmed=8564542 <br />
|link=<br />
|pdf=NatureStructuralBiology_Chitosanase_1996.pdf<br />
|comment=[http://www.marcottelab.org/paper-pdfs/NatureStructuralBiology_ChitosanaseCommentary_1996.pdf News & Views]<br />
}}<br />
</li><br />
<li value="4"> {{Paper<br />
|title=Chitinases, chitosanases, and lysozymes can be divided into procaryotic and eucaryotic families sharing a conserved core<br />
|authors=Monzingo AF, Marcotte EM, Hart PJ, Robertus JD<br />
|journal=Nat Struct Biol<br />
|pub_year=1996 <br />
|volume=3(2)<br />
|page=133-40<br />
|pubmed=8564539 <br />
|link=<br />
|pdf=NatureStructuralBiology_ConservedCore_1996.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="3"> {{Paper<br />
|title=The structure of chitinases and prospects for structure-based drug design<br />
|authors=Robertus JD, Hart PJ, Monzingo AF, Marcotte E, Hollis T<br />
|journal=Can. J. Bot.<br />
|pub_year=1995<br />
|volume=73 (Suppl. 1)<br />
|page=S1142-S1146<br />
|pdf=CanadianJournalOfBotany_Chitinase_1995.pdf<br />
|pubmed=<br />
|link=<br />
|comment=<br />
}}<br />
</li><br />
<li value="2"> {{Paper<br />
|title=Control of cellular morphogenesis by the Ipl2/Bem2 GTPase-activating protein: possible role of protein phosphorylation<br />
|authors=Kim YJ, Francisco L, Chen GC, Marcotte E, Chan CS<br />
|journal=J Cell Biol.<br />
|pub_year=1994 <br />
|volume=127(5)<br />
|page=1381-94<br />
|pubmed=7962097 <br />
|link=http://jcb.rupress.org/cgi/reprint/127/5/1381<br />
|pdf=JCellBiol_KimChan_Ipl2Bem2_1994.pdf<br />
|comment=<br />
}}<br />
</li><br />
<li value="1"> {{Paper<br />
|title=Crystallization of a chitosanase from <i>Streptomyces</i> N174<br />
|authors=Marcotte E, Hart PJ, Boucher I, Brzezinski R, Robertus JD<br />
|journal=J Mol Biol<br />
|pub_year=1993<br />
|volume=232(3)<br />
|page=995-6<br />
|pubmed=8355284 <br />
|link=http://dx.doi.org/10.1006/jmbi.1993.1447<br />
|pdf=JMB_chitosanase_xtal_1993.pdf<br />
|comment=<br />
}}<br />
</li><br />
</ol><br />
<br />
== Patents ==<br />
<ol><br />
<li value="18"> [https://patents.google.com/patent/WO2021236716A2 Publication # WO 2021236716 A2] '''Methods, systems and kits for polypeptide processing and analysis'''. PCT filed May 19, 2021.<br />
<li value="17"> [https://patents.google.com/patent/WO2021168083A1 Publication # WO 2021168083 A1] '''Peptide and protein c-terminus labeling'''. PCT filed Feb 18, 2021.<br />
<li value="16"> [https://patents.google.com/patent/WO2020072907A1 Publication # WO 2020072907 A1] '''Solid-phase N-terminal peptide capture and release'''. PCT filed Oct 04, 2019.<br />
<li value="15"> [https://patents.google.com/patent/WO2020037046A1 Publication # WO 2020037046 A1] '''Single molecule sequencing peptides bound to the major histocompatibility complex'''. PCT filed Aug 14, 2019. [https://patents.google.com/patent/GB2591384B/en UK patent GB 2591384 B] issued July 26, 2023. [https://patents.google.com/patent/GB2607829B/en UK patent GB 2607829 B] issued August 30, 2023.<br />
<li value="14"> [https://patents.google.com/patent/WO2020023488A1/ Publication # WO 2020023488 A1] '''Single molecule sequencing identification of post-translational modifications on proteins'''. PCT filed July 23, 2018.<br />
<li value="13"> [https://patents.google.com/patent/WO2020014586A1/ Publication # WO 2020014586 A1] '''Molecular neighborhood detection by oligonucleotides'''. PCT filed July 12, 2018.<br />
<li value="12"> [https://patents.google.com/patent/US10175249B2 10,175,249 B2], issued January 8, 2019. '''Proteomic identification of antibodies'''. Lavinder, Jason; Boutz, Danny; Wine, Yariv; Marcotte, Edward; Georgiou, George. <br />
<li value="11"> [https://patents.google.com/patent/US10545153B2/ 10,545,153 B2], issued January 28, 2020. '''Single molecule peptide sequencing'''. [https://patentscope.wipo.int/search/en/detail.jsf?docId=WO2016069124 Publication # WO/2016/069124], Intl appl # PCT/US2015/050099, International filing date 15.09.2015. Marcotte, Edward; Anslyn, Eric; Ellington, Andrew; Swaminathan, Jagannath; Hernandez, Erik; Johnson, Amber; Boulgakov, Alexander; Bachman, Logan; Seifert, Helen. '''Improved single molecule sequencing'''. [https://patents.google.com/patent/US11162952B2/ 11,162,952 B2], issued November 2, 2021. [https://patents.google.com/patent/CA2961493C/en?oq=2%2c961%2c493 Canadian patent 2,961,493] issued October 3, 2023.<br />
<li value="10"> [https://patents.google.com/patent/US9625469 9,625,469], issued April 18, 2017. '''Identifying peptides at the single molecule level'''. Marcotte, Edward; Swaminathan, Jagannath; Ellington, Andrew; Anslyn, Eric. Appl # 14128247, filed 22.06.2012; publication # US20140349860, 27.11.2014. [https://www.ipo.gov.uk/p-ipsum/Case/PublicationNumber/GB2510488 UK patent GB2510499] issued April 8, 2020. [https://patents.google.com/patent/US11105812B2 11,105,812 B2], issued August 31, 2021. [https://patents.google.com/patent/CA2839702C/en Canadian patent CA 2,839,702 C] issued April 20, 2021. [https://patents.google.com/patent/US11435358B2 US 11,435,358 B2], issued September 6, 2022. [https://patents.google.com/patent/DE112012002570T5/en German patent DE 112012002570T5] issued August 10, 2023.<br />
<li value="9"> [https://patents.google.com/patent/WO2013067308A2 Publication # WO 2013067308 A2], '''Compositions and methods for inducing disruption of blood vasculature and for reducing angiogenesis''', PCT filed Nov 2, 2012; provisional patent # 61/555,212 filed Nov 3, 2011.</li><br />
<li value="8"> [https://patents.google.com/patent/WO2013055867A1 Publication # WO 2013055867 A1], '''Genes involved in stress response in plants''', PCT filed Oct 11, 2012.</li><br />
<li value="7"> [http://www.freshpatents.com/-dt20120823ptan20120215458.php USPTO Application # 20120215458], '''Orthologous phenotypes and non-obvious human disease models''', PCT filed July 13, 2010; provisional patent # 61/225,427 filed July 14, 2009.</li><br />
<li value="6"> [https://patents.google.com/patent/US9146241 9,146,241], issued September 29, 2015. '''Proteomic identification of antibodies'''. Lavinder, Jason; Wine, Yariv; Boutz, Danny; Marcotte, Edward; Georgiou, George. Appl # 13/684,395, filed November 23, 2012.<br />
<li value="5"> [https://patents.google.com/patent/US9090674B2 9,090,674 B2], issued July 28, 2015. '''Rapid isolation of monoclonal antibodies from animals'''. Reddy, Sai; Ge, Xin; Lavinder, Jason; Boutz, Danny; Ellington, Andrew D.; Marcotte, Edward M.; Georgiou, George. <br />
<li value="4"> [https://patents.google.com/patent/US6892139 6,892,139], issued May 10, 2005. '''Determining the functions and interactions of proteins by comparative analysis'''.</li><br />
<li value="3"> [https://patents.google.com/patent/US6772069 6,772,069], issued August 3, 2004. '''Determining protein function and interaction from genome analysis'''.</li><br />
<li value="2"> [https://patents.google.com/patent/US6564151 6,564,151], issued May 13, 2003. '''Assigning protein functions by comparative genome analysis protein phylogenetic profiles'''.</li><br />
<li value="1"> [https://patents.google.com/patent/US6466874 6,466,874], issued October 15, 2002. '''Rosetta stone method for detecting protein function and protein-protein interactions from genome sequences'''.</li><br />
</ol></div>
Marcotte
http://www.marcottelab.org/index.php/BCH394P_BCH364C_2024
BCH394P BCH364C 2024
2024-01-29T18:23:36Z
<p>Marcotte: </p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, by Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, by Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 12'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein ID, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ UniProt]. InParanoid tends towards higher recall but lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms have different trade-offs with regard to precision, recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easy to download and re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (with the current record in this regard probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python igraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The t-SNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical t-SNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use classification, regression, machine learning, and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ PyMOL]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero or one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
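If it helps to see the transform itself, here is a minimal sketch of the Burrows–Wheeler transform and its inverse using sorted rotations. This is a naive teaching version, not the indexed construction used by BWA; the function names and the "$" sentinel are assumptions made for illustration, and the input is assumed not to contain "$".<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive, memory-hungry)."""
    text = text + "$"  # sentinel: assumed absent from the input, sorts before A/C/G/T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # the transform is the last column of the sorted rotation table
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column):
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # the row that ends with the sentinel is the original text plus "$"
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]


encoded = bwt("GATTACA")
print(encoded)
print(inverse_bwt(encoded))
```

Real read mappers never build the full rotation table; they derive the same last column from a suffix array and add auxiliary count tables so that a read can be matched one base at a time.<br />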
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to precision (positive predictive value, PPV) in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
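To keep the definitions above straight, here is a small sketch that computes them from confusion-matrix counts. The function name and the example counts are made up for illustration; "specificity" here follows the general (true negative rate) definition, while the gene-finding usage corresponds to the "precision" entry.<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # also called recall or true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # positive predictive value (PPV)
    }


# Hypothetical counts, for illustration only
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(name, round(value, 3))
```

Note that with many true negatives (as in a genome, where most positions are not genes), specificity can look excellent even when precision is mediocre, which is one reason the distinction matters.<br />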
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
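As a rough illustration of the log odds idea in the examples above, here is a minimal sketch that scores a candidate text by summing log odds ratios of its overlapping k-mers, comparing frequencies in a reference text against a uniform background. This is not the method from the cited paper; the function names, the add-one pseudocount, the uniform background, and the toy texts are all assumptions made for illustration.<br />

```python
import math
from collections import Counter


def kmer_log_odds(reference, k, alphabet):
    """Return a function giving log(P(kmer | reference) / P(kmer | uniform))."""
    counts = Counter(reference[i:i + k] for i in range(len(reference) - k + 1))
    total = sum(counts.values())
    n_possible = len(alphabet) ** k
    background = 1.0 / n_possible

    def score(kmer):
        # add-one pseudocount so unseen k-mers get a finite, negative score
        p = (counts[kmer] + 1) / (total + n_possible)
        return math.log(p / background)

    return score


def text_score(text, k, score):
    """Sum per-k-mer log odds over all overlapping k-mers in the text."""
    return sum(score(text[i:i + k]) for i in range(len(text) - k + 1))


# Toy data, for illustration only
reference = "THECATSATONTHEMATANDTHERATSATONTHECAT"
score = kmer_log_odds(reference, k=2, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(text_score("THECAT", 2, score))  # text resembling the reference
print(text_score("QZXJKV", 2, score))  # text that does not
```

A positive total means the text looks more like the reference than like random letters; a negative total means the opposite.<br />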
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
* '''WEATHER WARNING #2: Change of plans!''' UT has now officially canceled in-person classes, but more to the point, >100,000 people have lost power in Austin today. We're going to cancel the live zoom class tomorrow, and Matt will instead record the lecture and upload it to Canvas for viewing.<br />
* Matt is an expert in the bioinformatic analyses of plasmid sequences and developed the popular [http://plannotate.barricklab.org/ pLannotate tool] to annotate and visualize plasmid features, based on a large database of genetic parts and protein sequences. Funny enough, he first described an early version of pLannotate as his project for this class back in 2019. He'll be introducing several useful Python libraries, including the Pandas package for handling large tables and a data visualization library for plotting data.<br />
--><br />
<br />
<!--<br />
'''Feb 6, 2024 - Biological databases'''<br />
* WEATHER WARNING: UT just announced a campus closure for the morning, so for those of you that are able to attend online, I'll plan to hold it at the normal time on the class zoom channel (link available on Canvas). However, for those that can't make it, don't stress! We'll record the lecture and post the video to Canvas so that you can watch it later. Note: the next Rosalind homework is assigned below.<br />
* Science news of the day: [https://www.theguardian.com/science/2024/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers Cell, Nature, Science, eLife, and the Lancet ban listing ChatGPT as a co-author]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [https://stat.utexas.edu/people/lauren-ancel-meyers Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<!--<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
* The [https://blast.ncbi.nlm.nih.gov/Blast.cgi NCBI Blast server]<br />
* The [http://www.marcottelab.org/paper-pdfs/FoldSeek_NBT_2023.pdf FoldSeek paper] and a link to the [https://search.foldseek.com/search FoldSeek server] if you want to try it out<br />
--><br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as the percentages of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set 1, due 10 PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as Python string slicing (string[counter:counter+2]), as explained in the Rosalind homework assignment on strings.<br />
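To make the overlapping vs. non-overlapping distinction concrete, here is a minimal sketch on a toy string. The function name and the toy sequence are illustrative only, and it assumes the sequence is already loaded as an uppercase Python string; windows containing anything other than A, C, G, or T are skipped, in line with the note above about ambiguous characters.<br />

```python
from collections import Counter


def count_dinucleotides(seq):
    """Count overlapping dinucleotides, skipping windows with non-ACGT characters."""
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]  # 2-letter window, advanced one base at a time
        if pair[0] in valid and pair[1] in valid:
            counts[pair] += 1
    return counts


toy = "AAAA"
print(toy.count("AA"))                 # non-overlapping count from str.count
print(count_dinucleotides(toy)["AA"])  # overlapping count from the sliding window
```

The first print uses str.count and the second uses the sliding window, so you can compare the two numbers directly.<br />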
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up); e.g., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
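As a worked version of the late-time arithmetic, here is a small sketch. The function name is hypothetical, and two details are assumptions rather than stated policy: that remaining free days are applied before any penalty when an assignment is later than the free time left, and that the penalty is capped at 100%.<br />

```python
import math


def late_penalty(points, days_late, free_days_left):
    """Return (points deducted, free late days remaining) for one assignment."""
    whole_days = math.ceil(days_late)          # rounding up: 10 minutes late = 1 day
    covered = min(whole_days, free_days_left)  # days absorbed by free late time
    penalized = whole_days - covered           # days charged at 10% each
    percent = min(100, 10 * penalized)         # assumption: capped at 100%
    return points * percent / 100, free_days_left - covered


# The syllabus example: 50 points, 1.5 days late, no free late time left
print(late_penalty(50, 1.5, 0))
# The same assignment with all 5 free days still available
print(late_penalty(50, 1.5, 5))
```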
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-01-29T17:57:22Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
'''Course unique #:''' 54430/54305<br><br />
'''Lectures:''' Tues/Thurs 11 – 12:30 PM WEL 2.110<br><br />
'''Instructor:''' Edward Marcotte, marcotte @ utexas.edu<br><br />
* '''Office hours:''' Mon 4 – 5 PM on the class Zoom channel (available on Canvas)<br><br />
'''TA:''' Vicki Deng, dengv @ utexas.edu<br><br />
*'''TA Office hours:''' Tues 1 - 2 PM / Fri 12 - 1 PM in MBB 3.204 or by appointment on Zoom<br><br />
'''Class Canvas site:''' https://utexas.instructure.com/courses/1379402<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive; for example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record holder in this regard is probably [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish HW3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, Rosalind has sometimes timed out before MEME completes. It usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero or one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X).<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity, one that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
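To make the sensitivity/specificity versus precision/recall definitions in this entry concrete, here is a minimal Python sketch. The function name and the counts are hypothetical and chosen only for illustration; they do not come from any real gene finder.<br />

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts.

    tp/fp/tn/fn = true positives, false positives,
    true negatives, false negatives.
    """
    return {
        # Sensitivity is the same quantity as recall (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Specificity as most fields define it: the true negative rate.
        "specificity": tn / (tn + fp),
        # Precision (PPV): what the gene finding literature
        # historically called "specificity".
        "precision": tp / (tp + fp),
    }


# Hypothetical per-nucleotide counts for a gene finder (illustration only).
metrics = classification_metrics(tp=80, fp=20, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, the true negative rate is high because non-coding positions dominate, while precision is noticeably lower. That difference is one reason to check which definition of "specificity" a paper is reporting.<br />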
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
* '''WEATHER WARNING #2: Change of plans!''' UT has now officially canceled in-person classes, but more to the point, >100,000 people have lost power in Austin today. We're going to cancel the live zoom class tomorrow, and Matt will instead record the lecture and upload it to Canvas for viewing.<br />
* Matt is an expert in the bioinformatic analyses of plasmid sequences and developed the popular [http://plannotate.barricklab.org/ pLannotate tool] to annotate and visualize plasmid features, based on a large database of genetic parts and protein sequences. Funny enough, he first described an early version of pLannotate as his project for this class back in 2019. He'll be introducing several useful Python libraries, including the Pandas package for handling large tables and a data visualization library for plotting data.<br />
--><br />
<br />
<!--<br />
'''Feb 6, 2024 - Biological databases'''<br />
* WEATHER WARNING: UT just announced a campus closure for the morning, so for those of you that are able to attend online, I'll plan to hold it at the normal time on the class zoom channel (link available on Canvas). However, for those that can't make it, don't stress! We'll record the lecture and post the video to Canvas so that you can watch it later. Note: the next Rosalind homework is assigned below.<br />
* Science news of the day: [https://www.theguardian.com/science/2024/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers Cell, Nature, Science, eLife, and the Lancet ban listing ChatGPT as a co-author]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [https://stat.utexas.edu/people/lauren-ancel-meyers Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<!--<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
--><br />
<br />
<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", please turn in the absolute count of each nucleotide (or dinucleotide) as well as its percentage of the total.<br />
<!--<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
--><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
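For anyone who wants to see the dynamic programming idea from the primer in code, here is a minimal sketch of global (Needleman-Wunsch style) alignment scoring. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration; they are not the scheme used in the slides or the Excel example, and the sketch only returns the best score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of strings a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table; each cell takes the best of the three possible moves
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap inserted in b
            left = score[i][j - 1] + gap    # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]

print(global_alignment_score("GATTACA", "GATTACA"))
```

Recovering the alignment itself would require a traceback through the table, which is left out here to keep the sketch short.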
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, note that Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as containing 2, not 3, ''AA'' dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as stepping through the sequence with the Python slice string[counter:counter+2], as explained in the Rosalind homework assignment on strings.<br />
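A minimal sketch of the overlapping count described in the heads-up above, assuming the genome has already been read into a single string. The function name <code>count_dinucleotides</code> and the toy sequence are purely illustrative and are not part of the problem set; skipping any pair that contains a character other than A, C, G, or T is one possible way to handle the ambiguous calls.

```python
from collections import Counter

def count_dinucleotides(seq):
    """Count overlapping dinucleotides, skipping pairs with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]            # slide one position at a time
        if set(pair) <= set("ACGT"):   # ignore pairs touching an ambiguous call
            counts[pair] += 1
    return counts

toy = "AAAANACG"                        # made-up sequence with one ambiguous 'N'
counts = count_dinucleotides(toy)
total = sum(counts.values())
for pair, n in sorted(counts.items()):
    print(pair, n, "{:.1f}%".format(100.0 * n / total))
```

Skipping pairs rather than deleting the ambiguous characters first avoids counting two nucleotides as neighbors when they were not actually adjacent in the sequence.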
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
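As a small illustration of version-agnostic style, here is a sketch that should behave the same in Python 2.7 and Python 3. The function name <code>gc_fraction</code> and the example sequence are made up for illustration. It relies on two common idioms: calling print with a single parenthesized argument, and converting to float before dividing so that the result does not depend on integer-division rules.

```python
def gc_fraction(seq):
    """Fraction of G and C characters in a sequence string."""
    if not seq:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    # float() avoids integer division in Python 2; harmless in Python 3
    return gc / float(len(seq))

# A single parenthesized argument prints identically in both versions
print("GC fraction: {:.3f}".format(gc_fraction("ATGCGC")))
```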
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on Youtube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, which is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
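As a worked example of the arithmetic above, here is a minimal sketch in Python. The function name and its parameters are hypothetical, and the assumption that any remaining free late days are used up before a penalty applies is one reading of the policy, not an official calculator.

```python
import math

def late_penalty(days_late, free_days_left=5, penalty_per_day=0.10):
    """Return (fraction of the score lost, free late days remaining)."""
    # Lateness is counted in whole days, rounding up (10 minutes late = 1 day)
    whole_days = int(math.ceil(days_late)) if days_late > 0 else 0
    # Assumption: free late days are spent first
    used = min(whole_days, free_days_left)
    penalized_days = whole_days - used
    penalty = min(1.0, penalty_per_day * penalized_days)
    return penalty, free_days_left - used

# The example from the text: 50 points, 1.5 days late, no free days left
penalty, remaining = late_penalty(1.5, free_days_left=0)
print("Points lost:", 50 * penalty)
```

With all 5 free days still available, the same call with <code>free_days_left=5</code> would return no penalty and leave 3 free days.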
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, Github, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used<br />
in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI tools should be used with caution, and their use must be properly cited. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcottehttp://www.marcottelab.org/index.php/BCH394P_BCH364C_2024BCH394P BCH364C 20242024-01-25T15:26:19Z<p>Marcotte: /* Lectures & Handouts */</p>
<hr />
<div>== BCH394P/BCH364C Systems Biology & Bioinformatics ==<br />
<br />
<br />
== Lectures & Handouts ==<br />
<!--<br />
'''Apr 18 - 25, 2024 - Final Project Presentations'''<br />
* Welcome to the end of the course! You made it! The last 3 days will be presentations of your class projects.<br />
* We'll spend 5 minutes on the [https://utdirect.utexas.edu/ctl/ecis/ Course - Instructor Survey] Thursday morning.<br />
Here's a sampling of some of the completed course projects (posted with permission, with more to come):<br />
* [https://sites.google.com/utexas.edu/hanlin-ren-bioinformatics-proj/home Relative Depth of Aromatic Residues in Membrane Bilayer, by Hanlin Ren]<br />
* [https://sites.google.com/utexas.edu/bch394p-influenza/home Influenza Sequence Analysis, by Travis Beck & Evelyn Rocha]<br />
* [https://sites.google.com/view/subcellularloc/projects Signal peptides and subcellular localisation, by Sophia Zhou]<br />
* [https://sites.google.com/utexas.edu/bch394pbioinformaticsproject/introduction?authuser=0 Hidden Markov Models for Predicting Protein Secondary Structures, by Anant Beechar, Grace Hu, Rayna Taniguchi]<br />
* [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 A Structural Investigation into Scospondin & the Reissner Fiber, by Brittney Voigt]<br />
* [https://sites.google.com/utexas.edu/csra-orthogonality-project/results Development of a Model to predict CsrA-RNA binding, by Ryan Buchser & Vinya Bhuvan]<br />
* [https://sites.google.com/view/bch-364c-final-project/home Extending Cascade Models of Synaptic Plasticity, Argha Bandyopadhyay]<br />
* [https://sites.google.com/view/ama1-polymorphism/home?authuser=0 Genetic diversity of Plasmodium falciparum apical membrane antigen-1, by Christopher Smith, Jeffrey Marchioni, Jin Eyun Kim]<br />
* [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 Identifying putative stabilizing disulfide bond mutations for viral fusion protein vaccine design with machine learning, by Doug Townsend & W. Chase Sanders]<br />
* [https://sites.google.com/view/finalproject-com/title?authuser=0 Investigation of Unique Intron Associated RT, by Jose Alvarado]<br />
* [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home Breast Cancer Classification Using Tumor Characteristics: An Analysis through Pandas and Numpy, by Oishika Das]<br />
* [https://sites.google.com/view/kcgslc30a10 Regulators of Manganese Efflux Transporter SLC30A10, by Kerem Gurol]<br />
* [https://sites.google.com/view/bioinformaticsprojectjustin/references You discovered an antibody, now what?, by Justin Lerma]<br />
* [https://sites.google.com/view/bch394p-project/home Predicting ISGylation Sites with Machine Learning Models, Xu Zhao]<br />
--><br />
<br />
<!--<br />
'''April 16, 2024 - Synthetic Biology, highly compressed'''<br />
* '''Reminder: All projects are due by 10 PM, April 17'''. Turn them in as a URL to the web site you created, sent by email to the TA AND PROFESSOR. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_SyntheticBio_Spring2024.pdf Today's slides]<br />
A collection of further reading, if you're so inclined:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MinimalMycoplasma-2016.pdf Minimal Mycoplasma]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GenomeTransplantation.pdf Genome Transplantation]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/JCVI-1.0.pdf JCVI-1.0]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/OneStepAssemblyInYeast.pdf One step genome assembly in yeast]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/StrainsFromYeastGenomicClones.pdf New cells from yeast genomic clones]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.pdf A new cell from a chemically synthesized genome], [http://www.marcottelab.org/users/BCH394P_364C_2024/NewCellFromChemicalGenome.SOM.pdf SOM]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSynthCsome.pdf 1/2 a synthetic yeast chromosome] and [http://syntheticyeast.org/ Build-A-Genome]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Science-2014-Annaluru-55-8.pdf Entire synthetic yeast chromosome] <br />
* [http://science.sciencemag.org/content/355/6329/1040.long Sc 2.0, as of 2017], with the [http://science.sciencemag.org/content/355/6329/1038 computational genome design]<br />
* [http://en.wikipedia.org/wiki/Gillespie_algorithm The Gillespie algorithm]<br />
* [https://www.igem.org/Main_Page iGEM], and an example part ([http://parts.igem.org/Featured_Parts:Light_Sensor the light sensor])<br />
* [http://www.popsci.com/diy/article/2013-08/grow-photo Take your own coliroids]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/repressilator.pdf The infamous repressilator]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BacterialPhotography.pdf Bacterial photography], and [http://www.marcottelab.org/users/BIO337_2014/UTiGEM2012.pdf UT's 2012 iGEM entry]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EdgeDetector.pdf Edge detector]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt.2510.pdf A nice example of digital logic]<br />
[https://colossal.com/ Food for thought]<br />
--><br />
<br />
<!--<br />
'''April 11, 2024 - Orthologs and Phenologs'''<br />
* '''Remember: The final project web page is due by 10PM April 17, 2024, turned in as a URL emailed to the TA+Professor. Please indicate in the email if you are willing to let us post the project to the course web site. Also, note that ''late days can't be used for the final project'' ''' <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Phenologs_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/paper-pdfs/PNAS_Phenologs_2010.pdf Phenologs] and the [http://www.marcottelab.org/paper-pdfs/PLoSBiology_TBZ_2012.pdf drug discovery story] we'll discuss in class. This is a fun example of the power of opportunistic data mining aka [http://researchparasite.com/ "research parasitism"] in biomedical research.<br />
* Search for phenologs [http://www.phenologs.org/ here]. You can get started by rediscovering the plant model of Waardenburg syndrome. Search among the known diseases for "Waardenburg", or enter the human genes linked to Waardenburg (Entrez gene IDs 4286, 5077, 6591, 7299) to get a feel for how this works.<br />
Tools for finding orthologs:<br><br />
* One good tool for discovering orthologs is [https://inparanoidb.sbc.su.se/ InParanoid]. Note: InParanoid annotation lags a bit, so you'll need to find the [http://www.ensembl.org/index.html Ensembl] protein id, or try a text search for the common name. Or, just link there from [http://www.uniprot.org/ Uniprot]. InParanoid tends towards higher recall, lower precision for finding orthologs. Approaches with higher precision include [http://omabrowser.org/oma/home/ OMA] (introduced in [http://www.marcottelab.org/users/BCH394P_364C_2024/OMA.pdf this paper]), [http://phylomedb.org/ PhylomeDB], and [http://eggnogdb.embl.de/#/app/home EggNOG]. The various algorithms basically have different trade-offs with regard to precision vs recall, and ease of use. For example, we use EggNOG in the lab for annotating genes in new genomes/transcriptomes because the EggNOG HMM ortholog models are easily downloadable/re-run on any set of genes you happen to be interested in.<br />
* All (well, at least some) of [http://www.marcottelab.org/users/BCH394P_364C_2024/Sonnhammer2002TiG.pdf your ortholog definition questions answered!]<br />
--><br />
<br />
<!--<br />
'''Apr 11, 2024 - Deep learning'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=AOYsDhsAAAAJ&view_op=list_works&sortby=pubdate Dr. Claire McWhite], who is a Lewis-Sigler Fellow at Princeton where she develops protein language models using deep learning. She previously completed her B.S. at Rice University, interned at the National Cancer Institute, earned her Ph.D. at UT Austin working extensively in computational biology and proteomics, and appeared as a contestant in [http://bahfest.com/houston2017/ BahFest].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/ClaireMcWhite-BCH394p-364c_2024.pdf Today's slides] <br />
* [https://www.youtube.com/watch?v=CfAL_cL3SGQ Why neural networks aren't neural networks]<br />
--><br />
<br />
<!--<br />
'''Apr 9, 2024 - Networks'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Networks_Spring2024.pdf Today's slides]<br />
* Metabolic networks: [https://web.expasy.org/pathways/ The wall chart] (it's interactive. For example, can you find enolase?), the [https://metabolicatlas.org/ human metabolic reaction network], a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/ChIP-profiling-review.pdf mapping transcriptional networks by ChIP-seq] (the current record in this regard is probably held by [https://www.encodeproject.org/ ENCODE]), and a review of [http://www.marcottelab.org/users/BCH394P_364C_2024/PPIsAndDiseaseReview.pdf protein interaction mapping in humans] and how it is informing disease genetics.<br />
* Useful gene network resources include:<br />
** [http://www.reactome.org/ Reactome], which we've seen before, links human genes according to reactions and pathways, and also calculates functional linkages from various high-throughput data.<br />
** [https://www.inetbio.org/humannet/ HumanNet] (older versions for other organisms at [https://netbiolab.org/w/Software netbiolab.org] and [http://www.functionalnet.org FunctionalNet]), which provides interactive searches of a human functional gene network. The earlier versions helped my own group find genes for a wide variety of biological processes. <br />
** [http://string-db.org/ STRING] is available for many organisms, including large numbers of prokaryotes. Try searching on the <i>E. coli</i> enolase (Eno) as an example.<br />
** [http://www.genemania.org/ GeneMania], which aggregates many individual gene networks.<br />
** The best interactive tool for network visualization is [http://www.cytoscape.org/ Cytoscape]. You can download and install it locally on your computer, then visualize and annotate any gene network, such as those output by the network tools linked above. There is also a web-based network viewer that can be incorporated into your own pages (e.g., as used in [http://www.inetbio.org/yeastnet/ YeastNet]). Here's an example file to visualize, the [http://humap2.proteincomplexes.org/static/downloads/humap2/humap2_protein_complex_map_20200821.cys human protein complex map] from [http://humap2.proteincomplexes.org/ Hu.MAP2].<br />
** Clustering algorithms can be applied to networks. For example, we frequently use the [http://www.marcottelab.org/users/BCH394P_364C_2024/WalktrapAlgorithm.pdf Walktrap algorithm] developed by Pascal Pons and Matthieu Latapy, which is available in the Python iGraph library. Here's [https://towardsdatascience.com/detecting-communities-in-a-language-co-occurrence-network-f6d9dfc70bab a nice blog demonstration] using it.<br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/YeastSGA-2016.pdf The Yeast SGA map]<br />
* [http://www.marcottelab.org/paper-pdfs/Cell_PlantComplexes_2020.pdf The pan-plant PPI map]<br />
* [http://www.marcottelab.org/paper-pdfs/ng-fraser-review.pdf Functional networks]<br />
* [http://www.marcottelab.org/paper-pdfs/JProteomics_GBAReview_2010.pdf Review of predicting gene function and phenotype from protein networks]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-NetworkVisualization.pdf Primer on visualizing networks]<br />
--><br />
<br />
<!--<br />
'''Apr 4, 2024 - Principal Component Analysis (& the curious case of European genotypes)'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_PCA_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EuropeanGenesPCA.pdf European men, their genomes, and their geography]<br />
* [http://projector.tensorflow.org/ The tSNE interactive visualization tool also performs PCA]<br />
* Relevant to today's lecture for his eponymous distance measure: [http://en.wikipedia.org/wiki/Prasanta_Chandra_Mahalanobis Mahalanobis]<br />
A smattering of links on PCA:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBT_primer_PCA.pdf NBT Primer on PCA]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/PrincipalComponentAnalysis.docx A PCA overview (.docx format)] & the [http://horicky.blogspot.com/2009/11/principal-component-analysis.html original post]<br />
* Science Signaling (more specifically, Neil R. Clark and Avi Ma’ayan!) had a nice introduction to PCA that I've reposted [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToPCA.pdf here] (with [http://www.marcottelab.org/users/BCH394P_364C_2024/2001967Slides-FINAL.ppt slides])<br />
* Python code for [http://sebastianraschka.com/Articles/2015_pca_in_3_steps.html performing PCA yourself]. This example gives a great intro to several important numerical/statistical/data mining packages in Python, including pandas and numpy.<br />
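If you'd like to see the core calculation without any libraries, here is a minimal sketch of PCA for 2D data: center the points, build the covariance matrix, and take its eigenvectors as the principal axes. The function name <code>pca_2d</code> and the toy data are made up for illustration, and the closed-form eigenvalue step only works for the 2x2 case; for real datasets you'd use numpy or scikit-learn as in the tutorial linked above.<br />

```python
import math

def pca_2d(points):
    """PCA for 2D points: returns (variances, axes), largest variance first."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] of the centered data
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix, in closed form
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    gap = math.sqrt(max(half_trace ** 2 - det, 0.0))
    lam1, lam2 = half_trace + gap, half_trace - gap
    # Eigenvector belonging to the larger eigenvalue (first principal axis)
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    elif sxx >= syy:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # In 2D the second axis is simply perpendicular to the first
    pc2 = (-pc1[1], pc1[0])
    return (lam1, lam2), (pc1, pc2)

# Hypothetical toy data lying exactly on the line y = 2x
variances, axes = pca_2d([(1, 2), (2, 4), (3, 6), (4, 8)])
print("variance along each axis:", variances)
print("first principal axis:", axes[0])
```

The fraction of variance explained by the first component is <code>variances[0] / sum(variances)</code>, which is the quantity usually shown in a scree plot.<br />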
--><br />
<br />
<!--<br />
'''Apr 2, 2024 - Classifiers'''<br />
* [https://twitter.com/JedMSP/status/1247920130941538304 A topical tSNE visualization]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_Classifiers_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/MachineLearningReview.pdf A nice review explaining Support Vector Machines and k-NN classifiers]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/AMLALLclassification.pdf Classifying leukemias], and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6036716/ a 2018 review] and [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8000474/ 2021 review] of how that field has led to commercial cancer diagnostics, such as the Prosigna breast cancer diagnostic. If you're curious, the authors of the AMLALL classification paper [http://www.marcottelab.org/users/BCH394P_364C_2024/LanderGolubPatentOnExpressionClassification.pdf patented their approach]<br />
* For those of you interested in trying out classifiers on your own, here's the best stand-alone open software for do-it-yourself classifiers and data mining: [http://www.cs.waikato.ac.nz/ml/weka/ Weka]. There is a great introduction to using Weka in this book chapter [http://link.springer.com/protocol/10.1007/978-1-4939-3578-9_17 Introducing Machine Learning Concepts with WEKA], as well as the very accessible Weka-produced book [http://www.cs.waikato.ac.nz/ml/weka/book.html Data Mining: Practical Machine Learning Tools and Techniques].<br />
* & to do this directly in Python, there's a really excellent library of simple, easy-to-use, classification, regression, machine learning and data mining tools called [https://scikit-learn.org/stable/ scikit-learn]. I highly recommend using scikit-learn in combination with the [https://pandas.pydata.org/ pandas library], which makes it easy to work with large, tabular datasets. Here's [https://www.youtube.com/watch?v=PcvsOaixUh8 a helpful pandas tutorial] to get you started.<br />
--><br />
<br />
<!--<br />
'''Mar 28, 2024 - Proteomics'''<br />
* Guest speaker: [https://scholar.google.com/citations?hl=en&user=vnlxkVwAAAAJ&view_op=list_works Dr. Peter Faull], who earned his Ph.D. at the University of Edinburgh and subsequently served as Head of Proteomics at the MRC UK Clinical Sciences Centre and as a senior lab research scientist at the Francis Crick Institute in London before joining us at UT, where he now serves as Principal Proteomics Scientist in the [https://research.utexas.edu/cbrs/cores/bms/ UT Biological Mass Spectrometry core].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/IntroToProteomics2-03-24-2024.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Mar 26, 2024 - 3D Protein Structure Modeling'''<br />
* '''Reminder: Your project topic is due today, and Problem Set #3 is due tomorrow.'''<br />
* Guest speaker: [https://sites.cns.utexas.edu/zhanglab/bio Prof. Y. Jessie Zhang], an expert on RNA polymerase, its post-translational modifications, and their effects on eukaryotic transcription. She combines experimental structure determination by X-ray crystallography with computational structure prediction using techniques like AlphaFold, and will talk about protein 3D structure modeling and prediction.<br />
* 3D macromolecular structural modeling software: [https://www.cgl.ucsf.edu/chimerax/ UCSF ChimeraX], the [https://www.rosettacommons.org/software Rosetta] software suite, and [http://www.marcottelab.org/users/BCH394P_364C_2024/RosettaReview.pdf an overview] of what it can do for you, and last but not least: [https://alphafold.ebi.ac.uk/ AlphaFold predicted structures] and the [https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb AlphaFold colab] where you can run your own structure predictions.<br />
* & a few other useful 3D structure tools: The [http://www.rcsb.org/ Protein Data Bank], [https://salilab.org/modeller/ MODELLER], and [http://www.pymol.org/ Pymol]<br />
--><br />
<br />
<!--<br />
'''Mar 21, 2024 - Clustering II'''<br />
* We'll be continuing the slides from last time<br />
* I'm also posting the next (last) problem set:<br />
'''[http://www.marcottelab.org/users/BCH394P_364C_2024/ProblemSet3_2024.pdf Problem Set 3], due before 10PM Mar. 22, 2024'''. You will need the following software and datasets:<br><br />
* The clustering software is available [https://software.broadinstitute.org/morpheus/ here]. There is an alternative package [http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm here] that you can download and install on your local computer if you prefer.<br> <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteins.fasta Amino acid sequences of 1832 human proteins]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsPhyloprofiles.txt Human protein phylogenetic profiles]. These data come from [http://www.marcottelab.org/users/BCH394P_364C_2024/CiliaPhyloProfiles.pdf this paper].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/1832HumanProteinsCFMS.txt Human protein co-fractionation/mass spectrometry profiles]. These data come from [http://www.marcottelab.org/paper-pdfs/Nature_AnimalComplexes_2015.pdf this paper].<br />
Reading:<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nature_review_2000.pdf Review of phylogenetic profiles]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FuzzyK-Means.pdf Fuzzy k-means]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SOM-geneexpression.pdf SOM gene expression]<br />
** Links to various applications of SOMs: [http://en.wikipedia.org/wiki/Self-organizing_map 1], [http://vizier.u-strasbg.fr/kohonen.htx 2], [http://wn.com/Self_Organizing_Maps_Application 3]. You can run SOM clustering with the [http://bonsai.hgc.jp/~mdehoon/software/cluster Open Source Clustering package] with the '-s' option, or GUI option (here's the [http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/SOM.html#SOM manual]). (FYI, it also supports PCA). If you are not happy with Cluster's SOM function, the statistical package R also provides a package for calculating SOMs (http://cran.r-project.org/web/packages/som/index.html). <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/tSNE.pdf t-SNE] and [https://umap-learn.readthedocs.io/en/latest/how_umap_works.html UMAP]<br />
** Links to various applications of t-SNE: [https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding 1], [http://lvdmaaten.github.io/tsne/ 2], [https://www.youtube.com/watch?v=RJVL80Gg3lA 3], [http://distill.pub/2016/misread-tsne/ 4]. You can run t-SNE and UMAP on the [http://projector.tensorflow.org/ following web site]. <br />
--><br />
<br />
<!--<br />
'''Mar 19, 2024 - Functional Genomics & Data Mining - Clustering I'''<br />
* '''Due March 21 by email to the TA+Instructor''' - One to two (full) paragraphs describing your plans for a final project, along with the names of your collaborators. Please limit groups to no more than 3 people. It's also fine to do this independently, if you prefer. (Do you have a particular skill/interest/exciting dataset you need help analyzing? There is a class_projects channel on the slack where you can ask around for partners.) This assignment (planning out your project) will account for 5 points out of your 25 total points for your course project. Here are a few examples of final projects from previous years: [https://sites.google.com/view/bioinformaticsproject/introduction-and-goals?authuser=0 1] [https://sites.google.com/view/bch394ssy/home 2] [https://sites.google.com/view/bch394p-project/home 3] [https://sites.google.com/site/modelingpyrosequencingerror/ 4] [http://sites.google.com/site/pathtarandmore/ 5] [http://sites.google.com/site/zlutexas/Home/project-for-ch391l 6] [https://sites.google.com/view/subcellularloc/projects 7] [https://sites.google.com/utexas.edu/voigt-final-project/home?authuser=0 8] [https://sites.google.com/site/ch391lchipseq/ 9] [https://sites.google.com/utexas.edu/oishika-das-bioinformatics-pro/home 10] [https://sites.google.com/site/biogridviewer/home 11] [https://sites.google.com/a/utexas.edu/immunoglobulin-team/home 12] [https://metabolicnetworkpathways.wordpress.com/ 13] [https://sites.google.com/a/utexas.edu/quantum-tunneling-on-enzymatic-kinetics/home 14]<br> <br />
* Science news of the day: [https://www.cell.com/cell/fulltext/S0092-8674(23)00107-1 The genome of Antarctic krill (the crustacean E. superba) has been sequenced] and is crazy. It's 48 Gb in size, so 15x the human genome (!), one of the largest genomes ever assembled. And >92% of that is repetitive DNA. Solved with a combination of short and long read DNA sequencing.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P_364C_LargeScaleExperiments_Spring2024.pdf Today's slides]<br />
Reading:<br><br />
* [http://en.wikipedia.org/wiki/Cluster_analysis Clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-MicroarrayClustering.pdf Primer on clustering]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/K-means-Example.ppt K-means example (.ppt)]<br />
* Here's [https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa a nice explanation] of some of the various distance measures used for clustering<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Bcelllymphoma.pdf B cell lymphomas]<br />
* [http://en.wikipedia.org/wiki/RNA-Seq RNA-Seq]<br />
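Since the choice of distance measure drives what a clustering algorithm considers "similar", here is a small sketch comparing three common choices on expression-like vectors. The function names and the example profiles are hypothetical, and the correlation distance is defined here as 1 minus the Pearson correlation, which is one common convention.<br />

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson_distance(a, b):
    """1 minus the Pearson correlation: 0 = same shape, 2 = perfectly anti-correlated."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Two hypothetical expression profiles with the same shape but different magnitude
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [10.0, 20.0, 30.0, 40.0]
print("Euclidean:", euclidean(gene_1, gene_2))
print("Manhattan:", manhattan(gene_1, gene_2))
print("Pearson distance:", pearson_distance(gene_1, gene_2))
```

The two profiles are far apart by Euclidean or Manhattan distance but essentially identical by correlation distance, which is why correlation-based measures are popular when the shape of an expression profile matters more than its absolute level.<br />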
--><br />
<br />
<!--<br />
'''Mar 12,14, 2024 - SPRING BREAK'''<br />
* Don't forget to turn in the proposal for your course project by '''March 21st''' and finish Problem Set 3 by '''March 22nd'''.<br />
--><br />
<br />
<!--<br />
'''Mar 7, 2024 - Motifs'''<br />
* We'll talk about motif finding today. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Motifs_Spring2024.pdf Today's slides]<br />
* Wordle as an excuse to learn about [https://www.youtube.com/watch?v=v68zYyaEmEA information theory & entropy] and [https://www.youtube.com/watch?v=OvTriQWQvUg sequence logos and motifs]!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0406-423-primer-whataremotifs.pdf NBT Primer - What are motifs?]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/nbt0806-959-primer-howdoesmotifdiscoverywork.pdf NBT Primer - How does motif discovery work?]<br />
* [http://www.rcsb.org/pdb/explore/explore.do?structureId=1L1M The biochemical basis of a particular motif]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GibbsSampling.pdf Gibbs Sampling]<br />
--><br />
<br />
<!--<br />
'''Mar 5, 2024 - NGS analysis best practices'''<br />
* Homework #3 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10:00PM March 9'''. In past years, we've run into problems with Rosalind timing out before MEME completes, although it usually runs eventually, so be warned that you may have to try it a couple of times. MEME also runs faster using the "zero or one" or "one" occurrence per sequence option, rather than the "any number of repeats" option.<br />
* Guest speaker: [https://www.linkedin.com/in/anna-battenhouse-abba1/ Anna Battenhouse] from the [https://research.utexas.edu/cbrs/ Center for Biomedical Research Support], where she maintains the [https://wikis.utexas.edu/display/RCTFusers Biomedical Research Computing Facility]. <br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/2024-02-NGS_IntroForEdM.pdf Today's slides]<br />
--><br />
<br />
<!--<br />
'''Feb 29, 2024 - Genome Assembly/Mapping II'''<br><br />
* We're finishing up the slides from last time. Note that we give short shrift to read mapping/alignment algorithms, of which there are now [https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Short-Read_Sequence_Alignment a very long list]. Here's an interesting discussion by Lior Pachter of the [https://liorpachter.wordpress.com/2015/11/01/what-is-a-read-mapping/ major developments in that field.]<br />
* Here is [https://web.archive.org/web/20221208084304/http://blog.thegrandlocus.com/2016/07/a-tutorial-on-burrows-wheeler-indexing-methods an excellent explanation (now archived) of how the BWT relates to a suffix tree and enables fast read mapping to a genome]<br />
* If you want a more detailed explanation, the [http://www.marcottelab.org/users/BCH394P_364C_2024/BWApaper.pdf BWA paper] more formally describes how the Burrows–Wheeler transform can be used to construct an index.<br />
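To build some intuition for the transform itself, here is a minimal sketch of the Burrows–Wheeler transform and its inverse using sorted rotations. This is the naive textbook construction, not how BWA or other read mappers actually build their indexes (they use suffix arrays and far more memory-efficient methods); the function names are made up for illustration, and the input is assumed not to contain the "$" terminator character.<br />

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs)."""
    text = text + "$"  # unique terminator that sorts before A, C, G, T and letters
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed):
    """Rebuild the original text by repeatedly prepending the BWT column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The row ending with the terminator is the original text
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]

print(bwt("banana"))  # prints annb$aa
print(inverse_bwt(bwt("GATTACA")))  # prints GATTACA
```

Notice how the transform groups identical characters together (the run of a's and n's in the banana example). That same last column, combined with a table of character counts, is what lets a mapper search for a read one base at a time from its end.<br />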
Supporting reading:<br><br />
* Two notable advances in genome assembly: [http://www.marcottelab.org/users/BCH394P_364C_2024/StringGraphAssembly.pdf String Graphs] and more recently, [http://www.marcottelab.org/users/BCH394P_364C_2024/MultiplexDeBruijnGraphs.pdf multiplexed De Bruijn graphs]. Both have been used to assemble a [http://www.marcottelab.org/users/BCH394P_364C_2024/CompleteHumanGenomeSequence.pdf fully complete human genome sequence] (check out the [https://www.biorxiv.org/content/biorxiv/early/2021/05/27/2021.05.26.445798/F2.large.jpg?width=800&height=600&carousel=1 beautiful string graph visualizations] of the final assemblies, which capture gapless telomere-to-telomere assemblies for all 22 human autosomes and Chromosome X)<br />
--><br />
<br />
<!--<br />
'''Feb 27, 2024 - Genome Assembly'''<br />
* Science news of the day: [https://www.cell.com/molecular-cell/fulltext/S1097-2765(23)00075-8 New evidence for very short human ORFs coding for real microproteins & peptides]<br />
* & [https://twitter.com/simocristea/status/1626304239931912192?t=mH-gk3V7PLd7mvyZAgKzRw&s=03 A compilation of advances in the last 2 years on deep learning protein structure prediction]<br />
* Relevant to the last lecture, some definitions of [https://en.wikipedia.org/wiki/Sensitivity_and_specificity sensitivity/specificity] & [https://en.wikipedia.org/wiki/Precision_and_recall precision/recall]. Note that the gene finding community settled early on a different definition of specificity that corresponds to the precision or PPV in other fields. Other fields define specificity as the true negative rate.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GenomeAssembly_Spring2024.pdf Today's slides]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnPrimer.pdf DeBruijn Primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/DeBruijnSupplement.pdf Supplement]<br />
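To make the sensitivity/specificity/precision definitions above concrete, here is a short sketch that computes all three from the counts in a confusion matrix. The function name and the example counts are hypothetical; "specificity" here is the true negative rate, so the gene-finding usage of the word corresponds to the value labeled "precision".<br />

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics from true/false positive and negative counts."""
    return {
        # Fraction of real positives that were found (also called recall)
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly rejected (true negative rate)
        "specificity": tn / (tn + fp),
        # Fraction of predictions that are correct (PPV; gene finders' "specificity")
        "precision": tp / (tp + fp),
    }

# Hypothetical gene finder: 100 real exons among 1000 candidate regions
metrics = classification_metrics(tp=80, fp=10, tn=890, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With many more negatives than positives, as in this example, the true negative rate stays high almost regardless of performance, which is one reason precision is often the more informative number for gene prediction.<br />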
--><br />
<br />
<!--<br />
'''Feb 26, 2024''' - Apologies, no office hours today. Feel free to reach out by email or attend the TA office hours this week.<br />
--><br />
<br />
<!--<br />
'''PROBLEM SET #2 ANNOUNCEMENT'''<br />
* If you would like a few examples of proteins annotated with their transmembrane and soluble regions (according to UniProt) to help troubleshoot your homework, here are some [http://www.marcottelab.org/images/5/5a/Annotated_peptides.txt example yeast protein sequences].<br />
--><br />
<br />
<!--<br />
'''Feb 22, 2024 - Gene finding II'''<br />
* [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ Short classes at UT] start this week in genome sequencing, proteomics, and bioinformatics<br />
* Several of you have asked about programming the Viterbi algorithm for the homework, so I wanted to make sure everyone realized that you are not required to program it. The sequence is short enough that you can solve it in a spreadsheet if that's easier for you.<br />
* We're finishing up the slides from last time.<br />
Reading:<br><br />
* Reposting this so it doesn't fall through the cracks: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf The current state of gene annotation]<br />
* [https://news.usc.edu/16163/he-s-got-algorithm/ Why do we call it the Viterbi algorithm?]<br />
--><br />
<br />
<!--<br />
'''Feb 20, 2024 - Gene finding'''<br />
* Happy Valentine's Day!<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-GeneFinding-Spring2024.pdf Today's slides on gene finding] <br />
* A nice commentary on gene finding: [http://www.marcottelab.org/users/BCH394P_364C_2024/2019StateOfGeneAnnotation.pdf Next-generation genome annotation: we still struggle to get it right]<br />
* For a few more examples of HMMs in action, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/MinionHumanGenome.pdf paper on sequencing the human genome by nanopore], which used HMMs in 3-4 different ways for polishing, contig inspection, repeat analysis and 5-methylcytosine detection.<br />
* [http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr11%3A61755389-61788517&hgsid=477602291_ccTRfcOcZIQHnMkBKGzbQLBRc6HL The UCSC genome browser]<br />
* A few useful links about programming: [http://www.marcottelab.org/users/BCH394P_364C_2024/GoodEnoughPracticesInScientificComputing.pdf Recommendations for "good enough" programming habits] and a great [https://www.youtube.com/playlist?list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7 Python beginners Youtube tutorial]<br />
Reading (a couple of old classics + a review + better splice site detection):<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EukGeneAnnotation.pdf Eukaryotic gene finding], [http://www.marcottelab.org/users/BCH394P_364C_2024/GeneMark.hmm.pdf GeneMark.hmm], and [http://www.marcottelab.org/users/BCH394P_364C_2024/BurgeKarlin-main.pdf GENSCAN]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/SplicingAI-jaganathan2019.pdf Deep learning for splice site identification]<br />
--><br />
<br />
<!--<br />
'''Feb 15, 2024 - HMMs II'''<br />
* Science news of the day: [https://doi.org/10.1101/2024.01.24.525373 a fun preprint] illustrating the scale of efforts to identify protein families. This one clustered "19 billion sequences in 18 days on 27 high performance computing nodes, using 250,000 CPU hours in total". In all, they found 544 million sequence families (clusters) capturing ~94% of all known proteins, giving a sense of the overall size of the universe of proteins.<br />
'''Problem Set 2, due before 10 PM, Feb. 20, 2024''':<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_ProblemSet2_Spring2024.pdf '''Problem Set 2''']. <br />
* You'll need these 3 files: [http://www.marcottelab.org/users/BCH394P_364C_2024/state_sequences State sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/soluble_sequences Soluble sequences], [http://www.marcottelab.org/users/BCH394P_364C_2024/transmembrane_sequences Transmembrane sequences]<br />
* Link to [http://setosa.io/blog/2014/07/26/markov-chains/ a great interactive visualization of Markov chains], by Victor Powell & Lewis Lehe. It's worth checking out to build some intuition. They correctly point out that [https://en.wikipedia.org/wiki/PageRank Google's PageRank algorithm] is based on Markov chains. There, the ranking of pages in a web search relates to how random walks across linked web pages spend more time on some pages than on others.<br />
* A non-biological example of using log odds ratios & Bayesian stats [https://priceonomics.com/how-statistics-solved-a-175-year-old-mystery-about/ to learn the authors of the Federalist Papers]. In a related example, [https://arstechnica.com/science/2024/02/lost-and-found-code-breakers-decipher-50-letters-of-mary-queen-of-scots/ researchers just decoded >50 coded letters from a French archive] and discovered they were lost correspondence from Mary, Queen of Scots, before she was executed in 1587 for treason against Elizabeth I. The researchers used an approach closely related to computing log odds ratios of 5-mer frequencies between putative decoded texts and known free text to figure out the correct ciphers. If you're curious, you can read about it in [https://www.tandfonline.com/doi/full/10.1080/01611194.2022.2160677 Appendix A of their paper]<br />
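To illustrate the random-walk idea behind the Markov chain and PageRank links above, here is a minimal sketch that finds where a random walker spends its time on a tiny, made-up web of three pages. It repeatedly applies the transition matrix to a starting distribution until it settles down. This is a simplified illustration, not Google's actual algorithm (real PageRank adds a damping factor, among other things), and the page names, links, and function name are all hypothetical.<br />

```python
def stationary_distribution(transitions, n_steps=200):
    """Long-run fraction of time a random walk spends in each state.

    transitions[i][j] is the probability of moving from state i to state j,
    so each row must sum to 1.
    """
    n = len(transitions)
    dist = [1.0 / n] * n  # start with the walker equally likely to be anywhere
    for _ in range(n_steps):
        # One step of the chain: new_dist[j] = sum over i of dist[i] * P[i][j]
        dist = [sum(dist[i] * transitions[i][j] for i in range(n)) for j in range(n)]
    return dist

# Hypothetical web: page A links to B and C, B links to C, C links back to A
pages = ["A", "B", "C"]
links = [
    [0.0, 0.5, 0.5],  # from A
    [0.0, 0.0, 1.0],  # from B
    [1.0, 0.0, 0.0],  # from C
]
for page, share in zip(pages, stationary_distribution(links)):
    print(f"page {page}: {share:.3f}")
```

Pages A and C each end up with about 40% of the walker's time and page B with about 20%, so B would rank lowest in this toy example.<br />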
--><br />
<br />
<!--<br />
'''Feb 13, 2024 - Hidden Markov Models'''<br />
* Don't forget: Rosalind Homework #2 (worth 10% of your final course grade) is '''due by 10 PM February 8'''. Note: choose one of the two protein translation problems and see the update below on the IUPAC code example.<br />
* More stats for comp biologists worth checking out: [https://www.huber.embl.de/msmb/ Modern Statistics for Modern Biology], by Susan Holmes and Wolfgang Huber. It's currently available online and [https://www.amazon.com/Modern-Statistics-Biology-Susan-Holmes/dp/1108705294/ available on dead tree]. (FYI, all code is in R.)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-HMMs-Spring2024.pdf Today's slides]<br><br />
Reading:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-HMMs.pdf HMM primer] and [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-Bayes.pdf Bayesian statistics primer #1], [http://www.marcottelab.org/users/BCH394P_364C_2024/BayesPrimer-NatMethods.pdf Bayesian statistics primer #2], [http://en.wikipedia.org/wiki/Bayes'_theorem Wiki Bayes]<br />
* Care to practice your [http://en.wikipedia.org/wiki/Regular_expression regular expressions]? (In [https://www.tutorialspoint.com/python3/python_reg_expressions.htm python?] & a [https://www.pcwdld.com/python-regex-cheat-sheet Python regexp cheat sheet])<br />
--><br />
<br />
<!--<br />
'''ROSALIND ANNOUNCEMENT'''<br />
* It looks like some people are struggling with the Rosalind problem titled ''Protein Translation''. As an alternative option, I've assigned a problem titled ''Translating RNA into Protein''. Choose one; you'll get credit regardless of which of them you do. Also, it looks like the problem titled "Complementing a Strand of DNA" uses a now out-of-date call for IUPAC codes in the Programming Shortcut. Just delete the "from Bio.Alphabet import IUPAC" line & delete the ", IUPAC.unambiguous_dna" portion of the Seq() functions and it should work fine.<br />
--><br />
<br />
<!--<br />
'''Feb 8, 2024 - We'll have a guest lecture from your TA Matt McGuffie on advancing your Python data analysis skills'''<br />
* '''WEATHER WARNING #2: Change of plans!''' UT has now officially canceled in-person classes, but more to the point, >100,000 people have lost power in Austin today. We're going to cancel the live zoom class tomorrow, and Matt will instead record the lecture and upload it to Canvas for viewing.<br />
* Matt is an expert in the bioinformatic analyses of plasmid sequences and developed the popular [http://plannotate.barricklab.org/ pLannotate tool] to annotate and visualize plasmid features, based on a large database of genetic parts and protein sequences. Funny enough, he first described an early version of pLannotate as his project for this class back in 2019. He'll be introducing several useful Python libraries, including the Pandas package for handling large tables and a data visualization library for plotting data.<br />
--><br />
<br />
<!--<br />
'''Feb 6, 2024 - Biological databases'''<br />
* WEATHER WARNING: UT just announced a campus closure for the morning, so for those of you that are able to attend online, I'll plan to hold it at the normal time on the class zoom channel (link available on Canvas). However, for those that can't make it, don't stress! We'll record the lecture and post the video to Canvas so that you can watch it later. Note: the next Rosalind homework is assigned below.<br />
* Science news of the day: [https://www.theguardian.com/science/2024/jan/26/science-journals-ban-listing-of-chatgpt-as-co-author-on-papers Cell, Nature, Science, eLife, and the Lancet ban listing ChatGPT as a co-author]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BiologicalDatabases-Spring2024.pdf Today's slides]<br><br />
Homework #2 (worth 10% of your final course grade) has been assigned on Rosalind and is '''due by 10 PM February 8''':<br />
* Besides giving a bit more programming experience, these questions will also introduce you to the [https://biopython.org/ BioPython] Python library (see the "programming shortcuts" at the bottom of several questions). If you need to install BioPython on your computer, open an Anaconda prompt window (on a PC) or launch a console window from the Anaconda Navigator & type "pip install biopython". (You can use this approach to install most Python libraries.) There's a very useful tutorial [http://biopython.org/DIST/docs/tutorial/Tutorial.html here] (also downloadable as a [http://biopython.org/DIST/docs/tutorial/Tutorial.pdf pdf file])<br />
Extra reading/classes:<br><br />
* Just a note that we'll be seeing ever more statistics as we go on. Here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/StatisticsPrimer.pdf good primer] from [https://stat.utexas.edu/people/lauren-ancel-meyers Prof. Lauren Ancel Meyers] (who leads the [https://covid-19.tacc.utexas.edu/ UT Austin COVID-19 Modeling Consortium]) to refresh/explain basic concepts.<br />
* Finally, here's a great opportunity to hone your Python skills a bit more: The UT CBRS cores will offer [https://research.utexas.edu/cbrs/classes/short-courses/spring-2024-semester/ short courses] in Python, Unix, and Python for Data Sciences starting in March.<br />
--><br />
<br />
<!--<br />
'''Feb 1, 2024 - BLAST'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-BLAST-Spring2024.pdf Our slides today] are modified from a paper on [http://dx.doi.org/10.1371/journal.pbio.1001014 Teaching BLAST] by Cheryl Kerfeld & Kathleen Scott.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLAST.pdf The original BLAST paper]<br />
* [http://www.marcottelab.org/paper-pdfs/jmb-lgl.pdf The protein homology graph paper]. Just for fun, here's a [http://www.marcottelab.org/users/BCH394P_364C_2024/PHGinMoMA.png stylized version] of this plot that we exhibited in the engaging [https://www.moma.org/calendar/exhibitions/58 Design and the Elastic Mind] show at New York's Museum of Modern Art, now in their permanent collection.<br />
--><br />
<br />
<!--<br />
'''Jan 30, 2024 - Sequence Alignment II'''<br />
* We'll be finishing up slides from last time. <br />
* '''Problem Set 1 clarification:''' for problems asking for "nucleotide frequencies", turn in the absolute count of each nucleotide (or dinucleotide) as well as the fractions or percentages of the total.<br />
* Science news of the day: We're about 4 years from publication of the SARS-CoV-2 genome papers [https://doi.org/10.1038/s41586-020-2008-3 1] [https://doi.org/10.1038/s41586-020-2012-7 2]. The release of the genome sequences immediately launched the COVID vaccine design process. [https://www.nytimes.com/2022/01/15/health/mrna-vaccine.html Here's a great write-up in the NYT of the story of the vaccine development process], including the McLellan lab's key S2P double proline mutations introduced to stabilize the spike protein. It was just selected as [https://www.uspto.gov/blog/director/entry/recognizing-life-saving-covid-19 one of the winners of the USPTO Patents for Humanity award in the COVID-19 category].<br />
* For those of you who could use more tips on programming, '''the weekly peer-led open coding hour is starting up again'''! Every Monday, 3:30-4:30, in the MBB 2.232 lounge. It's a very informal setting where you can work and ask questions of more experienced programmers.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/FactAndFictionInAlignment.png Fact and Fiction in Sequence Alignments]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-DynamicProgramming.pdf Dynamic programming primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/GALPAS.xls An example of dynamic programming using Excel], created by [https://hoffmanlab.org/ Michael Hoffman] (a former U Texas undergraduate, now U Toronto professor, who took a prior incarnation of this class)<br />
* A few examples of proteins with internally repetitive sequences: [http://www.pdb.org/pdb/explore/explore.do?structureId=1QYY 1], [http://www.pdb.org/pdb/explore/explore.do?structureId=2BEX 2], [http://www.pdb.org/pdb/explore/explore.do?structureId=1BKV 3]<br />
--><br />
<br />
<br />
'''Jan 25, 2024 - Sequence Alignment I'''<br />
* Reminder relevant to our discussion of ChatGPT last class: CNET & other news sources used it to write articles; [https://gizmodo.com/cnet-ai-chatgpt-news-robot-1849996151 this Gizmodo story] found that "the AI-program fabricates information and bungles facts like nobody’s business" and CNET was "forced to issue multiple, major corrections". So, if you do opt to try ChatGPT to help with Python, be sure to check (and then double-check) everything.<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P-Spring2024-SequenceAlignment.pdf Today's slides]<br><br />
Problem Set I, due 10PM Feb. 5, 2024:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH364C-394P_ProblemSet1_Spring2024.pdf Problem Set 1]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Hinfluenzae.txt H. influenzae genome]. [https://en.wikipedia.org/wiki/Haemophilus_influenzae Haemophilus influenzae] was the first free-living organism to have its genome sequenced. '''NOTE: there are some additional characters in this file from ambiguous sequence calls. For simplicity's sake, when calculating your nucleotide and dinucleotide frequencies, you can just ignore anything other than A, C, T, and G.''' Also, if you prefer a .fasta format file (e.g. for BioPython), just add a first line to the text file starting with a ">" character, e.g. "> Hinfluenzae genome file".<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/Taquaticus.txt T. aquaticus genome]. [https://en.wikipedia.org/wiki/Thermus_aquaticus Thermus aquaticus] helped spawn the genomic revolution as the source of heat-stable Taq polymerase for PCR.<br />
* 3 mystery genes (for Problem 5): [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene1.txt MysteryGene1], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene2.txt MysteryGene2], [http://www.marcottelab.org/users/BCH394P_364C_2024/MysteryGene3.txt MysteryGene3]<br><br />
* '''*** HEADS UP FOR THE PROBLEM SET ***''' If you try to use the Python string.count function to count dinucleotides, Python counts '''non-overlapping''' instances, not '''overlapping''' instances. So, ''AAAA'' is counted as 2, not 3, dinucleotides. You want '''overlapping''' dinucleotides instead, so you will have to try something else, such as the Python string[counter:counter+2] slice, as explained in the Rosalind homework assignment on strings.<br />
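To make the overlapping vs. non-overlapping distinction concrete, here's a minimal sketch of the sliding-window idea. The function name count_dinucleotides is just for illustration, and skipping any window that contains a character other than A, C, G, or T is one possible reading of "ignore anything other than A, C, T, and G" (an assumption, not the required approach).<br />

```python
from collections import Counter

def count_dinucleotides(sequence):
    """Count overlapping dinucleotides with a sliding window of width 2."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        # Assumption: skip any window that touches an ambiguous character (e.g. N)
        if pair[0] in "ACGT" and pair[1] in "ACGT":
            counts[pair] += 1
    return counts

print("AAAA".count("AA"))                  # non-overlapping count: 2
print(count_dinucleotides("AAAA")["AA"])   # overlapping count: 3
```

Dividing each count by the sum of all counts then gives the corresponding fractions.<br />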
Extra reading, if you're curious:<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NBTPrimer-BLOSUM.pdf BLOSUM primer]<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM_paper.pdf The original BLOSUM paper] (hot off the presses from 1992!)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BLOSUM62Miscalculations.pdf BLOSUM miscalculations improve performance]<br />
* There is a good discussion of the alignment algorithms and different scoring schemes [http://www.bioinformaticsonline.org/ch/ch03/supp-all.html here]<br />
<br />
<br />
'''Jan 23, 2024 - Intro to Python II'''<br />
* Reminder that today will be part 2 of the "Python boot camp" for those of you with little to no previous Python coding experience. We'll be finishing the slides from last time, plus Rosalind help & programming Q/A.<br />
* *** Rosalind assignments are '''due by 10 PM January 24'''. ***<br />
* We'll talk a bit about [https://chat.openai.com/ ChatGPT] today for co-programming<br />
* Another strong recommendation (really) to the Python newbies to download Eric Matthes's GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] that he provides to accompany his [https://nostarch.com/pythoncrashcourse2e Python Crash Course] book.<br />
<br />
<br />
'''Jan 18, 2024 - Intro to Python'''<br />
* '''Remember that today and the next lecture are dedicated to the Python Boot Camp to start getting those of you with minimal coding skills up to speed on the basics. Advanced programmers can skip class!'''<br><br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-PythonPrimer-Spring2024.pdf Today's slides].<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/EcoliGenome.txt E. coli genome] (formatted as a text file with no extra lines; updated on Jan 23 to be the version matching the slides)<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/NewEcoli_genome.fasta E. coli genome] (formatted as a fasta file, which only differs here in having a header)<br />
* Don't forget that the Rosalind assignments are due by 10 PM January 24. Please do start if you haven't already, or you won't have time to get help if you have any issues installing Python. <br />
* We'll use Python version 3 (any version after 3.0 should be fine; just get the latest version in Anaconda), but Rosalind and some older materials are only available in Python 2.7, so we'll generally try to be version agnostic for compatibility. For beginners, the [http://www.practicepython.org/blog/2017/02/09/python2-and-3.html differences are quite minimal] and are [https://www.guru99.com/python-2-vs-python-3.html summarized in a table here]. There's also a great [https://python-future.org/compatible_idioms.html cheat sheet here] for writing code compatible with both versions.<br />
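As a small illustration of version-agnostic code, here's a sketch that should behave the same way in Python 2.7 and Python 3. The sequence is a made-up stand-in, not one of the class files. It avoids the two differences beginners hit most often: print is called as a function with a single argument, and the division is forced to floating point.<br />

```python
# Made-up stand-in sequence (an assumption for illustration only)
sequence = "ATGCGC"

gc_count = sequence.count("G") + sequence.count("C")

# float() forces true division in Python 2 as well as Python 3
gc_fraction = gc_count / float(len(sequence))

# A single-argument print call with str.format works in both versions
print("GC fraction: {:.3f}".format(gc_fraction))
```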
<br />
<br />
'''Jan 16, 2024 - Introduction'''<br />
* [http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C-IntroAndRosalind-Spring2024.pdf Today's slides]<br><br />
* We'll be conducting homework using the online environment [http://rosalind.info/faq/ Rosalind]. Go ahead and register on the site, and enroll specifically for BCH394P/364C (Spring 2024) Systems Biology/Bioinformatics using [https://rosalind.info/classes/enroll/07025c28e6/ ''this link'']. Homework #1 (worth 10% of your final course grade) has already been assigned on Rosalind and is '''due by 10:00PM January 24'''.<br />
* We'll be using the free Anaconda distribution of Python and Jupyter (download [https://www.anaconda.com/download here]). Note that there are ''many'' other options out there, such as [https://colab.research.google.com/ Google colab]. You're welcome to use those, but we'll restrict our teaching and TA help sessions to Jupyter/Anaconda for simplicity.<br />
Here are some online Python resources that you might find useful:<br />
* First and foremost, and very, very useful if you're a complete Python newbie: Eric Matthes's [https://nostarch.com/pythoncrashcourse2e Python Crash Course book]. He made some GREAT, free [https://github.com/ehmatthes/pcc/releases/download/v1.0.0/beginners_python_cheat_sheet_pcc_all.pdf Python command cheat sheets] to support the book.<br />
* [https://dabeaz-course.github.io/practical-python/ Practical Python], worth checking out!<br />
* If you have any basic experience at all in other programming languages, Google offered an extremely good, 2-day intro course to Python (albeit version 2) that is now [https://www.youtube.com/playlist?list=PLC8825D0450647509 available on YouTube].<br />
* Khan Academy has archived their older intro videos on Python [https://www.youtube.com/user/khanacademy/search?query=python here] (again, version 2)<br><br />
<br />
== Syllabus & course outline ==<br />
<br />
[http://www.marcottelab.org/users/BCH394P_364C_2024/BCH394P-364C_Spring2024_syllabus.pdf Course syllabus]<br />
<br />
An introduction to systems biology and bioinformatics, emphasizing quantitative analysis of high-throughput biological data, and covering typical data, data analysis, and computer algorithms. Topics will include introductory probability and statistics, basics of Python programming, protein and nucleic acid sequence analysis, genome sequencing and assembly, proteomics, synthetic biology, analysis of large-scale gene expression data, data clustering, biological pattern recognition, and gene and protein networks.<br><br />
<br />
Open to graduate students and upper division undergrads (with permission) in natural sciences and engineering.<br />
Prerequisites: Basic familiarity with molecular biology, statistics & computing, but realistically, it is expected that students will have extremely varied backgrounds. Undergraduates have additional prerequisites, as listed in the catalog.<br><br />
<br />
''Note that this is not a course on practical sequence analysis or using web-based tools. Although we will use a number of these to help illustrate points, the focus of the course will be on learning the underlying algorithms, exploratory data analyses, and their applications, esp. in high-throughput biology. By the end of the course, students will know the fundamentals of important algorithms in bioinformatics and systems biology, will be able to design and implement computational studies in biology, and will have performed an element of original computational biology research.''<br><br />
<br />
Most of the lectures will be from research articles and slides posted online, with some material from the...<br><br />
'''Optional text (for sequence analysis):''' [http://www.amazon.com/exec/obidos/ASIN/0521629713/qid=999041246/sr=1-1/ref=sc_b_1/002-0505297-3336044 ''Biological sequence analysis''], by R. Durbin, S. Eddy, A. Krogh, G. Mitchison (Cambridge University Press).<br />
<br />
For biologists rusty on their stats, [http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025/ref=sr_1_1?s=books&ie=UTF8&qid=1295395775&sr=1-1 ''The Cartoon Guide to Statistics''] (Gonick/Smith) is very good. A reasonable online resource for beginners is [http://www.refsmmat.com/statistics/index.html Statistics Done Wrong]. A truly excellent stats book with a free download is [https://www.statlearning.com/ ''An Introduction to Statistical Learning''], by James, Witten, Hastie, Tibshirani, and Taylor, and is accompanied by many supporting Python examples and applications.<br><br />
<br />
Two other online probability & stats references: [http://www-users.york.ac.uk/~mb55/pubs/pbstnote.htm #1], [https://seeing-theory.brown.edu/index.html #2 (which has some lovely visualizations)]<br><br />
<br />
'''No exams will be given. Grades will be based on online homework (counting 30% of the grade), 3 problem sets (given every 2-3 weeks and counting 15% each towards the final grade) and an independent course project (25% of the final grade)''', which can be collaborative (1-3 students/project). The course project will consist of a research project on a bioinformatics topic chosen by the student (with approval by the instructor) containing an element of independent computational biology research (e.g. calculation, programming, database analysis, etc.). This will be turned in as a link to a web page. '''The final project is due by 10 PM, April 17, 2024. The last 3 classes will be spent presenting your projects to each other. (The presentation will account for 5/25 points of the project grade.)'''<br><br />
<br />
If at some point, we have to go into coronavirus lockdown, that portion of the class will be web-based. We will hold lectures by Zoom during the normally scheduled class time. Log in to the UT Canvas class page for the link, or, if you are auditing, email the TA and we will send the link by return email. Slides will be posted before class so you can follow along with the material. We'll record the lectures & post the recordings afterward on Canvas so any of you who might be in other time zones or otherwise be unable to make class will have the opportunity to watch them. Note that the recordings will only be available on Canvas and are reserved only for students in this class for educational purposes and are protected under FERPA. The recordings should not be shared outside the class in any form. Violation of this restriction could lead to Student Misconduct proceedings.<br />
<br />
Online homework will be assigned and evaluated using the free bioinformatics web resource [http://rosalind.info/faq/ Rosalind].<br><br />
<br />
All projects and homework will be turned in electronically and time-stamped. No makeup work will be given. Instead, all students have 5 days of free “late time” (for the '''entire semester''', NOT per project, and counting weekends/holidays). For projects turned in late, days will be deducted from the 5-day total (or what remains of it) by the number of days late (in 1-day increments, rounding up, i.e. 10 minutes late = 1 day deducted). Once the full 5 days have been used up, assignments will be penalized 10 percent per day late (rounding up), i.e., a 50-point assignment turned in 1.5 days late would be penalized 20%, or 10 points.<br><br />
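Here's a minimal sketch of the late-time arithmetic described above. It is illustrative only, not an official calculator: the function name, the choice to apply any remaining free days before penalizing the rest, and the 100% cap are all assumptions.<br />

```python
import math

def late_penalty(days_late, free_days_left, points_possible):
    """Return (points_deducted, free_days_remaining) for one assignment."""
    whole_days = int(math.ceil(days_late))           # 10 minutes late counts as 1 day
    days_covered = min(whole_days, free_days_left)   # use up free late time first
    penalized_days = whole_days - days_covered
    percent = min(10 * penalized_days, 100)          # 10 percent per day (cap is an assumption)
    return points_possible * percent / 100.0, free_days_left - days_covered

# The example from the text: no free days left, 50-point assignment, 1.5 days late
print(late_penalty(1.5, 0, 50))   # (10.0, 0)
```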
<br />
Homework, problem sets, and the project total to a possible 100 points. There will be no curving of grades, nor will grades be rounded up. We’ll use the plus/minus grading system, so: A = 92 and above, A- = 90 to 91.99, etc. Just for clarity's sake, here are the cutoffs for the grades: A ≥ 92%, 90% ≤ A- < 92%, 88% ≤ B+ < 90%, 82% ≤ B < 88%, 80% ≤ B- < 82%, 78% ≤ C+ < 80%, 72% ≤ C < 78%, 70% ≤ C- < 72%, 68% ≤ D+ < 70%, 62% ≤ D < 68%, 60% ≤ D- < 62%, F < 60%.<br />
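The cutoffs above amount to a simple lookup with no rounding. Here's a short sketch; the names GRADE_CUTOFFS and letter_grade are made up for illustration.<br />

```python
# (lower cutoff in percent, letter), listed from highest to lowest
GRADE_CUTOFFS = [
    (92, "A"), (90, "A-"), (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"), (68, "D+"), (62, "D"), (60, "D-"),
]

def letter_grade(percent):
    """Return the letter grade for a course percentage, without rounding up."""
    for cutoff, letter in GRADE_CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(91.99))   # A-, since grades are not rounded up
```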
<br />
Students are welcome to discuss ideas and problems with each other, but '''all programs, Rosalind homework, problem sets, and written solutions should be performed ''independently'' ''' (except for the final collaborative project). Students are expected to follow the UT honor code. '''Cheating, plagiarism, copying, & reuse of prior homework, projects, or ''programs'' from CourseHero, GitHub, or any other sources are all ''strictly forbidden'' and constitute breaches of academic integrity and cause for dismissal with a failing grade, possibly expulsion ([https://deanofstudents.utexas.edu/conduct/academicintegrity.php UT's academic integrity policy]).''' In particular, no materials used in this class, including, but not limited to, lecture hand-outs, videos, assessments (papers, projects, homework assignments), in-class materials, review sheets, and additional problem sets, may be shared online or with anyone outside of the class unless you have the instructor’s explicit, written permission. Any materials found online (e.g. in CourseHero) that are associated with you, or any suspected unauthorized sharing of materials, will be reported to Student Conduct and Academic Integrity in the Office of the Dean of Students. These reports can result in sanctions, including failure in the course.<br />
<br />
The use of artificial intelligence tools (such as ChatGPT or GitHub Copilot) in this class shall be permitted on a limited basis for programming assignments. You are also welcome to seek my prior approval to use AI writing tools on any assignment. In either instance, AI writing tools should be used with caution and proper citation, as the use of AI should be properly attributed. Using AI writing tools without my permission or authorization, or failing to properly cite AI even where permitted, shall constitute a violation of UT Austin’s Institutional Rules on academic integrity.<br />
<br />
'''The final project website is due by 10 PM April 17, 2024'''<br />
<br />
* How to make a website for the final project <br />
** Google Site: https://sites.google.com/new<br />
** You might also consider [https://streamlit.io/ streamlit], which lets you generate websites on the fly direct from Python</div>Marcotte