1. Microarray Data Used in This Study
We integrated all yeast microarray data sets available up to January 2003 that contained at least 8 experiments each: 19 cDNA array data sets from the Stanford Microarray Database (SMD), corresponding to 19 SMD subcategories, and 4 Affymetrix data sets from the NCBI Gene Expression Omnibus (GEO). We also included two cDNA array data sets (11390663, 10657304) and the Rosetta Compendium data generated by Rosetta Inpharmatics Company. With the exception of the Rosetta Compendium, each data set contained a set of expression profiles measured under related and coherent conditions. The Rosetta Compendium includes 300 knockout and chemical treatment experiments. From these, we kept only the experiments in which the deleted gene has a known function, and we further classified this subset into 14 data sets based on the Gene Ontology (GO) biological process categories of the deleted genes. This resulted in a total of 39 data sets comprising 618 expression profiles. The data sets are described in detail below.
# | Publication (PubMed ID) | Platform | Experiments
---|---|---|---
1 | 9843569 | cDNA(SMD) | Alpha factor release |
2 | 9843569 | cDNA(SMD) | cdc15 block release
3 | 11102521 | cDNA(SMD) | DTT Exposure |
4 | 9843569 | cDNA(SMD) | Elutriation |
5 | 9843569 | cDNA(SMD) | Forkhead regulation |
6 | 11102521 | cDNA(SMD) | Gamma radiation |
7 | 11102521 | cDNA(SMD) | Menadione exposure |
8 | 11598186 | cDNA(SMD) | DNA damage (MMS) response |
9 | 11102521 | cDNA(SMD) | Nitrogen depletion |
10 | 11102521 | cDNA(SMD) | Nutrition limitation |
11 | 11102521 | cDNA(SMD) | Osmotic shock |
12 | 11455386 | cDNA(SMD) | SIR proteins (Chromatin Silencing) |
13 | 11102521 | cDNA(SMD) | Sorbitol effects |
14 | 11102521 | cDNA(SMD) | H2O2 response |
15 | 11102521 | cDNA(SMD) | Heat shock |
16 | 11102521 | cDNA(SMD) | Heat steady |
17 | 11206552 | cDNA(SMD) | CellCycle Factor |
18 | 11102521 | cDNA(SMD) | YPD Stationary phase |
19 | 11102521 | cDNA(SMD) | Zinc homeostasis
20 | 12875747 | Affymetrix(GEO) | Aging |
21 | 14555471 | Affymetrix(GEO) | Chitin synthesis |
22 | 12702272 | Affymetrix(GEO) | Fermentation time course |
23 | 12370439 | Affymetrix(GEO) | Ume6 regulon |
24 | 10929718 | Affymetrix(Rosetta Compendium) | Cell cycle control |
25 | 10929718 | Affymetrix(Rosetta Compendium) | Cell wall organization |
26 | 10929718 | Affymetrix(Rosetta Compendium) | Chromatin assembly |
27 | 10929718 | Affymetrix(Rosetta Compendium) | Ion homeostasis |
28 | 10929718 | Affymetrix(Rosetta Compendium) | Nucleotide metabolism |
29 | 10929718 | Affymetrix(Rosetta Compendium) | Organelle biogenesis |
30 | 10929718 | Affymetrix(Rosetta Compendium) | Perception of external stimulus |
31 | 10929718 | Affymetrix(Rosetta Compendium) | Protein biosynthesis |
32 | 10929718 | Affymetrix(Rosetta Compendium) | Protein degradation |
33 | 10929718 | Affymetrix(Rosetta Compendium) | Protein metabolism |
34 | 10929718 | Affymetrix(Rosetta Compendium) | Protein phosphorylation |
35 | 10929718 | Affymetrix(Rosetta Compendium) | Protein transport |
36 | 10929718 | Affymetrix(Rosetta Compendium) | Pseudohyphal growth |
37 | 10929718 | Affymetrix(Rosetta Compendium) | Steroid metabolism |
38 | 11390663 | cDNA(Rosetta Inpharmatics Company) | Pseudohyphal growth |
39 | 10657304 | cDNA(Rosetta Inpharmatics Company) | MAPK pathway |
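The partitioning of the Rosetta Compendium knockout experiments described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the input layout (pairs of experiment ID and deleted gene, plus a gene-to-category dictionary), and the `min_size` parameter are assumptions made for illustration. The sketch also assumes each deleted gene maps to a single GO biological process category, and whether a minimum size was enforced on the resulting subsets is not stated in the text.

```python
from collections import defaultdict


def partition_knockouts(experiments, gene_to_category, min_size=1):
    """Group knockout experiments by the GO category of the deleted gene.

    experiments      : iterable of (experiment_id, deleted_gene) pairs
    gene_to_category : dict mapping a gene to one GO biological process
                       category; genes of unknown function are absent
    min_size         : hypothetical threshold; groups with fewer
                       experiments than this are dropped
    """
    groups = defaultdict(list)
    for experiment_id, gene in experiments:
        category = gene_to_category.get(gene)
        if category is None:
            # The deleted gene has no known function: skip the experiment.
            continue
        groups[category].append(experiment_id)
    # Keep only groups that reach the requested size.
    return {c: ids for c, ids in groups.items() if len(ids) >= min_size}
```

Called with a list of experiments and a category lookup, the function returns one list of experiment IDs per category, which corresponds to one data set per GO category in the table above.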
2. Functional Annotation of Uncharacterized Genes
The large number of functionally homogeneous clusters identified by CODENSE provides a solid foundation for the functional annotation of uncharacterized genes. For each cluster containing unknown genes, we annotated those genes with the dominant functional category of the cluster. To assess the prediction accuracy of our method, we employed a "leave-one-out" approach: we masked one known gene at a time, treated it as unknown, and assigned its function based on the remaining known genes in the cluster.
We have assigned functions to 448 known genes, and achieved a prediction accuracy of 50%.
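The leave-one-out evaluation described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure actually used: it assumes each known gene carries exactly one functional category, takes the dominant category to be the most frequent one among the remaining known genes, and breaks ties by whichever category `Counter.most_common` returns first. The function name and input layout are hypothetical.

```python
from collections import Counter


def leave_one_out_accuracy(clusters, annotations):
    """Estimate annotation accuracy by masking one known gene at a time.

    clusters    : list of clusters, each a list of gene identifiers
    annotations : dict mapping each known gene to a single functional
                  category (a simplifying assumption)
    Returns (number of correct predictions, number of predictions made).
    """
    correct = 0
    total = 0
    for cluster in clusters:
        known = [g for g in cluster if g in annotations]
        for masked in known:
            # Categories of the remaining known genes in the cluster.
            rest = [annotations[g] for g in known if g != masked]
            if not rest:
                continue  # nothing left to vote with
            predicted, _count = Counter(rest).most_common(1)[0]
            total += 1
            if predicted == annotations[masked]:
                correct += 1
    return correct, total
```

Dividing the first returned value by the second gives the accuracy. Genes absent from `annotations` play no role in the vote, which mirrors how uncharacterized genes in a cluster receive a label without contributing to it.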