Vol. 24 no. 17 2008, pages 1957–1958
BIOINFORMATICS APPLICATIONS NOTE
doi:10.1093/bioinformatics/btn357
Gene expression

PathCluster: a framework for gene set-based hierarchical clustering

Tae-Min Kim¹, Seon-Hee Yim², Yong-Bok Jeong², Yu-Chae Jung¹ and Yeun-Jun Chung¹,²,*

¹Department of Microbiology and ²Integrated Research Center for Genome Polymorphism, College of Medicine, The Catholic University of Korea, Seoul 137-701, Korea

Received on March 4, 2008; revised on June 4, 2008; accepted on July 11, 2008
Advance Access publication July 15, 2008
Associate Editor: Olga Troyanskaya

*To whom correspondence should be addressed.

© 2008 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT
Motivation: Gene clustering and gene set-based functional analysis are widely used for the analysis of expression profiles. A comprehensive method that combines the two would allow for greater biological insight.
Results: We developed a software package, PathCluster, for gene set-based clustering via an agglomerative hierarchical clustering algorithm. The distances between predefined gene sets are illustrated in a dendrogram in which the relationships between gene sets can be visually assessed. Depending on the type of gene sets used, valuable biological insights can be obtained, e.g. the coordinated action of molecular functions (functional gene sets) or putative motif synergy (promoter gene sets) in a biological process. The combined use of gene sets further enables the interrogation of different biological themes and their putative relationships, such as function-versus-regulatory motif or drug-versus-function. PathCluster can also be used for knowledge-based sample partitioning or class categorization for clinical purposes. With this extended applicability, PathCluster will facilitate the gleaning of meaningful biological insights and testable hypotheses in the context of given expression profiles.
Availability: PathCluster executable files can be freely downloaded at http://www.systemsbiology.co.kr/PathCluster/.
Contact: [email protected]

1 BACKGROUND
The objective of gene clustering is to group genes that have similar expression patterns or are expressed in a coordinated manner (Eisen et al., 1998). Subsequent functional enrichment analysis, using biological knowledge, can provide clues as to which molecular functions or annotation categories are associated with individual gene clusters. Despite its potential utility, treating gene clusters as exclusive units raises a number of practical concerns in the subsequent functional analysis. For example, the list of candidate functionalities grows as the number of clusters increases, which makes it difficult to compare the results between clusters or to establish appropriate significance thresholds under multiple testing adjustment. Also, the performance of enrichment analysis depends heavily on the prior clustering result, which varies considerably with the clustering method and parameter settings. More importantly, the potential relationships between gene sets or clusters are difficult to identify in conventional settings. Integrating a priori gene set information into the clustering itself may be an appropriate solution to these problems (Rapaport et al., 2007); however, there are currently no user-friendly tools that implement this alternative approach.

Thus, we developed a software package, PathCluster, which utilizes an agglomerative hierarchical clustering algorithm for gene set-based clustering. For a given expression profile, a distance matrix is constructed between gene sets and illustrated as a dendrogram. The relationships between gene sets can be visually assessed in the results, thereby facilitating the construction of an association map between diverse annotation categories. The related algorithms are implemented in a freely available software package. The major functionalities of PathCluster are summarized as follows:
- Gene set-based hierarchical clustering and visualization of the results with a user-friendly graphic interface;
- Identification of potential relationships between gene sets, e.g. putative interactions between molecular functions or synergism between regulatory motif sequences;
- Revealing previously unknown links between different annotation categories in terms of gene sets, e.g. function-versus-regulatory motif or drug-versus-function;
- Function-based class categorization of disease samples.

2 HIERARCHICAL CLUSTERING OF GENE SETS
Two strategies can be employed to determine the expression similarities or distances between gene sets. First, each gene set can be scored by the mean expression of its member genes, or by enrichment scores derived from the non-parametric (GSEA) or parametric (PAGE) versions of gene set enrichment algorithms (Cheadle et al., 2007). The resulting matrix of gene set scores across samples can then be used to calculate gene set distances and to perform hierarchical clustering. Alternatively, the distance between two gene sets can be calculated directly as the mean correlation over all possible gene pairs, each pair representing one possible gene-to-gene match between the two gene sets. When dealing with large gene sets, or when the genes shared between gene sets are of particular interest (especially in the case of promoter gene sets), the mean correlation can instead be calculated only over gene pairs drawn from the genes that overlap between the gene sets. Detailed descriptions of the metrics and examples are available in the online manual at the PathCluster homepage (http://www.systemsbiology.co.kr/PathCluster/Manual.pdf).

PathCluster provides default gene sets covering four kinds of gene annotation categories: molecular functions; association with regulatory motifs corresponding to transcription factors or miRNAs; and drug treatment-related expression changes. In addition, gene sets from public databases such as MSigDB, or user-defined custom query sets, can readily be included in the gene set reference to ensure the versatility of the method.

3 BIOLOGICAL APPLICATION
3.1 Associated molecular functions or regulatory motif sequences in a biological process
Using functional gene sets, PathCluster can identify putative associations between molecular functions, thereby providing clues to the coordinated action of specific functions in a given expression profile. Similarly, with promoter gene sets, PathCluster can identify putative synergy between cis-regulatory motifs or their corresponding transcription factors, delineating regulatory crosstalk in a transcriptional regulatory network. Moreover, by combining gene sets from different annotation categories, previously unknown links can be revealed. In erythropoiesis-related expression profiles, a number of functionalities related to immunity and the major histocompatibility complex are observed in one cluster (Fig. 1A). Within this cluster, signal-related functionalities (Ras protein signal transduction and MAPKKK cascade) as well as sequence motifs corresponding to the transcription factors GATA-1 and c-Rel (a component of NF-κB) were also observed, indicative of their potential interactions during erythropoiesis. This strategy can also be applied to other combinations of gene sets to reveal novel links between different biological themes, such as function versus drug and function versus miRNA.

3.2 Function-based sample classification
Knowledge-driven or function-based class categorization has recently emerged as a highly challenging subject. This strategy has already been employed to identify functional relationships in a large cancer-derived expression compendium and to elucidate drug-signature relationships for clinical benefit (Wong et al., 2008). With its user-friendly platform and extended gene set reference, PathCluster provides a platform for the classification or molecular diagnosis of clinical samples, while also allowing diverse biological knowledge to be interrogated in terms of gene sets. Figure 1B shows that function-based classification can successfully distinguish the three lung cancer subtypes from one another and from normal tissue. In this cluster, eight cancer-related functions are specifically up-regulated in small cell lung cancer and squamous cell carcinoma of the lung.

Fig. 1. Screenshots of PathCluster. (A) An example of analysis using publicly available expression profiles representing human erythroid differentiation (Keller et al., 2006). The dendrogram shows a cluster of immune-related functional annotations together with signal-related functionalities and relevant sequence motifs. (B) Function-based classification of human lung samples (Bhattacharjee et al., 2001). Four histological classes of lung samples (normal, NL; adenocarcinoma, AD; squamous cell carcinoma, SQ; small cell carcinoma, SMCL) are distinguished at the gene set-based expression level.

ACKNOWLEDGEMENTS
This work was supported by grants from the Korea Health 21 R&D Project, Ministry of Health & Welfare, Republic of Korea (0405-BC02-0604-0004 and 01-PJ3-PG6-01GN07-0004).

Conflict of Interest: none declared.

REFERENCES
Bhattacharjee,A. et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. USA, 98, 13790–13795.
Cheadle,C. et al. (2007) GSMA: gene set matrix analysis, an automated method for rapid hypothesis testing of gene expression data. Bioinform. Biol. Insights, 1, 49–62.
Eisen,M.B. et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
Keller,M.A. et al. (2006) Transcriptional regulatory network analysis of developing human erythroid progenitors reveals patterns of coregulation and potential transcriptional regulators. Physiol. Genomics, 28, 114–128.
Rapaport,F. et al. (2007) Classification of microarray data using gene networks. BMC Bioinformatics, 8, 35.
Wong,D.J. et al. (2008) Revealing targeted therapy for human cancer by gene module maps. Cancer Res., 68, 369–378.
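The two gene set distance strategies described in Section 2 ("HIERARCHICAL CLUSTERING OF GENE SETS") can be illustrated with a small plain-Python sketch. This is not PathCluster's code: all function names and the toy data are hypothetical, gene sets are scored only by mean member-gene expression (GSEA/PAGE scores and the overlap-restricted variant are not shown), distance is assumed to be 1 minus Pearson correlation, and average linkage is an assumed choice because the note does not state which linkage PathCluster uses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def set_scores(expr, gene_sets):
    """Strategy 1 (simplified): score each gene set in each sample as the
    mean expression of its member genes that are present in the profile."""
    scores = {}
    for name, genes in gene_sets.items():
        rows = [expr[g] for g in genes if g in expr]
        scores[name] = [sum(col) / len(rows) for col in zip(*rows)]
    return scores


def pairwise_distance(expr, genes_a, genes_b):
    """Strategy 2: 1 - mean correlation over all gene pairs with one gene
    from each set (a gene shared by both sets pairs with itself, r = 1)."""
    rs = [pearson(expr[g], expr[h])
          for g in genes_a if g in expr
          for h in genes_b if h in expr]
    return 1.0 - sum(rs) / len(rs)


def distance_table(names, metric):
    """Distance for every unordered pair of gene set names."""
    return {frozenset((a, b)): metric(a, b) for a, b in combinations(names, 2)}


def average_linkage(names, dist):
    """Agglomerative hierarchical clustering with average linkage (assumed).
    Returns the merge history as (cluster, cluster, distance) tuples."""
    def d(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        merges.append((c1, c2, d(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


# Toy data (hypothetical): gene -> expression across four samples.
expr = {
    "g1": [1, 2, 3, 4], "g2": [2, 3, 4, 5], "g3": [1, 2, 3, 5],
    "g4": [2, 4, 6, 8], "g5": [4, 3, 2, 1], "g6": [5, 4, 3, 1],
}
gene_sets = {"setA": ["g1", "g2"], "setB": ["g3", "g4"], "setC": ["g5", "g6"]}
names = list(gene_sets)

scores = set_scores(expr, gene_sets)
dist1 = distance_table(names, lambda a, b: 1.0 - pearson(scores[a], scores[b]))
dist2 = distance_table(
    names, lambda a, b: pairwise_distance(expr, gene_sets[a], gene_sets[b]))

# setA and setB (co-varying) merge first; anti-correlated setC joins last.
for c1, c2, height in average_linkage(names, dist1):
    print(c1, "+", c2, "at height", round(height, 3))
```

Either distance table can be passed to `average_linkage`, and the merge heights are what a dendrogram would display. Clustering the samples on their gene set score profiles, rather than clustering the gene sets, would give roughly the kind of function-based sample partitioning described in Section 3.2.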