KEGG for linking genomes to life and the environment

Minoru Kanehisa; Michihiro Araki; Susumu Goto; Masahiro Hattori; Mika Hirakawa; Masumi Itoh; Toshiaki Katayama; Shuichi Kawashima; Shujiro Okuda; Toshiaki Tokimatsu; Yoshihiro Yamanishi

doi:10.1093/nar/gkm882

Nucleic Acids Research, 2008, Vol. 36, Database issue, D480–D484; 2008-01-12
Published online 12 December 2007
Received September 13, 2007; Revised September 30, 2007; Accepted October 1, 2007

Minoru Kanehisa (1,2,*), Michihiro Araki (2), Susumu Goto (1), Masahiro Hattori (1), Mika Hirakawa (1,3), Masumi Itoh (1), Toshiaki Katayama (2), Shuichi Kawashima (2), Shujiro Okuda (1), Toshiaki Tokimatsu (1) and Yoshihiro Yamanishi (1)

(1) Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, (2) Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639 and (3) Institute for Bioinformatics Research and Development, Japan Science and Technology Agency, Chiyoda-ku, Tokyo 102-8666, Japan

*To whom correspondence should be addressed. Tel: +81 774 38 3270; Fax: +81 774 38 3269; Email: kanehisa@kuicr.kyoto-u.ac.jp

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

KEGG (http://www.genome.jp/kegg/) is a database of biological systems that integrates genomic, chemical and systemic functional information. KEGG provides a reference knowledge base for linking genomes to life through the process of PATHWAY mapping, which is to map, for example, a genomic or transcriptomic content of genes to KEGG reference pathways to infer systemic behaviors of the cell or the organism. In addition, KEGG provides a reference knowledge base for linking genomes to the environment, such as for the analysis of drug-target relationships, through the process of BRITE mapping. KEGG BRITE is an ontology database representing functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as relationships among them. KEGG PATHWAY is now supplemented with a new global map of metabolic pathways, which is essentially a combined map of about 120 existing pathway maps. In addition, smaller pathway modules are defined and stored in KEGG MODULE that also contains other functional units and complexes. The KEGG resource is being expanded to suit the needs for practical applications. KEGG DRUG contains all approved drugs in the US and Japan, and KEGG DISEASE is a new database linking disease genes, pathways, drugs and diagnostic markers.

INTRODUCTION

Since the completion of the Human Genome Project, high-throughput experimental projects have been initiated for uncovering genomic information in an extended sense, including transcriptome and proteome, as well as metabolome, glycome and other genome-encoded information. Together with traditional genome sequencing for an increasing number of organisms, we are beginning to understand the genomic space of possible genes and proteins that make up the biological system. In contrast, we have very limited knowledge about the chemical space of possible chemical substances that exists as an interface between the biological world and the natural world. This situation is rapidly changing thanks to the chemical genomics initiatives for systematic screening of biologically active chemical compounds and the metagenomics initiatives giving insights into the chemical environment that interacts with and drives the evolution of the biological system.

The KEGG project was initiated in 1995, coincidentally when the first genome of a free-living organism was completely sequenced (1). KEGG PATHWAY has since been utilized as a reference knowledge base for understanding higher-level functions of cellular processes and organism behaviors from large-scale molecular data sets. The addition of KEGG BRITE, a collection of functional hierarchies with structured vocabularies, significantly increased our ability to represent and utilize higher-level functional information, especially to integrate genomic and chemical (environmental) information (2). Here we report another new development in KEGG, the integration of research results and practical values in medical, pharmaceutical and environmental sciences.

THE KEGG RESOURCE

Overview

As of January 2008, KEGG comprises 19 databases, categorized into systems information, genomic information and chemical information as shown in Table 1. The six databases in the chemical information category are collectively called KEGG LIGAND. The six databases in the lower part of the genomic information category are computationally generated, but all the other 13 databases are manually curated.

Table 1. KEGG databases

Category               Database         Content
Systems information    KEGG PATHWAY     Pathway maps
                       KEGG BRITE       Functional hierarchies
                       KEGG MODULE      Pathway modules (released January 2008)
                       KEGG DISEASE     Diseases (released January 2008)
Genomic information    KEGG ORTHOLOGY   KEGG orthology (KO) groups
                       KEGG GENOME      KEGG organisms
                       KEGG GENES       Genes in high-quality genomes
                       KEGG DGENES      Genes in draft genomes
                       KEGG EGENES      Genes as EST contigs
                       KEGG VGENOME     Viral genomes (to be fully integrated)
                       KEGG VGENES      Genes in viral genomes (to be fully integrated)
                       KEGG OGENES      Genes in organelle genomes (to be fully integrated)
                       KEGG SSDB        Sequence similarities and best hit relations
Chemical information   KEGG COMPOUND    Metabolites and other chemical compounds
                       KEGG DRUG        Drugs
                       KEGG GLYCAN      Glycans
                       KEGG ENZYME      Enzymes
                       KEGG REACTION    Enzymatic reactions
                       KEGG RPAIR       Reactant pairs and chemical transformations

The KEGG databases are highly integrated. In fact, KEGG should be viewed as a computer representation of the biological system, where biological objects and their relationships at the molecular, cellular and organism levels are computerized as separate database entries. Each database entry, called a KEGG object, is given a unique identifier within KEGG. Table 2 summarizes the naming convention of such KEGG object identifiers for the 13 core databases. Except for GENES and ENZYME that utilize the standard names of locus_tag and EC number, and for GENOME that distinguishes organisms with 3–4 letter KEGG organism codes, the KEGG object identifier is a five-digit number prefixed by an upper-case alphabet or a 2–4 letter code (map, br or organism code). Examples are: C00047 for lysine, K04527 for insulin receptor and hsa05210 for colorectal cancer pathway. These identifiers may be used to directly obtain corresponding database entries with the 'Get Entry' option in the KEGG website (http://www.genome.jp/kegg/). Interestingly, these identifiers may also be used in web search engines, such as Google and Yahoo, to obtain corresponding KEGG database entries. There are already many databases that are linked to/from KEGG. Such outside links will continue to be added to better integrate KEGG with various other web resources.

Table 2. KEGG object identifiers

Release   Database         Object identifier
1995      KEGG PATHWAY     map number
          KEGG GENOME      organism code (T number)
          KEGG GENES       locus_tag/NCBI GeneID
          KEGG ENZYME      EC number
          KEGG COMPOUND    C number
2001      KEGG REACTION    R number
2002      KEGG ORTHOLOGY   K number
2003      KEGG GLYCAN      G number
2004      KEGG RPAIR       A number
2005      KEGG BRITE       br number
          KEGG DRUG        D number
2008      KEGG MODULE      M number
          KEGG DISEASE     H number

See http://www.genome.jp/kegg/kegg3.html for details.

Genome annotation

Genome annotation in KEGG assigns KO (KEGG Orthology) identifiers or K numbers to genes in a single genome or simultaneously to genes in multiple genomes. With the addition or revision of a KEGG pathway map or BRITE hierarchy, KO groups (K numbers) are defined for the pathway nodes (boxes) or the hierarchy nodes (bottom leaves). Then the corresponding genes in selected organisms (usually in the literature) are manually annotated with the new K numbers, which are reflected in KEGG GENES. Thus, KEGG GENES can be used as a reference database for genome annotation. The number of KO groups has been increasing at a rate of about 2000 per year, and it is now over 10 000.

The KO assignment is applied to a new genome as follows. First, the new genome is subject to SSDB computation, a comparison of protein coding genes against all existing genomes by the SSEARCH program. The result is stored in KEGG SSDB containing sequence similarity scores and best-hit information for all gene pairs. Then, computational KO assignment is done by the KAAS-SSDB program, followed by manual verification and additional assignment with the GFIT tool. An automated version of this genome annotation procedure is made available as the KAAS web service (3), which utilizes BLAST rather than SSEARCH for pairwise genome comparisons.

The KO system is the basis for linking genomes to biological systems through the process of pathway mapping and BRITE mapping. For each organism in KEGG, organism-specific pathways and BRITE hierarchies are computationally generated based on its assigned K numbers. Microarray gene expression profile data may then be mapped to these pathways and hierarchies to infer systemic functions of the cell or the organism. In addition to the hierarchies of genes and proteins (K numbers), KEGG BRITE contains the hierarchies of chemical substances (C, D, G, R numbers) together with known relationships to K numbers, such as ligand–receptor interactions and drug–target relationships. By using these relationships, the BRITE mapping will be improved to present clues for understanding the interactions with the environments.

Chemical annotation

The KO system can also be used for chemical annotation, which is the linking of genomic or transcriptomic contents of genes to chemical structures of endogenous molecules. This is achieved by finer classifications of KO groups for specific classes of enzymes distinguishing different substrate specificity, as well as accumulating knowledge of biosynthetic pathways. For example, glycans are synthesized by a series of reactions catalyzed by glycosyltransferases. With the KEGG pathway maps for glycan structures (map01030 and map01031) or the KEGG GLYCAN composite structure map (4), where edges (glycosidic linkages) correspond to K numbers (glycosyltransferase orthologs), the gene content in the genome can be converted to possible glycan structures. In a similar but more sophisticated way, glycan structures can be predicted from microarray gene expression data (5). The KEGG resource will be made suitable to cope with the diversity of other molecules as well, including polyketides/non-ribosomal peptides (6), polyunsaturated fatty acids and terpenoids.

Another type of chemical annotation is to characterize biological meaning in the chemical structures of small molecules. As reported previously (2), the knowledge of enzymatic reactions and associated chemical structure transformations is stored in KEGG REACTION and KEGG RPAIR. Each structure transformation is characterized by the RDM pattern (7), and most of the patterns are found uniquely or preferentially in specific categories of KEGG pathways (8). This tendency was used to predict the metabolic fate of xenobiotic chemical compounds. Software for reaction/pathway prediction is being developed as an upgrade of e-zyme and PathComp in KEGG LIGAND.

Enhancements to KEGG pathway

KEGG PATHWAY has been significantly expanded over the last 2 years with the addition of about 50 new pathway maps, mostly for signal transduction, cellular processes and human diseases. However, the traditional KEGG metabolic pathway maps are still most widely used including the KGML (KEGG XML) version. They are now supplemented with two new features introduced as a response to user feedback. The first feature is a global map shown in Figure 1, which is created as an SVG file by manually combining about 120 existing maps. Each node (circle) is a chemical compound and each line (curved or straight) connecting two nodes is a series of reactions (one to several reactions), which is also manually defined as a segment lacking branches. The new KEGG metabolism map allows the user to view and compare the entire metabolism, such as by mapping metagenomics data or microarray data. KGML users should also find the new KEGG metabolism map much easier to manipulate.

Figure 1. The new KEGG metabolism map created as an SVG file.

The other feature is KEGG MODULE, a new database that collects pathway modules and other functional units as a set of K numbers. Pathway modules are smaller pieces of subpathways (see the BRITE hierarchy ko00002), manually defined as consecutive reaction steps, operon or other regulatory units, phylogenetic units obtained by genome comparisons, etc. This new database also contains molecular complexes, facilitating better organization of data and knowledge, especially in KEGG BRITE. The hierarchy of molecular organization, such as the subunit organization of transporters or receptors, is represented by the M number that corresponds to a set of K numbers. Incidentally, a line segment in the new KEGG metabolism map that also corresponds to a set of K numbers is identified by the N number, representing a mechanistically defined network segment.

KEGG for medical and pharmaceutical applications

As of September 2007, KEGG PATHWAY contains 26 maps for human diseases, among which 19 were introduced in the last 2 years. The disease pathway maps are classed in four subcategories: 6 as neurodegenerative disorders (9), 3 as each of infectious diseases and metabolic disorders and 14 as cancers. Although such maps will continue to be added, they will never be sufficient to represent our knowledge of molecular mechanisms of diseases because in many cases it is too fragmentary to represent as pathways. KEGG DISEASE is another addition to the KEGG suite of databases accumulating molecular-level knowledge on diseases including genes, drugs and biomarkers. Our current effort is focused on the four subcategories of diseases mentioned above.

The number of entries in KEGG DRUG has also significantly increased over the last 2 years, and now covers all approved drugs in the US and Japan. KEGG DRUG is a structure-based database. Each entry is a unique chemical structure that is linked to standard generic names, and is associated with efficacy and target information as well as drug classifications. Target information is presented in the context of KEGG pathways and drug classifications are part of KEGG BRITE. The generic names are linked to trade names and subsequently to outside resources of package insert information (patient information) whenever available. This reflects our effort to make KEGG more useful to the general public.

ACCESSING KEGG

Via GenomeNet

KEGG is made available as the major component of the Japanese GenomeNet service, operated by the Kyoto University Bioinformatics Center. The top pages of the KEGG website (http://www.genome.jp/kegg/) have been changed for easier access to KGML, KEGG API and KEGG FTP.

Via the new site

Because the KEGG system has become so large and complex, the entire package is being redesigned and is presented at a new site (http://www.kegg.jp/) that currently contains a Japanese version only.

ACKNOWLEDGEMENTS

The KEGG project is supported by the Institute for Bioinformatics Research and Development of the Japan Science and Technology Agency, the 21st Century COE program 'Genome Science', and a grant-in-aid for scientific research on the priority area 'Comprehensive Genomics' from the Ministry of Education, Culture, Sports, Science and Technology of Japan. The computational resource was provided by the Bioinformatics Center, Institute for Chemical Research, Kyoto University. Funding to pay the Open Access publication charges for this article was provided by the grant-in-aid for scientific research.

Conflict of interest statement. None declared.

REFERENCES

1. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.
2. Kanehisa,M., Goto,S., Hattori,M., Aoki-Kinoshita,K.F., Itoh,M., Kawashima,S., Katayama,T., Araki,M. and Hirakawa,M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354–D357.
3. Moriya,Y., Itoh,M., Okuda,S., Yoshizawa,A. and Kanehisa,M. (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res., 35, W182–W185.
4. Hashimoto,K., Goto,S., Kawano,S., Aoki-Kinoshita,K.F., Ueda,N., Hamajima,M., Kawasaki,T. and Kanehisa,M. (2006) KEGG as a glycome informatics resource. Glycobiology, 16, 63R–70R.
5. Kawano,S., Hashimoto,K., Miyama,T., Goto,S. and Kanehisa,M. (2005) Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics, 21, 3976–3982.
6. Minowa,Y., Araki,M. and Kanehisa,M. (2007) Comprehensive analysis of distinctive polyketide and nonribosomal peptide structural motifs encoded in microbial genomes. J. Mol. Biol., 368, 1500–1517.
7. Kotera,M., Okuno,Y., Hattori,M., Goto,S. and Kanehisa,M. (2004) Computational assignment of the EC numbers for genomic-scale analysis of enzymatic reactions. J. Am. Chem. Soc., 126, 16487–16498.
8. Oh,M., Yamada,T., Hattori,M., Goto,S. and Kanehisa,M. (2007) Systematic analysis of enzyme-catalyzed reaction patterns and prediction of microbial biodegradation pathways. J. Chem. Inf. Model., 47, 1702–1712.
9. Limviphuvadh,V., Tanaka,S., Goto,S., Ueda,K. and Kanehisa,M. (2007) The commonality of protein interaction networks determined in neurodegenerative disorders (NDDs). Bioinformatics, 23, 2129–2138.
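The naming convention for KEGG object identifiers described in the Overview (a five-digit number prefixed by an upper-case letter, or by map, br or an organism code) can be illustrated with a small classifier. This is only a sketch: the function name `classify_kegg_id` and the returned labels are assumptions made for illustration, the prefix table is copied from Table 2, and identifiers that follow other conventions (locus_tag, EC number) are simply reported as unrecognised.

```python
import re

# Single-letter prefixes and their databases, as listed in Table 2.
LETTER_PREFIXES = {
    "C": "KEGG COMPOUND",
    "R": "KEGG REACTION",
    "K": "KEGG ORTHOLOGY",
    "G": "KEGG GLYCAN",
    "A": "KEGG RPAIR",
    "D": "KEGG DRUG",
    "M": "KEGG MODULE",
    "H": "KEGG DISEASE",
}

# One upper-case letter followed by exactly five digits, e.g. C00047.
_LETTER_ID = re.compile(r"^([A-Z])(\d{5})$")
# A 2-4 letter lower-case code followed by exactly five digits, e.g. hsa05210.
_CODE_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")


def classify_kegg_id(identifier):
    """Return (label, prefix, number), or None if the pattern is not recognised.

    Hypothetical helper: the labels are illustrative, not official KEGG names.
    """
    match = _LETTER_ID.match(identifier)
    if match:
        prefix, number = match.groups()
        database = LETTER_PREFIXES.get(prefix)
        return (database, prefix, number) if database else None

    match = _CODE_ID.match(identifier)
    if match:
        code, number = match.groups()
        if code == "map":
            return ("KEGG PATHWAY", code, number)
        if code == "br":
            return ("KEGG BRITE", code, number)
        if len(code) >= 3:
            # Organism codes are 3-4 letters; the number names an
            # organism-specific pathway or hierarchy.
            return ("organism-specific pathway or hierarchy", code, number)
    return None


if __name__ == "__main__":
    # The three examples given in the text, plus a reference map.
    for example in ("C00047", "K04527", "hsa05210", "map01030"):
        print(example, "->", classify_kegg_id(example))
```

The three identifiers quoted in the Overview (C00047, K04527, hsa05210) fall into the compound, orthology and organism-specific categories respectively.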
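The pathway mapping step described under Genome annotation, in which organism-specific pathways are generated from the K numbers assigned to a genome, amounts to intersecting the genome's K numbers with the K numbers attached to the nodes of each reference pathway. The sketch below shows that idea on toy data only. The function name `map_genes_to_pathways`, the gene names and all identifiers in the example are placeholders invented for illustration; they are not meant to refer to real KEGG entries, and this is not the procedure KEGG itself runs.

```python
def map_genes_to_pathways(gene_to_ko, pathway_to_kos):
    """Report, for each reference pathway, which of its nodes the genome covers.

    gene_to_ko:     dict mapping a gene name to its assigned K number.
    pathway_to_kos: dict mapping a reference pathway id to the set of
                    K numbers attached to its nodes.
    Returns a dict {pathway id: {K number: sorted list of genes}} that
    contains only pathways with at least one covered node.
    """
    # Invert the annotation: K number -> genes carrying that K number.
    ko_to_genes = {}
    for gene, ko in gene_to_ko.items():
        ko_to_genes.setdefault(ko, []).append(gene)

    mapped = {}
    for pathway, kos in pathway_to_kos.items():
        covered = {
            ko: sorted(ko_to_genes[ko]) for ko in kos if ko in ko_to_genes
        }
        if covered:
            mapped[pathway] = covered
    return mapped


if __name__ == "__main__":
    # Placeholder annotation of a hypothetical genome.
    toy_annotation = {
        "gene1": "K90001",
        "gene2": "K90002",
        "gene3": "K90001",
        "gene4": "K90009",
    }
    # Placeholder reference pathways and their node K numbers.
    toy_pathways = {
        "map90001": {"K90001", "K90003"},
        "map90002": {"K90002"},
        "map90003": {"K90004"},
    }
    for pathway, nodes in sorted(
        map_genes_to_pathways(toy_annotation, toy_pathways).items()
    ):
        print(pathway, nodes)
```

Expression data could be layered on top of the same structure by attaching a value to each gene before inverting the annotation, which is one way to read the microarray mapping mentioned in the text.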

Publisher: Oxford University Press
ISSN: 0305-1048
eISSN: 1362-4962
DOI: 10.1093/nar/gkm882
pmid: 18077471

Science, Via the new site 269, 496–512. 2. Kanehisa,M., Goto,S., Hattori,M., Aoki-Kinoshita,K.F., Itoh,M., Because the KEGG system has become so large and Kawashima,S., Katayama,T., Araki,M. and Hirakawa,M. (2006) complex, the entire package is being redesigned and is From genomics to chemical genomics: new developments in KEGG. presented at a new site (http://www.kegg.jp/) that Nucleic Acids Res., 34, D354–D357. currently contains a Japanese version only. 3. Moriya,Y., Itoh,M., Okuda,S., Yoshizawa,A. and Kanehisa,M. (2007) KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Res., 35, W182–W185. ACKNOWLEDGEMENTS 4. Hashimoto,K., Goto,S., Kawano,S., Aoki-Kinoshita,K.F., Ueda,N., Hamajima,M., Kawasaki,T. and Kanehisa,M. (2006) THE KEGG project is supported by the Institute for KEGG as a glycome informatics resource. Glycobiology, 16, Bioinformatics Research and Development of the Japan 63R–70R. Science and Technology Agency, the 21st Century COE 5. Kawano,S., Hashimoto,K., Miyama,T., Goto,S. and Kanehisa,M. program ‘Genome Science’, and a grant-in-aid for (2005) Prediction of glycan structures from gene expression data based on glycosyltransferase reactions. Bioinformatics, 21, scientiﬁc research on the priority area ‘Comprehensive 3976–3982. Genomics’ from the Ministry of Education, Culture, 6. Minowa,Y., Araki,M. and Kanehisa,M. (2007) Comprehensive Sports, Science and Technology of Japan. The computa- analysis of distinctive polyketide and nonribosomal peptide tional resource was provided by the Bioinformatics structural motifs encoded in microbial genomes. J. Mol. Biol., 368, Center, Institute for Chemical Research, Kyoto 1500–1517. 7. Kotera,M., Okuno,Y., Hattori,M., Goto,S. and Kanehisa,M. (2004) University. Funding to pay the Open Access publication Computational assignment of the EC numbers for genomic-scale charges for this article was provided by the grant-in-aid analysis of enzymatic reactions. J. Am. Chem. 
Soc., 126, for scientiﬁc research. 16487–16498. 8. Oh,M., Yamada,T., Hattori,M., Goto,S. and Kanehisa,M. (2007) Conﬂict of interest statement. None declared. Systematic analysis of enzyme-catalyzed reaction patterns and prediction of microbial biodegradation pathways. J. Chem. Inf. Model., 47, 1702–1712. REFERENCES 9. Limviphuvadh,V., Tanaka,S., Goto,S., Ueda,K. and Kanehisa,M. (2007) The commonality of protein interaction networks determined 1. Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., in neurodegenerative disorders (NDDs). Bioinformatics, 23, Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., 2129–2138. Dougherty,B.A. et al. (1995) Whole-genome random
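As an illustration of two ideas from the text, the following minimal Python sketch may be helpful. It is not part of KEGG or of any KEGG software. The first function recognizes identifiers that follow the stated convention of a five-digit number prefixed by one upper-case letter or by a 2–4 letter code (it deliberately ignores GENES, ENZYME and GENOME entries, which the text says use other names). The second function treats a pathway module as a set of K numbers and reports which of them are present in a genome. The function names and the K numbers in the usage note are hypothetical placeholders chosen for this example.

```python
import re

# Assumed pattern, derived only from the wording in the text:
# one upper-case letter OR a 2-4 letter lower-case code, then exactly five digits.
_KEGG_ID = re.compile(r"([A-Z]|[a-z]{2,4})(\d{5})")


def parse_kegg_id(identifier):
    """Split an identifier into (prefix, five-digit number).

    Returns None when the string does not follow the convention,
    e.g. for EC numbers or locus_tag names.
    """
    match = _KEGG_ID.fullmatch(identifier)
    if match is None:
        return None
    return match.group(1), match.group(2)


def module_coverage(module_k_numbers, genome_k_numbers):
    """Compare a module (a set of K numbers) with a genome's K numbers.

    Returns two sorted lists: the module members found in the genome
    and the module members that are missing from it.
    """
    module = set(module_k_numbers)
    genome = set(genome_k_numbers)
    present = sorted(module & genome)
    missing = sorted(module - genome)
    return present, missing
```

For instance, parse_kegg_id("hsa05210") separates the organism code hsa from the map number 05210, and module_coverage({"K00001", "K00002", "K00003"}, {"K00001", "K00003", "K04527"}) reports K00002 as the one missing member of a made-up three-member module. Real pathway mapping in KEGG involves curated maps and hierarchies that this set comparison does not attempt to model.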

Journal

Nucleic Acids Research – Oxford University Press

Published: Jan 12, 2008
