TY - JOUR AU - Davisson, Muriel T. AB - Abstract The Mouse Genome Database (MGD) is a comprehensive community database that integrates genetic, genomic and phenotypic information about the laboratory mouse. MGD provides detailed information about genes and genetic markers, elemental data from mapping experiments, descriptions of molecular segments including ESTs, probes, and cDNA clones, homology information between mouse and many other mammalian genomes, and phenotypic descriptions of gene mutations, gene function and mouse strains. All data are supported by citations. Interactive graphical displays of cytogenetic, genetic and physical maps are available. User support is provided through dedicated staff, bulletin boards, and user documentation. MGD can be accessed at http://www.informatics.jax.org Introduction The Mouse Genome Database focuses on the representation of elemental genetic, phenotypic and genomic data from studies of the laboratory mouse ( 1 ). MGD is one component of the Mouse Genome Informatics Project at The Jackson Laboratory, an effort that also includes the Gene Expression Database (GXD) ( 2 ) and development of a mouse Tumor database. Data acquisition and integration for MGD is a multifaceted process including manual curation of literature-derived data, bulk downloads of large datasets, electronic submissions, and scientific community reports. Until recently, MGD data acquisition has emphasized individual, peer-reviewed, research publications. Over 400 scientific journals are scanned for relevant articles, and primary research data are collected that include linkage analysis and mapping studies, details of probes, clones and PCR-amplification products, and synoptic descriptions of gene phenotypes, classifications and mutant alleles. 
Comparative mapping data for mouse and >60 mammalian species are maintained, including evidence for homology assertions, gene localization in each species, links to primary organism databases, and supporting references. All information is integrated into MGD through the careful curation of gene nomenclature among the data sources. Increasingly, submission and curation of electronic information are providing an additional source of data for MGD. Bulk downloads and integration of large datasets for the laboratory mouse include representation of DNA mapping panel crosses, MIT genetic and physical mapping data, and I.M.A.G.E. clone and WashU EST information. Links from MGD to other electronic biological resources are established through incorporation of external database accession numbers. Community-contributed data include the Chromosome Committee Reports and some phenotype descriptions. MGD supports electronic submission of datasets in conjunction with the publication of a scientific paper. Electronic submission forms are also available for gene symbol nomenclature requests. The MGD homepage ( http://www.informatics.jax.org/mgd.html ) serves as the entry to the MGD Searches, Resources and Documentation. MGD is implemented in the Sybase relational database system. Public database access is provided through WWW interfaces and through direct SQL access. Current Status and General Enhancements Over the course of the last year, MGD has released versions 3.2 and 3.3. Each release incorporated significant enhancements to the database structure and public accessibility. There has been a marked increase in the data content within MGD as detailed below. The numbers below were gathered from database statistics as of September 1997. Genes and genetic markers The number of genes and genetic markers detailed within MGD continues to increase, rising from 20 000 last year to >22 400 this year. This total includes 19 768 mapped loci, among them >5800 mapped genes. 
Genetic markers in MGD include genes, chromosomal aberrations, QTLs, anonymous DNA segments and phenotypic mutations. Detailed genetic marker reports are available for all genes and genetic markers. Report details include alleles, history of the use of the symbol, links to experimental data involving this marker, references, and information about chromosomal location. A graphical mini-map shows the approximate location of the genetic marker of interest relative to other well-positioned markers on that chromosome. Figure 1. RFLP/PCR polymorphism search. Information on molecular polymorphisms is stored in the Molecular Probe and Segments records in MGD. In this example, the search query asks for all instances of polymorphisms recorded for the Polb (DNA polymerase beta) locus. The results show primers and two cDNA probes with the details of the detected polymorphisms. Molecular probes and segments There have been extensive changes in the molecular segments and probes section of the database. Information about mouse ESTs and I.M.A.G.E. cDNA clones has been added (see below) and MGD records for primers have been merged with those for probes and clones and ESTs. This enlarged dataset is called 'Molecular Probes and Segments'. The search forms have changed to reflect the underlying database changes ( http://www.informatics.jax.org/probes.html ). A new RFLP/PCR Polymorphism query form has been added that allows searches for molecular polymorphisms related to a gene, chromosome or mouse strain ( Fig. 1 ). 
The polymorphism information is drawn from the Molecular Probes and Segments data. Mammalian homology Comparative genomics provides important information for scientists seeking to find a specific gene. Mouse orthologs of human genes are coveted since the mouse provides a controlled genetic background for further investigation of disease etiology. Over 2500 mouse/human homologies are currently found in MGD as well as a more limited number of homology assertions for >60 other mammalian species. Figure 2. Clickable linkage and cytogenetic maps and physical map display. Linkage and cytogenetic maps are constructed using Web-based interactive display tools. Genetic marker symbols on the maps are linked to detailed marker reports. The Linkage Map displayed here is chosen to display all mapped genetic markers excluding anonymous DNA markers between 0 cM and 12 cM on chromosome 5 along with any known human homologs. The Cytogenetic Map displays cytogenetic markers for chromosome 5. The chromosome is divided into cytogenetic bands and vertical bars indicate the range of bands in which the markers are placed. Cytogenetic mapping information is derived, for the most part, from in situ experiment records in MGD. The physical map shows MIT contigs set against a genetic map backdrop. The left side of the display shows the chromosome with centiMorgan (cM) positions. MIT SSLP markers are on the right associated with cM positions. To the right of the marker symbols, vertical lines represent the contigs for the physical map. A box appears on the contig for each SSLP marker typed on that contig with the red boxes indicating conflict between the genetic map order and physical map order for that marker. 
MGD provides several search avenues to access homology information. In addition to the full homology query form that provides multiple search fields to formulate a query for homology between two or more species, there is an Oxford Grid query that retrieves an overview of homology between two selected species. Mammalian homologs can also be displayed as part of the detail for graphical map displays. Maps and mapping data MGD mapping data have expanded significantly within the last year. Four new DNA mapping panels have been added, and seven others have been updated regularly. The MIT/Whitehead physical mapping data have been loaded (see below) and new Web-based interactive linkage and cytogenetic map displays are accessible through the Maps and Mapping Data search page ( http://www.informatics.jax.org/maptools.html ; Fig. 2 ). Marker symbols on the maps are linked to MGD genetic marker detail screens. MGD currently has >12 100 mapping experiments represented. 
The largest portion of the data is from experimental crosses. Figure 3. EST links to MGD markers. Mouse ESTs are putatively associated with genetic markers in MGD through the curation method detailed in the text. In this figure, the Molecular Probes and Segments summary report for the gene Adn (Adipsin) shows that many are ESTs putatively associated with the Adn gene. A detail of the EST's record for one of the molecular segments is displayed. Additional links from the EST detail report connect the user to further information about this EST. MGD does not store sequence data, but, instead, links to the sequence databases through incorporation and integration of external database accession numbers. Strains and phenotype information Source information recorded for molecular probes and segments includes species, strain, sex, age, tissue and cell line. The strain data are being normalized to facilitate further development of allele/strain representations within MGD. Currently, descriptions of mutant phenotypes are stored in text descriptions of the gene product function and phenotypes. Over 5000 genes are linked to phenotype entries. Significant Recent Enhancements and Additions Incorporation of EST data generated from I.M.A.G.E. 
clones Through collaborative efforts, MGD Release 3.3 includes EST data for ∼170 000 mouse ESTs sequenced by the WashU/HHMI project and data about their I.M.A.G.E. clones. The EST data consist of sequence accession identifiers, library source data and putative assignments with links to defined MGD gene information. ESTs are searchable using the Molecular Probes and Segments query form ( http://www.informatics.jax.org/probe.html ). The record for each EST includes a link to the MGD record for the clone from which it is derived as well as links to the sequence databases, ATCC and WashU. Figure 4. Search by accession ID. The accession ID query form enables searches for any database accession numbers captured in MGD including all MGD accession IDs and external database accession IDs associated with MGD data. Here a GenBank accession ID and a MEDLINE number are searched for simultaneously. The summary screen lists any records that include any of the queried accession IDs with links to the specific records. Integration of the ESTs by creating putative links to MGD genes was an automated process. The WashU putative assignment was parsed out. In all cases where EST names included a mouse GenBank accession number, that GenBank number was matched against MGD data. If the GenBank accession number was associated with a gene symbol in MGD, then that gene was assigned as the putative EST identifier ( Fig. 3 ). 
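The matching step described above can be pictured with a minimal Python sketch. It is an illustration only, not the MGD load code: the function name, the regular expression (which assumes the classic GenBank nucleotide accession shapes of one letter plus five digits or two letters plus six digits), the lookup table and the sample record are all hypothetical. The sketch also assumes that the lookup table holds only accessions curated for mouse genes, so a hit implies a mouse gene.

```python
import re
from typing import Dict, Optional

# Assumed accession shapes: one letter + five digits, or two letters + six digits.
ACCESSION_RE = re.compile(r"\b([A-Z]\d{5}|[A-Z]{2}\d{6})\b")


def putative_gene(assignment: str, accession_to_gene: Dict[str, str]) -> Optional[str]:
    """Return the gene symbol for the first accession in the text that the table knows."""
    for accession in ACCESSION_RE.findall(assignment):
        gene = accession_to_gene.get(accession)
        if gene is not None:
            return gene
    # No accession in the text, or none associated with a gene symbol.
    return None


# Made-up lookup table and assignment text, for illustration only.
example_table = {"M00001": "Adn"}
example_hit = putative_gene("similar to gb:M00001 ADIPSIN (MOUSE)", example_table)
example_miss = putative_gene("no putative assignment", example_table)
```

An EST whose assignment text carries no recognizable accession, or an accession that is not tied to a gene symbol, simply stays unassigned, which is consistent with only a fraction of the ESTs receiving a putative gene.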
This procedure provided putative identification to ∼18% of the mouse ESTs (∼50% of the ESTs had been given putative assignments by WashU and, of those, 50% were associated with mouse genes). Figure 5. Description of inbred strain 129. This figure shows part of the textual record available for the mouse inbred strain 129. In addition to information about the derivation of the strain, substrains are detailed as known. Where available, specific information about complex traits such as behavior or spontaneous disease known for a specific strain or substrain is noted. Complete citations are available at the end of the record with links to the Reference details listing all strains reported in that reference. Electronic nomenclature symbol processing Biotechnology advances have resulted in the generation of massive amounts of sequence data that are only partially annotated. Naming partially-identified genes, or those identified by gene prediction programs or general motif similarities, has confounded the naming of unique gene entities in the mouse genome. In addition, as more knowledge is gained about the function or evolutionary relationships among genes, scientists rename genes or redefine gene families. 
Recently the International Nomenclature Workshop was hosted by The Jackson Laboratory in an effort to bring together scientists annotating genome databases, sequence database managers, and nomenclature coordinators ( 3 ). As part of the effort to facilitate gene nomenclature assignments, MGD now supports the Locus Symbol Registry ( http://www.informatics.jax.org/doc/nomen.html ). The Locus Symbol Registry allows electronic submission of proposals for naming a new gene or locus. Mouse and human nomenclature coordinators work closely together to coordinate assignment of gene symbols between these two organisms. In addition, MGD is working with other genome databases to establish a gene registry of mouse, human and other species nomenclature information based at ATCC that would provide a single resource for researchers to check when they name and symbolize a new gene ( 3 ). Searching MGD by any curated accession number MGD has a new accession identifier (ID) format of the form MGI:#### where #### can be any number from 1 on up. All previously assigned accession numbers (e.g. MGD-MRK-#### or MGD-MRK-####) will continue to be supported. Reference IDs of the form J:### continue to be assigned and can be used to query the database from most query forms. An Accession ID query form provides access to information associated with any internal or external accession ID integrated into MGD ( Fig. 4 ). External databases with accession numbers curated within MGD include the Genome DataBase (GDB), the sequence databases (GenBank, GSDB, DDBJ, EMBL), SWISS-PROT, MEDLINE, ATCC and others. Incorporation of inbred mouse and rat strain characteristic data Descriptive information about inbred strains of mice and rats has been incorporated into MGD ( Fig. 5 ). These data, originating with M.F.W.Festing, provide information about the general characteristics of various inbred strains including susceptibility to certain diseases, physiology, etc. 
Such information is increasingly requested as scientists study complex traits and seek the utility of inbred strains. The reports are searchable via a text query. Electronic submission of chromosome committee reports Chromosome committees consist of mouse geneticists with particular interest in a given chromosome. Each committee annually submits a report that summarizes the current consensus of the organization of each chromosome. MGD has developed a Chromosome Committee Submission Interface that allows Chromosome Committee chairs to submit and edit their Chromosome Committee Reports electronically. All data can be viewed in the HTML format by the Committee Chairs prior to public release. This interface allows data submission throughout the year. Interactive physical maps and MIT physical mapping data MGD now integrates the mouse physical mapping data from the MIT/Whitehead Genome Center into the database. This information consists of contig maps of the mouse built using MIT markers and STSs mapped against a mouse YAC library. The elemental data describing which probe ‘hit’ which YAC is stored as well as STS order for each contig. MGD provides a new map display for the MIT physical mapping data ( Fig. 2 ) that links the markers to MGD marker records and the contigs to MGD mapping data records. This Web-based map tool also provides comparison of the MIT/Whitehead physical and genetic map order. The maps show contig location, MIT SSLP markers typed, and indicate by color where there are order conflicts between the genetic and physical map. Electronic links between MGD genes and SWISS-PROT protein records Recent work on interconnection of molecular databases with MGD has focused on connecting information on the genome (MGD) with information on gene products (SWISS-PROT). SWISS-PROT accession numbers have been linked to genes in MGD based on the shared curation of GenBank accession numbers. 
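As a rough illustration of linking through shared GenBank accession numbers, the following Python sketch joins two accession tables. It is a sketch under stated assumptions rather than the procedure MGD used: the function name, the dictionary layout and every identifier in the sample data are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Set


def link_genes_to_proteins(
    gene_to_genbank: Dict[str, Set[str]],
    protein_to_genbank: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Link each gene to the protein records that share a GenBank accession with it."""
    # Invert the gene table so each accession points back to its genes.
    genbank_to_genes = defaultdict(set)
    for gene, accessions in gene_to_genbank.items():
        for accession in accessions:
            genbank_to_genes[accession].add(gene)

    # A protein record is linked to every gene that shares one of its accessions.
    links = defaultdict(set)
    for protein, accessions in protein_to_genbank.items():
        for accession in accessions:
            for gene in genbank_to_genes.get(accession, ()):
                links[gene].add(protein)
    return {gene: sorted(proteins) for gene, proteins in links.items()}


# Placeholder identifiers, not real GenBank or SWISS-PROT records.
example_links = link_genes_to_proteins(
    {"Polb": {"GB0001"}, "Adn": {"GB0002", "GB0003"}},
    {"SP0001": {"GB0001"}, "SP0002": {"GB0003"}, "SP0003": {"GB9999"}},
)
```

Protein records whose accessions match no curated gene are left unlinked, so the quality of the links depends directly on how completely both resources curate GenBank accessions.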
Users will be able to query for mouse genes in MGD by entering a SWISS-PROT number, and the MGD gene detail screen will provide links to the corresponding SWISS-PROT proteins detail screen. Enzyme Commission (E.C.) numbers also are being incorporated as links on the MGD marker detail screen as a result of this collaboration. The Future MGD is composed of phenotypic (strain, mutant phenotype and gene descriptions), genetic (mapping) and genomic (molecular segments, primers, links to sequences) information. GXD is focused on the display and annotation of embryonic expression data in the mouse. Both MGD and GXD share many entities in common including descriptions of genetic markers, identification of molecular segments and probes, and shared controlled vocabularies for everything from protein function to age and tissue designations. Biotechnological advances will permit extensive collection of expression data in the next few years. MGD and GXD are working together to integrate phenotypic, expression-related, structural and comparative information for the laboratory mouse, and to provide easy access to that data for the scientific community. To this end, MGD and GXD will be integrated into an overarching Mouse Genome Informatics (MGI) database resource. Other areas of imminent expansion are (i) integration of radiation hybrid maps with other MGD mapping information, (ii) more appropriate and robust handling of QTL data including allele effects and mapping information, and (iii) expansion of the physical mapping data to include and integrate other datasets into MGD. Addresses and User Support URLs and mirror sites MGD can be accessed at The Jackson Laboratory at URL http://www.informatics.jax.org . There are now four mirror sites around the world to provide users with faster local access to MGD. 
MGD is mirrored in the UK at http://mgd.hgmp.mrc.ac.uk , in Japan at http://mgd.niai.affrc.go.jp/ , in France at http://www.pasteur.fr/Bio/MGD/ and in Australia at http://mgd.wehi.edu.au:8080/ . Additional mirror sites are being considered. Mirror sites have the option of downloading FTP update files on a nightly basis. Most sites update less often, but on a regular basis. Community outreach and user support MGD provides extensive user support through on-line documentation and easy email or phone access to User Support Staff. MGD staff attended over 35 meetings and symposia within the last year where posters, talks and demonstrations of the database were presented. User Support WWW access: http://www.informatics.jax.org/doc/support.html ; Email access: mgi-help@informatics.jax.org ; Telephone access: +1 207 288 6445; Fax access: +1 207 288 6132. MGI maintains an electronic bulletin board to promote communication among researchers on a variety of topics related to mouse genetic research and the MGI database resource. MGI-LIST has ∼900 subscribers. Subscriptions can be obtained through the WWW at http://www.informatics.jax.org/doc/lists.html . Referencing MGD The following citation format is suggested when referring to specific datasets within MGD: Mouse Genome Database (MGD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine. World Wide Web (URL: http://www.informatics.jax.org/ ). [Type in date (month, yr) when you retrieve the data cited.] To reference the database itself, please cite this article as well as others found at: http://www.informatics.jax.org/doc/citation.html . Acknowledgement MGD and the Mouse Genome Informatics Project are supported by NIH grant HG00330. References 1 Blake J.A., Richardson J.E., Davisson M.T., Eppig J.T., the Mouse Genome Informatics Group, Nucleic Acids Res., 1997, vol. 25 (pg. 85-91) 2 Ringwald M., Baldock R., Bard J., Eppig J.T., Kaufmann M., Nadeau J.H., Richardson J.E., Davidson D., Science, 1994, vol. 265 (pg. 2033-2034) 3 Blake J.A., Davisson M.T., Eppig J.T., Maltais L.J., Povey S., White J.A., Womack J.E., Genomics, 1997, vol. 45 (pg. 464-468) 
© 1998 Oxford University Press TI - The Mouse Genome Database (MGD): A community resource. Status and enhancements JF - Nucleic Acids Research DO - 10.1093/nar/26.1.130 DA - 1998-01-01 UR - https://www.deepdyve.com/lp/oxford-university-press/the-mouse-genome-database-mgd-a-community-resource-status-and-Xjk48XtFve SP - 130 EP - 137 VL - 26 IS - 1 DP - DeepDyve ER -