The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla

Vol 449 | 27 September 2007 | doi:10.1038/nature06148
LETTERS
© 2007 Nature Publishing Group

The French–Italian Public Consortium for Grapevine Genome Characterization*

The analysis of the first plant genomes provided unexpected evidence for genome duplication events in species that had previously been considered as true diploids on the basis of their genetics (refs 1–3). These polyploidization events may have had important consequences in plant evolution, in particular for species radiation and adaptation and for the modulation of functional capacities (refs 4–10). Here we report a high-quality draft of the genome sequence of grapevine (Vitis vinifera) obtained from a highly homozygous genotype. The draft sequence of the grapevine genome is the fourth one produced so far for flowering plants, the second for a woody species and the first for a fruit crop (cultivated for both fruit and beverage). Grapevine was selected because of its important place in the cultural heritage of humanity beginning during the Neolithic period (ref. 11). Several large expansions of gene families with roles in aromatic features are observed. The grapevine genome has not undergone recent genome duplication, thus enabling the discovery of ancestral traits and features of the genetic organization of flowering plants. This analysis reveals the contribution of three ancestral genomes to the grapevine haploid content. This ancestral arrangement is common to many dicotyledonous plants but is absent from the genome of rice, which is a monocotyledon. Furthermore, we explain the chronology of previously described whole-genome duplication events in the evolution of flowering plants.

All grapevine varieties are highly heterozygous; preliminary data showed that there was as much as 13% sequence divergence between alleles, which would hinder reliable contig assembly when a whole-genome shotgun strategy was used for sequencing. Our consortium therefore selected the grapevine PN40024 genotype for sequencing. This line, originally derived from Pinot Noir, has been bred close to full homozygosity (estimated at about 93%) by successive selfings, permitting a high-quality whole-genome shotgun assembly.

A total of 6.2 million end-reads were produced by our consortium, representing an 8.4-fold coverage of the genome. Within the assembly, performed with Arachne (ref. 12), 316 supercontigs represent putative allelic haplotypes that constitute 11.6 million bases (Mb). These values fit well with the 7% residual heterozygosity of PN40024 assessed by using genetic markers. When considering only one of the haplotypes in each heterozygous region, the assembly (Table 1a) consists of 19,577 contigs (N50 = 65.9 kilobases (kb), where N50 corresponds to the size of the shorter supercontig or contig in a subset representing half of the assembly size) and 3,514 supercontigs (N50 = 2.07 Mb) totalling 487 Mb. This value is close to the 475 Mb previously reported for the grapevine genome size (ref. 13).

Using a set of 409 molecular markers from the reference grapevine map (ref. 14), 69% of the assembled 487 Mb, arranged into 45 ultracontigs and 51 single supercontigs, were anchored along the 19 linkage groups. Thirty-seven ultracontigs and 22 single supercontigs were oriented, representing 61% of the genome assembly (Supplementary Tables 2 and 3).

Table 1 | Global statistics on the genome of Vitis vinifera

(a) Assembly
Status                                               Number   N50 (kb)   Longest (kb)   Size (Mb)   Percentage of the assembly
Contigs, all                                         19,577   65.9       557            467.5       –
Supercontigs, all                                    3,514    2,065      12,675         487.1       100
Supercontigs, anchored on chromosomes                191      3,189      12,675         335.6       68.9
Supercontigs, anchored on chromosomes and oriented   143      3,827      12,675         296.9       60.9

(b) Annotation
Feature        Number    Median size (bp)   Total length (Mb)   Percentage of the genome   %GC
Gene           30,434    3,399              225.6               46.3                       36.2
Exons CDS      149,351   130                33.6                6.9                        44.5
Introns CDS    118,917   213                178.6               36.7                       34.7
Intergenic     30,453    3,544              261.5               34.7                       33.0
tRNA*          600       73                 0.04                NS                         43.0
miRNA†         164       103.5              0.002               NS                         35.9

(c) Orthology
Species or group              Number of orthologous proteins   Mean identity (%)
P. trichocarpa                12,996                           72.7
A. thaliana                   11,404                           65.5
O. sativa                     9,731                            59.8
Common to eudicotyledons‡     10,547
Common to Magnoliophyta§      8,121

* Transfer RNA (tRNA) values were computed on exons.
† Micro RNAs (miRNAs) are members of known conserved miRNA families.
‡ Eudicotyledons are represented by P. trichocarpa and A. thaliana.
§ Magnoliophyta (most flowering plants) are represented by P. trichocarpa, A. thaliana and O. sativa.

*A list of participants and their affiliations appears at the end of the paper.

This assembly has been annotated by using a combination of evidence. The major features of the genome annotation are presented in Table 1b. The 8.4-fold draft sequence of the grapevine genome contains a set of 30,434 protein-coding genes (an average of 372 codons and 5 exons per gene). This value is considerably lower than the 45,555 protein-coding genes reported for the poplar (Populus trichocarpa) genome, which has a similar size, at 485 Mb (ref. 1), and even lower than the 37,544 protein-coding genes identified in the 389 Mb of the rice genome (ref. 2).

Three different approaches revealed that 41.4% (average value) of the grapevine genome is composed of repetitive/transposable elements (TEs), a slightly higher proportion than that identified in the rice genome, which has a somewhat smaller size (ref. 2). The distribution of repeats and TEs along the chromosomes is quite uneven (see below). All classes and superfamilies of TEs are represented in the grapevine genome, with a large prevalence of class I elements over class II and helitrons (rolling-circle transposons) (Supplementary Table 7). An analysis of the distribution of the repetitive elements in the different fractions of the grapevine genome based on the current annotation shows that introns are quite rich in repeats and TEs (data not shown). In addition, 12.4% of the intron sequence contains transposons as determined using our set of manually annotated elements, most of which (75%) correspond to LINE (long interspersed element) retrotransposons, which therefore seem to have contributed specifically to the intron size observed in grapevine (Supplementary Table 8).

In eukaryotes with large genomes, the coding and repeated elements are distributed over the chromosomes and may be more or less interlaced, hence defining gene-poor and gene-rich regions. It has previously been noticed that the distribution of the genes along the chromosomes of rice and Arabidopsis thaliana is fairly homogeneous (refs 2, 3). In contrast, we observe large regions that alternate between high and low gene density in V. vinifera (Supplementary Figs 2 and 3). As expected, the density of TEs reflects a pattern substantially complementary to gene density. We observe a similar characteristic in the genome sequence of poplar, therefore indicating a dynamic for the invasion of TEs that is shared with the grapevine (Supplementary Fig. 3).

A striking feature of the grapevine proteome lies in the existence of large families related to wine characteristics, which have a higher gene copy number than in the other sequenced plants. Stilbene synthases (STSs) drive the synthesis of resveratrol, the grapevine phytoalexin that has been associated with the health benefits of moderate consumption of red wine (refs 15, 16). The family of genes encoding STSs has a noticeable expansion: 43 genes have been identified. Of these, 20 have previously been shown to be expressed after infection by Plasmopara viticola, thus confirming that they are likely to be functional. The terpene synthases (TPSs) drive the synthesis of terpenoids; these secondary metabolites are major components of resins, essential oils and aromas (their relative abundance is directly correlated with the aromatic features of wines; ref. 17) and are involved in plant–environment interactions. In comparison with the 30–40 genes of this family in Arabidopsis, rice and poplar, the grapevine TPS family is more than twice as large, with 89 functional genes and 27 pseudogenes. Classification based on known plant homologues reveals that the subclass of putative monoterpene synthases represents only 15% of the Arabidopsis TPS family whereas this subclass represents 40% of the grapevine TPS family. This result suggests a high diversification of grapevine monoterpene synthases that specifically produce C10 terpenoids present in aroma (such as geraniol, linalool, cineole and α-terpineol). Furthermore, the grapevine genome annotation has also revealed genes encoding homologues to the two forms of geranyl diphosphate synthases (GPPSs), the enzymes that produce the substrate for monoterpene synthases: both the homodimeric GPPS and the heterodimeric form are present; the latter is present only in plants such as Mentha piperita and Clarkia breweri, which produce large quantities of monoterpenes (ref. 19). Most of the STS and TPS genes occur as 20 clusters, including up to 33 paralogous genes located in a 680-kb stretch.

Because global duplication events seem to be frequent in plant evolution (ref. 20), we searched the genome of V. vinifera for paralogous regions by using protein sequence similarity. Paralogous regions are defined as chromosome fragments in which homologous genes are present in clusters. Statistical analysis of these clusters reveals that 94.5% have high probability of being paralogous (P < 10^[…]; Supplementary Table 11). Most Vitis gene regions have two different paralogous regions, which we have grouped together as triplets (Supplementary Fig. 5; coverage details in Supplementary Table 10). We conclude that the present-day grapevine haploid genome originated from the contribution of three ancestral genomes. It is yet to be demonstrated whether this content came from a true hexaploidization event or through successive genome duplications. The resulting plant had a diploid content that corresponds to the three full diploid contents of the three ancestors; it may therefore be described as a 'palaeo-hexaploid' organism. A number of rearrangements have affected the original three complements after the formation of the palaeo-hexaploid state. However, the gene order has been sufficiently conserved to permit the alignment of most regions with their two siblings.

We explored the time of formation of the palaeo-hexaploid arrangement by comparing grapevine gene regions with those of other completely sequenced plant genomes. If the palaeo-hexaploid complement is present in another species, it should result in a one-for-one pairing of gene regions between the two species considered. In contrast, if another species' genome evolved before palaeo-hexaploid formation, it should result in a one-to-three relationship between the other species and the grapevine genome. The available genome sequences were those of poplar (ref. 1), Arabidopsis (ref. 3) and rice (Oryza sativa; ref. 2), of which poplar is considered to be most closely related to grapevine. All clusters constructed between the orthologues in the three comparisons have P < 10^[…] (Table 1c). When the gene order in poplar is compared with that in grapevine, there are two clear distributions. First, the grapevine regions align with two poplar segments, as would be expected from a recent whole-genome duplication (WGD) in the poplar lineage (ref. 1). Second, each of the three grapevine regions that form a homologous triplet recognizes different pairs of poplar segments (Fig. 1a and Supplementary Fig. 6). This shows that the palaeo-hexaploidy observed in grapevine was already present in its common ancestor with poplar.

Poplar belongs to the Eurosid I clade. The sister clade to Eurosid I is that of Eurosid II, which contains the model species Arabidopsis. Its gene order was compared with that in the grapevine genome. Two distributions appear: first, most grapevine regions correspond to four Arabidopsis segments (Supplementary Fig. 7); second, each component of a triplicated group in grapevine recognizes four different regions in Arabidopsis (Fig. 1b). This shows that the grapevine palaeo-hexaploidy was present in the common ancestor to Arabidopsis and grapevine, and therefore that it is a trait common to all Eurosids. This is confirmed by the homology level distribution between paralogues of the grapevine, indicating a lower conservation than between Vitis/Arabidopsis orthologues (Supplementary Fig. 4). The Eurosid group contains many economically important flowering plants such as legumes, cotton and Brassicaceae. Our present results establish these species as having a palaeo-hexaploid common ancestor.

The grapevine/Arabidopsis comparison also reveals that the Arabidopsis lineage underwent two WGDs after its separation from the Eurosid I clade (refs 21–24). This contradicts some models based on more indirect evidence that placed the most ancient of these two duplications at the base of the Eurosid group, or even earlier (refs 4, 20–22). Some studies had also suggested a possible third duplication event in the distant past of the Arabidopsis lineage, potentially at the base of the angiosperm radiation. The controversy about this third event is now resolved by the Vitis genome comparisons: this event corresponds to the palaeo-hexaploidy formation that remains evident in the grapevine genome but has been difficult to characterize in Arabidopsis and poplar because of the more recent WGDs. In particular, the Arabidopsis genome lineage has undergone many rearrangements and chromosome fusions such that the ancestral gene order is particularly difficult to deduce from this species (Fig. 2).

Figure 1 | Comparison between three paralogous Vitis genomic regions and their orthologues in P. trichocarpa, A. thaliana and O. sativa. Orthologous gene pairs are joined with a different colour for each of the three paralogous grapevine chromosomes 6 (green), 8 (blue) and 13 (red). a, Orthologous regions in the poplar genome are different for each of the three Vitis chromosomes, showing that the triplication predates the poplar/Vitis separation. One Vitis region recognizes two poplar segments because of a WGD in the poplar lineage after the separation. b, Orthologous regions with Arabidopsis are different for each of the three Vitis chromosomes. This shows that the Arabidopsis/Vitis ancestor had the same palaeo-hexaploid content. One Vitis region corresponds to four Arabidopsis segments, indicating the presence of two WGDs in the Arabidopsis lineage after separation from the Vitis lineage. c, Orthologous regions in rice are the same for the three paralogous chromosomes. This indicates that the triplication was not present in the common ancestor of monocotyledons and dicotyledons. The presence in rice of different homologous blocks is due to global duplications in the rice lineage after divergence from dicotyledons.

Grapevines, like Arabidopsis and poplar, are dicotyledonous plants that diverged from monocotyledons about 130–240 Myr ago (refs 25, 26). Because rice is a monocotyledon, we assessed the presence or absence of palaeo-hexaploidy in its genome sequence. The observed pattern is the opposite of that seen for Arabidopsis and poplar: constituents of a grapevine triplet are generally orthologous to the same group of rice regions (Fig. 1c and Supplementary Fig. 11). Because rice and grapevine are phylogenetically distant, it is more difficult to detect relations of orthology across the two whole genomes: rearrangements, duplication and gene loss have affected the gene orders differently in the two lineages (Supplementary Fig. 10). Even with this limitation, we observed numerous cases of one-to-three relationships between rice and grapevine (Supplementary Figs 8, 9 and 11); 23% of orthologous blocks include the paralogous regions that originate from the grapevine palaeo-hexaploidy. For Arabidopsis, this number is as low as 1.4% (this difference is significant at 5%: χ² = 8.9; Supplementary Table 12), despite the fact that the Arabidopsis genome has suffered many gene losses since its two WGDs. These gene losses would be expected to obscure the orthologous relations with the grapevine genome, but they are clearly insufficient to explain the high number of one-to-three relationships observed in the rice–grapevine comparison. The most probable explanation for this excess is that the rice ancestor did not exhibit the palaeo-hexaploidy observed in the grapevine, poplar and Arabidopsis.

Figure 2 | Schematic representation of paralogous regions derived from the three ancestral genomes in the karyotypes of V. vinifera, P. trichocarpa and A. thaliana. Each colour corresponds to a syntenic region between the three ancestral genomes that were defined by their occurrence as linked clusters in grapevine, independently of intrachromosomal rearrangements. The V. vinifera genome (a) is by far the closest to the ancestral arrangement, whereas that of Arabidopsis (c) is thoroughly rearranged, and P. trichocarpa (b) presents an intermediate situation. The seven colours probably correspond to linkage groups at the time of the palaeo-hexaploid ancestor. [Panels: a, V. vinifera, chromosomes 1–19; b, P. trichocarpa, chromosomes 1–19; c, A. thaliana, chromosomes 1–5.]

These findings are summarized in Fig. 3: the triplicated arrangement is apparent after the separation of the monocotyledons and dicotyledons and before the spread of the Eurosid clade. Future genome sequencing projects for other clades of dicotyledons, such as Solanaceae or basal eudicots, will help in situating the triplication event more precisely, and eventually in establishing its precise nature (hexaploidization or genome duplications at distant times).

Public access to the grapevine genome sequence will help in the identification of genes underlying the agricultural characteristics of this species, including domestication traits. A selective amplification of genes belonging to the metabolic pathways of terpenes and tannins has occurred in the grapevine genome, in contrast with other plant genomes. This suggests that it may become possible to trace the diversity of wine flavours down to the genome level. Grapevine is also a crop that is highly susceptible to a large diversity of pathogens including powdery mildew, oidium and Pierce disease. Other Vitis species such as V. riparia or V. cinerea, which are known to be resistant to several of these pathogens, are interfertile with V. vinifera and can be used for the introduction of resistance traits by advanced backcrosses or by gene transfer. Access to the Vitis sequence and the exploitation of synteny will speed up this process of introgression of pathogen resistance traits. As a consequence of this, it is hoped that it will also prompt a strong decrease in pesticide use.

The high quality of the assembly, due mainly to the highly homozygous nature of the PN40024 line, enables the discovery of three ancestral genomes constituting the diploid content of grapevine. The Greek historian Thucydides wrote that Mediterranean people began to emerge from ignorance when they learnt to cultivate olives and grapes. This first characterization of the grapevine genome, with its indication of a palaeo-hexaploid ancestral genome for many dicotyledonous plants, addresses fundamental questions related to the origin and importance of this event in the history of flowering plants. Future work may help in correlating the differential fates of the three gene complements with phenotypic traits of dicotyledonous species.

Figure 3 | Positions of the polyploidization events in the evolution of plants with a sequenced genome. Each star indicates a WGD (tetraploidization) event on that branch. The question mark indicates that ancient events are visible in the rice genome that would require other monocotyledon genome sequences to be resolved. The formation of the palaeo-hexaploid ancestral genome occurred after divergence from monocotyledons and before the radiation of the Eurosids. [Tree labels: Flowering plants; Monocotyledons; Dicotyledons; Eurosids I; Eurosids II; O. sativa; P. trichocarpa; V. vinifera; A. thaliana; Formation of the palaeo-hexaploid genome.]

METHODS SUMMARY

Gene annotation. Protein-coding genes were predicted by combining ab initio models, V. vinifera complementary DNA alignments, and alignments of proteins and genomic DNA from other species. The integration of the data was performed with GAZE (ref. 28). Details are given in Supplementary Information.

Paralogous and orthologous gene sets. Statistical testing of homologous regions was performed as described in ref. 21.

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 5 April; accepted 7 August 2007. Published online 26 August 2007.

1. Tuskan, G. A. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).
2. International Rice Genome Sequencing Project. The map-based sequence of the rice genome. Nature 436, 793–800 (2005).
3. Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).
4. De Bodt, S., Maere, S. & Van de Peer, Y. Genome duplication and the origin of angiosperms. Trends Ecol. Evol. 20, 591–597 (2005).
5. Scannell, D. R., Byrne, K. P., Gordon, J. L., Wong, S. & Wolfe, K. H. Multiple rounds of speciation associated with reciprocal gene loss in polyploid yeasts. Nature 440, 341–345 (2006).
6. Jaillon, O. et al. Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature 431, 946–957 (2004).
7. Aury, J. M. et al. Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444, 171–178 (2006).
8. Maere, S. et al. Modeling gene and genome duplications in eukaryotes. Proc. Natl Acad. Sci. USA 102, 5454–5459 (2005).
9. Blanc, G. & Wolfe, K. H. Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16, 1679–1691 (2004).
10. Seoighe, C. & Gehring, C. Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 20, 461–464 (2004).
11. McGovern, P. E., Hartung, U., Badler, V., Glusker, D. L. & Exner, L. J. The beginnings of wine making and viniculture in the ancient Near East and Egypt. Expedition 39, 3–21 (1997).
12. Jaffe, D. B. et al. Whole-genome sequence assembly for mammalian genomes: Arachne 2. Genome Res. 13, 91–96 (2003).
13. Lodhi, M. A., Daly, M. J., Ye, G. N., Weeden, N. F. & Reisch, B. I. A molecular marker based linkage map of Vitis. Genome 38, 786–794 (1995).
14. Doligez, A. et al. An integrated SSR map of grapevine based on five mapping populations. Theor. Appl. Genet. 113, 369–382 (2006).
15. Baur, J. A. et al. Resveratrol improves health and survival of mice on a high-calorie diet. Nature 444, 337–342 (2006).
16. Baur, J. A. & Sinclair, D. A. Therapeutic potential of resveratrol: the in vivo evidence. Nature Rev. Drug Discov. 5, 493–506 (2006).
17. Mateo, J. J. & Jimenez, M. Monoterpenes in grape juice and wines. J. Chromatogr. A 881, 557–567 (2000).
18. Aubourg, S., Lecharny, A. & Bohlmann, J. Genomic analysis of the terpenoid synthase (AtTPS) gene family of Arabidopsis thaliana. Mol. Genet. Genomics 267, 730–745 (2002).
19. Tholl, D. et al. Formation of monoterpenes in Antirrhinum majus and Clarkia breweri flowers involves heterodimeric geranyl diphosphate synthases. Plant Cell 16, 977–992 (2004).
20. Adams, K. L. & Wendel, J. F. Polyploidy and genome evolution in plants. Curr. Opin. Plant Biol. 8, 135–141 (2005).
21. Simillion, C., Vandepoele, K., Van Montagu, M. C., Zabeau, M. & Van de Peer, Y. The hidden duplication past of Arabidopsis thaliana. Proc. Natl Acad. Sci. USA 99, 13627–13632 (2002).
22. Bowers, J. E., Chapman, B. A., Rong, J. & Paterson, A. H. Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422, 433–438 (2003).
23. Vision, T. J., Brown, D. G. & Tanksley, S. D. The origins of genomic duplications in Arabidopsis. Science 290, 2114–2117 (2000).
24. Blanc, G., Hokamp, K. & Wolfe, K. H. A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 13, 137–144 (2003).
25. Wolfe, K. H., Gouy, M., Yang, Y. W., Sharp, P. M. & Li, W. H. Date of the monocot–dicot divergence estimated from chloroplast DNA sequence data. Proc. Natl Acad. Sci. USA 86, 6201–6205 (1989).
26. Crane, P. R., Friis, E. M. & Pedersen, K. R. The origin and early diversification of angiosperms. Nature 374, 27–33 (1995).
27. Eshed, Y. & Zamir, D. An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141, 1147–1162 (1995).
28. Howe, K. L., Chothia, T. & Durbin, R. GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Res. 12, 1418–1427 (2002).
29. Adam-Blondon, A. F. et al. Construction and characterization of BAC libraries from major grapevine cultivars. Theor. Appl. Genet. 110, 1363–1371 (2005).
30. Soderlund, C., Humphray, S., Dunham, A. & French, L. Contigs built with fingerprints, markers, and FPC V4.7. Genome Res. 10, 1772–1787 (2000).
31. Lamoureux, D. et al. Anchoring of a large set of markers onto a BAC library for the development of a draft physical map of the grapevine genome. Theor. Appl. Genet. 113, 344–356 (2006).
32. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
33. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21 (Suppl. 1), i351–i358 (2005).
34. Roest Crollius, H. et al. Estimate of human gene number provided by genome-wide

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements The sequencing of the grapevine genome was launched and carried out after a scientific cooperation agreement between the Ministry of Agriculture in France and the Ministry of Agriculture in Italy, involving l'Institut National de la Recherche Agronomique (INRA), Consiglio per la Ricerca e la Sperimentazione in Agricoltura (CRA) and Friuli Venezia Giulia Region. This work was financially supported by Consortium National de Recherche en Génomique, Agence Nationale de la Recherche, INRA, and by MiPAF (VIGNA-CRA), Friuli Innovazione, Università di Udine, Federazione BCC, Fondazione CRUP, Fondazione Carigo, Fondazione CRT, Vivai Cooperativi Rauscedo, Eurotech, Livio Felluga, Marco Felluga, Venica e Venica, Le Vigne di Zamò (IGA). We thank S. Cure for correcting the manuscript; F. Câmara and R. Guigo for the calibration of the GeneID gene prediction software, and the Centre Informatique National de l'Enseignement Supérieur for computing resources.

Author Information The final assembly and annotation are deposited in the EMBL/Genbank/DDBJ databases under accession numbers CU459218–CU462737 (for all scaffolds) and CU462738–CU462772 (for chromosome reconstitutions and unanchored scaffolds). An annotation browser and further information on the project are available from http://www.genoscope.cns.fr/vitis, http://www.vitisgenome.it/ and http://www.appliedgenomics.org/. Reprints and permissions information is available at www.nature.com/reprints. The authors declare no competing financial interests. Correspondence and requests for materials should be addressed to P.W. ([email protected]).

The French–Italian Public Consortium for Grapevine Genome Characterization

Olivier Jaillon¹*, Jean-Marc Aury¹*, Benjamin Noel¹, Alberto Policriti²,³, Christian Clepet⁴, Alberto Casagrande²,⁵, Nathalie Choisne¹,⁴, Sébastien Aubourg⁴, Nicola Vitulo⁶,¹⁵, Claire Jubin¹, Alessandro Vezzi⁶,¹⁵, Fabrice Legeai⁷, Philippe Hugueney⁸, Corinne Dasilva¹, David Horner⁹,¹⁵, Erica Mica⁹,¹⁵, Delphine Jublot⁴, Julie Poulain¹, Clémence Bruyère⁴, Alain Billault¹, Béatrice Segurens¹, Michel Gouyvenoux¹, Edgardo Ugarte¹, Federica Cattonaro², Véronique Anthouard¹, Virginie Vico¹, Cristian Del Fabbro²,³, Michaël Alaux⁷, Gabriele Di Gaspero²,⁵, Vincent Dumas⁸, Nicoletta Felice²,⁵, Sophie Paillard⁴, Irena Juman²,⁵, Marco Moroldo⁴, Simone Scalabrin²,³, Aurélie Canaguier⁴, Isabelle Le Clainche⁴, Giorgio Malacrida⁶,¹⁵, Eléonore Durand⁷, Graziano Pesole¹⁰,¹¹,¹⁵, Valérie Laucou¹², Philippe Chatelet¹³, Didier Merdinoglu⁸, Massimo Delledonne¹⁴,¹⁵, Mario Pezzotti¹⁵,¹⁶, Alain Lecharny⁴, Claude Scarpelli¹, François Artiguenave¹, M. Enrico Pè⁹,¹⁵, Giorgio Valle⁶,¹⁵, Michele Morgante²,⁵, Michel Caboche⁴, Anne-Françoise Adam-Blondon⁴, Jean Weissenbach¹, Francis Quétier¹ & Patrick Wincker

*These authors contributed equally to this work.

Affiliations for participants:
1. Genoscope (CEA) and UMR 8030 CNRS-Genoscope-Université d'Evry, 2 rue Gaston Crémieux, BP5706, 91057 Evry, France.
2. Istituto di Genomica Applicata, Parco Scientifico e Tecnologico di Udine, Via Linussio 51, 33100 Udine, Italy.
3. Dipartimento di Matematica ed Informatica, Università degli Studi di Udine, via delle Scienze 208, 33100 Udine, Italy.
4. URGV, UMR INRA 1165, CNRS-Université d'Evry Génomique Végétale, 2 rue Gaston Crémieux, BP5708, 91057 Evry cedex, France.
5. Dipartimento di Scienze Agrarie ed Ambientali, Università degli Studi di Udine, via delle Scienze 208, 33100 Udine, Italy.
6. CRIBI, Università degli Studi di Padova, viale G. Colombo 3, 35121 Padova, Italy.
7. URGI, UR1164 Génomique Info, 523, Place des Terrasses, 91034 Evry Cedex, France.
8. UMR INRA 1131, Université de Strasbourg, Santé de la Vigne et Qualité du Vin, 28 rue de Herrlisheim, BP20507, 68021 Colmar, France.
9. Dipartimento di Scienze Biomolecolari e Biotecnologie, Università degli Studi di Milano, via Celoria 26, 20133 Milano, Italy.
10. Dipartimento di Biochimica e Biologia Molecolare, Università degli Studi di Bari, via Orabona 4, 70125 Bari, Italy.
11. Istituto Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, via Amendola 122/D, 70125 Bari, Italy.
12. UMR INRA 1097, IRD-Montpellier SupAgro-Univ. Montpellier II, Diversité et Adaptation des Plantes Cultivées, 2 Place Pierre Viala, 34060 Montpellier Cedex 1, France.
13. UMR INRA 1098, IRD-Montpellier SupAgro-CIRAD, Développement et Amélioration des Plantes, 2 Place Pierre Viala, 34060 Montpellier Cedex 1, France.
14. Dipartimento Scientifico e Tecnologico, Università degli Studi di Verona Strada Le Grazie 15 – Ca' Vignal, 37134 Verona, Italy.
15. Dipartimento di Scienze, Tecnologie e Mercati della Vite e del Vino, Università degli Studi di Verona, via della Pieve, 70 37029 S. Floriano (VR), Italy.
16. VIGNA-CRA Initiative; Consorzio Interuniversitario Nazionale per la Biologia Molecolare delle Piante, c/o Università degli Studi di Siena, via Banchi di Sotto 55, 53100 Siena, Italy.

METHODS

Genome sequencing. The V. vinifera PN40024 genome was sequenced with the use of a whole-genome shotgun strategy. All data were generated by paired-end sequencing of cloned inserts using Sanger technology on ABI3730xl sequencers. Supplementary Table 2 gives the number of reads obtained per library.

Genome assembly and chromosome anchoring. All reads were assembled with Arachne (ref. 12). We obtained 20,784 contigs that were linked into 3,830 supercontigs of more than 2 kb. The contig N50 was 64 kb, and the supercontig N50 was 1.9 Mb. The total supercontig size was 498 Mb, remarkably close to the expected size of 475 Mb. This indicates that the PN40024 has retained few heterozygous regions. Remaining heterozygosity was assessed by aligning all supercontigs with each other. We first selected the supercontigs more than 30 kb in size that were covered over more than 40% of their length by another supercontig with more than 95% identity. After visual inspection of the alignments, we added to this list the supercontigs more than 10 kb in size that aligned at more than 40% of their length with supercontigs identified previously. All potential cases were then inspected visually to discard potential heterozygous regions (aligning relatively homogeneously across their complete length) and retained repeated regions (with more heterogeneous alignments). This treatment identified 11 Mb of potentially allelic supercontigs. We confirmed that in most cases their coverage was about half the average of the homozygous supercontigs. Only one supercontig of each allelic pair was therefore conserved in the final assembly, which consists of 3,514 supercontigs (N50 = 2 Mb) containing 19,577 contigs (N50 = 66 kb), totalling 487 Mb. If the haploid genome size of 475 Mb is considered correct, then our final assembly contains only about 12 Mb of remaining heterozygosity, or 2.6%.

A set of 30,151 bacterial artificial chromosome (BAC) fingerprints of the BAC clones of a Cabernet–Sauvignon library were assembled into 1,763 contigs with FPC (ref. 30), v. 8. In parallel, 1,981 markers were anchored on a subset of BAC clones (ref. 31), among which 388 markers mapped onto the genetic map, and 77,237 BAC end sequences were obtained (ref. 31). Blat (ref. 32) alignments (90% identity on 80% of the length, fewer than five hits) were performed with BAC end sequences on the 3,830 supercontigs of sequences with lengths over 2 kb. The results were then filtered with homemade Perl scripts to keep only the occurrences in which two paired ends were matching at a distance of less than 300 kb and with a consistent orientation. Two supercontigs were considered linked to each other if two BAC links could be found or one BAC link and a BAC contig link.

Paralogous and orthologous gene sets. We identified orthologous genes in six pairs of genomes from four species: A. thaliana, O. sativa, P. trichocarpa and V. vinifera. Each pair of predicted gene sets was aligned with the Smith–Waterman algorithm, and alignments with a score higher than 300 (BLOSUM62; gapo = 10, gape = 1) were retained. Two genes, A from genome GA and B from genome GB, were considered orthologues if B was the best match for gene A in GB and A was the best match for B in GA.

For each orthologous gene set with V. vinifera, clusters of orthologous genes were generated. A single linkage clustering with a euclidean distance was used to group genes. The distances were calculated with the gene index in each chromosome rather than the genomic position. The minimal distance between two orthologous genes was adapted in accordance with the selected genomes. Finally, we retained only clusters that were composed of at least six genes for Arabidopsis and O. sativa, and eight genes for P. trichocarpa (Supplementary Table 10).

To validate the clustering quality we used a method described previously (ref. 21). For each cluster we computed the probability of finding this cluster in the gene homology matrix (Supplementary Table 11). This matrix was constructed from two compared chromosomes with genes numbered according to their position on each chromosome, with no reference to physical distances.

Paralogous genes were computed by comparing all-against-all of V. vinifera proteins by using blastp, and alignments with an expected value of less than 0.1 were retained and realigned with the Smith–Waterman algorithm. Two genes A and B were considered paralogues if B was the best match for gene A and A was the best match for B. Moreover, clusters of paralogous genes were constructed in the same fashion as orthologous clusters (Supplementary Table 10).
A total number analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 of 111 ultracontigs were constructed with this procedure. (2000). Genome annotation. Several resources were used to build V. vinifera gene mod- 35. Jaillon, O. et al. Genome-wide analyses based on comparative genomics. Cold Spring Harb. Symp. Quant. Biol. 68, 275–282 (2003). els automatically with GAZE . We used predictions of repetitive regions by 33 34,35 36. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, repeatscout , conserved coding regions predicted by the exofish method , 36 37 38 39 988–995 (2004). genewise alignments of proteins from Uniprot , Geneid and Snap ab initio 37. Bairoch, A. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, gene predictions, and alignments of several cDNA resources (Supplementary D154–D159 (2005). Information). 38. Parra, G., Blanco, E. & Guigo, R. GeneID in Drosophila. Genome Res. 10, 511–515 A weight was assigned to each resource to further reflect its reliability and (2000). accuracy in predicting gene models. This weight acts as a multiplier for the score 39. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004). of each information source, before being processed by GAZE. When applied to 40. Smith, T. F. & Waterman, M. S. Identification of common molecular the entire assembled sequence, GAZE predicted 30,434 gene models. subsequences. J. Mol. Biol. 147, 195–197 (1981). © 2007Nature PublishingGroup http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nature Springer Journals
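The N50 values quoted for contigs and supercontigs can be recomputed from a plain list of sequence lengths. The following is a minimal Python sketch, not the consortium's code: the function name `n50` is an illustrative assumption, and the definition used is the one given in the main text (the size of the shortest contig or supercontig in the subset of longest sequences that makes up half of the assembly size).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Hypothetical helper for illustration. Lengths are sorted from longest
    to shortest and accumulated; the N50 is the length of the sequence at
    which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("n50() needs at least one sequence length")
```

Applied to the real contig or supercontig lengths of an assembly (in any unit, as long as it is the same for every sequence), this returns the N50 in that unit; values such as 66 kb or 2 Mb would come from the actual assembly files, which are not reproduced here.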
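The orthologue definition (reciprocal best matches among alignments scoring above 300) and the single-linkage grouping on gene indices described in the Methods can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline actually used: the function names, the dictionary input format, the tie-breaking rule and the `max_dist` threshold are all hypothetical (the Methods say only that the distance was adapted to each genome pair), and cluster size is counted here as the number of orthologous gene pairs.

```python
import math
from collections import defaultdict


def best_matches(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) tuples to alignment scores. On ties the
    first pair encountered is kept (an arbitrary choice for this sketch).
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a, min_score=300):
    """Return sorted (a, b) pairs that are each other's best match.

    Only alignments with a score strictly higher than `min_score` are
    considered, mirroring the threshold of 300 quoted in the Methods.
    """
    kept_ab = {pair: s for pair, s in a_vs_b.items() if s > min_score}
    kept_ba = {pair: s for pair, s in b_vs_a.items() if s > min_score}
    best_ab = best_matches(kept_ab)
    best_ba = best_matches(kept_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def single_linkage_clusters(points, max_dist, min_size):
    """Group (index_a, index_b) points by single linkage.

    Each point is an orthologous pair placed at the gene indices (rank
    along the chromosome, not base-pair position) of its two genes. Two
    points are linked when their euclidean distance is at most `max_dist`
    (hypothetical parameter); clusters are the connected components, and
    only those with at least `min_size` points are returned.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, point in enumerate(points):
        groups[find(i)].append(point)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_size)
```

In use, the score dictionaries would be filled from the Smith–Waterman alignments of two predicted gene sets, the reciprocal pairs converted to gene-index coordinates per chromosome pair, and `min_size` set to six or eight depending on the species compared. The pairwise loop is quadratic in the number of points, which is acceptable for a sketch but would need indexing for genome-scale input.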

The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla


References (43)

Publisher
Springer Journals
Copyright
Copyright © 2007 by The Author(s)
Subject
Science, Humanities and Social Sciences, multidisciplinary; Science, multidisciplinary
ISSN
0028-0836
eISSN
1476-4687
DOI
10.1038/nature06148

VIGNA-CRA Initiative; Consorzio Interuniversitario Nazionale per National de la Recherche Agronomique (INRA), Consiglio per la Ricerca e la Biologia Molecolare delle Piante, c/o Universita` degli Studi di Siena, via Banchi di Sotto Sperimentazione in Agricoltura (CRA) and Friuli Venezia Giulia Region. This work 55, 53100 Siena, Italy. © 2007Nature PublishingGroup doi:10.1038/nature06148 Paralogous and orthologous gene sets. We identified orthologous genes in METHODS six pairs of genomes from four species: A. thaliana, O. sativa, P. trichocarpa Genome sequencing. The V. vinifera PN40024 genome was sequenced with the and V. vinifera. Each pair of predicted gene sets was aligned with the Smith– use of a whole-genome shotgun strategy. All data were generated by paired-end Waterman algorithm, and alignments with a score higher than 300 (BLOSUM62; sequencing of cloned inserts using Sanger technology on ABI3730xl sequencers. gapo5 10, gape5 1) were retained. Two genes, A from genome GA and B from Supplementary Table 2 gives the number of reads obtained per library. genome GB, were considered orthologues if B was the best match for gene A in Genome assembly and chromosome anchoring. All reads were assembled with 12 GB and A was the best match for B in GA. Arachne . We obtained 20,784 contigs that were linked into 3,830 supercontigs For each orthologous gene set with V. vinifera, clusters of orthologous genes of more than 2 kb. The contig N was 64 kb, and the supercontig N was 1.9 Mb. 50 50 were generated. A single linkage clustering with a euclidean distance was used to The total supercontig size was 498 Mb, remarkably close to the expected size of group genes. The distances were calculated with the gene index in each chro- 475 Mb. This indicates that the PN40024 has retained few heterozygous regions. mosome rather than the genomic position. 
The minimal distance between two Remaining heterozygosity was assessed by aligning all supercontigs with each orthologous genes was adapted in accordance with the selected genomes. Finally, other. We first selected the supercontigs more than 30 kb in size that were we retained only clusters that were composed of at least six genes for Arabidopsis covered over more than 40% of their length by another supercontig with more and O. sativa, and eight genes for P. trichocarpa (Supplementary Table 10). than 95% identity. After visual inspection of the alignments, we added to this list To validate the clustering quality we used a method described previously . For the supercontigs more than 10 kb in size that aligned at more than 40% of their each cluster we computed the probability of finding this cluster in the gene length with supercontigs identified previously. All potential cases were then homology matrix (Supplementary Table 11). This matrix was constructed from inspected visually to discard potential heterozygous regions (aligning relatively two compared chromosomes with genes numbered according to their position homogeneously across their complete length) and retained repeated regions on each chromosome, with no reference to physical distances. (with more heterogeneous alignments). This treatment identified 11 Mb of Paralogous genes were computed by comparing all-against-all of V. vinifera potentially allelic supercontigs. We confirmed that in most cases their coverage proteins by using blastp, and alignments with an expected value of less than 0.1 was about half the average of the homozygous supercontigs. Only one super- were retained and realigned with the Smith–Waterman algorithm . Two genes A contig of each allelic pair was therefore conserved in the final assembly, which and B were considered paralogues if B was the best match for gene A and A was consists of 3,514 supercontigs (N 5 2 Mb) containing 19,577 contigs the best match for B. 
Moreover, clusters of paralogous genes were constructed in (N 5 66 kb), totalling 487 Mb. If the haploid genome size of 475 Mb is con- the same fashion as orthologous clusters (Supplementary Table 10). sidered correct, then our final assembly contains only about 12 Mb of remaining heterozygosity, or 2.6%. 29. Adam-Blondon, A. F. et al. Construction and characterization of BAC libraries A set of 30,151 bacterial artificial chromosome (BAC) fingerprints of the BAC from major grapevine cultivars. Theor. Appl. Genet. 110, 1363–1371 (2005). clones of a Cabernet–Sauvignon library were assembled into 1,763 contigs with 30. Soderlund, C., Humphray, S., Dunham, A. & French, L. Contigs built with 30 31 FPC , v. 8. In parallel, 1,981 markers were anchored on a subset of BAC clones , fingerprints, markers, and FPC V4.7. Genome Res. 10, 1772–1787 (2000). among which 388 markers mapped onto the genetic map, and 77,237 BAC end 31. Lamoureux, D. et al. Anchoring of a large set of markers onto a BAC library for the 31 32 development of a draft physical map of the grapevine genome. Theor. Appl. Genet. sequences were obtained . Blat alignments (90% identity on 80% of the length, 113, 344–356 (2006). fewer than five hits) were performed with BAC end sequences on the 3,830 32. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 supercontigs of sequences with lengths over 2 kb. The results were then filtered (2002). with homemade Perl scripts to keep only the occurrences in which two paired 33. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in ends were matching at a distance of less than 300 kb and with a consistent large genomes. Bioinformatics 21 (Suppl. 1), i351–i358 (2005). orientation. Two supercontigs were considered linked to each other if two 34. Roest Crollius, H. et al. Estimate of human gene number provided by genome-wide BAC links could be found or one BAC link and a BAC contig link. 
A total number analysis using Tetraodon nigroviridis DNA sequence. Nature Genet. 25, 235–238 of 111 ultracontigs were constructed with this procedure. (2000). Genome annotation. Several resources were used to build V. vinifera gene mod- 35. Jaillon, O. et al. Genome-wide analyses based on comparative genomics. Cold Spring Harb. Symp. Quant. Biol. 68, 275–282 (2003). els automatically with GAZE . We used predictions of repetitive regions by 33 34,35 36. Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, repeatscout , conserved coding regions predicted by the exofish method , 36 37 38 39 988–995 (2004). genewise alignments of proteins from Uniprot , Geneid and Snap ab initio 37. Bairoch, A. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, gene predictions, and alignments of several cDNA resources (Supplementary D154–D159 (2005). Information). 38. Parra, G., Blanco, E. & Guigo, R. GeneID in Drosophila. Genome Res. 10, 511–515 A weight was assigned to each resource to further reflect its reliability and (2000). accuracy in predicting gene models. This weight acts as a multiplier for the score 39. Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004). of each information source, before being processed by GAZE. When applied to 40. Smith, T. F. & Waterman, M. S. Identification of common molecular the entire assembled sequence, GAZE predicted 30,434 gene models. subsequences. J. Mol. Biol. 147, 195–197 (1981). © 2007Nature PublishingGroup
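The reciprocal best-match rule and the gene-index clustering described under "Paralogous and orthologous gene sets" can be illustrated with a short Python sketch. This is not the consortium's code. The function names, the input layout (dictionaries of pre-filtered alignment scores) and the `max_gap` parameter are assumptions made for illustration; the Methods state only that the distance threshold "was adapted in accordance with the selected genomes" and do not give its value.

```python
from collections import defaultdict
from math import hypot


def reciprocal_best_hits(scores_ab, scores_ba):
    """Return sorted (a, b) pairs where b is a's best match and a is b's best match.

    scores_ab maps each gene of genome A to {gene of genome B: alignment score};
    scores_ba is the same table built from the reverse search. Both are assumed
    to have been filtered already (e.g. Smith-Waterman score > 300).
    Ties are broken by dictionary order, which is an arbitrary choice here.
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def single_linkage_clusters(points, max_gap, min_genes):
    """Group orthologue pairs lying on one pair of chromosomes.

    Each point is (gene index on chromosome of genome A, gene index on
    chromosome of genome B), i.e. gene order rather than base-pair position.
    Two points are linked when their Euclidean distance is <= max_gap
    (hypothetical threshold); clusters are the connected components, and
    only clusters with at least min_genes points are returned.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= max_gap:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, point in enumerate(points):
        groups[find(i)].append(point)
    return sorted(sorted(g) for g in groups.values() if len(g) >= min_genes)
```

With `min_genes=6` the second function would mirror the cluster-size cut-off reported for Arabidopsis and O. sativa, and with `min_genes=8` the one for P. trichocarpa. The pairwise loop is quadratic in the number of points, which is acceptable for a sketch but would probably need a sorted sweep or a spatial index at genome scale.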

Journal

Nature, Springer Journals

Published: Aug 26, 2007
