GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers

Craig Mermel; Steven Schumacher; Barbara Hill; Matthew Meyerson; Rameen Beroukhim; Gad Getz

doi:10.1186/gb-2011-12-4-r41

2011-04-28

Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA. Full list of author information is available at the end of the article.

© 2011 Mermel et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mermel et al. Genome Biology 2011, 12:R41, http://genomebiology.com/2011/12/4/R41

We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.

Background

Cancer forms through the stepwise acquisition of somatic genetic alterations, including point mutations, copy-number changes, and fusion events, that affect the function of critical genes regulating cellular growth and survival [1]. The identification of oncogenes and tumor suppressor genes being targeted by these alterations has greatly accelerated progress in both the understanding of cancer pathogenesis and the identification of novel therapeutic vulnerabilities [2]. Genes targeted by somatic copy-number alterations (SCNAs), in particular, play central roles in oncogenesis and cancer therapy [3]. Dramatic improvements in both array and sequencing platforms have enabled increasingly high-resolution characterization of the SCNAs present in thousands of cancer genomes [4-6].

However, the discovery of new cancer genes being targeted by SCNAs is complicated by two fundamental challenges. First, somatic alterations are acquired at random during each cell division, only some of which (‘driver’ alterations) promote cancer development [7]. Selectively neutral or weakly deleterious ‘passenger’ alterations may nonetheless become fixed whenever a subclone carrying such alterations acquires selectively beneficial mutations that promote clonal dominance [8]. Second, SCNAs may simultaneously affect up to thousands of genes, but the selective benefits of driver alterations are likely to be mediated by only one or a few of these genes. For these reasons, additional analysis and experimentation is required to distinguish the drivers from the passengers, and to identify the genes they are likely to target.

A common approach to identifying drivers is to study large collections of cancer samples, on the notion that regions containing driver events should be altered at higher frequencies than regions containing only passengers [4,6,7,9-14]. For example, we developed an algorithm, GISTIC (Genomic Identification of Significant Targets in Cancer) [15], that identifies likely driver SCNAs by evaluating the frequency and amplitude of observed events. GISTIC has been applied to multiple cancer types, including glioblastoma [10,15], lung adenocarcinoma [16], melanoma [17], colorectal carcinoma [18], hepatocellular carcinoma [19], ovarian carcinoma [20], medulloblastoma [21], and lung and esophageal squamous carcinoma [22], and has helped identify several new targets of amplifications (including NKX2-1 [16], CDK8 [18], VEGFA [19], SOX2 [22], and MCL1 and BCL2L1 [4]) and deletions (EHMT1 [21]).
Several additional algorithms for identifying likely driver SCNAs have also been described [23-25] (reviewed in [26]).

Yet, several critical challenges have not yet been adequately addressed by any of the existing copy-number analysis tools. For example, we and others have shown that the abundance of SCNAs in human cancers varies according to their size, with chromosome-arm length SCNAs occurring much more frequently than SCNAs of slightly larger or smaller size [4,27]. Therefore, analysis methods need to model complex cancer genomes that contain a mixture of SCNA types occurring at distinct background rates. Existing copy-number methods have also used ad hoc heuristics to define the genomic regions likely to harbor true cancer gene targets. The inability of these methods to provide a priori statistical confidence has been a major limitation in interpreting copy-number analyses, an important problem as end-users typically use these results to prioritize candidate genes for time-consuming validation experiments.

Here we describe several methodological improvements to address these challenges, and validate the performance of the revised algorithms in both real and simulated datasets. We have incorporated these changes into a revised GISTIC pipeline, termed GISTIC 2.0.

Results and Discussion

Overview of copy-number analysis pipeline

Cancer copy-number analyses can be divided into five discrete steps (Figure 1): 1) accurately defining the copy-number profile of each cancer sample; 2) identifying the SCNAs that most likely gave rise to these overall profiles and estimating their background rates of formation; 3) scoring the SCNAs in each region according to their likelihood of occurring by chance; 4) defining the independent genomic regions undergoing statistically significant levels of SCNA; and 5) identifying the likely gene target(s) of each significantly altered region. Figure 1 depicts a schematic overview of this process, highlighting the specific methodological improvements we will address in the present manuscript.

The first step, accurately defining the copy-number profile of each cancer sample, has been addressed by multiple previous studies [28-35] and is not discussed in detail here. We assume that segmented copy-number profiles have been obtained for all samples and all germline copy-number variations (CNVs) have been removed, yielding profiles of somatic events. The following sections describe improvements to steps 2 to 5. We evaluate these improvements on a test set of 178 glioblastoma multiforme (GBM) cancer DNAs hybridized to the Affymetrix Single Nucleotide Polymorphism (SNP) 6.0 array as part of The Cancer Genome Atlas (TCGA) project [10] (the ‘TCGA GBM set’), and on simulated data. Full technical details for each step are described in the Supplementary Methods (Additional file 1).

Figure 1 Schematic overview of the copy-number analysis framework. High-level overview of our cancer copy-number analysis framework, highlighting specific differences between the original GISTIC algorithm [15] and the GISTIC 2.0 pipeline described in this manuscript. The first step, accurate identification of the copy-number profile in each sample, is common to GISTIC and GISTIC2.0.

Figure 1 panel text, GISTIC 1.0 (Beroukhim et al., 2007) versus GISTIC 2.0:
Step 1 (accurate definition of the copy-number profile in each sample; common to both): individual .CEL files undergo array calibration, copy-number estimation, and segmentation to give segmented copy-number profiles.
Step 2 (identification/separation of underlying SCNAs): GISTIC 1.0 eliminates arm-level SCNAs by use of an amplitude threshold; GISTIC 2.0 deconstructs the segmented profile into underlying SCNAs, which allows for modelling of the background rate of SCNAs and length-based separation of arm-level and focal SCNAs.
Step 3 (scoring SCNAs in each region according to likelihood of occurring by chance): GISTIC 1.0 uses G = frequency × amplitude, with p-values computed by random permutation of markers across the genome; GISTIC 2.0 uses G = -log(Probability | Background), with scores computed on markers or genes and p-values computed by random permutation of markers or bins across the genome.
Step 4 (defining independent genomic regions undergoing significant levels of SCNA): GISTIC 1.0 uses a greedy segment peel-off algorithm that iteratively subtracts segments covering each peak and rescores until no significant peaks remain on the chromosome; GISTIC 2.0 uses an arbitrated peel-off algorithm that formalizes the idea that segments can have multiple targets by allowing segment scores to be split among multiple potential peaks during peel-off.
Step 5: GISTIC 1.0 uses leave-k-out, which assumes that at most ‘k’ passenger events aberrantly define the minimal common region; GISTIC 2.0 uses RegBounder, which models expected local variation in G-score to define boundaries predicted to contain the true target with predetermined confidence.

Deconstruction of segmented copy-number profiles into underlying SCNAs

Segmented copy number profiles represent the summed outcome of all the SCNAs that occurred during cancer development. Accurate modeling of the background rate of copy-number alteration requires analysis of the individual SCNAs. However, because SCNAs may overlap, it is impossible to directly infer the underlying events from the final segmented copy-number profile alone. Given certain assumptions about SCNA background rates, however, it is possible to estimate the likelihood of any given set of candidate SCNAs so as to select the most likely one.

We have developed an algorithm (‘Ziggurat Deconstruction’ (ZD)) that deconstructs each segmented copy-number profile into its most likely set of underlying SCNAs (see Supplementary Methods in Additional file 1 and Supplementary Figure S1 in Additional file 2). ZD is an iterative optimization algorithm that alternately estimates a background model for SCNA formation and then utilizes this model to determine the most likely deconstruction of each copy-number profile. Its output is a catalog of the individual SCNAs in each cancer sample, each with an assigned length and amplitude, that sum to generate the original segmented copy profile. We assume that most of these SCNAs are passengers, so that their distribution reflects, to a first approximation, the operation of the ‘background’ mutation process (see Supplementary Figure S2 in Additional file 3).

Length-based separation of focal and arm-level SCNAs

A major advantage of the ZD method is its ability to separate arm-level and focal SCNAs explicitly by length. Prior studies have attempted to exclude arm-level SCNAs by setting high amplitude thresholds [10,16] because, in contrast to focal SCNAs, few arm-level SCNAs reach high amplitude (Figure 2a). However, this approach suffers from at least two undesirable consequences: first, low- to moderate-amplitude focal copy-number events are eliminated from the analysis, reducing sensitivity to identify positively selected regions; and second, the amplitude threshold is left as a free parameter, allowing for potential over-fitting of the analysis to a desired result.

We have previously shown that SCNA frequencies across cancers of diverse tissue origin are inversely proportional to SCNA lengths, with the striking exception of SCNAs exactly the length of a chromosome arm or whole chromosome (which are very frequent) [4]. This trend is preserved in the TCGA GBM samples (Figure 2b). This reproducible distribution provides a natural basis for classifying events as ‘arm-level’ and ‘focal’ based purely on length. Such length-based filtering of events allows for the computational reconstruction of ‘arm-level’ and ‘focal’ representations of the cancer genome (Figure 2c) and enables the inclusion of low- to moderate-amplitude focal copy-number events in the final analysis.

To determine the benefits of this approach, we ran the original ‘GISTIC 1.0’ algorithm on the TCGA GBM set using three different thresholding approaches (Figure 3; Supplementary Table S1 in Additional file 4): 1) a low amplitude threshold (log2 ratio of ± 0.1) that only eliminates low-level artifactual segments; 2) a high amplitude threshold (log2 ratio of 0.848 and -0.737 for amplifications/deletions) used previously [16] to eliminate arm-level events; and 3) the low amplitude threshold but also removing all SCNAs occupying more than 98% of a chromosome arm, leaving only the focal events.

Filtering out arm-level events through use of either amplitude or length thresholds greatly increased the sensitivity of GISTIC for detecting focal amplifications and deletions (Figure 3; Supplementary Table S1 in Additional file 4). While entire chromosomes were scored as significant using only a low amplitude threshold, including gain of chromosome 7 and loss of chromosome 10 (Figure 3a), a number of recurrent focal alterations were missed, including amplifications surrounding CDK6, CCND2, and HMGA2.
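The three thresholding approaches compared above can be sketched in a few lines. This is only an illustration, not the GISTIC implementation: the `Scna` record, the `keep` function, and the choice of strict inequalities are assumptions made for the example, while the numeric cutoffs are the ones quoted in the text.

```python
from dataclasses import dataclass

# Cutoffs quoted in the text; the strictness of each inequality is an assumption.
LOW_AMP = 0.1            # |log2 ratio| at or below this is treated as artifactual
HIGH_AMP_GAIN = 0.848    # high-amplitude cutoff for amplifications
HIGH_AMP_LOSS = -0.737   # high-amplitude cutoff for deletions
ARM_FRACTION = 0.98      # SCNAs covering more than this fraction of an arm are arm-level


@dataclass(frozen=True)
class Scna:
    """One deconstructed event (hypothetical record layout)."""
    arm_fraction: float  # event length divided by chromosome-arm length
    log2_ratio: float    # signed amplitude


def is_arm_level(event: Scna) -> bool:
    return event.arm_fraction > ARM_FRACTION


def keep(event: Scna, approach: str) -> bool:
    """Decide whether an event survives one of the three filters."""
    above_low = abs(event.log2_ratio) > LOW_AMP
    if approach == "low_amplitude":
        return above_low
    if approach == "high_amplitude":
        return event.log2_ratio > HIGH_AMP_GAIN or event.log2_ratio < HIGH_AMP_LOSS
    if approach == "focal_length":
        return above_low and not is_arm_level(event)
    raise ValueError(f"unknown approach: {approach}")


events = [
    Scna(arm_fraction=1.00, log2_ratio=0.40),  # arm-level gain, moderate amplitude
    Scna(arm_fraction=0.02, log2_ratio=2.00),  # focal, high amplitude
    Scna(arm_fraction=0.05, log2_ratio=0.30),  # focal, moderate amplitude
    Scna(arm_fraction=0.01, log2_ratio=0.05),  # low-level artifact
]
kept = {
    approach: [e for e in events if keep(e, approach)]
    for approach in ("low_amplitude", "high_amplitude", "focal_length")
}
```

In this toy input, the moderate-amplitude focal event is discarded by the high amplitude filter but retained by the length-based filter, which is the sensitivity gain the text attributes to length-based separation.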
The focal alterations missed under the low amplitude threshold were detected using either the high amplitude (Figure 3b) or the focal length filters (Figure 3c).

The benefits of length-based filtering result from the inclusion of low- to moderate-amplitude focal events. Amplification of PIK3CA and AKT1 and deletion of WWOX are detected using length-based filtering, but are not significant under the high amplitude filter (compare Figure 3b and 3c). Moreover, the length-based analysis identified significant SCNAs detected in neither of the amplitude-based analyses, including amplifications of MLLT10 and deletions of CDKN1B and NF1. No known GBM target gene was detected in either of the amplitude-based analyses that was not also detected by the length-based analysis. These results suggest that length-based filtering of arm-level events greatly improves the sensitivity of GISTIC to identify relevant regions of focal SCNA.

[Figure 2 panels: (a) Amplitude of Focal and Arm-level SCNAs (y-axis: copy number change; low and high amplitude thresholds marked); (b) Length Distribution of SCNAs (x-axis: length as fraction of chromosome arm; y-axis: fraction of segments); (c) All Data = Arm-level SCNAs + Focal SCNAs.]

Figure 2 Computational separation of arm-level and focal SCNAs. (a) Boxplot showing the distribution of copy-number changes for amplified focal (length < 98% of a chromosome arm) and arm-level (length > 98% of a chromosome arm) SCNAs across 178 GBM profiles from TCGA. The black dotted line denotes a typical low-level amplitude threshold used to eliminate artifactual SCNAs, while the green dotted line denotes a typical high-level amplitude threshold used in previous versions of GISTIC to eliminate arm-level SCNAs. (b) Histogram showing the frequency of observing SCNAs of a given length across 178 GBM samples. The high frequency of events occupying exactly one chromosome arm led us to distinguish between focal and arm-level SCNAs. (c) Heatmaps showing the total segmented copy-number profile of the TCGA GBM set (leftmost panel), and the results of computationally separating these samples into arm-level profiles (middle panel) and focal profiles (rightmost panel) by summing arm-level and focal SCNAs. In each heatmap, the chromosomes are arranged vertically from top to bottom and samples are arranged from left to right. Red and blue represent gain and loss, respectively.

Probabilistic scoring of SCNAs

We set out to define a scoring framework for SCNAs that more accurately reflects the background rates of alteration. Ideally, we aim to score each region of the genome according to the probability with which the observed set of SCNAs would occur by chance alone. Scores using this framework have a clear interpretation: the higher the score assigned to a region, the less likely that the SCNAs in that region are observed entirely by chance, and the more likely that they underwent positive selection.

The probability of observing a single SCNA of given length and amplitude can be approximated by the frequency of occurrence of events of similar length and amplitude across the entire dataset (as in Supplementary Figure S2 in Additional file 3). However, since cancer genomes do contain drivers, this procedure is likely to overestimate the probability of observing SCNAs under the null model. Specifically, driver events tend to be shorter in length and of higher amplitude than passengers and therefore constitute the majority of events in their length/amplitude neighborhood (Supplementary Figure S3 in Additional file 5).
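As a rough illustration of the naive estimate discussed above, the sketch below approximates the probability of an SCNA by the fraction of catalogued events in a small length/amplitude neighborhood and converts it to a negative log-probability. The function names, the rectangular neighborhood, and the window sizes `dl` and `da` are assumptions for the example. This is the estimator the text describes as biased by drivers, not the functional form that GISTIC2.0 ultimately fits.

```python
import math


def neighborhood_frequency(events, length, amplitude, dl=0.05, da=0.1):
    """Fraction of events within +/-dl in length and +/-da in amplitude.

    `events` is a list of (length, amplitude) pairs; lengths are expressed
    as a fraction of a chromosome arm (an assumption of this sketch).
    """
    if not events:
        raise ValueError("need at least one event")
    hits = sum(
        1
        for ev_len, ev_amp in events
        if abs(ev_len - length) <= dl and abs(ev_amp - amplitude) <= da
    )
    return hits / len(events)


def negative_log_probability(events, length, amplitude, dl=0.05, da=0.1):
    """Score an event as -log(p); unseen neighborhoods score infinity."""
    p = neighborhood_frequency(events, length, amplitude, dl, da)
    return math.inf if p == 0 else -math.log(p)


catalog = [(0.10, 0.30), (0.12, 0.35), (0.50, 0.20), (0.02, 2.00)]
common = negative_log_probability(catalog, 0.11, 0.32)  # two of four events nearby
rare = negative_log_probability(catalog, 0.02, 2.00)    # one of four events nearby
```

If many of the events in a neighborhood are themselves drivers, the estimated frequency rises and the score falls, which is the overestimation of the null probability that the text warns about.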
[Figure 3 panels: (a) All Data, Low Amplitude Threshold (27 amplified regions, 15 deleted regions); (b) All Data, High Amplitude Threshold (41 amplified regions, 31 deleted regions); (c) Focal Data, Low Amplitude Threshold (55 amplified regions, 36 deleted regions).]

Figure 3 Effects of amplitude-based or length-based filtering of arm-level events on GISTIC results. (a-c) GISTIC amplification (top) and deletion (bottom) plots using all data and a low amplitude threshold (a), using all data and a high amplitude threshold (b), and using the focal data and a low amplitude threshold (c). The genome is oriented vertically from top to bottom, and GISTIC q-values at each locus are plotted from left to right on a log scale. The green line represents the significance threshold (q-value = 0.25). For each plot, known or interesting candidate genes are highlighted in black when identified by all three analyses, in red when identified by the high amplitude or focal length analyses, in purple when identified by the low amplitude or focal length analyses, and in green when identified only in the focal length analysis.

To avoid biasing our background model, we set out to fit the log-probability distribution of SCNAs to a functional form that would be insensitive to the presence of driver events in the data (Supplementary Methods in Additional file 1). We made use of a large collection of 3,131 cancer samples run on the Affymetrix 250K StyI SNP Array [4] plus several hundred additional samples run on the Affymetrix SNP6.0 Array (data not shown). At the level of resolution provided by these arrays, the probability of observing a focal SCNA at a given locus under the background model is roughly independent of length. As a result, the functional form for the log-probability distribution is similar to the original GISTIC G-score definition (G = Frequency × Amplitude), with the notable exception being that the new score is proportional to the amplitude in copy-number space rather than log-copy-number space.

Although this functional form was empirically derived from a large collection of samples run on two different array-based platforms, it does lead to increased sensitivity to differences in dynamic range across platforms as well as differential saturation characteristics of probes within the same array platform. To avoid this problem, we routinely cap the segmented copy-number data at a level representing the signal intensity above which most probes start to saturate (Supplementary Methods in Additional file 1). This ensures that we are using data that originate from the linear regime of the probes’ response curves and therefore are more comparable across platforms.
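A minimal sketch of a marker-level amplification score in the spirit of the description above: log2 ratios are capped, converted to copy-number space, and averaged across samples, so that the result equals frequency times mean amplitude. The function names, the cap of 1.5, the diploid baseline, and the choice to apply the cap in log2 space are all assumptions for illustration; the actual procedure is given in the Supplementary Methods.

```python
def copy_number_gain(log2_ratio, ploidy=2.0):
    """Convert a log2 ratio to a gain in copies relative to `ploidy`."""
    return ploidy * (2.0 ** log2_ratio) - ploidy


def amplification_g_scores(profiles, cap_log2=1.5):
    """Per-marker amplification score averaged over samples.

    `profiles` holds one list of log2 ratios per sample, all of equal length.
    Values above `cap_log2` (a hypothetical saturation level) are clipped
    before conversion to copy-number space.
    """
    n_markers = len(profiles[0])
    totals = [0.0] * n_markers
    for sample in profiles:
        if len(sample) != n_markers:
            raise ValueError("all samples must cover the same markers")
        for i, ratio in enumerate(sample):
            gain = copy_number_gain(min(ratio, cap_log2))
            if gain > 0:  # only gains contribute to the amplification score
                totals[i] += gain
    return [total / len(profiles) for total in totals]


profiles = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 3.0],  # the last marker exceeds the cap and is clipped
    [0.0, 0.0, 0.0],
]
scores = amplification_g_scores(profiles)
```

Working in copy-number space means a single very high-level amplification can dominate a marker's score, which is why the cap matters: without it, the third marker above would outscore the second despite being altered in only one sample.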
As with GISTIC 1.0, we obtain P-values for each marker by comparing the score at each locus to a background score distribution generated by random permutation of the marker locations in each sample (Supplementary Methods in Additional file 1). This procedure controls for sample-specific variations in the rate of copy-number alteration. We correct the resulting P-values for multiple-hypothesis testing using the Benjamini-Hochberg false discovery rate method [36].

Alternative gene-level scoring for tumor suppressors with non-overlapping deletions

Some genes are affected by non-overlapping deletions, either on different alleles in one sample or across multiple samples. For such genes, a marker-based score does not weight the presence of all deletions affecting that gene, despite the fact that these events are likely to have similarly deleterious effects on gene function. We have developed a modified scoring and permutation procedure, termed GeneGISTIC, that scores genes rather than markers (Supplementary Methods in Additional file 1). In each sample, we assign each gene the minimal copy number of any marker contained within that gene, and then sum across all samples to compute the gene score. Because genes covering more markers are more likely to achieve a more extreme value by chance, the permutation procedure is adjusted to account for gene size; the score for a gene covering n markers is compared against a size-specific null distribution generated by computing minima over all running windows of size n in each sample and then randomly permuting these minimal values across the genome.

To determine the effect of gene-based scoring of deletions, we compared the results of gene-based and marker-based scoring on the TCGA GBM set (holding all other parameters equal). As expected, GeneGISTIC ranks known tumor suppressor genes higher and is more sensitive for genes subject to non-overlapping deletions (Supplementary Table S2 in Additional file 6). For example, RB1 was ranked 5th out of 39 regions using gene-based scoring (q-value = 2.6e-10) but only 13th out of 38 using marker-based scoring (q-value = 0.0013), and CDKN1B was ranked 26th using gene-based scoring (q-value = 0.08) compared to 38th using marker-based scoring (q-value = 0.19). NF1 was focally deleted in 12 of the 178 GBM samples (6.7%), and these deletions were frequently non-overlapping (Supplementary Figure S4a in Additional file 7). As a result, NF1 was scored just over or just under the significance threshold using the marker-based score, depending on the parameters used. By contrast, NF1 was robustly identified using gene-based scoring across all parameter combinations (Supplementary Table S2 in Additional file 6 and data not shown).

However, because this scoring method does not score regions of the genome that are not in annotated genes, it could underweight or completely miss deletions occurring in non-genic regions. For example, in our GBM samples, gene-based scoring did not identify a region just outside of PCDH9 on chr13q21.3 that scored as highly significant (q-value = 4.4e-9) using the standard marker-based score (Supplementary Figure S4b in Additional file 7). While many non-genic deletions may in fact represent technical artifacts or rare germline events, some may be functionally relevant.

Identification of independent significantly altered regions

Individual SCNAs, and indeed significantly amplified or deleted regions of the genome, may extend over more than one oncogene or tumor suppressor gene. Other significant regions may contain no oncogenes or tumor suppressor genes, but achieve apparent significance due to their proximity to a target gene. Thus, an additional step is required after genome-wide scoring to identify independently significant regions.

GISTIC 1.0 solves this problem through the use of an iterative ‘peel-off’ algorithm, which greedily assigns all SCNAs to the maximal peak on each chromosome, removes them from the data, and rescores until no remaining region crosses the significance threshold. This approach reduces the power to identify secondary peaks that are close to previously identified significant regions (Figure 4a). However, since it is possible for individual SCNAs to affect multiple driver regions, a less greedy approach might identify additional peaks without significantly increasing the false discovery rate.

We have, therefore, modified the method to allow SCNAs to contribute to more than one peak (‘arbitrated peel-off’). We first greedily assign the entirety of an SCNA’s score to the most significant peak it covers. In subsequent steps, however, we allow scores of previously assigned segments to be redistributed before deciding whether a putative region is significant (Supplementary Methods in Additional file 1). Like the original algorithm, the process terminates when no region has an adjusted score that exceeds the significance threshold. A similar modification of GISTIC has recently been proposed [37].

Arbitrated peel-off is more sensitive than the original algorithm (Figure 4a; Supplementary Table S3 in Additional file 8). We generated 10,000 simulated datasets each consisting of 300 samples, with each chromosome containing a primary driver event in 10% of the samples and a secondary driver event in 5% of the samples. We analyzed the sensitivity of standard and arbitrated peel-off to detect the secondary peak as we varied the percentage of secondary driver events that overlapped the primary driver peak between 0% and 100% (Supplementary Methods in Additional file 1).
In these simulations, the two methods were nearly equally sensitive at identifying the secondary peak at 0% overlap. However, arbitrated peel-off was vastly more sensitive than standard peel-off as we increased the rate of overlap between primary and secondary peaks from 5 to 50% (Figure 4b), recovering an average of 2.4 times (range 1.2 to 3.8) more secondary peaks. Over 80% of the novel peaks identified by arbitrated peel-off corresponded to an actual simulated driver peak, demonstrating that the increased sensitivity is accompanied by high specificity.

The primary and secondary peaks tend to merge when the overlap is above 50%, obscuring any appreciable difference between the two methods (Supplementary Figure S5 in Additional file 9). Indeed, neither method was capable of independently identifying the secondary peak once the percent overlap rose above 80%. These simulations demonstrate both the superior sensitivity of arbitrated peel-off as well as the challenge of identifying neighboring drivers.

[Figure 4 panels: (a) Sensitivity vs. Driver Distance (x-axis: distance between drivers, Mb); (b) Sensitivity vs. Driver SCNA Overlap (x-axis: fraction overlap between driver events, %); y-axis in both: % recovery of independent second driver peak; legend: arbitrated peel-off, standard peel-off.]

Figure 4 Sensitivity of peel-off to detect secondary driver events. The average fraction of secondary driver events recovered in independent (not containing the primary driver) peaks by GISTIC using the standard peel-off method (blue line) or arbitrated peel-off (red line) is shown for two simulated datasets. (a) The data are derived from 1,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event a fixed distance away that is present in 5% of samples. (b) Data are derived from 10,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event present in 5% of samples, where the fraction of the secondary driver events that overlapped with the primary driver event was varied between 100% (complete dependence; far left) and 0% (complete independence; far right). Error bars represent the mean ± standard error of the mean (some are too small to be visible).

Localizing target genes for each significantly altered region

The final step in the GISTIC pipeline is to determine the region that is most likely to contain the gene or genes being targeted for each independently significant region of SCNA (the ‘peak region’). The standard approach is to focus on the minimal common region (MCR) of overlap (Figure 5a), the region that is altered in the greatest number of samples and therefore would be expected to be the most likely to contain the target genes. However, one or more passenger SCNAs adjacent to, but not overlapping, the target gene can result in an MCR that does not include the true target. This is a frequent occurrence, especially when the frequency of the driver event is low (< 5%; Figure 5b). An alternative method (utilized by GISTIC 1.0) is to apply a heuristic ‘leave-k-out’ procedure to define the boundaries of each peak region (Figure 5a) [15]. This procedure assumes that up to k passenger SCNAs (typically, k = 1) may aberrantly define each boundary of the peak region. While the ‘leave-k-out’ procedure correctly identifies the target gene more often than the MCR (Figure 5b), it suffers from the potential for over-fitting introduced by the free parameter ‘k’. Moreover, the accuracy of ‘leave-k-out’ varies depending on the number of samples and the frequency of the event under question. For fixed k, the sensitivity of ‘leave-k-out’ increases for increasing driver frequency (Figure 5b) and decreases for increasing sample size (Figure 5c).

[Figure 5 panels: (a) schematic of GISTIC score versus chromosomal position around a target gene, with MCR, Leave-1-Out, and RegBounder boundaries; (b) Driver Recall as Function of Driver Frequency (n = 500 samples; x-axis: driver frequency, %); (c) Driver Recall as Function of Sample Size (5% driver frequency; x-axis: number of samples); y-axis in (b) and (c): fraction of drivers identified (%); legend: MCR, Leave-1-Out, RegBounder 50%, RegBounder 75%, RegBounder 95%.]

Figure 5 Sensitivity of peak finding algorithms. (a) Schematic diagram demonstrating various peak finding methods. The left panel shows the GISTIC score profile for a simulated chromosome containing a mix of driver events covering the denoted target gene and passenger events randomly scattered across the chromosome. The inset at right shows the region around the maximal G-score (gray box in left panel) in higher detail. The MCR (red dotted lines) is defined as the region of maximal segment overlap, or the region of highest G-score. The leave-k-out procedure (blue dotted lines, here shown for k = 1) is obtained by repeatedly computing the MCR after leaving out each sample in turn and taking as the left and right boundaries the minimal and maximal extent of the MCR.
RegBounder works by attempting to find a region (dotted green line) over which the variation between boundary and maximal peak score is within the gth percentile of the local range distribution (Supplementary Methods in Additional file 1). Here, RegBounder produces a wider region than either the MCR or leave-k-out procedures, but is the only method whose boundary contains the true driver gene. (b,c) The average fraction of driver events contained within the peak region (conditional on having found a GISTIC peak within 10 Mb) is plotted as a function of driver-frequency (b) or sample size (c) for the MCR (red), leave-1-out (blue), and RegBounder algorithms (the latter at various confidence levels: 50%, magenta; 75%, green; 95%, black). In (b), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (c), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%. Error-bars represent the mean ± standard error of the mean (some are too small to be visible).

We developed a novel approach (termed ‘RegBounder’) to define the peak region boundaries in such a way that target genes would be included at a pre-determined confidence level, regardless of the event frequency or number of samples being studied (Figure 5a; Supplementary Methods in Additional file 1). RegBounder models the expected random fluctuation in G-scores within any given window size and uses this distribution to define a confidence region likely to contain the true driver at least g% of the time, where g is a desired confidence level. Unlike the MCR and ‘leave-k-out’ procedures, which are highly dependent on one or a few segment boundaries to define each region, RegBounder is designed to be relatively robust to random errors (either due to technical artifacts or passenger segments) in boundary assignment. When applied to real data, RegBounder captures known driver genes more effectively than ‘leave-1-out’ (and MCR) in regions with increased local noise (Figure 6a) and yet is capable of producing narrower boundaries than ‘leave-1-out’ in regions with little noise (Figure 6b).

In simulated datasets, the performance of RegBounder was consistent across a wide range of driver SCNA frequencies (Figure 5b) and sample sizes (Figure 5c), and indeed controlled the probability of containing the driver. RegBounder captured the true driver gene in an average of 72%, 85%, and 95% of driver regions of varying frequency when run with a desired confidence level (g) of 50, 75, and 95%, respectively. For no combination of sample-size, driver frequency, and g did the average accuracy of RegBounder drop below g.

RegBounder also demonstrated a more optimal trade-off between peak region sensitivity (the likelihood of including the target gene) and specificity (the number of additional genes included) than the MCR or ‘leave-k-out’ approaches. The average size of the peak regions decreases with increasing driver frequency (Figure 7a) and sample size (Figure 7b) for all three approaches. However, RegBounder is more sensitive to these variables than the other methods, so that RegBounder peak regions (at 75% confidence) can range from an average of 90 times larger than the ‘leave-k-out’ peak regions (for datasets with few total driver events, in which the target gene locations are truly uncertain) to 37% smaller than the ‘leave-k-out’ procedure (for datasets with many total driver events). Thus, the increased confidence of RegBounder can even be achieved while producing narrower regions than the ‘leave-k-out’ procedure.

RegBounder is also more consistent across datasets than the MCR and ‘leave-k-out’ methods. We randomly split the TCGA GBM set into two groups and compared the peak regions produced by RegBounder and the MCR and ‘leave-k-out’ procedures on each. Considering only those peaks that were identified by GISTIC in both datasets, only 23% of the MCRs and 31% of the ‘leave-k-out’ peak regions overlap between the two datasets, reflecting the low confidence with which these regions are assigned. By contrast, a majority (53%) of the RegBounder peak regions (at 75% confidence) overlapped, as expected ($0.75^2 \approx 56\%$). This increased overlap came with only a modestly increased median size of the RegBounder peak regions (370 kb) compared to the leave-k-out (163 kb) or MCR (115 kb) peak regions.

Figure 6 Comparison of RegBounder to MCR and leave-1-out procedures applied to primary lung adenocarcinomas. The advantages of RegBounder over previous peak-finding procedures are illustrated for two well-described oncogene peaks identified in GISTIC analysis of 371 lung adenocarcinoma samples characterized on the Affymetrix 250K StyI SNP array (as published in [16]). (a) A well-described amplification peak is identified on chromosome 12p12.1 with MCR (red dotted lines) near to but not containing the known lung cancer oncogene KRAS. Because there are more than two apparent passenger events in this region, the leave-1-out peak (blue dotted lines) also does not contain KRAS. However, RegBounder (green dotted lines) produces a wider peak that captures KRAS. (b) An amplification peak on chromosome 5p15.33 contains hTERT, the catalytic subunit of the human telomerase holoenzyme, within the MCR (red dotted lines). In this case, RegBounder (green dotted lines) produces a narrower peak region than the corresponding leave-1-out peak (blue dotted lines), demonstrating the ability of RegBounder to achieve a greater balance between peak region size and accuracy. In both (a) and (b), the y-axis depicts the amplification G-score and the x-axis denotes position along the corresponding chromosome.

Figure 7 Specificity of peak finding algorithms. (a,b) The median size of the peak regions produced by the MCR (red), leave-1-out (blue), and RegBounder (green, 75% confidence) are shown as a function of driver frequency (a) and sample size (b). In (a), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (b), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%.
(c) Comparison of the peak region sizes obtained by RegBounder (green line) with the theoretically minimal peak region sizes (black line) that could be obtained by any peak finding algorithm with a similar confidence level (Supplementary Methods in Additional file 1). Error-bars represent the mean ± standard error of the mean (some are too small to be visible). The y-axis in each panel is the median peak region size (markers).

RegBounder regions are, on average, only 19% larger than the theoretically minimal peak region size for a wide range of driver frequencies (Figure 7c) and confidence levels (Supplementary Figure S6 in Additional file 10). These theoretically minimal peak region sizes were derived from the distribution of distances between the target gene and the MCR in our simulations (Supplementary Methods in Additional file 1). Our simulations reveal that RegBounder is capable of producing smaller peak regions than the ‘leave-k-out’ approach while simultaneously achieving greater target gene recall (compare Figures 5b and 7a; ‘RegBounder 75%’ versus ‘leave-1-out’, for driver frequencies > 5%). Thus, RegBounder is a robust algorithm for peak region boundary determination that demonstrates a more optimal trade-off between statistical confidence and peak resolution than previous heuristic approaches.

Source code and module availability

The MATLAB source code for the GISTIC2.0 pipeline, along with a precompiled unix executable, will be available for download at [38]. In addition, the entire pipeline can be accessed through the GenePattern analysis portal at [39].

In addition to including all the methodological improvements described in this manuscript, the GISTIC2.0 source code has been designed to make efficient use of memory in storing segmented copy-number data (Supplementary Methods in Additional file 1). This improved memory efficiency should allow users with limited computational resources to run GISTIC2.0 on typical size datasets, and will be increasingly important for all users as the density of copy-number measuring platforms continues its rapid rise.

Conclusions

We describe a number of analytical improvements to the standard copy-number analysis workflow that increase the sensitivity and specificity with which driver genes may be localized. We also demonstrate the utility of each of these changes using both simulated and real cancer copy-number datasets. While these changes have been specifically implemented in GISTIC 2.0, the challenges we describe apply broadly to the general task of identifying significantly aberrant regions of SCNA in cancer, and we anticipate that the approaches we have described can be adapted to other copy-number analysis workflows.

The procedure we outline enables data-driven estimation of the background rates of SCNA and how these rates vary with features of the SCNA, such as length or amplitude. The specific trends we have observed are likely to depend on the resolution and characteristics of the measuring platform used to generate our datasets (the Affymetrix 250K StyI and SNP6.0 arrays). As more cancer samples are characterized using higher-resolution array and sequencing platforms, new trends are likely to emerge. Further improvements would account for such trends, possibly taking into account additional features that may determine SCNA background rates, such as the presence of known fragile sites of the genome or the surrounding sequence context. Indeed, we and others have recently shown that somatic deletions frequently occur in genes with large genomic footprints [4,6], suggesting the existence of a contextual bias in the rate of somatic deletion that is presently unaccounted for in our background mutation model. Our probabilistic scoring framework allows such trends to be accounted for once the background model has been specified.

For the significant SCNAs, the background rate estimates also enable the delineation of regions likely to contain the target genes at predetermined confidence. RegBounder, the algorithm we devised to assign these boundaries, is more robust than either MCR- or ‘leave-k-out’-based methods. RegBounder achieves this higher sensitivity by producing wider peak regions when the number of informative segments at a driver locus is small, but we find that RegBounder performs well compared to the theoretically optimal performance. However, RegBounder’s underlying assumptions may not always be satisfied, including the assumption that each peak region contains a single dominant target gene and the expectation that copy-number breakpoints are independently distributed around the driver locus. To the extent that these assumptions are violated, RegBounder’s performance may be worse than our simulations suggest.

While the arbitrated peel-off approach described in this manuscript reflects a more sensitive way of identifying independently targeted regions of amplification and deletion than our prior approach, it is still an imperfect attempt to decipher the complexity of cancer copy-number alterations. One major limitation stems from the fact that array-based measurements map SCNAs onto a linear reference genome. However, many SCNAs are preceded by rearrangement events that juxtapose genomic regions separated by great physical distance in the germline (even different chromosomes) [40,41]. This level of detailed structural information is impossible to infer from probe-level copy-number estimates but can be obtained by sequencing paired-end libraries [13]. Indeed, we anticipate that copy-number information derived from shotgun sequencing of cancer samples will become more common as sequencing costs continue to plummet [42]. Tools for estimating and segmenting copy-number values from sequencing coverage data already exist [5], and these segmented copy-number profiles can, with only slight modification, be run through the GISTIC 2.0 workflow. Fully exploiting the level of detailed information provided by these technologies will, however, require a significant extension of the background mutation model to include the probability of random genomic rearrangements, as well as the ability to perform significance analysis, segment peel-off, and peak finding across non-contiguous regions of the reference genome. The data provided by these sequencing efforts should lead to new insights into the cellular and molecular processes underlying SCNA generation in different cancer types, and will allow for the development of vastly more detailed and accurate models of the background mutation rate of such events during tumor development.

Materials and methods

Full methods are available in the Supplementary Materials (Additional file 1) [43-46].

Additional material

Additional file 1: Supplementary Methods. Supplementary Methods contains the full description of the GISTIC2.0 method and details of the specific analyses presented in this manuscript.

Additional file 2: Supplementary Figure S1: Ziggurat Deconstruction. (a) A hypothetical segmented chromosome (green line) is deconstructed with the simplified procedure used by Ziggurat Deconstruction (ZD) to initialize background SCNA rates. Dotted red and blue lines denote the length and amplitude of amplified and deleted SCNAs, respectively, while solid red and blue lines denote the result of merging the SCNA with the closest adjacent segment. (b) The same hypothetical segmented chromosome (green line) is deconstructed using the more flexible procedure of subsequent rounds of ZD. Here, the ZD is performed with respect to up to two basal levels (dotted magenta lines) that are fit to the data, allowing for amplified and deleted SCNAs to be superimposed.

Additional file 3: Supplementary Figure S2: distribution of SCNA length and amplitudes. Two-dimensional histogram showing the frequency (z-axis) of copy number events as a function of length (x-axis) and amplitude (y-axis). Frequency is plotted on a log-scale to facilitate visualization of very low frequency copy number events.

Additional file 4: Supplementary Table S1: comparison of amplitude and length-based filtering of SCNAs. Supplementary Table 1 compares the GISTIC results obtained using low and high amplitude thresholds with those obtained using a focal length threshold on 178 GBM samples.

Additional file 5: Supplementary Figure S3: distribution of driver length and amplitudes. Driver SCNAs are typically of shorter length and higher amplitude than random passenger SCNAs. (a,b) Here we show the cumulative frequency distribution of SCNA amplitudes (a) and lengths (b) for SCNAs covering significantly amplified regions identified by GISTIC (‘Driver SCNAs’, red line) or by a similar number of randomly chosen non-driver regions (‘Random SCNAs’, blue line).

Additional file 6: Supplementary Table S2: comparison of GeneGISTIC and standard GISTIC deletions analysis. Supplementary Table 2 compares the GISTIC results obtained using the standard GISTIC deletions analysis with those obtained using GeneGISTIC on 178 GBM samples.

Additional file 7: Supplementary Figure S4: GeneGISTIC versus standard GISTIC. (a) GeneGISTIC helps identify genes subject to non-overlapping deletion, such as NF1. The left panel shows the 12 samples with focal deletions affecting NF1, many of which do not overlap. As a result, the standard GISTIC marker score (blue line, right panel) has multiple local maxima over NF1. By contrast, the GeneGISTIC score counts all of these deletions as contributing to the NF1 score, resulting in a score for NF1 (red line, right panel) that is significantly greater than that assigned to any of the individual markers covering NF1. (b) GeneGISTIC does not score deletions occurring outside of genes. The left panel shows a region of focal deletion occurring just outside the PCHD9 gene on chromosome 13. These deletions result in a peak in the markers deletion score (blue line, right panel) that is not detected by GeneGISTIC.

Additional file 8: Supplementary Table S3: new peaks detected by arbitrated peel-off. Supplementary Table 3 compares the GISTIC results obtained using the standard peel-off algorithm with those obtained using arbitrated peel-off on 178 GBM samples.

Additional file 9: Supplementary Figure S5: total recovery of secondary driver peaks. This figure shows the results from 10,000 simulations of 300 samples in which a primary driver event is present in 10% of samples and a secondary driver event is present in 5% of samples. In these simulations, we vary the fraction of overlap between driver events from 100% (total dependence) to 0% (total independence). Here we present the total recovery of the secondary driver peak in GISTIC runs using arbitrated peel-off (left panel) or the standard peel-off (right panel). The red (left panel) or blue (right panel) lines show the fraction of secondary driver peaks identified in independent GISTIC peaks (that is, not containing the primary driver event), as is shown in Figure 4b. The black lines show the fraction of secondary driver peaks identified in dependent peaks (that is, a peak containing both the primary and secondary driver events), and the green lines show the total recall of secondary driver peaks (in any peak). Error-bars representing the mean ± standard error of the mean are drawn, but may be smaller than the point used to represent the mean and hence not be visible.

Additional file 10: Supplementary Figure S6: comparison of RegBounder to theoretically optimal peaks. Comparison between the peak region sizes obtained by RegBounder (green line) with the theoretically minimal peak region sizes (black line) that could be obtained by a similarly confident peak finding algorithm (Supplementary Methods in Additional file 1) at 50% (left) and 95% (right) confidence. Error-bars representing the median ± standard error of the mean are drawn, but may be smaller than the points used to represent the median and hence not be visible.

Abbreviations

CNV: copy number variation; GBM: glioblastoma multiforme; GISTIC: Genomic Identification of Significant Targets in Cancer; MCR: minimal common region; SCNA: somatic copy number alteration; SNP: single nucleotide polymorphism; TCGA: The Cancer Genome Atlas; ZD: Ziggurat Deconstruction.

Acknowledgements

This work was supported by a Genome Characterization Center Grant (U24CA143867) awarded as part of the NCI/NHGRI funded Cancer Genome Atlas (TCGA) project. CHM was supported by Medical Scientist Training Program (MSTP) Award Number T32GM07753 from the National Institute of General Medical Sciences. RB was supported by NIH K08CA122833, a V Foundation Scholarship, and the Doris Duke Charitable Foundation. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.

Author details

1. Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA.
2. Department of Medical Oncology, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA.
3. Department of Cancer Biology, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA.
4. The Center for Cancer Genome Discovery, Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA.

Authors’ contributions

RB and GG developed and coded the original GISTIC algorithm. CHM, SES, RB, and GG developed and coded the algorithmic modifications contained in GISTIC 2.0. CHM, MM, RB, and GG conceived and designed the present study. CHM, SES, and BH debugged and packaged the GISTIC 2.0 software release. CHM, MM, RB, and GG wrote the manuscript. All authors read and approved the final manuscript.

Received: 18 August 2010 Revised: 14 February 2011 Accepted: 28 April 2011 Published: 28 April 2011

References

1. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57-70.
2. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009, 458:719-724.
3. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS: A census of amplified and overexpressed human cancer genes. Nat Rev Cancer 2010, 10:59-64.
4. Beroukhim R, Mermel C, Porter D, Wei G, Raychaudhuri S, Donovan J, Barretina J, Boehm J, Dobson J, Urashima M: The landscape of somatic copy-number alteration across human cancers. Nature 2010, 463:899-905.
5. Chiang D, Getz G, Jaffe D, O’Kelly M, Zhao X: High-resolution mapping of copy-number alterations with massively parallel sequencing. Nat Methods 2009, 6:99-103.
6. Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA, Stratton MR: Signatures of mutation and selection in the cancer genome. Nature 2010, 463:893-898.
7. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al: Patterns of somatic mutation in human cancer genomes. Nature 2007, 446:153-158.
8. Merlo LM, Pepper JW, Reid BJ, Maley CC: Cancer as an evolutionary and ecological process. Nat Rev Cancer 2006, 6:924-935.
9. Network CGAR: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061-1068.
10. McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Mastrogianakis GM, Olson JJ, Mikkelsen T, Lehman N, Aldape K, Yung WK, Bogler O, Weinstein JN, VandenBerg S, Berger M, Prados M, Muzny D, Morgan M, Scherer S, Sabo A, Nazareth L, Lewis L, Hall O, Zhu Y, Ren Y, Alvi O, Yao J, Hawes A, Jhangiani S, Fowler G, et al: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061-1068.
11. Pleasance E, Cheetham R, Stephens P, McBride D, Humphray S, Greenman C, Varela I, Lin M, Ordóñez G, Bignell G: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 2009, 463:191-196.
12. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordonez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, et al: A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2010, 463:184-190.
13. Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, Teague JW, Lau KW, Burton J, Quail MA, Swerdlow H, Churcher C, Natrajan R, Sieuwerts AM, Martens JW, Silver DP, Langerod A, Russnes HE, Foekens JA, Reis-Filho JS, van ‘t Veer L, Richardson AL, Borresen-Dale AL, et al: Complex landscapes of somatic rearrangement in human breast cancer genomes. Nature 2009, 462:1005-1010.
14. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JK, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314:268-274.
15. Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S, Du J, Kau T, Thomas RK, Shah K, Soto H, Perner S, Prensner J, Debiasi RM, Demichelis F, Hatton C, Rubin MA, Garraway LA, Nelson SF, Liau L, Mischel PS, Cloughesy TF, Meyerson M, Golub TA, Lander ES, Mellinghoff IK, et al: Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci USA 2007, 104:20007-20012.
16. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja A, Johnson LA, Shah K, Sato M, Thomas RK, Barletta JA, Borecki IB, Broderick S, Chang AC, Chiang DY, Chirieac LR, Cho J, Fujii Y, Gazdar AF, Giordano T, Greulich H, Hanna M, Johnson BE, Kris MG, Lash A, Lin L, Lindeman N, et al: Characterizing the cancer genome in lung adenocarcinoma. Nature 2007, 450:893-898.
17. Lin WM, Baker AC, Beroukhim R, Winckler W, Feng W, Marmion JM, Laine E, Greulich H, Tseng H, Gates C, Hodi FS, Dranoff G, Sellers WR, Thomas RK, Meyerson M, Golub TR, Dummer R, Herlyn M, Getz G, Garraway LA: Modeling genomic diversity and tumor dependency in malignant melanoma. Cancer Res 2008, 68:664-673.
18. Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed E, Ligon AH, Vena N, Ogino S, Chheda MG, Tamayo P, Finn S, Shrestha Y, Boehm JS, Jain S, Bojarski E, Mermel C, Barretina J, Chan JA, Baselga J, Tabernero J, Root DE, Fuchs CS, Loda M, Shivdasani RA, Meyerson M, Hahn WC: CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature 2008, 455:547-551.
19. Chiang DY, Villanueva A, Hoshida Y, Peix J, Newell P, Minguez B, LeBlanc AC, Donovan DJ, Thung SN, Sole M, Tovar V, Alsinet C, Ramos AH, Barretina J, Roayaie S, Schwartz M, Waxman S, Bruix J, Mazzaferro V, Ligon AH, Najfeld V, Friedman SL, Sellers WR, Meyerson M, Llovet JM: Focal gains of VEGFA and molecular classification of hepatocellular carcinoma. Cancer Res 2008, 68:6779-6788.
20. Etemadmoghadam D, deFazio A, Beroukhim R, Mermel C, George J, Getz G, Tothill R, Okamoto A, Raeder MB, Harnett P, Lade S, Akslen LA, Tinker AV, Locandro B, Alsop K, Chiew YE, Traficante N, Fereday S, Johnson D, Fox S, Sellers W, Urashima M, Salvesen HB, Meyerson M, Bowtell D, Bowtell D, Chenevix-Trench G, Green A, Webb P, deFazio A, et al: Integrated genome-wide DNA copy number and expression analysis identifies distinct mechanisms of primary chemoresistance in ovarian carcinomas. Clin Cancer Res 2009, 15:1417-1427.
21. Northcott PA, Nakahara Y, Wu X, Feuk L, Ellison DW, Croul S, Mack S, Kongkham PN, Peacock J, Dubuc A, Ra Y-S, Zilberberg K, McLeod J, Scherer SW, Sunil Rao J, Eberhart CG, Grajkowska W, Gillespie Y, Lach B, Grundy R, Pollack IF, Hamilton RL, Van Meter T, Carlotti CG, Boop F, Bigner D, Gilbertson RJ, Rutka JT, Taylor MD: Multiple recurrent genetic events converge on control of histone lysine methylation in medulloblastoma. Nat Genet 2009, 41:465-472.
22. Bass AJ, Watanabe H, Mermel CH, Yu S, Perner S, Verhaak RG, Kim SY, Wardwell L, Tamayo P, Gat-Viks I, Ramos AH, Woo MS, Weir BA, Getz G, Beroukhim R, O’Kelly M, Dutt A, Rozenblatt-Rosen O, Dziunycz P, Komisarof J, Chirieac LR, Lafargue CJ, Scheble V, Wilbertz T, Ma C, Rao S, Nakagawa H, Stairs DB, Lin L, Giordano TJ, et al: SOX2 is an amplified lineage-survival oncogene in lung and esophageal squamous cell carcinomas. Nat Genet 2009, 41:1238-1242.
23. Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ, Weber BL, Maris JM, Grant GR: STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments. Genome Res 2006, 16:1149-1158.
24. Guttman M, Mies C, Dudycz-Sulicz K, Diskin SJ, Baldwin DA, Stoeckert CJ, Grant GR: Assessing the significance of conserved genomic aberrations using high resolution genomic microarrays. PLoS Genet 2007, 3:e143.
25. Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, Meyerson M, Singer S, Sander C, Gibson G: Functional copy-number alterations in cancer. PLoS ONE 2008, 3:e3179.
26. Shah SP: Computational methods for identification of recurrent copy number alteration patterns by array CGH. Cytogenet Genome Res 2008, 123:343-351.
27. Leach NT, Rehder C, Jensen K, Holt S, Jackson-Cook C: Human chromosomes with shorter telomeres and large heterochromatin regions have a higher frequency of acquired somatic cell aneuploidy. Mech Ageing Dev 2004, 125:563-573.
28. Li C, Hung Wong W: Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2001, 2:RESEARCH0032.
29. Li C, Wong WH: Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA 2001, 98:31-36.
30. Bolstad BM, Collin F, Simpson KM, Irizarry RA, Speed TP: Experimental design and low-level analysis of microarray data. Int Rev Neurobiol 2004, 60:25-58.
31. Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Langlois S, Eydoux P, Friedman JM, Marra MA: Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics 2007, 8:368.
32. Hupé P, Stransky N, Thiery J-P, Radvanyi F, Barillot E: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20:3413-3422.
33. Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5:557-572.
34. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 2007, 23:657-663.
35. Nilsson B, Johansson M, Al-Shahrour F, Carpenter AE, Ebert BL: Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution. Bioinformatics 2009, 25:1078-1079.
36. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B (Methodological) 1995, 57:289-300.
37. Sanchez-Garcia F, Akavia UD, Mozes E, Pe’er D: JISTIC: identification of significant targets in cancer. BMC Bioinformatics 2010, 11:189.
38. GISTIC 2 Manuscript and Software Download Page. [http://www.broadinstitute.org/cancer/pub/GISTIC2].
39. GenePattern. [http://www.broadinstitute.org/cancer/software/genepattern/].
40. Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal S, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Quail MA, Burton J, Swerdlow H, Carter NP, Morsberger LA, Iacobuzio-Donahue C, Follows GA, Green AR, Flanagan AM, Stratton MR, et al: Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 2011, 144:27-40.
41. Dahlback HS, Brandal P, Meling TR, Gorunova L, Scheie D, Heim S: Genomic aberrations in 80 cases of primary glioblastoma multiforme: Pathogenetic heterogeneity and putative cytogenetic pathways. Genes Chromosomes Cancer 2009, 48:908-924.
42. Metzker M: Sequencing technologies - the next generation. Nat Rev Genet 2009, 11:31-46.
43. The Cancer Genome Atlas Data Portal, GBM Publication. [http://tcga-data.nci.nih.gov/docs/publications/gbm_2008/].
44. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Veitch J, Collins PJ, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones KW, Rava R, Daly MJ, Gabriel SB, Altshuler D: Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 2008, 40:1166-1174.
45. Schwarz G: Estimating the dimension of a model. Ann Statist 1978, 6:461-464.
46. Holland AJ, Cleveland DW: Boveri revisited: chromosomal instability, aneuploidy and tumorigenesis. Nat Rev Mol Cell Biol 2009, 10:478-487.

doi:10.1186/gb-2011-12-4-r41
Cite this article as: Mermel et al.: GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biology 2011 12:R41.

References (93)

T. Santarius, J. Shipley, D. Brewer, M. Stratton, C. Cooper (2010)
A census of amplified and overexpressed human cancer genes
Nature Reviews Cancer, 10
(2011)
Genome Biology
A. Bass, H. Watanabe, C. Mermel, Soyoung Yu, S. Perner, R. Verhaak, S. Kim, Leslie Wardwell, P. Tamayo, I. Gat-Viks, A. Ramos, M. Woo, B. Weir, G. Getz, R. Beroukhim, Michael O’Kelly, A. Dutt, O. Rozenblatt-Rosen, P. Dziunycz, Justin Komisarof, L. Chirieac, C. LaFargue, V. Scheble, Theresia Wilbertz, Changqing Ma, Shilpa Rao, H. Nakagawa, D. Stairs, Lin Lin, T. Giordano, Patrick Wagner, J. Minna, A. Gazdar, Chang-Qi Zhu, M. Brose, I. Cecconello, U. Ribeiro, S. Marie, O. Dahl, R. Shivdasani, M. Tsao, M. Rubin, Kwok-kin Wong, A. Regev, W. Hahn, D. Beer, A. Rustgi, M. Meyerson (2009)
SOX2 Is an Amplified Lineage Survival Oncogene in Lung and Esophageal Squamous Cell Carcinomas
Nature genetics, 41
R. McLendon, A. Friedman, D. Bigner, Erwin Meir, D. Brat, Gena Mastrogianakis, J. Olson, T. Mikkelsen, N. Lehman, K. Aldape, W. Yung, O. Bogler, J. Weinstein, S. Vandenberg, M. Berger, M. Prados, D. Muzny, M. Morgan, S. Scherer, A. Sabo, L. Nazareth, L. Lewis, O. Hall, Yiming Zhu, Yanru Ren, Omar Alvi, Jiqiang Yao, A. Hawes, S. Jhangiani, G. Fowler, A. Lucas, C. Kovar, Andrew Cree, H. Dinh, J. Santibanez, Vandita Joshi, M. Gonzalez-Garay, Christopher Miller, A. Milosavljevic, L. Donehower, D. Wheeler, R. Gibbs, K. Cibulskis, C. Sougnez, T. Fennell, Scott Mahan, Jane Wilkinson, L. Ziaugra, R. Onofrio, Toby Bloom, R. Nicol, K. Ardlie, J. Baldwin, S. Gabriel, E. Lander, L. Ding, R. Fulton, M. McLellan, J. Wallis, D. Larson, Xiaoqi Shi, R. Abbott, L. Fulton, Ken Chen, D. Koboldt, M. Wendl, R. Meyer, Yuzhu Tang, Ling Lin, John Osborne, Brian Dunford-Shore, T. Miner, K. Delehaunty, C. Markovic, Gary Swift, W. Courtney, C. Pohl, S. Abbott, Amy Hawkins, Shin Leong, C. Haipek, Heather Schmidt, M. Wiechert, T. Vickery, S. Scott, D. Dooling, A. Chinwalla, G. Weinstock, E. Mardis, R. Wilson, G. Getz, W. Winckler, R. Verhaak, M. Lawrence, Michael O’Kelly, James Robinson, Gabriele Alexe, R. Beroukhim, S. Carter, Derek Chiang, Josh Gould, Supriya Gupta, Joshua Korn, C. Mermel, J. Mesirov, S. Monti, Huy Nguyen, Melissa Parkin, Michael Reich, Nicolas Stransky, B. Weir, L. Garraway, T. Golub, M. Meyerson, L. Chin, A. Protopopov, Jianhua Zhang, I. Perna, S. Aronson, N. Sathiamoorthy, Georgi Ren, Jun Yao, W. Wiedemeyer, Hyun Kim, Won Sek, Yonghong Xiao, I. Kohane, J. Seidman, P. Park, R. Kucherlapati, P. Laird, L. Cope, J. Herman, D. Weisenberger, F. Pan, D. Berg, L. Neste, Mingyu Joo, Kornel Schuebel, S. Baylin, D. Absher, Jun Li, Audrey Southwick, Shannon Brady, A. Aggarwal, Tisha Chung, G. Sherlock, J. Brooks, R. Myers, P. Spellman, E. Purdom, L. Jakkula, A. Lapuk, H. Marr, S. Dorton, Gi Yoon, Ju Han, A. Ray, V. Wang, S. Durinck, M. Robinson, Nicholas Wang, K. Vranizan, V. Peng, E. 
Name, G. Fontenay, J. Ngai, J. Conboy, B. Parvin, H. Feiler, T. Speed, J. Gray, C. Brennan, N. Socci, A. Olshen, B. Taylor, A. Lash, N. Schultz, B. Reva, Yevgeniy Antipin, Alexey Stukalov, Benjamin Gross, E. Cerami, Qingqing Wei, L. Qin, V. Seshan, Liliana Villafania, Magali Cavatore, L. Borsu, A. Viale, W. Gerald, C. Sander, M. Ladanyi, C. Perou, D. Hayes, M. Topal, K. Hoadley, Yuan Qi, S. Balu, Yan Shi, Junyuan Wu, R. Penny, M. Bittner, T. Shelton, E. Lenkiewicz, S. Morris, D. Beasley, Sheri Sanders, A. Kahn, R. Sfeir, Jessica Chen, D. Nassau, Larry Feng, E. Hickey, A. Barker, D. Gerhard, J. Vockley, C. Compton, J. Vaught, P. Fielding, M. Ferguson, C. Schaefer, Jinghui Zhang, Subha Madhavan, K. Buetow, F. Collins, P. Good, M. Guyer, B. Ozenberger, Jane Peterson, E. Thomson (2008)
Comprehensive genomic characterization defines human glioblastoma genes and core pathways
Nature, 455
Lin WM, Baker AC, Beroukhim R, Winckler W, Feng W, Marmion JM, Laine E, Greulich H, Tseng H, Gates C, Hodi FS, Dranoff G, Sellers WR, Thomas RK, Meyerson M, Golub TR, Dummer R, Herlyn M, Getz G, Garraway LA (2008)
Modeling genomic diversity and tumor dependency in malignant melanoma
Cancer Res, 68:664-673, doi:10.1158/0008-5472.CAN-07-2615, PMID 18245465
S. Diskin, T. Eck, J. Greshock, Y. Mossé, Tara Naylor, C. Stoeckert, B. Weber, J. Maris, G. Grant (2006)
STAC: A method for testing the significance of DNA copy number aberrations across multiple array-CGH experiments.
Genome research, 16 9
Li C, Hung Wong W (2001)
Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application
Genome Biol, 2:RESEARCH0032, PMID 11532216
M. Metzker (2010)
Sequencing technologies — the next generation
Nature Reviews Genetics, 11
GenePattern
http://www.broadinstitute.org/cancer/software/genepattern/
C. Greenman, P. Stephens, Raffaella Smith, G. Dalgliesh, C. Hunter, G. Bignell, H. Davies, J. Teague, A. Butler, C. Stevens, S. Edkins, S. O'meara, Imre Vastrik, Esther Schmidt, T. Avis, S. Barthorpe, G. Bhamra, G. Buck, Bhudipa Choudhury, J. Clements, J. Cole, E. Dicks, S. Forbes, K. Gray, Kelly Halliday, R. Harrison, K. Hills, Jonathon Hinton, A. Jenkinson, David Jones, A. Menzies, T. Mironenko, J. Perry, K. Raine, David Richardson, Rebecca Shepherd, Alexandra Small, Calli Tofts, J. Varian, T. Webb, S. West, S. Widaa, A. Yates, Daniel Cahill, David Louis, P. Goldstraw, Andrew Nicholson, F. Brasseur, L. Looijenga, Barbara Weber, Y. Chiew, A. deFazio, Mel Greaves, Anthony Green, P. Campbell, E. Birney, D. Easton, G. Chenevix-Trench, M. Tan, S. Khoo, Bin Teh, Siu Yuen, Suet Leung, R. Wooster, P. Futreal, Michael Stratton (2007)
Patterns of somatic mutation in human cancer genomes
Nature, 446
M. Guttman, C. Mies, Katarzyna Dudycz-Sulicz, S. Diskin, D. Baldwin, C. Stoeckert, G. Grant (2007)
Assessing the Significance of Conserved Genomic Aberrations Using High Resolution Genomic Microarrays
PLoS Genetics, 3
Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, Province MA, Kraja A, Johnson LA, Shah K, Sato M, Thomas RK, Barletta JA, Borecki IB, Broderick S, Chang AC, Chiang DY, Chirieac LR, Cho J, Fujii Y, Gazdar AF, Giordano T, Greulich H, Hanna M, Johnson BE, Kris MG, Lash A, Lin L, Lindeman N, et al. (2007)
Characterizing the cancer genome in lung adenocarcinoma
Nature, 450:893-898, doi:10.1038/nature06358, PMID 17982442
Benjamini Y, Hochberg Y (1995)
Controlling the false discovery rate: a practical and powerful approach to multiple testing
J R Stat Soc B (Methodological), 57:289-300
Cheng Li, W. Wong (2001)
Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection.
Proceedings of the National Academy of Sciences of the United States of America, 98 1
A. Holland, D. Cleveland (2009)
Boveri revisited: chromosomal instability, aneuploidy and tumorigenesis
Nature Reviews Molecular Cell Biology, 10
Derek Chiang, G. Getz, D. Jaffe, Michael O’Kelly, Xiaojun Zhao, S. Carter, C. Russ, C. Nusbaum, M. Meyerson, E. Lander (2009)
High-resolution mapping of copy-number alterations with massively parallel sequencing
Nature Methods, 6
B. Bolstad, Francois Collin, K. Simpson, R. Irizarry, T. Speed (2004)
Experimental design and low-level analysis of microarray data.
International review of neurobiology, 60
E. Pleasance, R. Cheetham, P. Stephens, D. Mcbride, S. Humphray, C. Greenman, I. Varela, Meng‐Lay Lin, G. Ordóñez, G. Bignell, K. Ye, J. Alipaz, Markus Bauer, D. Beare, A. Butler, Richard Carter, Lina Chen, A. Cox, S. Edkins, P. Kokko-Gonzales, N. Gormley, R. Grocock, C. Haudenschild, Matthew Hims, Terena James, Mingming Jia, Z. Kingsbury, Catherine Leroy, J. Marshall, A. Menzies, L. Mudie, Z. Ning, Tom Royce, Ole Schulz-Trieglaff, Anastassia Spiridou, L. Stebbings, L. Szajkowski, J. Teague, David Williamson, L. Chin, M. Ross, Peter Campbell, D. Bentley, P. Futreal, Michael Stratton (2010)
A comprehensive catalogue of somatic mutations from a human cancer genome
Nature, 463
R. Firestein, A. Bass, S. Kim, I. Dunn, S. Silver, Isil Guney, E. Freed, A. Ligon, Natalie Vena, S. Ogino, M. Chheda, P. Tamayo, S. Finn, Y. Shrestha, J. Boehm, Supriya Jain, Emeric Bojarski, C. Mermel, J. Barretina, J. Chan, J. Baselga, J. Tabernero, D. Root, C. Fuchs, M. Loda, R. Shivdasani, M. Meyerson, W. Hahn (2008)
CDK8 is a colorectal cancer oncogene that regulates β-catenin activity
Nature, 455
J. Rockett (2001)
Arrays of DNA-binding sites
Genome Biology, 2
Felix Sanchez-Garcia, U. Akavia, Eyal Mozes, D. Pe’er (2010)
JISTIC: Identification of Significant Targets in Cancer
BMC Bioinformatics, 11
Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, Stebbings LA, Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, Teague JW, Lau KW, Burton J, Quail MA, Swerdlow H, Churcher C, Natrajan R, Sieuwerts AM, Martens JW, Silver DP, Langerod A, Russnes HE, Foekens JA, Reis-Filho JS, van 't Veer L, Richardson AL, Borresen-Dale AL, et al. (2009)
Complex landscapes of somatic rearrangement in human breast cancer genomes
Nature, 462:1005-1010, doi:10.1038/nature08645, PMID 20033038
Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Langlois S, Eydoux P, Friedman JM, Marra MA (2007)
Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data
BMC Bioinformatics, 8:368, doi:10.1186/1471-2105-8-368, PMID 17910767
P. Northcott, Y. Nakahara, Xiaochong Wu, L. Feuk, D. Ellison, S. Croul, S. Mack, P. Kongkham, J. Peacock, A. Dubuc, Young-Shin Ra, Karen Zilberberg, Jessica Mcleod, S. Scherer, J. Rao, C. Eberhart, W. Grajkowska, Y. Gillespie, B. Lach, R. Grundy, I. Pollack, R. Hamilton, T. Meter, C. Carlotti, F. Boop, D. Bigner, R. Gilbertson, J. Rutka, Michael Taylor (2009)
Multiple recurrent genetic events converge on control of histone lysine methylation in medulloblastoma
Nature Genetics, 41
Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, McBride DJ, Varela I, Nik-Zainal S, Leroy C, Jia M, Menzies A, Butler AP, Teague JW, Quail MA, Burton J, Swerdlow H, Carter NP, Morsberger LA, Iacobuzio-Donahue C, Follows GA, Green AR, Flanagan AM, Stratton MR, et al. (2011)
Massive genomic rearrangement acquired in a single catastrophic event during cancer development
Cell, 144:27-40, doi:10.1016/j.cell.2010.11.055, PMID 21215367
T. Sjöblom, Siân Jones, L. Wood, D. Parsons, Jimmy Lin, T. Barber, D. Mandelker, R. Leary, J. Ptak, N. Silliman, Steve Szabo, P. Buckhaults, Christopher Farrell, Paul Meeh, S. Markowitz, J. Willis, D. Dawson, J. Willson, A. Gazdar, James Hartigan, Leo Wu, Changsheng Liu, G. Parmigiani, B. Park, K. Bachman, N. Papadopoulos, B. Vogelstein, K. Kinzler, V. Velculescu (2006)
The Consensus Coding Sequences of Human Breast and Colorectal Cancers
Science, 314
S. Shah (2009)
Computational methods for identification of recurrent copy number alteration patterns by array CGH
Cytogenetic and Genome Research, 123
Chiang DY, Villanueva A, Hoshida Y, Peix J, Newell P, Minguez B, LeBlanc AC, Donovan DJ, Thung SN, Sole M, Tovar V, Alsinet C, Ramos AH, Barretina J, Roayaie S, Schwartz M, Waxman S, Bruix J, Mazzaferro V, Ligon AH, Najfeld V, Friedman SL, Sellers WR, Meyerson M, Llovet JM (2008)
Focal gains of VEGFA and molecular classification of hepatocellular carcinoma
Cancer Res, 68:6779-6788, doi:10.1158/0008-5472.CAN-08-0742, PMID 18701503
M. Stratton, P. Campbell, P. Futreal (2009)
The cancer genome
Nature, 458
D. Etemadmoghadam, A. deFazio, R. Beroukhim, C. Mermel, J. George, G. Getz, R. Tothill, A. Okamoto, M. Raeder, P. Harnett, S. Lade, L. Akslen, A. Tinker, Bianca Locandro, K. Alsop, Y. Chiew, N. Traficante, S. Fereday, Daryl Johnson, S. Fox, W. Sellers, M. Urashima, H. Salvesen, M. Meyerson, D. Bowtell (2009)
Integrated Genome-Wide DNA Copy Number and Expression Analysis Identifies Distinct Mechanisms of Primary Chemoresistance in Ovarian Carcinomas
Clinical Cancer Research, 15
R. Beroukhim, C. Mermel, D. Porter, G. Wei, S. Raychaudhuri, Jerry Donovan, J. Barretina, J. Boehm, Jennifer Dobson, M. Urashima, Kevin Henry, Reid Pinchback, A. Ligon, Yoon-Jae Cho, Leila Haery, H. Greulich, Michael Reich, W. Winckler, M. Lawrence, B. Weir, K. Tanaka, Derek Chiang, A. Bass, Alice Loo, Carter Hoffman, John Prensner, T. Liefeld, Qing Gao, Derek Yecies, S. Signoretti, E. Maher, F. Kaye, H. Sasaki, J. Tepper, J. Fletcher, J. Tabernero, J. Baselga, M. Tsao, F. Demichelis, M. Rubin, P. Janne, M. Daly, C. Nucera, R. Levine, B. Ebert, S. Gabriel, A. Rustgi, C. Antonescu, M. Ladanyi, A. Letai, L. Garraway, M. Loda, D. Beer, L. True, A. Okamoto, S. Pomeroy, S. Singer, T. Golub, E. Lander, G. Getz, W. Sellers, M. Meyerson (2010)
The landscape of somatic copy-number alteration across human cancers
Nature, 463
Tony Gutschner, S. Diederichs (2012)
The hallmarks of cancer
RNA Biology, 9
E. Venkatraman, A. Olshen (2007)
A faster circular binary segmentation algorithm for the analysis of array CGH data
Bioinformatics, 23(6)
Metzker M: Sequencing technologies - the next generation. Nat Rev Genet 2009, 11:31-46. PMID: 19997069.
Lauren Merlo, J. Pepper, B. Reid, C. Maley (2006)
Cancer as an evolutionary and ecological process
Nature Reviews Cancer, 6
A. Olshen, E. Venkatraman, R. Lucito, M. Wigler (2004)
Circular binary segmentation for the analysis of array-based DNA copy number data.
Biostatistics, 5(4)
G. Bignell, C. Greenman, H. Davies, Adam Butler, S. Edkins, J. Andrews, G. Buck, Lina Chen, D. Beare, Calli Latimer, S. Widaa, Jonathon Hinton, C. Fahey, B. Fu, Sajani Swamy, G. Dalgliesh, B. Teh, P. Deloukas, Fengtang Yang, Peter Campbell, P. Futreal, Michael Stratton (2010)
Signatures of mutation and selection in the cancer genome
Nature, 463
Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Vivanco I, Lee JC, Huang JH, Alexander S, Du J, Kau T, Thomas RK, Shah K, Soto H, Perner S, Prensner J, Debiasi RM, Demichelis F, Hatton C, Rubin MA, Garraway LA, Nelson SF, Liau L, Mischel PS, Cloughesy TF, Meyerson M, Golub TA, Lander ES, Mellinghoff IK: Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci USA 2007, 104:20007-20012. doi:10.1073/pnas.0710052104. PMID: 18077431.
P. Hupé, Nicolas Stransky, J. Thiery, F. Radvanyi, E. Barillot (2004)
Analysis of array CGH data: from signal ratio to gain and loss of DNA regions
Bioinformatics, 20(18)
G. Schwarz (1978)
Estimating the Dimension of a Model
Annals of Statistics, 6
R. Beroukhim, G. Getz, L. Nghiemphu, J. Barretina, Teli Hsueh, David Linhart, I. Vivanco, Jeffrey Lee, Julie Huang, Sethu Alexander, Jinyan Du, Tweeny Kau, Roman Thomas, K. Shah, H. Soto, S. Perner, John Prensner, R. Debiasi, F. Demichelis, C. Hatton, M. Rubin, L. Garraway, S. Nelson, L. Liau, P. Mischel, T. Cloughesy, M. Meyerson, Todd Golub, E. Lander, I. Mellinghoff, W. Sellers (2007)
Assessing the significance of chromosomal aberrations in cancer: Methodology and application to glioma
Proceedings of the National Academy of Sciences, 104
B. Weir, M. Woo, G. Getz, S. Perner, L. Ding, R. Beroukhim, William Lin, M. Province, A. Kraja, L. Johnson, K. Shah, Mitsuo Sato, Roman Thomas, J. Barletta, I. Borecki, S. Broderick, A. Chang, Derek Chiang, L. Chirieac, Jeonghee Cho, Y. Fujii, A. Gazdar, T. Giordano, H. Greulich, M. Hanna, B. Johnson, M. Kris, A. Lash, Ling Lin, N. Lindeman, E. Mardis, J. McPherson, J. Minna, M. Morgan, M. Nadel, M. Orringer, John Osborne, B. Ozenberger, A. Ramos, James Robinson, J. Roth, V. Rusch, H. Sasaki, F. Shepherd, C. Sougnez, M. Spitz, M. Tsao, David Twomey, R. Verhaak, G. Weinstock, D. Wheeler, W. Winckler, A. Yoshizawa, Soyoung Yu, M. Zakowski, Qunyuan Zhang, D. Beer, I. Wistuba, M. Watson, L. Garraway, M. Ladanyi, W. Travis, W. Pao, M. Rubin, S. Gabriel, R. Gibbs, H. Varmus, R. Wilson, E. Lander, M. Meyerson (2007)
Characterizing the cancer genome in lung adenocarcinoma
Nature, 450
GISTIC 2 Manuscript and Software Download Page. http://www.broadinstitute.org/cancer/pub/GISTIC2
The Cancer Genome Atlas Data Portal, GBM Publication. http://tcga-data.nci.nih.gov/docs/publications/gbm_2008/
Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, Futreal PA, Stratton MR: Signatures of mutation and selection in the cancer genome. Nature 2010, 463:893-898. doi:10.1038/nature08768. PMID: 20164919.
Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics 2004, 5:557-572. doi:10.1093/biostatistics/kxh008. PMID: 15475419.
ED Pleasance (2010)
A small-cell lung cancer genome with complex signatures of tobacco exposure (doi:10.1038/nature08629)
Nature, 463
Sanchez-Garcia F, Akavia UD, Mozes E, Pe'er D: JISTIC: identification of significant targets in cancer. BMC Bioinformatics 2010, 11:189. doi:10.1186/1471-2105-11-189. PMID: 20398270.
Philip Stephens, D. Mcbride, Meng‐Lay Lin, I. Varela, Erin Pleasance, Jared Simpson, L. Stebbings, Catherine Leroy, S. Edkins, L. Mudie, C. Greenman, Mingming Jia, Calli Latimer, J. Teague, K. Lau, J. Burton, Michael Quail, H. Swerdlow, C. Churcher, R. Natrajan, A. Sieuwerts, J. Martens, Daniel Silver, A. Langerød, H. Russnes, J. Foekens, J. Reis-Filho, L. Veer, Andrea Richardson, A. Børresen-Dale, Peter Campbell, P. Futreal, Michael Stratton (2009)
Complex landscapes of somatic rearrangement in human breast cancer genomes
Nature, 462
Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57-70. doi:10.1016/S0092-8674(00)81683-9. PMID: 10647931.
P. Stephens, C. Greenman, B. Fu, Fengtang Yang, G. Bignell, L. Mudie, E. Pleasance, K. Lau, D. Beare, L. Stebbings, Stuart Mclaren, Meng‐Lay Lin, D. Mcbride, I. Varela, S. Nik-Zainal, Catherine Leroy, Mingming Jia, A. Menzies, A. Butler, J. Teague, M. Quail, J. Burton, H. Swerdlow, N. Carter, L. Morsberger, C. Iacobuzio-Donahue, G. Follows, A. Green, A. Flanagan, M. Stratton, P. Futreal, Peter Campbell (2011)
Massive Genomic Rearrangement Acquired in a Single Catastrophic Event during Cancer Development
Cell, 144
H. Dahlback, P. Brandal, T. Meling, L. Gorunova, D. Scheie, S. Heim (2009)
Genomic aberrations in 80 cases of primary glioblastoma multiforme: Pathogenetic heterogeneity and putative cytogenetic pathways
Genes, Chromosomes and Cancer, 48
Derek Chiang, A. Villanueva, Y. Hoshida, J. Peix, P. Newell, B. Mínguez, A. Leblanc, D. Donovan, S. Thung, M. Solé, V. Tovar, Clara Alsinet, A. Ramos, J. Barretina, S. Roayaie, M. Schwartz, S. Waxman, J. Bruix, V. Mazzaferro, A. Ligon, V. Najfeld, S. Friedman, W. Sellers, M. Meyerson, J. Llovet (2008)
Focal gains of VEGFA and molecular classification of hepatocellular carcinoma.
Cancer Research, 68(16)
Hupé P, Stransky N, Thiery J-P, Radvanyi F, Barillot E: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20:3413-3422. doi:10.1093/bioinformatics/bth418. PMID: 15381628.
Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, Meyerson M, Singer S, Sander C: Functional copy-number alterations in cancer. PLoS ONE 2008, 3:e3179. doi:10.1371/journal.pone.0003179. PMID: 18784837.
B. Taylor, J. Barretina, N. Socci, Penelope Decarolis, M. Ladanyi, M. Meyerson, S. Singer, C. Sander (2008)
Functional Copy-Number Alterations in Cancer
PLoS ONE, 3
B. Nilsson, Mikael Johansson, F. Al-Shahrour, Anne Carpenter, B. Ebert (2009)
Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution
Bioinformatics, 25(8)
Pleasance ED, Stephens PJ, O'Meara S, McBride DJ, Meynert A, Jones D, Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordonez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF: A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2010, 463:184-190. doi:10.1038/nature08629. PMID: 20016488.
Schwarz G: Estimating the dimension of a model. Ann Statist 1978, 6:461-464. doi:10.1214/aos/1176344136.
Ágnes Baross, Allen Delaney, Irene Li, Tarun Nayar, S. Flibotte, H. Qian, Susanna Chan, J. Asano, Adrian Ally, Manqiu Cao, P. Birch, Mabel Brown-John, Nicole Fernandes, Anne Go, G. Kennedy, S. Langlois, P. Eydoux, Jm Friedman, M. Marra (2007)
Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data
BMC Bioinformatics, 8
N. Leach, C. Rehder, Keith Jensen, S. Holt, C. Jackson-Cook (2004)
Human chromosomes with shorter telomeres and large heterochromatin regions have a higher frequency of acquired somatic cell aneuploidy
Mechanisms of Ageing and Development, 125
Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061-1068. doi:10.1038/nature07385. PMID: 18772890.
Cheng Li, Wing Wong (2001)
Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application
Genome Biology, 2
McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Mastrogianakis GM, Olson JJ, Mikkelsen T, Lehman N, Aldape K, Yung WK, Bogler O, Weinstein JN, VandenBerg S, Berger M, Prados M, Muzny D, Morgan M, Scherer S, Sabo A, Nazareth L, Lewis L, Hall O, Zhu Y, Ren Y, Alvi O, Yao J, Hawes A, Jhangiani S, Fowler G: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061-1068. doi:10.1038/nature07385. PMID: 18772890.
S. Mccarroll, F. Kuruvilla, Joshua Korn, S. Cawley, J. Nemesh, Alec Wysoker, M. Shapero, P. Bakker, J. Maller, Andrew Kirby, A. Elliott, Melissa Parkin, E. Hubbell, Teresa Webster, R. Mei, Jim Veitch, P. Collins, R. Handsaker, S. Lincoln, Marcia Nizzari, J. Blume, K. Jones, R. Rava, M. Daly, S. Gabriel, D. Altshuler (2008)
Integrated detection and population-genetic analysis of SNPs and copy number variation
Nature Genetics, 40
Leach NT, Rehder C, Jensen K, Holt S, Jackson-Cook C: Human chromosomes with shorter telomeres and large heterochromatin regions have a higher frequency of acquired somatic cell aneuploidy. Mech Ageing Dev 2004, 125:563-573. doi:10.1016/j.mad.2004.06.006. PMID: 15336914.
Bolstad BM, Collin F, Simpson KM, Irizarry RA, Speed TP: Experimental design and low-level analysis of microarray data. Int Rev Neurobiol 2004, 60:25-58. PMID: 15474586.
Y. Benjamini, Y. Hochberg (1995)
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Journal of the Royal Statistical Society: Series B (Methodological), 57
Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed E, Ligon AH, Vena N, Ogino S, Chheda MG, Tamayo P, Finn S, Shrestha Y, Boehm JS, Jain S, Bojarski E, Mermel C, Barretina J, Chan JA, Baselga J, Tabernero J, Root DE, Fuchs CS, Loda M, Shivdasani RA, Meyerson M, Hahn WC: CDK8 is a colorectal cancer oncogene that regulates beta-catenin activity. Nature 2008, 455:547-551. doi:10.1038/nature07179. PMID: 18794900.
Dahlback HS, Brandal P, Meling TR, Gorunova L, Scheie D, Heim S: Genomic aberrations in 80 cases of primary glioblastoma multiforme: Pathogenetic heterogeneity and putative cytogenetic pathways. Genes Chromosomes Cancer 2009, 48:908-924. doi:10.1002/gcc.20690. PMID: 19603525.
McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Veitch J, Collins PJ, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones KW, Rava R, Daly MJ, Gabriel SB, Altshuler D: Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 2008, 40:1166-1174. doi:10.1038/ng.238. PMID: 18776908.
Etemadmoghadam D, deFazio A, Beroukhim R, Mermel C, George J, Getz G, Tothill R, Okamoto A, Raeder MB, Harnett P, Lade S, Akslen LA, Tinker AV, Locandro B, Alsop K, Chiew YE, Traficante N, Fereday S, Johnson D, Fox S, Sellers W, Urashima M, Salvesen HB, Meyerson M, Bowtell D, Bowtell D, Chenevix-Trench G, Green A, Webb P, deFazio A: Integrated genome-wide DNA copy number and expression analysis identifies distinct mechanisms of primary chemoresistance in ovarian carcinomas. Clin Cancer Res 2009, 15:1417-1427. doi:10.1158/1078-0432.CCR-08-1564. PMID: 19193619.
Nilsson B, Johansson M, Al-Shahrour F, Carpenter AE, Ebert BL: Ultrasome: efficient aberration caller for copy number studies of ultra-high resolution. Bioinformatics 2009, 25:1078-1079. doi:10.1093/bioinformatics/btp091. PMID: 19228802.
D Hanahan (2000)
The hallmarks of cancer (doi:10.1016/S0092-8674(00)81683-9)
Cell, 100
William Lin, Alissa Baker, R. Beroukhim, W. Winckler, W. Feng, J. Marmion, E. Laine, H. Greulich, Hsiuyi Tseng, C. Gates, F. Hodi, G. Dranoff, W. Sellers, Roman Thomas, M. Meyerson, T. Golub, R. Dummer, M. Herlyn, G. Getz, L. Garraway (2008)
Modeling genomic diversity and tumor dependency in malignant melanoma.
Cancer Research, 68(3)
Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS: A census of amplified and overexpressed human cancer genes. Nat Rev Cancer 2010, 10:59-64. doi:10.1038/nrc2771. PMID: 20029424.

Publisher: Springer Journals
Copyright: Copyright © 2011 by Mermel et al.; licensee BioMed Central Ltd.
Subject: Life Sciences; Animal Genetics and Genomics; Human Genetics; Plant Genetics and Genomics; Microbial Genetics and Genomics; Bioinformatics; Evolutionary Biology
eISSN: 1474-760X
DOI: 10.1186/gb-2011-12-4-r41
pmid: 21527027

Abstract

We describe methods with enhanced power and specificity to identify genes targeted by somatic copy-number alterations (SCNAs) that drive cancer growth. By separating SCNA profiles into underlying arm-level and focal alterations, we improve the estimation of background rates for each category. We additionally describe a probabilistic method for defining the boundaries of selected-for SCNA regions with user-defined confidence. Here we detail this revised computational approach, GISTIC2.0, and validate its performance in real and simulated datasets.

Correspondence: Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA 02142, USA. Full list of author information is available at the end of the article.

© 2011 Mermel et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Mermel et al. Genome Biology 2011, 12:R41. http://genomebiology.com/2011/12/4/R41

Background

Cancer forms through the stepwise acquisition of somatic genetic alterations, including point mutations, copy-number changes, and fusion events, that affect the function of critical genes regulating cellular growth and survival [1]. The identification of oncogenes and tumor suppressor genes being targeted by these alterations has greatly accelerated progress in both the understanding of cancer pathogenesis and the identification of novel therapeutic vulnerabilities [2]. Genes targeted by somatic copy-number alterations (SCNAs), in particular, play central roles in oncogenesis and cancer therapy [3]. Dramatic improvements in both array and sequencing platforms have enabled increasingly high-resolution characterization of the SCNAs present in thousands of cancer genomes [4-6].

However, the discovery of new cancer genes being targeted by SCNAs is complicated by two fundamental challenges. First, somatic alterations are acquired at random during each cell division, only some of which (‘driver’ alterations) promote cancer development [7]. Selectively neutral or weakly deleterious ‘passenger’ alterations may nonetheless become fixed whenever a subclone carrying such alterations acquires selectively beneficial mutations that promote clonal dominance [8]. Second, SCNAs may simultaneously affect up to thousands of genes, but the selective benefits of driver alterations are likely to be mediated by only one or a few of these genes. For these reasons, additional analysis and experimentation is required to distinguish the drivers from the passengers, and to identify the genes they are likely to target.

A common approach to identifying drivers is to study large collections of cancer samples, on the notion that regions containing driver events should be altered at higher frequencies than regions containing only passengers [4,6,7,9-14]. For example, we developed an algorithm, GISTIC (Genomic Identification of Significant Targets in Cancer) [15], that identifies likely driver SCNAs by evaluating the frequency and amplitude of observed events. GISTIC has been applied to multiple cancer types, including glioblastoma [10,15], lung adenocarcinoma [16], melanoma [17], colorectal carcinoma [18], hepatocellular carcinoma [19], ovarian carcinoma [20], medulloblastoma [21], and lung and esophageal squamous carcinoma [22], and has helped identify several new targets of amplifications (including NKX2-1 [16], CDK8 [18], VEGFA [19], SOX2 [22], and MCL1 and BCL2L1 [4]) and deletions (EHMT1 [21]). Several additional algorithms for identifying likely driver SCNAs have also been described [23-25] (reviewed in [26]).

Yet, several critical challenges have not yet been adequately addressed by any of the existing copy-number analysis tools. For example, we and others have shown that the abundance of SCNAs in human cancers varies according to their size, with chromosome-arm length SCNAs occurring much more frequently than SCNAs of slightly larger or smaller size [4,27]. Therefore, analysis methods need to model complex cancer genomes that contain a mixture of SCNA types occurring at distinct background rates. Existing copy-number methods have also used ad hoc heuristics to define the genomic regions likely to harbor true cancer gene targets. The inability of these methods to provide a priori statistical confidence has been a major limitation in interpreting copy-number analyses, an important problem as end-users typically use these results to prioritize candidate genes for time-consuming validation experiments.

Here we describe several methodological improvements to address these challenges, and validate the performance of the revised algorithms in both real and simulated datasets. We have incorporated these changes into a revised GISTIC pipeline, termed GISTIC 2.0.

Results and Discussion

Overview of copy-number analysis pipeline

Cancer copy-number analyses can be divided into five discrete steps (Figure 1): 1) accurately defining the copy-number profile of each cancer sample; 2) identifying the SCNAs that most likely gave rise to these overall profiles and estimating their background rates of formation; 3) scoring the SCNAs in each region according to their likelihood of occurring by chance; 4) defining the independent genomic regions undergoing statistically significant levels of SCNA; and 5) identifying the likely gene target(s) of each significantly altered region. Figure 1 depicts a schematic overview of this process, highlighting the specific methodological improvements we will address in the present manuscript.

Figure 1. Schematic overview of the copy-number analysis framework. High-level overview of our cancer copy-number analysis framework, highlighting specific differences between the original GISTIC algorithm [15] and the GISTIC 2.0 pipeline described in this manuscript. The first step, accurate identification of the copy-number profile in each sample, is common to GISTIC and GISTIC2.0.
Step 1: Accurate definition of the copy number profile in each sample. Individual .CEL files; array calibration; copy number estimation; segmentation; segmented copy number profiles.
Step 2: Identification/separation of underlying SCNAs. GISTIC 1.0 (Beroukhim et al, 2007): elimination of arm-level SCNAs by use of amplitude threshold. GISTIC 2.0: deconstruction of segmented profile into underlying SCNAs; allows for modelling of background rate of SCNAs and length-based separation of arm-level and focal SCNAs.
Step 3: Scoring SCNAs in each region according to likelihood of occurring by chance. GISTIC 1.0: G = frequency x amplitude; p-values computed by random permutation of markers across genome. GISTIC 2.0: G = -log(Probability | Background); scores computed on markers or genes; p-values computed by random permutation of markers or bins across genome.
Step 4: Defining independent genomic regions undergoing significant levels of SCNA. GISTIC 1.0: greedy segment peel-off algorithm; iteratively subtracts segments covering each peak and rescores until no significant peaks remain on chromosome. GISTIC 2.0: arbitrated peel-off algorithm; formalizes idea that segments can have multiple targets by allowing segment scores to be split among multiple potential peaks during peel-off.
Step 5: Accurate definition of the copy number profile in each sample. GISTIC 1.0: leave-k-out; assumes that at most 'k' passenger events aberrantly define the minimal common region. GISTIC 2.0: RegBounder; models expected local variation in G-score to define boundaries predicted to contain the true target with predetermined confidence.

The first step, accurately defining the copy-number profile of each cancer sample, has been addressed by multiple previous studies [28-35] and is not discussed in detail here. We assume that segmented copy-number profiles have been obtained for all samples and all germline copy-number variations (CNVs) have been removed, yielding profiles of somatic events. The following sections describe improvements to steps 2 to 5. We evaluate these improvements on a test set of 178 glioblastoma multiforme (GBM) cancer DNAs hybridized to the Affymetrix Single Nucleotide Polymorphism (SNP) 6.0 array as part of The Cancer Genome Atlas (TCGA) project [10] (the ‘TCGA GBM set’), and on simulated data. Full technical details for each step are described in the Supplementary Methods (Additional file 1).

Deconstruction of segmented copy-number profiles into underlying SCNAs

Segmented copy number profiles represent the summed outcome of all the SCNAs that occurred during cancer development. Accurate modeling of the background rate of copy-number alteration requires analysis of the individual SCNAs. However, because SCNAs may overlap, it is impossible to directly infer the underlying events from the final segmented copy-number profile alone.

[...] passengers, so that their distribution reflects, to a first approximation, the operation of the ‘background’ mutation process (see Supplementary Figure S2 in Additional file 3).

Length-based separation of focal and arm-level SCNAs

A major advantage of the ZD method is its ability to separate arm-level and focal SCNAs explicitly by length. Prior studies have attempted to exclude arm-level SCNAs by setting high amplitude thresholds [10,16] because, in contrast to focal SCNAs, few arm-level SCNAs reach high amplitude (Figure 2a). However, this approach suffers from at least two undesirable consequences: first, low- to moderate-amplitude focal copy-number events are eliminated from the analysis, reducing sensitivity to identify positively selected regions; and second, the amplitude threshold is left as a free parameter, allowing for potential over-fitting of the analysis to a desired result.

We have previously shown that SCNA frequencies across cancers of diverse tissue origin are inversely proportional to SCNA lengths, with the striking exception of SCNAs exactly the length of a chromosome arm or whole chromosome (which are very frequent) [4]. This trend is preserved in the TCGA GBM samples (Figure 2b). This reproducible distribution provides a natural basis for classifying events as ‘arm-level’ and ‘focal’ based purely on length. Such length-based filtering of events allows for the computational reconstruction of ‘arm-level’ and ‘focal’ representations of the cancer genome (Figure 2c) and enables the inclusion of low- to moderate-amplitude focal copy-number events in the final analysis.

To determine the benefits of this approach, we ran the original ‘GISTIC 1.0’ algorithm on the TCGA GBM set using three different thresholding approaches (Figure 3;
Supplementary Table S1 in Additional file 4): 1) a low Given certain assumptions about SCNA background amplitude threshold (log2 ratio of ± 0.1) that only elimi- rates, however, it is possible to estimate the likelihood nates low-level artifactual segments; 2) a high amplitude of any given set of candidate SCNAs so as to select the threshold (log2 ratio of 0.848 and -0.737 for amplifica- most likely one. tions/deletions) used previously [16] to eliminate arm- We have developed an algorithm (’Ziggurat Decon- level events; and 3) the low amplitude threshold but struction’ (ZD)) that deconstructs each segmented copy- also removing all SCNAs occupying more than 98% of a number profile into its most likely set of underlying chromosome arm, leaving only the focal events. SCNAs (see Supplementary Methods in Additional file 1 Filtering out arm-level events through use of either and Supplementary Figure S1 in Additional file 2). ZD is amplitude or length thresholds greatly increased the an iterative optimization algorithm that alternatively sensitivity of GISTIC for detecting focal amplifications estimates a background model for SCNA formation and and deletions (Figure 3; Supplementary Table S1 in then utilizes this model to determine the most likely Additional file 4). While entire chromosomes were deconstruction of each copy-number profile. Its output scored as significant using only a low amplitude thresh- is a catalog of the individual SCNAs in each cancer old, including gain of chromosome 7 and loss of chro- sample, each with an assigned length and amplitude, mosome 10 (Figure 3a), a number of recurrent focal alterations were missed, including amplifications sur- that sum to generate the original segmented copy pro- file. We assume that most of these SCNAs are rounding CDK6, CCND2,and HMGA2.These Mermel et al. 
alterations were detected using either the high amplitude (Figure 3b) or the focal length filters (Figure 3c).

The benefits of length-based filtering result from the inclusion of low- to moderate-amplitude focal events. Amplification of PIK3CA and AKT1 and deletion of WWOX are detected using length-based filtering, but are not significant under the high amplitude filter (compare Figure 3b and 3c). Moreover, the length-based analysis identified significant SCNAs detected in neither of the amplitude-based analyses, including amplifications of MLLT10 and deletions of CDKN1B and NF1. No known GBM target gene was detected in either of the amplitude-based analyses that was not also detected by the length-based analysis. These results suggest that length-based filtering of arm-level events greatly improves the sensitivity of GISTIC to identify relevant regions of focal SCNA.

Figure 2 Computational separation of arm-level and focal SCNAs. (a) Boxplot showing the distribution of copy-number changes for amplified focal (length < 98% of a chromosome arm) and arm-level (length > 98% of a chromosome arm) SCNAs across 178 GBM profiles from TCGA. The black dotted line denotes a typical low-level amplitude threshold used to eliminate artifactual SCNAs, while the green dotted line denotes a typical high-level amplitude threshold used in a previous version of GISTIC to eliminate arm-level SCNAs. (b) Histogram showing the frequency of observing SCNAs of a given length across 178 GBM samples. The high frequency of events occupying exactly one chromosome arm led us to distinguish between focal and arm-level SCNAs. (c) Heatmaps showing the total segmented copy-number profile of the TCGA GBM set (leftmost panel), and the results of computationally separating these samples into arm-level profiles (middle panel) and focal profiles (rightmost panel) by summing arm-level and focal SCNAs. In each heatmap, the chromosomes are arranged vertically from top to bottom and samples are arranged from left to right. Red and blue represent gain and loss, respectively. Panel titles and axes: (a) ‘Amplitude of Focal and Arm-level SCNAs’, y-axis copy-number change; (b) ‘Length Distribution of SCNAs’, x-axis length (fraction of chromosome arm), y-axis fraction of segments.

Probabilistic scoring of SCNAs

We set out to define a scoring framework for SCNAs that more accurately reflects the background rates of alteration. Ideally, we aim to score each region of the genome according to the probability with which the observed set of SCNAs would occur by chance alone. Scores using this framework have a clear interpretation: the higher the score assigned to a region, the less likely that the SCNAs in that region are observed entirely by chance, and the more likely that they underwent positive selection.

The probability of observing a single SCNA of given length and amplitude can be approximated by the frequency of occurrence of events of similar length and amplitude across the entire dataset (as in Supplementary Figure S2 in Additional file 3). However, since cancer genomes do contain drivers, this procedure is likely to overestimate the probability of observing SCNAs under the null model. Specifically, driver events tend to be shorter in length and of higher amplitude than passengers and therefore constitute the majority of events in their length/amplitude neighborhood (Supplementary Figure S3 in Additional file 5).
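As a rough illustration of the naive frequency-based approximation just described, the following Python sketch separates focal from arm-level events using the 98% length cutoff quoted in the text, bins the focal events by length and amplitude, and scores each bin as the negative log of its observed frequency. It is not the GISTIC2.0 implementation (which is distributed as MATLAB code); the function names, the bin edges, and the example catalog are all hypothetical, and, as noted above, this simple estimate is biased by the presence of driver events.

```python
import math
from collections import Counter

ARM_LEVEL_FRACTION = 0.98  # cutoff quoted in the text: more than 98% of an arm is arm-level


def is_focal(length_fraction):
    """Classify an SCNA by length alone (length as a fraction of its chromosome arm)."""
    return length_fraction <= ARM_LEVEL_FRACTION


def bin_index(value, edges):
    """Index of the bin [edges[i], edges[i+1]) holding value; out-of-range values are clamped."""
    if value < edges[0]:
        return 0
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def empirical_neg_log_prob(events, length_edges, amp_edges):
    """Score each (length bin, amplitude bin) cell as -log of its observed frequency.

    events: iterable of (length_fraction, amplitude) pairs for focal SCNAs.
    """
    counts = Counter(
        (bin_index(length, length_edges), bin_index(amp, amp_edges))
        for length, amp in events
    )
    total = sum(counts.values())
    return {cell: -math.log(n / total) for cell, n in counts.items()}


# Hypothetical catalog of (length as fraction of arm, amplitude in copy-number units).
catalog = [(0.05, 0.3), (0.10, 0.4), (0.08, 0.2), (0.60, 0.3), (1.00, 0.5), (0.07, 2.5)]
focal = [event for event in catalog if is_focal(event[0])]
scores = empirical_neg_log_prob(
    focal, length_edges=[0.0, 0.25, 0.5, 0.98], amp_edges=[0.0, 1.0, 4.0]
)
for cell in sorted(scores):
    print(cell, round(scores[cell], 3))
```

In this toy catalog the short, low-amplitude bin is the most populated and therefore receives the lowest score, while rarer bins score higher. A real dataset would need far more events per bin for the frequencies to be meaningful.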
Figure 3 Effects of amplitude-based or length-based filtering of arm-level events on GISTIC results. (a-c) GISTIC amplification (top) and deletion (bottom) plots using all data and a low amplitude threshold (a), using all data and a high amplitude threshold (b), and using the focal data and a low amplitude threshold (c). The genome is oriented vertically from top to bottom, and GISTIC q-values at each locus are plotted from left to right on a log scale. The green line represents the significance threshold (q-value = 0.25). For each plot, known or interesting candidate genes are highlighted in black when identified by all three analyses, in red when identified by the high amplitude or focal length analyses, in purple when identified by the low amplitude or focal length analyses, and in green when identified only in the focal length analysis. Region counts shown in the panels: (a) 27 amplified and 15 deleted regions; (b) 41 amplified and 31 deleted regions; (c) 55 amplified and 36 deleted regions.

To avoid biasing our background model, we set out to fit the log-probability distribution of SCNAs to a functional form that would be insensitive to the presence of driver events in the data (Supplementary Methods in Additional file 1). We made use of a large collection of 3,131 cancer samples run on the Affymetrix 250K StyI SNP Array [4] plus several hundred additional samples run on the Affymetrix SNP6.0 Array (data not shown). At the level of resolution provided by these arrays, the probability of observing a focal SCNA at a given locus under the background model is roughly independent of length. As a result, the functional form for the log-probability distribution is similar to the original GISTIC G-score definition (G = Frequency × Amplitude), with the notable exception being that the new score is proportional to the amplitude in copy-number space rather than log-copy-number space.

Although this functional form was empirically derived from a large collection of samples run on two different array-based platforms, it does lead to increased sensitivity to differences in dynamic range across platforms as well as differential saturation characteristics of probes within the same array platform. To avoid this problem, we routinely cap the segmented copy-number data at a level representing the signal intensity above which most probes start to saturate (Supplementary Methods in Additional file 1). This ensures that we are using data that originate from the linear regime of the probes’ response curves and therefore are more comparable across platforms.
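The combination of capping and scoring in copy-number space can be sketched in a few lines of Python. This is only an illustrative sketch of a ‘frequency × amplitude’ style score at a single marker, not the GISTIC2.0 code: the cap value is a placeholder (the actual cap is platform-specific and described in the Supplementary Methods), the conversion from log2 ratio to copy-number change assumes a diploid baseline, and the function names are hypothetical. Only the low amplitude threshold of 0.1 is taken from the text.

```python
LOW_AMPLITUDE_THRESHOLD = 0.1  # log2 ratio; the low threshold quoted in the text
HYPOTHETICAL_CAP = 1.5         # log2 ratio; placeholder for a platform-specific saturation level


def capped_copy_change(log2_ratio, cap=HYPOTHETICAL_CAP):
    """Clip a segmented log2 ratio to [-cap, cap], then convert it to a
    copy-number change relative to an assumed diploid baseline."""
    clipped = max(-cap, min(cap, log2_ratio))
    return 2.0 * 2.0 ** clipped - 2.0


def amplification_g_score(log2_ratios, threshold=LOW_AMPLITUDE_THRESHOLD, cap=HYPOTHETICAL_CAP):
    """Amplification score at one marker across samples.

    Sums the capped copy-number gain over samples above the threshold and
    divides by the total number of samples, which equals
    (fraction of samples amplified) x (mean gain among amplified samples).
    """
    if not log2_ratios:
        raise ValueError("need at least one sample")
    gains = [capped_copy_change(r, cap) for r in log2_ratios if r > threshold]
    return sum(gains) / len(log2_ratios)


# Hypothetical log2 ratios for four samples at one marker.
marker = [1.0, 0.0, 0.05, 2.0]
print(amplification_g_score(marker))
```

Because the amplitude is measured in copy-number space, a high-level gain contributes more than it would on the log scale, which is why capping at the saturation level matters for comparability across platforms.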
As with GISTIC 1.0, we obtain P-values for each marker by comparing the score at each locus to a background score distribution generated by random permutation of the marker locations in each sample (Supplementary Methods in Additional file 1). This procedure controls for sample-specific variations in the rate of copy-number alteration. We correct the resulting P-values for multiple-hypothesis testing using the Benjamini-Hochberg false discovery rate method [36].

Alternative gene-level scoring for tumor suppressors with non-overlapping deletions

Some genes are affected by non-overlapping deletions, either on different alleles in one sample or across multiple samples. For such genes, a marker-based score does not weight the presence of all deletions affecting that gene, despite the fact that these events are likely to have similarly deleterious effects on gene function. We have developed a modified scoring and permutation procedure, termed GeneGISTIC, that scores genes rather than markers (Supplementary Methods in Additional file 1). In each sample, we assign each gene the minimal copy number of any marker contained within that gene, and then sum across all samples to compute the gene score. Because genes covering more markers are more likely to achieve a more extreme value by chance, the permutation procedure is adjusted to account for gene size; the score for a gene covering n markers is compared against a size-specific null distribution generated by computing minima over all running windows of size n in each sample and then randomly permuting these minimal values across the genome.

To determine the effect of gene-based scoring of deletions, we compared the results of gene-based and marker-based scoring on the TCGA GBM set (holding all other parameters equal). As expected, GeneGISTIC ranks known tumor suppressor genes higher and is more sensitive for genes subject to non-overlapping deletions (Supplementary Table S2 in Additional file 6). For example, RB1 was ranked 5th out of 39 regions using gene-based scoring (q-value = 2.6e-10) but only 13th out of 38 using marker-based scoring (q-value = 0.0013), and CDKN1B was ranked 26th using gene-based scoring (q-value = 0.08) compared to 38th using marker-based scoring (q-value = 0.19). NF1 was focally deleted in 12 of the 178 GBM samples (6.7%), and these deletions were frequently non-overlapping (Supplementary Figure S4a in Additional file 7). As a result, NF1 was scored just over or just under the significance threshold using the marker-based score, depending on the parameters used. By contrast, NF1 was robustly identified using gene-based scoring across all parameter combinations (Supplementary Table S2 in Additional file 6 and data not shown).

However, because this scoring method does not score regions of the genome that are not in annotated genes, it could underweight or completely miss deletions occurring in non-genic regions. For example, in our GBM samples, gene-based scoring did not identify a region just outside of PCDH9 on chr13q21.3 that scored as highly significant (q-value = 4.4e-9) using the standard marker-based score (Supplementary Figure S4b in Additional file 7). While many non-genic deletions may in fact represent technical artifacts or rare germline events, some may be functionally relevant.

Identification of independent significantly altered regions

Individual SCNAs, and indeed significantly amplified or deleted regions of the genome, may extend over more than one oncogene or tumor suppressor gene. Other significant regions may contain no oncogenes or tumor suppressor genes, but achieve apparent significance due to their proximity to a target gene. Thus, an additional step is required after genome-wide scoring to identify independently significant regions.

GISTIC 1.0 solves this problem through the use of an iterative ‘peel-off’ algorithm, which greedily assigns all SCNAs to the maximal peak on each chromosome, removes them from the data, and rescores until no remaining region crosses the significance threshold. This approach reduces the power to identify secondary peaks that are close to previously identified significant regions (Figure 4a). However, since it is possible for individual SCNAs to affect multiple driver regions, a less greedy approach might identify additional peaks without significantly increasing the false discovery rate.

We have, therefore, modified the method to allow SCNAs to contribute to more than one peak (‘arbitrated peel-off’). We first greedily assign the entirety of an SCNA’s score to the most significant peak it covers. In subsequent steps, however, we allow scores of previously assigned segments to be redistributed before deciding whether a putative region is significant (Supplementary Methods in Additional file 1). Like the original algorithm, the process terminates when no region has an adjusted score that exceeds the significance threshold. A similar modification of GISTIC has recently been proposed [37].

Arbitrated peel-off is more sensitive than the original algorithm (Figure 4a; Supplementary Table S3 in Additional file 8). We generated 10,000 simulated datasets each consisting of 300 samples, with each chromosome containing a primary driver event in 10% of the samples and a secondary driver event in 5% of the samples. We analyzed the sensitivity of standard and arbitrated peel-off to detect the secondary peak as we varied the percentage of secondary driver events that overlapped the primary driver peak between 0% and 100% (Supplementary
Methods in Additional file 1). At 0% overlap, the two methods were nearly equally sensitive at identifying the secondary peak. However, arbitrated peel-off was vastly more sensitive than standard peel-off as we increased the rate of overlap between primary and secondary peaks from 5 to 50% (Figure 4b), recovering an average of 2.4 times (range 1.2 to 3.8) more secondary peaks. Over 80% of the novel peaks identified by arbitrated peel-off corresponded to an actual simulated driver peak, demonstrating that the increased sensitivity is accompanied by high specificity.

The primary and secondary peaks tend to merge when the overlap is above 50%, obscuring any appreciable difference between the two methods (Supplementary Figure S5 in Additional file 9). Indeed, neither method was capable of independently identifying the secondary peak once the percent overlap rose above 80%. These simulations demonstrate both the superior sensitivity of arbitrated peel-off as well as the challenge of identifying neighboring drivers.

Figure 4 Sensitivity of peel-off to detect secondary driver events. The average fraction of secondary driver events recovered in independent (not containing the primary driver) peaks by GISTIC using the standard peel-off method (blue line) or arbitrated peel-off (red line) is shown for two simulated datasets. (a) The data are derived from 1,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event a fixed distance away that is present in 5% of samples. (b) Data are derived from 10,000 simulated chromosomes across 300 samples with a primary driver event present in 10% of samples and a secondary driver event present in 5% of samples, where the fraction of the secondary driver events that overlapped with the primary driver event was varied between 100% (complete dependence; far left) and 0% (complete independence; far right). Error bars represent the mean ± standard error of the mean (some are too small to be visible). Panel titles and axes: (a) ‘Sensitivity vs. Driver Distance’, x-axis distance between drivers (Mb); (b) ‘Sensitivity vs. Driver SCNA Overlap’, x-axis fraction overlap between driver events (%); y-axis in both panels, % recovery of independent second driver peak.

Localizing target genes for each significantly altered region

The final step in the GISTIC pipeline is to determine the region that is most likely to contain the gene or genes being targeted for each independently significant region of SCNA (the ‘peak region’). The standard approach is to focus on the minimal common region (MCR) of overlap (Figure 5a), the region that is altered in the greatest number of samples and therefore would be expected to be the most likely to contain the target genes. However, one or more passenger SCNAs adjacent to, but not overlapping, the target gene can result in an MCR that does not include the true target. This is a frequent occurrence, especially when the frequency of the driver event is low (< 5%; Figure 5b). An alternative method (utilized by GISTIC 1.0) is to apply a heuristic ‘leave-k-out’ procedure to define the boundaries of each peak region (Figure 5a) [15]. This procedure assumes that up to k passenger SCNAs (typically, k = 1) may aberrantly define each boundary of the peak region. While the ‘leave-k-out’ procedure correctly identifies the target gene more often than the MCR (Figure 5b), it suffers from the potential for over-fitting introduced by the free parameter ‘k’. Moreover, the accuracy of ‘leave-k-out’ varies depending on the number of samples and the frequency of the event under question. For fixed k, the sensitivity of ‘leave-k-out’ increases for increasing driver frequency (Figure 5b) and decreases for increasing sample size (Figure 5c).

Figure 5 Sensitivity of peak finding algorithms. (a) Schematic diagram demonstrating various peak finding methods. The left panel shows the GISTIC score profile for a simulated chromosome containing a mix of driver events covering the denoted target gene and passenger events randomly scattered across the chromosome. The inset at right shows the region around the maximal G-score (gray box in left panel) in higher detail. The MCR (red dotted lines) is defined as the region of maximal segment overlap, or the region of highest G-score. The leave-k-out procedure (blue dotted lines, here shown for k = 1) is obtained by repeatedly computing the MCR after leaving out each sample in turn and taking as the left and right boundaries the minimal and maximal extent of the MCR. RegBounder works by attempting to find a region (dotted green line) over which the variation between boundary and maximal peak score is within the gth percentile of the local range distribution (Supplementary Methods in Additional file 1). Here, RegBounder produces a wider region than either the MCR or leave-k-out procedures, but is the only method whose boundary contains the true driver gene. (b,c) The average fraction of driver events contained within the peak region (conditional on having found a GISTIC peak within 10 Mb) is plotted as a function of driver frequency (b) or sample size (c) for the MCR (red), leave-1-out (blue), and RegBounder algorithms (the latter at various confidence levels: 50%, magenta; 75%, green; 95%, black). In (b), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (c), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%. Error bars represent the mean ± standard error of the mean (some are too small to be visible). Panel titles and axes: (a) GISTIC score versus chromosomal position; (b) ‘Driver Recall as Function of Driver Frequency’ (n = 500 samples), x-axis driver frequency (%); (c) ‘Driver Recall as Function of Sample Size’ (5% driver frequency), x-axis number of samples; y-axis in (b) and (c), fraction of drivers identified (%).

We developed a novel approach (termed ‘RegBounder’) to define the peak region boundaries in such a way that target genes would be included at a pre-determined confidence level, regardless of the event frequency or number of samples being studied (Figure 5a; Supplementary Methods in Additional file 1). RegBounder models the expected random fluctuation in G-scores within any given window size and uses this distribution to define a confidence region likely to contain the true driver at least g% of the time, where g is a desired confidence level. Unlike the MCR and ‘leave-k-out’ procedures, which are highly dependent on one or a few segment boundaries to define each region, RegBounder is designed to be relatively robust to random errors (either due to technical artifacts or passenger segments) in boundary assignment. When applied to real data, RegBounder captures known driver genes more effectively than ‘leave-1-out’ (and MCR) in regions with increased local noise (Figure 6a) and yet is capable of producing narrower boundaries than ‘leave-1-out’ in regions with little noise (Figure 6b).

Figure 6 Comparison of RegBounder to MCR and leave-1-out procedures applied to primary lung adenocarcinomas. The advantages of RegBounder over previous peak-finding procedures are illustrated for two well-described oncogene peaks identified in GISTIC analysis of 371 lung adenocarcinoma samples characterized on the Affymetrix 250K StyI SNP array (as published in [16]). (a) A well-described amplification peak is identified on chromosome 12p12.1 with MCR (red dotted lines) near to but not containing the known lung cancer oncogene KRAS. Because there are more than two apparent passenger events in this region, the leave-1-out peak (blue dotted lines) also does not contain KRAS. However, RegBounder (green dotted lines) produces a wider peak that captures KRAS. (b) An amplification peak on chromosome 5p15.33 contains hTERT, the catalytic subunit of the human telomerase holoenzyme, within the MCR (red dotted lines). In this case, RegBounder (green dotted lines) produces a narrower peak region than the corresponding leave-1-out peak (blue dotted lines), demonstrating the ability of RegBounder to achieve a greater balance between peak region size and accuracy. In both (a) and (b), the y-axis depicts the amplification G-score and the x-axis denotes position along the corresponding chromosome (chromosome 12 in (a), chromosome 5 in (b); positions in Mb).

In simulated datasets, the performance of RegBounder was consistent across a wide range of driver SCNA frequencies (Figure 5b) and sample sizes (Figure 5c), and indeed controlled the probability of containing the driver. RegBounder captured the true driver gene in an average of 72%, 85%, and 95% of driver regions of varying frequency when run with a desired confidence level (g) of 50, 75, and 95%, respectively. For no combination of sample size, driver frequency, and g did the average accuracy of RegBounder drop below g.

RegBounder also demonstrated a more optimal trade-off between peak region sensitivity (the likelihood of including the target gene) and specificity (the number of additional genes included) than the MCR or ‘leave-k-out’ approaches. The average size of the peak regions decreases with increasing driver frequency (Figure 7a) and sample size (Figure 7b) for all three approaches. However, RegBounder is more sensitive to these variables than the other methods, so that RegBounder peak regions (at 75% confidence) can range from an average of 90 times larger than the ‘leave-k-out’ peak regions (for datasets with few total driver events, in which the target gene locations are truly uncertain) to 37% smaller than the ‘leave-k-out’ procedure (for datasets with many total driver events). Thus, the increased confidence of RegBounder can even be achieved while producing narrower regions than the ‘leave-k-out’ procedure.

Figure 7 Specificity of peak finding algorithms. (a,b) The median size of the peak regions produced by the MCR (red), leave-1-out (blue), and RegBounder (green, 75% confidence) are shown as a function of driver frequency (a) and sample size (b). In (a), data are derived from 10,000 simulated chromosomes across 500 samples in which the driver frequency varied from 1 to 10%. In (b), data are derived from 10,000 simulated chromosomes across a variable number of samples in which the driver frequency was fixed at 5%. (c) Comparison of the peak region sizes obtained by RegBounder (green line) with the theoretically minimal peak region sizes (black line) that could be obtained by any peak finding algorithm with a similar confidence level (Supplementary Methods in Additional file 1). Error bars represent the mean ± standard error of the mean (some are too small to be visible). Panel titles and axes: (a) ‘Peak Region Size As Function of Driver Frequency’ (n = 500 samples); (b) ‘Peak Region Size As Function of Sample Size’ (5% driver frequency); (c) ‘RegBounder vs. Theoretically Optimal Peak Region’ (n = 500 samples); y-axis in all panels, median peak region size (markers).

RegBounder is also more consistent across datasets than the MCR and ‘leave-k-out’ methods. We randomly split the TCGA GBM set into two groups and compared the peak regions produced by RegBounder and the MCR and ‘leave-k-out’ procedures on each. Considering only those peaks that were identified by GISTIC in both datasets, only 23% of the MCRs and 31% of the ‘leave-k-out’ peak regions overlap between the two datasets, reflecting the low confidence with which these regions are assigned. By contrast, a majority (53%) of the RegBounder peak regions (at 75% confidence) overlapped, as expected (0.75^2 = 56%). This increased overlap came with only a modestly increased median size of the RegBounder peak regions (370 kb) compared to the leave-k-out (163 kb) or MCR (115 kb) peak regions.

RegBounder regions are, on average, only 19% larger than the theoretically minimal peak region size for a wide range of driver frequencies (Figure 7c) and confidence levels (Supplementary Figure S6 in Additional file 10). These theoretically minimal peak region sizes were derived from the distribution of distances between the target gene and the MCR in our simulations (Supplementary Methods in Additional file 1). Our simulations reveal that RegBounder is capable of producing smaller peak regions than the ‘leave-k-out’ approach while simultaneously achieving greater target gene recall (compare Figures 5b and 7a; ‘RegBounder 75%’ versus ‘leave-1-out’, for driver frequencies > 5%). Thus, RegBounder is a robust algorithm for peak region boundary determination that demonstrates a more optimal trade-off between statistical confidence and peak resolution than previous heuristic approaches.

Source code and module availability

The MATLAB source code for the GISTIC2.0 pipeline, along with a precompiled unix executable, will be available for download at [38]. In addition, the entire pipeline can be accessed through the GenePattern analysis portal at [39].

In addition to including all the methodological improvements described in this manuscript, the GISTIC2.0 source code has been designed to make efficient use of memory in storing segmented copy-number data (Supplementary Methods in Additional file 1). This improved memory efficiency should allow users with
Genome Biology 2011, 12:R41 Page 11 of 14 http://genomebiology.com/2011/12/4/R41 limited computational resources to run GISTIC2.0 on these assumptions are violated, RegBounder’sperfor- typical size datasets, and will be increasingly important mance may be worse than our simulations suggest. for all users as the density of copy-number measuring While the arbitrated peel-off approach described in this platforms continues its rapid rise. manuscript reflects a more sensitive way of identifying independently targeted regions of amplification and dele- Conclusions tion than our prior approach, it is still an imperfect attempt to decipher the complexity of cancer copy-num- We describe a number of analytical improvements to ber alterations. One major limitation stems from the fact the standard copy-number analysis workflow that that array-based measurements map SCNAs onto a linear increase the sensitivity and specificity with which driver genes may be localized. We also demonstrate the utility reference genome. However, many SCNAs are preceded of each of these changes using both simulated and real by rearrangement events that juxtapose genomic regions cancer copy-number datasets. While these changes have separated by great physical distance in the germline (even been specifically implemented in GISTIC 2.0, the chal- different chromosomes) [40,41]. This level of detailed lenges we describe apply broadly to the general task of structural information is impossible to infer from probe- identifying significantly aberrant regions of SCNA in level copy-number estimates but can be obtained by cancer, and we anticipate that the approaches we have sequencing paired-end libraries [13]. Indeed, we anticipate described can be adapted to other copy-number analysis that copy-number information derived from shotgun workflows. sequencing of cancer samples will become more common The procedure we outline enables data-driven estima- as sequencing costs continue to plummet [42]. 
Tools for tion of the background rates of SCNA and how these estimating and segmenting copy-number values from rates vary with features of the SCNA, such as length or sequencing coverage data already exist [5], and these seg- amplitude. Thespecifictrendswehaveobservedare mented copy-number profiles can, with only slight modifi- likely to depend on the resolution and characteristics of cation, be run through the GISTIC 2.0 workflow. Fully the measuring platform used to generate our datasets exploiting the level of detailed information provided by (the Affymetrix 250K StyI and SNP6.0 arrays). As more these technologies will, however, require a significant cancer samples are characterized using higher-resolution extension of the background mutation model to include array and sequencing platforms, new trends are likely to the probability of random genomic rearrangements, as emerge. Further improvements would account for such well as the ability to perform significance analysis, segment peel-off, and peak finding across non-contiguous regions trends, possibly taking into account additional features of the reference genome. The data provided by these that may determine SCNA background rates, such as the presence of known fragile sites of the genome or the sequencing efforts should lead to new insights into the cel- surrounding sequence context. Indeed, we and others lular and molecular processes underlying SCNA genera- have recently shown that somatic deletions frequently tion in different cancer types, and will allow for the occur in genes with large genomic footprints [4,6], sug- development of vastly more detailed and accurate models gesting the existence of a contextual bias in the rate of of the background mutation rate of such events during somatic deletion that is presently unaccounted for in tumor development. our background mutation model. 
Our probabilistic scor- ing framework allows such trends to be accounted for Materials and methods once the background model has been specified. Full methods are available in the Supplementary Materi- For the significant SCNAs, the background rate esti- als (Additional file 1) [43-46]. mates also enable the delineation of regions likely to con- tain the target genes at predetermined confidence. Additional material RegBounder, the algorithm we devised to assign these boundaries, is more robust than either MCR- or ‘leave-k- Additional file 1: Supplementary Methods. Supplementary Methods contains the full description of the GISTIC2.0 method and details of the out’-based methods. RegBounder achieves this higher specific analyses presented in this manuscript. sensitivity by producing wider peak regions when the Additional file 2: Supplementary Figure S1: Ziggurat number of informative segments at a driver locus is Deconstruction. (a) A hypothetical segmented chromosome (green line) small, but we find that RegBounder performs well com- is deconstructed with the simplified procedure used by Ziggurat Deconstruction (ZD) to initialize background SCNA rates. Dotted red and pared to the theoretically optimal performance. However, blue lines denote the length and amplitude of amplified and deleted RegBounder’s underlying assumptions may not always be SCNAs, respectively, while solid red and blue lines denote the result of satisfied, including the assumption that each peak region merging the SCNA with the closest adjacent segment. (b) The same hypothetical segmented chromosome (green line) is deconstructed using contains asingledominanttargetgeneand theexpecta- the more flexible procedure of subsequent rounds of ZD. Here, the ZD is tion that copy-number breakpoints are independently performed with respect to up to two basal levels (dotted magenta lines) distributed around the driver locus. To the extent that Mermel et al. 
Genome Biology 2011, 12:R41 Page 12 of 14 http://genomebiology.com/2011/12/4/R41 Abbreviations that are fit to the data, allowing for amplified and deleted SCNAs to be CNV: copy number variation; GBM: glioblastoma multiforme; GISTIC: Genomic superimposed. Identification of Significant Targets in Cancer; MCR: minimal common region; Additional file 3: Supplementary Figure S2: distribution of SCNA SCNA: somatic copy number alteration; SNP: single nucleotide length and amplitudes. Two-dimensional histogram showing the polymorphism; TCGA: The Cancer Genome Atlas; ZD: Ziggurat frequency (z-axis) of copy number events as a function of length (x-axis) Deconstruction. and amplitude (y-axis). Frequency is plotted on a log-scale to facilitate visualization of very low frequency copy number events. Acknowledgements This work was supported by a Genome Characterization Center Grant Additional file 4: Supplementary Table S1: comparison of amplitude (U24CA143867) awarded as part of the NCI/NHGRI funded Cancer Genome and length-based filtering of SCNAs. Supplementary Table 1 compares Atlas (TCGA) project. CHM was supported by Medical Scientist Training the GISTIC results obtained using low and high amplitude thresholds Program (MSTP) Award Number T32GM07753 from the National Institute of with those obtained using a focal length threshold on 178 GBM samples. General Medical Sciences. RB was supported by NIH K08CA122833, a V Additional file 5: Supplementary Figure S3: distribution of driver Foundation Scholarship, and the Doris Duke Charitable Foundation. The length and amplitudes. Driver SCNAs are typically of shorter length and content is solely the responsibility of the authors and does not necessarily higher amplitude than random passenger SCNAs. (a,b) Here we show represent the official views of the National Institute of General Medical the cumulative frequency distribution of SCNA amplitudes (a) and Sciences or the National Institutes of Health. 
lengths (b) for SCNAs covering significantly amplified regions identified by GISTIC (’Driver SCNAs’, red line) or by a similar number of randomly Author details chosen non-driver regions (’Random SCNAs’, blue line). 1 Cancer Program, The Broad Institute of MIT and Harvard, 7 Cambridge Additional file 6: Supplementary Table S2: comparison of Center, Cambridge, MA 02142, USA. Department of Medical Oncology, Dana GeneGISTIC and standard GISTIC deletions analysis. Supplementary Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA. Table 2 compares the GISTIC results obtained using the standard GISTIC Department of Cancer Biology, Dana Farber Cancer Institute, 44 Binney deletions analysis with those obtained using GeneGISTIC on 178 GBM Street, Boston, MA 02115, USA. The Center for Cancer Genome Discovery, sanples. Dana Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA. Additional file 7: Supplementary Figure S4: GeneGISTIC versus Authors’ contributions standard GISTIC. (a) GeneGISTIC helps identify genes subject to non- RB and GG developed and coded the original GISTIC algorithm. CHM, SES, overlapping deletion, such as NF1. The left panel shows the 12 samples RB, and GG developed and coded the algorithmic modifications contained with focal deletions affecting NF1, many of which do not overlap. As a in GISTIC 2.0. CHM, MM, RB, and GG conceived and designed the present result, the standard GISTIC marker score (blue line, right panel) has study. CHM, SES, and BH debugged and packaged the GISTIC 2.0 software multiple local maxima over NF1. By contrast, the GeneGISTIC score release. CHM, MM, RB, and GG wrote the manuscript. All authors read and counts all of these deletions as contributing to the NF1 score, resulting approved the final manuscript. in a score for NF1 (red line, right panel) that is significantly greater than that assigned to any of the individual markers covering NF1. 
(b) Received: 18 August 2010 Revised: 14 February 2011 GeneGISTIC does not score deletions occurring outside of genes. The left Accepted: 28 April 2011 Published: 28 April 2011 panel shows a region of focal deletion occurring just outside the PCHD9 gene on chromosome 13. These deletions result in a peak in the markers deletion score (blue line, right panel) that is not detected by GeneGISTIC. References 1. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57-70. Additional file 8: Supplementary Table S3: new peaks detected by 2. Stratton MR, Campbell PJ, Futreal PA: The cancer genome. Nature 2009, arbitrated peel-off. Supplementary Table 3 compares the GISTIC results 458:719-724. obtained using the standard peel-off algorithm with those obtained 3. Santarius T, Shipley J, Brewer D, Stratton MR, Cooper CS: A census of using arbitrated peel-off on 178 GBM samples. amplified and overexpressed human cancer genes. Nat Rev Cancer 2010, Additional file 9: Supplementary Figure S5: total recovery of 10:59-64. secondary driver peaks. This figure shows the results from 10,000 4. Beroukhim R, Mermel C, Porter D, Wei G, Raychaudhuri S, Donovan J, simulations of 300 samples in which a primary driver event is present in Barretina J, Boehm J, Dobson J, Urashima M: The landscape of somatic 10% of samples and a secondary driver event is present in 5% of copy-number alteration across human cancers. Nature 2010, 463:899-905. samples. In these simulations, we vary the fraction of overlap between 5. Chiang D, Getz G, Jaffe D, O’Kelly M, Zhao X: High-resolution mapping of driver events from 100% (total dependence) to 0% (total independence). copy-number alterations with massively parallel sequencing. Nat Here we present to the total recovery of the secondary driver peak in Methods 2009, 6:99-103. GISTIC runs using arbitrated peel-off (left panel) or the standard peel-off 6. Bignell GR, Greenman CD, Davies H, Butler AP, Edkins S, Andrews JM, (right panel). 
The red (left panel) or blue (right panel) lines show the Buck G, Chen L, Beare D, Latimer C, Widaa S, Hinton J, Fahey C, Fu B, fraction of secondary driver peaks identified in independent GISTIC peaks Swamy S, Dalgliesh GL, Teh BT, Deloukas P, Yang F, Campbell PJ, (that is, not containing the primary driver event), as is shown in Figure Futreal PA, Stratton MR: Signatures of mutation and selection in the 4b. The black lines show the fraction of secondary driver peaks identified cancer genome. Nature 2010, 463:893-898. in dependent peaks (that is, a peak containing both the primary and 7. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, secondary driver events), and the green lines show the total recall of Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, secondary driver peaks (in any peak). Error-bars representing the mean ± Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, standard error of the mean are drawn, but may be smaller than the Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, point used to represent the mean and hence not be visible. Hinton J, Jenkinson A, Jones D, et al: Patterns of somatic mutation in Additional file 10: Supplementary Figure S6: comparison of human cancer genomes. Nature 2007, 446:153-158. RegBounder to theoretically optimal peaks. Comparison between the 8. Merlo LM, Pepper JW, Reid BJ, Maley CC: Cancer as an evolutionary and peak region sizes obtained by RegBounder (green line) with the ecological process. Nat Rev Cancer 2006, 6:924-935. theoretically minimal peak region sizes (black line) that could be 9. Network CGAR: Comprehensive genomic characterization defines human obtained by a similarly confident peak finding algorithm (Supplementary glioblastoma genes and core pathways. Nature 2008, 455:1061-1068. Methods in Additional file 1) at 50% (left) and 95% (right) confidence. 10. 
McLendon R, Friedman A, Bigner D, Van Meir EG, Brat DJ, Error-bars representing the median ± standard error of the mean are Mastrogianakis GM, Olson JJ, Mikkelsen T, Lehman N, Aldape K, Yung WK, drawn, but may be smaller than the points used to represent the Bogler O, Weinstein JN, VandenBerg S, Berger M, Prados M, Muzny D, median and hence not be visible. Morgan M, Scherer S, Sabo A, Nazareth L, Lewis L, Hall O, Zhu Y, Ren Y, Alvi O, Yao J, Hawes A, Jhangiani S, Fowler G, et al: Comprehensive Mermel et al. Genome Biology 2011, 12:R41 Page 13 of 14 http://genomebiology.com/2011/12/4/R41 genomic characterization defines human glioblastoma genes and core 22. Bass AJ, Watanabe H, Mermel CH, Yu S, Perner S, Verhaak RG, Kim SY, pathways. Nature 2008, 455:1061-1068. Wardwell L, Tamayo P, Gat-Viks I, Ramos AH, Woo MS, Weir BA, Getz G, 11. Pleasance E, Cheetham R, Stephens P, McBride D, Humphray S, Beroukhim R, O’Kelly M, Dutt A, Rozenblatt-Rosen O, Dziunycz P, Greenman C, Varela I, Lin M, Ordóñez G, Bignell G: A comprehensive Komisarof J, Chirieac LR, Lafargue CJ, Scheble V, Wilbertz T, Ma C, Rao S, catalogue of somatic mutations from a human cancer genome. Nature Nakagawa H, Stairs DB, Lin L, Giordano TJ, et al: SOX2 is an amplified 2009, 463:191-196. lineage-survival oncogene in lung and esophageal squamous cell 12. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, carcinomas. Nat Genet 2009, 41:1238-1242. Lin ML, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, 23. Diskin SJ, Eck T, Greshock J, Mosse YP, Naylor T, Stoeckert CJ, Weber BL, Ordonez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Maris JM, Grant GR: STAC: A method for testing the significance of DNA Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, copy number aberrations across multiple array-CGH experiments. McLaughlin SF, Peckham HE, Tsung EF, et al: A small-cell lung cancer Genome Res 2006, 16:1149-1158. 
genome with complex signatures of tobacco exposure. Nature 2010, 24. Guttman M, Mies C, Dudycz-Sulicz K, Diskin SJ, Baldwin DA, Stoeckert CJ, 463:184-190. Grant GR: Assessing the significance of conserved genomic aberrations 13. Stephens PJ, McBride DJ, Lin ML, Varela I, Pleasance ED, Simpson JT, using high resolution genomic microarrays. PLoS Genet 2007, 3:e143. Stebbings LA, Leroy C, Edkins S, Mudie LJ, Greenman CD, Jia M, Latimer C, 25. Taylor BS, Barretina J, Socci ND, Decarolis P, Ladanyi M, Meyerson M, Teague JW, Lau KW, Burton J, Quail MA, Swerdlow H, Churcher C, Singer S, Sander C, Gibson G: Functional copy-number alterations in Natrajan R, Sieuwerts AM, Martens JW, Silver DP, Langerod A, Russnes HE, cancer. PLoS ONE 2008, 3:e3179. Foekens JA, Reis-Filho JS, van ‘t Veer L, Richardson AL, Borresen-Dale AL, 26. Shah SP: Computational methods for identification of recurrent copy et al: Complex landscapes of somatic rearrangement in human breast number alteration patterns by array CGH. Cytogenet Genome Res 2008, cancer genomes. Nature 2009, 462:1005-1010. 123:343-351. 14. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, 27. Leach NT, Rehder C, Jensen K, Holt S, Jackson-Cook C: Human Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, chromosomes with shorter telomeres and large heterochromatin Markowitz SD, Willis J, Dawson D, Willson JK, Gazdar AF, Hartigan J, Wu L, regions have a higher frequency of acquired somatic cell aneuploidy. Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Mech Ageing Dev 2004, 125:563-573. Kinzler KW, Velculescu VE: The consensus coding sequences of human 28. Li C, Hung Wong W: Model-based analysis of oligonucleotide arrays: breast and colorectal cancers. Science 2006, 314:268-274. model validation, design issues and standard error application. Genome 15. Beroukhim R, Getz G, Nghiemphu L, Barretina J, Hsueh T, Linhart D, Biol 2001, 2:RESEARCH0032. 
Vivanco I, Lee JC, Huang JH, Alexander S, Du J, Kau T, Thomas RK, Shah K, 29. Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Soto H, Perner S, Prensner J, Debiasi RM, Demichelis F, Hatton C, Rubin MA, expression index computation and outlier detection. Proc Natl Acad Sci Garraway LA, Nelson SF, Liau L, Mischel PS, Cloughesy TF, Meyerson M, USA 2001, 98:31-36. Golub TA, Lander ES, Mellinghoff IK, et al: Assessing the significance of 30. Bolstad BM, Collin F, Simpson KM, Irizarry RA, Speed TP: Experimental chromosomal aberrations in cancer: methodology and application to design and low-level analysis of microarray data. Int Rev Neurobiol 2004, glioma. Proc Natl Acad Sci USA 2007, 104:20007-20012. 60:25-58. 16. Weir BA, Woo MS, Getz G, Perner S, Ding L, Beroukhim R, Lin WM, 31. Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Province MA, Kraja A, Johnson LA, Shah K, Sato M, Thomas RK, Barletta JA, Ally A, Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Borecki IB, Broderick S, Chang AC, Chiang DY, Chirieac LR, Cho J, Fujii Y, Langlois S, Eydoux P, Friedman JM, Marra MA: Assessment of algorithms Gazdar AF, Giordano T, Greulich H, Hanna M, Johnson BE, Kris MG, Lash A, for high throughput detection of genomic copy number variation in Lin L, Lindeman N, et al: Characterizing the cancer genome in lung oligonucleotide microarray data. BMC Bioinformatics 2007, 8:368. adenocarcinoma. Nature 2007, 450:893-898. 32. Hupé P, Stransky N, Thiery J-P, Radvanyi F, Barillot E: Analysis of array CGH 17. Lin WM, Baker AC, Beroukhim R, Winckler W, Feng W, Marmion JM, Laine E, data: from signal ratio to gain and loss of DNA regions. Bioinformatics Greulich H, Tseng H, Gates C, Hodi FS, Dranoff G, Sellers WR, Thomas RK, 2004, 20:3413-3422. Meyerson M, Golub TR, Dummer R, Herlyn M, Getz G, Garraway LA: 33. 
Olshen AB, Venkatraman ES, Lucito R, Wigler M: Circular binary Modeling genomic diversity and tumor dependency in malignant segmentation for the analysis of array-based DNA copy number data. melanoma. Cancer Res 2008, 68:664-673. Biostatistics 2004, 5:557-572. 18. Firestein R, Bass AJ, Kim SY, Dunn IF, Silver SJ, Guney I, Freed E, Ligon AH, 34. Venkatraman ES, Olshen AB: A faster circular binary segmentation Vena N, Ogino S, Chheda MG, Tamayo P, Finn S, Shrestha Y, Boehm JS, algorithm for the analysis of array CGH data. Bioinformatics 2007, Jain S, Bojarski E, Mermel C, Barretina J, Chan JA, Baselga J, Tabernero J, 23:657-663. Root DE, Fuchs CS, Loda M, Shivdasani RA, Meyerson M, Hahn WC: CDK8 is 35. Nilsson B, Johansson M, Al-Shahrour F, Carpenter AE, Ebert BL: Ultrasome: a colorectal cancer oncogene that regulates beta-catenin activity. Nature efficient aberration caller for copy number studies of ultra-high 2008, 455:547-551. resolution. Bioinformatics 2009, 25:1078-1079. 19. Chiang DY, Villanueva A, Hoshida Y, Peix J, Newell P, Minguez B, 36. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical LeBlanc AC, Donovan DJ, Thung SN, Sole M, Tovar V, Alsinet C, Ramos AH, and powerful approach to multiple testing. J R Stat Soc B (Methodological) Barretina J, Roayaie S, Schwartz M, Waxman S, Bruix J, Mazzaferro V, 1995, 57:289-300. Ligon AH, Najfeld V, Friedman SL, Sellers WR, Meyerson M, Llovet JM: Focal 37. Sanchez-Garcia F, Akavia UD, Mozes E, Pe’er D: JISTIC: identification of gains of VEGFA and molecular classification of hepatocellular carcinoma. significant targets in cancer. BMC Bioinformatics 2010, 11:189. Cancer Res 2008, 68:6779-6788. 38. GISTIC 2 Manuscript and Software Download Page. [http://www. 20. Etemadmoghadam D, deFazio A, Beroukhim R, Mermel C, George J, Getz G, broadinstitute.org/cancer/pub/GISTIC2]. Tothill R, Okamoto A, Raeder MB, Harnett P, Lade S, Akslen LA, Tinker AV, 39. GenePattern. 
[http://www.broadinstitute.org/cancer/software/genepattern/]. Locandro B, Alsop K, Chiew YE, Traficante N, Fereday S, Johnson D, Fox S, 40. Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, Sellers W, Urashima M, Salvesen HB, Meyerson M, Bowtell D, Bowtell D, Chenevix-Trench G, Green A, Webb P, deFazio A, et al: Integrated genome- McBride DJ, Varela I, Nik-Zainal S, Leroy C, Jia M, Menzies A, Butler AP, wide DNA copy number and expression analysis identifies distinct Teague JW, Quail MA, Burton J, Swerdlow H, Carter NP, Morsberger LA, mechanisms of primary chemoresistance in ovarian carcinomas. Clin Iacobuzio-Donahue C, Follows GA, Green AR, Flanagan AM, Stratton MR, Cancer Res 2009, 15:1417-1427. et al: Massive genomic rearrangement acquired in a single catastrophic 21. Northcott PA, Nakahara Y, Wu X, Feuk L, Ellison DW, Croul S, Mack S, event during cancer development. Cell 2011, 144:27-40. Kongkham PN, Peacock J, Dubuc A, Ra Y-S, Zilberberg K, McLeod J, 41. Dahlback HS, Brandal P, Meling TR, Gorunova L, Scheie D, Heim S: Genomic Scherer SW, Sunil Rao J, Eberhart CG, Grajkowska W, Gillespie Y, Lach B, aberrations in 80 cases of primary glioblastoma multiforme: Grundy R, Pollack IF, Hamilton RL, Van Meter T, Carlotti CG, Boop F, Pathogenetic heterogeneity and putative cytogenetic pathways. Genes Bigner D, Gilbertson RJ, Rutka JT, Taylor MD: Multiple recurrent genetic Chromosomes Cancer 2009, 48:908-924. events converge on control of histone lysine methylation in 42. Metzker M: Sequencing technologies - the next generation. Nat Rev Genet medulloblastoma. Nat Genet 2009, 41:465-472. 2009, 11:31-46. Mermel et al. Genome Biology 2011, 12:R41 Page 14 of 14 http://genomebiology.com/2011/12/4/R41 43. The Cancer Genome Atlas Data Portal, GBM Publication. [http://tcga-data. nci.nih.gov/docs/publications/gbm_2008/]. 44. 
McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, Shapero MH, de Bakker PI, Maller JB, Kirby A, Elliott AL, Parkin M, Hubbell E, Webster T, Mei R, Veitch J, Collins PJ, Handsaker R, Lincoln S, Nizzari M, Blume J, Jones KW, Rava R, Daly MJ, Gabriel SB, Altshuler D: Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 2008, 40:1166-1174. 45. Schwarz G: Estimating the dimension of a model. Ann Statist 1978, 6:461-464. 46. Holland AJ, Cleveland DW: Boveri revisited: chromosomal instability, aneuploidy and tumorigenesis. Nat Rev Mol Cell Biol 2009, 10:478-487. doi:10.1186/gb-2011-12-4-r41 Cite this article as: Mermel et al.: GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biology 2011 12:R41. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color ﬁgure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit

Journal

Genome Biology – Springer Journals

Published: Apr 28, 2011
