Automated brain tumor identification using magnetic resonance imaging: A systematic review and meta-analysis

Background. Automated brain tumor identification facilitates diagnosis and treatment planning. We evaluate the performance of traditional machine learning (TML) and deep learning (DL) in brain tumor detection and segmentation, using MRI.

Methods. A systematic literature search from January 2000 to May 8, 2021 was conducted. Study quality was assessed using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Detection meta-analysis was performed using a unified hierarchical model. Segmentation studies were evaluated using a random effects model. Sensitivity analysis was performed for externally validated studies.

Results. Of 224 studies included in the systematic review, 46 segmentation and 38 detection studies were eligible for meta-analysis. In detection, DL achieved a lower false positive rate compared to TML; 0.018 (95% CI, 0.011 to 0.028) and 0.048 (0.032 to 0.072) (P < .001), respectively. In segmentation, DL had a higher dice similarity coefficient (DSC), particularly for tumor core (TC); 0.80 (0.77 to 0.83) and 0.63 (0.56 to 0.71) (P < .001), persisting on sensitivity analysis. Both manual and automated whole tumor (WT) segmentation had “good” (DSC ≥ 0.70) performance. Manual TC segmentation was superior to automated; 0.78 (0.69 to 0.86) and 0.64 (0.53 to 0.74) (P = .014), respectively. Only 30% of studies reported external validation.

Conclusions. The comparable performance of automated to manual WT segmentation supports its integration into clinical practice. However, manual outperformance for sub-compartmental segmentation highlights the need for further development of automated methods in this area. Compared to TML, DL provided superior performance for detection and sub-compartmental segmentation.
Improvements in the quality and design of studies, including external validation, are required for the interpretability and generalizability of automated models.

Key Points
• Human expertise outperformed automated methods in sub-compartmental segmentation.
• DL outperformed TML for detection and sub-compartmental segmentation.
• Transparency and generalizability of models should be improved.

© The Author(s) 2022. Published by Oxford University Press, the Society for Neuro-Oncology and the European Association of Neuro-Oncology. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Kouli et al. Automated brain tumor detection and segmentation. Neuro-Oncology Advances.

Importance of the Study

Despite the increasing research on artificial intelligence techniques in medical imaging, their safe implementation into clinical practice depends on rigorous and generalizable evidence. This study systematically evaluated the performance of automated brain tumor detection and segmentation methods, and assessed the quality of reporting using the Checklist for Artificial Intelligence in Medical Imaging guideline. Although automated and manual methods in whole tumor segmentation performed comparably, manual methods performed better in sub-compartmental segmentation. Within automated methods, deep learning was found to be superior to traditional machine learning in detection and sub-compartmental segmentation, but explaining this was hindered by the paucity in reported methods of model interpretability. Less than a third of studies reported external validation of their automated method. The variability found in study reporting undermines the credibility of automated methods, impacting their benefit for patients and health systems. Hence, there is a need for adherence to international reporting standards and guidelines.

Brain tumors present a significant burden on healthcare worldwide due to the neurological deficits produced and subsequent poor prognosis, with an average 5-year survival of 35% in malignant subtypes. MRI is the gold standard modality for brain tumor diagnosis, subsequently informing surgical intervention, radiotherapy planning, and chemotherapy. Inevitably, qualitative MRI assessment has always been subject to high inter-rater variability, as well as being a notoriously laborious process. However, the emergence of Artificial Intelligence (AI) has sparked the hope of overcoming these limitations.

The advent of Computer-Aided Diagnosis (CAD) using AI can potentially improve brain tumor patient outcomes. Traditional machine learning (TML) techniques have become widely used for image classification but are restricted by a requirement for specifying “feature vectors” for extraction from the raw data. Conversely, deep learning (DL) techniques provide effective and automatic representation of complex image features, which has contributed to their increased popularity, but the interpretation of automatically identified features remains a problem. In addition, both TML and DL techniques are vulnerable to overfitting and selection bias. Therefore, to safely use CAD in clinical settings, large robust studies which evaluate their quality and generalizability are crucial. Holistic and standardized evaluation of scientific reporting is facilitated by established guidelines, such as the recently proposed Checklist for Artificial Intelligence in Medical Imaging (CLAIM).

The research on AI in neuro-oncology imaging has been amplified by the introduction of open access image datasets, such as the annual Multimodal Brain Tumor Segmentation Challenge (BRATS). This provides the ideal foundation for an in-depth review to identify optimal automated methods. Three former systematic reviews and meta-analyses evaluated performance of AI-related techniques in neuro-oncological imaging [8–10]. However, these focused on specific brain tumor types and whole tumor (WT) segmentation, and none have evaluated sub-compartmental segmentation nor addressed performance disparities between CAD and human expert segmentation. Moreover, there remains a paucity in comprehensively assessing the quality of studies in this field. We present the largest systematic review and meta-analysis that objectively evaluates performance of automated detection and segmentation techniques and assesses the reporting quality of included studies.

Materials and Methods

Search Strategy

This systematic review and meta-analysis were conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (PROSPERO; CRD42021247925). We searched PubMed, Web of Science, and Scopus for studies published between January 1, 2000, and May 8, 2021. The search was initially performed on June 19, 2020 and updated on May 8, 2021. The search strategy is found in the Supplementary Appendix. The search was limited to publications written in English. The citations of included articles were hand-searched to identify additional appropriate articles.

Inclusion and Exclusion Criteria

Studies were included if they developed or validated a semi-automatic or fully automatic adult brain tumor detection or segmentation method using MRI. Exclusion criteria: (1) studies reporting tumor classification or tumor grading methods only; (2) studies utilizing MRI spectroscopy only for method development; (3) studies reporting methods on pediatric, pituitary, and/or brainstem tumors only; (4) abstracts or conference proceedings; and (5) no performance metrics reported.

Study Selection and Data Extraction

Extracted citations were imported into the Rayyan systematic review site (https://www.rayyan.ai) for study selection. Following removal of duplicates, titles and abstracts were screened, and full texts of relevant publications reviewed. Study screening was completed by two independent reviewers (O.K., J.D.S.), with disagreements resolved through a consensus-based approach with the wider group.

Two independent reviewers (O.K., A.H.) extracted study characteristics from included studies with disagreements resolved through consensus. Data extracted included: (1) author; (2) year; (3) dataset(s) utilized with the number of patients/images; (4) type of tumor(s) studied; (5) MRI modality; (6) performance evaluation metrics; (7) type of algorithm utilized; (8) feature extraction; (9) inference time in slice/second for segmentation; (10) user interaction (ie, automatic vs semi-automatic); and (11) validation technique(s).

Reporting and Quality Evaluation

The reporting quality of studies was assessed according to CLAIM. The risk of bias and applicability was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guideline, with consideration of some CLAIM items (see Supplementary Appendix). Three reviewers (O.K., A.H., D.B.) independently appraised included studies with any disagreements resolved through consensus. A domain was deemed “good” if it was reported in ≥70% of studies.

Definitions

DL referred to studies that utilized deep neural networks as their method of choice. TML referred to methods not classified as DL. Detection studies were defined as those that reported performance results for techniques that identified the presence of a tumor in an image. Segmentation studies were defined as those that reported performance results for techniques that segmented brain tumors, whether it was WT, tumor core (TC), and/or enhancing tumor (ET) segmentations as defined by BRATS. Following previous work, a dice similarity coefficient (DSC) of ≥0.7 was considered to represent “good” overlap.

Statistical Analysis

A meta-analysis was conducted for both automated detection and segmentation studies to compare DL with TML methods and to evaluate the segmentation performance of CAD to that of manual experts. Studies providing performance metrics for their method on different datasets were assumed to be independent of each other. This is because we are interested in providing an overview of the two methods rather than exact point estimates.

For detection methods, contingency tables consisting of True Positive, False Positive, False Negative, and True Negative were constructed. For studies that did not directly provide contingency tables, missing data were calculated with Review Manager 5.3 (https://revman.cochrane.org/) using sensitivity, specificity, and number of images. If neither contingency tables nor sufficient data were reported for computation, then the study was excluded from meta-analysis. A unified hierarchical summary receiver operating characteristic model was developed for the detection meta-analysis. Summary estimates of sensitivity and specificity with 95% CIs were derived using the random-effects bivariate binomial model parameters and equivalence equations of Harbord et al. The reason for using the hierarchical model is that it considers the correlation between sensitivity and specificity, accounting for within-study variability, as well as variability (also called heterogeneity) in effects between studies (ie, between-study variability). Receiver operating characteristic (ROC) curves were used to plot summary estimates of sensitivity against false positive rate (FPR, ie, 1 − specificity). The ROC curve plots also exhibit the uncertainty around the summary estimates via 95% confidence regions, and heterogeneity between accuracy estimates via 95% prediction regions.

Segmentation methods were evaluated using a random effects model, and reported in terms of pooled DSC, a universally used and reported metric. The restricted maximum likelihood estimator was used to calculate the heterogeneity variance (τ²). The inverse variance method was used to calculate a pooled effect size. Knapp-Hartung adjustments were used to calculate the confidence interval. A prerequisite for study inclusion in the meta-analysis was reporting the outcome of interest (ie, DSC), in combination with an SD. Subgroup analysis comparing tumor types was performed where possible. A comparative analysis was conducted to evaluate the performance of CAD versus human experts. Sensitivity analysis was performed looking at studies that only performed out-of-sample external validation. Subgroup or sensitivity analysis was avoided when the number of studies in a group was small (n < 5). Study heterogeneity was formally evaluated using Higgins’ inconsistency index (I²) (I² > 50% = significant heterogeneity). All analyses were performed in R (version 4.0.2, http://www.r-project.org/) using the tidyverse, metaDTA, dmetar, meta, and ComplexUpset packages.

Results

Our search identified 2367 records, of which 1515 records were screened (Figure 1). An additional 22 texts were identified through cross-referencing. Two hundred and sixty-two full texts were assessed for eligibility and 224 were included in the systematic review: 188 segmentation and 46 detection studies (10 studies reported both detection and segmentation results; see “Eligible Studies” in Supplementary Appendix). Forty-six segmentation [15–60] and 38 detection [39,42,45,61–95] studies were eligible for meta-analysis.

Figure 1. Study selection flow diagram (*10 studies reported both detection and segmentation results).
  Records identified: Medline/PubMed (n = 692); Web of Science (n = 814); Scopus (n = 861); additional records identified from citation searching (n = 22)
  Duplicated records removed (n = 852)
  Records screened based on title and abstract (n = 1515)
  Records excluded (n = 1231)
  Full text articles assessed (n = 262)
  Full text articles excluded (n = 38):
    - Abstracts or conference proceedings (n = 17)
    - Tumour grading/classification method (n = 9)
    - Unclear/No performance metrics (n = 7)
    - Method not applied to brain tumours (n = 3)
    - Method applied on paediatric tumours only (n = 1)
    - Method not applied on MRI scans (n = 1)
  Total articles included in the review (n = 224): detection articles (n = 46)*; segmentation articles (n = 188)*
  Articles included in meta-analysis: detection (n = 38); segmentation (n = 46)

Study Characteristics

Study characteristics are shown in Supplementary Table 1 (segmentation) and Supplementary Table 2 (detection). 40.6% (n = 95) of studies used DL and 59.4% (n = 139) used TML methods. There was a clear increase in the use of DL from 2018 (Supplementary Figure 1). Most studies utilized a fully automated algorithm (n = 222; 94.9%).

80.7% (n = 189) used data from open-access repositories, with BRATS being the most popular of them (n = 156; 66.7%). 29.0% (n = 68) used local datasets, all of which were retrospectively collected data. 11.9% (n = 28) used both local and public datasets. 2.1% (n = 5) did not specify dataset(s) used (Supplementary Figure 2). Publicly available datasets are detailed in Supplementary Table 3.

The most studied tumors were high-grade gliomas (HGG) (n = 173; 73.9%) and low-grade gliomas (LGG) (n = 171; 73.1%), with 59.0% (n = 138) of studies involving both (Supplementary Figure 3). 9.8% (n = 23) did not report the type of tumor studied. Regarding MRI sequences, T2 (n = 169; 72.2%), fluid-attenuated inversion recovery (FLAIR) (n = 165; 70.5%), T1-contrast enhanced (T1CE) (n = 164; 70.1%), and T1 (n = 143; 61.1%) modalities were the most studied (Supplementary Figure 4). 48.3% (n = 113) of studies combined all these sequences, and 20.1% (n = 47) used just one for the algorithm development. A small minority (n = 19; 8.1%) did not report the type of MRI used.

The most common metrics used for evaluating performance were DSC (n = 168; 71.8%) and sensitivity (n = 168; 71.8%) (Supplementary Figure 5). 55.1% (n = 129) of studies reported internal validation. 31.2% (n = 73) used random split validation and 32.1% (n = 75) used resampling methods (Supplementary Figure 6). Overall, less than a third of studies (n = 70; 30%) performed external validation. Specifically, 49.5% (n = 47/95) of DL and 16.5% (n = 23/139) of TML studies reported external validation (Supplementary Figure 7). Details of algorithm performance and validation techniques of studies are found in Supplementary Table 4 (segmentation) and Supplementary Table 5 (detection). Regarding segmentation inference time, DL methods performed the fastest (median: 0.2 s/MRI slice, interquartile range [IQR]: 0.1–0.9), whereas fully automated TML methods achieved a median of 2.6 s (IQR: 1.1–12.6) and semi-automated techniques achieved 48.16 s (IQR: 6.2–134.9) (P < .001; Kruskal-Wallis test) (Supplementary Figure 5).

Reporting Quality

Detailed CLAIM assessment is presented in Supplementary Table 6 (segmentation) and Supplementary Table 7 (detection). With respect to “good” reported CLAIM items, 95.3% (n = 223) stated the source of the data (CLAIM item 7) and 86.8% (n = 203) clearly reported how ground truths were derived (CLAIM items 14–18). Almost all studies reported detailed model structure and initialization of parameters (CLAIM items 22–24). 83.8% (n = 196) clearly reported training procedures and hyperparameters in sufficient detail (CLAIM item 25) (Supplementary Figure 9).

However, only 1.3% (n = 3) of studies clarified missing data handling. No studies reported sample size calculations (CLAIM item 19). Less than two-thirds (n = 144; 61.5%) specified how data was partitioned (CLAIM item 20). Only 32.5% (n = 76) of studies reported uncertainty around performance metrics (CLAIM item 29). 67.1% (n = 157) of studies reported performing internal and/or external validation (CLAIM item 32). Just 2.6% (n = 6) specified inclusion and exclusion flow of participants or images (CLAIM item 33) and only 6% (n = 14) defined demographics and clinical characteristics of cases in each partition (CLAIM item 34). Ten studies made the algorithm source code publicly available (CLAIM item 41; for available links to source codes see Supplementary Table 8).

Risk of bias and applicability assessment. Detailed QUADAS-2 assessment is presented in Supplementary Table 9 (segmentation) and Supplementary Table 10 (detection). In the patient selection domain of risk of bias, 21.4% (n = 50) of studies were considered to have unclear or high risk of bias as they did not express the exclusion criteria in the utilized dataset(s). In the reference standard domain, 13.2% (n = 31) were deemed to have unclear or high risk of bias as they did not clearly define how the ground truth segmentation was derived. In terms of applicability, the main source of concern was in the index test domain; 31.6% (n = 74) had high applicability concerns as they did not validate the algorithm (Supplementary Figure 10).

Meta-analysis

Detection meta-analysis. Thirty-eight detection studies provided sufficient data to construct contingency tables (69 tables). Only one study performed an external validation. 28.9% (n = 11; 20 tables) of studies utilized DL methods and the remaining 71.1% (n = 27; 49 tables) utilized TML methods (Table 1).

Overall, the pooled sensitivity was 0.98 (95% CI, 0.97 to 0.99) and the FPR was 0.035 (95% CI, 0.025 to 0.048). DL and TML had comparable sensitivity, but DL achieved a lower FPR compared to TML; 0.018 (95% CI, 0.011 to 0.028) and 0.048 (95% CI, 0.032 to 0.072) (P < .001), respectively (Figures 2A and 2B).

Table 1. Detection Meta-analysis Results
Method | Author | Year | TP | FN | FP | TN | Total | Sensitivity | Specificity | Weighted Specificity | Weighted Sensitivity
DL | Çinar and Yildirim | 2020 | 147 | 8 | 0 | 98 | 253 | 0.948 | 1 | 5.129 | 4.467
DL | Devanathan and Venkatachalapathy | 2020 | 153 | 2 | 3 | 95 | 253 | 0.987 | 0.969 | 4.241 | 4.708
DL | Gurunathan and Krishnan | 2020 | 514 | 25 | 21 | 1752 | 2312 | 0.954 | 0.988 | 6.356 | 5.694
DL | Rai et al. | 2021 | 1232 | 141 | 135 | 2421 | 3929 | 0.897 | 0.947 | 6.417 | 6.582
DL | Abd-Ellah et al. | 2018 | 239 | 1 | 0 | 109 | 349 | 0.996 | 1 | 2.887 | 3.571
DL | Atici et al. | 2019 | 1082 | 220 | 110 | 2171 | 3583 | 0.831 | 0.952 | 6.424 | 6.567
DL | Kaur and Ghandi | 2020 | 20 | 0 | 0 | 30 | 50 | 1 | 1 | 2.003 | 0.991
DL | Kaur and Ghandi | 2020 | 52 | 0 | 0 | 22 | 74 | 1 | 1 | 1.343 | 1.889
DL | Kaur and Ghandi | 2020 | 140 | 0 | 0 | 20 | 160 | 1 | 1 | 0.877 | 2.958
DL | Kaur and Ghandi | 2020 | 238 | 12 | 18 | 238 | 506 | 0.952 | 0.93 | 5.983 | 6.155
DL | Thangarajan and Chokkalingam | 2020 | 159 | 10 | 7 | 93 | 269 | 0.941 | 0.93 | 5.501 | 5.835
DL | Kalaiselvi et al. | 2020 | 56 | 7 | 17 | 201 | 281 | 0.889 | 0.922 | 6.145 | 5.163
DL | Rajinikanth et al. | 2020 | 388 | 12 | 11 | 589 | 1000 | 0.97 | 0.982 | 6.092 | 5.725
DL | Rajinikanth et al. | 2020 | 387 | 13 | 9 | 591 | 1000 | 0.968 | 0.985 | 6.105 | 5.611
DL | Rajinikanth et al. | 2020 | 392 | 8 | 8 | 592 | 1000 | 0.98 | 0.987 | 5.948 | 5.45
DL | Rajinikanth et al. | 2020 | 395 | 5 | 4 | 596 | 1000 | 0.988 | 0.993 | 5.692 | 4.845
DL | Rajinikanth et al. | 2020 | 393 | 7 | 9 | 391 | 800 | 0.983 | 0.978 | 5.727 | 5.771
DL | Rajinikanth et al. | 2020 | 395 | 5 | 6 | 194 | 600 | 0.988 | 0.97 | 4.967 | 5.814
DL | Huang et al. | 2020 | 4244 | 52 | 34 | 6348 | 10678 | 0.988 | 0.995 | 6.36 | 6.373
DL | Huang et al. | 2020 | 397 | 6 | 12 | 480 | 895 | 0.985 | 0.976 | 5.805 | 5.83
TML | Jayachandran and Dhanasekaran | 2013 | 10 | 0 | 1 | 4 | 15 | 1 | 0.8 | 0.424 | 1.387
TML | Deepa and Emmanuel | 2018 | 68 | 2 | 0 | 11 | 81 | 0.971 | 1 | 1.517 | 1.927
TML | Selvapandian and Manivannan | 2018 | 47 | 3 | 3 | 72 | 125 | 0.94 | 0.96 | 3.22 | 1.897
TML | Chen et al. | 2021 | 238 | 9 | 3 | 54 | 304 | 0.964 | 0.947 | 2.98 | 2.475
TML | Edalati-rad and Mosleh | 2019 | 42 | 0 | 1 | 36 | 79 | 1 | 0.973 | 1.442 | 1.71
TML | Dahshan et al. | 2014 | 87 | 0 | 1 | 13 | 101 | 1 | 0.929 | 0.534 | 2.283
TML | Song et al. | 2019 | 242 | 7 | 1 | 56 | 306 | 0.972 | 0.982 | 2.853 | 2.346
TML | Johnpeter and Ponnuchamy | 2019 | 58 | 2 | 2 | 98 | 160 | 0.967 | 0.98 | 3.142 | 1.729
TML | Amin et al. | 2017 | 42 | 4 | 0 | 39 | 85 | 0.913 | 1 | 3.12 | 1.362
TML | Amin et al. | 2017 | 60 | 5 | 0 | 35 | 100 | 0.923 | 1 | 3.045 | 1.619
TML | Amin et al. | 2019 | 70 | 0 | 2 | 14 | 86 | 1 | 0.875 | 0.64 | 2.333
TML | Amin et al. | 2019 | 61 | 1 | 7 | 17 | 86 | 0.984 | 0.708 | 1.725 | 2.463
TML | Amin et al. | 2019 | 68 | 6 | 2 | 10 | 86 | 0.919 | 0.833 | 2.433 | 2.375
TML | Amin et al. | 2019 | 69 | 0 | 5 | 12 | 86 | 1 | 0.706 | 0.646 | 2.473
TML | Amin et al. | 2019 | 69 | 0 | 4 | 15 | 88 | 1 | 0.789 | 0.72 | 2.432
TML | Amin et al. | 2019 | 72 | 0 | 4 | 10 | 86 | 1 | 0.714 | 0.543 | 2.469
TML | Amin et al. | 2019 | 290 | 0 | 10 | 106 | 406 | 1 | 0.914 | 1.481 | 2.544
TML | Amin et al. | 2019 | 296 | 0 | 5 | 105 | 406 | 1 | 0.955 | 1.445 | 2.487
TML | Amin et al. | 2019 | 301 | 50 | 5 | 50 | 406 | 0.858 | 0.909 | 3.363 | 2.558
TML | Amin et al. | 2019 | 296 | 0 | 10 | 100 | 406 | 1 | 0.909 | 1.422 | 2.55
TML | Amin et al. | 2019 | 306 | 11 | 0 | 89 | 406 | 0.965 | 1 | 3.151 | 2.187
TML | Amin et al. | 2019 | 295 | 10 | 1 | 100 | 406 | 0.967 | 0.99 | 3.175 | 2.298
TML | Amin et al. | 2019 | 70 | 4 | 1 | 11 | 86 | 0.946 | 0.917 | 2.117 | 2.236
TML | Amin et al. | 2019 | 70 | 0 | 2 | 14 | 86 | 1 | 0.875 | 0.64 | 2.333
TML | Amin et al. | 2019 | 74 | 6 | 0 | 6 | 86 | 0.925 | 1 | 1.798 | 2.045
TML | Amin et al. | 2019 | 71 | 0 | 3 | 12 | 86 | 1 | 0.8 | 0.59 | 2.419
TML | Amin et al. | 2019 | 74 | 0 | 0 | 12 | 86 | 1 | 1 | 0.518 | 1.969
TML | Amin et al. | 2019 | 70 | 3 | 0 | 13 | 86 | 0.959 | 1 | 1.965 | 1.916
TML | Jayachandran and Dhanasekaran | 2012 | 4 | 1 | 0 | 5 | 10 | 0.8 | 1 | 1.719 | 0.482
TML | Wang et al. | 2020 | 25 | 0 | 1 | 24 | 50 | 1 | 0.96 | 1.271 | 1.525
TML | Kesav and Rajini | 2020 | 43 | 1 | 1 | 21 | 66 | 0.977 | 0.955 | 1.91 | 1.891
TML | Alam et al. | 2019 | 38 | 1 | 0 | 1 | 40 | 0.974 | 1 | 0.193 | 1.806
TML | Murali and Meena | 2020 | 182 | 5 | 0 | 25 | 212 | 0.973 | 1 | 2.234 | 2.225
TML | Arunkumar et al. | 2018 | 20 | 0 | 1 | 19 | 40 | 1 | 0.95 | 1.146 | 1.457
TML | Gupta and Khanna | 2017 | 600 | 0 | 13 | 488 | 1101 | 1 | 0.974 | 2.358 | 2.511
TML | Gupta and Khanna | 2017 | 320 | 0 | 12 | 269 | 601 | 1 | 0.957 | 2.221 | 2.489
TML | Bahadure et al. | 2015 | 128 | 3 | 4 | 65 | 200 | 0.977 | 0.942 | 2.835 | 2.371
TML | Dvorák et al. | 2013 | 63 | 9 | 9 | 122 | 203 | 0.875 | 0.931 | 3.453 | 2.246
TML | Sriramakrishnan et al. | 2019 | 4441 | 70 | 60 | 78 | 4649 | 0.984 | 0.565 | 3.02 | 2.63
TML | Patil and Hamde | 2021 | 50 | 0 | 0 | 44 | 94 | 1 | 1 | 1.558 | 1.421
TML | Patil and Hamde | 2021 | 50 | 0 | 0 | 44 | 94 | 1 | 1 | 1.558 | 1.421
TML | Anitha and Raja | 2017 | 14 | 1 | 2 | 83 | 100 | 0.933 | 0.976 | 3.199 | 0.92
TML | Anitha and Raja | 2017 | 14 | 1 | 1 | 59 | 75 | 0.933 | 0.983 | 3.075 | 0.848
TML | Kebir et al. | 2019 | 961 | 332 | 805 | 1630 | 3728 | 0.743 | 0.669 | 3.547 | 2.625
TML | Lahmiri | 2017 | 20 | 0 | 1 | 29 | 50 | 1 | 0.967 | 1.496 | 1.311
TML | Simaiya et al. | 2017 | 504 | 63 | 111 | 843 | 1521 | 0.889 | 0.884 | 3.536 | 2.598
TML | Kalaiselvi et al. | 2019 | 1683 | 73 | 154 | 2740 | 4650 | 0.958 | 0.947 | 3.539 | 2.61
TML | Kalaiselvi et al. | 2019 | 63 | 0 | 7 | 211 | 281 | 1 | 0.968 | 2.734 | 1.917
TML | Tejas P and Padma | 2021 | 74 | 6 | 0 | 20 | 100 | 0.925 | 1 | 2.749 | 1.875
DL, Deep Learning; FN, False Negative; FP, False Positive; TML, Traditional Machine Learning; TN, True Negative; TP, True Positive.

Figure 2. Hierarchical summary receiver operating characteristic (HSROC) curves of (A) deep learning (DL; 20 tables) and (B) traditional machine learning (TML; 49 tables) studies included in detection meta-analysis. Each panel plots sensitivity against false positive rate (1 − specificity) with the HSROC curve, summary estimate, 95% confidence region, 95% predictive region, and study data. Summary estimates: (A) sensitivity 97.4% (95% CI 95.8–98.4), false positive rate 1.8% (95% CI 1.1–2.8); (B) sensitivity 98.6% (95% CI 97.4–99.2), false positive rate 4.8% (95% CI 3.2–7.2).

Segmentation meta-analysis. Due to limited numbers of semi-automated studies, segmentation meta-analysis solely focused on fully automated methods. Forty-six fully automated segmentation studies provided sufficient data to be included in the meta-analysis. 34.8% (n = 16) of studies utilized DL and 65.2% (n = 30) utilized TML methods. Less than half (n = 19; 41.3%) of studies performed external validation. 97.8% (n = 45) of studies provided segmentation results for WT, 41.3% (n = 19) for TC and 39.1% (n = 18) for ET.

Overall, pooled DSCs of 0.84 (95% CI, 0.82 to 0.87; I² = 99.99%) for WT, 0.72 (95% CI, 0.67 to 0.76; I² = 99.99%) for TC, and 0.73 (95% CI, 0.69 to 0.76; I² = 99.99%) for ET were achieved (Figure 3A; Supplementary Table 11). This persisted on sensitivity analysis of externally validated studies; a DSC of 0.85 (95% CI, 0.82 to 0.87; I² = 99.97%) was achieved for WT and 0.76 (95% CI, 0.70 to 0.80; I² = 99.96%) for TC (Figure 3A; Supplementary Table 12).

TML versus DL segmentation meta-analysis. DL was comparable to TML for WT segmentation, 0.86 (95% CI, 0.84 to 0.88; I² = 99.99%) and 0.83 (95% CI, 0.80 to 0.87; I² = 99.99%; P = .21), respectively (Figure 3A; Supplementary Table 11). This was relatively consistent on sensitivity analysis; 0.87 (95% CI, 0.85 to 0.88; I² = 100%) and 0.81 (95% CI, 0.73 to 0.89; I² = 99.94%; P = .10), respectively (Figure 3A; Supplementary Table 12).

In terms of TC segmentation, DL achieved a statistically significantly higher DSC compared to TML, 0.80 (95% CI, 0.77 to 0.83; I² = 99.97%) and 0.63 (95% CI, 0.56 to 0.71; I² = 100%; P < .001), respectively. This remained unchanged on sensitivity analysis; 0.80 (95% CI, 0.77 to 0.83; I² = 99.97%) and 0.64 (95% CI, 0.49 to 0.79; I² = 99.87%; P = .009), respectively.

Finally, for ET segmentation, DL methods achieved higher DSC when compared to TML, 0.75 (95% CI, 0.72 to 0.78; I² = 99.91%) and 0.69 (95% CI, 0.59 to 0.78; I² = 100%), respectively. However, this did not reach statistical significance (P = .17).

Subgroup analysis by tumor type. Most studies applied their segmentation method on gliomas (91.3%, n = 42/46 on HGG and 84.78%, n = 39/46 on LGG), 10.87% (n = 5/46) on metastatic brain tumors, 4.35% (n = 2/46) on meningiomas, and 1.79% (n = 1/46) on nerve sheath tumors. 58.69% of studies (n = 27/46) categorized their segmentation results by tumor type in sufficient detail for subgroup analysis (Supplementary Table 13).

Since few studies applied their segmentation techniques to meningiomas and nerve sheath tumors, they could not be included in subgroup analyses. The subgroup analysis thus compared HGG, LGG, and metastatic brain tumors. Only WT segmentation results for metastatic brain tumors were possible to compute due to limited studies. ET segmentation was predominantly performed on HGG, thereby excluding it from subgroup analysis. It was not possible to compare DL and TML methods in diagnosing different types of tumors due to the small number of studies.

For WT segmentation, no difference was observed between HGG, LGG, and metastatic tumors, 0.83 (95% CI, 0.79 to 0.86; I² = 99.99%), 0.80 (95% CI, 0.74 to 0.86; I² = 99.98%) and 0.80 (95% CI, 0.74 to 0.86; I² = 99.95%; P = .64), respectively (Figure 3B; Supplementary Table 13). For TC segmentation, a higher DSC was achieved for HGG compared to LGG, 0.67 (95% CI, 0.60 to 0.74; I² = 99.97%) and 0.49 (95% CI, 0.37 to 0.61; I² = 99.98%; P = .0027), respectively.

Automated versus human expert segmentation. Only 30.4% (n = 14/46) of studies provided sufficient data for comparison between automated and expert manual segmentation for WT and TC segmentation. All studies included multiple (>1) independent expert operators for generating ground truth segmentations; one study (7.1%) utilized two operators and 13 (92.9%) utilized four operators as part of the BRATS challenge (Supplementary Table 14).

For WT segmentation, both achieved “good” performance, with a numerically higher DSC in the manual group than the automated group, 0.86 (95% CI, 0.85 to 0.86; I² = 99.90%) and 0.80 (95% CI, 0.73 to 0.87; I² = 99.98%; P = .11), respectively (Figure 3C; Supplementary Table 14). However, for TC segmentation, manual segmentation outperformed automated segmentation, 0.78 (95% CI, 0.69 to 0.86; I² = 99.94%) and 0.64 (95% CI, 0.53 to 0.74; I² = 99.98%; P = .014), respectively.

For HGG tumors, manual segmentation outperformed automated, 0.88 (95% CI, 0.87 to 0.88; I² = 28.85%) and 0.81 (95% CI, 0.74 to 0.87; I² = 99.98%; P = .015), respectively. Conversely, manual was comparable to automated segmentation for LGG; 0.84 (95% CI, 0.83 to 0.85; I² = 95.68%) and 0.79 (95% CI, 0.68 to 0.90; I² = 99.96%; P = .33), respectively.

Discussion

To date, this is the largest meta-analysis evaluating automated brain tumor segmentation and detection methods. Automation provides benefits including elimination of human inter-rater variability and reduced inference time; this is particularly true of DL methods, which showed an impressive median inference time of 0.2 seconds/MRI slice.

Previous studies have concluded that, in general, automated methods are comparable to human expertise in terms of performance [10,96]. However, our research highlights that this only holds true for WT segmentation in brain tumors. Notably, we found that manual methods outperformed automated techniques for TC segmentation. Sub-compartmental segmentation, including TC, is a major influence on tumor progression monitoring and radiotherapy planning. Hence, our finding cautions against applying machine learning to all its potential uses in routine clinical practice and highlights the need for further research on sub-compartmental automated segmentation (TC and ET). Since most methods used conventional MRI scans (ie, T1, T2, T1CE, and FLAIR), future studies could combine these multimodal sequences with other specialized MRI sequences to increase the number of features, assessing for potential enhanced segmentation results. Soltaninejad et al. [30] and Durmo et al. [98] incorporated features obtained from diffusion-weighted and diffusion tensor imaging and showed promising results in the automated identification of brain tumors. Including other MRI sequences in publicly available datasets, such as BRATS, could facilitate investigations into the diagnostic value of additional features.

Regarding automated detection, we have replicated the findings of Cho et al.’s systematic review on brain tumor metastasis; DL had a significantly lower FPR than TML, whilst sensitivity between the two methods remained similar. To the best of our knowledge, there has been no previous evaluation of automated sub-compartmental segmentation of brain tumors. Our study extends confidence in DL to tumor segmentation; the DL group achieved “good” (DSC ≥ 0.7) performance for all segmentation types (WT, TC, ET), whereas for TML, “good” performance was limited to WT segmentation. This trend persisted with sensitivity analysis investigating only externally validated studies, reinforcing these results. DL techniques support the automatic identification of complex features unlike TML, which requires hand-crafted feature vectors. However, the advantages of DL remain ambiguous, due to its “black box” nature; the interpretability of learned features and the explainability of the model’s decisions could be improved [3,4]. Certain methods, such as saliency maps or feature attribution, attempt to deduce how these learning algorithms detect complex features. However, just 2.1% (n = 5) of studies reported such methods, hindering model interpretation. This highlights the importance of future work reporting DL interpretation to improve comprehension and transparency of algorithmic predictions.

Van Kempen et al. reported good performance of machine learning algorithms for glioma WT segmentation, also showing that automated segmentation for both HGG and LGG were comparable. Our subgroup analysis, stratified by tumor type, showed “good” performance, and no statistically significant difference between tumor types for WT segmentation. However, this was not consistent for TC segmentation; both HGG and LGG tumors did not reach “good” performance as was evident for WT. This is clinically pertinent, because of the aforementioned value of reliable automated sub-compartmental segmentation in treatment pathways. HGG TC segmentation performance was found to be significantly better than LGG. This may be due to LGG’s slow growth, lack of surrounding vasogenic edema, and poor enhancement on MRI, making LGGs radiologically more difficult to identify. Moreover, HGGs are highly proliferative tumors resulting in higher lesion contrast and enhancement, making them radiologically more noticeable. This study shows that although manual WT segmentation statistically outperformed automated segmentation for HGG, both achieved “good” performance (DSC ≥ 0.7). On the other hand, for LGG tumors, manual and automated segmentation were statistically comparable in terms of performance; however, only manual segmentation achieved “good” performance. This could be because LGGs can simply conform to normal anatomy (eg, expanding gyri), making them difficult to diagnose, especially when small. This further highlights the need for future work on improving machine learning performance to segment LGG more accurately to achieve comparable results to that of manual segmentation.

Reporting guidelines reinforce robust evaluation and generalizability of diagnostic models. The recent CLAIM checklist, developed on the foundations of earlier well-established guidelines, is the first to address AI applications in medical imaging. This is the first study to adopt this pertinent guideline for the comprehensive assessment of reporting quality for brain tumor identification. Although over 70% of studies detailed data sources, model design, and ground truth definitions, only a minority reported missing data handling, data partitioning, study participant flow, and external validation. This is consistent with Yusuf et al.’s systematic review which found poor reporting of the study participant flow, the distribution of disease severity, and model validation techniques within ML-based diagnosis models. Such findings reiterate the necessity for studies to employ guidelines to aid their interpretation and reusability. This is paramount in ensuring reliable research is the basis of pioneering novel techniques into clinical practice.

The absence of external validation jeopardizes the generalizability of models for clinical use. Our study highlights such a limitation, with only 41.3% (n = 19/46) of segmentation and 2.6% (n = 1/38) of detection studies in the meta-analysis undertaking external validation.

Figure 3. Pooled DSC (axis from 0.0 to 1.0) for WT, TC, and ET segmentation: (A) by method (All, DL, TML), for all studies and for externally validated studies only; (B) by tumor type (HGG, LGG, MET); (C) automated versus human, for all studies, HGG studies only, and LGG studies only.
Segmentation meta-analysis for (A) all studies and externally validated only studies, stratified by deep learning (DL) and traditional ma- chine learning (TML), (B) subgroup segmentation meta-analysis by tumor type (high-grade glioma [HGG], low-grade glioma [LGG], and metastatic brain tumor [MET]), and (C) automated versus human segmentation. To address this, we performed a sensitivity analysis on ensure that future studies externally validate their ma- segmentation models that were externally validated, chine learning algorithms, authors should utilize the which showed similar results to the original analysis. To CLAIM guideline when reporting their study. In addition, Segmentation type Segmentation type Segmentation type Neuro-Oncology Advances All studies Externally validated only All studies All studies HGG studies only LGG studies only Kouli et al. Automated brain tumor detection and segmentation journals should encourage authors to provide details writing of the manuscript. O.K., K.H.-I., and J.D.S. guarantee the about elements of reporting outlined CLAIM for edi- integrity of the work. A tors and reviewers during the assessment of AI-related manuscripts in medical imaging. Secondly, high heter- WT ogeneity was observed which may be due to methodo- logical diversity in machine learning techniques. Thirdly, only a quarter of included studies were eligible for meta- TC References analysis because of inadequate reporting, particularly the uncertainty values of performance metrics, thus ET compromising data availability. This issue has been rec- 1. Lapointe  S, Perry  A, Butowski  NA. Primary brain tumours in adults. ognized by non-neuro-oncology systematic reviews. Lancet. 2018; 392(10145):432–446. Fourthly, most studies failed to report manual segmen- 2. Porz N, Bauer S, Pica A, et al. Multi-modal glioblastoma segmentation: WT tation results, impeding a direct comparison of the man versus machine. PLoS One. 2014; 9(5):e96873. techniques. 
To promote standardization of ground-truth 3. LeCun  Y, Bengio  Y, Hinton  G. Deep learning. Nature. 2015; images for training AI algorithms, experts should utilize 521(7553):436–444. TC 100 structured reporting during manual segmentation. 4. Montavon G, Samek W, Müller KR. Methods for interpreting and under- All DL TML Finally, most studies tested and trained their algorithms standing deep neural networks. Digit Signal Process. 2018; 73:1–15. on open-access datasets. We propose that available au- 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 5. Yusuf M, Atal I, Li J, et al. Reporting quality of studies using machine tomated algorithms be applied to prospective, routinely Pooled DSC learning models for medical diagnosis: a systematic review. BMJ Open. collected MRI data to assess performance and feasibility 2020; 10(3):e034568. for use in daily clinical practice. 6. Mongan  J, Moy  L, Kahn Jr CE. Checklist for Artificial Intelligence in To conclude, we found promising results for the use of AI Medical Imaging (CLAIM): a guide for authors and reviewers. WT algorithms in brain tumor identification and highlight the 7. Menze BH, Jakab A, Bauer S, et al. The multimodal Brain Tumor Image areas for future research. Further improvements to study Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2014; design are needed, with adherence to reporting guidelines, 34(10):1993–2024. which will avail transparent evaluation and generalizability 8. Cho SJ, Sunwoo L, Baik SH, et al. Brain metastasis detection using ma- of diagnostic AI models. TC chine learning: a systematic review and meta-analysis. Neuro-oncology. 2021; 23(2):214–225. 9. van  Kempen  EJ, Post  M, Mannil  M, et  al. Performance of machine learning algorithms for glioma segmentation of brain MRI: a systematic Supplementary Material literature review and meta-analysis. Eur Radiol. 2021; 31(12):9638–9653. ET 10. Zheng  Q, Yang  L, Zeng  B, et  al. 
Artificial intelligence performance in Supplementary material is available at Neuro-Oncology HGG LGG MET detecting tumor metastasis from medical radiology imaging: a system- Advances online. atic review and meta-analysis. EClinicalMedicine. 2021; 31:100669. 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 11. Salameh JP, Bossuyt PM, McGrath TA, et al. Preferred reporting items for Pooled DSC C systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist. BMJ. 2020; 370. WT Keywords 12. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern artificial intelligence | brain tumor | machine learning | meta- TC Med. 2011; 155(8):529–536. analysis | segmentation. 13. Zou KH, Warfield SK, Bharatha A, et al. Statistical validation of image segmentation quality based on a spatial overlap index1: scientific re - ports. Acad Radiol. 2004; 11(2):178–189. WT 14. Harbord RM, Deeks JJ, Egger M, et al. A unification of models for meta- Funding analysis of diagnostic accuracy studies. Biostatistics. 2007; 8(2):239–251. 15. Sanjuán A, Price CJ, Mancini L, et al. Automated identification of brain This work was supported by the SINAPSE innovation fund. The tumors from single MR images based on segmentation with refined funders had no role in study design, analysis, and interpretation, WT patient-specific priors. Front Neurosci. 2013; 7:241. or writing. 16. Wu W, Chen AY, Zhao L, et al. Brain tumor detection and segmentation in Automated Human a CRF (conditional random fields) framework with pixel-pairwise affinity 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 and superpixel-level features. Int J Comput Assist Radiol Surg. 2014; Pooled DSC 9(2):241–253. Conflict of interest statement . The authors declare no con- 17. Dvorak  P, Bartusek  K, Kropatsch  WG, et  al. Automated multi-contrast flicts of interest. 
brain pathological area extraction from 2D MR images. J Appl Res Figure 3. Segmentation meta-analysis for (A) all studies and externally validated only studies, stratified by deep learning (DL) and traditional ma - Technol. 2015; 13(1):58–69. chine learning (TML), (B) subgroup segmentation meta-analysis by tumor type (high-grade glioma [HGG], low-grade glioma [LGG], and metastatic 18. Steed  TC, Treiber  JM, Patel  KS, et  al. Iterative probabilistic voxel brain tumor [MET]), and (C) automated versus human segmentation. labeling: automated segmentation for analysis of The Cancer Authorship statement. O.K., K.H.-I., and J.D.S.  contributed Imaging Archive glioblastoma images. Am J Neuroradiol. 2015; to study conception and design. O.K., A.H., D.B., K.H.-I., and 36(4):678–685. J.D.S.  contributed to literature search, data extraction, and 19. Hasan  AM, Meziane  F, Aspin  R, et  al. Segmentation of brain tumors quality assessment of studies. All authors had access to the in MRI images using three-dimensional active contour without edge. raw data, and contributed to data analysis, interpretation, and Symmetry. 2016; 8(11):132. Segmentation type Segmentation type Segmentation type Kouli et al. Automated brain tumor detection and segmentation 20. Ilunga-Mbuyamba  E, Avina–Cervantes  JG, Garcia–Perez  A, et  al. 40. Mlynarski  P, Delingette  H, Criminisi  A, et  al. 3D convolutional neural Localized active contour model with background intensity compensation networks for tumor segmentation using long-range 2D context. Comput applied on automatic MR brain tumor segmentation. Neurocomputing. Med Imaging Graph. 2019; 73:60–72. 2017; 220:84–97. 41. Mallick PK, Ryu SH, Satapathy SK, et al. Brain MRI image classification 21. Thiruvenkadam  K, Perumal  N. Fully automatic method for segmenta- for cancer detection using deep wavelet autoencoder-based deep neural tion of brain tumor from multimodal magnetic resonance images using network. IEEE Access. 2019; 7:46278–46287. 
wavelet transformation and clustering technique. Int J Imaging Syst 42. Tchoketch Kebir S, Mekaoui S, Bouhedda M. A fully automatic method- Technol. 2016; 26(4):305–314. ology for MRI brain tumour detection and segmentation. Imaging Sci J. 22. Liu  Y, Stojadinovic  S, Hrycushko  B, et  al. Automatic metastatic brain 2019; 67(1):42–62. tumor segmentation for stereotactic radiosurgery applications. Phys 43. Alagarsamy  S, Kamatchi  K, Govindaraj  V, et  al. Multi-channeled MR Med Biol. 2016; 61(24):8440. brain image segmentation: a new automated approach combining BAT 23. Li Y, Jia F, Qin J. Brain tumor segmentation from multimodal magnetic reso- and clustering technique for better identification of heterogeneous tu - nance images via sparse representation. Artif Intell Med. 2016; 73:1–13. mors. Biocybern Biomed Eng. 2019; 39(4):1005–1035. 24. Soltaninejad M, Yang G, Lambrou T, et al. Automated brain tumour detec- 44. Shivhare SN, Kumar N, Singh N. A hybrid of active contour model and tion and segmentation using superpixel-based extremely randomized trees convex hull for automated brain tumor segmentation in multimodal MRI. in FLAIR MRI. Int J Comput Assist Radiol Surg. 2017; 12(2):183–203. Multimedia Tools Appl. 2019; 78(24):34207–34229. 25. Imtiaz T, Rifat S, Fattah SA, et al. Automated brain tumor segmentation 45. Kalaiselvi  T, Kumarashankar  P, Sriramakrishnan  P. Three-phase auto- based on multi-planar superpixel level features extracted from 3D MR matic brain tumor diagnosis system using patches based updated run images. IEEE Access. 2019; 8:25335–25349. length region growing technique. J Digit Imaging. 2020; 33(2):465–479. 26. Kaur T, Saini BS, Gupta S. A novel fully automatic multilevel thresholding 46. Wu Y, Zhao Z, Wu W, et al. Automatic glioma segmentation based on technique based on optimized intuitionistic fuzzy sets and tsallis entropy adaptive superpixel. BMC Med Imaging. 2019; 19(1):1–4. for MR brain tumor image segmentation. 
Australas Phys Eng Sci Med. 47. Rehman ZU, Naqvi SS, Khan TM, et al. Fully automated multi-parametric 2018; 41(1):41–58. brain tumour segmentation using superpixel based classification. Expert 27. Liu  Y, Stojadinovic  S, Hrycushko  B, et  al. A deep convolutional neural Syst Appl. 2019; 118:598–613. network-based automatic delineation strategy for multiple brain metas- 48. Zhou C, Ding C, Wang X, et al. One-pass multi-task networks with cross- tases stereotactic radiosurgery. PLoS One. 2017; 12(10):e0185844. task guided attention for brain tumor segmentation. IEEE Trans Image 28. Essadike  A, Ouabida  E, Bouzid  A. Brain tumor segmentation with Process. 2020; 29:4516–4529. Vander Lugt correlator based active contour. Comput Methods Programs 49. Khan H, Shah PM, Shah MA, et al. Cascading handcrafted features and Biomed. 2018; 160:103–117. Convolutional Neural Network for IoT-enabled brain tumor segmenta- 29. Pinto  A, Pereira  S, Rasteiro  D, et  al. Hierarchical brain tumour seg- tion. Comput Commun. 2020; 153:196–207. mentation using extremely randomized trees. Pattern Recognit. 2018; 50. Xue  J, Wang  B, Ming  Y, et  al. Deep learning–based detection and 82:105–117. segmentation-assisted management of brain metastases. Neuro- 30. Soltaninejad  M, Yang  G, Lambrou  T, et  al. Supervised learning based oncology. 2020; 22(4):505–514. multimodal MRI brain tumour segmentation using texture features from 51. Thiruvenkadam  K, Nagarajan  K. Fully automatic brain tumor extrac- supervoxels. Comput Methods Programs Biomed. 2018; 157:69–84. tion and tissue segmentation from multimodal MRI brain images. Int J 31. Charron O, Lallement A, Jarnet D, et al. Automatic detection and seg- Imaging Syst Technol. 2021; 31(1):336–350. mentation of brain metastases on multimodal MR images with a deep 52. Ben  naceur  M, Akil  M, Saouli  R, et  al. Fully automatic brain tumor convolutional neural network. Comput Biol Med. 2018; 95:43–54. 
segmentation with deep learning-based selective attention using 32. Li Q, Gao Z, Wang Q, et al. Glioma segmentation with a unified algorithm overlapping patches and multi-class weighted cross-entropy. Med in multimodal MRI images. IEEE Access. 2018; 6:9543–9553. Image Anal. 2020; 63:101692. 33. Grøvik E, Yi D, Iv M, et al. Deep learning enables automatic detection 53. Aboelenein  NM, Songhao  P, Koubaa  A, et  al. HTTU-Net: hybrid two and segmentation of brain metastases on multisequence MRI. J Magn track U-net for automatic brain tumor segmentation. IEEE Access. 2020; Reson Imaging. 2020; 51(1):175–182. 8:101406–101415. 34. Eltayeb EN, Salem NM, Al-Atabany W. Automated brain tumor segmen- 54. Hassen OA, Abter SO, Abdulhussein AA, et al. Nature-inspired level set tation from multi-slices FLAIR MRI images. BioMed Mater Eng. 2019; segmentation model for 3D-MRI brain tumor detection. CMC Comput 30(4):449–462. Mater Contin. 2021; 68(1):961–981. 35. Wang  G, Li  W, Ourselin  S, et  al. Automatic brain tumor segmentation 55. Kao  PY, Shailja  S, Jiang  J, et  al. Improving patch-based convolutional based on cascaded convolutional neural networks with uncertainty esti- neural networks for MRI brain tumor segmentation by leveraging loca- mation. Front Comput Neurosci. 2019; 13:56. tion information. Front Neurosci. 2020; 13:1449. 36. Li  H, Li  A, Wang  M. A novel end-to-end brain tumor segmentation 56. Debnath  S, Talukdar  FA, Islam  M. Combination of contrast enhanced method using improved fully convolutional networks. Comput Biol Med. fuzzy c-means (CEFCM) clustering and pixel based voxel mapping tech- 2019; 108:150–160. nique (PBVMT) for three dimensional brain tumour detection. J Ambient 37. Tong J, Zhao Y, Zhang P, et al. MRI brain tumor segmentation based on Intell Hum Comput. 2021; 12(2):2421–2433. texture features and kernel sparse coding. Biomed Signal Proc Control. 57. Baid  U, Talbar  S, Rane  S, et  al. A novel approach for fully automatic 2019; 47:387–392. 
intra-tumor segmentation with 3D U-Net architecture for gliomas. Front 38. Dogra J, Jain S, Sood M. Glioma extraction from MR images employing Comput Neurosci. 2020; 14:10. gradient based kernel selection graph cut technique. Vis Comput. 2020; 58. Mitchell JR, Kamnitsas K, Singleton KW, et al. Deep neural network to 36(5):875–891. locate and segment brain tumors outperformed the expert technicians 39. Sriramakrishnan  P, Kalaiselvi  T, Rajeswaran  R. Modified local ternary who created the training data. J Med Imaging. 2020; 7(5):055501. patterns technique for brain tumour segmentation and volume es- 59. Sran PK, Gupta S, Singh S. Integrating saliency with fuzzy thresholding timation from MRI multi-sequence scans with GPU CUDA machine. for brain tumor extraction in MR images. J Vis Commun Image Biocybern Biomed Eng. 2019; 39(2):470–487. Represent. 2021; 74:102964. Neuro-Oncology Advances Kouli et al. Automated brain tumor detection and segmentation 60. Takahashi S, Takahashi M, Kinoshita M, et al. Fine-tuning approach for 80. Çinar  A, Yildirim  M. Detection of tumors on brain MRI images using segmentation of gliomas in brain magnetic resonance images with a the hybrid convolutional neural network architecture. Med Hypotheses. machine learning method to normalize image differences among facil- 2020; 139:109684. ities. Cancers. 2021; 13(6):1415. 81. Devanathan B, Venkatachalapathy K. Brain tumor detection and classi- 61. Bahadure  NB, Ray  AK, Thethi  HP. Image analysis for MRI based brain fication model using optimal Kapur’s thresholding based segmentation tumor detection and feature extraction using biologically inspired BWT with deep neural networks. IIOABJ. 2020; 11:1–8. and SVM. Int J Biomed Imaging. 2017; 2017. 82. Gurunathan  A, Krishnan  B. Detection and diagnosis of brain tumors 62. Gupta  N, Khanna  P. A non-invasive and adaptive CAD system to de- using deep learning convolutional neural networks. 
Int J Imaging Syst tect brain tumor from T2-weighted MRIs using customized Otsu’s Technol. 2021; 31(3):1174–1184. thresholding with prominent features and supervised learning. Signal 83. Wang J, Shao W, Kim J. Automated classification for brain MRIs based Process Image Commun. 2017; 59:18–26. on 2D MF-DFA method. Fractals. 2020; 28(06):2050109. 63. Abd-Ellah MK, Awad AI, Khalaf AA, et al. Two-phase multi-model auto- 84. Kesav OH, Rajini GK. Automated detection system for texture feature matic brain tumour diagnosis system from magnetic resonance images based classification on different image datasets using S-transform. Int using convolutional neural networks. EURASIP J Image Video Process. J Speech Technol. 2021; 24(2):251–258. 2018; 2018(1):1–0. 85. Murali  E, Meena  K. Brain tumor detection from MRI using adaptive 64. Amin J, Sharif M, Raza M, et al. Brain tumor detection using statistical thresholding and histogram based techniques. Scalable Comput Pract and machine learning method. Comput Methods Programs Biomed. Exper. 2020; 21(1):3–10. 2019; 177:69–79. 86. Kaur  T, Gandhi  TK. Deep convolutional neural networks with transfer 65. Rai HM, Chatterjee K, Dashkevich S. Automatic and accurate abnormality learning for automated brain image classification. Mach Vis Appl. 2020; detection from brain MR images using a novel hybrid UnetResNext-50 deep 31(3):1–6. CNN model. Biomed Signal Proc Control. 2021; 66:102477. 87. Thangarajan SK, Chokkalingam A. Integration of optimized neural net- 66. Jayachandran A, Dhanasekaran R. Automatic detection of brain tumor work and convolutional neural network for automated brain tumor de- in magnetic resonance images using multi-texton histogram and support tection. Sensor Rev. 2021; 41(1):16–34. vector machine. Int J Imaging Syst Technol. 2013; 23(2):97–103. 88. Kalaiselvi T, Padmapriya T, Sriramakrishnan P, et al. Development of au- 67. Jayachandran  A, Dhanasekaran  R. 
Brain tumor severity analysis using tomatic glioma brain tumor detection system using deep convolutional modified multi-texton histogram and hybrid kernel SVM. Int J Imaging neural networks. Int J Imaging Syst Technol. 2020; 30(4):926–938. Syst Technol. 2014; 24(1):72–82. 89. Rajinikanth V, Joseph Raj AN, Thanaraj KP, et al. A customized VGG19 68. Dvořák  P, Kropatsch  WG, Bartušek  K. Automatic brain tumor detec- network with concatenation of deep and handcrafted features for brain tion in t2-weighted magnetic resonance images. Meas Sci Rev. 2013; tumor detection. Appl Sci. 2020; 10(10):3429. 13(5):223–230. 90. Huang Z, Xu H, Su S, et al. A computer-aided diagnosis system for brain 69. Amin J, Sharif M, Yasmin M, et al. A distinctive approach in brain tumor detec- magnetic resonance imaging images using a novel differential feature tion and classification using MRI. Pattern Recognit Lett. 2017; 139:118–127. neural network. Comput Biol Med. 2020; 121:103818. 70. Anitha  R, Siva  Sundhara  Raja  D. Development of computer-aided ap- 91. Chen B, Zhang L, Chen H, et al. A novel extended Kalman filter with support proach for brain tumor detection using random forest classifier. Int J vector machine based method for the automatic diagnosis and segmenta- Imaging Syst Technol. 2018; 28(1):48–53. tion of brain tumors. Comput Methods Programs Biomed. 2021; 200:105797. 71. Lahmiri  S. Glioma detection based on multi-fractal features of seg- 92. Patil DO, Hamde ST. Automated detection of brain tumor disease using mented brain MRI by particle swarm optimization techniques. Biomed empirical wavelet transform based LBP variants and ant-lion optimiza- Signal Proc Control. 2017; 31:148–155. tion. Multimedia Tools Appl. 2021; 80(12):17955–17982. 72. Deepa  AR, Emmanuel  WS. An efficient detection of brain tumor 93. Simaiya S, Lilhore UK, Prasad D, et al. MRI brain tumour detection & using fused feature adaptive firefly backpropagation neural network. 
image segmentation by hybrid hierarchical K-means clustering with Multimedia Tools Appl. 2019; 78(9):11799–11814. FCM based machine learning model. Ann Romanian Soc Cell Biol. 73. Selvapandian  A, Manivannan  K. Performance analysis of meningioma 2021; 28:88–94. brain tumor classifications based on gradient boosting classifier. Int J 94. Tejas P. A novel hybrid approach to detect brain tumor in MRI images. Imaging Syst Technol. 2018; 28(4):295–301. Turk J Comput Math Educ. 2021; 12(3):3412–3416. 74. Arunkumar N, Mohammed MA, et al. Fully automatic model-based seg- 95. El-Dahshan ES, Mohsen HM, Revett K, et al. Computer-aided diagnosis mentation and classification approach for MRI brain tumor using artifi - of human brain tumor through MRI: a survey and a new algorithm. cial neural networks. Concurrency Comput Pract Exp. 2020; 32(1):e4962. Expert Syst Appl. 2014; 41(11):5526–5545. 75. Edalati-rad  A, Mosleh  M. Improving brain tumor diagnosis using MRI 96. Liu  X, Faes  L, Kale  AU, et  al. A comparison of deep learning perfor- segmentation based on collaboration of beta mixture model and learning mance against health-care professionals in detecting diseases from automata. Arab J Sci Eng. 2019; 44(4):2945–2957. medical imaging: a systematic review and meta-analysis. Lancet Digit 76. Song G, Huang Z, Zhao Y, et al. A noninvasive system for the automatic Health. 2019; 1(6):e271–e297. detection of gliomas based on hybrid features and PSO-KSVM. IEEE 97. Ghaffari  M, Samarasinghe  G, Jameson  M, et  al. Automated post- Access. 2019; 7:13842–13855. operative brain tumour segmentation: a deep learning model based 77. Johnpeter JH, Ponnuchamy T. Computer aided automated detection and on transfer learning from pre-operative images. Magn Reson Imaging. classification of brain tumors using CANFIS classification method. Int J 2022; 86:28–36. Imaging Syst Technol. 2019; 29(4):431–438. 98. Durmo  F, Lätt  J, Rydelius  A, et  al. Brain tumor characterization using 78. 
Alam MS, Rahman MM, Hossain MA, et al. Automatic human brain tumor multibiometric evaluation of MRI. Tomography. 2018; 4(1):14–25. detection in MRI image using template-based K means and improved fuzzy 99. Shad  R, Cunningham  JP, Ashley  EA, et  al. Designing clinically trans- C means clustering algorithm. Big Data Cogn Comput. 2019; 3(2):27. latable artificial intelligence systems for high-dimensional medical im - 79. Atici MA, Sagiroglu S, Celtikci P, et al. A novel deep learning algorithm aging. Nat Mach Intell. 2021; 3(11):929–935. for the automatic detection of high-grade gliomas on T2-weighted mag- 100. Dos Santos DP, Brodehl S, Baeßler B, et al. Structured report data can netic resonance images: a preliminary machine learning study. Turk be used to develop deep learning algorithms: a proof of concept in Neurosurg. 2020; 30(2):199–205. ankle radiographs. Insights Imaging. 2019; 10(1):1–8. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Neuro-Oncology Advances Oxford University Press

Automated brain tumor identification using magnetic resonance imaging: A systematic review and meta-analysis


References (189)

Publisher
Oxford University Press
Copyright
© The Author(s) 2022. Published by Oxford University Press, the Society for Neuro-Oncology and the European Association of Neuro-Oncology.
eISSN
2632-2498
DOI
10.1093/noajnl/vdac081

Abstract

Background. Automated brain tumor identification facilitates diagnosis and treatment planning. We evaluate the performance of traditional machine learning (TML) and deep learning (DL) in brain tumor detection and segmentation, using MRI.

Methods. A systematic literature search from January 2000 to May 8, 2021 was conducted. Study quality was assessed using the Checklist for Artificial Intelligence in Medical Imaging (CLAIM). Detection meta-analysis was performed using a unified hierarchical model. Segmentation studies were evaluated using a random effects model. Sensitivity analysis was performed for externally validated studies.

Results. Of 224 studies included in the systematic review, 46 segmentation and 38 detection studies were eligible for meta-analysis. In detection, DL achieved a lower false positive rate compared to TML; 0.018 (95% CI, 0.011 to 0.028) and 0.048 (0.032 to 0.072) (P < .001), respectively. In segmentation, DL had a higher Dice similarity coefficient (DSC), particularly for tumor core (TC); 0.80 (0.77 to 0.83) and 0.63 (0.56 to 0.71) (P < .001), persisting on sensitivity analysis. Both manual and automated whole tumor (WT) segmentation had “good” (DSC ≥ 0.70) performance. Manual TC segmentation was superior to automated; 0.78 (0.69 to 0.86) and 0.64 (0.53 to 0.74) (P = .014), respectively. Only 30% of studies reported external validation.

Conclusions. The comparable performance of automated to manual WT segmentation supports its integration into clinical practice. However, manual outperformance for sub-compartmental segmentation highlights the need for further development of automated methods in this area. Compared to TML, DL provided superior performance for detection and sub-compartmental segmentation. Improvements in the quality and design of studies, including external validation, are required for the interpretability and generalizability of automated models.
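The two headline metrics in the abstract, the Dice similarity coefficient for segmentation and the false positive rate for detection, both reduce to simple counts. The following is a minimal Python sketch of those definitions only. The function names, the flattened toy masks, and the convention for two empty masks are illustrative assumptions; none of this is taken from the reviewed studies or from the authors' analysis code.

```python
def dice_similarity(pred, truth):
    """Dice similarity coefficient (DSC) between two flattened binary masks.

    DSC = 2 * |pred AND truth| / (|pred| + |truth|).
    """
    pred = [bool(p) for p in pred]
    truth = [bool(t) for t in truth]
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


def detection_rates(tp, fn, fp, tn):
    """Return (sensitivity, false positive rate) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)  # equals 1 - specificity
    return sensitivity, false_positive_rate


# Toy flattened masks: 3 predicted voxels, 4 true voxels, 2 shared.
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [0, 1, 1, 1, 1, 0]
dsc = dice_similarity(pred_mask, true_mask)  # 2 * 2 / (3 + 4)
is_good = dsc >= 0.70  # the "good" threshold used in the abstract
```

With the Lahmiri 2017 counts listed in Table 1 (TP = 20, FN = 0, FP = 1, TN = 29), detection_rates(20, 0, 1, 29) gives a sensitivity of 1.0 and a false positive rate of 1/30, which agrees with the tabulated specificity of 0.967.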
Key Points
• Human expertise outperformed automated methods in sub-compartmental segmentation.
• DL performed superiorly to TML for detection and sub-compartmental segmentation.
• Transparency and generalizability of models should be improved.

© The Author(s) 2022. Published by Oxford University Press, the Society for Neuro-Oncology and the European Association of Neuro-Oncology. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Kouli et al. Automated brain tumor detection and segmentation. Neuro-Oncology Advances.

Importance of the Study

Despite the increasing research on artificial intelligence techniques in medical imaging, their safe implementation into clinical practice depends on rigorous and generalizable evidence. This study systematically evaluated the performance of automated brain tumor detection and segmentation methods, and assessed the quality of reporting using the Checklist for Artificial Intelligence in Medical Imaging guideline. Although automated and manual methods in whole tumor segmentation performed comparably, manual methods performed better in sub-compartmental segmentation. Within automated methods, deep learning was found to be superior to traditional machine learning in detection and sub-compartmental segmentation, but explaining this was hindered by the paucity in reported methods of model interpretability. Less than a third of studies reported external validation of their automated method. The variability found in study reporting undermines the credibility of automated methods, impacting their benefit for patients and health systems. Hence, there is a need for adherence to international reporting standards and guidelines.

Brain tumors present a significant burden on healthcare worldwide due to the neurological deficits produced and subsequent poor prognosis, with an average 5-year survival of 35% in malignant subtypes. MRI is the gold standard modality engendering brain tumor diagnosis and subsequently informing surgical intervention, radiotherapy planning, and chemotherapy. Inevitably, qualitative MRI assessment has always been subject to high inter-rater variability, as well as being a notoriously laborious process. However, the emergence of Artificial Intelligence (AI) has sparked the hope of overcoming these limitations.

The advent of Computer-Aided Diagnosis (CAD) using AI can potentially improve brain tumor patient outcomes. Traditional machine learning (TML) techniques have become widely used for image classification but are restricted by a requirement for specifying “feature vectors” for extraction from the raw data. Conversely, deep learning (DL) techniques provide effective and automatic representation of complex image features, which has contributed to their increased popularity, but the interpretation of automatically identified features remains a problem. In addition, both TML and DL techniques are vulnerable to overfitting and selection bias. Therefore, to safely use CAD in clinical settings, large robust studies which evaluate their quality and generalizability are crucial. Holistic and standardized evaluation of scientific reporting is facilitated by established guidelines, such as the recently proposed Checklist for Artificial Intelligence in Medical Imaging (CLAIM).

The research on AI in neuro-oncology imaging has been amplified by the introduction of open access image datasets, such as the annual Multimodal Brain Tumor Segmentation Challenge (BRATS). This provides the ideal foundation for an in-depth review to identify optimal automated methods. Three former systematic reviews and meta-analyses evaluated performance of AI-related techniques in neuro-oncological imaging [8–10]. However, these focused on specific brain tumor types and whole tumor (WT) segmentation, and none have evaluated sub-compartmental segmentation nor addressed performance disparities between CAD and human expert segmentation. Moreover, there remains a paucity in comprehensively assessing the quality of studies in this field. We present the largest systematic review and meta-analysis that objectively evaluates performance of automated detection and segmentation techniques and assesses the reporting quality of included studies.

Materials and Methods

Search Strategy

This systematic review and meta-analysis were conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement (PROSPERO; CRD42021247925). We searched PubMed, Web of Science, and Scopus for studies published between January 1, 2000, and May 8, 2021. The search was initially performed on June 19, 2020 and updated on May 8, 2021. The search strategy is found in the Supplementary Appendix. The search was limited to publications written in English. The citations of included articles were hand-searched to identify additional appropriate articles.

Inclusion and Exclusion Criteria

Studies were included if they developed or validated a semi-automatic or fully automatic adult brain tumor detection or segmentation method using MRI. Exclusion criteria: (1) studies reporting tumor classification or tumor grading methods only; (2) studies utilizing MRI spectroscopy only for method development; (3) studies reporting methods on pediatric, pituitary, and/or brainstem tumors only; (4) abstracts or conference proceedings; and (5) no performance metrics reported.

Study Selection and Data Extraction

Extracted citations were imported into the Rayyan systematic review site (https://www.rayyan.ai) for study selection. Following removal of duplicates, titles and abstracts were screened, and full texts of relevant publications
reviewed. Study screening was completed by two independent reviewers (O.K., J.D.S.), with disagreements resolved through a consensus-based approach with the wider group.

Two independent reviewers (O.K., A.H.) extracted study characteristics from included studies with disagreements resolved through consensus. Data extracted included: (1) Author; (2) year; (3) dataset(s) utilized with the number of patients/images; (4) type of tumors(s) studied; (5) MRI modality; (6) performance evaluation metrics; (7) type of algorithm utilized; (8) feature extraction; (9) inference time in slice/second for segmentation; (10) user interaction (ie, automatic vs semi-automatic); and (11) validation technique(s).

Reporting and Quality Evaluation

The reporting quality of studies was assessed according to CLAIM. The risk of bias and applicability was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guideline, with consideration of some CLAIM items (see Supplementary Appendix). Three reviewers (O.K., A.H., D.B.) independently appraised included studies with any disagreements resolved through consensus. A “good” domain was deemed by its reporting in ≥70% of studies.

Definitions

DL was referred to studies that utilized deep neural networks as their method of choice. TML was referred to methods not classified as DL. Detection studies were defined as those that reported performance results for techniques that identified the presence of a tumor in an image. Segmentation studies were defined as those that reported performance results for techniques that segmented brain tumors, whether it was WT, tumor core (TC), and/or enhancing tumor (ET) segmentations as defined by BRATS. Following previous work, dice similarity coefficient (DSC) of ≥0.7 was considered to represent “good” overlap.

Statistical Analysis

A meta-analysis was conducted for both automated detection and segmentation studies to compare DL with TML methods and to evaluate the segmentation performance of CAD to that of manual experts. Studies providing performance metrics for their method on different datasets were assumed to be independent of each other. This is because we are interested in providing an overview of the two methods rather than exact point estimates.

For detection methods, contingency tables consisting of True Positive, False Positive, False Negative, and True Negative were constructed. For studies that did not directly provide contingency tables, missing data were calculated with Review Manager 5.3 (https://revman.cochrane.org/) using sensitivity, specificity, and number of images. If neither contingency tables nor sufficient data were reported for computation, then the study was excluded from meta-analysis. A unified hierarchical summary receiver operating characteristic model was developed for the detection meta-analysis. Summary estimates of sensitivity and specificity with 95% CIs were derived using the random-effects bivariate binomial model parameters and equivalence equations of Harbord et al. The reason for using the hierarchical model is that it considers the correlation between sensitivity and specificity, accounting for within-study variability, as well as variability (also called heterogeneity) in effects between studies (ie, between-study variability). Receiver operating characteristic (ROC) curves were used to plot summary estimates of sensitivity against false positive rate (FPR, ie, 1-specificity). The ROC curve plots also exhibit the uncertainty around the summary estimates via 95% confidence regions, and heterogeneity between accuracy estimates via 95% prediction regions.

Segmentation methods were evaluated using a random effects model, and reported in terms of pooled DSC, a universally used and reported metric. The restricted maximum likelihood estimator was used to calculate the heterogeneity variance (τ²). The inverse variance method was used to calculate a pooled effect size. Knapp-Hartung adjustments were used to calculate the confidence interval. A prerequisite for study inclusion in the meta-analysis was reporting outcome of interest (ie, DSC), in combination with an SD. Subgroup analysis comparing tumor types was performed where possible. A comparative analysis was conducted to evaluate the performance of CAD versus human experts. Sensitivity analysis was performed looking at studies that only performed out-of-sample external validation. Subgroup or sensitivity analysis was avoided when the number of studies in a group is small (n < 5). Study heterogeneity was formally evaluated using Higgins’ inconsistency index (I²) (I² > 50% = significant heterogeneity). All analyses were performed in R (version 4.0.2, http://www.r-project.org/) using the tidyverse, metaDTA, dmetar, meta, and ComplexUpset packages.

Results

Our search identified 2367 records, of which 1515 records were screened (Figure 1). An additional 22 texts were identified through cross-referencing. Two-hundred and sixty-two full texts were assessed for eligibility and 224 were included in the systematic review: 188 segmentation and 46 detection studies (10 studies reported both detection and segmentation results; see “Eligible Studies” in Supplementary Appendix). Forty-six segmentation [15–60] and 38 detection [39,42,45,61–95] studies were eligible for meta-analysis.

Figure 1. Study selection flow diagram (*10 studies reported both detection and segmentation results). Records identified: Medline/PubMed (n = 692), Web of Science (n = 814), Scopus (n = 861); additional records identified from citation searching (n = 22). Duplicated records removed (n = 852). Records screened based on title and abstract (n = 1515); records excluded (n = 1231). Full text articles assessed (n = 262); full text articles excluded (n = 38): abstracts or conference proceedings (n = 17), tumour grading/classification method (n = 9), unclear/no performance metrics (n = 7), method not applied to brain tumours (n = 3), method applied on paediatric tumours only (n = 1), method not applied on MRI scans (n = 1). Total articles included in the review (n = 224): detection articles (n = 46)*, segmentation articles (n = 188)*. Articles included in meta-analysis: detection (n = 38), segmentation (n = 46).

Study Characteristics

Study characteristics are shown in Supplementary Table 1 (segmentation) and Supplementary Table 2 (detection). 40.6% (n = 95) of studies used DL and 59.4% (n = 139) used TML methods. There was a clear increase in the use of DL from 2018 (Supplementary Figure 1). Most studies utilized a fully automated algorithm (n = 222; 94.9%).

80.7% (n = 189) used data from open-access repositories, with BRATS being the most popular of them (n = 156; 66.7%). 29.0% (n = 68) used local datasets, all of which were retrospectively collected data. 11.9% (n = 28) used both local and public datasets. 2.1% (n = 5) did not specify dataset(s) used (Supplementary Figure 2). Publicly available datasets are detailed in Supplementary Table 3.

The most studied tumors were high-grade gliomas (HGG) (n = 173; 73.9%) and low-grade gliomas (LGG) (n = 171; 73.1%), with 59.0% (n = 138) of studies involving both (Supplementary Figure 3). 9.8% (n = 23) did not report the type of tumor studied. Regarding MRI sequences, T2 (n = 169; 72.2%), fluid-attenuated inversion recovery (FLAIR) (n = 165; 70.5%), T1-contrast enhanced (T1CE) (n = 164; 70.1%), and T1 (n = 143; 61.1%) modalities were the most studied (Supplementary Figure 4). 48.3% (n = 113) of studies combined all these sequences, and 20.1% (n = 47) used just one for the algorithm development. A small minority (n = 19, 8.1%) did not report the type of MRI used.

The most common metrics used for evaluating performance were DSC (n = 168; 71.8%) and sensitivity (n = 168; 71.8%) (Supplementary Figure 5). 55.1% (n = 129) of studies reported internal validation. 31.2% (n = 73) used random split validation and 32.1% (n = 75) used resampling methods (Supplementary Figure 6). Overall, less than a third of studies (n = 70; 30%) performed external validation. Specifically, 49.5% (n = 47/95) of DL and 16.5% (n = 23/139) of TML studies reported external validation (Supplementary Figure 7). Details of algorithm performance and validation techniques of studies are found in Supplementary Table 4 (segmentation) and Supplementary Table 5 (detection). Regarding segmentation inference time, DL methods performed the fastest (median: 0.2 s/MRI slice, interquartile range [IQR]: 0.1–0.9), whereas fully automated TML methods achieved a median of 2.6 s (IQR: 1.1–12.6) and semi-automated techniques achieved 48.16 s (IQR: 6.2–134.9) (P < .001; Kruskal-Wallis test) (Supplementary Figure 5).

Reporting Quality

Detailed CLAIM assessment is presented in Supplementary Table 6 (segmentation) and Supplementary Table 7 (detection).
With respect to “good” reported CLAIM items, 95.3% (n = 223) stated the source of the data (CLAIM item 7) and 86.8% (n = 203) clearly reported how ground truths were derived (CLAIM items 14–18). Almost all studies reported detailed model structure and initialization of parameters (CLAIM items 22–24). 83.8% (n = 196) clearly reported training procedures and hyperparameters in sufficient detail (CLAIM item 25) (Supplementary Figure 9).

However, only 1.3% (n = 3) of studies clarified missing data handling. No studies reported sample size calculations (CLAIM item 19). Less than two-thirds (n = 144, 61.5%) specified how data was partitioned (CLAIM item 20). Only 32.5% (n = 76) of studies reported uncertainty around performance metrics (CLAIM item 29). 67.1% (n = 157) studies reported performing internal and/or external validation (CLAIM item 32). Just 2.6% (n = 6) specified inclusion and exclusion flow of participants or images (CLAIM item 33) and only 6% (n = 14) defined demographics and clinical characteristics of cases in each partition (CLAIM item 34). Ten studies made the algorithm source code publicly available (CLAIM item 41; for available links to source codes see Supplementary Table 8).

Risk of bias and applicability assessment. Detailed QUADAS-2 assessment is presented in Supplementary Table 9 (segmentation) and Supplementary Table 10 (detection). In the patient selection domain of risk of bias, 21.4% (n = 50) studies were considered to have unclear or high risk of bias as they did not express the exclusion criteria in the utilized dataset(s). In the reference standard domain, 13.2% (n = 31) were deemed to have unclear or high risk of bias as they did not clearly define how the ground truth segmentation was derived. In terms of applicability, the main source of concern was in the index test domain; 31.6% (n = 74) had high applicability concerns as they did not validate the algorithm (Supplementary Figure 10).

Meta-analysis

Detection meta-analysis. Thirty-eight detection studies provided sufficient data to construct contingency tables (69 tables). Only one study performed an external validation. 28.9% (n = 11; 20 tables) of studies utilized DL methods and the remaining 71.1% (n = 27; 49 tables) utilized TML methods (Table 1).

Overall, the pooled sensitivity was 0.98 (95% CI, 0.97 to 0.99) and the FPR was 0.035 (95% CI, 0.025 to 0.048). DL and TML had comparable sensitivity, but DL achieved a lower FPR compared to TML; 0.018 (95% CI, 0.011 to 0.028) and 0.048 (95% CI, 0.032 to 0.072) (P < .001), respectively (Figures 2A and 2B).

Segmentation meta-analysis. Due to limited numbers of semi-automated studies, segmentation meta-analysis solely focused on fully automated methods. Forty-six fully automated segmentation studies provided sufficient data to be included in the meta-analysis. 34.8% (n = 16) of studies utilized DL and 65.2% (n = 30) utilized TML methods. Less than half (n = 19; 41.3%) of studies performed external validation. 97.8% (n = 45) of studies provided segmentation results for WT, 41.3% (n = 19) for TC and 39.1% (n = 18) for ET.

Overall, a DSC of 0.84 (95% CI, 0.82 to 0.87; I² = 99.99%) for WT, 0.72 (95% CI, 0.67 to 0.76; I² = 99.99%) for TC, and 0.73 (95% CI, 0.69 to 0.76; I² = 99.99%) for ET (Figure 3A; Supplementary Table 11) were achieved. This persisted on sensitivity analysis of externally validated studies; a DSC of 0.85 (95% CI, 0.82 to 0.87; I² = 99.97%) was achieved for WT and 0.76 (95% CI, 0.70 to 0.80; I² = 99.96%) for TC (Figure 3A; Supplementary Table 12).

TML versus DL segmentation meta-analysis. DL was comparable to TML for WT segmentation, 0.86 (95% CI, 0.84 to 0.88; I² = 99.99%) and 0.83 (95% CI, 0.80 to 0.87; I² = 99.99%; P = .21), respectively (Figure 3A; Supplementary Table 11). This was relatively consistent on sensitivity analysis; 0.87 (95% CI, 0.85 to 0.88; I² = 100%) and 0.81 (95% CI, 0.73 to 0.89; I² = 99.94%; P = .10), respectively (Figure 3A; Supplementary Table 12).

In terms of TC segmentation, DL achieved a statistically significant higher DSC compared to TML, 0.80 (95% CI, 0.77 to 0.83; I² = 99.97%) and 0.63 (95% CI, 0.56 to 0.71; I² = 100%; P < .001), respectively. This remained unchanged on sensitivity analysis; 0.80 (95% CI, 0.77 to 0.83; I² = 99.97%) and 0.64 (95% CI, 0.49 to 0.79; I² = 99.87%; P = .009), respectively.

Finally, for ET segmentation, DL methods achieved higher DSC when compared to TML, 0.75 (95% CI, 0.72 to 0.78; I² = 99.91%) and 0.69 (95% CI, 0.59 to 0.78; I² = 100%), respectively. However, this did not reach statistical significance (P = .17).

Subgroup analysis by tumor type. Most studies applied their segmentation method on gliomas (HGG: 91.3%, n = 42/46; LGG: 84.78%, n = 39/46), 10.87% (n = 5/46) on metastatic brain tumors, 4.35% (n = 2/46) on meningiomas, and 1.79% (n = 1/46) on nerve sheath tumors. 58.69% of studies (n = 27/46) sufficiently categorized their segmentation results by tumor type required for subgroup analysis (Supplementary Table 13).

Since few studies applied their segmentation techniques to meningiomas and nerve sheath tumors, they could not be included in subgroup analyses. The subgroup analysis thus compared HGG, LGG, and metastatic brain tumors. Only WT segmentation results for metastatic brain tumors were possible to compute due to limited studies. ET segmentation was predominantly performed on HGG, thereby excluding it from subgroup analysis. It was not possible to compare DL and TML methods in diagnosing different types of tumors due to the small number of studies.

For WT segmentation, no difference was observed between HGG, LGG, and metastatic tumors, 0.83 (95% CI, 0.79 to 0.86; I² = 99.99%), 0.80 (95% CI, 0.74 to 0.86; I² = 99.98%) and 0.80 (95% CI, 0.74 to 0.86; I² = 99.95%; P = .64), respectively (Figure 3B; Supplementary Table 13). For TC segmentation, a higher DSC was achieved for HGG compared to LGG, 0.67 (95% CI, 0.60 to 0.74; I² = 99.97%) and 0.49 (95% CI, 0.37 to 0.61; I² = 99.98%; P = .0027), respectively.

Automated versus human expert segmentation. Only 30.4% (n = 14/46) of studies provided sufficient data for comparison between automated and expert manual segmentation for WT and TC segmentation. All studies included multiple (>1) independent expert operators for generating ground truth segmentations; one study (7.1%) utilized two operators and 13 (92.9%) utilized four operators as part of the BRATS challenge (Supplementary Table 14).

Table 1. Detection Meta-analysis Results

| Method | Author | Year | TP | FN | FP | TN | Total | Sensitivity | Specificity | Weighted Specificity | Weighted Sensitivity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DL | Çinar and Yildirim | 2020 | 147 | 8 | 0 | 98 | 253 | 0.948 | 1 | 5.129 | 4.467 |
| DL | Devanathan and Venkatachalapathy | 2020 | 153 | 2 | 3 | 95 | 253 | 0.987 | 0.969 | 4.241 | 4.708 |
| DL | Gurunathan and Krishnan | 2020 | 514 | 25 | 21 | 1752 | 2312 | 0.954 | 0.988 | 6.356 | 5.694 |
| DL | Rai et al. | 2021 | 1232 | 141 | 135 | 2421 | 3929 | 0.897 | 0.947 | 6.417 | 6.582 |
| DL | Abd-Ellah et al. | 2018 | 239 | 1 | 0 | 109 | 349 | 0.996 | 1 | 2.887 | 3.571 |
| DL | Atici et al. | 2019 | 1082 | 220 | 110 | 2171 | 3583 | 0.831 | 0.952 | 6.424 | 6.567 |
| DL | Kaur and Ghandi | 2020 | 20 | 0 | 0 | 30 | 50 | 1 | 1 | 2.003 | 0.991 |
| DL | Kaur and Ghandi | 2020 | 52 | 0 | 0 | 22 | 74 | 1 | 1 | 1.343 | 1.889 |
| DL | Kaur and Ghandi | 2020 | 140 | 0 | 0 | 20 | 160 | 1 | 1 | 0.877 | 2.958 |
| DL | Kaur and Ghandi | 2020 | 238 | 12 | 18 | 238 | 506 | 0.952 | 0.93 | 5.983 | 6.155 |
| DL | Thangarajan and Chokkalingam | 2020 | 159 | 10 | 7 | 93 | 269 | 0.941 | 0.93 | 5.501 | 5.835 |
| DL | Kalaiselvi et al. | 2020 | 56 | 7 | 17 | 201 | 281 | 0.889 | 0.922 | 6.145 | 5.163 |
| DL | Rajinikanth et al. | 2020 | 388 | 12 | 11 | 589 | 1000 | 0.97 | 0.982 | 6.092 | 5.725 |
| DL | Rajinikanth et al. | 2020 | 387 | 13 | 9 | 591 | 1000 | 0.968 | 0.985 | 6.105 | 5.611 |
| DL | Rajinikanth et al. | 2020 | 392 | 8 | 8 | 592 | 1000 | 0.98 | 0.987 | 5.948 | 5.45 |
| DL | Rajinikanth et al. | 2020 | 395 | 5 | 4 | 596 | 1000 | 0.988 | 0.993 | 5.692 | 4.845 |
| DL | Rajinikanth et al. | 2020 | 393 | 7 | 9 | 391 | 800 | 0.983 | 0.978 | 5.727 | 5.771 |
| DL | Rajinikanth et al. | 2020 | 395 | 5 | 6 | 194 | 600 | 0.988 | 0.97 | 4.967 | 5.814 |
| DL | Huang et al. | 2020 | 4244 | 52 | 34 | 6348 | 10678 | 0.988 | 0.995 | 6.36 | 6.373 |
| DL | Huang et al. | 2020 | 397 | 6 | 12 | 480 | 895 | 0.985 | 0.976 | 5.805 | 5.83 |
| TML | Jayachandran and Dhanasekaran | 2013 | 10 | 0 | 1 | 4 | 15 | 1 | 0.8 | 0.424 | 1.387 |
| TML | Deepa and Emmanuel | 2018 | 68 | 2 | 0 | 11 | 81 | 0.971 | 1 | 1.517 | 1.927 |
| TML | Selvapandian and Manivannan | 2018 | 47 | 3 | 3 | 72 | 125 | 0.94 | 0.96 | 3.22 | 1.897 |
| TML | Chen et al. | 2021 | 238 | 9 | 3 | 54 | 304 | 0.964 | 0.947 | 2.98 | 2.475 |
| TML | Edalati-rad and Mosleh | 2019 | 42 | 0 | 1 | 36 | 79 | 1 | 0.973 | 1.442 | 1.71 |
| TML | Dahshan et al. | 2014 | 87 | 0 | 1 | 13 | 101 | 1 | 0.929 | 0.534 | 2.283 |
| TML | Song et al. | 2019 | 242 | 7 | 1 | 56 | 306 | 0.972 | 0.982 | 2.853 | 2.346 |
| TML | Johnpeter and Ponnuchamy | 2019 | 58 | 2 | 2 | 98 | 160 | 0.967 | 0.98 | 3.142 | 1.729 |
| TML | Amin et al. | 2017 | 42 | 4 | 0 | 39 | 85 | 0.913 | 1 | 3.12 | 1.362 |
| TML | Amin et al. | 2017 | 60 | 5 | 0 | 35 | 100 | 0.923 | 1 | 3.045 | 1.619 |
| TML | Amin et al. | 2019 | 70 | 0 | 2 | 14 | 86 | 1 | 0.875 | 0.64 | 2.333 |
| TML | Amin et al. | 2019 | 61 | 1 | 7 | 17 | 86 | 0.984 | 0.708 | 1.725 | 2.463 |
| TML | Amin et al. | 2019 | 68 | 6 | 2 | 10 | 86 | 0.919 | 0.833 | 2.433 | 2.375 |
| TML | Amin et al. | 2019 | 69 | 0 | 5 | 12 | 86 | 1 | 0.706 | 0.646 | 2.473 |
| TML | Amin et al. | 2019 | 69 | 0 | 4 | 15 | 88 | 1 | 0.789 | 0.72 | 2.432 |
| TML | Amin et al. | 2019 | 72 | 0 | 4 | 10 | 86 | 1 | 0.714 | 0.543 | 2.469 |
| TML | Amin et al. | 2019 | 290 | 0 | 10 | 106 | 406 | 1 | 0.914 | 1.481 | 2.544 |
| TML | Amin et al. | 2019 | 296 | 0 | 5 | 105 | 406 | 1 | 0.955 | 1.445 | 2.487 |
| TML | Amin et al. | 2019 | 301 | 50 | 5 | 50 | 406 | 0.858 | 0.909 | 3.363 | 2.558 |
| TML | Amin et al. | 2019 | 296 | 0 | 10 | 100 | 406 | 1 | 0.909 | 1.422 | 2.55 |
| TML | Amin et al. | 2019 | 306 | 11 | 0 | 89 | 406 | 0.965 | 1 | 3.151 | 2.187 |
| TML | Amin et al. | 2019 | 295 | 10 | 1 | 100 | 406 | 0.967 | 0.99 | 3.175 | 2.298 |
| TML | Amin et al. | 2019 | 70 | 4 | 1 | 11 | 86 | 0.946 | 0.917 | 2.117 | 2.236 |
| TML | Amin et al. | 2019 | 70 | 0 | 2 | 14 | 86 | 1 | 0.875 | 0.64 | 2.333 |
| TML | Amin et al. | 2019 | 74 | 6 | 0 | 6 | 86 | 0.925 | 1 | 1.798 | 2.045 |
| TML | Amin et al. | 2019 | 71 | 0 | 3 | 12 | 86 | 1 | 0.8 | 0.59 | 2.419 |
| TML | Amin et al. | 2019 | 74 | 0 | 0 | 12 | 86 | 1 | 1 | 0.518 | 1.969 |
| TML | Amin et al. | 2019 | 70 | 3 | 0 | 13 | 86 | 0.959 | 1 | 1.965 | 1.916 |
| TML | Jayachandran and Dhanasekaran | 2012 | 4 | 1 | 0 | 5 | 10 | 0.8 | 1 | 1.719 | 0.482 |
| TML | Wang et al. | 2020 | 25 | 0 | 1 | 24 | 50 | 1 | 0.96 | 1.271 | 1.525 |
| TML | Kesav and Rajini | 2020 | 43 | 1 | 1 | 21 | 66 | 0.977 | 0.955 | 1.91 | 1.891 |
| TML | Alam et al. | 2019 | 38 | 1 | 0 | 1 | 40 | 0.974 | 1 | 0.193 | 1.806 |
| TML | Murali and Meena | 2020 | 182 | 5 | 0 | 25 | 212 | 0.973 | 1 | 2.234 | 2.225 |
| TML | Arunkumar et al. | 2018 | 20 | 0 | 1 | 19 | 40 | 1 | 0.95 | 1.146 | 1.457 |
| TML | Gupta and Khanna | 2017 | 600 | 0 | 13 | 488 | 1101 | 1 | 0.974 | 2.358 | 2.511 |
| TML | Gupta and Khanna | 2017 | 320 | 0 | 12 | 269 | 601 | 1 | 0.957 | 2.221 | 2.489 |
| TML | Bahadure et al. | 2015 | 128 | 3 | 4 | 65 | 200 | 0.977 | 0.942 | 2.835 | 2.371 |
| TML | Dvorák et al. | 2013 | 63 | 9 | 9 | 122 | 203 | 0.875 | 0.931 | 3.453 | 2.246 |
| TML | Sriramakrishnan et al. | 2019 | 4441 | 70 | 60 | 78 | 4649 | 0.984 | 0.565 | 3.02 | 2.63 |
| TML | Patil and Hamde | 2021 | 50 | 0 | 0 | 44 | 94 | 1 | 1 | 1.558 | 1.421 |
| TML | Patil and Hamde | 2021 | 50 | 0 | 0 | 44 | 94 | 1 | 1 | 1.558 | 1.421 |
| TML | Anitha and Raja | 2017 | 14 | 1 | 2 | 83 | 100 | 0.933 | 0.976 | 3.199 | 0.92 |
| TML | Anitha and Raja | 2017 | 14 | 1 | 1 | 59 | 75 | 0.933 | 0.983 | 3.075 | 0.848 |
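The sensitivity and specificity columns of Table 1 follow directly from the four counts in each row. The sketch below is illustrative only and is not the authors' R or Review Manager workflow: the function names are hypothetical, and the back-calculation assumes the numbers of tumor and non-tumor images are known, which the source does not state in this form.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and false positive rate from a 2x2 table."""
    positives = tp + fn  # images that truly contain a tumor
    negatives = fp + tn  # images that truly do not
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one tumor and one non-tumor image")
    sensitivity = tp / positives
    specificity = tn / negatives
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "fpr": 1.0 - specificity,  # FPR is defined as 1 - specificity
    }


def reconstruct_counts(sensitivity, specificity, n_positive, n_negative):
    """Back-calculate (TP, FN, FP, TN) from reported rates.

    Assumption: the split into tumor / non-tumor images is known.
    Rounding is needed because published rates are themselves rounded.
    """
    tp = round(sensitivity * n_positive)
    tn = round(specificity * n_negative)
    return tp, n_positive - tp, n_negative - tn, tn


# First row of Table 1 (Çinar and Yildirim, 2020): TP=147, FN=8, FP=0, TN=98
row = diagnostic_metrics(tp=147, fn=8, fp=0, tn=98)
print(round(row["sensitivity"], 3), row["specificity"], row["fpr"])  # 0.948 1.0 0.0
print(reconstruct_counts(0.948, 1.0, n_positive=155, n_negative=98))  # (147, 8, 0, 98)
```

Running the same function over every row gives the per-table inputs that a hierarchical model would then pool; the pooling itself is not shown here.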
For WT segmentation, both achieved “good” performance; the DSC was numerically higher in the manual group than in the automated group, 0.86 (95% CI, 0.85 to 0.86; I² = 99.90%) and 0.80 (95% CI, 0.73 to 0.87; I² = 99.98%), respectively, but the difference was not statistically significant (P = .11) (Figure 3C; Supplementary Table 14). However, for TC segmentation, manual segmentation outperformed automated segmentation, 0.78 (95% CI, 0.69 to 0.86; I² = 99.94%) and 0.64 (95% CI, 0.53 to 0.74; I² = 99.98%; P = .014), respectively.

For HGG tumors, manual segmentation outperformed automated segmentation, 0.88 (95% CI, 0.87 to 0.88; I² = 28.85%) and 0.81 (95% CI, 0.74 to 0.87; I² = 99.98%; P = .015), respectively. Conversely, manual was comparable to automated segmentation for LGG; 0.84 (95% CI, 0.83 to 0.85; I² = 95.68%) and 0.79 (95% CI, 0.68 to 0.90; I² = 99.96%; P = .33), respectively.

Discussion

To date, this is the largest meta-analysis evaluating automated brain tumor segmentation and detection methods. Automation provides benefits including elimination of human inter-rater variability and reduced inference time, particularly for DL methods, which showed an impressive median inference time of 0.2 seconds/MRI slice.

Previous studies have concluded that, in general, automated methods are comparable to human expertise in terms of performance [10,96]. However, our research highlights that this only holds true for WT segmentation in brain tumors. Notably, we found that manual methods outperformed automated techniques for TC segmentation. Sub-compartmental segmentation, including TC, is a major influence on tumor progression monitoring and radiotherapy planning. Hence, our finding cautions against applying machine learning to all of its potential uses in routine clinical practice and highlights the need for further research on sub-compartmental automated segmentation (TC and ET).
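To make the gap between whole tumor and sub-compartmental performance concrete, the DSC can be computed per region from a pair of label maps. The following is a minimal sketch and not any included study's implementation. It assumes the label convention used in several BRATS editions (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor), flattens images to plain lists, and scores two empty masks as 1.0 by convention; the names `BRATS_REGIONS`, `dice`, and `region_dice` are hypothetical.

```python
# Assumed BRATS-style label groupings for the three evaluated regions.
BRATS_REGIONS = {
    "WT": {1, 2, 4},  # whole tumor: every tumor label
    "TC": {1, 4},     # tumor core: excludes edema
    "ET": {4},        # enhancing tumor only
}


def dice(pred_mask, true_mask):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * overlap / total


def region_dice(pred_labels, true_labels):
    """DSC for WT, TC and ET from flattened voxel label lists."""
    scores = {}
    for name, labels in BRATS_REGIONS.items():
        pred_mask = [v in labels for v in pred_labels]
        true_mask = [v in labels for v in true_labels]
        scores[name] = dice(pred_mask, true_mask)
    return scores


# Toy example: the outline of the tumor is right, the inner labels are not.
truth = [0, 2, 2, 1, 4, 4, 0, 0]
pred = [0, 2, 2, 2, 4, 1, 0, 0]
print(region_dice(pred, truth))  # WT = 1.0, TC = 0.8, ET ≈ 0.667
```

In the toy example the predicted outline matches the reference exactly, so WT is perfect, while mislabeled voxels inside the tumor pull TC below WT and ET below the 0.7 threshold for “good” overlap. This is the pattern the meta-analysis reports at scale.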
Table 1. Continued

| Method | Author | Year | TP | FN | FP | TN | Total | Sensitivity | Specificity | Weighted Specificity | Weighted Sensitivity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TML | Kebir et al. | 2019 | 961 | 332 | 805 | 1630 | 3728 | 0.743 | 0.669 | 3.547 | 2.625 |
| TML | Lahmiri | 2017 | 20 | 0 | 1 | 29 | 50 | 1 | 0.967 | 1.496 | 1.311 |
| TML | Simaiya et al. | 2017 | 504 | 63 | 111 | 843 | 1521 | 0.889 | 0.884 | 3.536 | 2.598 |
| TML | Kalaiselvi et al. | 2019 | 1683 | 73 | 154 | 2740 | 4650 | 0.958 | 0.947 | 3.539 | 2.61 |
| TML | Kalaiselvi et al. | 2019 | 63 | 0 | 7 | 211 | 281 | 1 | 0.968 | 2.734 | 1.917 |
| TML | Tejas P and Padma | 2021 | 74 | 6 | 0 | 20 | 100 | 0.925 | 1 | 2.749 | 1.875 |

DL, Deep Learning; FN, False Negative; FP, False Positive; TML, Traditional Machine Learning; TN, True Negative; TP, True Positive.

Since most methods used conventional MRI scans (ie, T1, T2, T1CE, and FLAIR), future studies could combine these multimodal sequences with other specialized MRI sequences to increase the number of features, assessing for potential enhanced segmentation results. Soltaninejad et al. [30] and Durmo et al. [98] incorporated features obtained from diffusion-weighted and diffusion tensor imaging and showed promising results in the automated identification of brain tumors. Including other MRI sequences in publicly available datasets, such as BRATS, could facilitate investigations into the diagnostic value of additional features.

Regarding automated detection, we have replicated the findings of Cho et al.’s systematic review on brain tumor metastasis; DL had a significantly lower FPR than TML, whilst sensitivity between the two methods remained similar. To the best of our knowledge, there has been no previous evaluation of automated sub-compartmental segmentation of brain tumors. Our study extends confidence in DL to tumor segmentation; the DL group achieved
“good” (DSC ≥ 0.7) performance for all segmentation types (WT, TC, ET), whereas for TML, “good” performance was limited to WT segmentation. This trend persisted with sensitivity analysis investigating only externally validated studies, reinforcing these results. DL techniques support the automatic identification of complex features unlike TML, which requires hand-crafted feature vectors. However, the advantages of DL remain ambiguous, due to its “black box” nature; the interpretability of learned features and the explainability of the model’s decisions could be improved [3,4]. Certain methods, such as saliency maps or feature attribution, attempt to deduce how these learning algorithms detect complex features. However, just 2.1% (n = 5) of studies reported such methods, hindering model interpretation. This highlights the importance of future work reporting DL interpretation to improve comprehension and transparency of algorithmic predictions.

Figure 2. Hierarchical summary receiver operating characteristic (HSROC) curves of (A) deep learning (DL; 20 tables) and (B) traditional machine learning (TML; 49 tables) studies included in detection meta-analysis. Axes: sensitivity against false positive rate (1 - specificity). Each panel shows the HSROC curve, the summary estimate, the 95% confidence region, the 95% predictive region, and the study data. Summary estimates: (A) sensitivity 97.4% (95% CI 95.8–98.4), false positive rate 1.8% (95% CI 1.1–2.8); (B) sensitivity 98.6% (95% CI 97.4–99.2), false positive rate 4.8% (95% CI 3.2–7.2).

Van Kempen et al. reported good performance of machine learning algorithms for glioma WT segmentation, also showing that automated segmentation for both HGG and LGG were comparable. Our subgroup analysis, stratified by tumor type, showed “good” performance, and no statistically significant difference between tumor types for WT segmentation. However, this was not consistent for TC segmentation; both HGG and LGG tumors did not reach “good” performance as was evident for WT. This is clinically pertinent, because of the aforementioned value of reliable automated sub-compartmental segmentation in treatment pathways. HGG TC segmentation performance was found to be significantly better than LGG. This may be due to LGG’s slow growth, lack of surrounding vasogenic edema, and poor enhancement on MRI, making LGGs radiologically more difficult to identify. Moreover, HGGs are highly proliferative tumors resulting in higher lesion contrast and enhancement, making them radiologically more noticeable. This study shows that although manual WT segmentation statistically outperformed automated segmentation for HGG, both achieved “good” performance (DSC ≥ 0.7). On the other hand, for LGG tumors, manual and automated segmentation were statistically comparable in terms of performance; however, only manual segmentation achieved “good” performance. This could be because LGGs can simply conform to normal anatomy (eg, expanding gyri), making them difficult to diagnose, especially when small. This further highlights the need for future work on improving machine learning performance to segment LGG more accurately to achieve comparable results to that of manual segmentation.

Reporting guidelines reinforce robust evaluation and generalizability of diagnostic models. The recent CLAIM checklist, developed on the foundations of earlier well-established guidelines, is the first to address AI applications in medical imaging. This is the first study to adopt this pertinent guideline for the comprehensive assessment of reporting quality for brain tumor identification. Although over 70% of studies detailed data sources, model design, and ground truth definitions, only a minority reported missing data handling, data partitioning, study participant flow, and external validation. This is consistent with Yusuf et al.’s systematic review which found poor reporting of the study participant flow, the distribution of disease severity, and model validation techniques within ML-based diagnosis models. Such findings reiterate the necessity for studies to employ guidelines to aid their interpretation and reusability. This is paramount in ensuring reliable research is the basis of pioneering novel techniques into clinical practice.

The absence of external validation jeopardizes the generalizability of models for clinical use. Our study highlights such a limitation, with only 41.3% (n = 19/46) of segmentation and 2.6% (n = 1/38) of detection studies in the meta-analysis undertaking external validation.

Figure 3. Segmentation meta-analysis for (A) all studies and externally validated only studies, stratified by deep learning (DL) and traditional machine learning (TML), (B) subgroup segmentation meta-analysis by tumor type (high-grade glioma [HGG], low-grade glioma [LGG], and metastatic brain tumor [MET]), and (C) automated versus human segmentation. Each panel plots pooled DSC (0.0 to 1.0) by segmentation type (WT, TC, ET).
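The pooled DSC values and I² statistics reported for the segmentation meta-analysis come from an inverse-variance random-effects model. The sketch below illustrates that idea under simplifying assumptions and does not reproduce the published analysis: it swaps in the closed-form DerSimonian-Laird estimator of τ² in place of the restricted maximum likelihood estimator, and a normal-approximation 95% CI in place of the Knapp-Hartung adjustment. The function name and the inputs are hypothetical.

```python
import math


def pool_random_effects(effects, std_errors):
    """Inverse-variance random-effects pooling (DerSimonian-Laird tau^2).

    effects    : per-study mean DSC
    std_errors : per-study standard error of the mean (eg, SD / sqrt(n))
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    # Fixed-effect weights and estimate, needed for Cochran's Q.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance (truncated at zero) and Higgins' I^2 in percent.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "pooled": pooled,
        "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled),
        "tau2": tau2,
        "i2": i2,
    }


# Two precise but discordant studies: heterogeneity dominates.
result = pool_random_effects([0.90, 0.60], [0.01, 0.01])
print(round(result["pooled"], 2), round(result["i2"], 1))  # 0.75 99.8
```

The example also shows why I² values close to 100% are unsurprising in this setting: when each study reports a DSC with a very small standard error, even moderate differences between studies are large relative to within-study variance.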
To address this, we performed a sensitivity analysis on segmentation models that were externally validated, which showed similar results to the original analysis. To ensure that future studies externally validate their machine learning algorithms, authors should utilize the CLAIM guideline when reporting their study. In addition, journals should encourage authors to provide details about the elements of reporting outlined in CLAIM for editors and reviewers during the assessment of AI-related manuscripts in medical imaging. Secondly, high heterogeneity was observed, which may be due to methodological diversity in machine learning techniques. Thirdly, only a quarter of included studies were eligible for meta-analysis because of inadequate reporting, particularly of the uncertainty values of performance metrics, thus compromising data availability. This issue has been recognized by non-neuro-oncology systematic reviews. Fourthly, most studies failed to report manual segmentation results, impeding a direct comparison of the techniques. To promote standardization of ground-truth images for training AI algorithms, experts should utilize structured reporting during manual segmentation [100]. Finally, most studies trained and tested their algorithms on open-access datasets. We propose that available automated algorithms be applied to prospective, routinely collected MRI data to assess performance and feasibility for use in daily clinical practice.

To conclude, we found promising results for the use of AI algorithms in brain tumor identification and highlight the areas for future research. Further improvements to study design are needed, with adherence to reporting guidelines, which will facilitate transparent evaluation and generalizability of diagnostic AI models.

Supplementary Material

Supplementary material is available at Neuro-Oncology Advances online.

Keywords

artificial intelligence | brain tumor | machine learning | meta-analysis | segmentation.

Funding

This work was supported by the SINAPSE innovation fund. The funders had no role in study design, analysis, and interpretation, or writing.

Conflict of interest statement. The authors declare no conflicts of interest.

Authorship statement. O.K., K.H.-I., and J.D.S. contributed to study conception and design. O.K., A.H., D.B., K.H.-I., and J.D.S. contributed to literature search, data extraction, and quality assessment of studies. All authors had access to the raw data, and contributed to data analysis, interpretation, and writing of the manuscript. O.K., K.H.-I., and J.D.S. guarantee the integrity of the work.

References

1. Lapointe S, Perry A, Butowski NA. Primary brain tumours in adults. Lancet. 2018; 392(10145):432–446.
2. Porz N, Bauer S, Pica A, et al. Multi-modal glioblastoma segmentation: man versus machine. PLoS One. 2014; 9(5):e96873.
3. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015; 521(7553):436–444.
4. Montavon G, Samek W, Müller KR. Methods for interpreting and understanding deep neural networks. Digit Signal Process. 2018; 73:1–15.
5. Yusuf M, Atal I, Li J, et al. Reporting quality of studies using machine learning models for medical diagnosis: a systematic review. BMJ Open. 2020; 10(3):e034568.
6. Mongan J, Moy L, Kahn Jr CE. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): a guide for authors and reviewers.
7. Menze BH, Jakab A, Bauer S, et al. The multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans Med Imaging. 2014; 34(10):1993–2024.
8. Cho SJ, Sunwoo L, Baik SH, et al. Brain metastasis detection using machine learning: a systematic review and meta-analysis. Neuro-oncology. 2021; 23(2):214–225.
9. van Kempen EJ, Post M, Mannil M, et al. Performance of machine learning algorithms for glioma segmentation of brain MRI: a systematic literature review and meta-analysis. Eur Radiol. 2021; 31(12):9638–9653.
10. Zheng Q, Yang L, Zeng B, et al. Artificial intelligence performance in detecting tumor metastasis from medical radiology imaging: a systematic review and meta-analysis. EClinicalMedicine. 2021; 31:100669.
11. Salameh JP, Bossuyt PM, McGrath TA, et al. Preferred reporting items for systematic review and meta-analysis of diagnostic test accuracy studies (PRISMA-DTA): explanation, elaboration, and checklist. BMJ. 2020; 370.
12. Whiting PF, Rutjes AW, Westwood ME, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011; 155(8):529–536.
13. Zou KH, Warfield SK, Bharatha A, et al. Statistical validation of image segmentation quality based on a spatial overlap index: scientific reports. Acad Radiol. 2004; 11(2):178–189.
14. Harbord RM, Deeks JJ, Egger M, et al. A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics. 2007; 8(2):239–251.
15. Sanjuán A, Price CJ, Mancini L, et al. Automated identification of brain tumors from single MR images based on segmentation with refined patient-specific priors. Front Neurosci. 2013; 7:241.
16. Wu W, Chen AY, Zhao L, et al. Brain tumor detection and segmentation in a CRF (conditional random fields) framework with pixel-pairwise affinity and superpixel-level features. Int J Comput Assist Radiol Surg. 2014; 9(2):241–253.
17. Dvorak P, Bartusek K, Kropatsch WG, et al. Automated multi-contrast brain pathological area extraction from 2D MR images. J Appl Res Technol. 2015; 13(1):58–69.
18. Steed TC, Treiber JM, Patel KS, et al. Iterative probabilistic voxel labeling: automated segmentation for analysis of The Cancer Imaging Archive glioblastoma images. Am J Neuroradiol. 2015; 36(4):678–685.
19. Hasan AM, Meziane F, Aspin R, et al. Segmentation of brain tumors in MRI images using three-dimensional active contour without edge. Symmetry. 2016; 8(11):132.
20. Ilunga-Mbuyamba E, Avina-Cervantes JG, Garcia-Perez A, et al. Localized active contour model with background intensity compensation applied on automatic MR brain tumor segmentation. Neurocomputing. 2017; 220:84–97.
21. Thiruvenkadam K, Perumal N. Fully automatic method for segmentation of brain tumor from multimodal magnetic resonance images using wavelet transformation and clustering technique. Int J Imaging Syst Technol. 2016; 26(4):305–314.
22. Liu Y, Stojadinovic S, Hrycushko B, et al. Automatic metastatic brain tumor segmentation for stereotactic radiosurgery applications. Phys Med Biol. 2016; 61(24):8440.
23. Li Y, Jia F, Qin J. Brain tumor segmentation from multimodal magnetic resonance images via sparse representation. Artif Intell Med. 2016; 73:1–13.
24. Soltaninejad M, Yang G, Lambrou T, et al. Automated brain tumour detection and segmentation using superpixel-based extremely randomized trees in FLAIR MRI. Int J Comput Assist Radiol Surg. 2017; 12(2):183–203.
25. Imtiaz T, Rifat S, Fattah SA, et al. Automated brain tumor segmentation based on multi-planar superpixel level features extracted from 3D MR images. IEEE Access. 2019; 8:25335–25349.
26. Kaur T, Saini BS, Gupta S. A novel fully automatic multilevel thresholding technique based on optimized intuitionistic fuzzy sets and tsallis entropy for MR brain tumor image segmentation. Australas Phys Eng Sci Med. 2018; 41(1):41–58.
27. Liu Y, Stojadinovic S, Hrycushko B, et al. A deep convolutional neural network-based automatic delineation strategy for multiple brain metastases stereotactic radiosurgery. PLoS One. 2017; 12(10):e0185844.
28. Essadike A, Ouabida E, Bouzid A. Brain tumor segmentation with Vander Lugt correlator based active contour. Comput Methods Programs Biomed. 2018; 160:103–117.
29. Pinto A, Pereira S, Rasteiro D, et al. Hierarchical brain tumour segmentation using extremely randomized trees. Pattern Recognit. 2018; 82:105–117.
30. Soltaninejad M, Yang G, Lambrou T, et al. Supervised learning based multimodal MRI brain tumour segmentation using texture features from supervoxels. Comput Methods Programs Biomed. 2018; 157:69–84.
31. Charron O, Lallement A, Jarnet D, et al. Automatic detection and segmentation of brain metastases on multimodal MR images with a deep convolutional neural network. Comput Biol Med. 2018; 95:43–54.
32. Li Q, Gao Z, Wang Q, et al. Glioma segmentation with a unified algorithm in multimodal MRI images. IEEE Access. 2018; 6:9543–9553.
33. Grøvik E, Yi D, Iv M, et al. Deep learning enables automatic detection and segmentation of brain metastases on multisequence MRI. J Magn Reson Imaging. 2020; 51(1):175–182.
34. Eltayeb EN, Salem NM, Al-Atabany W. Automated brain tumor segmentation from multi-slices FLAIR MRI images. BioMed Mater Eng. 2019; 30(4):449–462.
35. Wang G, Li W, Ourselin S, et al. Automatic brain tumor segmentation based on cascaded convolutional neural networks with uncertainty estimation. Front Comput Neurosci. 2019; 13:56.
36. Li H, Li A, Wang M. A novel end-to-end brain tumor segmentation method using improved fully convolutional networks. Comput Biol Med. 2019; 108:150–160.
37. Tong J, Zhao Y, Zhang P, et al. MRI brain tumor segmentation based on texture features and kernel sparse coding. Biomed Signal Proc Control. 2019; 47:387–392.
38. Dogra J, Jain S, Sood M. Glioma extraction from MR images employing gradient based kernel selection graph cut technique. Vis Comput. 2020; 36(5):875–891.
39. Sriramakrishnan P, Kalaiselvi T, Rajeswaran R. Modified local ternary patterns technique for brain tumour segmentation and volume estimation from MRI multi-sequence scans with GPU CUDA machine. Biocybern Biomed Eng. 2019; 39(2):470–487.
40. Mlynarski P, Delingette H, Criminisi A, et al. 3D convolutional neural networks for tumor segmentation using long-range 2D context. Comput Med Imaging Graph. 2019; 73:60–72.
41. Mallick PK, Ryu SH, Satapathy SK, et al. Brain MRI image classification for cancer detection using deep wavelet autoencoder-based deep neural network. IEEE Access. 2019; 7:46278–46287.
42. Tchoketch Kebir S, Mekaoui S, Bouhedda M. A fully automatic methodology for MRI brain tumour detection and segmentation. Imaging Sci J. 2019; 67(1):42–62.
43. Alagarsamy S, Kamatchi K, Govindaraj V, et al. Multi-channeled MR brain image segmentation: a new automated approach combining BAT and clustering technique for better identification of heterogeneous tumors. Biocybern Biomed Eng. 2019; 39(4):1005–1035.
44. Shivhare SN, Kumar N, Singh N. A hybrid of active contour model and convex hull for automated brain tumor segmentation in multimodal MRI. Multimedia Tools Appl. 2019; 78(24):34207–34229.
45. Kalaiselvi T, Kumarashankar P, Sriramakrishnan P. Three-phase automatic brain tumor diagnosis system using patches based updated run length region growing technique. J Digit Imaging. 2020; 33(2):465–479.
46. Wu Y, Zhao Z, Wu W, et al. Automatic glioma segmentation based on adaptive superpixel. BMC Med Imaging. 2019; 19(1):1–4.
47. Rehman ZU, Naqvi SS, Khan TM, et al. Fully automated multi-parametric brain tumour segmentation using superpixel based classification. Expert Syst Appl. 2019; 118:598–613.
48. Zhou C, Ding C, Wang X, et al. One-pass multi-task networks with cross-task guided attention for brain tumor segmentation. IEEE Trans Image Process. 2020; 29:4516–4529.
49. Khan H, Shah PM, Shah MA, et al. Cascading handcrafted features and Convolutional Neural Network for IoT-enabled brain tumor segmentation. Comput Commun. 2020; 153:196–207.
50. Xue J, Wang B, Ming Y, et al. Deep learning–based detection and segmentation-assisted management of brain metastases. Neuro-oncology. 2020; 22(4):505–514.
51. Thiruvenkadam K, Nagarajan K. Fully automatic brain tumor extraction and tissue segmentation from multimodal MRI brain images. Int J Imaging Syst Technol. 2021; 31(1):336–350.
52. Ben naceur M, Akil M, Saouli R, et al. Fully automatic brain tumor segmentation with deep learning-based selective attention using overlapping patches and multi-class weighted cross-entropy. Med Image Anal. 2020; 63:101692.
53. Aboelenein NM, Songhao P, Koubaa A, et al. HTTU-Net: hybrid two track U-net for automatic brain tumor segmentation. IEEE Access. 2020; 8:101406–101415.
54. Hassen OA, Abter SO, Abdulhussein AA, et al. Nature-inspired level set segmentation model for 3D-MRI brain tumor detection. CMC Comput Mater Contin. 2021; 68(1):961–981.
55. Kao PY, Shailja S, Jiang J, et al. Improving patch-based convolutional neural networks for MRI brain tumor segmentation by leveraging location information. Front Neurosci. 2020; 13:1449.
56. Debnath S, Talukdar FA, Islam M. Combination of contrast enhanced fuzzy c-means (CEFCM) clustering and pixel based voxel mapping technique (PBVMT) for three dimensional brain tumour detection. J Ambient Intell Hum Comput. 2021; 12(2):2421–2433.
57. Baid U, Talbar S, Rane S, et al. A novel approach for fully automatic intra-tumor segmentation with 3D U-Net architecture for gliomas. Front Comput Neurosci. 2020; 14:10.
58. Mitchell JR, Kamnitsas K, Singleton KW, et al. Deep neural network to locate and segment brain tumors outperformed the expert technicians who created the training data. J Med Imaging. 2020; 7(5):055501.
59. Sran PK, Gupta S, Singh S. Integrating saliency with fuzzy thresholding for brain tumor extraction in MR images. J Vis Commun Image Represent. 2021; 74:102964.
60. Takahashi S, Takahashi M, Kinoshita M, et al. Fine-tuning approach for segmentation of gliomas in brain magnetic resonance images with a machine learning method to normalize image differences among facilities. Cancers. 2021; 13(6):1415.
61. Bahadure NB, Ray AK, Thethi HP. Image analysis for MRI based brain tumor detection and feature extraction using biologically inspired BWT and SVM. Int J Biomed Imaging. 2017; 2017.
62. Gupta N, Khanna P. A non-invasive and adaptive CAD system to detect brain tumor from T2-weighted MRIs using customized Otsu's thresholding with prominent features and supervised learning. Signal Process Image Commun. 2017; 59:18–26.
63. Abd-Ellah MK, Awad AI, Khalaf AA, et al. Two-phase multi-model automatic brain tumour diagnosis system from magnetic resonance images using convolutional neural networks. EURASIP J Image Video Process. 2018; 2018(1):1–0.
64. Amin J, Sharif M, Raza M, et al. Brain tumor detection using statistical and machine learning method. Comput Methods Programs Biomed. 2019; 177:69–79.
65. Rai HM, Chatterjee K, Dashkevich S. Automatic and accurate abnormality detection from brain MR images using a novel hybrid UnetResNext-50 deep CNN model. Biomed Signal Proc Control. 2021; 66:102477.
66. Jayachandran A, Dhanasekaran R. Automatic detection of brain tumor in magnetic resonance images using multi-texton histogram and support vector machine. Int J Imaging Syst Technol. 2013; 23(2):97–103.
67. Jayachandran A, Dhanasekaran R. Brain tumor severity analysis using modified multi-texton histogram and hybrid kernel SVM. Int J Imaging Syst Technol. 2014; 24(1):72–82.
68. Dvořák P, Kropatsch WG, Bartušek K. Automatic brain tumor detection in t2-weighted magnetic resonance images. Meas Sci Rev. 2013; 13(5):223–230.
69. Amin J, Sharif M, Yasmin M, et al. A distinctive approach in brain tumor detection and classification using MRI. Pattern Recognit Lett. 2017; 139:118–127.
70. Anitha R, Siva Sundhara Raja D. Development of computer-aided approach for brain tumor detection using random forest classifier. Int J Imaging Syst Technol. 2018; 28(1):48–53.
71. Lahmiri S. Glioma detection based on multi-fractal features of segmented brain MRI by particle swarm optimization techniques. Biomed Signal Proc Control. 2017; 31:148–155.
72. Deepa AR, Emmanuel WS. An efficient detection of brain tumor using fused feature adaptive firefly backpropagation neural network. Multimedia Tools Appl. 2019; 78(9):11799–11814.
73. Selvapandian A, Manivannan K. Performance analysis of meningioma brain tumor classifications based on gradient boosting classifier. Int J Imaging Syst Technol. 2018; 28(4):295–301.
74. Arunkumar N, Mohammed MA, et al. Fully automatic model-based segmentation and classification approach for MRI brain tumor using artificial neural networks. Concurrency Comput Pract Exp. 2020; 32(1):e4962.
75. Edalati-rad A, Mosleh M. Improving brain tumor diagnosis using MRI segmentation based on collaboration of beta mixture model and learning automata. Arab J Sci Eng. 2019; 44(4):2945–2957.
76. Song G, Huang Z, Zhao Y, et al. A noninvasive system for the automatic detection of gliomas based on hybrid features and PSO-KSVM. IEEE Access. 2019; 7:13842–13855.
77. Johnpeter JH, Ponnuchamy T. Computer aided automated detection and classification of brain tumors using CANFIS classification method. Int J Imaging Syst Technol. 2019; 29(4):431–438.
78. Alam MS, Rahman MM, Hossain MA, et al. Automatic human brain tumor detection in MRI image using template-based K means and improved fuzzy C means clustering algorithm. Big Data Cogn Comput. 2019; 3(2):27.
79. Atici MA, Sagiroglu S, Celtikci P, et al. A novel deep learning algorithm for the automatic detection of high-grade gliomas on T2-weighted magnetic resonance images: a preliminary machine learning study. Turk Neurosurg. 2020; 30(2):199–205.
80. Çinar A, Yildirim M. Detection of tumors on brain MRI images using the hybrid convolutional neural network architecture. Med Hypotheses. 2020; 139:109684.
81. Devanathan B, Venkatachalapathy K. Brain tumor detection and classification model using optimal Kapur's thresholding based segmentation with deep neural networks. IIOABJ. 2020; 11:1–8.
82. Gurunathan A, Krishnan B. Detection and diagnosis of brain tumors using deep learning convolutional neural networks. Int J Imaging Syst Technol. 2021; 31(3):1174–1184.
83. Wang J, Shao W, Kim J. Automated classification for brain MRIs based on 2D MF-DFA method. Fractals. 2020; 28(06):2050109.
84. Kesav OH, Rajini GK. Automated detection system for texture feature based classification on different image datasets using S-transform. Int J Speech Technol. 2021; 24(2):251–258.
85. Murali E, Meena K. Brain tumor detection from MRI using adaptive thresholding and histogram based techniques. Scalable Comput Pract Exper. 2020; 21(1):3–10.
86. Kaur T, Gandhi TK. Deep convolutional neural networks with transfer learning for automated brain image classification. Mach Vis Appl. 2020; 31(3):1–6.
87. Thangarajan SK, Chokkalingam A. Integration of optimized neural network and convolutional neural network for automated brain tumor detection. Sensor Rev. 2021; 41(1):16–34.
88. Kalaiselvi T, Padmapriya T, Sriramakrishnan P, et al. Development of automatic glioma brain tumor detection system using deep convolutional neural networks. Int J Imaging Syst Technol. 2020; 30(4):926–938.
89. Rajinikanth V, Joseph Raj AN, Thanaraj KP, et al. A customized VGG19 network with concatenation of deep and handcrafted features for brain tumor detection. Appl Sci. 2020; 10(10):3429.
90. Huang Z, Xu H, Su S, et al. A computer-aided diagnosis system for brain magnetic resonance imaging images using a novel differential feature neural network. Comput Biol Med. 2020; 121:103818.
91. Chen B, Zhang L, Chen H, et al. A novel extended Kalman filter with support vector machine based method for the automatic diagnosis and segmentation of brain tumors. Comput Methods Programs Biomed. 2021; 200:105797.
92. Patil DO, Hamde ST. Automated detection of brain tumor disease using empirical wavelet transform based LBP variants and ant-lion optimization. Multimedia Tools Appl. 2021; 80(12):17955–17982.
93. Simaiya S, Lilhore UK, Prasad D, et al. MRI brain tumour detection & image segmentation by hybrid hierarchical K-means clustering with FCM based machine learning model. Ann Romanian Soc Cell Biol. 2021; 28:88–94.
94. Tejas P. A novel hybrid approach to detect brain tumor in MRI images. Turk J Comput Math Educ. 2021; 12(3):3412–3416.
95. El-Dahshan ES, Mohsen HM, Revett K, et al. Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Expert Syst Appl. 2014; 41(11):5526–5545.
96. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019; 1(6):e271–e297.
97. Ghaffari M, Samarasinghe G, Jameson M, et al. Automated post-operative brain tumour segmentation: a deep learning model based on transfer learning from pre-operative images. Magn Reson Imaging. 2022; 86:28–36.
98. Durmo F, Lätt J, Rydelius A, et al. Brain tumor characterization using multibiometric evaluation of MRI. Tomography. 2018; 4(1):14–25.
99. Shad R, Cunningham JP, Ashley EA, et al. Designing clinically translatable artificial intelligence systems for high-dimensional medical imaging. Nat Mach Intell. 2021; 3(11):929–935.
100. Dos Santos DP, Brodehl S, Baeßler B, et al. Structured report data can be used to develop deep learning algorithms: a proof of concept in ankle radiographs. Insights Imaging. 2019; 10(1):1–8.

Journal

Neuro-Oncology Advances, Oxford University Press

Published: May 27, 2022

Keywords: artificial intelligence; brain tumor; machine learning; meta-analysis; segmentation
