A tree-based intelligence ensemble approach for spatial prediction of potential groundwater

INTERNATIONAL JOURNAL OF DIGITAL EARTH
2020, VOL. 13, NO. 12, 1408–1429
https://doi.org/10.1080/17538947.2020.1718785

Mohammadtaghi Avand (a), Saeid Janizadeh (a), Dieu Tien Bui (b,c), Viet Hoa Pham (d), Phuong Thao T. Ngo (e) and Viet-Ha Nhu (f)

(a) Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, Tehran, Iran; (b) Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam; (c) Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam; (d) Ho Chi Minh City Institute of Resources Geography, Vietnam Academy of Science and Technology, Ho Chi Minh City, Vietnam; (e) Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; (f) Department of Geological-Geotechnical Engineering, Hanoi University of Mining and Geology, Hanoi, Vietnam

ARTICLE HISTORY
Received 9 September 2019
Accepted 16 January 2020

KEYWORDS
Environmental modeling; groundwater potential; GIS; ensemble model; decision tree

ABSTRACT
The objective of this research is to propose and confirm a new machine learning approach that combines the Best-First tree (BFTree) with AdaBoost (AB), MultiBoosting (MB), and Bagging (Bag) ensembles for mapping groundwater potential and assessing the role of influencing factors. The Yasuj-Dena area (Iran) is selected as a case study. For this purpose, a Yasuj-Dena database was established with 362 spring locations and 12 groundwater-influencing factors (slope, aspect, elevation, stream power index (SPI), length of slope (LS), topographic wetness index (TWI), topographic position index (TPI), land use, lithology, distance from fault, distance from river, and rainfall). The database was employed to train and validate the proposed groundwater models.
The area under the curve (AUC) and statistical metrics were employed to check and confirm the quality of the models. The results show that the BFTree-Bag model (AUC = 0.810, kappa = 0.495) has the highest prediction performance, followed by the BFTree-MB model (AUC = 0.785, kappa = 0.477) and the BFTree-AB model (AUC = 0.745, kappa = 0.422). Compared to the Random Forests benchmark, the BFTree-Bag model performs better; therefore, we conclude that BFTree-Bag is a new tool that should be used for modeling groundwater potential.

CONTACT Dieu Tien Bui [email protected] Ton Duc Thang University, 119 Nguyen Huu Tho street, Tan Phong ward, District 7, Ho Chi Minh City, Vietnam
© 2020 Informa UK Limited, trading as Taylor & Francis Group

1. Introduction

Water below the surface, which accounts for nearly 30% of the freshwater worldwide (Lo et al. 2016), plays a particularly important role in human consumption, socio-economic development, and ecological processes (Bui et al. 2018; Kooy, Walter, and Prabaharyaka 2018; Lv, Ling et al. 2019). However, due to population growth and industrial development (de Graaf et al. 2019; Zhang et al. 2019), groundwater withdrawals are much higher than natural recharge rates. Thus, over-exploitation of groundwater has reached an alarming rate in many countries, in particular in arid and semi-arid countries, where surface water is limited (Alfarrah and Walraevens 2018; Kammoun et al. 2018; Cavalcante Júnior et al. 2019; Razzaq et al. 2019; Suryanarayana and Mahammood 2019); therefore, accurate determination of groundwater potential is considered a critical issue in sustainable groundwater strategies to protect and manage this vital resource. This has been clearly stated by the United Nations in the World Water Development Report (Connor 2015).

A literature review shows that accurate identification of groundwater potential is still difficult.
Although surface water enters the ground through openings and fractures in the earth's surface, the availability of groundwater also depends on the type and physical properties of rocks, including porosity, permeability, portability, and storage capacity. Besides, other factors, such as elevation, lithology, slope, aspect, land use, river network density, faults, and soil, play important roles (Naghibi et al. 2016; Rahmati et al. 2018; Moghaddam et al. 2020). Moreover, intermittent and prolonged droughts and high weather fluctuations also affect the determination of potential groundwater areas (Ziolkowska and Reyes 2017; Bloomfield, Marchant, and Mckenzie 2019). As a result, identifying groundwater potential areas with high accuracy is complex and requires considerable time and labor costs.

To identify groundwater potential areas, many methods and techniques have been proposed; among them, geophysical techniques (Worthington 1977; Okereke, Esu, and Edet 1998; Hasan et al. 2018), hydrogeological methods (Panagopoulos, Antonakos, and Lambrakis 2006; Amaya et al. 2018; Gu et al. 2018; Nsiah, Appiah-Adjei, and Adjei 2018), and geological methods (Chilton and Foster 1995; Kim and Hamm 1999; Gheith and Sultan 2002; Gulden et al. 2007) are the most widely used. However, they require extensive field surveys with drilling, which are time-consuming and costly. For large areas, geostatistics and geographic information systems (GIS) have been considered, including (i) analytical techniques, i.e. inverse distance weighting and ordinary Kriging (Kumar 2006), which relate to the first law of geography for spatial prediction (Tobler 2004); (ii) spatial heterogeneity related techniques, i.e.
universal Kriging (Kambhammettu, Allena, and King 2011) and Box–Cox Kriging (Varouchakis, Hristopulos, and Karatzas 2012), which relate to the second law of geography for spatial prediction; and (iii) techniques based on similarities in the geographic environment, which refer to the third law of geography (Zhu et al. 2018), e.g. self-organizing maps (Rezaei, Ahmadzadeh, and Safavi 2017; Fang et al. 2019). Overall, both the analytical techniques and the spatial heterogeneity related techniques require sufficient samples to obtain reliable results, whereas the last group needs more extensive study before reasonable conclusions can be drawn.

The literature review also shows that statistical techniques are the most widely used, e.g. frequency ratio (Ozdemir 2011), logistic regression (LR) (Ozdemir 2011; Chen et al. 2018), maximum entropy (Rahmati et al. 2016), evidential belief function (EBF) (Amiri et al. 2019; Chen, Pradhan et al. 2019; Tahmassebipoor et al. 2016), advanced decision trees (Naghibi and Pourghasemi 2015), generalized additive model (Falah et al. 2016), and weight of evidence (WoE) (Ghorbani Nejad et al. 2016). The critical issue in using these techniques is their ability to characterize the relationship between groundwater potential and its geo-environmental variables. Nevertheless, the accuracy of the resulting groundwater potential models is not always satisfactory.

Recently, innovations in geographic information systems (GIS) and machine learning have provided new and powerful tools for groundwater potential modeling. GIS provides a geospatial platform for handling multiple groundwater-related factors, whereas machine learning is capable of exploring nonlinear relationships and mining hidden patterns in the groundwater data employed (Barzegar et al. 2018). Consequently, various artificial intelligence approaches have been successfully proposed, e.g.
neural networks (Corsini, Cervi, and Ronchetti 2009), CART (Naghibi and Pourghasemi 2015), boosted regression trees (BRT) (Golkarian et al. 2018; Naghibi et al. 2016), logistic model trees (Rahmati et al. 2018), C5.0 (Golkarian et al. 2018), multivariate adaptive regression spline (Arabameri et al. 2019), random forest (Rahmati et al. 2019), and support vector machines (Chen, Tsangaratos et al. 2019). Overall, the prediction capability of groundwater potential maps has improved significantly, but no single method is the best for all areas.

In more recent years, hybrid approaches that combine two or more methods and techniques have been considered for groundwater potential modeling, e.g. genetic-based random forest (Naghibi, Ahmadi, and Daneshi 2017), ensemble of LR and WoE (Chen, Li et al. 2018), EBF-BRT (Kordestani et al. 2019), hybridization of Fisher function and rotation forest (Chen, Pradhan et al. 2019), bagging-DRASTIC (Barzegar et al. 2019), Bagging-Decision Stump (Pham et al. 2019), logistic regression-based multi-adaptive boosting ensemble (Rizeei et al. 2019), Self-Learning Framework based random forest (Sameen, Pradhan, and Lee 2019), metaheuristic based neural fuzzy (Chen, Panahi et al. 2019), and tree-based rotation forest ensembles (Naghibi et al. 2019). The prominent result is that the performance of the groundwater potential models has been enhanced significantly; therefore, further exploration of new ensemble approaches for groundwater potential modeling should be carried out.

The aim of this research is, therefore, to expand the body of knowledge on groundwater modeling by proposing and verifying a new machine learning approach, which is based on ensembles of the Best-First tree (BFTree) with AdaBoost, MultiBoost, and Bagging, for the mapping of groundwater potential.
The BFTree is a relatively new tree-based algorithm that has proven efficient for classification purposes (Jegadeeshwaran and Sugumaran 2013; Chen, Zhang et al. 2018), whereas AdaBoost, MultiBoost, and Bagging are powerful machine learning ensembles. However, to the best of our knowledge, these methods have not yet been explored for groundwater modeling. The Yasuj-Dena area (Iran) is selected as a case study. This is a typical mountainous highland area in Iran, where groundwater plays a vital role; however, no study on groundwater potential has been carried out there. Finally, the results were compared to a Random Forests benchmark and conclusions were drawn.

2. Study area and data used

2.1. Geographical summary of the study area

The Yasuj-Dena area belongs to the middle-west region of Iran, between longitudes 51°4′30″ and 51°55′5″, and between latitudes 31°6′32″ and 31°16′4″, covering an area of 2159.9 km². Due to its geographic location, this area receives the western and southern air masses. This is a hilly area whose elevation varies from 1346.1 m to 4407.1 m above sea level. The highest point is the Dena peak with an altitude of 4409.1 m, whereas the lowest point is the Lishtar plain with an altitude of 1346.1 m.

Hydrologically, the upstream part of the Yasuj-Dena area is the source of important rivers, providing essential water resources for people, who mainly live in the lower plains and practice irrigated farming. Besides, springs contribute a large share of downstream drinking water and play a vital role in the development of tourism and the creation of strong tourist attractions in the area. The rainfall in these areas is concentrated mainly from November to May. The total yearly rainfall varies from 300 mm to 800 mm. A large part of the highland in this area is covered by permanent glaciers.
Geologically, the study area is dominated by calcareous formations (Asmari, Sarvak, and Bakhteyari) and Quaternary sediments (Khazaei, Padyab, and Feyznia 2013). Dena is the main fault in this area, stretching from northwest to southeast; it is part of the High Zagros Fault system in Iran (Bachmanov et al. 2004; Sepehr and Cosgrove 2005).

2.2. Groundwater spring inventory

To identify groundwater potential areas, groundwater spring inventories are essential information that should be collected. These inventories can be correlated with spring influencing factors to explore relationships between these factors and groundwater potential areas (Oh et al. 2011; Rahmati et al. 2018). In this research, a total of 362 spring locations for the Yasuj-Dena area (Figure 1) were identified and mapped based on the documents provided by the Iranian Ministry of Water Resources in 2018. Our fieldwork and statistical analysis of these springs showed that the groundwater level increases from March to April each year, but decreases from August to February. The discharge of these springs varies from 1 m³ s⁻¹ in winter to 10 m³ s⁻¹ in summer.

Figure 1. Location of the Yasuj-Dena and the spring locations.

2.3. Groundwater influencing factor

Determination of groundwater influencing factors is an important task that influences the resulting groundwater potential maps (Rahmati et al. 2018; Miraki et al. 2019); therefore, they should be carefully selected. In this research, a total of 12 influencing factors were considered: slope (°), aspect, elevation (m), stream power index (SPI), length of slope (LS), topographic wetness index (TWI), topographic position index (TPI), land use, lithology, distance from fault (m), distance from river (m), and rainfall (mm). A 30 m resolution DEM (digital elevation model) for the Yasuj-Dena area, generated from the ALOS sensor (Rosenqvist et al. 2007) and available at the JAXA website (JAXA 2019), was used.
Using this DEM, seven factors were derived: slope, aspect, elevation, SPI, LS, TWI, and TPI. The slope is selected for groundwater potential modeling because it influences water accumulation (Bouwer 2002), which relates to groundwater recharge. Aspect represents slope direction, which influences the amount of rainfall, solar radiation, wind speed, and land cover (Solomon and Quiel 2006); these indirectly affect the amount of water infiltrating into the earth and thus influence groundwater. The slope map and aspect map are presented in Figure 2(a and b), respectively. The elevation is considered because the altitude of the topography controls the speed and direction of surface runoff at ground level; therefore, it influences water permeability in the layers of the earth (Zhang and Li 2009). The elevation map in this research is shown in Figure 2c.

SPI measures the destructive power of the water flow of the catchment (Chen, Li et al. 2018); therefore, it is considered as an influencing factor for groundwater potential modeling. In this research, the SPI map (Figure 2d) was computed based on the following formula:

$$\mathrm{SPI} = A_s \tan\beta \qquad (1)$$

where $A_s$ is the watershed area and $\beta$ is the local slope.

Figure 2. Groundwater influencing factors: (a) Slope, (b) Aspect, (c) Elevation, (d) SPI, (e) LS, (f) TWI, (g) TPI, (h) Land use, (i) Lithology, (j) Distance from fault, (k) Distance from river, and (l) Rainfall.

LS is considered in this analysis because it influences the rate of surface flow (Klute, Scott, and Whisler 1965): as the length of the slope increases, more water accumulates, which relates to the process of water infiltration into the earth. In this research, the LS map (Figure 2e) was computed using the equation below:

$$\mathrm{LS} = \left(\frac{A_s}{22.13}\right)^{0.6} \left(\frac{\sin\alpha}{0.0896}\right)^{1.3} \qquad (2)$$

where $A_s$ is the watershed area and $\alpha$ is the local slope angle.
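As a rough illustration of Eqs. (1) and (2), the following Python sketch evaluates SPI and LS cell by cell with NumPy. It is not the ArcGIS workflow used in this study: the function names, the assumption that the slope is supplied in degrees, the reading of the angle inside the sine as the slope angle, and the sample cell values are all hypothetical and serve only to make the arithmetic visible.

```python
import numpy as np


def stream_power_index(area, slope_deg):
    """SPI = A_s * tan(beta), Eq. (1); slope assumed to be given in degrees."""
    beta = np.radians(np.asarray(slope_deg, dtype=float))
    return np.asarray(area, dtype=float) * np.tan(beta)


def slope_length_factor(area, slope_deg):
    """LS = (A_s / 22.13)**0.6 * (sin(alpha) / 0.0896)**1.3, Eq. (2)."""
    alpha = np.radians(np.asarray(slope_deg, dtype=float))
    area = np.asarray(area, dtype=float)
    return (area / 22.13) ** 0.6 * (np.sin(alpha) / 0.0896) ** 1.3


# Hypothetical cells (not data from the study): area A_s and slope in degrees.
# The first cell is chosen so that both bracketed ratios in Eq. (2) equal 1.
area = np.array([22.13, 100.0, 500.0])
slope = np.array([np.degrees(np.arcsin(0.0896)), 45.0, 10.0])

spi = stream_power_index(area, slope)
ls = slope_length_factor(area, slope)
print("SPI:", spi)
print("LS: ", ls)
```

The constants 22.13 and 0.0896 act as reference values: a cell whose area and slope match them has LS = 1, and LS grows with either a larger contributing area or a steeper slope.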
TWI is one of the important factors because it relates to soil moisture, saturation areas, and flow accumulation (Kalantar et al. 2019), which influence groundwater. In this analysis, the TWI map (Figure 2f) was estimated using the following equation (Beven et al. 1984):

$$\mathrm{TWI} = \ln\left(\frac{a}{\tan\beta}\right) \qquad (3)$$

where $a$ is the upslope area and $\beta$ is the local slope.

TPI describes the characteristics of slope position. Topographic ridges correspond to higher positive values, valleys to negative values, and flat areas to near-zero values (Ågren et al. 2014). These features are related to the degree of accumulation and infiltration of water. In this research, the TPI map (Figure 2g) is computed using Eq. (4) below (Arulbalaji, Padmalal, and Sreelash 2019):

$$\mathrm{TPI} = E_0 - \frac{1}{n}\sum_{i=1}^{n} E_i \qquad (4)$$

where $E_0$ is the elevation of the considered grid cell, $E_i$ is the elevation of the $i$-th surrounding grid cell, and $n$ is the total number of surrounding grid cells used.

Land use is one of the main factors controlling the process of underground water supply. This factor provides not only indicators of groundwater availability, but also indirect information on infiltration and near-surface waters (Scanlon et al. 2005). The presence of vegetation may reduce the flow rate and cause more water to penetrate the soil; therefore, land use is selected for this groundwater potential modeling. In this research, the land use map (Figure 2h) of the Yasuj-Dena area was derived from Landsat 8 OLI imagery acquired on 15 June 2016 (USGS 2016) using the maximum likelihood algorithm. For this purpose, six different classes (forest, urban, orchard, agriculture, range, and bare land) were used based on documents of the local authority.

Lithology has been employed widely for predicting potential groundwater resources (Ozdemir 2011; Mukherjee, Singh, and Mukherjee 2012; Fenta et al. 2015). This is because lithology and its related properties, such as texture, age, and degree of purity of rocks, play an essential role in porosity, permeability, and concentration of groundwater flow inside the rocks. The lithology map (Figure 2i) of the Yasuj-Dena area was prepared using the national lithology map at a scale of 1:100,000 (Sepehr and Cosgrove 2005), in which 10 lithological units were used (Table 1).

Table 1. Type of geological formations in the study area.

Code  Name  Main lithology
1     E     Dolomite and buff dolomitic limestone
2     Jkk   Massive limestone
3     K     Massive limestone and limestone mixing with marl
4     Mgs   Anhydrite, argillaceous limestone, and limestone
5     OMa   Limestone and shale
6     Plb   Alternating hard of consolidated, massive, feature forming conglomerate, and low-weathering cross-bedded sandstone
7     Q     Piedmont fan and valley deposits
8     Tr    Grey dolomite, greenish shale, and argillaceous limestone
9     OE    Undivided Asmari and Jahrom formations
10    SpH   Rock salt, limestone, brown cherty dolomite, and red sandstone

Distance from fault was considered in this analysis because fractures and faults can pass water to the ground layers and prevent water from escaping. Thus, the fault is a crucial factor in identifying groundwater sources. In this research, the faults were extracted from the above national lithology map and then used to derive the distance from fault map (Figure 2j).

The river network, which relates to lithology, has an important role in the availability of groundwater. River flow has been found to affect the recharge of groundwater aquifers, which leads to fluctuations in groundwater levels (Zektser and Loaiciga 1993); therefore, distance from river was considered for the groundwater potential analysis.
In this analysis, the river network of the Yasuj-Dena area was taken from topographic maps at a scale of 1:50,000 (Pourghasemi et al. 2014), and then it was buffered to obtain the distance from river map (Figure 2k).

Rainfall has a significant impact on the potential of groundwater and its productivity (Yu and Lin 2015) because it strongly influences the amount of water penetrating into groundwater systems. In this research, the average yearly rainfall during the last ten years provided by the Iranian Meteorological Organization (Rahmati et al. 2015) was used (Figure 2l).

3. Background of the decision trees and ensemble algorithms used

3.1. Best-First tree algorithm

Best-First tree (BFTree) is a relatively new and robust tree-based learning algorithm, which was initially proposed by Friedman, Hastie, and Tibshirani (2000) and then improved by Haijian (2007). Structurally, BFTree has three types of nodes: a root node, internal nodes, and leaves. Tree growing in the BFTree algorithm follows the standard divide-and-conquer procedure; however, it expands nodes in best-first order, instead of the depth-first order used in C4.5 and Classification And Regression Tree (CART). The maximum impurity reduction, measured by information gain or the Gini index, is the criterion for determining which of the available nodes is the best one to split.

Consider the groundwater data $D = (IF_n, CL)$, where $IF_n$ denotes the $n$ input groundwater factors and $CL$ is the output class. First, a root node is created, and then the best groundwater factor, i.e. the one that gives the maximum impurity reduction, is searched for to split the dataset, and sub-nodes are generated. The next step of the BFTree algorithm is to find the best node to be split and expanded. This procedure is repeated until no node can be split any further and the samples belong to $CL$.

3.2. Homogeneous ensemble algorithms

3.2.1. AdaBoost

AdaBoost (AB), proposed by Freund and Schapire (1997), is one of the most robust ensemble algorithms in machine learning. The working procedure of this algorithm is summarized as follows: first, a sub-dataset is generated from the groundwater dataset, and then a groundwater model is constructed using the BFTree algorithm. In this step, the samples in the dataset are assigned equal weights. Subsequently, the model is run on the whole groundwater dataset to determine misclassified samples. Then, these samples are assigned higher weights. Next, the weights of all samples in the groundwater dataset are normalized. Finally, a new sub-dataset is randomly generated to construct the next groundwater model. This procedure is continued until a stopping criterion is reached (Tien Bui, Ho et al. 2016). The final model is derived as a weighted sum of all the groundwater models.

3.2.2. Bagging

Bagging (Bag), which was proposed by Breiman (1996), is considered one of the most successful ensemble algorithms. This algorithm can improve the prediction performance of classifiers in various real-world problems (Erdal and Karakurt 2013; Tien Bui, Ho et al. 2016; Alobaidi, Chebana, and Meguid 2018). The bagging algorithm works as follows: first, sub-datasets are built from the groundwater dataset using the bootstrap sampling technique. Then, each sub-dataset is used to generate a groundwater model using the BFTree algorithm. Finally, all the groundwater models are aggregated to obtain the final model.

3.2.3. MultiBoosting

Proposed by Webb (2000), MultiBoosting (MB) is a robust ensemble algorithm that is capable of reducing both variance and bias. The working principle of this algorithm is as follows: first, sub-datasets are derived from the groundwater dataset using the bootstrap sampling technique, and then the BFTree algorithm is used to construct groundwater models.
Subsequently, the weights of misclassified samples are reset, and new sub-datasets are sampled to build new groundwater models. Finally, the final groundwater model is obtained.

3.3. Benchmark model of random forest

Random forest (RF), introduced by Breiman (2001), is an efficient ensemble algorithm that combines multiple decision trees. RF is considered one of the most successful classification algorithms and has been widely used in geosciences (Carranza and Laborte 2015; Khatami, Mountrakis, and Stehman 2016; Kuhn, Cracknell, and Reading 2019). Regarding groundwater potential modeling, Rahmati et al. (2016) and Golkarian et al. (2018) concluded that RF performed best in determining potential groundwater areas; therefore, it is selected as a benchmark. In this algorithm, the bootstrap sampling technique is employed to derive sub-datasets from the groundwater dataset. Subsequently, each of the sub-datasets is employed to build a groundwater sub-model using the CART algorithm (Breiman 2017). Finally, the final groundwater model is obtained by aggregating all the sub-models. It should be emphasized that the behavior of the RF model is controlled by its tuning parameters, such as the depth of the tree (d), the number of sub-datasets used (n), and the number of groundwater-influencing factors employed (m); therefore, they should be carefully selected.

4. Proposed approach based on best-first tree and homogeneous ensemble for identification of potential groundwater areas

This section presents the methodological flow chart proposed in this study (Figure 3). It should be noted that the preparation of the groundwater data was carried out in ArcGIS 10.6, whereas all the proposed models (BFTree-Bag, BFTree-AB, and BFTree-MB) were programmed by us using a Python-based Weka API wrapper (Reutemann 2019).

Figure 3. Overall methodological flow chart adopted in this study.

4.1. The Yasuj-Dena database

First, a groundwater database for the Yasuj-Dena area was constructed, which consists of the 362 spring locations and the 12 groundwater influencing factors above. The file geodatabase model of Esri ArcCatalog was employed because of its ability to optimize performance (Childs 2009). Subsequently, all the factors were coded and rescaled into the range between 0.01 and 0.99.

Among the 362 spring locations, 70% (253 locations) were randomly selected and used to train the groundwater models, and the rest (109 locations) were employed to check and confirm the model accuracy, as suggested in Lee, Kim, and Oh (2012). Because the groundwater modeling in this research is treated as a binary recognition problem, the same number of non-spring locations was randomly generated for the study area. Finally, an extraction process in ArcGIS was conducted to derive the values of the 12 influencing factors for these locations. As a result, the training dataset and the validation dataset consist of 506 and 218 samples, respectively.

4.2. Multi-collinearity checking of groundwater influencing factors

As mentioned above, a total of 12 influencing factors were initially selected for this study area; however, before modeling, these factors should be checked for multicollinearity to ensure that they will not introduce noise into the groundwater models. The literature shows that the variance inflation factor (VIF) and tolerance (TOL) (Mansfield and Helms 1982) are the most widely used indicators for multicollinearity checking in geosciences, including groundwater modeling (Kavzoglu, Sahin, and Colkesen 2014; Tien Bui, Tuan et al. 2016; Khosravi, Sartaj et al. 2018; Arabameri et al. 2019; Lv, Xiao et al. 2019; Maity and Mandal 2019). Therefore, they were selected for this analysis.
A factor with a VIF higher than 10 and a TOL less than 0.1 indicates a multicollinearity problem (Dou et al. 2019). Besides, the Pearson correlation was also used to detect collinearity between pairs of factors; herein, a pair with a Pearson value larger than 0.7 indicates a collinearity problem (Liu, Zhang, and Balay 2018).

4.3. Feature selection with the permutation method

To check whether the groundwater influencing factors contribute to the model, the permutation method (Altmann et al. 2010) is used in this work because it is regarded as an efficient technique that works well in practice (Alaa et al. 2019). Herein, the importance of an influencing factor is measured by the increase in the prediction error of the model after its values are permuted, which breaks the relationship between the factor and the true outcome. Permutation importance is an intuitive, model-agnostic method for estimating feature importance in classification and regression models. The importance of the factors is used to select them for the groundwater modeling in the next step.

4.4. Configuring and training the groundwater models

To determine the number of BFTrees and the parameters used in the ensemble models, a trial-and-error test was carried out by varying the number of BFTrees and recording the mean squared error (MSE) of each ensemble model on the training dataset and the validation dataset. For this purpose, a minimum of one sample in each leaf node of the BFTree was used and 5-fold cross-validation was employed to prevent the model from overfitting (Sharma and Juglan 2018). As a result, the BFTree-Bag model with 160 trees is the best for the groundwater data at hand, whereas 8 trees and 50 trees are preferable for the BFTree-AB model and the BFTree-MB model, respectively. Regarding the RF model, 100 trees were used (Breiman 2001), whereas the maximum depth of 20 and the 12 factors used in each tree were kept as default values.

4.5. Quality assessment of the groundwater potential model

To assess the quality of the groundwater potential maps, sensitivity, specificity, classification accuracy (CA), positive predictive value (PPV), negative predictive value (NPV), the ROC curve, and kappa were used (Khosravi, Panahi, and Tien Bui 2018; Rahmati et al. 2018; Pham et al. 2019; Rahmati et al. 2019). Sensitivity is the proportion of groundwater spring samples that are correctly predicted as springs. Specificity is the proportion of non-spring samples that are correctly predicted as non-springs. CA is the proportion of correct predictions among all predicted samples. PPV and NPV are the proportions of samples predicted as springs and as non-springs, respectively, that are truly springs and non-springs. The area under the ROC curve (AUC) summarizes the global performance of the groundwater potential model, whereas kappa is used to check the reliability of the model, i.e. the agreement between the predicted groundwater spring outcomes and the inventory.

5. Results

5.1. Multicollinearity diagnosis of the groundwater influencing factors

The result of the multicollinearity diagnosis is shown in Table 2. It can be seen that the TOL value is greater than 0.1 and the VIF value is less than 10 for all variables; therefore, no multicollinearity problem exists among the influencing factors used. To confirm this, Pearson's correlation was further used and the result is presented in Figure 4. The highest correlation (0.69) is between LS and slope; however, this value is still less than 0.7, which is the threshold for a collinearity problem (Liu, Zhang, and Balay 2018). Therefore, it is concluded that there is no correlation problem among the considered factors.
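The screening described above can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions, not the procedure used to produce Table 2: it computes VIF from the R² obtained by regressing each factor on all the others, takes TOL as 1/VIF, and flags factor pairs whose absolute Pearson correlation exceeds 0.7. The function names and the synthetic data (three independent columns plus a near-copy of the first) are hypothetical.

```python
import numpy as np


def vif_and_tol(X):
    """Return (VIF, TOL) per column: VIF_j = 1 / (1 - R_j^2), TOL_j = 1 / VIF_j."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vif = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress factor j on an intercept plus all remaining factors.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        vif[j] = 1.0 / (1.0 - r2)  # diverges if a factor is perfectly collinear
    return vif, 1.0 / vif


def correlated_pairs(X, threshold=0.7):
    """List index pairs (i, j), i < j, whose |Pearson r| exceeds the threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    p = corr.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(corr[i, j]) > threshold]


# Synthetic stand-in for a factor table (NOT the Yasuj-Dena data).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
near_copy = base[:, 0] + 0.1 * rng.normal(size=200)  # deliberately collinear
X = np.column_stack([base, near_copy])

vif, tol = vif_and_tol(X)
pairs = correlated_pairs(X)
for j, (v, t) in enumerate(zip(vif, tol)):
    status = "collinear" if v > 10 or t < 0.1 else "ok"
    print(f"factor {j}: VIF = {v:.3f}, TOL = {t:.3f} ({status})")
print("pairs with |r| > 0.7:", pairs)
```

Because TOL is the reciprocal of VIF, the two thresholds (VIF > 10, TOL < 0.1) flag exactly the same factors; the pairwise Pearson check is complementary, since it can only reveal collinearity between two factors at a time.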
Although the considered factors pass the above multicollinearity diagnosis, their predictive contribution should be checked before proceeding to the modeling process (Martínez-Álvarez et al. 2013; Bui et al. 2019; Hoa et al. 2019). Therefore, the Pearson correlation was also used to check the predictive value of the influencing factors for the groundwater potential. Herein, the higher the Pearson value, the better that factor is for the groundwater potential model. The result is shown in Table 2. We observe that rainfall (0.205) and lithology (0.164) have the highest values, whereas distance to fault and slope have the lowest value of 0.012; therefore, all the factors are included in the modeling process.

Table 2. The multicollinearity analysis for the groundwater influencing factors.

No.  Influencing factor  VIF    TOL    Pearson value
1    Elevation           2.025  0.494  0.108
2    Slope               2.517  0.397  0.012
3    Aspect              1.043  0.959  0.047
4    SPI                 1.009  0.991  0.036
5    TPI                 1.508  0.663  0.024
6    TWI                 1.817  0.550  0.050
7    LS                  2.168  0.461  0.015
8    Distance to river   1.235  0.810  0.026
9    Land use            1.060  0.943  0.025
10   Lithology           1.164  0.859  0.164
11   Distance to fault   1.109  0.902  0.012
12   Rainfall            1.650  0.606  0.205

Figure 4. Pearson correlation of the groundwater influencing factors.

5.2. Training and validating the potential groundwater models

The training result of the groundwater potential models is shown in Table 3 and Figure 5. It can be seen that the three ensemble models have a very high degree of fit with the training dataset. Classification accuracy (CA) is higher than 98%, kappa is larger than 0.96, and AUC is at least 0.99. The highest fit is obtained by the BFTree-Bag model (CA = 99.8, kappa = 0.996, AUC = 0.995) and the BFTree-MB model (CA = 99.8, kappa = 0.996, AUC = 0.995), followed by the BFTree-AB model (CA = 98.2, kappa = 0.964, AUC = 0.990).
In contrast to the ensemble models, the single BFTree model (CA = 86.0, kappa = 0.719, AUC = 0.939) has a considerably lower fit. The other metrics (PPV, NPV, sensitivity, and specificity) are given in Table 3.

Table 3. Performance of the five groundwater potential models using the training dataset. CA: Classification Accuracy.

Metrics          BFTree-Bag  BFTree-AB  BFTree-MB  BFTree  RF
True positive    252         251        253        221     252
True negative    253         246        252        214     253
False positive   1           2          0          32      1
False negative   0           7          1          39      0
PPV (%)          99.6        99.2       100.0      87.4    99.6
NPV (%)          100.0       97.2       99.6       84.6    100.0
Sensitivity (%)  100.0       97.3       99.6       85.0    100.0
Specificity (%)  99.6        99.2       100.0      87.0    99.6
CA (%)           99.8        98.2       99.8       86.0    99.8
Kappa            0.996       0.964      0.996      0.719   0.996

Figure 5. ROC curve and AUC of the models using the training dataset.

The validation result of the groundwater potential models is shown in Table 4 and Figure 6. The three ensemble models predict groundwater potential with good results. The highest prediction performance is obtained by the BFTree-Bag model (CA = 74.8, kappa = 0.495), followed by the BFTree-MB model (CA = 73.9, kappa = 0.477) and the BFTree-AB model (CA = 71.1, kappa = 0.422). In contrast, the single BFTree model (CA = 69.7, kappa = 0.395) has a lower prediction performance. The AUC, which summarizes the global prediction performance of these models, is shown in Figure 6. The AUC is 0.810 for the BFTree-Bag model, indicating a prediction capability of 81.0%, followed by the BFTree-MB model (78.5%) and the BFTree-AB model (74.5%). The single BFTree model has a lower prediction capability (71.3%). The other prediction metrics (PPV, NPV, sensitivity, and specificity) are shown in Table 4.

5.3. Benchmark model and relative importance of the influencing factors

The performance of the proposed ensemble models is further compared with that of the RF benchmark model.
The RF model (CA = 99.8, kappa = 0.996, AUC = 0.995) also has a near-perfect degree of fit with the training dataset (Table 3 and Figure 5). However, the prediction performance of the RF model (CA = 72.5, kappa = 0.450, AUC = 0.801) is lower than that of the BFTree-Bag model (Table 4 and Figure 6).

Table 4. Prediction performance of the models using the validation dataset. CA: Classification Accuracy.

Metrics          BFTree-Bag  BFTree-AB  BFTree-MB  BFTree  RF
True positive    80          78         81         71      76
True negative    83          77         80         81      82
False positive   29          31         28         38      33
False negative   26          32         29         28      27
PPV (%)          73.4        71.6       74.3       65.1    69.7
NPV (%)          76.1        70.6       73.4       74.3    75.2
Sensitivity (%)  75.5        70.9       73.6       71.7    73.8
Specificity (%)  74.1        71.3       74.1       68.1    71.3
CA (%)           74.8        71.1       73.9       69.7    72.5
Kappa            0.495       0.422      0.477      0.395   0.450

Figure 6. ROC curve and AUC of the five models using the validation dataset.

Regarding the relative importance of the groundwater influencing factors, the result of the permutation feature importance technique is shown in Table 5. Rainfall is the most important factor (merit value = 0.458), followed by land use (0.395), lithology (0.308), aspect (0.261), SPI (0.198), elevation (0.158), TPI (0.150), distance to river (0.102), and slope (0.095). In contrast, LS (0.055), TWI (0.040), and distance to fault (0.016) have the lowest importance.

Table 5. The relative importance of the groundwater influencing factors using the permutation feature importance technique.

No.  Influencing factor  Merit value
1    Rainfall            0.458
2    Land use            0.395
3    Lithology           0.308
4    Aspect              0.261
5    SPI                 0.198
6    Elevation           0.158
7    TPI                 0.150
8    Distance to river   0.102
9    Slope               0.095
10   LS                  0.055
11   TWI                 0.040
12   Distance to fault   0.016

5.4. Generating the groundwater potential maps

Since the BFTree-Bag model provides the best prediction of groundwater potential, it is used to estimate the groundwater potential index for every pixel in the Yasuj-Dena area. The result is then converted to a raster map for use in ArcGIS and is shown in Figure 7a. For comparison, three other groundwater potential maps were generated with the BFTree-AB model (Figure 7b), the BFTree-MB model (Figure 7c), and the RF model (Figure 7d). Visual interpretation of these maps indicates that the predicted groundwater potential is in good agreement with the inventory data at hand.

6. Discussion

Developing sustainability strategies for groundwater management is a major concern in many areas, particularly in countries located in arid and semi-arid regions with limited surface water; therefore, systematic efforts have been made worldwide (Gleeson et al. 2010; Vadiati, Adamowski, and Beynaghi 2018). Among them, producing groundwater potential maps with high accuracy is important and remains a critical issue. This research proposes and verifies a new machine learning ensemble approach based on BFTree, Bagging, AdaBoost, and MultiBoost, with the aim of enhancing the quality of groundwater potential identification. The Yasuj-Dena area (Iran) is selected as a case study.

Overall, all three proposed ensemble models (the BFTree-Bag, the BFTree-AB, and the BFTree-MB) have proven their efficiency in predicting groundwater potential, with the BFTree-Bag performing best, followed by the BFTree-MB and the BFTree-AB. The advantage of the BFTree-Bag is its ability to reduce variance through bootstrap sampling with replacement of the groundwater samples. Thus, additional data for the training process were generated from the groundwater training dataset.
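The bootstrap step behind the BFTree-Bag model can be illustrated with a short sketch. It is a simplified picture under stated assumptions, not the Weka implementation used in the paper: the helper names `bootstrap_indices` and `majority_vote` are hypothetical, the training-set size of 506 is the total implied by the Table 3 counts, the 160 replicates follow the number of trees reported for the BFTree-Bag model, and majority voting is only one common way to combine the trees.

```python
import random


def bootstrap_indices(n_samples: int, n_replicates: int, seed: int = 0) -> list:
    """Draw n_replicates index lists, each sampled with replacement from range(n_samples)."""
    rng = random.Random(seed)
    return [
        [rng.randrange(n_samples) for _ in range(n_samples)]
        for _ in range(n_replicates)
    ]


def majority_vote(votes: list) -> int:
    """Combine 0/1 votes from the individual trees; ties go to class 1."""
    return 1 if 2 * sum(votes) >= len(votes) else 0


# 506 training samples (implied by Table 3) and 160 bootstrap replicates.
replicates = bootstrap_indices(n_samples=506, n_replicates=160)

# Each replicate repeats some samples and omits others, which is what makes
# the trees trained on different replicates differ from one another.
unique_share = sum(len(set(r)) for r in replicates) / (len(replicates) * 506)
print(f"Mean share of distinct samples per replicate: {unique_share:.3f}")
print("Example vote:", majority_vote([1, 0, 1]))
```

On average a bootstrap replicate contains roughly 63% of the distinct training samples, so every base tree sees a somewhat different dataset; this is the source of the diversity that the ensemble averages over.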
With this resampling, the 160 individual trees built from the bootstrap subsets provide good diversity for the BFTree-Bag model. Consequently, a high degree of fit and high prediction performance are obtained. For the BFTree-AB, the main advantage is its adaptivity, which is the ability to adjust the weights of misclassified samples in the groundwater dataset. Consequently, the performance of the BFTree-AB model is improved compared with that of the BFTree. However, the BFTree-AB does not generate additional data from the groundwater samples; as a result, the BFTree-AB model was generated with only 8 BFTrees, which limits its capability to reduce variance compared with the BFTree-Bag. The BFTree-MB is a balance of the BFTree-Bag and the BFTree-AB: both bootstrap sampling with replacement and boosting were used to derive the subsets from which 50 BFTrees were generated. Therefore, the prediction performance of the BFTree-MB is better than that of the BFTree-AB. However, the BFTree-MB performed worse than the BFTree-Bag model. This is because the groundwater data contain some noise, since they were collected from various sources with different spatial scales. As indicated in Kotsiantis (2011), MultiBoosting-related models can be considered stronger than Bagging-related models when the data used are noise-free.

Figure 7. Groundwater potential map using: (a) the BFTree-Bag model, (b) the BFTree-AB model, (c) the BFTree-MB model, and (d) the RF model.

The validity of the proposed models is confirmed by comparison with the benchmark RF model, which is a non-parametric and robust method and is suitable in the presence of concentration, noise and excessive (Rahman et al. 2020).
For groundwater modeling, RF has proven particularly suitable because of its ability to handle complex nonlinear relationships between the groundwater-affecting factors and groundwater potential. It can also automatically take into account the interactions between these factors (Naghibi et al. 2018; Rahmati et al. 2016). The result in this research shows that the BFTree-Bag model is slightly better than RF in predicting groundwater potential (CA = 74.8 versus 72.5; AUC = 0.810 versus 0.801), indicating that the BFTree-Bag model is a powerful tool.

Investigating the factors that influence groundwater potential modeling is important because it helps to develop measures to protect and manage groundwater resources. In this research, rainfall, land use, and lithology are the most critical factors. This is a reasonable result because rainfall affects the recharge and reinforcement of water resources (Zhang et al. 2019). Besides, rainfall is considered one of the most effective factors in the dissolution of carbonate rocks and plays a significant role in the dissolution of lime and the formation of karst drips. In the Yasuj-Dena area, the large amount of rainfall in mountainous regions, combined with the presence of seams and gaps in the rocks, causes water to penetrate and limestone to dissolve, resulting in an increase in groundwater and the formation and reinforcement of springs and groundwater potential in the study area.

Land use is an important factor because the high volume of groundwater is mainly distributed in forest and pasture areas in this study area. Thus, the presence of vegetation increases the penetration and strengthening of groundwater. Besides, in areas with higher altitudes, the moisture from fog and rain as well as the production of carbon dioxide by the vegetation have played a vital role in the dissolution of limestone.
Herein, various plant species influence limestone dissolution and the penetration of rocks. Besides, the formation of seams and gaps has facilitated the penetration of water into the rock (Khazaei, Padyab, and Feyznia 2013). Thus, vegetation can increase the dissolving capacity of water by producing carbonic acid, causing the limestone to dissolve, more water to infiltrate the ground, and groundwater to be reinforced.

Regarding the importance of lithology, field checks in areas of the high and very high water abstraction classes showed that the major part of these areas is underlain by thick massive limestone and dolomitic limestone, including the Asmari limestone formation, the Asmari-Jahrom limestone formation, and Quaternary alluvial deposits. The mountainous part and the nearby areas of the Dena peak have limestone and dolomitic lithology, and the plain area consists mostly of Quaternary sediments (Farzin and Menbari 2018). Herein, the Asmari Formation consists of colored, massive limestone with abundant porosity and fractures, which have led to the formation of superficial karst forms such as karren, rill karren, and Ronel karren; these play an important role in the distribution of springs and their discharge (Barzegar et al. 2018). Thus, the high number of springs in these areas, originating from the Asmari karst formation, confirms the high groundwater potential. Also, in the part of the plain with Quaternary sediments, most of the deposits consist of sediments and debris composed of coarse aggregates and coarse alluvial river deposits, which increase the infiltration of water and recharge the groundwater supply in these areas. Chen et al. (2018), in a study designed to model groundwater potential, also showed that lithology and elevation are important in determining the groundwater potential.

7. Conclusion

In recent decades, with increasing population, the need for safe and drinkable water, including groundwater, has been rising, but this resource is limited, which requires better tools for managing this vital resource. This research proposed and verified a new machine learning approach for accurately mapping groundwater potential. Based on the findings of this research, the conclusions are as follows:

• Groundwater potential mapping with excellent prediction accuracy is still difficult, but machine learning ensembles are good tools that should be used to enhance the quality of the groundwater potential map.
• BFTree-Bag, with a prediction capability slightly better than that of RF, is a new tool that could be considered for groundwater potential mapping in other regions.
• Rainfall, land use, and lithology are key factors for groundwater potential mapping.
• The groundwater potential map in this study is a useful tool, which can be used as baseline information for local authorities and planners in developing strategies for sustainable management of groundwater.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Ågren, A., W. Lidberg, M. Strömgren, J. Ogilvie, and P. Arp. 2014. “Evaluating Digital Terrain Indices for Soil Wetness Mapping–a Swedish Case Study.” Hydrology and Earth System Sciences 18 (9): 3623–3634. Alaa, A. M., T. Bolton, E. Di Angelantonio, J. H. Rudd, and M. Van Der Schaar. 2019. “Cardiovascular Disease Risk Prediction Using Automated Machine Learning: A Prospective Study of 423,604 uk Biobank Participants.” PloS one 14 (5): e0213653. Alfarrah, N., and K. Walraevens. 2018. “Groundwater Overexploitation and Seawater Intrusion in Coastal Areas of Arid and Semi-Arid Regions.” Water 10 (2): 143. Alobaidi, M. H., F. Chebana, and M. A. Meguid. 2018. “Robust Ensemble Learning Framework for day-Ahead Forecasting of Household Based Energy Consumption.” Applied Energy 212: 997–1012. Altmann, A., L.
Toloşi, O. Sander, and T. Lengauer. 2010. “Permutation Importance: A Corrected Feature Importance Measure.” Bioinformatics (oxford, England) 26 (10): 1340–1347. Amaya, A. G., J. Ortiz, A. Durán, and M. Villazon. 2018. “Hydrogeophysical Methods and Hydrogeological Models: Basis for Groundwater Sustainable Management in Valle Alto (Bolivia).” Sustainable Water Resources Management 5: 1–10. Amiri, M., H. R. Pourghasemi, G. A. Ghanbarian, and S. F. Afzali. 2019. “Assessment of the Importance of Gully Erosion Effective Factors using Boruta Algorithm and its Spatial Modeling and Mapping using Three Machine Learning Algorithms.” Geoderma 340: 55–69. doi:10.1016/j.geoderma.2018.12.042. Arabameri, A., K. Rezaei, A. Cerda, L. Lombardo, and J. Rodrigo-Comino. 2019. “Gis-based Groundwater Potential Mapping in Shahroud Plain, Iran. A Comparison among Statistical (Bivariate and Multivariate), Data Mining and Mcdm Approaches.” Science of the Total Environment 658: 160–177. Arulbalaji, P., D. Padmalal, and K. Sreelash. 2019. “Gis and ahp Techniques Based Delineation of Groundwater Potential Zones: A Case Study From Southern Western Ghats, India.” Scientific Reports 9 (1): 2082. Bachmanov, D. M., V. G. Trifonov, K. T. Hessami, A. I. Kozhurin, T. P. Ivanova, E. A. Rogozhin, M. C. Hademi, and F. H. Jamali. 2004. “Active Faults in the Zagros and Central Iran.” Tectonophysics 380 (3): 221–241. http://www. sciencedirect.com/science/article/pii/S0040195103005080. Barzegar, R., A. A. Moghaddam, J. Adamowski, and A. H. Nazemi. 2019. “Delimitation of Groundwater Zones Under Contamination Risk Using a Bagged Ensemble of Optimized Drastic Frameworks.” Environmental Science and Pollution Research 26 (8): 8325–8339. Barzegar, R., A. A. Moghaddam, R. Deo, E. Fijani, and E. Tziritis. 2018. “Mapping Groundwater Contamination Risk of Multiple Aquifers Using Multi-Model Ensemble of Machine Learning Algorithms.” Science of the Total Environment 621: 697–712. Beven, K., M. Kirkby, N. Schofield, and A. 
Tagg. 1984. “Testing a Physically-Based Flood Forecasting Model (Topmodel) for Three uk Catchments.” Journal of Hydrology 69 (1): 119–143. Bloomfield, J. P., B. P. Marchant, and A. A. Mckenzie. 2019. “Changes in Groundwater Drought Associated with Anthropogenic Warming.” Hydrology and Earth System Sciences 23 (3): 1393–1408. Bouwer, H. 2002. “Artificial Recharge of Groundwater: Hydrogeology and Engineering.” Hydrogeology Journal 10 (1): 121–142. Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–140. Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. Breiman, L. 2017. Classification and Regression Trees. Boca Raton, FL: CRC Press LLC. Bui, N. T., A. Kawamura, H. Amaguchi, D. Du Bui, N. T. Truong, and K. Nakagawa. 2018. “Social Sustainability Assessment of Groundwater Resources: A Case Study of Hanoi, Vietnam.” Ecological Indicators 93: 1034–1042. Bui, D. T., P.-T. T. Ngo, T. D. Pham, A. Jaafari, N. Q. Minh, P. V. Hoa, and P. Samui. 2019. “A Novel Hybrid Approach Based on a Swarm Intelligence Optimized Extreme Learning Machine for Flash Flood Susceptibility Mapping.” CATENA 179: 184–196. http://www.sciencedirect.com/science/article/pii/S034181621930147X. Carranza, E. J. M., and A. G. Laborte. 2015. “Data-driven Predictive Mapping of Gold Prospectivity, Baguio District, Philippines: Application of Random Forests Algorithm.” Ore Geology Reviews 71: 777–787. Cavalcante Júnior, R. G., M. A. Vasconcelos Freitas, N. F. da Silva, and F. R. de Azevedo Filho. 2019. “Sustainable Groundwater Exploitation Aiming at the Reduction of Water Vulnerability in the Brazilian Semi-Arid Region.” Energies 12 (5): 904. Chen, W., H. Li, E. Hou, S. Wang, G. Wang, M. Panahi, T. Li, et al. 2018. “Gis-based Groundwater Potential Analysis Using Novel Ensemble Weights-of-Evidence with Logistic Regression and Functional Tree Models.” Science of The Total Environment 634: 853–867.
http://www.sciencedirect.com/science/article/pii/S0048969718312130. Chen, W., H. Li, E. Hou, S. Wang, G. Wang, M. Panahi, T. Li, T. Peng, C. Guo, and C. Niu. 2018. “GIS-based Groundwater Potential Analysis using Novel Ensemble Weights-of-evidence with Logistic Regression and Functional Tree Models.” Science of The Total Environment 634: 853–867. doi:10.1016/j.scitotenv.2018.04.055. Chen, W., M. Panahi, K. Khosravi, H. R. Pourghasemi, F. Rezaie, and D. Parvinnezhad. 2019. “Spatial Prediction of Groundwater Potentiality Using Anfis Ensembled with Teaching-Learning-Based and Biogeography-Based Optimization.” Journal of Hydrology 572: 435–448. Chen, W., B. Pradhan, S. Li, H. Shahabi, H. M. Rizeei, E. Hou, and S. Wang. 2019. “Novel Hybrid Integration Approach of Bagging-Based Fisher’s Linear Discriminant Function for Groundwater Potential Analysis.” Natural Resources Research 28 (4): 1239–1258. Chen, W., P. Tsangaratos, I. Ilia, Z. Duan, and X. Chen. 2019. “Groundwater Spring Potential Mapping Using Population-Based Evolutionary Algorithms and Data Mining Methods.” Science of The Total Environment 684: 31–49. http://www.sciencedirect.com/science/article/pii/S0048969719323599. Chen, W., S. Zhang, R. Li, and H. Shahabi. 2018. “Performance Evaluation of the gis-Based Data Mining Techniques of Best-First Decision Tree, Random Forest, and Naïve Bayes Tree for Landslide Susceptibility Modeling.” Science of the Total Environment 644: 1006–1018. Childs, C. 2009. “The Top Nine Reasons to Use a File Geodatabase.” A Scalable and Speedy Choice for Single Users or Small Groups. ArcUser, Spring 2009: 12–15. Chilton, P. J., and S. Foster. 1995. “Hydrogeological Characterisation and Water-Supply Potential of Basement Aquifers in Tropical Africa.” Hydrogeology Journal 3 (1): 36–49. Connor, R. 2015. The United Nations World Water Development Report 2015: Water for a Sustainable World. Paris: UNESCO publishing. Corsini, A., F. Cervi, and F. Ronchetti. 2009. 
“Weight of Evidence and Artificial Neural Networks for Potential Groundwater Spring Mapping: an Application to the Mt. Modino Area (Northern Apennines, Italy).” Geomorphology 111 (1-2): 79–87. doi:10.1016/j.geomorph.2008.03.015. de Graaf, I. E. M., T. Gleeson, L. P. H. (Rens) van Beek, E. H. Sutanudjaja, and M. F. P. Bierkens. 2019. “Environmental Flow Limits to Global Groundwater Pumping.” Nature 574 (7776): 90–94. doi:10.1038/s41586-019-1594-4. Dou, J., A. P. Yunus, D. Tien Bui, A. Merghadi, M. Sahana, Z. Zhu, C.-W. Chen, K. Khosravi, Y. Yang, and B. T. Pham. 2019. “Assessment of Advanced Random Forest and Decision Tree Algorithms for Modeling Rainfall-Induced Landslide Susceptibility in the izu-Oshima Volcanic Island, Japan.” Science of the Total Environment 662: 332–346. Erdal, H. I., and O. Karakurt. 2013. “Advancing Monthly Streamflow Prediction Accuracy of Cart Models Using Ensemble Learning Paradigms.” Journal of Hydrology 477: 119–128. Falah, F., S. Ghorbani Nejad, O. Rahmati, M. Daneshfar, and H. Zeinivand. 2016. “Applicability of Generalized Additive Model in Groundwater Potential Modelling and Comparison its Performance by Bivariate Statistical Methods.” Geocarto International 32 (10): 1069–1089. doi:10.1080/10106049.2016.1188166. Fang, H.-T., B.-C. Jhong, Y.-C. Tan, K.-Y. Ke, and M.-H. Chuang. 2019. “A two-Stage Approach Integrating som-and Moga-svm-Based Algorithms to Forecast Spatial-Temporal Groundwater Level with Meteorological Factors.” Water Resources Management 33 (2): 797–818. Farzin, M., and S. Menbari. 2018. “Zoning of Karstic Aquifer Protection on Tange-Konara Yasuj Using Cop Method.” Journal of Range and Watershed Managment 71 (2): 439–455. https://jrwm.ut.ac.ir/article_67999_ a3b3b3180c1cb5c03724fa623ff3904a.pdf. Fenta, A. A., A. Kifle, T. Gebreyohannes, and G. Hailu. 2015. 
“Spatial Analysis of Groundwater Potential Using Remote Sensing and Gis-Based Multi-Criteria Evaluation in Raya Valley, Northern Ethiopia.” Hydrogeology Journal 23 (1): 195–206. Freund, Y., and R. E. Schapire. 1997. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences 55 (1): 119–139. http://www.sciencedirect.com/science/article/pii/S002200009791504X. Friedman, J., T. Hastie, and R. Tibshirani. 2000. “Additive Logistic Regression: A Statistical View of Boosting (with Discussion and a Rejoinder by the Authors).” The Annals of Statistics 28 (2): 337–407. Gheith, H., and M. Sultan. 2002. “Construction of a Hydrologic Model for Estimating Wadi Runoff and Groundwater Recharge in the Eastern Desert, Egypt.” Journal of Hydrology 263 (1-4): 36–55. Ghorbani Nejad, S., F. Falah, M. Daneshfar, A. Haghizadeh, and O. Rahmati. 2016. “Delineation of Groundwater Potential Zones using Remote Sensing and GIS-based Data-Driven Models.” Geocarto International 1–21. doi:10.1080/10106049.2015.1132481. Gleeson, T., J. Vandersteen, M. A. Sophocleous, M. Taniguchi, W. M. Alley, D. M. Allen, and Y. Zhou. 2010. “Groundwater Sustainability Strategies.” Nature Geoscience 3 (6): 378–379. Golkarian, A., S. A. Naghibi, B. Kalantar, and B. Pradhan. 2018. “Groundwater Potential Mapping Using c5.0, Random Forest, and Multivariate Adaptive Regression Spline Models in gis.” Environmental Monitoring and Assessment 190 (3): 149. Gu, H., F. Ma, J. Guo, K. Li, and R. Lu. 2018. “Assessment of Water Sources and Mixing of Groundwater in a Coastal Mine: The Sanshandao Gold Mine, China.” Mine Water and the Environment 37 (2): 351–365. Gulden, L. E., E. Rosero, Z. L. Yang, M. Rodell, C. S. Jackson, G. Y. Niu, P. J. F. Yeh, and J. Famiglietti. 2007. “Improving Land-Surface Model Hydrology: Is an Explicit Aquifer Model Better Than a Deeper Soil Profile?” Geophysical Research Letters 34 (9): 1–5. Haijian, S. 2007.
Best-first Decision Tree Learning. Hamilton: Thesis, Master of Science. The University of Waikato. Hasan, M., Y. Shang, G. Akhter, and W. Jin. 2018. “Geophysical Assessment of Groundwater Potential: A Case Study From Mian Channu Area, Pakistan.” Groundwater 56 (5): 783–796. Hoa, P. V., N. V. Giang, N. A. Binh, L. V. H. Hai, T.-D. Pham, M. Hasanlou, and D. Tien Bui. 2019. “Soil Salinity Mapping Using sar Sentinel-1 Data and Advanced Machine Learning Algorithms: A Case Study at ben tre Province of the Mekong River Delta (Vietnam).” Remote Sensing 11 (2): 128. http://www.mdpi.com/2072-4292/ 11/2/128. Jaxa. 2019. Alos global digital surface model. Http://www.Eorc.Jaxa.Jp/alos/en/aw3d30/ [online]. Japan Aerospace Exploration Agency. Jegadeeshwaran, R., and V. Sugumaran. 2013. “Comparative Study of Decision Tree Classifier and Best First Tree Classifier for Fault Diagnosis of Automobile Hydraulic Brake System Using Statistical Features.” Measurement 46 (9): 3247–3260. Kalantar, B., H. A. H. Al-Najjar, B. Pradhan, V. Saeidi, A. A. Halin, N. Ueda, and S. A. Naghibi. 2019. “Optimized Conditioning Factors Using Machine Learning Techniques for Groundwater Potential Mapping.” Water 11 (9): 1909. doi:10.3390/w11091909. Kambhammettu, B., P. Allena, and J. P. King. 2011. “Application and Evaluation of Universal Kriging for Optimal Contouring of Groundwater Levels.” Journal of Earth System Science 120 (3): 413–422. Kammoun, S., R. Trabelsi, V. Re, K. Zouari, and J. Henchiri. 2018. “Groundwater Quality Assessment in Semi-Arid Regions Using Integrated Approaches: The Case of Grombalia Aquifer (ne Tunisia).” Environmental Monitoring and Assessment 190 (2): 87. Kavzoglu, T., E. K. Sahin, and I. Colkesen. 2014. “Landslide Susceptibility Mapping Using gis-Based Multi-Criteria Decision Analysis, Support Vector Machines, and Logistic Regression.” Landslides 11 (3): 425–439. Khatami, R., G. Mountrakis, and S. V. Stehman. 2016. 
“A Meta-Analysis of Remote Sensing Research on Supervised Pixel-Based Land-Cover Image Classification Processes: General Guidelines for Practitioners and Future Research.” Remote Sensing of Environment 177: 89–100. Khazaei, M., M. Padyab, and S. Feyznia. 2013. “Investigating the Effect of Diapiras on Water and Soil Salinization: Case Study: Deipir, Shar Kakan River Basin, Yasouj.” Geography and Development 32: 15–28. Khosravi, K., M. Panahi, and D. Tien Bui. 2018. “Spatial Prediction of Groundwater Spring Potential Mapping Based on an Adaptive Neuro-Fuzzy Inference System and Metaheuristic Optimization.” Hydrology & Earth System Sciences 22 (9): 4771–4792. Khosravi, K., M. Sartaj, F. T.-C. Tsai, V. P. Singh, N. Kazakis, A. M. Melesse, I. Prakash, D. Tien Bui, and B. T. Pham. 2018. “A Comparison Study of Drastic Methods with Various Objective Methods for Groundwater Vulnerability Assessment.” Science of the Total Environment 642: 1032–1049. Kim, Y. J., and S.-Y. Hamm. 1999. “Assessment of the Potential for Groundwater Contamination Using the Drastic/Egis Technique, Cheongju Area, South Korea.” Hydrogeology Journal 7 (2): 227–235. Klute, A., E. Scott, and F. Whisler. 1965. “Steady State Water Flow in a Saturated Inclined Soil Slab.” Water Resources Research 1 (2): 287–294. Kooy, M., C. T. Walter, and I. Prabaharyaka. 2018. “Inclusive Development of Urban Water Services in Jakarta: The Role of Groundwater.” Habitat International 73: 109–118. Kordestani, M. D., S. A. Naghibi, H. Hashemi, K. Ahmadi, B. Kalantar, and B. Pradhan. 2019. “Groundwater Potential Mapping Using a Novel Data-Mining Ensemble Model.” Hydrogeology Journal 27 (1): 211–224. Kotsiantis, S. 2011. “Combining Bagging, Boosting, Rotation Forest and Random Subspace Methods.” Artificial Intelligence Review 35 (3): 223–240. Kuhn, S., M. J. Cracknell, and A. M. Reading. 2019.
“Lithological Mapping in the Central African Copper Belt Using Random Forests and Clustering: Strategies for Optimised Results.” Ore Geology Reviews 112: 103015. Kumar, V. 2006. “Kriging of Groundwater Levels–a Case Study.” Journal of Spatial Hydrology 6 (1): 81–94. Lee, S., Y.-S. Kim, and H.-J. Oh. 2012. “Application of a Weights-of-Evidence Method and gis to Regional Groundwater Productivity Potential Mapping.” Journal of Environmental Management 96 (1): 91–105. http://www.sciencedirect.com/science/article/pii/S0301479711003471. Liu, C., Z. Zhang, and J. W. Balay. 2018. “Posterior Assessment of Reference Gages for Water Resources Management Using Instantaneous Flow Measurements.” Science of the Total Environment 634: 12–19. Lo, M. H., J. S. Famiglietti, J. T. Reager, M. Rodell, S. Swenson, and W. Y. Wu. 2016. “Grace-based Estimates of Global Groundwater Depletion.” In Terrestrial Water Cycle and Climate Change: Natural and Human-Induced Impacts, edited by Q. Tang and T. Oki, 137–146. Florida: American Geophysical Union and John Wiley and Sons, Inc. Lv, C., M. Ling, Z. Wu, X. Guo, and Q. Cao. 2019. “Quantitative Assessment of Ecological Compensation for Groundwater Overexploitation Based on Emergy Theory.” Environmental Geochemistry and Health 1–12. doi:10.1007/s10653-019-00248-z. Lv, X., W. Xiao, Y. Zhao, W. Zhang, S. Li, and H. Sun. 2019. “Drivers of Spatio-Temporal Ecological Vulnerability in an Arid, Coal Mining Region in Western China.” Ecological Indicators 106: 105475. Maity, D. K., and S. Mandal. 2019. “Identification of Groundwater Potential Zones of the Kumari River Basin, India: An rs & gis Based Semi-Quantitative Approach.” Environment, Development and Sustainability 21 (2): 1013–1034. Mansfield, E. R., and B. P. Helms. 1982. “Detecting Multicollinearity.” The American Statistician 36 (3a): 158–160. Martínez-Álvarez, F., J. Reyes, A. Morales-Esteban, and C. Rubio-Escudero. 2013.
“Determining the Best set of Seismicity Indicators to Predict Earthquakes. Two Case Studies: Chile and the Iberian Peninsula.” Knowledge-Based Systems 50 (0): 198–210. http://www.sciencedirect.com/science/article/pii/S0950705113001871. Miraki, S., S. H. Zanganeh, K. Chapi, V. P. Singh, A. Shirzadi, H. Shahabi, and B. T. Pham. 2019. “Mapping Groundwater Potential Using a Novel Hybrid Intelligence Approach.” Water Resources Management 33 (1): 281–302. Moghaddam, D. D., O. Rahmati, M. Panahi, J. Tiefenbacher, H. Darabi, A. Haghizadeh, A. T. Haghighi, O. A. Nalivan, and D. Tien Bui. 2020. “The Effect of Sample Size on Different Machine Learning Models for Groundwater Potential Mapping in Mountain Bedrock Aquifers.” CATENA 187: 104421. doi:10.1016/j.catena.2019.104421. Mukherjee, P., C. K. Singh, and S. Mukherjee. 2012. “Delineation of Groundwater Potential Zones in Arid Region of India—a Remote Sensing and gis Approach.” Water Resources Management 26 (9): 2643–2672. Naghibi, S. A., K. Ahmadi, and A. Daneshi. 2017. “Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping.” Water Resources Management 31 (9): 2761–2775. Naghibi, S. A., M. Dolatkordestani, A. Rezaei, P. Amouzegari, M. T. Heravi, B. Kalantar, and B. Pradhan. 2019. “Application of Rotation Forest with Decision Trees as Base Classifier and a Novel Ensemble Model in Spatial Modeling of Groundwater Potential.” Environmental Monitoring and Assessment 191 (4): 248. Naghibi, S. A., and H. R. Pourghasemi. 2015. “A Comparative Assessment Between Three Machine Learning Models and Their Performance Comparison by Bivariate and Multivariate Statistical Methods in Groundwater Potential Mapping.” Water Resources Management 29 (14): 5217–5236. doi:10.1007/s11269-015-1114-8. Naghibi, S. A., H. R. Pourghasemi, and K. Abbaspour. 2018.
“A Comparison Between Ten Advanced and Soft Computing Models for Groundwater Qanat Potential Assessment in Iran using R and GIS.” Theoretical and Applied Climatology 131 (3-4): 967–984. doi:10.1007/s00704-016-2022-4. Naghibi, S.A., H. R. Pourghasemi, and B. Dixon. 2016. “GIS-based Groundwater Potential Mapping using Boosted Regression Tree, Classification and Regression Tree, and Random Forest Machine Learning Models in Iran.” Environmental Monitoring and Assessment 188 (1). doi:10.1007/s10661-015-5049-6. Nsiah, E., E. K. Appiah-Adjei, and K. A. Adjei. 2018. “Hydrogeological Delineation of Groundwater Potential Zones in the Nabogo Basin, Ghana.” Journal of African Earth Sciences 143: 1–9. Oh, H.-J., Y.-S. Kim, J.-K. Choi, E. Park, and S. Lee. 2011. “Gis Mapping of Regional Probabilistic Groundwater Potential in the Area of Pohang City, Korea.” Journal of Hydrology 399 (3-4): 158–172. Okereke, C., E. Esu, and A. Edet. 1998. “Determination of Potential Groundwater Sites Using Geological and Geophysical Techniques in the Cross River State, Southeastern Nigeria.” Journal of African Earth Sciences 27 (1): 149–163. Ozdemir, A. 2011. “Using a Binary Logistic Regression Method and gis for Evaluating and Mapping the Groundwater Spring Potential in the Sultan Mountains (Aksehir, Turkey).” Journal of Hydrology 405 (1-2): 123–136. Panagopoulos, G., A. Antonakos, and N. Lambrakis. 2006. “Optimization of the Drastic Method for Groundwater Vulnerability Assessment via the use of Simple Statistical Methods and gis.” Hydrogeology Journal 14 (6): 894–911. Pham, B. T., A. Jaafari, I. Prakash, S. K. Singh, N. K. Quoc, and D. Tien Bui. 2019. “Hybrid Computational Intelligence Models for Groundwater Potential Mapping.” CATENA 182: 104101. http://www.sciencedirect.com/science/article/ pii/S0341816219302437. Pourghasemi, H., H. Moradi, S. F. Aghda, C. Gokceoglu, and B. Pradhan. 2014. 
“Gis-based Landslide Susceptibility Mapping with Probabilistic Likelihood Ratio and Spatial Multi-Criteria Evaluation Models (North of Tehran, Iran).” Arabian Journal of Geosciences 7 (5): 1857–1878. Rahman, M. M., J. Karunasinghe, S. Clifford, L. D. Knibbs, and L. Morawska. 2020. “New Insights into the Spatial Distribution of Particle Number Concentrations by Applying Non-Parametric Land Use Regression Modelling.” Science of The Total Environment 702: 134708. doi:10.1016/j.scitotenv.2019.134708. Rahmati, O., D. D. Moghaddam, V. Moosavi, Z. Kalantari, M. Samadi, S. Lee, and D. Tien Bui. 2019. “An Automated Python Language-Based Tool for Creating Absence Samples in Groundwater Potential Mapping.” Remote Sensing 11 (11): 1375. https://www.mdpi.com/2072-4292/11/11/1375. 1428 M. AVAND ET AL. Rahmati, O., S. A. Naghibi, H. Shahabi, D. T. Bui, B. Pradhan, A. Azareh, E. Rafiei-Sardooi, A. N. Samani, and A. M. Melesse. 2018. “Groundwater Spring Potential Modelling: Comprising the Capability and Robustness of Three Different Modeling Approaches.” Journal of Hydrology 565: 248–261. http://www.sciencedirect.com/science/ article/pii/S002216941830622X. Rahmati, O., A. Nazari Samani, M. Mahdavi, H. R. Pourghasemi, and H. Zeinivand. 2015. “Groundwater Potential Mapping at Kurdistan Region of Iran Using Analytic Hierarchy Process and gis.” Arabian Journal of Geosciences 8 (9): 7059–7071. doi:10.1007/s12517-014-1668-4. Rahmati, O., H. R. Pourghasemi, and A. M. Melesse. 2016. “Application of GIS-based Data Driven Random Forest and Maximum Entropy Models for Groundwater Potential Mapping: A Case Study at Mehran Region, Iran.” CATENA 137: 360–372. doi:10.1016/j.catena.2015.10.010. Razzaq, A., P. Qing, M. Abid, M. Anwar, and I. Javed. 2019. “Can the Informal Groundwater Markets Improve Water use Efficiency and Equity? Evidence From a Semi-Arid Region of Pakistan.” Science of The Total Environment 666: 849–857. Reutemann, P. 2019. Python weka wrapper 3 0.1.7. 
Https://pypi.Org/project/python-weka-wrapper3/. Rezaei, F., M. R. Ahmadzadeh, and H. R. Safavi. 2017. “Som-drastic: Using Self-Organizing map for Evaluating Groundwater Potential to Pollution.” Stochastic Environmental Research and Risk Assessment 31 (8): 1941–1956. Rizeei, H. M., B. Pradhan, M. A. Saharkhiz, and S. Lee. 2019. “Groundwater Aquifer Potential Modeling Using an Ensemble Multi-Adoptive Boosting Logistic Regression Technique.” Journal of Hydrology 579: 124172. http:// www.sciencedirect.com/science/article/pii/S0022169419309072. Rosenqvist, A., M. Shimada, N. Ito, and M. Watanabe. 2007. “Alos Palsar: A Pathfinder Mission for Global-Scale Monitoring of the Environment.” IEEE Transactions on Geoscience and Remote Sensing 45 (11): 3307–3316. Sameen, M. I., B. Pradhan, and S. Lee. 2019. “Self-learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas.” Natural Resources Research 28 (3): 757–775. Scanlon, B. R., R. C. Reedy, D. A. Stonestrom, D. E. Prudic, and K. F. Dennehy. 2005. “Impact of Land use and Land Cover Change on Groundwater Recharge and Quality in the Southwestern us.” Global Change Biology 11 (10): 1577–1593. Sepehr, M., and J. Cosgrove. 2005. “Role of the Kazerun Fault Zone in the Formation and Deformation of the Zagros Fold-Thrust Belt, Iran.” Tectonics 24 (5): 1–13. doi:10.1029/2004TC001725. Sharma, V., and K. Juglan. 2018. “Automated Classification of Fatty and Normal Liver Ultrasound Images Based on Mutual Information Feature Selection.” IRBM 39 (5): 313–323. Solomon, S., and F. Quiel. 2006. “Groundwater Study Using Remote Sensing and Geographic Information Systems (gis) in the Central Highlands of Eritrea.” Hydrogeology Journal 14 (6): 1029–1041. Suryanarayana, C., and V. Mahammood. 2019. “Groundwater-level Assessment and Prediction Using Realistic Pumping and Recharge Rates for Semi-Arid Coastal Regions: A Case Study of Visakhapatnam City, India.” Hydrogeology Journal 27 (1): 249–272. Tahmassebipoor, N., O. 
Rahmati, F. Noormohamadi, and S. Lee. 2016. “Spatial Analysis of Groundwater Potential using Weights-of-evidence and Evidential Belief Function Models and Remote Sensing.” Arabian Journal of Geosciences 9 (1). .doi:10.1007/s12517-015-2166-z. Tien Bui, D., T.-C. Ho, B. Pradhan, B.-T. Pham, V.-H. Nhu, and I. Revhaug. 2016. “Gis-based Modeling of Rainfall- Induced Landslides Using Data Mining-Based Functional Trees Classifier with Adaboost, Bagging, and Multiboost Ensemble Frameworks.” Environmental Earth Sciences 75 (14): 1101. doi:10.1007/s12665-016-5919-4. Tien Bui, D., T. A. Tuan, H. Klempe, B. Pradhan, and I. Revhaug. 2016. “Spatial Prediction Models for Shallow Landslide Hazards: A Comparative Assessment of the Efficacy of Support Vector Machines, Artificial Neural Networks, Kernel Logistic Regression, and Logistic Model Tree.” Landslides 13: 361–378. Tobler, W. 2004. “On the First law of Geography: A Reply.” Annals of the Association of American Geographers 94 (2): 304–310. Usgs. 2016. The united states geological survey earth resources observation and science center, http://earthexplorer.Usgs. Gov [online]. United States Geological Survey. Vadiati, M., J. Adamowski, and A. Beynaghi. 2018. “A Brief Overview of Trends in Groundwater Research: Progress Towards Sustainability?” Journal of Environmental Management 223: 849–851. http://www.sciencedirect.com/ science/article/pii/S0301479718307382. Varouchakis, E., D. Hristopulos, and G. Karatzas. 2012. “Improving Kriging of Groundwater Level Data Using Nonlinear Normalizing Transformations—a Field Application.” Hydrological Sciences Journal 57 (7): 1404–1419. Webb, G. I. 2000. “Multiboosting: A Technique for Combining Boosting and Wagging.” Machine Learning 40 (2): 159–196. Worthington, P. F. 1977. “Geophysical Investigations of Groundwater Resources in the Kalahari Basin.” Geophysics 42 (4): 838–849. Yu, H.-L., and Y.-C. Lin. 2015. 
“Analysis of Space–Time non-Stationary Patterns of Rainfall–Groundwater Interactions by Integrating Empirical Orthogonal Function and Cross Wavelet Transform Methods.” Journal of Hydrology 525: 585–597. INTERNATIONAL JOURNAL OF DIGITAL EARTH 1429 Zektser, I., and H. A. Loaiciga. 1993. “Groundwater Fluxes in the Global Hydrologic Cycle: Past, Present and Future.” Journal of Hydrology 144 (1-4): 405–427. Zhang, Q., and L. Li. 2009. “Development and Application of an Integrated Surface Runoff and Groundwater Flow Model for a Catchment of Lake Taihu Watershed, China.” Quaternary International 208 (1-2): 102–108. Zhang, B., H.-X. Wang, Y.-W. Ye, J.-L. Tao, L.-Z. Zhang, and L. Shi. 2019. “Potential Hazards to a Tunnel Caused by Adjacent Reservoir Impoundment.” Bulletin of Engineering Geology and the Environment 78 (1): 397–415. Zhu, A. X., G. Lu, J. Liu, C. Z. Qin, and C. Zhou. 2018. “Spatial Prediction Based on Third law of Geography.” Annals of GIS 24 (4): 225–240. Ziolkowska, J., and R. Reyes. 2017. “Groundwater Level Changes due to Extreme Weather—an Evaluation Tool for Sustainable Water Management.” Water 9 (2): 117. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Digital Earth Taylor & Francis



Publisher
Taylor & Francis
Copyright
© 2020 Informa UK Limited, trading as Taylor & Francis Group
ISSN
1753-8955
eISSN
1753-8947
DOI
10.1080/17538947.2020.1718785

Abstract

INTERNATIONAL JOURNAL OF DIGITAL EARTH 2020, VOL. 13, NO. 12, 1408–1429
https://doi.org/10.1080/17538947.2020.1718785

A tree-based intelligence ensemble approach for spatial prediction of potential groundwater

Mohammadtaghi Avand (a), Saeid Janizadeh (a), Dieu Tien Bui (b,c), Viet Hoa Pham (d), Phuong Thao T. Ngo (e) and Viet-Ha Nhu (f)

(a) Faculty of Natural Resources and Marine Sciences, Tarbiat Modares University, Tehran, Iran; (b) Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Vietnam; (c) Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam; (d) Ho Chi Minh City Institute of Resources Geography, Vietnam Academy of Science and Technology, Ho Chi Minh City, Vietnam; (e) Institute of Research and Development, Duy Tan University, Da Nang, Vietnam; (f) Department of Geological-Geotechnical Engineering, Hanoi University of Mining and Geology, Hanoi, Vietnam

ARTICLE HISTORY: Received 9 September 2019; Accepted 16 January 2020

KEYWORDS: Environmental modeling; groundwater potential; GIS; Yasuj-Dena; ensemble model; decision tree

ABSTRACT

The objective of this research is to propose and confirm a new machine learning approach of Best-First tree (BFtree), AdaBoost (AB), MultiBoosting (MB), and Bagging (Bag) ensembles for potential groundwater mapping and assessing the role of influencing factors. The Yasuj-Dena area (Iran) is selected as a case study. For this purpose, a database was established with 362 spring locations and 12 groundwater-influencing factors (slope, aspect, elevation, stream power index (SPI), length of slope (LS), topographic wetness index (TWI), topographic position index (TPI), land use, lithology, distance from fault, distance from river, and rainfall). The database was employed to train and validate the proposed groundwater models. The area under the curve (AUC) and statistical metrics were employed to check and confirm the quality of the models.
The result shows that the BFTree-Bag model (AUC = 0.810, kappa = 0.495) has the highest prediction performance, followed by the BFTree-MB model (AUC = 0.785, kappa = 0.477), and the BFTree-MB model (AUC = 0.745, kappa = 0.422). Compared to the benchmark of Random Forests, the BFTree-Bag model performs better; therefore, we conclude that the BFTree-Bag is a new tool that should be used for modeling of groundwater potential.

CONTACT Dieu Tien Bui, Ton Duc Thang University, 119 Nguyen Huu Tho street, Tan Phong ward, District 7, Ho Chi Minh City, Vietnam

1. Introduction

Water below the surface, which accounts for nearly 30% of the freshwater worldwide (Lo et al. 2016), plays a particularly important role in human consumption, socio-economic development, and ecological processes (Bui et al. 2018; Kooy, Walter, and Prabaharyaka 2018; Lv, Ling et al. 2019). However, due to population growth and industrial development (de Graaf et al. 2019; Zhang et al. 2019), groundwater withdrawals are much higher than natural recharge rates. Thus, over-exploitation of groundwater has reached an alarming rate in many countries, in particular in arid and semi-arid countries, where surface water is limited (Alfarrah and Walraevens 2018; Kammoun et al. 2018; Cavalcante Júnior et al. 2019; Razzaq et al. 2019; Suryanarayana and Mahammood 2019); therefore, accurate determination of groundwater potential is considered a critical issue in sustainable groundwater strategies to protect and manage this vital resource. This has been clearly stated by the United Nations in the world water development report (Connor 2015).

A literature review shows that accurate identification of groundwater potential is still difficult.
Although surface water enters the ground through openings and fractures of the earth, the availability of groundwater also depends on the type and physical properties of rocks, including porosity, permeability, portability, and storage capacity. Besides, other factors, i.e. elevation, lithology, slope, aspect, land use, river network density, faults, and soil, play important roles (Naghibi et al. 2016; Rahmati et al. 2018; Moghaddam et al. 2020). Moreover, the occurrence of intermittent and prolonged droughts and high weather fluctuations are other factors affecting the determination of potential groundwater areas (Ziolkowska and Reyes 2017; Bloomfield, Marchant, and Mckenzie 2019). Together, these factors make the identification of groundwater potential areas with high accuracy complex and costly in both time and labor.

To identify groundwater potential areas, many methods and techniques have been proposed; among them, geophysical techniques (Worthington 1977; Okereke, Esu, and Edet 1998; Hasan et al. 2018), hydrogeological methods (Panagopoulos, Antonakos, and Lambrakis 2006; Amaya et al. 2018; Gu et al. 2018; Nsiah, Appiah-Adjei, and Adjei 2018), and geological methods (Chilton and Foster 1995; Kim and Hamm 1999; Gheith and Sultan 2002; Gulden et al. 2007) are the most widely used. However, extensive field surveys with drilling are required, which are time-consuming and costly. For large areas, geostatistics and geographic information systems (GIS) have been considered, including (i) analytical techniques, i.e. inverse distance weighted and ordinary Kriging methods (Kumar 2006), which follow the first law of geography for spatial prediction (Tobler 2004); (ii) spatial heterogeneity related techniques, i.e.
universal Kriging (Kambhammettu, Allena, and King 2011) and Box–Cox Kriging (Varouchakis, Hristopulos, and Karatzas 2012), which have been associated with the second law of geography for spatial prediction; and (iii) techniques based on similarities in the geographic environment, which refer to the third law of geography (Zhu et al. 2018), i.e. self-organizing maps (Rezaei, Ahmadzadeh, and Safavi 2017; Fang et al. 2019). Overall, both the analytical techniques and the spatial heterogeneity related techniques require sufficient samples in order to obtain reliable results, whereas the last group needs more extensive studies before reasonable conclusions can be drawn. Thus, the literature shows that statistical techniques are the most widely used, i.e. frequency ratio (Ozdemir 2011), logistic regression (LR) (Ozdemir 2011; Chen et al. 2018), maximum entropy (Rahmati et al. 2016), evidential belief function (EBF) (Amiri et al. 2019; Chen, Pradhan et al. 2019; Tahmassebipoor et al. 2016), advanced decision trees (Naghibi and Pourghasemi 2015), generalized additive model (Falah et al. 2016), and weight of evidence (WoE) (Ghorbani Nejad et al. 2016). The critical issue in using these techniques is their ability to characterize the relationship between groundwater potential and its geo-environmental variables. Nevertheless, the accuracy of the potential groundwater models is not always satisfactory.

Recently, innovations in geographic information systems (GIS) and machine learning have provided new and powerful tools for groundwater potential modeling. GIS provides a geospatial platform for handling multiple groundwater-related factors, whereas machine learning is capable of exploring nonlinear relationships and mining hidden patterns in the groundwater data employed (Barzegar et al. 2018). Consequently, various artificial intelligence approaches have been successfully proposed, i.e.
neural networks (Corsini, Cervi, and Ronchetti 2009), CART (Naghibi and Pourghasemi 2015), boosted regression trees (BRT) (Golkarian et al. 2018; Naghibi et al. 2016), logistic model trees (Rahmati et al. 2018), C5.0 (Golkarian et al. 2018), multivariate adaptive regression spline (Arabameri et al. 2019), random forest (Rahmati et al. 2019), and support vector machines (Chen, Tsangaratos et al. 2019). Overall, the prediction capability of the groundwater potential maps has improved significantly, but no single method is the best for all areas.

In more recent years, hybrid approaches that combine two or more methods and techniques have been considered for potential groundwater modeling, i.e. genetic-based random forest (Naghibi, Ahmadi, and Daneshi 2017), ensemble of LR and WoE (Chen, Li et al. 2018), EBF-BRT (Kordestani et al. 2019), hybridization of Fisher function and rotation forest (Chen, Pradhan et al. 2019), bagging-DRASTIC (Barzegar et al. 2019), Bagging-Decision Stump (Pham et al. 2019), logistic regression-based multi-adaptive boosting ensemble (Rizeei et al. 2019), Self-Learning Framework based random forest (Sameen, Pradhan, and Lee 2019), metaheuristic based neural fuzzy (Chen, Panahi et al. 2019), and tree-based rotation forest ensembles (Naghibi et al. 2019). The prominent result is that the performance of the groundwater potential models has been enhanced significantly; therefore, more exploration of new ensemble approaches for groundwater potential modeling should be carried out.

The aim of this research is, therefore, to expand the body of groundwater modeling by proposing and verifying a new machine learning approach, which is based on ensembles of the Best-First tree (BFtree), AdaBoost, MultiBoost, and Bagging for the mapping of groundwater potential.
The BFtree is a relatively new tree intelligence algorithm that has proven efficient for classification purposes (Jegadeeshwaran and Sugumaran 2013; Chen, Zhang et al. 2018), whereas AdaBoost, MultiBoost, and Bagging are powerful machine learning ensembles. However, to the best of our knowledge, these methods have not yet been explored for groundwater modeling. The Yasuj-Dena area (Iran) is selected as a case study. This is a typical mountainous highland area in Iran, where groundwater plays a vital role; however, no study on groundwater potential has been carried out. Finally, the result was compared to a benchmark of Random Forests and conclusions were given.

2. Study area and data used

2.1. Geographical summary of the study area

The Yasuj-Dena area belongs to the middle-west region of Iran, between longitudes 51°4′30″ and 51°55′5″, and between latitudes 31°6′32″ and 31°16′4″, covering an area of 2159.9 km². Due to its geographic location, this area receives the western and southern air masses. This is a hilly area with an elevation that varies from 1346.1 m to 4407.1 m above sea level. The highest point is the Dena peak with an altitude of 4409.1 m, whereas the lowest point is the Lishtar plain with an altitude of 1346.1 m.

Hydrologically, the upstream of the Yasuj-Dena area is the source of important rivers, providing essential water resources for people, who mainly live in the lower plains with irrigated farming. Besides, springs have a massive share in downstream drinking water and play a vital role in the development of tourism and the creation of strong tourist attractions in the area. The rainfall in these areas is concentrated mainly from November to May. The total yearly rainfall varies from 300 mm to 800 mm. It is noted that a large part of the highland in this area is covered by permanent glaciers.
Geologically, the study area is dominated by calcareous formations (Asmari, Sarvak, and Bakhteyari) and Quaternary sediments (Khazaei, Padyab, and Feyznia 2013). Dena is the main fault in this area, stretching from northwest to southeast, and is part of the High Zagros Fault system in Iran (Bachmanov et al. 2004; Sepehr and Cosgrove 2005).

2.2. Groundwater spring inventory

To identify groundwater potential areas, groundwater spring inventories are essential information that should be collected. These inventories can be correlated with spring-influencing factors to explore relationships between these factors and groundwater potential areas (Oh et al. 2011; Rahmati et al. 2018). In this research, a total of 362 spring locations for the Yasuj-Dena (Figure 1) were identified and mapped based on the documents provided by the Iranian Ministry of Water Resources in 2018. Our fieldwork with statistical analysis of these springs showed that the level of groundwater increases from March to April yearly, but it decreases from August to February. Discharge of these springs varies from 1 m³ s⁻¹ in winter to 10 m³ s⁻¹ in the summer.

Figure 1. Location of the Yasuj-Dena and the spring locations.

2.3. Groundwater influencing factor

Determination of groundwater influencing factors is an important task which influences the result of potential groundwater maps (Rahmati et al. 2018; Miraki et al. 2019); therefore, they should be carefully selected. In this research, a total of 12 influencing factors were considered: slope (°), aspect, elevation (m), stream power index (SPI), length of slope (LS), topographic wetness index (TWI), topographic position index (TPI), land use, lithology, distance from fault (m), distance from river (m), and rainfall (mm).

A 30 m resolution DEM (digital elevation model) for the Yasuj-Dena, which was generated from the ALOS sensor (Rosenqvist et al. 2007) and is available at the JAXA website (JAXA 2019), was used.
Using this DEM, seven factors were derived: slope, aspect, elevation, SPI, LS, TWI, and TPI. The slope should be selected for potential groundwater modeling because it influences water accumulation (Bouwer 2002), which relates to groundwater recharge. Aspect represents slope directions that influence the amount of rainfall, solar radiation, wind speed, and land cover (Solomon and Quiel 2006), which indirectly affect the amount of water infiltrating into the earth and thus influence groundwater. The slope map and aspect map are presented in Figure 2(a and b), respectively. The elevation is considered because the altitude of the topography controls the speed and direction of surface runoff at ground level; therefore, it influences water permeability in the layers of the earth (Zhang and Li 2009). The elevation map in this research is shown in Figure 2c.

SPI measures the destructive power of the water flow of the catchment (Chen, Li et al. 2018); therefore, it is considered as an influencing factor for potential groundwater modeling. In this research, the SPI map (Figure 2d) was computed based on the following formula:

$$\mathrm{SPI} = A_S \tan\beta \qquad (1)$$

where $A_S$ is the watershed area and $\beta$ is the local slope.

Figure 2. Groundwater influencing factors: (a) Slope, (b) Aspect, (c) Elevation, (d) SPI, (e) LS, (f) TWI, (g) TPI, (h) Land use, (i) Lithology, (j) Distance from fault, (k) Distance from river, and (l) Rainfall.

Regarding LS, it should be considered for this analysis because LS influences rates of surface flow (Klute, Scott, and Whisler 1965), and as the length of the slope increases, more water will accumulate, which relates to processes of water infiltration into the earth. In this research, the LS map (Figure 2e) was computed using the equation below:

$$\mathrm{LS} = \left(\frac{A_S}{22.13}\right)^{0.6} \left(\frac{\sin\alpha}{0.0896}\right)^{1.3} \qquad (2)$$

where $A_S$ is the watershed area and $\alpha$ is the local slope angle.
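As a small illustration of Eqs. (1) and (2), the sketch below evaluates SPI and LS for a single grid cell. It is only a sketch under stated assumptions: the function names, the choice of degrees for the slope input, and the sample values are hypothetical and are not taken from the paper, which computed these layers in a GIS environment.

```python
import math


def spi(area: float, slope_deg: float) -> float:
    """Stream power index, Eq. (1): SPI = A_S * tan(beta).

    `area` stands for A_S and `slope_deg` for the local slope in degrees
    (the unit convention is an assumption made for this sketch).
    """
    return area * math.tan(math.radians(slope_deg))


def ls_factor(area: float, slope_deg: float) -> float:
    """Length of slope, Eq. (2): (A_S / 22.13)^0.6 * (sin(alpha) / 0.0896)^1.3."""
    sin_slope = math.sin(math.radians(slope_deg))
    return (area / 22.13) ** 0.6 * (sin_slope / 0.0896) ** 1.3


# Hypothetical cell values, for illustration only.
cell_area = 150.0   # A_S
cell_slope = 12.0   # degrees
print("SPI:", round(spi(cell_area, cell_slope), 3))
print("LS:", round(ls_factor(cell_area, cell_slope), 3))
```

Both functions work on one cell at a time; applying them to a whole raster would simply mean looping over the cells or switching to an array library.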
TWI is one of the important factors because it relates to soil moisture, saturation areas, and flow accumulation (Kalantar et al. 2019), which influence groundwater. In this analysis, the TWI map (Figure 2f) was estimated using the following equation (Beven et al. 1984):

$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right) \qquad (3)$$

where $\alpha$ is the upslope area, whereas $\beta$ is the local slope.

TPI describes characteristics of slope position. Thus, topographic ridges are related to the higher positive values, whereas valleys are associated with negative values, and flat areas are referred to as near-zero values (Ågren et al. 2014). These features are related to the accumulation and infiltration degree of water. In this research, the TPI map (Figure 2g) was computed using Eq. 4 below (Arulbalaji, Padmalal, and Sreelash 2019):

$$\mathrm{TPI} = E_0 - \frac{1}{n}\sum E_n \qquad (4)$$

where $E_0$ is the elevation of the considered grid, $E_n$ is the elevation of a surrounding grid, n is the total number of the surrounding grids used, and the sum runs over those surrounding grids.

Regarding land use, this is one of the main factors controlling the process of underground water supply. This factor provides not only indicators for groundwater availability, but also indirect information on infiltration and near-surface waters (Scanlon et al. 2005). Thus, the presence of vegetation may reduce the flow rate and cause more water to penetrate the soil; therefore, land use is selected for this groundwater potential modeling. In this research, the land use map (Figure 2h) of the Yasuj-Dena was derived from Landsat 8 OLI imagery acquired on 15 June 2016 (USGS 2016) using the Maximum likelihood algorithm. For this purpose, six different classes (forest, urban, orchard, agriculture, range, and bare land) were used based on documents of the local authority.

Lithology has been employed widely for predicting potential groundwater resources (Ozdemir 2011; Mukherjee, Singh, and Mukherjee 2012; Fenta et al. 2015). This is because lithology and its related properties such as texture, age, and degree of purity of rocks play an essential role in porosity, permeability, and concentration of groundwater flow inside the rocks. The lithology map (Figure 2i) of the Yasuj-Dena was prepared using the national lithology map at a scale of 1:100,000 (Sepehr and Cosgrove 2005), in which 10 lithological units were used (Table 1).

Table 1. Type of geological formations in the study area.
Code | Name | Main lithology
1 | E | Dolomite and buff dolomitic limestone
2 | Jkk | Massive limestone
3 | K | Massive limestone and limestone mixing with marl
4 | Mgs | Anhydrite, argillaceous limestone, and limestone
5 | OMa | Limestone and shale
6 | Plb | Alternating hard of consolidated, massive, feature forming conglomerate, and low-weathering cross-bedded sandstone
7 | Q | Piedmont fan and valley deposits
8 | Tr | Grey dolomite, greenish shale, and argillaceous limestone
9 | OE | Undivided Asmari and Jahrom formations
10 | SpH | Rock salt, limestone, brown cherty dolomite, and red sandstone

Distance from fault was considered in this analysis because fractures and faults can pass water to the ground layers and prevent water from escaping. Thus, the fault is a crucial factor in identifying groundwater sources. In this research, the fault was extracted from the above national lithology map and then used to derive the distance from fault map (Figure 2j).

River network, which relates to lithology, has an important role in the availability of groundwater. Thus, river flow has been found to affect the recharge of groundwater aquifers, which leads to fluctuations in groundwater levels (Zektser and Loaiciga 1993); therefore, distance from river was considered for the groundwater potential analysis.
In this analysis, the river network of the Yasuj-Dena was taken from topographic maps at a scale of 1:50,000 (Pourghasemi et al. 2014), and then it was buffered to obtain the distance from river map (Figure 2k).

Regarding rainfall, this factor has a significant impact on the potential of groundwater and its productivity (Yu and Lin 2015) because it strongly influences the amount of water penetrating to groundwater systems. In this research, the average yearly rainfall during the last ten years provided by the Iranian Meteorological Organization (Rahmati et al. 2015) was used (Figure 2l).

3. Background of the decision trees and ensemble algorithms used

3.1. Best-First tree algorithm

Best-First tree (BFTree) is a relatively new and robust tree-based learning algorithm, which was initially proposed by Friedman, Hastie, and Tibshirani (2000) and then improved by Haijian (2007). Structurally, BFTree has three types of nodes: a root node, internal nodes, and leaves. Tree growing in the BFTree algorithm follows the standard divide-and-conquer procedure; however, it expands nodes in best-first order, instead of the depth-first order used in C4.5 and Classification And Regression Tree (CART). The maximum impurity reduction, measured by information gain or the Gini index, is the criterion for determining which of the available nodes is the best for splitting.

Consider the groundwater data $D = (IF_n, CL)$, where $IF_n$ denotes the n input groundwater factors and CL is the output class. First, a root node is created, and then the best groundwater factor in $IF_n$, i.e. the one with the maximum impurity reduction, is searched for to split the dataset, and sub-nodes are generated. The next step of the BFTree algorithm is to find the best node to be split and expanded. This procedure is repeated until no node can be split any further and the samples belong to CL.

3.2. Homogeneous ensemble algorithms

3.2.1. AdaBoost

AdaBoost (AB), which was proposed by Freund and Schapire (1997), is one of the most robust ensemble algorithms in machine learning. The working procedure of this algorithm is summarized as follows: first, a sub-dataset is generated from the groundwater dataset, and then a groundwater model is constructed using the BFTree algorithm. In this step, the weights of the samples in the dataset are equally assigned. Subsequently, the model is run on the whole groundwater dataset to determine misclassified samples. Then, these samples are assigned higher weights. Next, the weights of all samples in the groundwater dataset are normalized. Finally, a new sub-dataset is randomly generated to construct the next groundwater model. This procedure is continued until a stopping criterion is reached (Tien Bui, Ho et al. 2016). The final model is derived by a weighted sum of all the groundwater models.

3.2.2. Bagging

Bagging (Bag), which was proposed by Breiman (1996), is considered one of the most successful ensemble algorithms. This algorithm can improve the prediction performance of classifiers in various real-world problems (Erdal and Karakurt 2013; Tien Bui, Ho et al. 2016; Alobaidi, Chebana, and Meguid 2018). The bagging algorithm works as follows: first, sub-datasets are built from the groundwater dataset using the bootstrap sampling technique. Then, each sub-dataset is used to generate a groundwater model using the BFTree algorithm. Finally, all the groundwater models are aggregated to obtain the final model.

3.2.3. MultiBoosting

Proposed by Webb (2000), MultiBoosting (MB) is a robust ensemble algorithm, which is capable of reducing both variance and bias. The working principle of this algorithm is as follows: first, using the groundwater dataset, sub-datasets are derived using the bootstrap sampling technique, and then the BFTree algorithm is used to construct groundwater models.
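The bootstrap-and-train step that Bagging and MultiBoosting share can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: a one-split decision stump stands in for the BFTree base learner (an assumption made to keep the example short), aggregation is a plain majority vote as in Bagging, and all function names and sample values are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split stump (stand-in base learner, NOT a BFTree)."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def bagging_fit(rows, labels, n_models, seed=0):
    """Train one base model per bootstrap sample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in rows]
        models.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate the base models by majority vote."""
    votes = Counter(predict_stump(model, row) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical rescaled factor values: 1 = spring, 0 = non-spring.
samples = [[0.10], [0.20], [0.30], [0.70], [0.80], [0.90]]
classes = [0, 0, 0, 1, 1, 1]
ensemble = bagging_fit(samples, classes, n_models=25, seed=1)
print(bagging_predict(ensemble, [0.05]), bagging_predict(ensemble, [0.95]))
```

AdaBoost and MultiBoosting differ from this sketch in that they reweight misclassified samples between rounds; that reweighting is deliberately left out here.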
Subsequently, the weights of the misclassified samples are reset, and new sub-datasets are sampled to build new groundwater models. Finally, the final groundwater model is obtained.

3.3. Benchmark model of random forest

Random forest (RF), introduced by Breiman (2001), is an efficient ensemble algorithm that consists of multiple decision trees. RF is considered one of the most successful classification algorithms and has been widely used in geosciences (Carranza and Laborte 2015; Khatami, Mountrakis, and Stehman 2016; Kuhn, Cracknell, and Reading 2019). Regarding groundwater potential modeling, Rahmati et al. (2016) and Golkarian et al. (2018) concluded that RF is the best in determining potential groundwater areas; therefore, it is selected as a benchmark. In this algorithm, the bootstrap sampling technique is employed to derive sub-datasets from the groundwater dataset. Subsequently, each sub-dataset is employed to build a groundwater sub-model using the CART algorithm (Breiman 2017). Finally, the final groundwater model is obtained by aggregating all the sub-models. It should be emphasized that the behavior of the RF model is controlled by its tuning parameters, such as the depth of the tree (d), the number of sub-datasets used (n), and the number of groundwater-influencing factors employed (m); therefore, they should be carefully selected.

4. Proposed approach based on best-first tree and homogeneous ensemble for identification of potential groundwater areas

This section provides the methodological chart proposed in this study. It should be noted that the preparation of the groundwater data was carried out in ArcGIS 10.6, whereas all the proposed models (BFTree-Bag, BFTree-AB, and BFTree-MB) were programmed by us using a Python-based Weka API wrapper (Reutemann 2019) (Figure 3).

Figure 3. Overall methodological flow chart adopted in this study.

4.1. The Yasuj-Dena database

First, a groundwater database for the Yasuj-Dena area was constructed, which consists of 362 spring locations and the 12 groundwater-influencing factors above. The file geodatabase model of Esri ArcCatalog was employed because of its ability to optimize performance (Childs 2009). Subsequently, all the factors were coded and rescaled into the range of 0.01 to 0.99. Among the 362 spring locations, 70% (253 locations) were randomly selected and used to train the groundwater models, and the rest (109 locations) were employed to check and confirm the model accuracy, as suggested in Lee, Kim, and Oh (2012). Because the groundwater modeling in this research employed a binary recognition approach, the same number of non-spring locations was randomly generated for the study area. Finally, an extraction process in ArcGIS was conducted to derive the values of the 12 influencing factors for these locations. As a result, the training dataset and the validation dataset consist of 506 and 218 samples, respectively.

4.2. Multi-collinearity checking of groundwater influencing factors

As mentioned above, a total of 12 influencing factors were initially selected for this study area; however, before modeling, these factors should be checked for multicollinearity to ensure that they will not introduce noise into the groundwater models. A literature review shows that the variance inflation factor (VIF) and tolerance (TOL) (Mansfield and Helms 1982) are the most widely used indicators for multicollinearity checking in geosciences, including groundwater modeling (Kavzoglu, Sahin, and Colkesen 2014; Tien Bui, Tuan et al. 2016; Khosravi, Sartaj et al. 2018; Arabameri et al. 2019; Lv, Xiao et al. 2019; Maity and Mandal 2019). Therefore, they were selected for this analysis.
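For reference, both indicators are commonly defined from the coefficient of determination $R_j^2$ obtained when the j-th influencing factor is regressed on all the other factors. The notation below is the usual textbook one and is an assumption here, since the paper does not spell the formulas out:

```latex
\mathrm{TOL}_j = 1 - R_j^2,
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\mathrm{TOL}_j}
```

Under this definition the two indicators are reciprocals of each other, which is consistent with the values reported in Table 2 (for elevation, 1/2.025 ≈ 0.494).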
A factor with a VIF higher than 10 and a TOL less than 0.1 indicates a multicollinearity problem (Dou et al. 2019). In addition, Pearson correlation was used to detect collinearity between pairs of factors; herein, a pair with a Pearson value larger than 0.7 indicates a collinearity problem (Liu, Zhang, and Balay 2018).

4.3. Feature selection with the permutation method

To check whether the groundwater influencing factors contribute to the model, the permutation method (Altmann et al. 2010) is used in this work because it is regarded as an efficient technique that works well in practice (Alaa et al. 2019). Herein, the importance of an influencing factor is measured by the increase in the prediction error of the model after the values of that factor are permuted, which breaks the relationship between the factor and the true outcome. Permutation importance is an intuitive, model-agnostic method for estimating feature importance in classification and regression models. The importance of the factors is used to select them for the groundwater modeling in the next step.

4.4. Configuring and training the groundwater models

To determine the number of BFTrees and the parameters used in the ensemble models, a trial-and-error test was carried out by varying the number of BFTrees against the MSE (mean squared error) of each ensemble model on the training dataset and the validation dataset. For this purpose, a minimum of one sample in each leaf node of the BFTree was used, and 5-fold cross-validation was employed to prevent the model from overfitting (Sharma and Juglan 2018). As a result, the BFTree-Bag model with 160 trees is the best for the groundwater data at hand, whereas 8 trees and 50 trees are the most suitable for the BFTree-AB model and the BFTree-MB model, respectively. Regarding the RF model, 100 trees were used (Breiman 2001), whereas the maximum depth is 20 and the number of factors used in each tree is 12, as default values.

4.5. Quality assessment of the groundwater potential model

To assess the quality of the groundwater potential maps, sensitivity, specificity, classification accuracy (CA), positive predictive value (PPV), negative predictive value (NPV), the ROC curve, and kappa were used (Khosravi, Panahi, and Tien Bui 2018; Rahmati et al. 2018; Pham et al. 2019; Rahmati et al. 2019). Sensitivity is the proportion of groundwater spring samples that are correctly predicted as springs. Specificity is the proportion of non-groundwater spring samples that are correctly predicted as non-springs. CA is the proportion of correct predictions among all predictions. PPV and NPV are the proportions of predicted groundwater spring and predicted non-groundwater spring results that are true groundwater springs and true non-groundwater springs, respectively. The area under the ROC curve (AUC) summarizes the global performance of the groundwater potential model, whereas kappa is used to check the reliability of the model, that is, the agreement between the predicted groundwater spring outcome and the inventory.

5. Results

5.1. Multicollinearity diagnosis of the groundwater influencing factors

The result of the multicollinearity diagnosis is shown in Table 2. It can be seen that the TOL value is greater than 0.1 and the VIF value is less than 10 for all variables; therefore, no multicollinearity problem exists among the influencing factors used. To confirm this, Pearson's correlation was further used, and the result is presented in Figure 4. The highest correlation (0.69) is between LS and slope; however, this value is still less than 0.7, which is the threshold for a collinearity problem (Liu, Zhang, and Balay 2018). Therefore, it is concluded that there is no correlation problem among the considered factors.
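The pairwise Pearson screening described above can be sketched in a few lines. This is a minimal illustration rather than the authors' workflow: the factor values are invented, the helper names `pearson` and `collinear_pairs` are introduced here, and only the 0.7 threshold is taken from the text.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(factors, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(factors.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical rescaled factor values at six sample locations.
factors = {
    "slope": [0.10, 0.25, 0.40, 0.55, 0.70, 0.85],
    "ls": [0.12, 0.22, 0.45, 0.50, 0.75, 0.80],        # tracks slope closely
    "rainfall": [0.60, 0.20, 0.90, 0.30, 0.50, 0.40],  # unrelated to slope
}

for name_a, name_b, r in collinear_pairs(factors):
    print(f"{name_a} vs {name_b}: r = {r:.2f} exceeds the 0.7 threshold")
```

In this toy data only the slope and LS columns are flagged; with the real factors, a pair such as LS and slope (0.69) would fall just below the threshold and be retained.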
Although the considered factors passed the above multicollinearity diagnosis, their predictive contribution should be checked before proceeding to the modeling process (Martínez-Álvarez et al. 2013; Bui et al. 2019; Hoa et al. 2019). Therefore, Pearson correlation was also used to check the predictive value of the influencing factors for groundwater potential. Herein, the higher the Pearson value, the better that factor is for the groundwater potential model. The result is shown in Table 2. We observe that rainfall (0.205) and lithology (0.164) have the highest values, whereas slope and distance to fault have the lowest value (0.012). Since every factor has a positive value, all the factors are included in the modeling process.

Table 2. The multicollinearity analysis for the groundwater influencing factors.

No.  Influencing factor  VIF    TOL    Pearson value
1    Elevation           2.025  0.494  0.108
2    Slope               2.517  0.397  0.012
3    Aspect              1.043  0.959  0.047
4    SPI                 1.009  0.991  0.036
5    TPI                 1.508  0.663  0.024
6    TWI                 1.817  0.550  0.050
7    LS                  2.168  0.461  0.015
8    Distance to river   1.235  0.810  0.026
9    Land use            1.060  0.943  0.025
10   Lithology           1.164  0.859  0.164
11   Distance to fault   1.109  0.902  0.012
12   Rainfall            1.650  0.606  0.205

Figure 4. Pearson correlation of the groundwater influencing factors.

5.2. Training and validating the potential groundwater models

The training result of the groundwater potential models is shown in Table 3 and Figure 5. It can be seen that the three ensemble models have a near-perfect degree-of-fit with the training dataset: classification accuracy (CA) is higher than 98%, kappa is larger than 0.96, and AUC is at least 0.99. The highest fit is obtained by the BFTree-Bag model (CA = 99.8%, kappa = 0.996, AUC = 0.995) and the BFTree-MB model (CA = 99.8%, kappa = 0.996, AUC = 0.995), followed by the BFTree-AB model (CA = 98.2%, kappa = 0.964, AUC = 0.990). In contrast, the single BFTree model (CA = 86.0%, kappa = 0.719, AUC = 0.939) has a significantly lower fit. The other metrics (PPV, NPV, sensitivity, and specificity) are given in Table 3.

Table 3. Performance of the five groundwater potential models using the training dataset. CA: Classification Accuracy.

Metrics          BFTree-Bag  BFTree-AB  BFTree-MB  BFTree  RF
True positive    252         251        253        221     252
True negative    253         246        252        214     253
False positive   1           2          0          32      1
False negative   0           7          1          39      0
PPV (%)          99.6        99.2       100.0      87.4    99.6
NPV (%)          100.0       97.2       99.6       84.6    100.0
Sensitivity (%)  100.0       97.3       99.6       85.0    100.0
Specificity (%)  99.6        99.2       100.0      87.0    99.6
CA (%)           99.8        98.2       99.8       86.0    99.8
Kappa            0.996       0.964      0.996      0.719   0.996

Figure 5. ROC curve and AUC of the models using the training dataset.

The validation result of the groundwater potential models is shown in Table 4 and Figure 6. We see that the three ensemble models predict groundwater potential with good results. The highest prediction performance is obtained by the BFTree-Bag model (CA = 74.8%, kappa = 0.495), followed by the BFTree-MB model (CA = 73.9%, kappa = 0.477) and the BFTree-AB model (CA = 71.1%, kappa = 0.422). In contrast, the single BFTree model (CA = 69.7%, kappa = 0.395) has a significantly lower prediction performance. The AUC, which summarizes the global prediction performance of these models, is shown in Figure 6. It is observed that AUC is 0.810 for the BFTree-Bag model, indicating that the prediction capability is 81.0%, followed by the BFTree-MB model (78.5%) and the BFTree-AB model (74.5%). The single BFTree model has a low prediction capability (71.3%). The other prediction metrics (PPV, NPV, sensitivity, and specificity) are shown in Table 4.

5.3. Benchmark model and relative importance of the influencing factors

The performance of the proposed ensemble models is further compared to that of the RF benchmark.
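As a cross-check, the metrics reported in Table 3 can be recomputed from the confusion-matrix counts. The sketch below uses the standard definitions of these metrics (the paper describes them only in words) together with the counts of the single BFTree model on the training dataset; the function name `confusion_metrics` is introduced here for illustration.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa: product of the predicted and
    # observed marginal totals, summed over both classes.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ca": accuracy,
        "kappa": (accuracy - expected) / (1 - expected),
    }


# Counts of the single BFTree model on the training dataset (Table 3).
metrics = confusion_metrics(tp=221, tn=214, fp=32, fn=39)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Rounded to the precision used in the paper, these values agree with the BFTree column of Table 3 (for example, CA = 86.0% and kappa = 0.719).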
It is seen that the RF model (CA = 99.8%, kappa = 0.996, AUC = 0.995) also has a near-perfect degree-of-fit with the training dataset (Table 3 and Figure 5). However, the prediction performance of the RF model (CA = 72.5%, kappa = 0.450, AUC = 0.801) is lower than that of the BFTree-Bag model (Table 4 and Figure 6).

Regarding the relative importance of the groundwater influencing factors, the result of the permutation feature importance technique is shown in Table 5. We see that rainfall is the most important factor (merit value = 0.458), followed by land use (0.395), lithology (0.308), aspect (0.261), SPI (0.198), elevation (0.158), TPI (0.150), distance to river (0.102), and slope (0.095). In contrast, LS (0.055), TWI (0.040), and distance to fault (0.016) have the lowest importance.

Table 4. Prediction performance of the models using the validation dataset. CA: Classification Accuracy.

Metrics          BFTree-Bag  BFTree-AB  BFTree-MB  BFTree  RF
True positive    80          78         81         71      76
True negative    83          77         80         81      82
False positive   29          31         28         38      33
False negative   26          32         29         28      27
PPV (%)          73.4        71.6       74.3       65.1    69.7
NPV (%)          76.1        70.6       73.4       74.3    75.2
Sensitivity (%)  75.5        70.9       73.6       71.7    73.8
Specificity (%)  74.1        71.3       74.1       68.1    71.3
CA (%)           74.8        71.1       73.9       69.7    72.5
Kappa            0.495       0.422      0.477      0.395   0.450

Figure 6. ROC curve and AUC of the five models using the validation dataset.

Table 5. The relative importance of the groundwater influencing factors using the Permutation feature importance technique.

No.  Influencing factor  Merit value
1    Rainfall            0.458
2    Land use            0.395
3    Lithology           0.308
4    Aspect              0.261
5    SPI                 0.198
6    Elevation           0.158
7    TPI                 0.150
8    Distance to river   0.102
9    Slope               0.095
10   LS                  0.055
11   TWI                 0.040
12   Distance to fault   0.016

5.4. Generating the groundwater potential maps

Since the BFTree-Bag model provides the best prediction of groundwater potential, it is used to estimate the groundwater potential index for every pixel in the Yasuj-Dena area. The result is then converted to a raster map for use in ArcGIS and is shown in Figure 7a. For comparison, the other three groundwater potential maps, produced by the BFTree-AB model (Figure 7b), the BFTree-MB model (Figure 7c), and the RF model (Figure 7d), were also generated. Visual interpretation of these potential maps indicates that the groundwater potential is in good agreement with the inventory data at hand.

6. Discussion

The development of sustainability strategies for groundwater management is of major concern in many areas, particularly in countries located in arid and semi-arid regions with limited surface water; therefore, systematic efforts have been made worldwide (Gleeson et al. 2010; Vadiati, Adamowski, and Beynaghi 2018). Among them, producing groundwater potential maps with high accuracy is important and remains a critical issue. This research proposes and verifies a new machine learning ensemble approach based on BFTree, Bagging, AdaBoost, and MultiBoost with the aim of enhancing the quality of the identification of groundwater potential. The Yasuj-Dena area (Iran) is selected as a case study.

Overall, all three proposed ensemble models (BFTree-Bag, BFTree-AB, and BFTree-MB) have proven their efficiency in predicting groundwater potential; the best is the first model, followed by the last and the second. The advantage of the BFTree-Bag model is its ability to reduce the variance of the groundwater samples through bootstrap sampling with replacement. Thus, additional data for the training process were generated from the groundwater training dataset.
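The bootstrap-and-aggregate idea behind bagging can be illustrated with a small sketch. It is not the authors' implementation, which used BFTree through a Weka wrapper: here a one-feature threshold rule (`fit_stump`) stands in for BFTree, the toy data are invented, and all function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a one-feature threshold rule; a simple stand-in for a tree learner.

    samples: list of (x, label) pairs with labels 0 or 1.
    Returns (threshold, label_at_or_above) with the fewest training errors.
    """
    best = None
    for threshold in sorted({x for x, _ in samples}):
        for label_above in (0, 1):
            errors = sum(
                (label_above if x >= threshold else 1 - label_above) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def predict_stump(stump, x):
    """Apply a fitted threshold rule to a single value."""
    threshold, label_above = stump
    return label_above if x >= threshold else 1 - label_above


def fit_bagging(samples, n_models, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models


def predict_bagging(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Toy data: label 1 ("spring") when the single factor is 10 or larger.
data = [(x, int(x >= 10)) for x in range(20)]
ensemble = fit_bagging(data, n_models=25)
print(predict_bagging(ensemble, 0), predict_bagging(ensemble, 19))
```

Each bootstrap sample omits some training points, so the individual rules differ slightly near the class boundary; the majority vote averages out this variation, which is the variance-reduction effect discussed here.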
As a result, the 160 individual trees built from the bootstrap subsets provide good diversity for the BFTree-Bag model. Consequently, a high degree-of-fit and prediction performance are obtained. For the BFTree-AB model, the main advantage is its adaptivity, that is, the ability to adjust the weights of misclassified samples in the groundwater dataset. Consequently, the performance of the BFTree-AB model is improved compared to that of the single BFTree; however, the BFTree-AB model does not generate additional data from the groundwater samples, and the resulting model with only 8 BFTrees is less capable of reducing the variance of the groundwater samples than the BFTree-Bag model. The BFTree-MB model is a balance between the BFTree-Bag and the BFTree-AB models. Both bootstrap sampling with replacement and boosting were used to derive the subsets from which 50 BFTrees were generated; therefore, the prediction performance of the BFTree-MB model is better than that of the BFTree-AB model. However, the BFTree-MB model performed worse than the BFTree-Bag model. This is because the groundwater data contain some noise, as the data were collected from various sources with different spatial scales. As indicated in Kotsiantis (2011), MultiBoosting-related models can be considered stronger than Bagging-related models when the data used are noise-free.

Figure 7. Groundwater potential map using: (a) the BFTree-Bag model, (b) the BFTree-AB model, (c) the BFTree-MB model, and (d) the RF model.

The validity of the proposed models is confirmed by comparison with the benchmark RF model, which is a non-parametric and robust method and is suitable in the presence of concentration, noise and excessive (Rahman et al. 2020). For groundwater modeling, RF has proven particularly suitable because of its ability to handle complex nonlinear relationships between the groundwater-affecting factors and groundwater potential. It can also automatically take into account the interactions between the affecting factors (Naghibi et al. 2018; Rahmati et al. 2016). The result in this research shows that the BFTree-Bag model is slightly better than RF in predicting groundwater potential, indicating that the BFTree-Bag model is a powerful tool.

Investigating the factors that influence groundwater potential modeling is important because it helps in developing measures to protect and manage groundwater resources. In this research, rainfall, land use, and lithology are the most critical factors. This is a reasonable result because rainfall affects the recharge and replenishment of water resources (Zhang et al. 2019). Besides, it is considered one of the most effective factors in the dissolution of carbonate rocks and plays a significant role in the dissolution of lime and the formation of karst drips. In the Yasuj-Dena area, owing to the large amount of rainfall in mountainous regions combined with the presence of seams and gaps in the rocks, rainfall causes the penetration of water and the dissolution of limestone, resulting in an increase in groundwater and the formation and reinforcement of springs and groundwater potential in the study area.

Land use is an important factor because the high volume of groundwater is mainly distributed in forest and pasture areas in this study area. Thus, the presence of vegetation increases the infiltration and replenishment of groundwater. Besides, in areas at higher altitudes, the moisture from fog and rain as well as the production of carbon dioxide by the vegetation have played a vital role in the dissolution of limestone.
Herein, various plant species influence the dissolution of lime and the penetration of rocks. Besides, the formation of seams and gaps has facilitated the penetration of water into the rock (Khazaei, Padyab, and Feyznia 2013). This can increase the dissolving power of the water through the production of carbonic acid, causing the lime to dissolve, more water to infiltrate into the ground, and the groundwater to be replenished.

Regarding the importance of lithology, field checks in areas of the high and very high water abstraction classes showed that the major part of these areas consists of thick massive and dolomitic limestone, including the Asmari limestone formation, the Asmari-Jahrom limestone formation, and Quaternary alluvial deposits. The mountainous part and the areas near the Dena peak have limestone and dolomitic lithology, and the plain area consists mostly of Quaternary sediments (Farzin and Menbari 2018). Herein, the Asmari Formation consists of colored and massive limestone with many pores and fractures, and the presence of this porosity and these fractures has led to the formation of superficial karst features such as karren, rillenkarren, and rinnenkarren, which play an important role in the distribution of springs and their discharge (Barzegar et al. 2018). Thus, the presence of a large number of springs in these areas, originating from the Asmari karst formation, confirms the high groundwater potential. Also, in the part of the plain with Quaternary sediments, most of these deposits consist of sediments and debris composed of coarse aggregates and coarse alluvial river deposits, which increase the infiltration of water and recharge the groundwater supply in these areas. Chen et al. (2018), in a study designed to model groundwater potential, also showed that lithology and elevation are important variables in determining groundwater potential.

7. Conclusion

In recent decades, with the increasing population, the need for safe drinking water, including groundwater, has been on the rise, but this resource is limited, which requires better tools for managing this vital resource. This research proposed and verified a new machine learning approach for accurately mapping groundwater potential. Based on the findings of this research, some conclusions are as follows:

- Groundwater potential mapping with excellent prediction accuracy is still difficult, but machine learning ensembles are good tools that should be used to enhance the quality of the groundwater potential map.
- BFTree-Bag, with a prediction capability slightly better than that of RF, is a new tool that could be considered for groundwater potential mapping in other regions.
- Rainfall, land use, and lithology are key factors for groundwater potential mapping.
- The groundwater potential map in this study is a useful tool, which can be used as baseline information for local authorities and planners in developing strategies for sustainable management of groundwater.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

Ågren, A., W. Lidberg, M. Strömgren, J. Ogilvie, and P. Arp. 2014. “Evaluating Digital Terrain Indices for Soil Wetness Mapping–a Swedish Case Study.” Hydrology and Earth System Sciences 18 (9): 3623–3634. Alaa, A. M., T. Bolton, E. Di Angelantonio, J. H. Rudd, and M. Van Der Schaar. 2019. “Cardiovascular Disease Risk Prediction Using Automated Machine Learning: A Prospective Study of 423,604 UK Biobank Participants.” PLoS ONE 14 (5): e0213653. Alfarrah, N., and K. Walraevens. 2018. “Groundwater Overexploitation and Seawater Intrusion in Coastal Areas of Arid and Semi-Arid Regions.” Water 10 (2): 143. Alobaidi, M. H., F. Chebana, and M. A. Meguid. 2018. “Robust Ensemble Learning Framework for day-Ahead Forecasting of Household Based Energy Consumption.” Applied Energy 212: 997–1012. Altmann, A., L.
Toloşi, O. Sander, and T. Lengauer. 2010. “Permutation Importance: A Corrected Feature Importance Measure.” Bioinformatics (oxford, England) 26 (10): 1340–1347. Amaya, A. G., J. Ortiz, A. Durán, and M. Villazon. 2018. “Hydrogeophysical Methods and Hydrogeological Models: Basis for Groundwater Sustainable Management in Valle Alto (Bolivia).” Sustainable Water Resources Management 5: 1–10. Amiri, M., H. R. Pourghasemi, G. A. Ghanbarian, and S. F. Afzali. 2019. “Assessment of the Importance of Gully Erosion Effective Factors using Boruta Algorithm and its Spatial Modeling and Mapping using Three Machine Learning Algorithms.” Geoderma 340: 55–69. doi:10.1016/j.geoderma.2018.12.042. Arabameri, A., K. Rezaei, A. Cerda, L. Lombardo, and J. Rodrigo-Comino. 2019. “Gis-based Groundwater Potential Mapping in Shahroud Plain, Iran. A Comparison among Statistical (Bivariate and Multivariate), Data Mining and Mcdm Approaches.” Science of the Total Environment 658: 160–177. Arulbalaji, P., D. Padmalal, and K. Sreelash. 2019. “Gis and ahp Techniques Based Delineation of Groundwater Potential Zones: A Case Study From Southern Western Ghats, India.” Scientific Reports 9 (1): 2082. Bachmanov, D. M., V. G. Trifonov, K. T. Hessami, A. I. Kozhurin, T. P. Ivanova, E. A. Rogozhin, M. C. Hademi, and F. H. Jamali. 2004. “Active Faults in the Zagros and Central Iran.” Tectonophysics 380 (3): 221–241. http://www. sciencedirect.com/science/article/pii/S0040195103005080. Barzegar, R., A. A. Moghaddam, J. Adamowski, and A. H. Nazemi. 2019. “Delimitation of Groundwater Zones Under Contamination Risk Using a Bagged Ensemble of Optimized Drastic Frameworks.” Environmental Science and Pollution Research 26 (8): 8325–8339. Barzegar, R., A. A. Moghaddam, R. Deo, E. Fijani, and E. Tziritis. 2018. “Mapping Groundwater Contamination Risk of Multiple Aquifers Using Multi-Model Ensemble of Machine Learning Algorithms.” Science of the Total Environment 621: 697–712. Beven, K., M. Kirkby, N. Schofield, and A. 
Tagg. 1984. “Testing a Physically-Based Flood Forecasting Model (Topmodel) for Three uk Catchments.” Journal of Hydrology 69 (1): 119–143. Bloomfield, J. P., B. P. Marchant, and A. A. Mckenzie. 2019. “Changes in Groundwater Drought Associated with Anthropogenic Warming.” Hydrology and Earth System Sciences 23 (3): 1393–1408. Bouwer, H. 2002. “Artificial Recharge of Groundwater: Hydrogeology and Engineering.” Hydrogeology Journal 10 (1): 121–142. Breiman, L. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–140. Breiman, L. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. Breiman, L. 2017. Classification and Regression Trees. Boca Raton, FL: CRC Press LLC. Bui, N. T., A. Kawamura, H. Amaguchi, D. Du Bui, N. T. Truong, and K. Nakagawa. 2018. “Social Sustainability Assessment of Groundwater Resources: A Case Study of Hanoi, Vietnam.” Ecological Indicators 93: 1034–1042. Bui, D. T., P.-T. T. Ngo, T. D. Pham, A. Jaafari, N. Q. Minh, P. V. Hoa, and P. Samui. 2019. “A Novel Hybrid Approach Based on a Swarm Intelligence Optimized Extreme Learning Machine for Flash Flood Susceptibility Mapping.” CATENA 179: 184–196. http://www.sciencedirect.com/science/article/pii/S034181621930147X. Carranza, E. J. M., and A. G. Laborte. 2015. “Data-driven Predictive Mapping of Gold Prospectivity, Baguio District, Philippines: Application of Random Forests Algorithm.” Ore Geology Reviews 71: 777–787. Cavalcante Júnior, R. G., M. A. Vasconcelos Freitas, N. F. da Silva, and F. R. de Azevedo Filho. 2019. “Sustainable Groundwater Exploitation Aiming at the Reduction of Water Vulnerability in the Brazilian Semi-Arid Region.” Energies 12 (5): 904. Chen, W., H. Li, E. Hou, S. Wang, G. Wang, M. Panahi, T. Li, et al. 2018. “Gis-based Groundwater Potential Analysis Using Novel Ensemble Weights-of-Evidence with Logistic Regression and Functional Tree Models.” Science of The Total Environment 634: 853–867.
http://www.sciencedirect.com/science/article/pii/S0048969718312130. Chen, W., H. Li, E. Hou, S. Wang, G. Wang, M. Panahi, T. Li, T. Peng, C. Guo, and C. Niu. 2018. “GIS-based Groundwater Potential Analysis using Novel Ensemble Weights-of-evidence with Logistic Regression and Functional Tree Models.” Science of The Total Environment 634: 853–867. doi:10.1016/j.scitotenv.2018.04.055. Chen, W., M. Panahi, K. Khosravi, H. R. Pourghasemi, F. Rezaie, and D. Parvinnezhad. 2019. “Spatial Prediction of Groundwater Potentiality Using Anfis Ensembled with Teaching-Learning-Based and Biogeography-Based Optimization.” Journal of Hydrology 572: 435–448. Chen, W., B. Pradhan, S. Li, H. Shahabi, H. M. Rizeei, E. Hou, and S. Wang. 2019. “Novel Hybrid Integration Approach of Bagging-Based Fisher’s Linear Discriminant Function for Groundwater Potential Analysis.” Natural Resources Research 28 (4): 1239–1258. Chen, W., P. Tsangaratos, I. Ilia, Z. Duan, and X. Chen. 2019. “Groundwater Spring Potential Mapping Using Population-Based Evolutionary Algorithms and Data Mining Methods.” Science of The Total Environment 684: 31–49. http://www.sciencedirect.com/science/article/pii/S0048969719323599. Chen, W., S. Zhang, R. Li, and H. Shahabi. 2018. “Performance Evaluation of the gis-Based Data Mining Techniques of Best-First Decision Tree, Random Forest, and Naïve Bayes Tree for Landslide Susceptibility Modeling.” Science of the Total Environment 644: 1006–1018. Childs, C. 2009. “The Top Nine Reasons to Use a File Geodatabase.” A Scalable and Speedy Choice for Single Users or Small Groups. ArcUser, Spring 2009: 12–15. Chilton, P. J., and S. Foster. 1995. “Hydrogeological Characterisation and Water-Supply Potential of Basement Aquifers in Tropical Africa.” Hydrogeology Journal 3 (1): 36–49. Connor, R. 2015. The United Nations World Water Development Report 2015: Water for a Sustainable World. Paris: UNESCO publishing. Corsini, A., F. Cervi, and F. Ronchetti. 2009. 
“Weight of Evidence and Artificial Neural Networks for Potential Groundwater Spring Mapping: an Application to the Mt. Modino Area (Northern Apennines, Italy).” Geomorphology 111 (1-2): 79–87. doi:10.1016/j.geomorph.2008.03.015. de Graaf, I. E. M., T. Gleeson, L. P. H. (Rens) van Beek, E. H. Sutanudjaja, and M. F. P. Bierkens. 2019. “Environmental Flow Limits to Global Groundwater Pumping.” Nature 574 (7776): 90–94. doi:10.1038/s41586-019-1594-4. Dou, J., A. P. Yunus, D. Tien Bui, A. Merghadi, M. Sahana, Z. Zhu, C.-W. Chen, K. Khosravi, Y. Yang, and B. T. Pham. 2019. “Assessment of Advanced Random Forest and Decision Tree Algorithms for Modeling Rainfall-Induced Landslide Susceptibility in the izu-Oshima Volcanic Island, Japan.” Science of the Total Environment 662: 332–346. Erdal, H. I., and O. Karakurt. 2013. “Advancing Monthly Streamflow Prediction Accuracy of Cart Models Using Ensemble Learning Paradigms.” Journal of Hydrology 477: 119–128. Falah, F., S. Ghorbani Nejad, O. Rahmati, M. Daneshfar, and H. Zeinivand. 2016. “Applicability of Generalized Additive Model in Groundwater Potential Modelling and Comparison its Performance by Bivariate Statistical Methods.” Geocarto International 32 (10): 1069–1089. doi:10.1080/10106049.2016.1188166. Fang, H.-T., B.-C. Jhong, Y.-C. Tan, K.-Y. Ke, and M.-H. Chuang. 2019. “A two-Stage Approach Integrating som-and Moga-svm-Based Algorithms to Forecast Spatial-Temporal Groundwater Level with Meteorological Factors.” Water Resources Management 33 (2): 797–818. Farzin, M., and S. Menbari. 2018. “Zoning of Karstic Aquifer Protection on Tange-Konara Yasuj Using Cop Method.” Journal of Range and Watershed Managment 71 (2): 439–455. https://jrwm.ut.ac.ir/article_67999_ a3b3b3180c1cb5c03724fa623ff3904a.pdf. Fenta, A. A., A. Kifle, T. Gebreyohannes, and G. Hailu. 2015. 
“Spatial Analysis of Groundwater Potential Using Remote Sensing and Gis-Based Multi-Criteria Evaluation in Raya Valley, Northern Ethiopia.” Hydrogeology Journal 23 (1): 195–206. Freund, Y., and R. E. Schapire. 1997. “A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting.” Journal of Computer and System Sciences 55 (1): 119–139. http://www.sciencedirect.com/science/article/pii/S002200009791504X. Friedman, J., T. Hastie, and R. Tibshirani. 2000. “Additive Logistic Regression: A Statistical View of Boosting (with Discussion and a Rejoinder by the Authors).” The Annals of Statistics 28 (2): 337–407. Gheith, H., and M. Sultan. 2002. “Construction of a Hydrologic Model for Estimating Wadi Runoff and Groundwater Recharge in the Eastern Desert, Egypt.” Journal of Hydrology 263 (1-4): 36–55. Ghorbani Nejad, S., F. Falah, M. Daneshfar, A. Haghizadeh, and O. Rahmati. 2016. “Delineation of Groundwater Potential Zones using Remote Sensing and GIS-based Data-Driven Models.” Geocarto International 1–21. doi:10.1080/10106049.2015.1132481. Gleeson, T., J. Vandersteen, M. A. Sophocleous, M. Taniguchi, W. M. Alley, D. M. Allen, and Y. Zhou. 2010. “Groundwater Sustainability Strategies.” Nature Geoscience 3 (6): 378–379. Golkarian, A., S. A. Naghibi, B. Kalantar, and B. Pradhan. 2018. “Groundwater Potential Mapping Using c5.0, Random Forest, and Multivariate Adaptive Regression Spline Models in gis.” Environmental Monitoring and Assessment 190 (3): 149. Gu, H., F. Ma, J. Guo, K. Li, and R. Lu. 2018. “Assessment of Water Sources and Mixing of Groundwater in a Coastal Mine: The Sanshandao Gold Mine, China.” Mine Water and the Environment 37 (2): 351–365. Gulden, L. E., E. Rosero, Z. L. Yang, M. Rodell, C. S. Jackson, G. Y. Niu, P. J. F. Yeh, and J. Famiglietti. 2007. “Improving Land-Surface Model Hydrology: Is an Explicit Aquifer Model Better Than a Deeper Soil Profile?” Geophysical Research Letters 34 (9): 1–5. Haijian, S. 2007.
Best-first Decision Tree Learning. Hamilton: Thesis, Master of Science. The University of Waikato. Hasan, M., Y. Shang, G. Akhter, and W. Jin. 2018. “Geophysical Assessment of Groundwater Potential: A Case Study From Mian Channu Area, Pakistan.” Groundwater 56 (5): 783–796. Hoa, P. V., N. V. Giang, N. A. Binh, L. V. H. Hai, T.-D. Pham, M. Hasanlou, and D. Tien Bui. 2019. “Soil Salinity Mapping Using sar Sentinel-1 Data and Advanced Machine Learning Algorithms: A Case Study at ben tre Province of the Mekong River Delta (Vietnam).” Remote Sensing 11 (2): 128. http://www.mdpi.com/2072-4292/ 11/2/128. Jaxa. 2019. Alos global digital surface model. Http://www.Eorc.Jaxa.Jp/alos/en/aw3d30/ [online]. Japan Aerospace Exploration Agency. Jegadeeshwaran, R., and V. Sugumaran. 2013. “Comparative Study of Decision Tree Classifier and Best First Tree Classifier for Fault Diagnosis of Automobile Hydraulic Brake System Using Statistical Features.” Measurement 46 (9): 3247–3260. Kalantar, B., H. A. H. Al-Najjar, B. Pradhan, V. Saeidi, A. A. Halin, N. Ueda, and S. A. Naghibi. 2019. “Optimized Conditioning Factors Using Machine Learning Techniques for Groundwater Potential Mapping.” Water 11 (9): 1909. doi:10.3390/w11091909. Kambhammettu, B., P. Allena, and J. P. King. 2011. “Application and Evaluation of Universal Kriging for Optimal Contouring of Groundwater Levels.” Journal of Earth System Science 120 (3): 413–422. Kammoun, S., R. Trabelsi, V. Re, K. Zouari, and J. Henchiri. 2018. “Groundwater Quality Assessment in Semi-Arid Regions Using Integrated Approaches: The Case of Grombalia Aquifer (ne Tunisia).” Environmental Monitoring and Assessment 190 (2): 87. Kavzoglu, T., E. K. Sahin, and I. Colkesen. 2014. “Landslide Susceptibility Mapping Using gis-Based Multi-Criteria Decision Analysis, Support Vector Machines, and Logistic Regression.” Landslides 11 (3): 425–439. Khatami, R., G. Mountrakis, and S. V. Stehman. 2016. 
“A Meta-Analysis of Remote Sensing Research on Supervised Pixel-Based Land-Cover Image Classification Processes: General Guidelines for Practitioners and Future Research.” Remote Sensing of Environment 177: 89–100. Khazaei, M., M. Padyab, and S. Feyznia. 2013. “Investigating the Effect of Diapiras on Water and Soil Salinization: Case Study: Deipir, Shar Kakan River Basin, Yasouj.” Geography and Development 32: 15–28. Khosravi, K., M. Panahi, and D. Tien Bui. 2018. “Spatial Prediction of Groundwater Spring Potential Mapping Based on an Adaptive Neuro-Fuzzy Inference System and Metaheuristic Optimization.” Hydrology & Earth System Sciences 22 (9): 4771–4792. Khosravi, K., M. Sartaj, F. T.-C. Tsai, V. P. Singh, N. Kazakis, A. M. Melesse, I. Prakash, D. Tien Bui, and B. T. Pham. 2018. “A Comparison Study of DRASTIC Methods with Various Objective Methods for Groundwater Vulnerability Assessment.” Science of the Total Environment 642: 1032–1049. Kim, Y. J., and S.-Y. Hamm. 1999. “Assessment of the Potential for Groundwater Contamination Using the DRASTIC/EGIS Technique, Cheongju Area, South Korea.” Hydrogeology Journal 7 (2): 227–235. Klute, A., E. Scott, and F. Whisler. 1965. “Steady State Water Flow in a Saturated Inclined Soil Slab.” Water Resources Research 1 (2): 287–294. Kooy, M., C. T. Walter, and I. Prabaharyaka. 2018. “Inclusive Development of Urban Water Services in Jakarta: The Role of Groundwater.” Habitat International 73: 109–118. Kordestani, M. D., S. A. Naghibi, H. Hashemi, K. Ahmadi, B. Kalantar, and B. Pradhan. 2019. “Groundwater Potential Mapping Using a Novel Data-Mining Ensemble Model.” Hydrogeology Journal 27 (1): 211–224. Kotsiantis, S. 2011. “Combining Bagging, Boosting, Rotation Forest and Random Subspace Methods.” Artificial Intelligence Review 35 (3): 223–240. Kuhn, S., M. J. Cracknell, and A. M. Reading. 2019. 
“Lithological Mapping in the Central African Copper Belt Using Random Forests and Clustering: Strategies for Optimised Results.” Ore Geology Reviews 112: 103015. Kumar, V. 2006. “Kriging of Groundwater Levels–a Case Study.” Journal of Spatial Hydrology 6 (1): 81–94. Lee, S., Y.-S. Kim, and H.-J. Oh. 2012. “Application of a Weights-of-Evidence Method and GIS to Regional Groundwater Productivity Potential Mapping.” Journal of Environmental Management 96 (1): 91–105. http://www.sciencedirect.com/science/article/pii/S0301479711003471. Liu, C., Z. Zhang, and J. W. Balay. 2018. “Posterior Assessment of Reference Gages for Water Resources Management Using Instantaneous Flow Measurements.” Science of the Total Environment 634: 12–19. Lo, M. H., J. S. Famiglietti, J. T. Reager, M. Rodell, S. Swenson, and W. Y. Wu. 2016. “GRACE-based Estimates of Global Groundwater Depletion.” In Terrestrial Water Cycle and Climate Change: Natural and Human-Induced Impacts, edited by Q. Tang and T. Oki, 137–146. Florida: American Geophysical Union and John Wiley and Sons, Inc. Lv, C., M. Ling, Z. Wu, X. Guo, and Q. Cao. 2019. “Quantitative Assessment of Ecological Compensation for Groundwater Overexploitation Based on Emergy Theory.” Environmental Geochemistry and Health 1–12. doi:10.1007/s10653-019-00248-z. Lv, X., W. Xiao, Y. Zhao, W. Zhang, S. Li, and H. Sun. 2019. “Drivers of Spatio-Temporal Ecological Vulnerability in an Arid, Coal Mining Region in Western China.” Ecological Indicators 106: 105475. Maity, D. K., and S. Mandal. 2019. “Identification of Groundwater Potential Zones of the Kumari River Basin, India: An RS & GIS Based Semi-Quantitative Approach.” Environment, Development and Sustainability 21 (2): 1013–1034. Mansfield, E. R., and B. P. Helms. 1982. “Detecting Multicollinearity.” The American Statistician 36 (3a): 158–160. Martínez-Álvarez, F., J. Reyes, A. Morales-Esteban, and C. Rubio-Escudero. 2013. 
“Determining the Best Set of Seismicity Indicators to Predict Earthquakes. Two Case Studies: Chile and the Iberian Peninsula.” Knowledge-Based Systems 50 (0): 198–210. http://www.sciencedirect.com/science/article/pii/S0950705113001871. Miraki, S., S. H. Zanganeh, K. Chapi, V. P. Singh, A. Shirzadi, H. Shahabi, and B. T. Pham. 2019. “Mapping Groundwater Potential Using a Novel Hybrid Intelligence Approach.” Water Resources Management 33 (1): 281–302. Moghaddam, D. D., O. Rahmati, M. Panahi, J. Tiefenbacher, H. Darabi, A. Haghizadeh, A. T. Haghighi, O. A. Nalivan, and D. Tien Bui. 2020. “The Effect of Sample Size on Different Machine Learning Models for Groundwater Potential Mapping in Mountain Bedrock Aquifers.” CATENA 187: 104421. doi:10.1016/j.catena.2019.104421. Mukherjee, P., C. K. Singh, and S. Mukherjee. 2012. “Delineation of Groundwater Potential Zones in Arid Region of India—a Remote Sensing and GIS Approach.” Water Resources Management 26 (9): 2643–2672. Naghibi, S. A., K. Ahmadi, and A. Daneshi. 2017. “Application of Support Vector Machine, Random Forest, and Genetic Algorithm Optimized Random Forest Models in Groundwater Potential Mapping.” Water Resources Management 31 (9): 2761–2775. Naghibi, S. A., M. Dolatkordestani, A. Rezaei, P. Amouzegari, M. T. Heravi, B. Kalantar, and B. Pradhan. 2019. “Application of Rotation Forest with Decision Trees as Base Classifier and a Novel Ensemble Model in Spatial Modeling of Groundwater Potential.” Environmental Monitoring and Assessment 191 (4): 248. Naghibi, S. A., and H. R. Pourghasemi. 2015. “A Comparative Assessment Between Three Machine Learning Models and Their Performance Comparison by Bivariate and Multivariate Statistical Methods in Groundwater Potential Mapping.” Water Resources Management 29 (14): 5217–5236. doi:10.1007/s11269-015-1114-8. Naghibi, S. A., H. R. Pourghasemi, and K. Abbaspour. 2018. 
“A Comparison Between Ten Advanced and Soft Computing Models for Groundwater Qanat Potential Assessment in Iran using R and GIS.” Theoretical and Applied Climatology 131 (3–4): 967–984. doi:10.1007/s00704-016-2022-4. Naghibi, S. A., H. R. Pourghasemi, and B. Dixon. 2016. “GIS-based Groundwater Potential Mapping using Boosted Regression Tree, Classification and Regression Tree, and Random Forest Machine Learning Models in Iran.” Environmental Monitoring and Assessment 188 (1). doi:10.1007/s10661-015-5049-6. Nsiah, E., E. K. Appiah-Adjei, and K. A. Adjei. 2018. “Hydrogeological Delineation of Groundwater Potential Zones in the Nabogo Basin, Ghana.” Journal of African Earth Sciences 143: 1–9. Oh, H.-J., Y.-S. Kim, J.-K. Choi, E. Park, and S. Lee. 2011. “GIS Mapping of Regional Probabilistic Groundwater Potential in the Area of Pohang City, Korea.” Journal of Hydrology 399 (3–4): 158–172. Okereke, C., E. Esu, and A. Edet. 1998. “Determination of Potential Groundwater Sites Using Geological and Geophysical Techniques in the Cross River State, Southeastern Nigeria.” Journal of African Earth Sciences 27 (1): 149–163. Ozdemir, A. 2011. “Using a Binary Logistic Regression Method and GIS for Evaluating and Mapping the Groundwater Spring Potential in the Sultan Mountains (Aksehir, Turkey).” Journal of Hydrology 405 (1–2): 123–136. Panagopoulos, G., A. Antonakos, and N. Lambrakis. 2006. “Optimization of the DRASTIC Method for Groundwater Vulnerability Assessment via the Use of Simple Statistical Methods and GIS.” Hydrogeology Journal 14 (6): 894–911. Pham, B. T., A. Jaafari, I. Prakash, S. K. Singh, N. K. Quoc, and D. Tien Bui. 2019. “Hybrid Computational Intelligence Models for Groundwater Potential Mapping.” CATENA 182: 104101. http://www.sciencedirect.com/science/article/pii/S0341816219302437. Pourghasemi, H., H. Moradi, S. F. Aghda, C. Gokceoglu, and B. Pradhan. 2014. 
“GIS-based Landslide Susceptibility Mapping with Probabilistic Likelihood Ratio and Spatial Multi-Criteria Evaluation Models (North of Tehran, Iran).” Arabian Journal of Geosciences 7 (5): 1857–1878. Rahman, M. M., J. Karunasinghe, S. Clifford, L. D. Knibbs, and L. Morawska. 2020. “New Insights into the Spatial Distribution of Particle Number Concentrations by Applying Non-Parametric Land Use Regression Modelling.” Science of The Total Environment 702: 134708. doi:10.1016/j.scitotenv.2019.134708. Rahmati, O., D. D. Moghaddam, V. Moosavi, Z. Kalantari, M. Samadi, S. Lee, and D. Tien Bui. 2019. “An Automated Python Language-Based Tool for Creating Absence Samples in Groundwater Potential Mapping.” Remote Sensing 11 (11): 1375. https://www.mdpi.com/2072-4292/11/11/1375. Rahmati, O., S. A. Naghibi, H. Shahabi, D. T. Bui, B. Pradhan, A. Azareh, E. Rafiei-Sardooi, A. N. Samani, and A. M. Melesse. 2018. “Groundwater Spring Potential Modelling: Comprising the Capability and Robustness of Three Different Modeling Approaches.” Journal of Hydrology 565: 248–261. http://www.sciencedirect.com/science/article/pii/S002216941830622X. Rahmati, O., A. Nazari Samani, M. Mahdavi, H. R. Pourghasemi, and H. Zeinivand. 2015. “Groundwater Potential Mapping at Kurdistan Region of Iran Using Analytic Hierarchy Process and GIS.” Arabian Journal of Geosciences 8 (9): 7059–7071. doi:10.1007/s12517-014-1668-4. Rahmati, O., H. R. Pourghasemi, and A. M. Melesse. 2016. “Application of GIS-based Data Driven Random Forest and Maximum Entropy Models for Groundwater Potential Mapping: A Case Study at Mehran Region, Iran.” CATENA 137: 360–372. doi:10.1016/j.catena.2015.10.010. Razzaq, A., P. Qing, M. Abid, M. Anwar, and I. Javed. 2019. “Can the Informal Groundwater Markets Improve Water Use Efficiency and Equity? Evidence From a Semi-Arid Region of Pakistan.” Science of The Total Environment 666: 849–857. Reutemann, P. 2019. Python weka wrapper 3 0.1.7. 
https://pypi.org/project/python-weka-wrapper3/. Rezaei, F., M. R. Ahmadzadeh, and H. R. Safavi. 2017. “SOM-DRASTIC: Using Self-Organizing Map for Evaluating Groundwater Potential to Pollution.” Stochastic Environmental Research and Risk Assessment 31 (8): 1941–1956. Rizeei, H. M., B. Pradhan, M. A. Saharkhiz, and S. Lee. 2019. “Groundwater Aquifer Potential Modeling Using an Ensemble Multi-Adoptive Boosting Logistic Regression Technique.” Journal of Hydrology 579: 124172. http://www.sciencedirect.com/science/article/pii/S0022169419309072. Rosenqvist, A., M. Shimada, N. Ito, and M. Watanabe. 2007. “ALOS PALSAR: A Pathfinder Mission for Global-Scale Monitoring of the Environment.” IEEE Transactions on Geoscience and Remote Sensing 45 (11): 3307–3316. Sameen, M. I., B. Pradhan, and S. Lee. 2019. “Self-learning Random Forests Model for Mapping Groundwater Yield in Data-Scarce Areas.” Natural Resources Research 28 (3): 757–775. Scanlon, B. R., R. C. Reedy, D. A. Stonestrom, D. E. Prudic, and K. F. Dennehy. 2005. “Impact of Land Use and Land Cover Change on Groundwater Recharge and Quality in the Southwestern US.” Global Change Biology 11 (10): 1577–1593. Sepehr, M., and J. Cosgrove. 2005. “Role of the Kazerun Fault Zone in the Formation and Deformation of the Zagros Fold-Thrust Belt, Iran.” Tectonics 24 (5): 1–13. doi:10.1029/2004TC001725. Sharma, V., and K. Juglan. 2018. “Automated Classification of Fatty and Normal Liver Ultrasound Images Based on Mutual Information Feature Selection.” IRBM 39 (5): 313–323. Solomon, S., and F. Quiel. 2006. “Groundwater Study Using Remote Sensing and Geographic Information Systems (GIS) in the Central Highlands of Eritrea.” Hydrogeology Journal 14 (6): 1029–1041. Suryanarayana, C., and V. Mahammood. 2019. “Groundwater-level Assessment and Prediction Using Realistic Pumping and Recharge Rates for Semi-Arid Coastal Regions: A Case Study of Visakhapatnam City, India.” Hydrogeology Journal 27 (1): 249–272. Tahmassebipoor, N., O. 
Rahmati, F. Noormohamadi, and S. Lee. 2016. “Spatial Analysis of Groundwater Potential using Weights-of-evidence and Evidential Belief Function Models and Remote Sensing.” Arabian Journal of Geosciences 9 (1). doi:10.1007/s12517-015-2166-z. Tien Bui, D., T.-C. Ho, B. Pradhan, B.-T. Pham, V.-H. Nhu, and I. Revhaug. 2016. “GIS-based Modeling of Rainfall-Induced Landslides Using Data Mining-Based Functional Trees Classifier with AdaBoost, Bagging, and MultiBoost Ensemble Frameworks.” Environmental Earth Sciences 75 (14): 1101. doi:10.1007/s12665-016-5919-4. Tien Bui, D., T. A. Tuan, H. Klempe, B. Pradhan, and I. Revhaug. 2016. “Spatial Prediction Models for Shallow Landslide Hazards: A Comparative Assessment of the Efficacy of Support Vector Machines, Artificial Neural Networks, Kernel Logistic Regression, and Logistic Model Tree.” Landslides 13: 361–378. Tobler, W. 2004. “On the First Law of Geography: A Reply.” Annals of the Association of American Geographers 94 (2): 304–310. USGS. 2016. The United States Geological Survey Earth Resources Observation and Science Center, http://earthexplorer.usgs.gov [online]. United States Geological Survey. Vadiati, M., J. Adamowski, and A. Beynaghi. 2018. “A Brief Overview of Trends in Groundwater Research: Progress Towards Sustainability?” Journal of Environmental Management 223: 849–851. http://www.sciencedirect.com/science/article/pii/S0301479718307382. Varouchakis, E., D. Hristopulos, and G. Karatzas. 2012. “Improving Kriging of Groundwater Level Data Using Nonlinear Normalizing Transformations—a Field Application.” Hydrological Sciences Journal 57 (7): 1404–1419. Webb, G. I. 2000. “MultiBoosting: A Technique for Combining Boosting and Wagging.” Machine Learning 40 (2): 159–196. Worthington, P. F. 1977. “Geophysical Investigations of Groundwater Resources in the Kalahari Basin.” Geophysics 42 (4): 838–849. Yu, H.-L., and Y.-C. Lin. 2015. 
“Analysis of Space–Time Non-Stationary Patterns of Rainfall–Groundwater Interactions by Integrating Empirical Orthogonal Function and Cross Wavelet Transform Methods.” Journal of Hydrology 525: 585–597. Zektser, I., and H. A. Loaiciga. 1993. “Groundwater Fluxes in the Global Hydrologic Cycle: Past, Present and Future.” Journal of Hydrology 144 (1–4): 405–427. Zhang, Q., and L. Li. 2009. “Development and Application of an Integrated Surface Runoff and Groundwater Flow Model for a Catchment of Lake Taihu Watershed, China.” Quaternary International 208 (1–2): 102–108. Zhang, B., H.-X. Wang, Y.-W. Ye, J.-L. Tao, L.-Z. Zhang, and L. Shi. 2019. “Potential Hazards to a Tunnel Caused by Adjacent Reservoir Impoundment.” Bulletin of Engineering Geology and the Environment 78 (1): 397–415. Zhu, A. X., G. Lu, J. Liu, C. Z. Qin, and C. Zhou. 2018. “Spatial Prediction Based on Third Law of Geography.” Annals of GIS 24 (4): 225–240. Ziolkowska, J., and R. Reyes. 2017. “Groundwater Level Changes Due to Extreme Weather—an Evaluation Tool for Sustainable Water Management.” Water 9 (2): 117.

Journal

International Journal of Digital Earth, Taylor & Francis

Published: Dec 1, 2020

Keywords: Environmental modeling; groundwater potential; GIS; ensemble model; decision tree
