A Stratigraphic Prediction Method Based on Machine Learning

Cuiying Zhou; Jinwu Ouyang; Weihua Ming; Guohao Zhang; Zichun Du; Zhen Liu

doi:10.3390/app9173553

Applied Sciences, Article. Published 2019-08-29.

Cuiying Zhou 1,2, Jinwu Ouyang 1,2, Weihua Ming 1,2, Guohao Zhang 1,2, Zichun Du 1,2 and Zhen Liu 1,2,*

1 Civil Engineering, Sun Yat-sen University, Guangzhou 510275, China
2 Guangdong Engineering Research Centre for Major Infrastructure Safety, Guangzhou 510275, China
* Correspondence: [email protected]

Received: 20 June 2019; Accepted: 27 August 2019; Published: 29 August 2019

Abstract: Simulation of a geostratigraphic unit is of vital importance for the study of geoinformatics, as well as for geoengineering planning and design. The traditional method depends on the guidance of expert experience, which is subjective and limited, making it very difficult to evaluate a stratum simulation effectively. To solve this problem, this study proposes a machine learning method for geostratigraphic series simulation. On the basis of a recurrent neural network, a sequence model of the stratum type and a sequence model of the stratum thickness are established in succession. The performance of the model is improved in combination with expert-driven learning. Finally, a machine learning model is established for geostratigraphic series simulation, and a three-dimensional (3D) geological modeling evaluation method is proposed which considers the stratum type and thickness. The results show that machine learning can be used in the simulation of a series. The series model based on machine learning can describe the real situation at wells, and it is a complementary tool to the traditional 3D geological model. The prediction ability of the model is improved to a certain extent by including expert-driven learning. This study provides a novel approach for the simulation and prediction of a series by 3D geological modeling.
Keywords: recurrent neural network; series simulation; three-dimensional geological modeling; expert-driven learning

Appl. Sci. 2019, 9, 3553; doi:10.3390/app9173553; www.mdpi.com/journal/applsci

1. Introduction

A geostratigraphic structure is the result of multiple factors acting over the course of Earth's history, forming a complex morphology and irregular distribution. Geological bodies have spatially successive relationships, thus forming a series of strata with different lateral extensions and thicknesses. A geostratigraphic series is spatially uncertain due to variations in the sequence, number, and thickness of the stratum layers. Within the rock-soil mass extending from the top of the bedrock (referring to lithified rock that underlies unconsolidated surface sediments, conglomerates or regolith) to the surface, there may be only one layer or dozens of layers, and each layer can be different. At the same time, the thickness of the strata also varies considerably, from tens of centimeters to hundreds of meters. Different geotechnical bodies have different physical, chemical, and mechanical properties, and weak stratum conditions directly threaten the safety of engineering construction and operation. A geostratigraphic series model with high reliability helps in understanding the geological conditions of a construction area, providing far-reaching practical guidance for site planning and selection, engineering construction, environmental assessment, cost savings, and operational risk reduction. Therefore, building a series model and accurately describing the spatial distribution of strata have become important topics in the fields of geology and engineering geology.

To understand the geological structure, many techniques and methods have been developed to describe, simulate, and model strata [1–6].
With the introduction of the Glass Earth [7] concept and geological data, interdisciplinary theoretical integration and application research is being carried out. The most representative traditional method of simulating the stratum structure is three-dimensional (3D) geological modeling, such as that with the B-rep model [8], octree model [9], tri-prism model [10], and geochron concepts [11–14]. However, the traditional method relies on the guidance of expert knowledge and experience in the selection of assumptions, parameters, and data interpolation methods, which are subjective and limited [15]. Assumptions about the borehole data distribution must be made, and it is difficult to effectively evaluate the stratum simulation results.

Machine learning [16–18] has been widely used in various fields of geology. A machine learning method does not make many assumptions about the data but selects a model according to the data characteristics. It then divides the data into a training set and a test set and repeatedly adjusts the parameters to obtain better accuracy. Machine learning is more concerned with the predictive power of models [19]. In geology and engineering, there are numerous research and application examples [20–25]. Rodriguez-Galiano et al. conducted a study on mineral exploration based on a decision tree [26]. Porwal et al. used a radial function and a neural network to evaluate potential maps in mineral exploration [27]. Zhang studied the relationships between chemical elements and magmatite and between the sedimentary rock lithology and sedimentary rock minerals by using a multilayer perceptron and a back propagation (BP) neural network [28]. Zhang et al. predicted karst collapse based on a Gaussian process [29]. Chaki et al. carried out an inversion of reservoir parameters by combining well logging and seismic data [30].
Gaurav combined machine learning, pattern recognition, and multivariate geostatistics to estimate the final recoverable shale gas volume [31]. Sha et al. used a convolutional neural network to characterize unfavorable geological bodies and surface issues [32]. In general, however, machine learning research on stratum distributions based on drilling data is in its infancy.

To solve the above problems, this study explores the feasibility and reliability of machine learning through the simulation of a geostratigraphic series and proposes a machine learning geostratigraphic series simulation method. This method does not rely on subjective factors; it establishes a stratum simulation model based on the principle of a recurrent neural network [33,34]. This method can determine the stratum information accurately. The predictive power of the machine learning models is examined with an expert-driven mechanism based on supervised learning [35]. Compared with the traditional 3D geological modeling method, the proposed method is shown to better describe the real situation. This study provides a novel approach for the simulation and prediction of a geostratigraphic series. This work has far-reaching practical significance for the accurate description of the spatial distribution of lithologic features and for the guidance of site selection, engineering construction, and environmental assessment.

2. Geostratigraphic Series Simulation Method Based on Machine Learning

2.1. Geostratigraphic Series

A sequence refers to a series of data of a system at a specific sampling interval. In reality, sequences are a very common form of data. For example, strata have a certain thickness, and a certain stratum may be distributed throughout the whole field or only locally (namely, the stratum division). Stratum information can therefore be interpreted as a sequence, and the strata can be regarded as a vertically oriented spatial sequence, as shown in Figure 1.
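As a minimal sketch of this view, a borehole can be stored as an ordered, top-to-bottom list of layers from which a stratum type sequence and a thickness sequence are read off. The class and field names (`Layer`, `Borehole`, `type_sequence`, `thickness_sequence`) and the sample values are hypothetical and chosen only for illustration; they are not taken from the study's data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    stratum_type: str  # e.g. "silt" or "clay"
    thickness: float   # layer thickness in meters


@dataclass
class Borehole:
    x: float             # horizontal coordinate of the borehole
    y: float             # horizontal coordinate of the borehole
    elevation: float     # elevation of the borehole opening
    layers: List[Layer]  # ordered from the surface downward

    def type_sequence(self) -> List[str]:
        """Stratum types in vertical order."""
        return [layer.stratum_type for layer in self.layers]

    def thickness_sequence(self) -> List[float]:
        """Layer thicknesses in the same vertical order."""
        return [layer.thickness for layer in self.layers]


# Hypothetical borehole with three layers.
borehole = Borehole(
    x=100.0,
    y=200.0,
    elevation=5.0,
    layers=[
        Layer("miscellaneous fill", 1.5),
        Layer("silt", 3.0),
        Layer("clay", 7.5),
    ],
)
types = borehole.type_sequence()
thicknesses = borehole.thickness_sequence()
```

Keeping the two sequences aligned by position means that the i-th type and the i-th thickness always describe the same layer.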
The simulation of a geostratigraphic series uses the learning results of borehole data to predict the geostratigraphic series at any point in the study area, including the stratum type and thickness of each layer in the sequence.

Figure 1. Geostratigraphic series diagram.

2.2. Stratum Data Reconstruction Schemes Based on Machine Learning

Drilling data reconstruction schemes based on machine learning include data normalization, data segmentation, data filling, and data coding.

2.2.1. Stratum Data Normalization

Data normalization refers to the process of compressing data into a small interval, and the interval is usually taken as [0, 1] or [−1, 1]. Data normalization is essentially a linear transformation. It does not change the variation trend or the ordering of the data. There are many common means of data normalization, such as linear normalization and inverse cotangent normalization. In this study, the most common method, linear normalization, is adopted. For any data point, the program determines the maximum and minimum values ($X_{\max}$ and $X_{\min}$, respectively) of the spatial coordinates and the stratum thickness after traversing all the borehole data. The linear normalization is applied by using Equation (1):

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \qquad (1)$$

where $X^{*}$ is the result of normalization.

2.2.2. Drilling Data Segmentation and Equalization

A machine learning model should achieve good prediction results on both the training set and the test set. Therefore, before machine learning, the original drilling data must be divided into training data and test data. This process is called data segmentation. To ensure the effectiveness of machine learning, randomness and uniformity of the data distribution should be ensured during sampling of the training data and test data.

To ensure the effectiveness of the training data, we adopt a random replication strategy for small samples. We randomly select data from boreholes with different numbers of geological layers to improve the replication effect. This method is used to comprehensively study data with different characteristics, improve the prediction ability of a model for different numbers of geological layers, increase the number of different layers represented by nearby drilling data, and artificially raise the training sample data to a balanced level. This approach of artificially replicating small data types is known as oversampling [36].

2.2.3. Geostratigraphic Series Filling

When a recurrent neural network (RNN) is used to process sequential problems, input data are received at every moment, and output is produced after the hidden layer has finished processing the data. Therefore, the input and output of an RNN are equal in length, and it is difficult to process input data of different lengths at the same time. In drilling data, the number of layers in each borehole varies, and the geostratigraphic series are of nonuniform length.
Therefore, batch training of an RNN on stratum data requires filling (padding) at the tail of each geostratigraphic series, without changing its original order, so that all geostratigraphic series are extended to the same length [37]. Before training, a start of sequence (SOS) marker and an end of sequence (EOS) marker must also be added to each geostratigraphic series. For each training sample, the sampling process stops when the termination marker appears in the equal-length geostratigraphic series output of the RNN. As two virtual stratum types, the initiation and termination markers participate in the RNN training process via the input and output. The initiation marker represents the beginning of the geostratigraphic series prediction, while the termination marker represents its end. The introduction of termination markers teaches the RNN model to predict when a sequence will end and overcomes the difficulty the RNN has in processing sequences of unequal length. In addition, the RNN model can conduct geostratigraphic series simulations with different numbers of layers at any location in the research area.

2.2.4. Stratum Coding Based on One-Hot Encoding

In machine learning tasks, data features are not always continuous values such as coordinates. One-hot encoding is a data processing method used to address discrete features. In geology, stratum types are finite and countable, regardless of the criteria used to divide the strata. Therefore, the set of geostratigraphic series elements is determined after traversing all the borehole data, in addition to obtaining the maximum value of each feature and the number of layers. To facilitate the search and mathematical representation, in this study, each stratum is represented by a unique numerical identifier [38].

2.3. Geostratigraphic Series Simulation Based on a Recurrent Neural Network

2.3.1. Establishment of the Sequence Model of the Stratum Type

The geostratigraphic series prediction model uses the RNN as the core of the neural network. The structure is shown in Figure 2. In this machine learning task, the input data are the coordinates of a location, while the output is the stratum type series simulated by the model for the given coordinates. Since at the first moment the RNN has no hidden state from a previous moment, it is necessary to assign the initial state of the hidden layer neurons in the RNN before each training run. The input coordinates are the common attributes of all the strata in a geostratigraphic series, and they guide the whole process of RNN simulation of the geostratigraphic series. Therefore, the assignment process establishes the correlation between the input coordinates and the RNN, guiding the geostratigraphic series simulation from the beginning. The content of the assignment is determined by the input information. After the input layer receives the coordinates of the borehole and the basic elevation information, the coordinate input is expanded from the original three dimensions to a number of dimensions equal to the number of neurons, and the result serves as the initial state of the hidden layer neurons in the RNN.

Figure 2. Schematic diagram of the stratum type model.

At each moment, the RNN receives the neuron state and stratum information from the previous moment and outputs a judgement of the stratum type through hidden layer calculations. With an n-dimensional correct value vector introduced, each item in the output weight vector represents the possibility of a certain stratum type. The larger the value is, the higher the probability of that stratum. Thus, the most likely stratum is the predicted value at that moment. Repeating the above process and removing the termination marker in the output, we obtain the model's simulation results of the geostratigraphic series for the input coordinate information.

2.3.2. Establishment of the Series Model of the Stratum Thickness

Sequence-to-sequence (or seq2seq) learning, also known as the encoder-decoder network, has been widely used in machine translation and speech recognition. It maps input sequences to output sequences through deep neural networks. The seq2seq model is shown in Figure 3. The process includes two steps, input encoding and output decoding, which are handled by the encoder and decoder, respectively. The encoder is responsible for converting a variable-length input series into a fixed-length vector. This fixed-length vector contains information about the input series. The decoder is responsible for decoding this fixed-length vector and generating a variable-length output series according to the information content the vector represents.

Figure 3. The sequence-to-sequence (seq2seq) model.

In contrast to the traditional RNN, the seq2seq architecture does not require an input and generate an output at every moment. Instead, the algorithm converts the input series of the stratum types into a vector with the help of the encoder, and then outputs the results through the decoder. In other words, seq2seq carries more information when making predictions than the traditional RNN and infers the output content based on the input series as a whole.

In this study, two RNNs connected to each other are used as the encoder and decoder. Since seq2seq is widely used for machine translation and speech recognition problems, we apply it to the layer thickness recognition problem: given the geostratigraphic series x = [x1, x2, x3, …, xn], an equal-length thickness sequence d = [d1, d2, d3, …, dn] is generated, where n is the length of the sequence (i.e., the total number of strata at that point). The encoder receives the type information of the current stratum at each moment, n times in total. After the input has been completely received, the hidden state at the last moment of the encoder is taken as the initial state to guide the decoder. Then, the decoder outputs the thickness of each layer step-by-step. The above process and model structure are shown in Figure 4.

Figure 4. Structure diagram of the two-layer prediction network.

2.3.3. Establishment of the Geostratigraphic Series Modeling

The stratum thickness model uses real stratum type data in the training process. In practice, the real stratum type is unknown, and the output sequence of the stratum type model should be used as the judgement basis. The output of the stratum type model is therefore connected with the encoder of the layer thickness model, which yields a complete geostratigraphic series model. The simulation sequence of the layer thickness is shown in Figures 5 and 6.

Figure 5. Diagram of the neural network structure for stratum prediction.

Figure 6. Simulation process of the stratum thickness sequence.

2.4. Evaluation Method of Stratum Type Series Simulation

The stratum accuracy, the series edit distance, and the geostratigraphic series similarity based on the edit distance are used to evaluate the simulation performance of the series models of the stratum type.

The stratum accuracy is the simplest evaluation index. By comparing elements at corresponding positions of the simulated sequence and the real geostratigraphic series, the proportion of correctly predicted strata in the total number of strata is calculated by Equation (2):

$$\text{Stratum accuracy} = \frac{\text{Correct stratum number}}{\text{Total formation number of test data}} \qquad (2)$$

The edit distance is a standard measure of the similarity of series. It represents the minimum number of edit operations (insertion, deletion, and replacement) required to convert one series into another. The smaller the edit distance between the two series, the more similar the two series are. Because the series being compared can differ in length, a given edit distance implies a notably higher similarity for longer series than for shorter ones. To better describe the closeness of series, Equation (3) is used to calculate the similarity of series:

$$L(S, T) = 1 - \frac{D(S, T)}{\max(|S|, |T|)} \qquad (3)$$

where D(S, T) represents the edit distance between series S and T. There is no closed-form equation for calculating D(S, T).
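Although D(S, T) has no closed-form expression, it can be computed with the standard dynamic-programming recurrence for the Levenshtein distance. The sketch below is one possible implementation and not necessarily the one used in this study. The function names (`stratum_accuracy`, `edit_distance`, `series_similarity`) are illustrative, and two choices are assumptions: positions beyond the end of the simulated series count as incorrect in Equation (2), and two empty series are given a similarity of 1. The two series are those of the worked example in this section.

```python
from typing import List, Sequence


def stratum_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Equation (2): share of positions where simulated and real strata match.

    Assumption: positions missing from the simulated series count as wrong.
    """
    if not actual:
        raise ValueError("actual series must not be empty")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def edit_distance(s: Sequence[str], t: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, and replacements turning s into t."""
    # previous[j] is the distance between the first i-1 items of s
    # and the first j items of t (row i-1 of the usual table).
    previous: List[int] = list(range(len(t) + 1))
    for i, s_item in enumerate(s, start=1):
        current = [i]  # turning i items of s into an empty series costs i deletions
        for j, t_item in enumerate(t, start=1):
            cost = 0 if s_item == t_item else 1
            current.append(min(
                previous[j] + 1,         # delete s_item
                current[j - 1] + 1,      # insert t_item
                previous[j - 1] + cost,  # keep or replace
            ))
        previous = current
    return previous[-1]


def series_similarity(s: Sequence[str], t: Sequence[str]) -> float:
    """Equation (3): L(S, T) = 1 - D(S, T) / max(|S|, |T|)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # assumption: two empty series are treated as identical
    return 1.0 - edit_distance(s, t) / longest


t1 = ["silt", "fine sand", "silt", "clay", "silt", "clay"]
t2 = ["miscellaneous fill", "sand", "fine sand", "silt", "clay"]
distance = edit_distance(t1, t2)
similarity = series_similarity(t1, t2)
accuracy = stratum_accuracy(t1, t2)
```

Only two rows of the dynamic-programming table are kept at a time, so memory use grows with the length of one series rather than with the product of both lengths.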
Its calculation examples are as follows: <mrow> Suppose there are two geostratigraphic series, t1 = [silt, ﬁne sand, silt, clay, silt, clay] and t2 = [miscellaneous ﬁll, sand, ﬁne sand, silt, clay]. In order to convert t1 to t2, the implementation process of the minimum operation times is as follows: <mtext>Total formation number of test&#x00 1. Replace the ﬁrst “silt” with “sand”; A0;data</mtext></mrow> 2. Insert “miscellaneous ﬁll” at the beginning of t1; </mfrac> </mrow> <annotation encoding='MathType-MTEF'>MathType@MTEF@5@5@+= Appl. Sci. 2019, 9, 3553 8 of 29 3. Remove the last “clay”; Appl. Sci. 2019, 9, x FOR PEER REVIEW 10 of 32 4. Delete the ﬁnal “silt”. Although the transition from one series to another through several insertions, deletions, and Throughout the above four steps to replace, delete, and insert operations, the geostratigraphic substitutions has many possibilities, the editing distance D (S, T) between the two series is always series t1 changed to series t2. Thus, the two geostratigraphic series of edit distance D(S, T) is 4. unique. Although the transition from one series to another through several insertions, deletions, and substitutions has many possibilities, the editing distance D(S, T) between the two series is always unique. 3. Results and Discussions 3. Results and Discussions 3.1. Study of the Regional Geology and Data Reconstruction Schemes 3.1. Study of the Regional Geology and Data Reconstruction Schemes The research area is located in a city in eastern China with a plain topography. The soil in the The research area is located in a city in eastern China with a plain topography. The soil in the study area is mainly composed of sandy soil, cohesive soil, and silty soil. The local strata are silt and study area is mainly composed of sandy soil, cohesive soil, and silty soil. The local strata are silt and silty soil. The research data come from the city’s geological survey work. There is a total of 1386 silty soil. 
The research data come from the city’s geological survey work. There are 1386 borehole datasets in total, and all the boreholes terminate on the bedrock surface. A total of 13 stratum types were determined. These boreholes are nonuniformly distributed in an area of 3882 square kilometers, as shown in Figure 7.

Figure 7. Distribution of the boreholes in the study area.

Using the reconstruction scheme of the stratum data proposed in this study, the drilling data are reconstructed. The specific operation process is as follows:

1. Data normalization: The x coordinates, y coordinates, hole elevation, and stratum thickness of the borehole data are continuous values. A review of all the borehole data showed that the coordinates use the Xi’an 80 coordinate system and their values reach the millions, while the elevation of the orifice and the thickness of the strata are within 100 m. The magnitudes of the features therefore differ greatly, by a factor of up to tens of thousands. To put all features on the same scale, these borehole data features are compressed into the interval [0, 1] by linear normalization.

2. Drilling data segmentation and equalization: The training data and test data are selected randomly from all drilling points at a ratio of 4:1, and the data are balanced according to the number of layers. The spatial positions of the training data and test data are shown in Figure 8.

Figure 8. Schematic diagram of the training data and test data distributions.

Figure 8 shows the locations of the training data and test data in the study area after the original drilling data are segmented, where the red symbols represent the training data and the green symbols represent the test data. The positions, plotting scale, and geographic coordinates in Figure 8 are the same as in Figure 7.
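The normalization and segmentation steps can be illustrated with a short sketch. The following Python code applies linear (min-max) normalization to one feature and draws a random 4:1 training/test split. The function names, the fixed random seed, and the sample coordinate values are assumptions made for illustration. The balancing by number of layers is omitted because the text does not describe how it is done, so the set sizes printed here need not match those used in the study.

```python
import random


def min_max_normalize(values):
    """Linearly rescale a list of numbers to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(v - lo) / (hi - lo) for v in values]


def split_boreholes(borehole_ids, train_parts=4, test_parts=1, seed=0):
    """Randomly split borehole identifiers into training and test sets (4:1 by default)."""
    ids = list(borehole_ids)
    random.Random(seed).shuffle(ids)  # the seed is fixed only to make the sketch repeatable
    n_train = len(ids) * train_parts // (train_parts + test_parts)
    return ids[:n_train], ids[n_train:]


# Hypothetical x coordinates in the millions, as in a projected coordinate system
x = [3_512_000.0, 3_530_500.0, 3_549_000.0]
print(min_max_normalize(x))  # [0.0, 0.5, 1.0]

train, test = split_boreholes(range(1386))
print(len(train), len(test))  # 1108 278
```

Each feature (x, y, elevation, thickness) would be normalized separately, so that coordinates in the millions and thicknesses below 100 m end up on the same [0, 1] scale.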
3. Stratum coding: According to the statistics, the borehole stratum data used in this study contain a total of 13 types of strata. Together with the initiation and termination markers artificially introduced into the geostratigraphic series, there are 15 types in total. The numbers zero to 14 were assigned to them, and vectorization was carried out by one-hot encoding. The numbers and coding vectors of the stratum types are shown in Table 1.

Table 1. Strata numbers and one-hot vectors.

Stratum Types           Number  Coding Vector
clay                    0       (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
silt                    1       (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
plain fill              2       (0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
miscellaneous fill      3       (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
silty sand              4       (0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
silty clay              5       (0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
mucky soil              6       (0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)
mucky clay              7       (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0)
old city fill           8       (0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)
clay sand inclusion     9       (0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
mud                     10      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
medium sand             11      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)
intermediate fine sand  12      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
start mark              13      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
end mark                14      (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)

4. Geostratigraphic series filling: According to the statistics, the maximum number of strata in the study data is 10.
Therefore, the filling length of the geostratigraphic series should be larger than 10 layers. For simplicity, the termination marker is used here to fill all geostratigraphic series to the 11th layer. Suppose that the stratum types of a borehole are clay, silt, silty sand, clay, and mucky clay; the corresponding number vector is (0, 1, 4, 0, 7). The termination marker, denoted by the number 14, is repeatedly appended to the vector until its length is 11. Finally, the geostratigraphic series data input of the machine learning model is obtained by replacing each item in the numbered vector with the corresponding one-hot encoding vector.

3.2. Machine Learning Simulation Result Analysis

We implemented the proposed algorithms in Python. Part of the algorithm code is as follows:

```python
import torch
import torch.nn as nn


class CrossLoss(nn.Module):
    """Cross-entropy loss that also returns the number of non-ignored targets."""

    def __init__(self, ignore_index=0):
        super(CrossLoss, self).__init__()
        self.ignore_index = ignore_index
        self.criterion = nn.CrossEntropyLoss(ignore_index=0)

    def forward(self, input, target):
        # Mask of the target positions that do not carry the ignored label
        ind = (target != self.ignore_index).float()
        num_all = torch.sum(ind).item()
        size0 = target.size(0)
        size1 = target.size(1)
        temp = target.detach().cpu()
        # Relabel every target element; depthLabel is a helper defined
        # elsewhere in the authors' code and is not part of this excerpt
        for i in range(size0):
            for j in range(size1):
                temp[i, j] = depthLabel(temp[i, j])
        # Zero out the predictions at ignored positions
        # (the integer cast is kept from the published excerpt)
        pred = torch.mul(input, ind).long()
        temp = temp.long()
        loss = self.criterion(pred, temp)
        return loss, num_all
```

As the procedure may be further commercialized, it is not suitable to make it all public for the time being. The hardware used to run the algorithm is as follows:

• CPU: Intel Core i7-4790k @ 4.00 GHz, quad-core;
• Memory: 32 GB;
• VGA card: Nvidia GeForce GTX 770 (2 GB).

3.2.1. Training and Verification of the Stratum Type Series Model

(1) Model Training

The cross-entropy loss function is used to describe the performance of the model in the training process. Figure 9 shows that as the number of training rounds increases, the loss value decreases continuously. However, the slope of the loss curve begins to flatten after several rounds, and the amplitude of change gradually decreases. The final loss value fluctuates in a small range and tends to be stable.

Figure 9. Loss curve of the first 50 training rounds.

The model has completed most of its loss reduction after 50 training rounds, as shown in Figure 10. After 50 rounds, the loss function tends to be stable, and the model learns only slowly from the training data. The specific decline in the loss function is listed in Table 2.

Figure 10. Loss curve after 500 training rounds.

Table 2. Statistical table of the loss decline.

Round Number                     50        500
Loss value                       0.483226  0.374167
Cumulative decline               0.327009  0.436068
Cumulative decline (percentage)  40.36%    53.82%

(2) Model Test

The trained and finally stable model was tested by inputting the coordinate information of the test borehole data successively. The simulated stratum type sequence output by the model was searched for the termination marker and truncated there: all the elements before the first termination marker were taken as the stratum prediction series. By comparing the predicted values with the real values one-to-one, the single-layer accuracy of the geostratigraphic series is tested. Then the similarity between the prediction sequence and the real geostratigraphic series is evaluated by using the edit distance algorithm.

The accuracy of stratum type simulation varies with the training round, as shown in Figure 11.
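The truncation at the first termination marker mirrors the filling step used when the data were reconstructed. As a minimal sketch, the following Python code pads a numbered series with the termination marker (number 14) to 11 layers, converts it to one-hot vectors, and recovers the prediction series by cutting at the first termination marker. The function and constant names are illustrative assumptions; the stratum numbers follow Table 1.

```python
END_MARK = 14       # number of the termination marker in Table 1
NUM_CLASSES = 15    # 13 stratum types plus the start and end markers
PADDED_LENGTH = 11  # one more than the maximum of 10 strata per borehole


def pad_series(numbers, length=PADDED_LENGTH, fill=END_MARK):
    """Append the termination marker until the series has the given length."""
    if len(numbers) >= length:
        raise ValueError("series must be shorter than the padded length")
    return list(numbers) + [fill] * (length - len(numbers))


def one_hot(number, num_classes=NUM_CLASSES):
    """Return the one-hot coding vector of a stratum number."""
    return [1 if k == number else 0 for k in range(num_classes)]


def truncate_at_end_mark(numbers, end_mark=END_MARK):
    """Keep all elements before the first termination marker."""
    numbers = list(numbers)
    if end_mark in numbers:
        return numbers[:numbers.index(end_mark)]
    return numbers  # assumption: without a marker the whole output is kept


series = [0, 1, 4, 0, 7]  # clay, silt, silty sand, clay, mucky clay
padded = pad_series(series)
print(padded)  # [0, 1, 4, 0, 7, 14, 14, 14, 14, 14, 14]
encoded = [one_hot(n) for n in padded]
print(encoded[2])  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(truncate_at_end_mark(padded))  # [0, 1, 4, 0, 7]
```

Because every padded series contains at least one termination marker, truncating a correctly predicted sequence returns exactly the original strata.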
Figure 11 shows that as the number of training rounds increases, the overall prediction ability of the model continues to improve, and the accuracy of the stratum type and geostratigraphic series prediction improves rapidly. The accuracy of the final stratum type prediction was stable at 59.86%. As the loss function curve changes, the accuracy curve increases gradually. The accuracy achieved in the first 50 rounds is almost the same as the final accuracy.

The prediction of a single stratum is the first step in establishing a spatial stratum distribution model. In addition to the accurate prediction of a single stratum, it is of greater concern whether the model can make an accurate overall prediction of the geostratigraphic series in the study area. Therefore, the edit distance algorithm is used to evaluate the similarity between the simulated sequence and the real geostratigraphic series. If the edit distance between the prediction sequence and the real geostratigraphic series is larger than one, the prediction is regarded as failed and is not considered. The edit distance changes are shown in Figure 12.

Figure 11. Variation diagram of the simulation accuracy of the RNN model.

Figure 12. Variation curve of the geostratigraphic series edit distance with time.

In Figure 12, the lower curve corresponds to an edit distance of zero, i.e., the proportion of boreholes in the test set whose predicted result is exactly equal to the real result. The upper curve indicates the proportion of boreholes within an edit distance of one, i.e., the model makes no more than one wrong prediction in the whole sequence prediction process. In that case, the predicted sequence can be converted into the real stratum sequence by a single insertion, replacement, or deletion operation. In the end, the former curve converges to 35.2%, while the latter curve converges to 74%.

Because the number of layers differs between series, it is difficult to accurately describe the similarity between the predicted series and the real result by applying the edit distance alone. Therefore, the similarity calculation equation based on the edit distance is adopted. The variation curve of the predicted series similarity with the number of training rounds is shown in Figure 13.

Figure 13. Similarity curve of the geostratigraphic series.

In Figure 13, as the number of training rounds increases, the overall prediction ability of the model is continuously improved, and the average similarity between the predicted series and the actual geostratigraphic series also gradually increases and finally converges to 70.9%. This result shows that model accuracy continuously improves with increasing training rounds in the learning process and that the model gradually establishes the correlation between the elevation information and the geostratigraphic series in the study area.

(3) Testing the Effect of Expert-Driven Learning

To improve the learning performance of the RNN and test the effect of expert-driven learning, this study trained and tested the expert-driven model based on supervisory learning at four ratios using the same dataset. The four expert ratios are 1/3, 1/2, 2/3, and 1, i.e., expert-driven learning is carried out once every three rounds, once every two rounds, twice every three rounds, and in every round of the training process, respectively.

Figures 14–17 show the loss function curves of expert-driven learning at the different ratios. Since the loss is computed from the prediction results of both expert-driven and non-expert-driven learning, the loss function curve is banded in the first three figures.
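One way to read the four expert ratios is as a repeating schedule over training rounds. The sketch below is an illustrative assumption and not the authors' implementation: the function name is hypothetical, and so is the choice of which rounds within each cycle are expert-driven, since the text specifies only how often expert-driven learning occurs.

```python
from fractions import Fraction


def is_expert_round(round_index, expert_ratio):
    """Return True if the given training round (0-based) is expert-driven.

    expert_ratio is a fraction p/q: the first p rounds of every cycle of
    q rounds are expert-driven (the position within the cycle is assumed).
    """
    ratio = Fraction(expert_ratio)
    if not 0 <= ratio <= 1:
        raise ValueError("expert_ratio must lie between 0 and 1")
    return round_index % ratio.denominator < ratio.numerator


for ratio in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    schedule = "".join("E" if is_expert_round(r, ratio) else "-" for r in range(6))
    print(ratio, schedule)
# 0 ------
# 1/3 E--E--
# 1/2 E-E-E-
# 2/3 EE-EE-
# 1 EEEEEE
```

Alternating expert-driven and ordinary rounds in this way would be consistent with the banded loss curves reported for the three partial ratios, because the two kinds of rounds produce losses at different levels.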
The model obtained a higher descent gradient under the guidance of correct monitoring signals as compared with the ordinary RNN model. The larger the proportion of expert-driven learning in the learning process is, the higher the rate of loss reduction. When expert-driven learning is completely adopted, the model loss function curve decreases the fastest. Almost all of the loss reduction is completed within the first 50 training rounds.

Figure 14. Expert-driven learning with a factor of 1/3.

Figure 15. Expert-driven learning with a factor of 1/2.

Figure 16. Expert-driven learning with a factor of 2/3.

Figure 17. Full expert-driven learning.

The single-stratum accuracy rate curve in each test round is shown in Figures 18–21.

Figure 18. Expert-driven learning with a factor of 1/3.

Figure 19. Expert-driven learning with a factor of 1/2.

Figure 20. Expert-driven learning with a factor of 2/3.

Figure 21. Full expert-driven learning.

The accuracy of the model simulation results under different expert ratios is shown in Table 3.

Table 3. Stratum type accuracy under different expert ratios.

Expert Ratio     0       1/3     1/2     2/3     1
Maximum value    61.42%  63.83%  64.82%  63.40%  64.82%
Steady value     59.86%  60.00%  62.41%  61.13%  60.42%

Figures 22–25 show the proportions of test boreholes whose predicted series has an edit distance of zero, or of no more than one, from the real geostratigraphic series. Detailed statistics are shown in Table 4.

Figure 22. Expert-driven learning with a factor of 1/3.
Figure 23. Expert-driven learning with a factor of 2/3.

Figure 24. Expert-driven learning with a factor of 1/2.

Figure 25. Full expert-driven learning.

Table 4. Statistical results of the edit distance under the different expert ratios.

Expert Ratio                        0      1/3    1/2    2/3    1
Edit Distance = 0   Maximum value   37.2%  39.6%  39.2%  39.6%  36.4%
                    Steady value    35.2%  38%    38.4%  38.4%  35.6%
Edit Distance <= 1  Maximum value   76%    77.2%  76.4%  77.2%  76.4%
                    Steady value    74%    75.6%  75.6%  75.6%  73.6%

The similarity curves between the prediction series of the model and the real geostratigraphic series under different expert ratios are shown in Figures 26–29.
Figure 26. Expert-driven learning with a factor of 1/3.

Figure 27. Expert-driven learning with a factor of 2/3.

Figure 28. Expert-driven learning with a factor of 1/2.

Figure 29. Full expert-driven learning.

The statistics of series similarity under different expert ratios are shown in Table 5.

Table 5. Statistical results of the series similarity.

Expert Ratio | 0 | 1/3 | 1/2 | 2/3 | 1
Maximum value | 71.85% | 73.60% | 73.95% | 73.98% | 72.51%
Steady value | 70.91% | 72.64% | 73.57% | 73.09% | 71.68%

As Table 5 shows, adopting the expert-driven learning mechanism helps improve the test performance of the machine learning models for stratum type series simulation, although the improvement is modest. Expert-driven learning also accelerates the convergence of the learning curve: the higher the expert ratio, the sooner the model reaches stability. However, the maximum and steady values of the various indicators show that a higher expert ratio does not always give a better result. The ultimate performance of full expert-driven learning was only slightly better than that of the RNN model, and the best results were obtained with a partial expert-driven learning strategy.

3.2.2. Training and Verification of the Stratum Thickness Series Model

(1) Layer Thickness Simulation Based on Multi-Category Classification

The layer thickness of the study area is divided into six stratum thickness intervals: within 3 m, 3 m to 5 m, 5 m to 10 m, 10 m to 20 m, 20 m to 30 m, and above 30 m.
For the stratum thickness series simulation based on multi-category classification, the stratum thickness intervals also need to be numbered and coded, as shown in Table 6.
Table 6. Code of the layer thickness type.

Stratum Thickness Interval | Layer Thickness Type Coding Number | Coded Vector
<3 m | 0 | [1, 0, 0, 0, 0, 0, 0]
3–5 m | 1 | [0, 1, 0, 0, 0, 0, 0]
5–10 m | 2 | [0, 0, 1, 0, 0, 0, 0]
10–20 m | 3 | [0, 0, 0, 1, 0, 0, 0]
20–30 m | 4 | [0, 0, 0, 0, 1, 0, 0]
>30 m | 5 | [0, 0, 0, 0, 0, 1, 0]
initiation mark | 6 | [0, 0, 0, 0, 0, 0, 1]

Before the output of the model is generated, the encoder has received a complete series of stratum types, that is, the total number of stratum layers at the prediction point is known. Therefore, there is no need to add a termination marker for the layer thickness interval. Only an initiation mark is introduced as the starting point of the decoder's simulated layer thickness sequence. After all outputs of the model are completed, a series whose length equals the number of layers is cut from the output and taken as the prediction sequence of the layer thickness.
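The coding in Table 6 and the truncation of the decoder output can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, and the handling of thicknesses that fall exactly on an interval boundary (assigned here to the upper interval) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right
from typing import List

# Upper bounds (m) of the first five intervals in Table 6; class 5 is "> 30 m".
BOUNDS = [3.0, 5.0, 10.0, 20.0, 30.0]
START = 6       # coding number of the initiation mark
NUM_CODES = 7   # six thickness classes plus the initiation mark


def thickness_class(thickness_m: float) -> int:
    """Map a thickness in metres to its coding number (0 to 5).

    Assumption: a value exactly on a boundary goes to the upper interval,
    e.g. 3.0 m is coded as class 1 (3-5 m).
    """
    return bisect_right(BOUNDS, thickness_m)


def one_hot(code: int) -> List[int]:
    """Return the 7-element coded vector for a coding number."""
    vec = [0] * NUM_CODES
    vec[code] = 1
    return vec


def truncate_prediction(decoder_codes: List[int], n_layers: int) -> List[int]:
    """Keep only as many predicted classes as there are stratum layers."""
    return decoder_codes[:n_layers]


# Hypothetical borehole with three layers.
codes = [thickness_class(t) for t in [1.2, 2.4, 13.3]]   # [0, 0, 3]
vectors = [one_hot(c) for c in [START] + codes]
prediction = truncate_prediction([0, 0, 3, 2, 2], n_layers=len(codes))
```

Because the number of layers is already known from the stratum type series, truncation replaces the termination marker that a free-running decoder would otherwise need.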
(2) Model Training and Testing

The stratum thickness series model adopts the seq2seq architecture and is trained on the drilling data in the training set. To reflect the actual performance of the model accurately, its highest accuracy and average accuracy on the test set were compared. After each round of training, the model was tested and the test results were recorded. After 500 rounds of training, the loss curve of the model is shown in Figure 30, and the changes in prediction accuracy are shown in Figure 31. As the number of training rounds increases, the prediction accuracy of the model increases slowly and finally converges to 63.53%.
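The bookkeeping described here, testing after every round and then reporting a highest and a converged value, can be sketched as follows. The text does not define how the converged (steady) value is computed, so averaging the last few rounds is an assumption; the function name, the `tail` parameter, and the sample history are hypothetical.

```python
from typing import List, Tuple


def summarize_accuracy(history: List[float], tail: int = 50) -> Tuple[float, float]:
    """Return (maximum, steady) test accuracy from per-round results.

    Assumption: the steady value is approximated by the mean accuracy
    over the last `tail` training rounds.
    """
    best = max(history)
    window = history[-tail:]
    steady = sum(window) / len(window)
    return best, steady


# Hypothetical per-round test accuracies, not data from the study.
history = [0.50, 0.60, 0.65, 0.63, 0.63]
best, steady = summarize_accuracy(history, tail=2)
```

Reporting both numbers separates a lucky peak during training from the level the model actually settles at.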
Figure 30. Loss function curve.

Figure 31. Prediction accuracy curve of the layer thickness.

(3) Testing the Effect of Expert-Driven Learning

To further improve the accuracy and prediction ability of the model for the stratum thickness category, this section applies expert-driven learning, based on supervised learning, in different proportions and compares the learning effects to determine the model with the highest accuracy and the greatest prediction ability. The expert ratios adopted by the seq2seq model in the learning process are 1/3, 1/2, 2/3, and 1. The accuracy of the different models on the test data is given in Table 7.
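The expert-driven mechanism itself is defined earlier in the paper. As a rough illustration of what an expert ratio could mean during seq2seq training, the sketch below assumes it behaves like a teacher-forcing probability: at each decoding step the true thickness class is fed to the decoder with probability equal to the expert ratio, and the model's own prediction is fed otherwise. This interpretation, the function name, and the `predict_next` callback are assumptions rather than the authors' code.

```python
import random
from typing import Callable, List, Tuple

START = 6  # initiation mark, coded as in Table 6


def decode_with_expert_ratio(
    target: List[int],
    predict_next: Callable[[List[int]], int],
    expert_ratio: float,
    rng: random.Random,
) -> Tuple[List[int], List[int]]:
    """One training-time decoding pass over a thickness-class sequence.

    Returns the predicted classes and the tokens actually fed to the decoder.
    """
    fed: List[int] = []      # decoder inputs, step by step
    outputs: List[int] = []
    current = START
    for true_code in target:
        fed.append(current)
        predicted = predict_next(fed)   # hypothetical model call
        outputs.append(predicted)
        # With probability expert_ratio the expert (true) label guides the next step.
        use_expert = rng.random() < expert_ratio
        current = true_code if use_expert else predicted
    return outputs, fed


# Dummy predictor that always outputs class 0, for illustration only.
outputs, fed = decode_with_expert_ratio(
    [2, 3, 1], lambda inputs: 0, expert_ratio=0.5, rng=random.Random(0)
)
```

Under this reading, a ratio of 0 corresponds to a model that always consumes its own predictions, and a ratio of 1 to full expert-driven learning.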
Table 7. Prediction accuracy of the layer thickness.

Expert Ratio | 0 | 1/3 | 1/2 | 2/3 | 1
Maximum value | 65.07% | 73.05% | 80.08% | 75.60% | 70.07%
Steady value | 63.53% | 70.07% | 75.05% | 72.62% | 67.94%

Table 7 shows the highest value achieved on the test data and the final stable value after convergence for each expert ratio. As the proportion of expert-driven learning increases, the accuracy of the model on the test data first increases and then decreases. Neither the model without expert-driven learning nor the model with full expert-driven learning achieves the highest accuracy, so the relationship between the expert ratio and the prediction accuracy is clearly not a simple positive correlation. The loss curve of the training process with 50% expert-driven learning is shown in Figure 32. With 50% expert-driven learning, the stable value of the layer thickness prediction accuracy is 75.05% and the highest value is 80.08%, which is the best model performance on the test set, as shown in Figure 33. At this ratio, the prediction ability of the model for unknown data is the greatest, which is consistent with the experience gained from the stratum type identification model. Therefore, expert-driven learning can improve the prediction ability of the model and accelerate convergence, but a higher expert ratio does not necessarily give better model performance.

Figure 32. Loss curve of 50% expert-driven learning.

Figure 33. Accuracy of 50% expert-driven learning.

The final results show that the maximum accuracy of the layer thickness model is 80.08% under the 50% expert ratio (Table 7), the best layer thickness prediction on the test data among the ratios tested.

3.2.3. Verification of the Geostratigraphic Series Model

To verify the true prediction ability of the geostratigraphic series model, the stratum data in the test borehole data are used for practical testing, and the simulated series output by the model are compared with the real geostratigraphic series. Selected examples of the real borehole stratum conditions and the machine learning prediction results are shown in Table 8.

Comparing the prediction results of the model with the real borehole data in Table 8 shows that the machine learning model based on the seq2seq architecture predicts stratum types with relatively high accuracy. Over all data in the test set, the machine learning model correctly simulates 62.98% of the stratum types, and the average similarity between the simulated sequence and the real stratum sequence is 72.16%. In addition, the accuracy of the stratum thickness prediction is 74.04%, which is sufficient for a basic determination of the stratum thickness in the study area, as shown in Table 9.
In conclusion, the machine learning model based on a recurrent neural network can accurately simulate the real stratum situation in the study area, and its feasibility is verified.

Table 8. Comparison of the real borehole stratum and machine learning prediction results.

Number | Real stratum type sequence | Real stratum thickness sequence (m) | Predicted stratum type sequence | Predicted stratum thickness sequence (m)
1 | silt, clay | 0.3, 3.9 | floury soil, clay, plain fill | within 3 m, within 3 m, 3–5 m
2 | clay | 2 | clay | within 3 m
3 | miscellaneous fill | 0.6 | plain fill | 5–10 m
4 | plain fill, clay | 3.1, 9.8 | plain fill, clay | within 3 m, 5–10 m
5 | miscellaneous fill, clay, mucky soil, plain fill, clay | 1.2, 1.3, 1.5, 2.4, 13.3 | miscellaneous fill, plain fill, mucky soil, plain fill, clay | within 3 m, within 3 m, within 3 m, within 3 m, 10–20 m
6 | floury soil, silty clay, plain fill, clay, plain fill, clay | 1.0, 0.5, 2.5, 1.2, 0.3, 3.6 | floury soil, plain fill, clay, plain fill, clay | within 3 m, within 3 m, within 3 m, within 3 m, 5–10 m
7 | miscellaneous fill, plain fill, clay | 0.7, 3.0, 4.5 | miscellaneous fill, plain fill, clay | within 3 m, within 3 m, 3–5 m
8 | miscellaneous fill, clay | 0.6, 4.0 | miscellaneous fill | within 3 m
9 | miscellaneous fill, plain fill, clay | 0.5, 1.0, 11.9 | miscellaneous fill, plain fill, clay | within 3 m, within 3 m, 10–20 m
10 | miscellaneous fill, clay | 1.0, 9.8 | miscellaneous fill, clay | within 3 m, 5–10 m
11 | miscellaneous fill, silt, plain fill, clay | 4.1, 11.2, 7.0, 10.0 | miscellaneous fill, plain fill, clay | within 3 m, 10–20 m, 5–10 m
12 | floury soil, plain fill, mucky soil, clay | 0.5, 6.7, 1.2, 8.6 | floury soil, plain fill, plain fill, clay | within 3 m, within 3 m, within 3 m, 5–10 m
13 | silt, clay | 0.4, 6.6 | floury soil, clay | within 3 m, 5–10 m
14 | silt, clay | 0.4, 10.4 | floury soil, clay | within 3 m, 5–10 m
15 | miscellaneous fill, silt, plain fill, clay | 0.7, 1.9, 3.4, 24.0 | miscellaneous fill, floury soil, plain fill, clay | within 3 m, within 3 m, within 3 m, 20–30 m
16 | miscellaneous fill soil, plain fill soil, old city miscellaneous fill soil, clay | 1.2, 2.6, 6.5, 13.0 | miscellaneous fill, floury soil, plain fill, old town fill, clay | within 3 m, within 3 m, within 3 m, 5–10 m, 10–20 m
17 | miscellaneous fill soil, plain fill soil, clay | 0.5, 2.8, 10.2 | miscellaneous fill, plain fill, clay | within 3 m, within 3 m, 10–20 m
18 | miscellaneous fill soil, plain fill soil, clay | 2.1, 0.8, 12.9 | miscellaneous fill, plain fill, clay | within 3 m, within 3 m, 10–20 m

Table 9. Statistical results of the geostratigraphic series model simulations.

Stratum Type Accuracy | Average Sequence Similarity | Stratum Thickness Accuracy
62.98% | 72.16% | 74.04%

3.3. Three-Dimensional Geological Modeling and Testing

3.3.1. Three-Dimensional Geological Modeling

To further test the effect of the geostratigraphic series simulation based on machine learning, this section compares the geostratigraphic series simulation method based on machine learning with the traditional method based on 3D geological modeling.
On the basis of the training data, a 3D geological model of the research area is constructed by using the triangulated irregular network (TIN) 3D geological In conclusion, the machine learning model based on a recurrent neural network can accurately modeling method [39]. The 3D geological model is consistent with the real strata at the borehole simulate the real stratum situation in the study area, and its feasibility is verified. locations, and it can directly show the complex geological structure and the spatial distributions of the rock and soil masses comprehensively. 3.3. Three-Dimensional Geological Modeling and Testing The main steps for the construction the 3D geological model in this study are as follows: 1. Drilling treatment: According to the geological conditions and drilling stratiﬁcation data, 3.3.1. Three-Dimensional Geological Modeling the strata are classiﬁed and integrated, and the strata are preliminarily sorted from top to bottom. 2. Interpolation mesh generation: Using Delaunay’s triangulation and subdivision algorithms, To further test the geostratigraphic series simulation effect based on machine learning, this a TIN mesh is generated, as shown in Figure 34. section compares the geostratigraphic series simulation method based on machine learning with the 3. Network reﬁnement: The generated irregular triangular interpolation network is adjusted until traditional method based on 3D geological modeling. On the basis of the training data, a 3D the accuracy meets the requirements. geological model of the research area is constructed by using the triangulated irregular network (TIN) 4. Uniform drilling series: All drilling holes are traversed and a uniform geostratigraphic series 3D geological modeling method [39]. The 3D geological model is consistent with the real strata at the is established by considering special stratum conditions such as missing data and reversals. 
borehole locations, and it can directly show the complex geological structure and the spatial distributions of the rock and soil masses comprehensively.

The main steps for the construction of the 3D geological model in this study are as follows:

1. Drilling treatment: According to the geological conditions and drilling stratification data, the strata are classified and integrated, and the strata are preliminarily sorted from top to bottom.
2. Interpolation mesh generation: Using Delaunay's triangulation and subdivision algorithms, a TIN mesh is generated, as shown in Figure 34.
3. Network refinement: The generated irregular triangular interpolation network is adjusted until the accuracy meets the requirements.
4. Uniform drilling series: All drilling holes are traversed and a uniform geostratigraphic series is established by considering special stratum conditions such as missing data and reversals. Then, according to the unified geostratigraphic series, the original stratification of all borehole data is transformed into a unified stratification of the borehole series, as shown in Figure 35. If a stratum is not included in the original data of the borehole, its layer thickness is set to zero.
5. Spatial interpolation: For each layer of the uniform drilling series, the Kriging method is used to calculate the elevation at the top and bottom of the layer in the interpolation grid. If the top elevation of a layer is the same as its bottom elevation, the layer does not exist at that point.
6. Stratum construction: If the top and bottom elevations of the stratum are different, the top and bottom at the interpolation point are connected with those at adjacent points to form a stratum of the 3D model, as shown in Figure 36.
7. Inspection: The generated 3D model is inspected, and the model is adjusted according to experience and geological characteristics.
8. Model generation: A 3D stratum model is rendered, and the redundant parts are removed, while only the research area is maintained.

Figure 34. Interpolation network diagram.

Figure 35. Unified geostratigraphic series diagram.

Figure 36. Schematic diagram of stratum construction.

The method to determine the boundary conditions of the model is as follows: According to the boundary on the map of the study area, boundary points are selected at appropriate distances. The boundary points are used as the control points of the estimated stratigraphic boundaries. Then, these control points are connected successively to form a closed polygon. The closed polygon is used as the boundary of the estimated stratum. After determining the estimated stratigraphic boundary, we extended the area of the borehole to the boundary of the estimated stratum and eventually established the entire 3D geological model.
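Step 4 of the construction procedure (mapping each borehole onto the unified geostratigraphic series and setting missing strata to zero thickness), together with the layer-existence rule of steps 5 and 6, can be sketched as follows. This is only a minimal illustration and not the authors' software: the function names `unify_borehole`, `layer_elevations`, and `layer_exists`, the greedy in-order matching, the example unified series, and the assumption that a borehole's strata form a subsequence of the unified series (no reversals) are all hypothetical.

```python
from typing import List, Tuple

# A borehole log: (stratum type, thickness in m), listed from top to bottom.
Borehole = List[Tuple[str, float]]


def unify_borehole(borehole: Borehole, unified_series: List[str]) -> List[float]:
    """Return one thickness per entry of the unified series.

    Assumption (not from the paper): the borehole's strata appear in the
    same order as in the unified series, so a greedy in-order match works.
    Strata of the unified series that the borehole lacks get thickness 0.
    """
    thicknesses = []
    pos = 0  # index of the next unmatched borehole layer
    for stratum in unified_series:
        if pos < len(borehole) and borehole[pos][0] == stratum:
            thicknesses.append(borehole[pos][1])
            pos += 1
        else:
            thicknesses.append(0.0)
    if pos != len(borehole):
        raise ValueError("borehole is not a subsequence of the unified series")
    return thicknesses


def layer_elevations(collar: float, thicknesses: List[float]) -> List[Tuple[float, float]]:
    """Turn thicknesses into (top, bottom) elevations, starting at the collar."""
    layers = []
    top = collar
    for t in thicknesses:
        bottom = top - t
        layers.append((top, bottom))
        top = bottom
    return layers


def layer_exists(top: float, bottom: float) -> bool:
    """A layer is absent where its top and bottom elevations coincide."""
    return top != bottom


# Hypothetical unified series; the borehole uses the real strata of
# borehole 7 in Table 10, with an assumed collar elevation of 0 m.
unified = ["miscellaneous fill", "plain fill", "silt", "clay"]
borehole = [("miscellaneous fill", 0.7), ("plain fill", 3.0), ("clay", 4.5)]
thick = unify_borehole(borehole, unified)
present = [layer_exists(top, bottom) for top, bottom in layer_elevations(0.0, thick)]
```

In this sketch the silt layer receives zero thickness, so its top and bottom elevations coincide and it is treated as absent at this borehole. A real implementation would also have to handle reversed or repeated strata, which the greedy match above does not attempt.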
The whole process of 3D geological modeling, from borehole data processing to the final generation of the model, is shown in Figure 37.

Figure 37. Workflow of 3D geological modeling.
Finally, a 3D geological model of the research area is constructed (as shown in Figure 38) and sectioned. The stratum types and series after sectioning are shown in Figures 39–41.

Figure 38. Three-dimensional geological model. The 3D geological model software was developed by our own team.

Figure 39. Three-dimensional boreholes.

Figure 40. Borehole and stratum distributions.

Figure 41. Geological section.

3.3.2. Three-Dimensional Geological Model Verification

At the same positions as the data in Section 3.2.3, the borehole coordinate information is input into the 3D geological model.
The prediction results of the 3D geological model are then compared with the real borehole strata, as shown in Table 10.

Table 10. Comparison of the real borehole stratum conditions and 3D geological modeling prediction results.

| Number | Real stratum type sequence | Real stratum thickness sequence (m) | Predicted stratum type sequence | Predicted stratum thickness sequence (m) |
|---|---|---|---|---|
| 1 | silt, clay | 0.3, 3.9 | clay, silt | 0.3, 3.9 |
| 2 | clay | 2 | miscellaneous fill | 2.0 |
| 3 | miscellaneous fill | 0.6 | miscellaneous fill | 0.6 |
| 4 | plain fill, clay | 3.1, 9.8 | miscellaneous fill | 13.5 |
| 5 | miscellaneous fill, clay, mucky soil, plain fill, clay | 1.2, 1.3, 1.5, 2.4, 13.3 | miscellaneous fill, clay, mucky soil, silt | 1.2, 1.3, 3.9, 13.3 |
| 6 | floury soil, silty clay, plain fill, clay, plain fill, clay | 1.0, 0.5, 2.5, 1.2, 0.3, 3.6 | plain fill, silt clay, silt, clay, silt | 1, 0.5, 2.5, 1.2, 3.9 |
| 7 | miscellaneous fill, plain fill, clay | 0.7, 3.0, 4.5 | miscellaneous fill, silt | 0.7, 8.5 |
| 8 | miscellaneous fill, clay | 0.6, 4.0 | miscellaneous fill | 4.6 |
| 9 | miscellaneous fill, plain fill, clay | 0.5, 1.0, 11.9 | miscellaneous fill, silt | 0.5, 0.5 |
| 10 | miscellaneous fill, clay | 1.0, 9.8 | miscellaneous fill | 12.2 |
| 11 | miscellaneous fill, silt, plain fill, clay | 4.1, 11.2, 7.0, 10.0 | miscellaneous fill, silt | 2.8, 25.2 |
| 12 | floury soil, plain fill, mucky soil, clay | 0.5, 6.7, 1.2, 8.6 | plain fill, silt | 0.5, 16.5 |
| 13 | silt, clay | 0.4, 6.6 | plain fill | 7 |
| 14 | silt, clay | 0.4, 10.4 | plain fill | 10.9 |
| 15 | miscellaneous fill, silt, plain fill, clay | 0.7, 1.9, 3.4, 24.0 | miscellaneous fill, plain fill, silt, silt | 0.7, 1.9, 3.4, 24 |
| 16 | miscellaneous fill soil, plain fill soil, old city miscellaneous fill soil, clay | 1.2, 2.6, 6.5, 13.0 | miscellaneous fill, plain fill, old city miscellaneous fill soil | 1.2, 2.6, 22.5 |
| 17 | miscellaneous fill soil, plain fill soil, clay | 0.5, 2.8, 10.2 | miscellaneous fill, plain fill | 0.5, 13 |
| 18 | miscellaneous fill soil, plain fill soil, clay | 2.1, 0.8, 12.9 | miscellaneous fill, silt, clay | 2.1, 0.8, 12.9 |

From Table 10, the 3D geological model performs poorly in terms of the number of layers, stratum type, and sequence similarity, but it can better predict the stratum thickness. When the prediction of the stratum type is accurate, the corresponding thickness prediction is close to the real value.

Some borehole data are randomly selected in the training set, and the borehole coordinate information is input into the 3D geological model to obtain the stratum sequence prediction results of the borehole points. According to the statistics, in all the data of the test set, the 3D geological model accurately simulates 30.78% of the stratum types, and the similarity between the simulated series and the real geostratigraphic series is 32.27%. In addition, the accuracy rate of the stratum thickness is 64.52%, as shown in Table 11.

Table 11. Statistics of 3D geological model prediction results.

| Stratum Type Accuracy | Average Sequence Similarity | Stratum Thickness Accuracy |
|---|---|---|
| 30.78% | 32.27% | 64.52% |

Comparing Tables 9 and 11, the prediction results histogram of machine learning and 3D geological modeling is obtained in terms of the stratum type, average series similarity, and stratum thickness accuracy, as shown in Figure 42.

Figure 42. Comparison histogram of the prediction results of machine learning and 3D geological modeling.

Figure 42 shows that there is a certain difference in accuracy between the geostratigraphic series models based on 3D geological modeling and machine learning. Generally, these two methods can describe the real stratum situation well. The model based on machine learning has a good simulation effect in terms of the stratum type, and all its corresponding indexes are superior to those of the traditional 3D geological model. The machine learning model provides stratum information by predicting the layer thicknesses within the strata and it is slightly more accurate than the 3D geological model.

3.4. Evaluation of 3D Geological Modeling Based on the Geostratigraphic Series Model

Considering the actual performance of the machine learning model in the prediction of the stratum type and stratum thickness, this study proposes an evaluation algorithm for a 3D geological model. In the absence of real data guidance, the learning results based on the machine learning model represent the accuracy of geological modeling. For any geostratigraphic series, the reliability evaluation process is described below.

The evaluation objects are divided into a stratum type series and stratum thickness series. The geostratigraphic series model generates output in the same position, including stratum type and stratum thickness series.
The similarity of the stratum type series calculated by the edit distance algorithm is used as the evaluation index.

Comparing the layer thickness series, if the 3D layer thickness is the same as the most likely thickness, the score is one; if the 3D layer thickness is the same as the second most likely thickness, the score is 0.5; otherwise, the score is zero.

The scores are added, and the score sum is divided by the 3D series length, which is then used as the layer thickness evaluation index.
The average of the type evaluation index and the thickness evaluation index is calculated to obtain the reliability score of this point in the 3D geological model. If the reliability score is higher than 0.5, the simulation of the real stratum is considered to be reliable.

The calculation process of this algorithm consists of two parts, the type evaluation index and the layer thickness evaluation index. The reliability score is the average of these two indexes. The range of reliability scores calculated by this algorithm is [0,1], representing the matching degree between the evaluation object and the empirical cognition of the machine learning model.
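The reliability evaluation described above can be sketched in a few lines. This is a hedged illustration rather than the authors' implementation: the text does not state how the edit distance is converted into a similarity, so normalizing by the longer of the two series is an assumption, as are the function names, the tolerance `tol` used to decide that two thicknesses are "the same", and the example data.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two stratum type series (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def type_index(model_types: Sequence[str], ml_types: Sequence[str]) -> float:
    """Similarity in [0, 1]; normalizing by the longer series is an assumption."""
    longest = max(len(model_types), len(ml_types))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(model_types, ml_types) / longest


def thickness_index(model_thick: Sequence[float],
                    ml_candidates: Sequence[Tuple[float, float]],
                    tol: float = 1e-6) -> float:
    """Score 1 for the most likely thickness, 0.5 for the second, else 0.

    ml_candidates[i] holds the (most likely, second most likely) thickness
    for layer i. The score sum is divided by the 3D series length.
    """
    if not model_thick:
        return 0.0
    total = 0.0
    for i, t in enumerate(model_thick):
        if i >= len(ml_candidates):
            continue  # no machine learning prediction for this layer: score 0
        best, second = ml_candidates[i]
        if abs(t - best) <= tol:
            total += 1.0
        elif abs(t - second) <= tol:
            total += 0.5
    return total / len(model_thick)


def reliability(model_types, model_thick, ml_types, ml_candidates) -> float:
    """Average of the type index and the thickness index."""
    return 0.5 * (type_index(model_types, ml_types)
                  + thickness_index(model_thick, ml_candidates))


# Hypothetical example: one point of the 3D model scored against the
# machine learning output at the same position.
model_types = ["miscellaneous fill", "silt"]
model_thick = [0.7, 8.5]
ml_types = ["miscellaneous fill", "plain fill", "clay"]
ml_candidates = [(0.7, 1.0), (3.0, 8.5)]

score = reliability(model_types, model_thick, ml_types, ml_candidates)
is_reliable = score > 0.5
```

With these made-up inputs the type index is 1 - 2/3 and the thickness index is (1 + 0.5)/2, so the point lands just above the 0.5 threshold. Exact equality of thicknesses only makes sense if the model predicts discrete thickness values; with continuous outputs, `tol` would have to be chosen deliberately.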
The higher the reliability score is, the closer the evaluation object and the model are in predicting the stratum distribution of this point.

The test borehole provides the real stratum data, and its evaluation result should be higher than that of the 3D model. Moreover, if the stratum distribution of a point in the 3D model is similar to the real situation, the scoring result will be similar to the result of the real stratum. To test the feasibility of the evaluation algorithm based on the 3D geological model, this study uses the algorithm to calculate the reliability score of the test borehole data and the 3D geological model.
The calculation and statistical results show that the average reliability score of the test borehole data is 0.6293, which is higher than that of the 3D geological model (0.3205), as shown in Table 12. In addition, the reliability scores of the test boreholes are mostly higher than 0.5, while those of the 3D geological model are mainly below 0.5, as shown in Figures 43 and 44.

Table 12. Average reliability of the test borehole data and 3D geological model.

| | Test Borehole Data | Three-Dimensional Geological Model |
|---|---|---|
| Average reliability | 0.6293 | 0.3205 |

Figure 43. Histogram of the reliability index frequency of the 3D geological model.

Figure 44. Histogram of the borehole data reliability index frequency.

In conclusion, the evaluation method of 3D geological modeling based on the geostratigraphic series model is feasible in this study.

4. Conclusions

(1) In view of the disadvantages of the traditional simulation method of the structure of a geostratigraphic series, this study proposes a method based on the principle of a recurrent neural network. This method has the advantage of not relying on subjective factors such as assumptions and expert experience. Moreover, this approach can effectively evaluate geostratigraphic series simulation results in terms of characteristics such as the stratum thickness, stratum type, and stratum sequence. In the process of stratum simulation, utilizing expert-driven learning can improve both the learning efficiency and the predictive ability of the model.

(2) A complete machine learning model for geostratigraphic series simulation is established, and a model-based 3D geological modeling evaluation method is designed. This study provides a novel approach for the simulation and prediction of geostratigraphic series with 3D geological modeling. This work has far-reaching practical significance for the accurate description of the spatial distributions of geological features and guidance of site selection, engineering construction, and environmental assessment.

(3) The series model based on machine learning can describe the real situation at wells, and is a complementary tool to the traditional 3D geological model. This study directly shows that machine learning is feasible and reliable in geostratigraphic series simulation.
Additionally, our research provides new ideas and references for the popularization of machine learning in other fields of geology and engineering, especially 3D geological modeling.

Author Contributions: Z.L., conceptualization, methodology, data curation, formal analysis; writing—original draft preparation, writing—review and editing, project administration, funding acquisition; C.Z., conceptualization, methodology, writing—original draft preparation, supervision, project administration, funding acquisition; J.O., data curation, formal analysis, writing—original draft preparation and editing; W.M., writing—original draft preparation; Z.D., G.Z., formal analysis, writing—original draft preparation.

Funding: This research presented is funded by the Provincial Science and Technology Project of Guangdong Province (Grant no. 2016B010124007), the Science and Technology Youth Top-Notch Talent Project of Guangdong Special Support Program (Grant no. 2015 TQ01Z344) and the Guangzhou Science and Technology Project (Grant no. 201803030005).

Acknowledgments: The authors would like to thank the anonymous reviewers for their very constructive and helpful comments.

Conflicts of Interest: The authors declare no conflict of interest.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Loading next page...


Publisher: Multidisciplinary Digital Publishing Institute
Copyright: © 1996-2023 MDPI (Basel, Switzerland) unless otherwise stated
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
ISSN: 2076-3417
DOI: 10.3390/app9173553

Abstract

Applied Sciences
Article

A Stratigraphic Prediction Method Based on Machine Learning

Cuiying Zhou 1,2, Jinwu Ouyang 1,2, Weihua Ming 1,2, Guohao Zhang 1,2, Zichun Du 1,2 and Zhen Liu 1,2,*

1 Civil Engineering, Sun Yat-sen University, Guangzhou 510275, China
2 Guangdong Engineering Research Centre for Major Infrastructure Safety, Guangzhou 510275, China
* Correspondence: [email protected]

Received: 20 June 2019; Accepted: 27 August 2019; Published: 29 August 2019

Abstract: Simulation of a geostratigraphic unit is of vital importance for the study of geoinformatics, as well as geoengineering planning and design. A traditional method depends on the guidance of expert experience, which is subjective and limited, thereby making the effective evaluation of a stratum simulation quite impossible. To solve this problem, this study proposes a machine learning method for a geostratigraphic series simulation. On the basis of a recurrent neural network, a sequence model of the stratum type and a sequence model of the stratum thickness are successively established. The performance of the model is improved in combination with expert-driven learning. Finally, a machine learning model is established for a geostratigraphic series simulation, and a three-dimensional (3D) geological modeling evaluation method is proposed which considers the stratum type and thickness. The results show that we can use machine learning in the simulation of a series. The series model based on machine learning can describe the real situation at wells, and it is a complementary tool to the traditional 3D geological model. The prediction ability of the model is improved to a certain extent by including expert-driven learning. This study provides a novel approach for the simulation and prediction of a series by 3D geological modeling.

Keywords: recurrent neural network; series simulation; three-dimensional geological modeling; expert-driven learning

1. Introduction

A geostratigraphic structure is the result of multiple factors in the course of the evolution of Earth’s history, forming a complex morphology and irregular distribution. Geological bodies have spatially successive relationships, thus forming a series of strata with different lateral extensions and thicknesses. A geostratigraphic series is spatially uncertain due to the variations in sequence and the number and thickness of the stratum layers. Within the rock-soil mass extending from the top of the bedrock (referring to lithified rock that underlies unconsolidated surface sediments, conglomerates or regolith) to the surface, there may be a single layer or dozens of layers, and each layer can differ from the others. At the same time, the thickness of the strata also varies considerably, from tens of centimeters to hundreds of meters. Different geotechnical bodies have different physical, chemical, and mechanical properties, and weak stratum conditions directly threaten the safety of engineering construction and operation. A geostratigraphic series model with high reliability helps in understanding the geological conditions of a construction area, providing far-reaching practical guidance for site planning and selection, engineering construction, environmental assessment, cost savings, and operational risk reduction. Therefore, building a series model and accurately describing the spatial distribution of strata have become important topics in the field of geology and engineering geology.

Appl. Sci. 2019, 9, 3553; doi:10.3390/app9173553; www.mdpi.com/journal/applsci

To understand the geological structure, many techniques and methods have been developed to describe, simulate, and model strata [1–6]. With the introduction of the Glass Earth [7] concept and geological data, interdisciplinary theoretical integration and application research is being carried out.
The most representative traditional method of simulating the stratum structure is three-dimensional (3D) geological modeling, such as that with the B-rep model [8], octree model [9], tri-prism model [10] and geochron concepts [11–14]. However, the traditional method relies on the guidance of expert knowledge and experience in the selection of assumptions, parameters, and data interpolation methods, which are subjective and limited [15]. Assumptions about the borehole data distribution must be made, and it is difficult to effectively evaluate the stratum simulation results.

Machine learning [16–18] has been widely used in various fields of geology. The machine learning method does not make many assumptions about the data but selects a model according to the data characteristics. The data are then divided into a training set and a test set, and the parameters are adjusted repeatedly to obtain better accuracy. Machine learning is more concerned with the predictive power of models [19]. In geology and engineering, there have been numerous research and application examples [20–25]. Rodriguez-Galiano et al. conducted a study on mineral exploration based on a decision tree [26]. Porwal et al. used a radial function and a neural network to evaluate potential maps in mineral exploration [27]. Zhang studied the relationships between chemical elements and magmatite and between the sedimentary rock lithology and sedimentary rock minerals by using a multilayer perceptron and back propagation (BP) neural network [28]. Zhang et al. predicted karst collapse based on the Gaussian process [29]. Chaki et al. carried out an inversion of reservoir parameters by combining well logging and seismic data [30]. Gaurav combined machine learning, pattern recognition, and multivariate geostatistics to estimate the final recoverable shale gas volume [31]. Sha et al. used a convolutional neural network to characterize unfavorable geological bodies and surface issues [32]. Generally, machine learning research on stratum distributions based on drilling data is in its infancy.

To solve the above problems, this study explores the feasibility and reliability of machine learning through the simulation of a geostratigraphic series and proposes a machine learning geostratigraphic series simulation method. This method does not rely on subjective factors, and it is based on the principle of a recurrent neural network [33,34] to establish a stratum simulation model. This method can determine the stratum information accurately. The predictive power of the machine learning models is examined with an expert-driven mechanism based on supervised learning [35]. Compared with the traditional 3D geological modeling method, this study shows that the proposed method can better describe the real situation. This study provides a novel approach for the simulation and prediction of a geostratigraphic series. This work has far-reaching practical significance for the accurate description of the spatial distribution of lithologic features and guidance of site selection, engineering construction, and environmental assessment.

2. Geostratigraphic Series Simulation Method Based on Machine Learning

2.1. Geostratigraphic Series

A sequence refers to a series of data of a system at a specific sampling interval. In reality, sequences are a very common form of data. For example, strata have a certain thickness, and a certain stratum may be distributed throughout the whole field or only locally (namely, the stratum division). Stratum information can be interpreted as a sequence. Therefore, the strata can be regarded as a vertically oriented spatial sequence, as shown in Figure 1.
The simulation of a geostratigraphic series is based on the learning results of borehole data to predict the geostratigraphic series at any point in the study area, including the stratum type and thickness of each layer in the sequence.

Figure 1. Geostratigraphic series diagram.

2.2. Stratum Data Reconstruction Schemes Based on Machine Learning

Drilling data reconstruction schemes based on machine learning include data normalization, data segmentation, data filling, and data coding.

2.2.1. Stratum Data Normalization

Data normalization refers to the process of compressing data into a small interval, and the interval is usually taken as [0, 1] or [−1, 1]. Data normalization is essentially a linear transformation. Data normalization does not change the relative variation or the order of the data. There are many common means of data normalization, such as linear normalization and inverse cotangent normalization. In this study, the most common method of linear normalization is adopted.
For any data point, the program determines the spatial coordinates and the maximum and minimum values ($X_{\max}$ and $X_{\min}$, respectively) of the stratum thickness after traversing all the borehole data. The above linear normalization is applied by using Equation (1):

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \quad (1)$$

where $X^{*}$ is the result of normalization.

2.2.2. Drilling Data Segmentation and Equalization

The goal of machine learning is for the designed model to achieve good prediction results on both the training set and the test set. Therefore, before machine learning, the original drilling data must be divided into training data and test data. This process is called data segmentation. To ensure the effectiveness of machine learning, randomness and uniformity of the data distribution should be ensured during sampling of the training data and test data.

To ensure the effectiveness of the training data, we adopt a random replication strategy for small samples.
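The linear normalization of Equation (1) and the random segmentation into training and test data can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 20% test fraction, the fixed seed, and the handling of a constant feature are all assumptions made for the example.

```python
import random

def min_max_normalize(values):
    """Linear normalization of Equation (1): X* = (X - Xmin) / (Xmax - Xmin)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature is mapped to 0.0; the source does
        # not say how this case is handled.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]

def split_boreholes(boreholes, test_fraction=0.2, seed=0):
    """Randomly divide boreholes into (training data, test data)."""
    shuffled = list(boreholes)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical layer thicknesses in metres.
thicknesses = [0.5, 2.0, 12.5, 30.5]
normalized = min_max_normalize(thicknesses)

# Hypothetical borehole identifiers 0..9.
train, test = split_boreholes(range(10))
```

Shuffling before the split is one simple way to obtain the randomness that the segmentation step asks for; it does not by itself guarantee spatial uniformity.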
We randomly select data from boreholes with different numbers of geological layers to improve the replication effect. This method is used to comprehensively study data with different characteristics, improve the prediction ability of a model for different numbers of geological layers, increase the number of different layers represented by nearby drilling data, and artificially bring the training sample data to a balanced level. This approach of artificially replicating small data types is known as oversampling [36].

2.2.3. Geostratigraphic Series Filling

When a recurrent neural network (RNN) is used to process sequential problems, input data are received at every moment, and output is produced after the hidden layer has finished processing the data. Therefore, the input and output of an RNN are equal in length, and it is difficult to process input data of different lengths at the same time. In drilling data, the number of layers in each borehole varies, and the geostratigraphic series is nonuniform.
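The random replication (oversampling) strategy of Section 2.2.2 can be sketched as below. The sketch assumes that boreholes are grouped by their number of layers and that every group is replicated up to the size of the largest group; the source does not state the exact balancing target, and the record layout (`id`, `layers`) is hypothetical.

```python
import random
from collections import defaultdict

def oversample_by_layer_count(boreholes, seed=0):
    """Replicate boreholes at random so each layer-count group is equally large."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for borehole in boreholes:
        groups[len(borehole["layers"])].append(borehole)

    # Assumption: balance every group up to the size of the largest group.
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw random copies (with replacement) for under-represented groups.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

# Hypothetical boreholes: three with two layers, one with four layers.
boreholes = [
    {"id": "A", "layers": ["fill", "clay"]},
    {"id": "B", "layers": ["fill", "sand"]},
    {"id": "C", "layers": ["fill", "clay"]},
    {"id": "D", "layers": ["fill", "clay", "sand", "granite"]},
]
balanced = oversample_by_layer_count(boreholes)
```

Replication should be applied to the training data only, after the training and test data have been separated, so that copies of a test borehole never enter training.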
Therefore, the use of an RNN for batch training using stratum data requires filling at the tail of the geostratigraphic series without changing the original sequence of the geostratigraphic series and extending all geostratigraphic series to the same length [37].

Before training, in addition to adding a start of sequence (SOS) marker to the geostratigraphic series, an end of sequence (EOS) marker must be added to the geostratigraphic series. For each training set, the sampling process stops when the termination marker appears in the equal-length geostratigraphic series output of the RNN. As two virtual stratum types, the initiation and termination markers participate in the RNN training process via the input and output. The initiation markers represent the beginning of geostratigraphic series prediction, while the termination markers represent the end of the series prediction. The introduction of termination markers teaches the RNN model to predict when a sequence will end and overcomes the difficulty the RNN has in processing sequences of unequal length. In addition, the RNN model can conduct geostratigraphic series simulations with different numbers of layers at any location in the research area.

2.2.4. Stratum Coding Based on One-Hot Encoding

In machine learning tasks, data characteristics are not always continuous values, such as coordinates. One-hot encoding is a data processing method used to address discrete features. In geology, stratum types are finite and countable, regardless of the criteria used to divide the strata. Therefore, the set of geostratigraphic series elements is determined after traversing all the borehole data, in addition to obtaining the maximum value of each feature and the number of layers. To facilitate the search and mathematical representation, in this study, each stratum is represented by a unique digital identification [38].

2.3. Geostratigraphic Series Simulation Based on a Recurrent Neural Network

2.3.1. Establishment of the Sequence Model of the Stratum Type

The model in geostratigraphic series prediction uses the RNN as the core of the neural network. The structure is shown in Figure 2. In the machine learning tasks, the input data are coordinates, while the output result is the simulation result of the stratum type model corresponding to the given coordinates. Because the RNN has no hidden state from a previous moment at the first step, it is necessary to assign the initial state of the hidden layer neurons in the RNN before each training run. The input coordinates are the common attributes of all the strata in a geostratigraphic series, and they guide the whole process of RNN simulation of the geostratigraphic series. Therefore, the assignment process establishes the correlation between the input coordinates and the RNN, guiding the geostratigraphic series simulation from the beginning. The content of the assignment is determined by the input information. After the input layer receives the coordinates of the borehole and the basic elevation information, the coordinate input information is increased from the original three dimensions to the number of dimensions equal to the number of neurons. It serves as the initial state of the hidden layer neurons in the RNN.

Figure 2. Schematic diagram of the stratum type model.

At each moment, the RNN receives input of the neuron state and stratum information from the previous moment, and outputs the judgement of the stratum type through hidden layer calculations.
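The series filling of Section 2.2.3 and the one-hot coding of Section 2.2.4 can be sketched together as below. The sketch assumes that the tail is filled with the EOS marker itself (the source says only that filling happens at the tail) and that the two markers occupy the first two identifiers; the stratum names and helper names are hypothetical.

```python
SOS, EOS = "<SOS>", "<EOS>"

def build_vocabulary(series_list):
    """Give every stratum type, plus the two markers, a unique integer id."""
    types = sorted({t for series in series_list for t in series})
    return {t: i for i, t in enumerate([SOS, EOS] + types)}

def pad_series(series, length):
    """Add SOS/EOS, then fill the tail so every series has length + 2 items."""
    padded = [SOS] + list(series) + [EOS]
    # Assumption: the tail is filled with EOS, leaving the original order intact.
    return padded + [EOS] * (length + 2 - len(padded))

def one_hot(index, size):
    """Return a vector of zeros with a single 1 at the given index."""
    vector = [0] * size
    vector[index] = 1
    return vector

# Hypothetical series from two boreholes with different numbers of layers.
series_list = [["fill", "clay", "sand"], ["fill", "granite"]]
vocab = build_vocabulary(series_list)
max_len = max(len(series) for series in series_list)
padded = [pad_series(series, max_len) for series in series_list]
encoded = [[one_hot(vocab[t], len(vocab)) for t in series] for series in padded]
```

After this step every series has the same length, so the boreholes can be stacked into one batch for RNN training.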
The output is an n-dimensional vector in which each item represents the possibility of a certain stratum. The larger the value is, the higher the probability of that stratum. Thus, the most likely stratum is the predicted value at that moment. Repeating the above process and removing the termination marker in the output, we can obtain the model’s simulation results for the input coordinate information of the geostratigraphic series.

2.3.2. Establishment of the Series Model of the Stratum Thickness
Sequence-to-sequence (or seq2seq) learning, also known as the encoder-decoder network, has been widely used in the processing of machine translation and speech recognition. It maps input sequences to output sequences through deep neural networks. The seq2seq model is shown in Figure 3. This process includes two steps, input encoding and output decoding, and these two links are handled by the encoder and decoder, respectively. The encoder is responsible for converting a variable-length input series into a fixed-length vector. This fixed-length vector contains information about the input series.
The decoder is responsible for decoding this fixed-length vector and generating a variable-length output series according to the information content the vector represents.

Figure 3. The sequence-to-sequence (seq2seq) model.

In contrast to the traditional RNN, the seq2seq architecture does not need to receive input and generate output at every moment. Instead, the algorithm converts the input series of the stratum types into a vector with the help of the encoder, and then outputs the results through the decoder.
In other words, seq2seq carries more information when making predictions than the traditional RNN and infers the output content based on the input series as a whole.

In this study, two RNNs are used as the encoder and decoder, which are connected to each other. Seq2seq is now widely used to process machine translation and speech recognition problems; thus, we apply it to the layer thickness recognition problem. That is to say, given the geostratigraphic series $x = [x_1, x_2, x_3, \ldots, x_n]$, an equal-length thickness sequence $d = [d_1, d_2, d_3, \ldots, d_n]$ is generated, where $n$ is the length of the sequence (i.e., the total number of strata at that point).
The encoder receives the type information of the current stratum at each moment, n times in total. After the input has been completely received, the hidden state at the last moment of the encoder is taken as the initial state to guide the decoder. Then, the decoder outputs the thickness of each layer step-by-step. The above process and model structure are shown in Figure 4.

Figure 4. Structure diagram of the two-layer prediction network.

2.3.3. Establishment of the Geostratigraphic Series Modeling

The stratum thickness model uses real stratum type data in the training process.
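The encoder-decoder thickness model of Section 2.3.2 can be sketched in a few lines to make the data flow concrete. This is an untrained toy with random weights, not the authors' network: the plain tanh cell, the hidden size, the feedback of each predicted thickness into the next decoder step, and the sigmoid that keeps outputs in the normalized range (0, 1) are all assumptions for illustration.

```python
import math
import random

def make_matrix(rows, cols, rng):
    """Random weight matrix; stands in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(w_in, w_h, x, h):
    """One tanh RNN step: h' = tanh(W_in x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]

def predict_thickness(type_ids, n_types, hidden=8, seed=0):
    """Map a stratum type series to an equal-length thickness series."""
    rng = random.Random(seed)
    enc_in, enc_h = make_matrix(hidden, n_types, rng), make_matrix(hidden, hidden, rng)
    dec_in, dec_h = make_matrix(hidden, 1, rng), make_matrix(hidden, hidden, rng)
    w_out = make_matrix(1, hidden, rng)

    # Encoder: read one one-hot stratum type per step, n steps in total.
    h = [0.0] * hidden
    for type_id in type_ids:
        x = [1.0 if i == type_id else 0.0 for i in range(n_types)]
        h = rnn_step(enc_in, enc_h, x, h)

    # Decoder: starts from the encoder's last hidden state and emits one
    # thickness per layer (assumption: previous output is fed back as input).
    thicknesses, previous = [], 0.0
    for _ in type_ids:
        h = rnn_step(dec_in, dec_h, [previous], h)
        raw = matvec(w_out, h)[0]
        previous = 1.0 / (1.0 + math.exp(-raw))  # sigmoid, value in (0, 1)
        thicknesses.append(previous)
    return thicknesses

# Hypothetical series of three strata drawn from three stratum types.
predicted = predict_thickness([2, 0, 1], n_types=3)
```

During training the `type_ids` argument would hold the real stratum types from a borehole; at prediction time it would hold the output of the stratum type model instead.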
In practice, the real stratum type is unknown, and the output sequence of the stratum type model should be used as the judgement basis. The output of the stratum type model is therefore connected to the encoder of the layer thickness model, which yields a complete geostratigraphic series model. The simulation sequence of the layer thickness is shown in Figures 5 and 6.

Figure 5. Diagram of the neural network structure for stratum prediction.

Figure 6. Simulation process of the stratum thickness sequence.
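As a rough illustration of the data flow in the encoder-decoder thickness model described above (the encoder reads the stratum types, its last hidden state initializes the decoder, and the decoder emits one thickness per stratum), the following sketch runs a forward pass in plain Python. The hidden size, the random untrained weights, the sigmoid output, and the choice to feed the previous thickness back into the decoder are all assumptions made for illustration; they are not details taken from this study. The stratum numbers follow Table 1.

```python
import math
import random

NUM_CLASSES = 15  # 13 stratum types plus the start and end markers (Table 1)
HIDDEN = 8        # hidden size chosen only for illustration


def one_hot(index, size=NUM_CLASSES):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_h, x, h):
    """Plain RNN cell: h' = tanh(W_in x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_in, x), matvec(w_h, h))]


def predict_thickness(type_ids, params):
    """Encode the stratum types, then decode one thickness per stratum."""
    # Encoder: read the n stratum types one at a time.
    h = [0.0] * HIDDEN
    for t in type_ids:
        h = rnn_step(params["enc_in"], params["enc_h"], one_hot(t), h)

    # Decoder: start from the encoder's last hidden state and emit n values.
    thicknesses = []
    prev = [0.0]  # previous thickness fed back as decoder input (assumption)
    for _ in type_ids:
        h = rnn_step(params["dec_in"], params["dec_h"], prev, h)
        z = sum(w * x for w, x in zip(params["out"], h))
        d = 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the output in (0, 1)
        thicknesses.append(d)
        prev = [d]
    return thicknesses


rng = random.Random(0)
params = {  # untrained random weights, for shape checking only
    "enc_in": rand_matrix(HIDDEN, NUM_CLASSES, rng),
    "enc_h": rand_matrix(HIDDEN, HIDDEN, rng),
    "dec_in": rand_matrix(HIDDEN, 1, rng),
    "dec_h": rand_matrix(HIDDEN, HIDDEN, rng),
    "out": [rng.uniform(-0.5, 0.5) for _ in range(HIDDEN)],
}

# Stratum types clay, silt, silty sand, clay, mucky clay.
thickness = predict_thickness([0, 1, 4, 0, 7], params)
print(len(thickness))  # one value per input stratum
```

Because the weights are not trained, the values themselves are meaningless; the sketch only shows that an input series of n strata yields an equal-length thickness sequence.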
2.4. Evaluation Method of Stratum Type Series Simulation

The stratum accuracy, the series edit distance, and the geostratigraphic series similarity based on the edit distance are used to evaluate the simulation performance of the series models of the stratum type.

The stratum accuracy is the simplest evaluation index. By comparing elements at corresponding positions of the simulated sequence and the real geostratigraphic series, the proportion of correctly predicted strata in the total number of strata is calculated by Equation (2):

$$\text{Stratum accuracy} = \frac{\text{Correct stratum number}}{\text{Total formation number of test data}} \tag{2}$$

The edit distance is a standard measure of the similarity of series. It represents the minimum number of edit operations (insertion, deletion, and replacement) required to convert one series into another. The smaller the edit distance between two series, the more similar they are. Because the series being compared may differ in length, the same edit distance corresponds to a notably higher similarity for longer series. To better describe the closeness of series, Equation (3) is used to calculate the similarity of series:

$$L(S, T) = 1 - \frac{D(S, T)}{\max(|S|, |T|)} \tag{3}$$

where D(S, T) represents the edit distance between series S and T, and |S| and |T| are their lengths. There is no closed-form equation for calculating D(S, T).
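Equations (2) and (3) can be sketched in Python as follows. Here D(S, T) is computed with the standard dynamic-programming recurrence for the edit distance, which is an assumption about how it is evaluated in this study. The function names are illustrative, and counting positions missing from a shorter prediction as wrong is also an assumption.

```python
def stratum_accuracy(pairs):
    """Equation (2): correctly predicted strata divided by all strata in the test data.

    `pairs` is a list of (predicted_series, real_series) tuples.
    """
    correct = 0
    total = 0
    for predicted, real in pairs:
        correct += sum(p == r for p, r in zip(predicted, real))
        total += len(real)
    return correct / total


def edit_distance(s, t):
    """Minimum number of insertions, deletions, and replacements turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,          # delete a
                            curr[j - 1] + 1,      # insert b
                            prev[j - 1] + cost))  # replace a with b, or keep it
        prev = curr
    return prev[-1]


def series_similarity(s, t):
    """Equation (3): L(S, T) = 1 - D(S, T) / max(|S|, |T|)."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0  # two empty series are treated as identical
    return 1.0 - edit_distance(s, t) / longest


t1 = ["silt", "fine sand", "silt", "clay", "silt", "clay"]
t2 = ["miscellaneous fill", "sand", "fine sand", "silt", "clay"]
print(edit_distance(t1, t2))      # 4
print(series_similarity(t1, t2))  # 1 - 4/6, about 0.33
```

The series t1 and t2 are the ones used in the worked example of this section, for which the sketch gives D(S, T) = 4 and a similarity of about 0.33.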
An example of calculating D(S, T) is as follows. Suppose there are two geostratigraphic series, t1 = [silt, fine sand, silt, clay, silt, clay] and t2 = [miscellaneous fill, sand, fine sand, silt, clay]. To convert t1 into t2 with the minimum number of operations, the process is as follows:

1. Replace the first “silt” with “sand”;
2. Insert “miscellaneous fill” at the beginning of t1;
3. Remove the last “clay”;
4. Delete the final “silt”.

Through the above four replacement, insertion, and deletion operations, the geostratigraphic series t1 is changed into series t2. Thus, the edit distance D(S, T) of the two geostratigraphic series is 4. Although there are many possible ways to convert one series into another through insertions, deletions, and substitutions, the edit distance D(S, T) between the two series is always unique.

3. Results and Discussions

3.1. Study of the Regional Geology and Data Reconstruction Schemes

The research area is located in a city in eastern China with a plain topography. The soil in the study area is mainly composed of sandy soil, cohesive soil, and silty soil. The local strata are silt and silty soil.
The research data come from the city’s geological survey work. There is a total of 1386 borehole datasets, and all the boreholes terminate on the bedrock surface. A total of 13 stratum types were determined. These boreholes are nonuniformly distributed in an area of 3882 square kilometers, as shown in Figure 7.

Figure 7. Distribution of the boreholes in the study area.

Using the reconstruction scheme of the stratum data proposed in this study, the drilling data are reconstructed. The specific operation process is as follows:

1. Data normalization: In this study, the x coordinates, y coordinates, hole elevation, and stratum thickness of the borehole data are continuous values. A review of all the borehole data shows that the coordinates use the Xi’an 80 coordinate system and their values reach the millions, while the elevation of the orifice and the thickness of the strata are only within 100 m.
The difference between the characteristics is large and can be up to tens of thousands. To ensure the same dimension, the above borehole data characteristics are compressed into the interval [0, 1] by linear normalization.

2. Drilling data segmentation and equalization: In this study, the training data and test data are selected randomly at a ratio of 4:1 from all drilling points, and the data are balanced according to the number of layers. The spatial positions of the training data and test data are shown in Figure 8.

Figure 8. Schematic diagram of the training data and test data distributions.

Figure 8 shows the location distributions of the training data and test data in the study area after the original drilling data are segmented into training data and test data, where the red symbols represent the training data and the green symbols represent the test data. The positions, plotting scale, and geographic coordinates in Figure 8 are the same as in Figure 7.
3. Stratum coding: According to the statistics, the borehole stratum data used in this study contain a total of 13 types of strata. Together with the start and end markers that are artificially introduced in the subsequent geostratigraphic series, there are 15 types in total. The numbers 0 to 14 were assigned to them, and vectorization was carried out by one-hot encoding. The numbers and coding vectors of the stratum types are shown in Table 1.

Table 1. Strata numbers and one-hot vectors.
| Stratum Types | Number | Coding Vector |
|---|---|---|
| clay | 0 | (1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| silt | 1 | (0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| plain fill | 2 | (0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| miscellaneous fill | 3 | (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| silty sand | 4 | (0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| silty clay | 5 | (0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0) |
| mucky soil | 6 | (0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0) |
| mucky clay | 7 | (0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0) |
| old city fill | 8 | (0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0) |
| clay sand inclusion | 9 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0) |
| mud | 10 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0) |
| medium sand | 11 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0) |
| intermediate fine sand | 12 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0) |
| start mark | 13 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0) |
| end mark | 14 | (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1) |

4. Geostratigraphic series filling: According to the statistics, the maximum number of strata in the study data is 10.
Therefore, the filling length of the geostratigraphic series should be larger than 10 layers. For simplicity, the termination marker is used here to fill all geostratigraphic series to the 11th layer. Suppose that the stratum types of a borehole are, in order, clay, silt, silty sand, clay, and mucky clay, so that the corresponding number vector is (0, 1, 4, 0, 7). The termination marker, denoted by the number 14, is repeatedly appended to the end of the vector until its length is 11. Finally, the geostratigraphic series data input of the machine learning model is obtained by replacing each item in the numbered vector with the corresponding one-hot encoding vector.
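The four reconstruction steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the function names, the random seed, and the sample elevation values are hypothetical, and the balancing of the split by number of layers is not reproduced.

```python
import random

# Numbers assigned to the stratum types in Table 1.
STRATUM_NUMBERS = {
    "clay": 0, "silt": 1, "plain fill": 2, "miscellaneous fill": 3,
    "silty sand": 4, "silty clay": 5, "mucky soil": 6, "mucky clay": 7,
    "old city fill": 8, "clay sand inclusion": 9, "mud": 10,
    "medium sand": 11, "intermediate fine sand": 12,
}
END_MARK = 14       # termination marker
NUM_CLASSES = 15    # 13 stratum types plus start and end markers
PADDED_LENGTH = 11  # one more than the maximum of 10 strata


def min_max_normalize(values):
    """Step 1: linearly rescale one feature column to the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def split_train_test(items, test_fraction=0.2, seed=0):
    """Step 2: random 4:1 split (balancing by layer count is not shown)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def encode_series(stratum_names):
    """Steps 3 and 4: number coding, filling with the end mark, one-hot vectors."""
    numbers = [STRATUM_NUMBERS[name] for name in stratum_names]
    numbers += [END_MARK] * (PADDED_LENGTH - len(numbers))
    vectors = [[1 if i == n else 0 for i in range(NUM_CLASSES)] for n in numbers]
    return numbers, vectors


print(min_max_normalize([0.0, 50.0, 100.0]))  # [0.0, 0.5, 1.0]

numbers, vectors = encode_series(["clay", "silt", "silty sand", "clay", "mucky clay"])
print(numbers)  # [0, 1, 4, 0, 7, 14, 14, 14, 14, 14, 14]
```

Since no borehole has more than 10 strata, every filled series ends with at least one termination marker, which is what later allows the predicted series to be cut at the first marker.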
3.2. Machine Learning Simulation Result Analysis

We implemented the proposed algorithms in Python. Part of the algorithm code is as follows:

```python
import torch
import torch.nn as nn


class CrossLoss(nn.Module):
    """Cross-entropy loss that also returns the number of non-ignored targets."""

    def __init__(self, ignore_index=0):
        super().__init__()
        self.ignore_index = ignore_index
        self.criterion = nn.CrossEntropyLoss(ignore_index=ignore_index)

    def forward(self, input, target):
        # Mask of the target positions that differ from the ignored label.
        ind = (target != self.ignore_index).float()
        num_all = torch.sum(ind).item()
        size0 = target.size(0)
        size1 = target.size(1)
        temp = target.cpu().data
        # Convert every target entry with depthLabel, a helper that belongs
        # to the unpublished part of the program.
        for i in range(size0):
            for j in range(size1):
                temp[i, j] = depthLabel(temp[i, j])
        # Apply the mask to the model output before computing the loss.
        pred = torch.mul(input, ind).long()
        temp = temp.long()
        loss = self.criterion(pred, temp)
        return loss, num_all
```

As the procedure may be further commercialized, it is not suitable to make it all public for the time being. The computer used to run the algorithm is configured as follows:

• CPU: Intel Core i7-4790K @ 4.00 GHz, quad-core;
• Memory: 32 GB;
• Graphics card: Nvidia GeForce GTX 770 (2 GB).

3.2.1. Training and Verification of the Stratum Type Series Model

(1) Model Training

The cross-entropy loss function is used to describe the performance of the model in the training process. Figure 9 shows that as the number of training rounds increases, the loss value decreases continuously. However, the gradient of the loss curve begins to decrease after several cycles, and the amplitude of change gradually decreases. The final loss value fluctuates in a small range and tends to be stable.

Figure 9. Loss curve of the first 50 training rounds.

The model has completed most of its loss reduction after 50 training rounds, as shown in Figure 10. After 50 rounds, the loss function tends to be stable, and the model learns only slowly from the training data. The specific decline in the loss function is listed in Table 2.

Figure 10. Loss curve after 500 training rounds.
Table 2. Statistical table of the loss decline.

| Round Number | 50 | 500 |
|---|---|---|
| Loss value | 0.483226 | 0.374167 |
| Cumulative decline (absolute) | 0.327009 | 0.436068 |
| Cumulative decline (relative) | 40.36% | 53.82% |

(2) Model Test

The trained and finally stable model was tested by inputting the coordinate information of the test borehole data successively. The simulated stratum type sequence output by the model was searched for the termination marker and truncated there: all the elements before the first termination marker were taken as the stratum prediction series. By comparing the predicted values with the real values one-to-one, the single-layer accuracy of the geostratigraphic series is tested. Then the similarity between the prediction sequence and the real geostratigraphic series is evaluated by using the edit distance algorithm.

The accuracy of stratum type simulation varies with the training round, as shown in Figure 11.
Figure 11 shows that as the number of training rounds increases, the overall prediction ability of the model continues to improve, and the accuracy of the stratum type and geostratigraphic series prediction improves rapidly. The accuracy of the final stratum type prediction was stable at 59.86%. As the loss function curve changes, the accuracy curve increases gradually. The accuracy achieved in the first 50 rounds is almost the same as the final accuracy.

The prediction of a single stratum is the first step in establishing a spatial stratum distribution model. In addition to the accurate prediction of a single stratum, it is of greater concern whether the model can make an accurate overall prediction of the geostratigraphic series in the study area.
Then, the edit distance algorithm is used to evaluate the similarity between the simulated sequence and the real geostratigraphic series. If the edit distance between the prediction sequence and the real geostratigraphic series is larger than one, the prediction is regarded as failed and is not considered. The edit distance changes are shown in Figure 12.

Figure 11. Variation diagram of the simulation accuracy of the RNN model.

Figure 12. Variation curve of the geostratigraphic series edit distance with time.

In Figure 12, the lower curve corresponds to an edit distance of zero, i.e., it gives the proportion of boreholes in the test set for which the predicted result is exactly equal to the real result. The upper curve gives the proportion of boreholes within an edit distance of one, i.e., the model makes no more than one wrong prediction in the whole sequence prediction process.
In this case, the predicted sequence can be converted into the real stratum sequence by at most a single insertion, replacement, or deletion operation. In the end, the former curve converges to 35.2%, while the latter curve converges to 74%.

Because the number of layers is different, it is difficult to accurately describe the similarity between the predicted series and the real result by applying the edit distance alone. Therefore, the similarity calculation equation based on the edit distance is adopted. The variation curve of the predicted series similarity with the number of training rounds is shown in Figure 13.
Figure 13. Similarity curve of the geostratigraphic series.

In Figure 13, with an increase in training rounds, the overall prediction ability of the model is continuously improved, and the average similarity curve between the predicted series and the actual geostratigraphic series also gradually increases and finally converges to 70.9%. This result shows that the model accuracy continuously improves with increasing training rounds in the learning process and that the model gradually establishes the correlation between the elevation information and the geostratigraphic series in the study area.
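The test procedure described above (cutting each simulated sequence at the first termination marker, then counting the boreholes whose prediction is exact or within an edit distance of one) can be sketched as follows. The function names and the three sample predictions are hypothetical. The linear-time check for "at most one edit" is a substitute chosen for brevity, not necessarily the procedure used in this study.

```python
END_MARK = 14  # number of the termination marker in Table 1


def truncate_at_end_mark(sequence, end_mark=END_MARK):
    """Keep the elements before the first termination marker."""
    if end_mark in sequence:
        return list(sequence[:sequence.index(end_mark)])
    return list(sequence)


def within_one_edit(s, t):
    """True if s becomes t with at most one insertion, deletion, or replacement."""
    if abs(len(s) - len(t)) > 1:
        return False
    if len(s) > len(t):
        s, t = t, s  # make s the shorter (or equal-length) series
    i = 0
    while i < len(s) and s[i] == t[i]:
        i += 1  # skip the common prefix
    if len(s) == len(t):
        return s[i + 1:] == t[i + 1:]  # identical, or one replacement
    return s[i:] == t[i + 1:]          # t has exactly one extra element


def edit_distance_proportions(raw_predictions, real_series):
    """Share of boreholes with edit distance 0 and with edit distance <= 1."""
    exact = 0
    near = 0
    for raw, real in zip(raw_predictions, real_series):
        predicted = truncate_at_end_mark(raw)
        exact += predicted == list(real)
        near += within_one_edit(predicted, list(real))
    n = len(real_series)
    return exact / n, near / n


# Hypothetical model outputs (already padded to 11 layers) and real series.
raw = [
    [0, 1, 4, 0, 7, 14, 14, 14, 14, 14, 14],
    [0, 1, 5, 14, 14, 14, 14, 14, 14, 14, 14],
    [2, 1, 14, 14, 14, 14, 14, 14, 14, 14, 14],
]
real = [[0, 1, 4, 0, 7], [0, 1, 4], [3, 5, 0, 7]]
print(edit_distance_proportions(raw, real))  # one exact match, two within one edit
```

The second value is never smaller than the first, because an exact match also counts as being within an edit distance of one; this matches the relation between the lower and upper curves in Figure 12.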
(3) Testing the Effect of Expert-Driven Learning

To improve the learning performance of the RNN and test the effect of expert-driven learning, this study trained and tested the expert-driven model based on supervisory learning with four expert ratios using the same dataset. The four expert ratios are 1/3, 1/2, 2/3, and 1, i.e., expert-driven learning is carried out once every three rounds, once every two rounds, twice every three rounds, and in every round (the entire training process is conducted in the form of expert-driven learning), respectively.
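One way to realize these expert ratios is a fixed round schedule, sketched below. The text does not state which rounds within each cycle are expert-driven, so placing them at the start of every cycle is an assumption, and the function name is illustrative.

```python
from fractions import Fraction


def is_expert_round(round_index, expert_ratio):
    """Decide whether a training round (0-based) uses expert-driven learning.

    Within every cycle of `denominator` rounds, the first `numerator`
    rounds are expert-driven (assumed placement).
    """
    ratio = Fraction(expert_ratio)
    return round_index % ratio.denominator < ratio.numerator


for ratio in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(1)):
    schedule = [is_expert_round(r, ratio) for r in range(6)]
    print(ratio, sum(schedule), "of 6 rounds are expert-driven")
```

Over six rounds this gives 2, 3, 4, and 6 expert-driven rounds for the ratios 1/3, 1/2, 2/3, and 1, and a ratio of 0 reproduces the ordinary RNN training without expert guidance.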
The model obtained a higher Figures 14–17 show the loss function curves of expert-driven learning using different factors. Since the model is based on the prediction results of both expert-driven learning and non-expert-driven descent gradient under the guidance of correct monitoring signals as compared with the ordinary learning, Since the mo the loss del is function based on the is banded pred in icthe tionﬁrst resu thr lts o eefﬁgur bothes. expert-driven The model obtained learning aand non-expert- higher descent RNN model. The larger the proportion of expert-driven learning in the learning process is, the higher driven learning, the loss function is banded in the first three figures. The model obtained a higher gradient under the guidance of correct monitoring signals as compared with the ordinary RNN model. the rate of loss reduction. When expert-driven learning is completely adopted, the model loss The descen lart gr ger ad the iepr nt un oportion der the of gui expert-driven dance of correct m learning on in ito the ring s learning ignals pr as com ocessp is, ared the with higher the or the rate dinary of function curve decreases the fastest. Almost all of the gradient descent is completed within the first RNN model. The larger the proportion of expert-driven learning in the learning process is, the higher loss reduction. When expert-driven learning is completely adopted, the model loss function curve 50 training rounds. decr the ra eases te o the f lo fastest. ss redu Almost ction. Wh all of en the expert- gradient driven descent learnin is completed g is complete within ly a the do ﬁrst pted, 50 training the model rounds. loss function curve decreases the fastest. Almost all of the gradient descent is completed within the first 50 training rounds. Figure 14. Expert-driven learning with a factor of 1/3. Figure 14. Expert-driven learning with a factor of 1/3. Figure 14. Expert-driven learning with a factor of 1/3. Appl. Sci. 
Figure 15. Expert-driven learning with a factor of 1/2.

Figure 16. Expert-driven learning with a factor of 2/3.

Figure 17. Full expert-driven learning.

The single-stratum accuracy rate curve in each test round is shown in Figures 18–21.

Figure 18. Expert-driven learning with a factor of 1/3.

Figure 19. Expert-driven learning with a factor of 1/2.
Figure 19. Expert-driven learning with a factor of ½. Figure 19. Expert-driven learning with a factor of ½. Figure 19. Expert-driven learning with a factor of . Figure 20. Expert-driven learning with a factor of 2/3. Figure 20. Expert-driven learning with a factor of 2/3. Figure 20. Expert-driven learning with a factor of 2/3. Figure 20. Expert-driven learning with a factor of 2/3. Figure 20. Expert-driven learning with a factor of 2/3. Figure 21. Full expert-driven learning. Figure Figure 21. 21. Full Full expert-driven expert-driven learning. learning. The accuracy of the model simulation results under different tutor ratios is shown in Table 3 Figure 21. Full expert-driven learning. below. Figure 21. Full expert-driven learning. The accuracy of the model simulation results under dierent tutor ratios is shown in Table 3 below. The accuracy of the model simulation results under different tutor ratios is shown in Table 3 The accuracy of the model simulation results under different tutor ratios is shown in Table 3 below. The accuracy of the m Table 3. ode Stratu l sim m type accuracy unde ulation results undr different expert ratios. er different tutor ratios is shown in Table 3 Table 3. Stratum type accuracy under dierent expert ratios. below. below. Table 3. Stratum type accuracy under different expert ratios. Expert Ratio 0 1/3 1/2 2/3 1 Expert Ratio 0 1/3 1/2 2/3 1 Table 3. Stratum type accuracy under different expert ratios. Maximum value 61.42% 63.83% 64.82% 63.40% 64.82% ExMaximumpert Ratio value 61.42%0 63.83%1/3 1 64.82%/2 2 63.40% 64.82%/3 1 Table 3. Stratum type accuracy under different expert ratios. 
Steady value 59.86% 60.00% 62.41% 61.13% 60.42% Expert Ratio 0 1/3 1/2 2/3 1 Steady value 59.86% 60.00% 62.41% 61.13% 60.42% Maximum value 61.42% 63.83% 64.82% 63.40% 64.82% Expert Ratio 0 1/3 1/2 2/3 1 Maximum value 61.42% 63.83% 64.82% 63.40% 64.82% Steady value 59.86% 60.00% 62.41% 61.13% 60.42% Maximum value 61.42% 63.83% 64.82% 63.40% 64.82% Figures 22–25 show the proportion of the drilling data with edit distances of zero and one in the Steady value 59.86% 60.00% 62.41% 61.13% 60.42% Figures 22–25 show the proportion of the drilling data with edit distances of zero and one in the total test data St beead tween y va the plue rediction 59ser.86% ies of 60 the.00% mod 62 el and the r.41% e 61 al geostrat.1igr3% a 60 phic serie.4 s2% . De tailed Figures 22–25 show the proportion of the drilling data with edit distances of zero and one in the total test data between the prediction series of the model and the real geostratigraphic series. Detailed statistics are shown in Table 4. Figures 22–25 show the proportion of the drilling data with edit distances of zero and one in the total test data between the prediction series of the model and the real geostratigraphic series. Detailed statistics are shown in Table 4. Figures 22–25 show the proportion of the drilling data with edit distances of zero and one in the total test data between the prediction series of the model and the real geostratigraphic series. Detailed statistics are shown in Table 4. total test data between the prediction series of the model and the real geostratigraphic series. Detailed statistics are shown in Table 4. statistics are shown in Table 4. Figure 22. Expert-driven learning with a factor of 1/3. Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 32 Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 32 Figure 22. Expert-driven learning with a factor of 1/3. Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 32 Appl. Sci. 2019, 9, x FOR PEER REVIEW 18 of 32 Figure 22. Expert-driven learning with a factor of 1/3. 
Figure 22. Expert-driven learning with a factor of 1/3. Appl. Sci. 2019, 9, 3553 16 of 29 Figure 22. Expert-driven learning with a factor of 1/3. Figure 23. Expert-driven learning with a factor of 2/3. Figure 23. Expert-driven learning with a factor of 2/3. Figure 23. Expert-driven learning with a factor of 2/3. Figure 23. Expert-driven learning with a factor of 2/3. Figure 23. Expert-driven learning with a factor of 2/3. Figure 24. Expert-driven learning with a factor of ½. Figure 24. Expert-driven learning with a factor of ½. Figure 24. Expert-driven learning with a factor of ½. Figure 24. Expert-driven learning with a factor of . Figure 24. Expert-driven learning with a factor of ½. Figure 25. Full expert-driven learning. Figure 25. Full expert-driven learning. Figure 25. Full expert-driven learning. Table 4. Statistical results of the edit distance under the different expert ratios. Figure 25. Full expert-driven learning. Table 4. Statistical results of the edit distance under the dierent expert ratios. Figure 25. Full expert-driven learning. Table 4. Stat Expe istrt Rat ical results of io the edit distance under the d 0 1/3 iff1 erent expert ratios. /2 2/3 1 Table 4. Statistical results of the edit distance under the different expert ratios. Expert Ratio 0 1/3 1/2 2/3 1 Maximum value 37.2% 39.6% 39.2% 39.6% 36.4% Table 4. Statistical results of the edit distance under the different expert ratios. 
Expert Ratio 0 1/3 1/2 2/3 1 Edit Distance = 0 Maximum value 37.2% 39.6% 39.2% 39.6% 36.4% Edit Distance Ex= pe 0 rt Ratio 0 1/3 1/2 2/3 1 Steady value 35.2% 38% 38.4% 38.4% 35.6% Steady value 35.2% 38% 38.4% 38.4% 35.6% Maximum value 37.2% 39.6% 39.2% 39.6% 36.4% Expert Ratio 0 1/3 1/2 2/3 1 Edit Distance = 0 Maximum value 37.2% 39.6% 39.2% 39.6% 36.4% Maximum value 76% 77.2% 76.4% 77.2% 76.4% Maximum value 76% 77.2% 76.4% 77.2% 76.4% Edit Edit Dista Distance nce <= = 0 1 Steady value 35.2% 38% 38.4% 38.4% 35.6% Edit Distance <= 1 Maximum value 37.2% 39.6% 39.2% 39.6% 36.4% Steady value 74% 75.6% 75.6% 75.6% 73.6% StStead eadyy vvaalluuee 3574%.2% 38% 75.6% 38 75.4% .6% 7538.4% .6% 7335.6% .6% Edit Distance = 0 Maximum value 76% 77.2% 76.4% 77.2% 76.4% Steady value 35.2% 38% 38.4% 38.4% 35.6% Edit Distance <= 1 Maximum value 76% 77.2% 76.4% 77.2% 76.4% Steady value 74% 75.6% 75.6% 75.6% 73.6% Edit Distance <= 1 Maximum value 76% 77.2% 76.4% 77.2% 76.4% The similarity curves between the prediction series of the model and the real geostratigraphic Steady value 74% 75.6% 75.6% 75.6% 73.6% The similarity curves between the prediction series of the model and the real geostratigraphic Edit Distance <= 1 Steady value 74% 75.6% 75.6% 75.6% 73.6% series under different expert ratios is shown in Figures 26–29. series under dierent expert ratios is shown in Figures 26–29. The similarity curves between the prediction series of the model and the real geostratigraphic The similarity curves between the prediction series of the model and the real geostratigraphic series under different expert ratios is shown in Figures 26–29. The similarity curves between the prediction series of the model and the real geostratigraphic series under different expert ratios is shown in Figures 26–29. series under different expert ratios is shown in Figures 26–29. Appl. Sci. 2019, 9, x FOR PEER REVIEW 19 of 32 Figure Figure 26. 26. 
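The edit-distance statistics reported in Table 4 can be illustrated with a short sketch. It assumes the standard Levenshtein distance (unit cost for inserting, deleting, or substituting one stratum) and that each borehole contributes one predicted and one real sequence; the function names are hypothetical and this is not the authors' code.

```python
def edit_distance(predicted, real):
    """Levenshtein distance between two sequences of stratum labels."""
    prev = list(range(len(real) + 1))
    for i, p in enumerate(predicted, start=1):
        curr = [i]
        for j, r in enumerate(real, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # drop a predicted stratum
                            curr[j - 1] + 1,      # add a missing stratum
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def edit_distance_shares(pairs):
    """Share of boreholes with distance 0 and with distance <= 1.

    `pairs` is a list of (predicted_sequence, real_sequence) tuples.
    """
    if not pairs:
        raise ValueError("at least one borehole is required")
    distances = [edit_distance(p, r) for p, r in pairs]
    exact = sum(d == 0 for d in distances) / len(distances)
    within_one = sum(d <= 1 for d in distances) / len(distances)
    return exact, within_one
```

Applied to every test borehole after each test round, the two returned shares would correspond to the "Edit Distance = 0" and "Edit Distance ≤ 1" rows of a table such as Table 4.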
Figure 26. Expert-driven learning with a factor of 1/3.

Figure 27. Expert-driven learning with a factor of 2/3.

Figure 28. Expert-driven learning with a factor of 1/2.

Figure 29. Full expert-driven learning.

The statistics of series similarity under different expert ratios are shown in Table 5.

Table 5. Statistical results of the series similarity.
Expert Ratio | 0 | 1/3 | 1/2 | 2/3 | 1
Maximum value | 71.85% | 73.60% | 73.95% | 73.98% | 72.51%
Steady value | 70.91% | 72.64% | 73.57% | 73.09% | 71.68%

As Table 5 shows, adopting the expert-driven learning mechanism improves the performance of the machine-learning model for stratum type series simulation, although the improvement is modest: the steady series similarity rises from 70.91% without expert-driven learning to at most 73.57% at an expert ratio of 1/2. Expert-driven learning also accelerates the convergence of the learning curve, and the higher the expert ratio, the sooner the model reaches stability. However, the maximum and steady values of the various indicators show that a higher expert ratio does not always give a better result. The ultimate performance of full expert-driven learning was only slightly better than that of the RNN model, and the best results were obtained with a partial expert-driven learning strategy.

3.2.2. Training and Verification of the Stratum Thickness Series Model

(1) Layer Thickness Simulation Based on Multi-Category Classification

The layer thickness of the study area is divided into six stratum thickness intervals as follows: within 3 m, 3 m to 5 m, 5 m to 10 m, 10 m to 20 m, 20 m to 30 m, and above 30 m.
For the stratum thickness series simulation based on multi-category classification, the different stratum thickness intervals also need to be numbered and coded, as shown in Table 6.

Table 6. Code of the layer thickness type.
Stratum Thickness Interval | Layer Thickness Type Coding Number | Coded Vector
<3 m | 0 | [1, 0, 0, 0, 0, 0, 0]
3–5 m | 1 | [0, 1, 0, 0, 0, 0, 0]
5–10 m | 2 | [0, 0, 1, 0, 0, 0, 0]
10–20 m | 3 | [0, 0, 0, 1, 0, 0, 0]
20–30 m | 4 | [0, 0, 0, 0, 1, 0, 0]
>30 m | 5 | [0, 0, 0, 0, 0, 1, 0]
initiation mark | 6 | [0, 0, 0, 0, 0, 0, 1]

Before the output of the model is generated, the encoder has received a complete series of stratum types, that is, the total number of stratum layers at the prediction point is known. Therefore, there is no need to add a termination marker for the layer thickness interval. Only an initiation mark is introduced as the starting point of the decoder's simulated layer thickness sequence. After all outputs of the model are completed, a series whose length equals the number of layers is intercepted as the prediction sequence of the layer thickness.
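The coding in Table 6 and the interception step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the treatment of a thickness lying exactly on an interval edge (assigned here to the upper interval) is an assumption, since the text does not state it.

```python
from bisect import bisect_right

# Upper edges (in metres) of the first five intervals in Table 6.
EDGES_M = [3.0, 5.0, 10.0, 20.0, 30.0]
START_CODE = 6   # coding number of the initiation mark in Table 6
NUM_CODES = 7    # six thickness intervals plus the initiation mark


def thickness_code(thickness_m):
    """Map a layer thickness in metres to its coding number (0 to 5)."""
    if thickness_m < 0:
        raise ValueError("thickness must be non-negative")
    # Assumption: a value exactly on an edge goes to the upper interval.
    return bisect_right(EDGES_M, thickness_m)


def one_hot(code):
    """Return the 7-element coded vector for a coding number."""
    if not 0 <= code < NUM_CODES:
        raise ValueError("unknown coding number")
    return [1 if i == code else 0 for i in range(NUM_CODES)]


def intercept(decoder_codes, num_layers):
    """Keep only as many predicted codes as there are stratum layers."""
    return decoder_codes[:num_layers]
```

For a borehole with layer thicknesses of 0.3 m and 3.9 m, `thickness_code` gives the coding numbers 0 and 1, and `one_hot(START_CODE)` gives the vector that starts the decoder sequence.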
(2) Model Training and Testing

The stratum thickness series model adopts the seq2seq architecture and is trained on the drilling data in the training set. To reflect the actual performance of the model accurately, the highest accuracy and the average accuracy of the model on the test set were compared. After each round of training, the model was tested, and the test results were recorded. After 500 rounds of training, the loss curve of the model is shown in Figure 30, and the changes in prediction accuracy are shown in Figure 31. As the number of training rounds increases, the prediction accuracy of the model increases slowly and finally converges to 63.53%.
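The bookkeeping described above, with one test result recorded per training round, can be summarized as in the sketch below. It assumes that the steady value is taken as the mean accuracy over the final rounds; the window length `tail` and the function name are hypothetical, since the paper does not specify how the converged value is computed.

```python
def summarize_accuracy(per_round_accuracy, tail=50):
    """Return (maximum value, steady value) of a per-round accuracy log.

    The steady value is assumed to be the mean over the last `tail` rounds.
    """
    if not per_round_accuracy:
        raise ValueError("no test results recorded")
    if tail < 1:
        raise ValueError("tail must be at least 1")
    last = per_round_accuracy[-tail:]
    return max(per_round_accuracy), sum(last) / len(last)
```

Reporting both numbers separates a single lucky test round (the maximum) from the level at which the model actually settles.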
Figure 30. Loss function curve.

Figure 31. Prediction accuracy curve of the layer thickness.

(3) Testing the Effect of Expert-Driven Learning

To further improve the accuracy of the model and its prediction ability for the stratum thickness category, this section trains expert-driven models with different proportions of supervisory learning and compares the learning effect to determine the model with the highest accuracy and the greatest prediction ability. In this section, the expert ratios adopted by the seq2seq model in the learning process are 1/3, 1/2, 2/3, and 1. The accuracy of the different models on the test data is provided in Table 7.
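One way to read the expert ratio is as the probability that the decoder is fed the correct previous label rather than its own previous prediction during training. The sketch below illustrates that reading only. It assumes the choice is made once per training sequence, which the text does not confirm, and `predict_next` is a hypothetical stand-in for the decoder's one-step prediction.

```python
import random

START_CODE = 6  # initiation mark from Table 6


def decoder_inputs(target_codes, predict_next, expert_ratio, rng):
    """Build the decoder input sequence for one training sample.

    With probability `expert_ratio` the true previous codes are fed in
    (expert-driven); otherwise the model's own predictions are fed back.
    Returns the input sequence and whether the expert signal was used.
    """
    use_expert = rng.random() < expert_ratio
    inputs = [START_CODE]
    for t in range(len(target_codes) - 1):
        if use_expert:
            inputs.append(target_codes[t])        # correct supervisory signal
        else:
            inputs.append(predict_next(inputs))   # model's own previous output
    return inputs, use_expert
```

With `expert_ratio=1` every sample is expert-driven and with `expert_ratio=0` none is, matching the two end columns of Table 7; the intermediate ratios 1/3, 1/2, and 2/3 mix the two regimes.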
Table 7. Prediction accuracy of the layer thickness.
Expert Ratio | 0 | 1/3 | 1/2 | 2/3 | 1
Maximum value | 65.07% | 73.05% | 80.08% | 75.60% | 70.07%
Steady value | 63.53% | 70.07% | 75.05% | 72.62% | 67.94%

Table 7 shows the highest value of the results achieved in the test data and the final stable value after convergence, based on the different expert ratios. As we can see from the test results, with the increase in the proportion of expert-driven learning, the accuracy of the model in terms of the test data first increases and then decreases. In addition, the models that do not adopt expert-driven learning and completely adopt expert-driven learning do not achieve the highest accuracy. Clearly, the relationship between the expert ratio and the prediction accuracy rate is not simply a positive correlation. The loss function of 50% expert-driven learning and the training process is shown in Figure 32. When 50% expert-driven learning is applied, the stable value of the layer thickness prediction accuracy is 75.05%, and the highest value is 80.08%, which is the best model performance in the test set, as shown in Figure 33. At this point, the prediction ability of the model for unknown data is the greatest, which is consistent with the experience with the stratum type identification model.
At atethis st, whi point, ch is the consis prediction tent with th ability e exp of the eriemodel nce with the for unknown stratumdata typeis ident the i gr fica eatest, tion m which odel. Therefore, expert-driven learning can improve the prediction ability of the model and accelerate Therefore is consistent , ex with pert-d the riven experience learning c with an the improve th stratum type e predic identiﬁcation tion ability model. of the m Thero efor del e,and expert-driven accelerate convergence, but it is not the rule that the higher the expert ratio is, the better the performance of the conv learning ergence, can impr but it ove is no the t th pre rul ediction e thaability t the higher of thethe model expert and raaccelerate tio is, the better the performanc convergence, but it is e of not the the model. model. rule that the higher the expert ratio is, the better the performance of the model. Figure 32. Loss curve of 50% expert-driven learning. Figure 32. Loss curve of 50% expert-driven learning. Figure 32. Loss curve of 50% expert-driven learning. Figure 33. Accuracy of 50% expert-driven learning. Figure 33. Accuracy of 50% expert-driven learning. Figure 33. Accuracy of 50% expert-driven learning. The ﬁnal results show that the maximum accuracy of the layer thickness model is 80.85% under The final results show that the maximum accuracy of the layer thickness model is 80.85% under The final results show that the maximum accuracy of the layer thickness model is 80.85% under the 50% expert ratio, which accurately predicts the layer thickness in the test data. the 50% expert ratio, which accurately predicts the layer thickness in the test data. the 50% expert ratio, which accurately predicts the layer thickness in the test data. 3.2.3. Veriﬁcation of the Geostratigraphic Series Model 3.2.3. Verification of the Geostratigraphic Series Model 3.2.3. 
Verification of the Geostratigraphic Series Model To verify the true prediction ability of the geostratigraphic series model, the stratum data in the To verify the true prediction ability of the geostratigraphic series model, the stratum data in the To verify the true prediction ability of the geostratigraphic series model, the stratum data in the test borehole data are used for practical testing, and the dierences between the simulated series output test borehole data are used for practical testing, and the differences between the simulated series test borehole data are used for practical testing, and the differences between the simulated series by the model and the real geostratigraphic series are compared. Selected examples of the real borehole output by the model and the real geostratigraphic series are compared. Selected examples of the real output by the model and the real geostratigraphic series are compared. Selected examples of the real stratum conditions and prediction results of machine learning are shown in Table 8. borehole stratum conditions and prediction results of machine learning are shown in Table 8. borehole stratum conditions and prediction results of machine learning are shown in Table 8. Table 8 shows that by comparing the prediction results of the model with the real borehole data, the machine learning model based on the seq2seq architecture has a high accuracy in stratum type prediction. According to the statistics, in all data of the test set, the machine learning model accurately simulates 62.98% of the stratum types, and the similarity between the simulated sequence and the real stratum sequence is 72.16%. In addition, the accuracy rate of the stratum thickness prediction is 74.04%, which basically realizes the determination of the stratum thickness in the study area, as shown in Table 9. 
In conclusion, the machine learning model based on a recurrent neural network can accurately simulate the real stratum situation in the study area, and its feasibility is veriﬁed. Appl. Sci. 2019, 9, 3553 20 of 29 Table 8. Comparison of the real borehole stratum and machine learning prediction results. The Real Borehole Strata Prediction Results of Machine Learning Number Stratum Thickness Sequence Stratum Thickness Sequence Stratum Type Sequence Stratum Type Sequence (m) (m) 1 silt, clay 0.3, 3.9 ﬂoury soil, clay, plain ﬁll within 3 m, within 3 m, 3–5 m 2 clay 2 clay within 3 m 3 miscellaneous ﬁll 0.6 plain ﬁll 5–10 m 4 plain ﬁll, clay 3.1, 9.8 plain ﬁll, clay within 3 m, 5–10 m miscellaneous ﬁll, plain ﬁll, mucky soil, within 3 m, within 3 m, within 3 m, within 3 5 miscellaneous ﬁll, clay, mucky soil, plain ﬁll, clay 1.2, 1.3, 1.5, 2.4, 13.3 plain ﬁll, clay m, 10–20 m within 3 m, within 3 m, within 3 m, within 3 6 ﬂoury soil, silty clay, plain ﬁll, clay, plain ﬁll, clay 1.0, 0.5, 2.5, 1.2, 0.3, 3.6 ﬂoury soil, plain ﬁll, clay, plain ﬁll, clay m 5–10 m 7 miscellaneous ﬁll, plain ﬁll, clay 0.7, 3.0, 4.5 miscellaneous ﬁll, plain ﬁll, clay within 3 m, within 3 m, 3–5 m 8 miscellaneous ﬁll, clay 0.6, 4.0 miscellaneous ﬁll within 3 m 9 miscellaneous ﬁll, plain ﬁll, clay 0.5, 1.0, 11.9 miscellaneous ﬁll, plain ﬁll, clay within 3 m, within 3 m, 10–20 m 10 miscellaneous ﬁll, clay 1.0, 9.8 miscellaneous ﬁll, clay within 3 m, 5–10 m 11 miscellaneous ﬁll, silt, plain ﬁll, clay 4.1, 11.2, 7.0, 10.0 miscellaneous ﬁll, plain ﬁll, clay within 3 m, 10–20 m, 5–10 m 12 ﬂoury soil, plain ﬁll, mucky soil, clay 0.5, 6.7, 1.2, 8.6 ﬂoury soil, plain ﬁll, plain ﬁll, clay within 3 m, within 3 m, within 3 m, 5–10 m 13 silt, clay 0.4, 6.6 ﬂoury soil, clay within 3 m, 5–10 m 14 silt, clay 0.4, 10.4 ﬂoury soil, clay within 3 m, 5–10 m 15 miscellaneous ﬁll, silt, plain ﬁll, clay 0.7, 1.9, 3.4, 24.0 miscellaneous ﬁll, ﬂoury soil, plain ﬁll, clay within 3 m, within 3 m, within 3 m, 20–30 m 
miscellaneous ﬁll soil, plain ﬁll soil, old city miscellaneous ﬁll, ﬂoury soil, plain ﬁll, old within 3 m, within 3 m, within 3 m, 5–10 m, 16 1.2, 2.6, 6.5, 13.0 miscellaneous ﬁll soil, clay town ﬁll, clay 10–20 m 17 miscellaneous ﬁll soil, plain ﬁll soil, clay 0.5, 2.8, 10.2 miscellaneous ﬁll, plain ﬁll, clay within 3 m, within 3 m, 10–20 m 18 miscellaneous ﬁll soil, plain ﬁll soil, clay 2.1, 0.8, 12.9 miscellaneous ﬁll, plain ﬁll, clay within 3 m, within 3 m, 10–20 m, Appl. Sci. 2019, 9, 3553 21 of 29 Appl. Sci. 2019, 9, x FOR PEER REVIEW 23 of 32 Table 8 shows that by comparing the prediction results of the model with the real borehole data, Table 9. Statistical results of the geostratigraphic series model simulations. the machine learning model based on the seq2seq architecture has a high accuracy in stratum type prediction. Stratum According Type Accuracy to the statistic As verage , in all Sequence data of the Similarity test set, the m Stratum achine Thickness learning m Accuracy odel accurately simulates 62.98% of the stratum types, and the similarity between the simulated sequence and the 62.98% 72.16% 74.04% real stratum sequence is 72.16%. In addition, the accuracy rate of the stratum thickness prediction is 74.04%, which basically realizes the determination of the stratum thickness in the study area, as 3.3. Three-Dimensional Geological Modeling and Testing shown in Table 9. 3.3.1. Three-Dimensional Geological Modeling Table 9. Statistical results of the geostratigraphic series model simulations. To further test the geostratigraphic series simulation eect based on machine learning, this section Stratum Type Accuracy Average Sequence Similarity Stratum Thickness Accuracy compares the geostratigraphic series simulation method based on machine learning with the traditional 62.98% 72.16% 74.04% method based on 3D geological modeling. 
On the basis of the training data, a 3D geological model of the research area is constructed by using the triangulated irregular network (TIN) 3D geological modeling method [39]. The 3D geological model is consistent with the real strata at the borehole locations, and it can directly and comprehensively show the complex geological structure and the spatial distributions of the rock and soil masses.

The main steps for the construction of the 3D geological model in this study are as follows:

1. Drilling treatment: According to the geological conditions and drilling stratification data, the strata are classified and integrated, and the strata are preliminarily sorted from top to bottom.
2. Interpolation mesh generation: Using Delaunay's triangulation and subdivision algorithms, a TIN mesh is generated, as shown in Figure 34.
3. Network refinement: The generated irregular triangular interpolation network is adjusted until the accuracy meets the requirements.
4. Uniform drilling series: All drilling holes are traversed and a uniform geostratigraphic series is established by considering special stratum conditions such as missing data and reversals. Then, according to the unified geostratigraphic series, the original stratification of all borehole data is transformed into a unified stratification of the borehole series, as shown in Figure 35. If a stratum is not included in the original data of the borehole, its layer thickness is set to zero.
5. Spatial interpolation: For each layer of the uniform drilling series, the Kriging method is used to calculate the elevation at the top and bottom of the layer in the interpolation grid. If the top and bottom elevations of a layer are the same, the layer does not exist at that point.
6. Stratum construction: If the top and bottom elevations of the stratum are different, the top and bottom are connected with the points adjacent to the interpolation point to form a stratum of the 3D model, as shown in Figure 36.
7. Inspection: The generated 3D model is inspected, and the model is adjusted according to experience and geological characteristics.
8. Model generation: A 3D stratum model is rendered, and the redundant parts are removed, while only the research area is maintained.

Figure 34. Interpolation network diagram.

Figure 35. Unified geostratigraphic series diagram.

Figure 36. Schematic diagram of stratum construction.

The method to determine the boundary conditions of the model is as follows: According to the boundary on the map of the study area, boundary points are selected at appropriate distances. The boundary points are used as the control points of the estimated stratigraphic boundaries. Then, these control points are connected successively to form a closed polygon. The closed polygon is used as the boundary of the estimated stratum. After determining the estimated stratigraphic boundary, we extended the area of the borehole to the boundary of the estimated stratum and eventually established the entire 3D geological model.
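Steps 4 to 6 of the modeling procedure above can be illustrated with a minimal Python sketch. It covers only a simplified case: each stratum is assumed to occur at most once per borehole and always in the unified order, so repeated strata and reversals, which the procedure in this study does consider, are not handled. The function names (`to_unified`, `layer_surfaces`, `existing_layers`), the three-layer unified series, and the ground elevation of 10.0 m are hypothetical; the thickness values are those of real borehole 8 in Table 8. The Kriging interpolation of step 5 is not implemented, so elevations are evaluated only at the borehole itself.

```python
from typing import List, Tuple

Layer = Tuple[str, float]  # (stratum name, thickness in m), listed top to bottom


def to_unified(layers: List[Layer], unified: List[str]) -> List[float]:
    """Step 4: map a borehole onto the unified series.

    A stratum that is absent from the borehole gets thickness zero.
    Assumption of this sketch: each stratum occurs at most once per borehole.
    """
    thickness = dict(layers)
    unknown = set(thickness) - set(unified)
    if unknown:
        raise ValueError(f"strata missing from the unified series: {sorted(unknown)}")
    return [thickness.get(name, 0.0) for name in unified]


def layer_surfaces(top_elevation: float, thicknesses: List[float]) -> List[Tuple[float, float]]:
    """Step 5 (at the borehole only): top and bottom elevation of every unified layer."""
    surfaces = []
    top = top_elevation
    for t in thicknesses:
        bottom = top - t
        surfaces.append((top, bottom))
        top = bottom  # the next layer starts where this one ends
    return surfaces


def existing_layers(unified: List[str], surfaces: List[Tuple[float, float]]) -> List[str]:
    """Steps 5 and 6: a layer exists only where its top and bottom elevations differ."""
    return [name for name, (top, bottom) in zip(unified, surfaces) if top != bottom]


# Hypothetical unified series and ground elevation; thicknesses from borehole 8 in Table 8.
unified_series = ["miscellaneous fill", "plain fill", "clay"]
borehole_8 = [("miscellaneous fill", 0.6), ("clay", 4.0)]

thicknesses = to_unified(borehole_8, unified_series)
surfaces = layer_surfaces(10.0, thicknesses)
print(thicknesses)
print(existing_layers(unified_series, surfaces))
```

In this example the plain fill layer receives zero thickness, so its top and bottom elevations coincide and it is dropped when the strata are constructed.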
The whole process of 3D geological modeling, from borehole data processing to the final generation of the model, is shown in Figure 37.

Figure 37. Workflow of 3D geological modeling.

Finally, a 3D geological model of the research area is constructed (as shown in Figure 38) and sectioned. The stratum types and series after sectioning are shown in Figures 39–41.

Figure 38. Three-dimensional geological model. The 3D geological model software was developed by our own team.

Figure 39. Three-dimensional boreholes.

Figure 40. Borehole and stratum distributions.

Figure 41. Geological section.

3.3.2. Three-Dimensional Geological Model Verification

At the same positions as the data in Section 3.2.3, the borehole coordinate information is input into the 3D geological model.
Then the prediction results of the 3D geological model are compared with the real borehole strata, as shown in Table 10.

Table 10. Comparison of the real borehole stratum conditions and 3D geological modeling prediction results.

| Number | Real stratum type sequence | Real stratum thickness sequence (m) | Predicted stratum type sequence | Predicted stratum thickness sequence (m) |
|---|---|---|---|---|
| 1 | silt, clay | 0.3, 3.9 | clay, silt | 0.3, 3.9 |
| 2 | clay | 2 | miscellaneous fill | 2.0 |
| 3 | miscellaneous fill | 0.6 | miscellaneous fill | 0.6 |
| 4 | plain fill, clay | 3.1, 9.8 | miscellaneous fill | 13.5 |
| 5 | miscellaneous fill, clay, mucky soil, plain fill, clay | 1.2, 1.3, 1.5, 2.4, 13.3 | miscellaneous fill, clay, mucky soil, silt | 1.2, 1.3, 3.9, 13.3 |
| 6 | floury soil, silty clay, plain fill, clay, plain fill, clay | 1.0, 0.5, 2.5, 1.2, 0.3, 3.6 | plain fill, silt clay, silt, clay, silt | 1, 0.5, 2.5, 1.2, 3.9 |
| 7 | miscellaneous fill, plain fill, clay | 0.7, 3.0, 4.5 | miscellaneous fill, silt | 0.7, 8.5 |
| 8 | miscellaneous fill, clay | 0.6, 4.0 | miscellaneous fill | 4.6 |
| 9 | miscellaneous fill, plain fill, clay | 0.5, 1.0, 11.9 | miscellaneous fill, silt | 0.5, 0.5 |
| 10 | miscellaneous fill, clay | 1.0, 9.8 | miscellaneous fill | 12.2 |
| 11 | miscellaneous fill, silt, plain fill, clay | 4.1, 11.2, 7.0, 10.0 | miscellaneous fill, silt | 2.8, 25.2 |
| 12 | floury soil, plain fill, mucky soil, clay | 0.5, 6.7, 1.2, 8.6 | plain fill, silt | 0.5, 16.5 |
| 13 | silt, clay | 0.4, 6.6 | plain fill | 7 |
| 14 | silt, clay | 0.4, 10.4 | plain fill | 10.9 |
| 15 | miscellaneous fill, silt, plain fill, clay | 0.7, 1.9, 3.4, 24.0 | miscellaneous fill, plain fill, silt, silt | 0.7, 1.9, 3.4, 24 |
| 16 | miscellaneous fill soil, plain fill soil, old city miscellaneous fill soil, clay | 1.2, 2.6, 6.5, 13.0 | miscellaneous fill, plain fill, old city miscellaneous fill soil | 1.2, 2.6, 22.5 |
| 17 | miscellaneous fill soil, plain fill soil, clay | 0.5, 2.8, 10.2 | miscellaneous fill, plain fill | 0.5, 13 |
| 18 | miscellaneous fill soil, plain fill soil, clay | 2.1, 0.8, 12.9 | miscellaneous fill, silt, clay | 2.1, 0.8, 12.9 |

From Table 10, the 3D geological model performs poorly in terms of the number of layers, stratum type, and sequence similarity, but it can better predict the stratum thickness. When the prediction of the stratum type is accurate, the corresponding thickness prediction is close to the real value.

Some borehole data are randomly selected in the training set, and the borehole coordinate information is input into the 3D geological model to obtain the stratum sequence prediction results of the borehole points. According to the statistics, in all the data of the test set, the 3D geological model accurately simulates 30.78% of the stratum types, and the similarity between the simulated series and the real geostratigraphic series is 32.27%. In addition, the accuracy rate of the stratum thickness is 64.52%, as shown in Table 11.

Table 11. Statistics of 3D geological model prediction results.

| Stratum Type Accuracy | Average Sequence Similarity | Stratum Thickness Accuracy |
|---|---|---|
| 30.78% | 32.27% | 64.52% |

Comparing Tables 9 and 11, a histogram of the prediction results of machine learning and 3D geological modeling is obtained in terms of the stratum type accuracy, average series similarity, and stratum thickness accuracy, as shown in Figure 42.

Figure 42. Comparison histogram of the prediction results of machine learning and 3D geological modeling.

Figure 42 shows that there is a certain difference in accuracy between the geostratigraphic series models based on 3D geological modeling and machine learning. Generally, these two methods can describe the real stratum situation well. The model based on machine learning has a good simulation effect in terms of the stratum type, and all its corresponding indexes are superior to those of the traditional 3D geological model. The machine learning model provides stratum information by predicting the layer thicknesses within the strata, and it is slightly more accurate than the 3D geological model.

3.4. Evaluation of 3D Geological Modeling Based on the Geostratigraphic Series Model

Considering the actual performance of the machine learning model in the prediction of the stratum type and stratum thickness, this study proposes an evaluation algorithm for a 3D geological model. In the absence of real data guidance, the learning results based on the machine learning model represent the accuracy of geological modeling. For any geostratigraphic series, the reliability evaluation process is described below.

The evaluation objects are divided into a stratum type series and a stratum thickness series. The geostratigraphic series model generates its output, a stratum type series and a stratum thickness series, at the same position.
The similarity of the stratum type series calculated by the edit distance algorithm is used as the type evaluation index.

Comparing the layer thickness series, if the 3D layer thickness is the same as the most likely thickness, the score is one; if the 3D layer thickness is the same as the second most likely thickness, the score is 0.5; otherwise, the score is zero. The scores are added, and the score sum is divided by the 3D series length, which is then used as the layer thickness evaluation index. The average of the type evaluation index and the thickness evaluation index is calculated, and the reliability score of this point in the 3D geological model is obtained. If the reliability score is higher than 0.5, the simulation of the real stratum is considered to be reliable.

The calculation process of this algorithm consists of two parts, the type evaluation index and the layer thickness evaluation index. The reliability score is the average of these two indexes. The range of reliability scores calculated by this algorithm is [0,1], representing the matching degree between the evaluation object and the empirical cognition of the machine learning model. The higher the reliability score is, the closer the evaluation object and the model are in predicting the stratum distribution of this point.

The test borehole provides the real stratum data, and its evaluation result should be higher than that of the 3D model. Moreover, if the stratum distribution of a point in the 3D model is similar to the real situation, the scoring result will be similar to the result of the real stratum. To test the feasibility of the evaluation algorithm based on the 3D geological model, this study uses the algorithm to calculate the reliability score of the test borehole data and the 3D geological model. The calculation and statistical results show that the average reliability score of the test borehole data is 0.6293, which is higher than that of the 3D geological model, as shown in Table 12. In addition, the reliability scores of the test boreholes are mostly higher than 0.5, while those of the 3D geological model are mainly below 0.5, as shown in Figures 43 and 44.

Table 12. Average reliability of the test borehole data and 3D geological model.

| | Test Borehole Data | Three-Dimensional Geological Model |
|---|---|---|
| Average reliability | 0.6293 | 0.3205 |

Figure 43. Histogram of the reliability index frequency of the 3D geological model.

Figure 44. Histogram of the borehole data reliability index frequency.
In conclusion, the evaluation method of 3D geological modeling based on the geostratigraphic series model is feasible in this study.

4. Conclusions

(1) In view of the disadvantages of the traditional simulation method of the structure of a geostratigraphic series, this study proposes a method based on the principle of a recurrent neural network. This method has the advantage of not relying on subjective factors such as assumptions and expert experience. Moreover, this approach can effectively evaluate geostratigraphic series simulation results in terms of characteristics such as the stratum thickness, stratum type, and stratum sequence. In the process of stratum simulation, utilizing expert-driven learning can improve both the learning efficiency and the predictive ability of the model.

(2) A complete machine learning model for geostratigraphic series simulation is established, and a model-based 3D geological modeling evaluation method is designed. This study provides a novel approach for the simulation and prediction of geostratigraphic series with 3D geological modeling. This work has far-reaching practical significance for the accurate description of the spatial distributions of geological features and guidance of site selection, engineering construction, and environmental assessment.

(3) The series model based on machine learning can describe the real situation at wells, and is a complementary tool to the traditional 3D geological model. This study directly shows that machine learning is feasible and reliable in geostratigraphic series simulation.
Additionally, our research provides new ideas and references for the popularization of machine learning in other fields of geology and engineering, especially 3D geological modeling.

Author Contributions: Z.L., conceptualization, methodology, data curation, formal analysis, writing–original draft preparation, writing–review and editing, project administration, funding acquisition; C.Z., conceptualization, methodology, writing–original draft preparation, supervision, project administration, funding acquisition; J.O., data curation, formal analysis, writing–original draft preparation and editing; W.M., writing–original draft preparation; Z.D., G.Z., formal analysis, writing–original draft preparation.

Funding: This research is funded by the Provincial Science and Technology Project of Guangdong Province (Grant no. 2016B010124007), the Science and Technology Youth Top-Notch Talent Project of Guangdong Special Support Program (Grant no. 2015 TQ01Z344) and the Guangzhou Science and Technology Project (Grant no. 201803030005).

Acknowledgments: The authors would like to thank the anonymous reviewers for their very constructive and helpful comments.

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Bertoncello, A.; Sun, T.; Li, H.; Mariethoz, G.; Caers, J. Conditioning surface-based geological models to well and thickness data. Math. Geosci. 2013, 45, 873–893. [CrossRef]
2. Zhu, L.; Zhang, C.; Li, M.; Pan, X.; Sun, J. Building 3D solid models of sedimentary stratum systems from borehole data: An automatic method and case studies. Eng. Geol. 2012, 127, 1–13. [CrossRef]
3. Jones, N.L.; Walker, J.R.; Carle, S.F. Hydrogeologic unit flow characterization using transition probability geostatistics. Groundwater 2005, 43, 285–289. [CrossRef]
4. Qiao, J.; Pan, M.; Li, Z.; Jin, Y. 3D Geological modeling from DEM, boreholes, cross-sections and geological maps. In Proceedings of the 2011 19th International Conference on Geoinformatics, Shanghai, China, 24–26 June 2011; pp. 1–5.
5. Lallier, F.; Caumon, G.; Borgomano, J.; Viseur, S.; Royer, J.J.; Antoine, C. Uncertainty assessment in the stratigraphic well correlation of a carbonate ramp: Method and application to the Beausset Basin, SE France. Comptes Rendus Geosci. 2016, 348, 499–509. [CrossRef]
6. Edwards, J.; Lallier, F.; Caumon, G.; Carpentier, C. Uncertainty management in stratigraphic well correlation and stratigraphic architectures: A training-based method. Comput. Geosci. 2017, 111, 11–17. [CrossRef]
7. Carr, G.R.; Andrew, A.S.; Denton, G.; Giblin, A.; Korsch, M.; Whitford, D. The “Glass Earth”: Geochemical frontiers in exploration through cover. Aust. Inst. Geosci. Bull. 1999, 28, 33–40.
8. Molennar, M. A topology for 3D vector maps. ITC J. 1992, 1, 25–33.
9. Chen, H.; Huang, T. A survey of construction and manipulation of octrees. Comput. Vis. Graph. Image Process. 1988, 43, 409–431. [CrossRef]
10. Houlding, S.W. 3D Geoscience Modeling: Computer Techniques for Geological Characterization; Springer: New York, NY, USA, 1994; p. 303.
11. Caumon, G.; Mallet, J.L. 3D Stratigraphic models: Representation and stochastic modelling. In Proceedings of the IAMG 2006, Liège, Belgium, 3–8 September 2006.
12. Mallet, J.L. Discrete Smooth Interpolation. ACM Trans. Graph. 1989, 8, 121–144. [CrossRef]
13. Mallet, J.L. Geomodeling; Oxford University Press: New York, NY, USA, 2002; p. 612.
14. Mallet, J.L. Elements of Mathematical Sedimentary Geology: The GeoChron Model; EAGE: Houten, The Netherlands, 2014.
15. Randle, C.H.; Bond, C.E.; Lark, R.M.; Monaghan, A.A. Uncertainty in geological interpretations: Effectiveness of expert elicitations. Geosphere 2019, 15, 108–118. [CrossRef]
16. Carbonell, J. Machine Learning: A Maturing Field. Mach. Learn. 1992, 9, 5–7. [CrossRef]
17. Langley, P. Machine learning as an experimental science. Mach. Learn. 1988, 3, 5–8. [CrossRef]
18. Quinlan, J.R. Induction of Decision Trees. Mach. Learn. 1986, 1, 81–106. [CrossRef]
19. Breiman, L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Stat. Sci. 2001, 16, 199–231. [CrossRef]
20. Bachri, I.; Hakdaoui, M.; Raji, M.; Teodoro, A.C.; Benbouziane, A. Machine Learning Algorithms for Automatic Lithological Mapping Using Remote Sensing Data: A Case Study from Souk Arbaa Sahel, Sidi Ifni Inlier, Western Anti-Atlas, Morocco. ISPRS Int. J. Geo-Inf. 2019, 8, 248. [CrossRef]
21. Chen, L.; Ren, C.; Li, L.; Wang, Y.; Zhang, B.; Wang, Z.; Li, L. A Comparative Assessment of Geostatistical, Machine Learning, and Hybrid Approaches for Mapping Topsoil Organic Carbon Content. ISPRS Int. J. Geo-Inf. 2019, 8, 174. [CrossRef]
22. Mueller, E.; Sandoval, J.; Mudigonda, S.; Elliott, M. A Cluster-Based Machine Learning Ensemble Approach for Geospatial Data: Estimation of Health Insurance Status in Missouri. ISPRS Int. J. Geo-Inf. 2019, 8, 13. [CrossRef]
23. Burl, M.C.; Asker, L.; Smyth, P.; Fayyad, U.; Perona, P.; Crumpler, L.; Aubele, J. Learning to Recognize Volcanoes on Venus. Mach. Learn. 1998, 30, 165–194. [CrossRef]
24. Gonçalves, Í.G.; Kumaira, S.; Guadagnin, F. A machine learning approach to the potential-field method for implicit modeling of geological structures. Comput. Geosci. 2017, 103, 173–182. [CrossRef]
25. Klump, J.F.; Huber, R.; Robertson, J.; Cox, S.J.; Woodcock, R. Linking descriptive geology and quantitative machine learning through an ontology of lithological concepts. In Proceedings of the AGU Fall Meeting Abstracts 2014, San Francisco, CA, USA, 15–19 December 2004.
26. Rodriguez-Galiano, V.; Sanchez-Castillo, M.; Chica-Olmo, M.; Chica-Rivas, M. Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geol. Rev. 2015, 71, 804–818. [CrossRef]
27. Porwal, A.; Carranza, E.J.M.; Hale, M. Artificial Neural Networks for Mineral-Potential Mapping: A Case Study from Aravalli Province, Western India. Nat. Resour. Res. 2003, 12, 155–171. [CrossRef]
28. Zhang, T. The Relationships between Rock Elements and the Igneous Rocks, the Lithologic Discrimination and Mineral Identification of Sedimentary Rocks: A Study Based on the Method of Artificial Neural Network. Ph.D. Thesis, Northwest University, Xi’an, China, 2016.
29. Zhang, Y.; Su, G.; Yan, L. Gaussian Process Machine Learning Model for Forecasting of Karstic Collapse. In International Conference on Applied Informatics and Communication; Springer: Berlin/Heidelberg, Germany, 2011; pp. 365–372.
30. Chaki, S.; Routray, A.; Mohanty, W.K. Well-Log and Seismic Data Integration for Reservoir Characterization: A Signal Processing and Machine-Learning Perspective. IEEE Signal Process. Mag. 2018, 35, 72–81. [CrossRef]
31. Gaurav, A. Horizontal shale well eur determination integrating geology, machine learning, pattern recognition and multivariate statistics focused on the permian basin. In SPE Liquids-Rich Basins Conference-North America; Society of Petroleum Engineers: Richardson, TX, USA, 2017.
32. Sha, A.; Tong, Z.; Gao, J. Recognition and Measurement of Pavement Disasters Based on Convolutional Neural Networks. China J. Highw. Transp. 2017, 31, 1–10.
33. Connor, J.T.; Martin, R.D.; Atlas, L.E. Recurrent Neural Networks and Robust Time Series Prediction; IEEE Press: Piscataway Township, NJ, USA, 1994.
34. Graves, A. Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany,
35. Lamb, A.M.; Goyal, A.G.; Zhang, Y.; Zhang, S.; Courville, A.C.; Bengio, Y. Professor Forcing: A New Algorithm for Training Recurrent Networks.
In Advances in Neural Information Processing Systems, Proceedings of the 30th Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; Curran Associates, Inc.: Red Hook, NY, USA, 2016. 36. Lin, C.; Hsieh, T.; Liu, Y.; Lin, Y.; Fang, C.; Wang, Y.; Chuang, C. Minority oversampling in kernel adaptive subspaces for class imbalanced datasets. IEEE Trans. Knowl. Data Eng. 2018, 30, 950–962. [CrossRef] 37. LÃžcke, J.; Sahani, M. Maximal causes for non-linear component extraction. J. Mach. Learn. Res. 2008, 9, 1227–1267. 38. Liu, X. Application of BP neural network in insider rock identiﬁcation of Taiguyu in Liaohe. Pet. Geol. Eng. 2010, 24, 40–42. 39. Royer, J.J.; Mejia, P.; Caumon, G.; Collon, P. 3D and 4D Geomodelling Applied to Mineral Resources Exploration— An Introduction. 3D, 4D and Predictive Modelling of Major Mineral Belts in Europe; Springer: Cham, Switzerland, 2015. © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Journal

Applied Sciences – Multidisciplinary Digital Publishing Institute

Published: Aug 29, 2019

Keywords: recurrent neural network; series simulation; three-dimensional geological modeling; expert-driven learning
