Machine Learning Model of ResNet50-Ensemble Voting for Malignant–Benign Small Pulmonary Nodule Classification on Computed Tomography Images

Weiming Li; Siqi Yu; Runhuang Yang; Yixing Tian; Tianyu Zhu; Haotian Liu; Danyang Jiao; Feng Zhang; Xiangtong Liu; Lixin Tao; Yan Gao; Qiang Li; Jingbo Zhang; Xiuhua Guo

doi:10.3390/cancers15225417

Weiming Li 1,2, Siqi Yu 1,2, Runhuang Yang 1,2, Yixing Tian 1,2, Tianyu Zhu 1,2, Haotian Liu 1,2, Danyang Jiao 1,2, Feng Zhang 1,2, Xiangtong Liu 1,2, Lixin Tao 1,2, Yan Gao 3, Qiang Li 4, Jingbo Zhang 4 and Xiuhua Guo 1,2,*

1 Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, Beijing 100069, China; [email protected] (W.L.); [email protected] (S.Y.); [email protected] (R.Y.); [email protected] (Y.T.); [email protected] (T.Z.); [email protected] (H.L.); [email protected] (D.J.); [email protected] (F.Z.); [email protected] (X.L.); [email protected] (L.T.)
2 Beijing Municipal Key Laboratory of Clinical Epidemiology, Capital Medical University, Beijing 100069, China
3 Department of Nuclear Medicine, Xuanwu Hospital Capital Medical University, Beijing 100053, China; [email protected]
4 Beijing Physical Examination Center, Beijing 100050, China; [email protected] (Q.L.); [email protected] (J.Z.)
* Correspondence: [email protected]

Simple Summary: Machine learning methods have shown promise in accurately identifying small lung nodules. However, further exploration is needed to fully harness the potential of machine learning in distinguishing between benign and malignant nodules. This study aimed to develop and evaluate a ResNet50-Ensemble Voting model for determining the nature (benign or malignant) of small pulmonary nodules (less than 20 mm) based on CT images. This study involved 834 CT images from 396 patients with small pulmonary nodules.
CT image features were extracted using the ResNet50 and VGG16 algorithms, and classification was performed using XGBoost, SVM, and Ensemble Voting techniques, incorporating ten different combinations of machine learning classifiers. Among the models tested, the ResNet50-Ensemble Voting algorithm demonstrated the highest performance in the test set, achieving an accuracy of 0.943 (0.938, 0.948), with sensitivity and specificity values of 0.964 and 0.911, respectively. The implementation of machine learning models, particularly the ResNet50-Ensemble Voting approach, showed excellent performance in accurately identifying benign and malignant small pulmonary nodules (less than 20 mm) from diverse sources. These models have the potential to assist doctors in accurately diagnosing the nature of early-stage lung nodules in clinical practice.

Citation: Li, W.; Yu, S.; Yang, R.; Tian, Y.; Zhu, T.; Liu, H.; Jiao, D.; Zhang, F.; Liu, X.; Tao, L.; et al. Machine Learning Model of ResNet50-Ensemble Voting for Malignant–Benign Small Pulmonary Nodule Classification on Computed Tomography Images. Cancers 2023, 15, 5417. https://doi.org/10.3390/cancers15225417

Academic Editors: Maria Li Lung and Josephine Ko

Received: 29 August 2023
Revised: 21 September 2023
Accepted: 26 September 2023
Published: 15 November 2023

Abstract: Background: The early detection of benign and malignant lung tumors enables lesions to be diagnosed and appropriate health measures to be implemented earlier, dramatically improving lung cancer patients' quality of life. Machine learning methods have performed well in recognizing small benign and malignant lung nodules. However, further exploration and investigation are required to fully leverage the potential of machine learning in distinguishing between benign and malignant small lung nodules.
Objective: The aim of this study was to develop and evaluate the ResNet50-Ensemble Voting model for classifying small pulmonary nodules (<20 mm) as benign or malignant based on CT images. Methods: In this study, 834 CT images from 396 patients with small pulmonary nodules were gathered and randomly assigned to the training and validation sets in an 8:2 ratio. The ResNet50 and VGG16 algorithms were used to extract CT image features, followed by XGBoost, SVM, and Ensemble Voting techniques for classification, for a total of ten different combinations of machine learning classifiers. Indicators such as accuracy, sensitivity, and specificity were used to assess the models. The extracted features were also visualized to investigate the contrasts between them. Results: The algorithm we presented, ResNet50-Ensemble Voting, performed best in the test set, with an accuracy of 0.943 (0.938, 0.948) and sensitivity and specificity of 0.964 and 0.911, respectively. VGG16-Ensemble Voting had an accuracy of 0.887 (0.880, 0.894), with a sensitivity and specificity of 0.952 and 0.784, respectively. Conclusion: Machine learning models that integrated ResNet50-Ensemble Voting performed exceptionally well in identifying benign and malignant small pulmonary nodules (<20 mm) from various sites, which might help doctors in accurately diagnosing the nature of early-stage lung nodules in clinical practice.

Keywords: ResNet50; ensemble voting; XGBoost; small pulmonary nodules; pulmonary cancer

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Currently, lung cancer remains one of the leading causes of cancer-related mortality worldwide. Lung cancer was estimated to cause about 1.8 million deaths in 2020, accounting for 18% of all cancer deaths [1]. Screening for benign and malignant lung nodules in the early stages of lung cancer could increase the 5-year survival rate from 16.1% to 19.7% [2]. Therefore, the early detection and accurate classification of pulmonary nodules are crucial for improving patient prognosis. The gold standard, pathologic examination, cannot be relied on to distinguish the benign or malignant nature of small lung nodules in the early stages of lung cancer, because the nodules are too small for pathologic evidence to be acquired [3]. Transbronchial biopsies of solitary pulmonary nodules (SPNs) smaller than 20 mm are typically performed under fluoroscopic guidance, but the availability of pathologic tissue varies greatly [4]. As a result of its high resolution and non-invasive nature, computed tomography (CT) imaging has emerged as a significant tool in the identification and therapy of lung nodules [5]. A study demonstrated that annual lung cancer screening using CT imaging reduced lung cancer mortality by 20% [6]. However, radiologists continue to struggle with reliably discriminating between malignant and benign small pulmonary nodules based only on CT imaging. Computer-assisted diagnostic tools have the potential to improve the detection and screening of benign and malignant lung nodules [7]. Kaliyugarasan applied a new extension of the fastai deep learning framework to a 3D medical imaging task and combined it with the MONAI deep learning library to achieve a final classification accuracy of 92.4% [8]. Zhao constructed a hybrid CNN of LeNet and AlexNet to distinguish benign and malignant lung nodules using CT images, and the accuracy and area under the curve reached 0.822 and 0.877, respectively [9].
Keyan proposed an MSM-ViT model for lung nodule classification that addressed the poor generalization of the ViT structure and the difficulty of extracting multi-scale features, obtaining a best accuracy of 94.04% [10]. Mkindu combined a 3D-CNN with squeeze-and-excitation networks, and the joint algorithm yielded the highest detection sensitivity of 98.65% [11]. Hassan proposed an automated computer-aided diagnosis (CAD) scheme for lung nodule detection based on the Vision Transformer architecture and Bayesian optimization, obtaining a highest detection sensitivity of 98.39% with a significant reduction in network parameters [12]. However, the preceding research focused mostly on regular-sized lung nodules and did not investigate the models' capabilities for small lung nodules (<20 mm).

Some studies have used Bayesian penalized likelihood reconstruction [13] or maximum-intensity projection [14] to enhance the representation of small lung nodules on CT images and thereby improve clinicians' identification and diagnosis of early-stage lung cancer. However, diagnostic accuracy remained limited. Kum used a modified AlexNet algorithm to diagnose the nature of lung nodules smaller than 20 mm, with an AUC value of 0.82, exploring the application of this method to the diagnosis of early lung nodules [15]. Mei introduced the Otsu thresholding algorithm to preprocess the data and filter out interfering information to obtain nodule features, and added parallel radiomics to a 3D convolutional neural network, reaching an AUC of 0.90 [16]. However, the capability to diagnose lung nodules in the early stages remained insufficient, and further exploration and enhancement were desirable. Liu achieved superior outcomes by combining CT radiomics with machine learning to predict the invasiveness of small nodules [17].
Classical machine learning classification algorithms such as XGBoost [18] and SVM (Support Vector Machine) [19] have achieved good performance in diagnosing the nature of lung nodules. However, the diagnosis of small lung nodules in the early stage needs further exploration. Machine learning classification models based on feature extraction have also been developed and explored. The Local Mesh Peak-Valley Edge Pattern (LMePVEP) technique, a splicing-based feature extraction method using dynamic thresholding, could improve classification accuracy by up to 12.56% [20]. However, the accuracy of this method for diagnosing the nature of lung nodules still needs improvement. The ResNet18 scheme combined with different classifiers, such as the SoftMax classifier (95.2%) and the Decision Tree classifier (99%), achieved better accuracy in lung disease recognition [21]. Therefore, the concept of extracting features with deep learning and combining them with different classifiers to construct disease classification models has been shown to be feasible. In this study, we combined deep learning feature extraction with different classifiers to construct a fusion model, in order to explore and improve the diagnosis of the benign or malignant nature of lung nodules (<20 mm) in the early stage.

2. Materials and Methods

2.1. Data Source

From 2015 to 2019, 396 individuals were recruited for this study from four hospitals and two open access databases, and informed consent was obtained. All patients' lung CT images were obtained in DICOM format, with a total of 834 layers involving pulmonary nodules. We used a questionnaire to collect clinician diagnoses and basic demographic information after analyzing patients' medical records and admission data. A checklist of the subjects and the images is shown in Table 1.

Table 1. The checklist of subjects and images.
Database                               Subjects (n, %)    Images (n, %)
Beijing Chest Hospital                 43 (10.86)         89 (10.67)
Beijing Cancer Hospital                106 (26.77)        228 (27.37)
Xuanwu Hospital                        96 (24.24)         204 (24.46)
Beijing Physical Examination Center    79 (19.95)         175 (20.98)
TCGA Public Database                   26 (6.57)          51 (6.12)
LIDC-IDRI                              46 (11.62)         87 (10.44)
Total                                  396 (100.00)       834 (100.00)

We further analyzed whether age and gender differed between patients with benign and malignant lung nodules. No statistically significant difference was found for either variable, as shown in Table 2.

Table 2. Comparison of clinical information between benign group and malignant group.

Clinic Information          Benign (n = 154)    Malignant (n = 242)    p
Age (years, mean ± SD)      61.43 ± 12.38       68.42 ± 10.29          0.057 a
Gender (n, %)                                                          0.903 b
  Male                      81 (0.53)           135 (0.56)
  Female                    73 (0.47)           107 (0.44)

a t-test was used for the distribution difference of continuous variables. b Chi-square test was used for the distribution difference of categorical variables.

2.1.1. Inclusion/Exclusion Criteria

Inclusion criteria:
• The subjects of this study should be adults (age ≥ 18 years);
• In order to ensure the integrity of the information in the lung nodule images, the number of CT images containing nodules should not be less than 2 per patient;
• Clear physician's diagnostic report was available;
• Small pulmonary nodules less than 20 mm in size for which a definitive pathologic diagnosis cannot be made.

Exclusion criteria:
• Patients treated with chemo-radiotherapy or surgery;
• Images of nodules that were difficult to segment;
• The size of the lung nodule was above 20 mm.

2.1.2. Diagnostic Criteria

For nodules with a clear pathologic diagnosis, this study used pathology as the gold standard for the nature of the small lung nodule; in instances when a pathologic diagnosis could not be obtained due to the small size of the lung nodule, the diagnostic report based on the clinician's a priori knowledge prevailed.
The Chinese Expert Consensus on the Diagnosis and Treatment of Lung Nodules (2018 edition) contains detailed diagnostic criteria for lung nodules.

2.2. Research Design Process

In this study, CT scans of patients with small pulmonary nodules were collected from the six aforementioned databases according to the criteria above. After expert clinicians initially identified the region of interest (ROI) of each lung nodule, image preprocessing techniques such as normalization were applied. Features were then extracted from the CT images of the lung nodule ROI, mostly using ResNet50 and VGG16. The nodules were then categorized as benign or malignant using five different classifiers. The dataset was divided into two parts: the training set (80%) and the validation set (20%). Finally, the model was evaluated in terms of accuracy, AUC value, specificity, and sensitivity. The specific process is shown in Figure 1.

Figure 1. Flowchart for the design of a machine diagnostic model for benign and malignant pulmonary nodules.

2.3. Image Preprocessing

In this study, each CT image of a small lung nodule was taken as the unit of analysis. Two experienced radiologists performed semi-automatic segmentation of the whole CT image in MATLAB 2017, using the region growing method to segment the region of interest (ROI). As a result, one ROI image was obtained from each CT image. Each image was then resized to 32 × 32. The cropped images of small lung nodules were processed with the Adaptive Histogram Equalization (AHE) algorithm, with the parameters set to clipLimit = 2.0 and tileGridSize = (8, 8). clipLimit controls the limit on contrast enhancement, and tileGridSize defines the equalization regions of the image. The method is based on conventional histogram equalization: the image is divided into small blocks and histogram equalization is performed within each block, in a way that avoids introducing discontinuities between blocks. Finally, noise reduction was performed using median filtering, a filtering method based on order statistics that replaces each pixel value with the median value of the neighborhood around the pixel. Median filtering is effective for removing salt-and-pepper or impulse noise while preserving edges and details.

2.4. Deep Learning Algorithm

Recognizing benign and malignant lung nodules remains a popular classification task in machine vision. In general, image recognition consists of two crucial stages: image feature extraction and feature classification.
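As an illustration of the median filtering step described in Section 2.3, the following is a minimal pure-Python sketch, not the authors' implementation. The 3 × 3 window and the border rule (using only neighbours that fall inside the image) are assumptions made for the example; the paper does not state either.

```python
from statistics import median


def median_filter(image, size=3):
    """Return a median-filtered copy of a 2D list of pixel values.

    Assumption: border pixels use only the neighbours that lie inside
    the image; padding-based border rules are equally common.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    filtered = []
    for row in range(height):
        new_row = []
        for col in range(width):
            # Collect the neighbourhood around (row, col), clipped to the image.
            window = [
                image[r][c]
                for r in range(max(0, row - half), min(height, row + half + 1))
                for c in range(max(0, col - half), min(width, col + half + 1))
            ]
            new_row.append(median(window))
        filtered.append(new_row)
    return filtered


# A flat patch with a single impulse pixel in the centre (hypothetical data).
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
denoised = median_filter(noisy)
```

Because the median ignores a single outlier in the window, the impulse pixel is replaced by the surrounding value, whereas a mean filter would smear it across its neighbours.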
The goal of image feature extraction is to convert the original image data into a more expressive and identifiable feature representation. Extracting image features reduces the dimensionality of the image data, eliminates extraneous information, and selects important image information. Deep learning algorithms trained on large-scale datasets extract high-level semantic features from images. ResNet50 and VGG16, two common convolutional neural networks, have a significant advantage in visual feature extraction.

2.4.1. ResNet50

ResNet50 addresses the vanishing gradient problem in deep neural networks by using residual connections, which allow the network to learn residual mappings [22]. These connections skip layers, which reduces the degradation seen in deep networks. The network contains 50 layers, which include convolutional, pooling, fully connected, and shortcut layers. ResNet50 is composed of a number of residual blocks with convolutional layers and shortcuts. The shortcut connections facilitate direct gradient flow [23]. Two hyperparameters mainly govern image feature extraction with the ResNet50 model: weights and include_top. Setting weights to 'ImageNet' indicates that weights pre-trained on the ImageNet dataset are used, which helps improve the performance and generalization of the model. Setting include_top to False indicates that the top fully connected layer is not included. The hyperparameters of the ResNet50 model for extracting image features are listed in Supplementary File, Table S1.

2.4.2. VGG16

VGG16 is a convolutional neural network (CNN) architecture designed to build a deep network with a consistent architecture composed of repeated convolutional layers followed by max-pooling layers for spatial downsampling. By gradually increasing the depth while keeping the filter size small (3 × 3), the network aims to learn hierarchical representations of images [24].
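To make the appeal of small 3 × 3 filters concrete, the sketch below compares the weight count of two stacked 3 × 3 convolutions with that of a single 5 × 5 convolution covering the same receptive field. The channel width of 64 is an illustrative assumption, biases are ignored, and the helper names are hypothetical.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` inputs and outputs (no biases)."""
    return kernel_size * kernel_size * channels * channels


def receptive_field(num_layers, kernel_size=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 64  # illustrative width, not taken from the study

stacked_3x3 = 2 * conv_weights(3, channels)  # two 3x3 layers in sequence
single_5x5 = conv_weights(5, channels)       # one 5x5 layer

print(receptive_field(2), stacked_3x3, single_5x5)
```

Under these assumptions the two stacked layers see the same 5 × 5 region with 18C² weights instead of 25C², and they apply two non-linearities rather than one.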
Compared with larger filters, small filters allow for a deeper network with fewer parameters. As the name suggests, VGG16's structure is distinguished by its depth. It has 16 layers: 13 convolutional layers and 3 fully connected layers. The convolutional layers are divided into five blocks, each with several convolutional layers followed by a max-pooling layer. The fully connected layers at the end of the network are responsible for classification [25]. The parameterization of the VGG16 model is consistent with that of ResNet50, and it is also pre-trained on ImageNet. The specific settings of VGG16 are detailed in Supplementary File, Table S2.

2.5. Machine Learning Classifiers

Classifier setup is a vital task in machine learning that entails categorizing instances based on specified input data. ResNet50 and VGG16 have their own classification capabilities. However, these are frequently insufficient for categorizing fine-grained images such as CT scans of lung nodules. In this work, we used the following five approaches as classifiers on top of ResNet50 and VGG16 to build fusion models with improved classification capability.

2.5.1. Ensemble Voting

Ensemble Voting, a machine learning technique that integrates the predictions of multiple models to produce a final decision, was one of the methods used. It is founded on the idea that combining the predictions of many models often results in better overall performance than using a single model alone. Ensemble Voting is widely used in machine learning problems such as classification and regression [26]. There are different types of voting schemes, such as majority voting, weighted voting, and soft voting. In majority voting, each model in the ensemble casts a single vote for its predicted class label, and the class label with the majority of votes is chosen as the final prediction [27].
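The difference between majority (hard) voting and soft voting can be sketched in a few lines of plain Python. This is an illustration only: the three probability vectors are hypothetical model outputs, and the function names are not from any library.

```python
from collections import Counter


def hard_vote(labels):
    """Majority voting: return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft voting: average per-class probabilities and pick the top class."""
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[k] for p in probabilities) / n_models for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: mean[k]), mean


# Hypothetical outputs of three models for one nodule
# (class 0 = benign, class 1 = malignant).
probs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]
labels = [max(range(2), key=lambda k: p[k]) for p in probs]

hard_label = hard_vote(labels)
soft_label, mean_probs = soft_vote(probs)
```

In this example two models lean weakly towards benign and one is strongly confident of malignancy, so majority voting returns class 0 while soft voting returns class 1. Soft voting lets a confident model outweigh several uncertain ones.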
The Ensemble Voting classifier was composed of RandomForestClassifier, XGBClassifier, SVC (Support Vector Machine Classifier), and GaussianNB. Setting voting = 'soft' selects soft voting: the predictions of the base classifiers are converted into probability estimates for each class, and the class with the highest combined probability is chosen as the final classification result.

2.5.2. Random Forest

Random Forest is based on the principle of ensemble learning, in which multiple independent decision trees are trained separately on different subsets of the data. Each decision tree is built with a random selection of features and a bootstrapped sample of the original data. The final prediction is formed by combining the individual tree predictions via voting (for classification) or averaging (for regression) [28]. This randomness helps to capture different aspects of the data and improves the overall performance of the ensemble. Random Forest reduces overfitting, has good generalization ability, and can evaluate feature importance. It is suitable for classification and regression problems and provides stable and accurate predictions. The parameter settings for the Random Forest classifier were as follows: n_estimators, 100; min_samples_leaf, 1; min_samples_split, 2; and bootstrap, True.

2.5.3. XGBoost

XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm known for its efficiency and performance in both regression and classification tasks. XGBoost builds an ensemble of decision trees sequentially, where each tree corrects the mistakes made by the previous trees. The algorithm focuses on optimizing a specific loss function while regularizing the model to prevent overfitting [29].
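The idea of trees being added sequentially, each fitted to the errors of the ensemble so far, can be sketched with plain gradient boosting on one-dimensional toy data. This is a simplified illustration under stated assumptions, not XGBoost itself: it uses squared error, depth-1 trees (stumps), no regularization, and an illustrative learning rate of 0.1 rather than the settings reported in the paper.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split of 1-D inputs that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Fit stumps sequentially, each to the residuals of the current ensemble."""
    predictions = [0.0] * len(xs)
    history = []  # mean squared error after each round
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return predictions, history


xs = [1, 2, 3, 4, 5, 6]  # hypothetical feature values
ys = [0, 0, 0, 1, 1, 1]  # hypothetical targets
predictions, history = boost(xs, ys)
```

The `history` list records how the training error shrinks as stumps are added; a smaller learning rate slows this decrease but usually needs more rounds to reach the same fit.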
In each iteration, XGBoost calculates a gradient based on the difference between the current model's prediction and the true value, and uses this gradient to adjust the model. Each new decision tree is constructed taking into account the prediction errors of all the previous trees and tries to correct them. This iterative process gradually reduces the error and improves the predictive performance of the model. The parameters of the XGBoost classifier were set as follows: objective, binary logistic regression; max_depth, 10; learning_rate, 0.01; and n_estimators, 100.

2.5.4. SVM

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks, which can handle both linearly separable and non-linearly separable data by using different kernel functions to transform the data into a higher-dimensional space [30]. SVM identifies support vectors, which are the data points closest to the decision boundary or hyperplane. These support vectors are critical in establishing the decision boundary and making predictions. Depending on the problem at hand, an SVM can have a linear or non-linear decision boundary, which is achieved by selecting an appropriate kernel function. The SVM classifier (SVC) parameters were set as follows: regularization parameter C, 1.0; break_ties, False; cache_size, 200; degree, 3; and kernel, rbf (radial basis function).

2.5.5. Naïve Bayes

Naïve Bayes is a simple yet powerful machine learning algorithm based on Bayesian probability. The principle behind Naïve Bayes is to use Bayes' theorem to estimate the probability of a specific class given the observed features [31]. Given the input features, it estimates the conditional probability of each class and chooses the class with the highest probability as the predicted class.
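A Gaussian Naïve Bayes classifier of this kind can be sketched in pure Python as follows. This is a minimal illustration, not the scikit-learn implementation: the function names, the toy data, and the `variance_floor` constant (a small value added to each variance for numerical stability) are assumptions made for the example.

```python
import math


def fit_gaussian_nb(samples, labels, variance_floor=1e-9):
    """Estimate the prior and per-feature Gaussian parameters for each class."""
    model = {}
    n_features = len(samples[0])
    for cls in set(labels):
        rows = [s for s, label in zip(samples, labels) if label == cls]
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + variance_floor
            for j in range(n_features)
        ]
        model[cls] = (len(rows) / len(samples), means, variances)
    return model


def predict_gaussian_nb(model, sample):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        # Features are treated as independent, so log-likelihoods add up.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(sample, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Hypothetical two-feature training data (0 = benign, 1 = malignant).
samples = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.7], [2.8, 0.3]]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(samples, labels)
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied, which matters when the feature vector has hundreds of dimensions.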
The Naïve Bayes structure entails creating a probabilistic model based on the training data. It calculates the prior probability of each class as well as the probability of observing each feature given each class, under the assumption that the features are conditionally independent given the class. This assumption simplifies probability computation and enables efficient training and prediction. The GaussianNB parameters were set as follows: priors, None; var_smoothing, 1 × 10^−9.

2.6. Feature Visualization

The feature information extracted by ResNet50 and VGG16 was visualized in this study using t-SNE and feature ranking algorithms. t-SNE reduced the dimensionality of the global features from 256 to 2, allowing the features to be shown on a two-dimensional scatter plot. Each data point on the plot represented a sample, and examining their spatial arrangement revealed information about the samples' grouping, closeness, or dispersion based on their learned features. This visual analysis proved useful in determining the features' discriminative strength and separability.

2.7. Statistical Analysis

The statistical descriptions of patient information are presented as the mean and the standard deviation (SD) or percentage; R 4.0.3 software was used to perform the chi-square test or t-test for the basic clinical data of patients and images. Differences were considered statistically significant at p < 0.05. Because deleting observations containing missing values would result in loss of data and may affect the accuracy and reliability of subsequent analyses or modeling, the Morphological Characteristics Random Forest method was chosen to fill in the missing values. To evaluate the classification performance in the validation set, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score were calculated. In the present study, the threshold for sensitivity and specificity values was set to 0.5.
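The evaluation metrics listed above all follow from the confusion matrix obtained at the 0.5 threshold. The sketch below shows one way to compute them, assuming label 1 denotes the positive (malignant) class; the function name and the example labels and probabilities are hypothetical.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute confusion-matrix metrics, treating label 1 as the positive class.

    Assumes both classes occur among the true and predicted labels, so that
    no denominator is zero.
    """
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical true labels and predicted malignancy probabilities.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.7]
metrics = classification_metrics(y_true, y_prob)
```

Raising the threshold above 0.5 would trade sensitivity for specificity, and lowering it would do the opposite.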
Since we considered the current study to address the differentiation of the benign and malignant nature of lung nodules, regardless of the category to which they belonged, it is of great signiﬁcance. Mean Absolute Error (MAE) is presented to evaluate the average of the distances between the model predictions and the true values of the samples. Curves from the receiver operating characteristics (ROC) were plotted to visually compare the differences between the models. 3. Results 3.1. Combined Machine Learning Models In this study, we utilized ResNet50 and VGG16 as the basis of feature extraction classiﬁers, which were federated with Ensemble Voting, XGBoost, Random Forest, SVM, and Naïve Bayes to form a new machine learning classiﬁcation process. In this study, a total of 2048 features were extracted using the last convolutional layer (layer 6) of ResNet50, while a total of 512 features were extracted using the last convolutional layer (layer 5) of VGG16. The feature ﬁltering was performed through the XGBoost process and ﬁnally 233 and 213 features were ﬁltered in favor of the model’s classiﬁcation ability, respectively. As shown in Table 3, ResNet50-Ensemble Voting achieved the best performance with an accuracy of 0.943 (0.938, 0.948) and sensitivity and speciﬁcity of 0.964 and 0.911, respectively. Cancers 2023, 15, x FOR PEER REVIEW 8 of 14 true values of the samples. Curves from the receiver operating characteristics (ROC) were plott ed to visually compare the diﬀerences between the models. 3. Results 3.1. Combined Machine Learning Models In this study, we utilized ResNet50 and VGG16 as the basis of feature extraction clas- siﬁers, which were federated with Ensemble Voting, XGBoost, Random Forest, SVM, and Naïve Bayes to form a new machine learning classiﬁcation process. 
A total of 2048 features were extracted from the last convolutional layer (layer 6) of ResNet50, and 512 features were extracted from the last convolutional layer (layer 5) of VGG16. Feature filtering was performed with XGBoost, which retained 233 and 213 features, respectively, that contributed to the models' classification ability. As shown in Table 3, ResNet50-Ensemble Voting achieved the best performance, with an accuracy of 0.943 (0.938, 0.948) and a sensitivity and specificity of 0.964 and 0.911, respectively. Its accuracy was higher not only than that of the ResNet50 deep learning model but also than those of the comparative models with improved classifiers, such as ResNet50-XGBoost. Overall, the fusion models that used ResNet50 as the feature extractor were clearly superior to those that used VGG16. The features that passed screening were then classified. The best AUC, 0.91, was achieved by ResNet50-SVM. Each operating point on the ROC curves was optimized, which also reflects the overall performance of the method. The ROC curves of the classification models are plotted in Figure 2.

Table 3. Classification and diagnosis of diabetic nephropathy based on the migration model.

Models                    Accuracy           Sensitivity  Specificity  PPV   NPV   AUC   MAE   F1-Score
ResNet50                  0.75 (0.73, 0.77)  0.82         0.66         0.78  0.71  0.81  0.27  0.80
VGG16                     0.61 (0.59, 0.63)  0.37         0.90         0.82  0.54  0.61  0.40  0.51
VGG16-Ensemble Voting     0.88 (0.88, 0.89)  0.95         0.78         0.74  0.54  0.77  0.11  0.91
VGG16-XGBoost             0.74 (0.72, 0.77)  0.86         0.57         0.73  0.68  0.76  0.25  0.80
VGG16-Random Forest       0.73 (0.71, 0.75)  0.89         0.49         0.72  0.74  0.79  0.27  0.80
VGG16-SVM                 0.72 (0.70, 0.75)  0.90         0.46         0.72  0.76  0.78  0.27  0.80
VGG16-Naïve Bayes         0.63 (0.61, 0.66)  0.69         0.54         0.69  0.55  0.66  0.37  0.69
ResNet50-Ensemble Voting  0.94 (0.93, 0.94)  0.96         0.91         0.85  0.63  0.88  0.06  0.95
ResNet50-XGBoost          0.82 (0.80, 0.83)  0.89         0.70         0.82  0.81  0.90  0.18  0.86
ResNet50-Random Forest    0.82 (0.80, 0.84)  0.92         0.66         0.81  0.86  0.89  0.19  0.86
ResNet50-SVM              0.83 (0.82, 0.85)  0.93         0.69         0.82  0.86  0.91  0.17  0.87
ResNet50-Naïve Bayes      0.71 (0.69, 0.73)  0.75         0.66         0.77  0.63  0.75  0.29  0.76

Figure 2. The ROC curves of different combinations of classification models in the test set. (a) Features extracted by ResNet50. (b) Features extracted by VGG16.

3.2. Feature Visualization

Feature filtering was performed with XGBoost: features with an importance above 0.001 were kept, which yielded the 233 and 213 features that contributed to the models' classification ability. To further explore the image features extracted from small lung nodules, we visualized the features of specific lung nodules. The t-SNE results show how the features extracted by ResNet50 and VGG16 differ. As shown in Figure 3, benign and malignant lung nodules were not clearly separable on the basis of these features; however, the labeled region suggests that ResNet50 extracted more differentiated regions, which is consistent with the superior final classification result of this model.

Figure 3. Differential feature visualization of small lung nodules. (a) Features extracted by ResNet50. (b) Features extracted by VGG16. The black box is the area of differential feature clustering.

To further demonstrate the feature extraction differences between the two methods, we categorized and presented the differential feature locations using the Identity Mapping method. Figure 4 shows little overall difference between the features extracted by the two methods; however, for small malignant lung nodules, ResNet50 discovered more diversified features, demonstrating the efficiency of the feature extraction strategy. However, some less relevant features were also taken into account.

Figure 4. Identity Mapping visualization of the effectiveness of the learned features. (a) Features extracted by ResNet50. (b) Features extracted by VGG16.
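The importance-based screening described in this section can be sketched as follows. The sketch assumes that one importance score per feature is already available (in the paper these scores come from XGBoost); the scores used here are made-up toy values, and the helper names `filter_by_importance` and `top_k_features` are hypothetical. Only the 0.001 cut-off is taken from the text, and "over 0.001" is read as strictly greater than 0.001.

```python
def filter_by_importance(importances, threshold=0.001):
    """Return the indices of features whose importance is strictly above threshold."""
    return [i for i, score in enumerate(importances) if score > threshold]


def top_k_features(importances, k=20):
    """Return (feature index, importance) pairs for the k highest scores, descending."""
    ranked = sorted(enumerate(importances), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy importance scores for six features (illustrative values only).
toy_importances = [0.0, 0.25, 0.0005, 0.10, 0.001, 0.40]

kept = filter_by_importance(toy_importances)
top3 = top_k_features(toy_importances, k=3)
print(kept)  # [1, 3, 5]
print(top3)  # [(5, 0.4), (1, 0.25), (3, 0.1)]
```

The same ranking helper with `k=20` would produce the kind of ordered list that a feature importance plot such as Figure 5 displays.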
Based on our findings, we ranked the features from most to least important and selected the top 20 features extracted by ResNet50 and by VGG16. Among the ResNet50 features, Feature 867, Feature 869, and Feature 438 were determined to be the most important. For VGG16, Feature 228, Feature 277, and Feature 439 were the most important, in that order. The results are shown in Figure 5.

Figure 5. Feature importance ranking for feature screening via XGBoost. (a) Features extracted by ResNet50 and (b) features extracted by VGG16.

4. Discussion

Accurate evaluation of the benign and malignant nature of small lung nodules (<20 mm) detected on CT is essential for the early diagnosis and management of lung cancer, and it remains challenging in clinical practice [32]. In this study, we developed and validated a diagnostic classification model combining deep learning and machine learning to distinguish between benign and malignant early lung nodules using CT images of small lung nodules from six different databases. Our results demonstrated that the proposed method, ResNet50-Ensemble Voting, achieved superior performance, reaching an accuracy of 0.943 (0.938, 0.948) along with a sensitivity of 0.964 and a specificity of 0.911. In addition, ResNet50-SVM achieved an AUC of 0.91 and an accuracy of 0.83 (0.82, 0.85).
In the ROC curves, each operating point was optimized, which also reflects the overall performance of the method. This result showed the model's competence in diagnosing the benign or malignant nature of small lung nodules in the validation set.

This study further demonstrated the feature extraction capability of ResNet50 and VGG16, visualized the features, and compared the performance of the combined models in diagnosing the benign or malignant nature of lung nodules. The early detection and identification of lung nodules are particularly critical and challenging, especially when the gold standard of pathological tissue is not available. In this context, the ResNet50-SVM and ResNet50-XGBoost models developed in this study made significant contributions by selecting the best combination of feature extraction and classifiers. This could improve the diagnostic capabilities for small lung nodules and reduce the rates of misdiagnosis and missed diagnosis among clinicians. Ultimately, it provides clearer diagnostic guidance for patients in the early stages of lung cancer.

In recent years, ResNet50 and VGG16 have been widely applied as classical CNN models for the diagnosis and recognition of diseases [22,33,34]. ResNet50 has a greater network depth than traditional deep networks, which allows it to better capture details and semantic information in images [35]. VGG16, on the other hand, captures features at different scales by stacking multiple small convolutional kernels and pooling layers, which increases the nonlinear expressiveness of the network [36]. Both methods have demonstrated competence in diagnosing the nature of pulmonary nodules. Numerous studies have used residual networks to classify lung cancer. One study reported that a multi-level cross residual network for lung nodule classification reached an accuracy of 85.88% [37].
Zhang used ResNet as the basic framework combined with CBAM to classify conventional lung nodules, and the AUC exceeded 0.95 [38]. Xie used knowledge-based collaborative deep learning on chest CT to classify benign and malignant lung nodules, with an accuracy of up to 95.70% [39]. Wang built a multi-scale residual network (MResNet) to accurately extract the features of lung nodules and classified them in conjunction with deep learning, achieving an accuracy of 99.12% [40]. This shows that research on regular lung nodules is well established, but the diagnosis of small, early lung nodules needs further clarification. In addition, the related research methods warrant consideration and improvement.

The current study focused on the nature of small lung nodules in the early stages of lung cancer. Size and growth are crucial factors in evaluating the malignant potential of a nodule. The likelihood of malignancy is positively correlated with nodule diameter, but the importance of morphology in CT images should not be underestimated either [41]. Farjah used multivariate analysis to examine the relationship between lung cancer diagnosis and nodule features; the resulting model had an AUC of 0.75, indicating that detection capacity needed further improvement [42]. Choi classified early imaging features of lung cancer from lung nodules on low-dose CT, with an AUC of 0.89. Although that study targeted nodules in the early stages of lung cancer, it did not account for the specific size of the nodules [43]. DNA promoter hypermethylation was found to be diagnostic for early-stage lung cancer, and specific markers such as SOX17, TAC1, and HOXA7 were shown to be diagnostic at an AUC of 0.89 [44].
Tumor necrosis factor receptor-associated protein 1 (TRAP1) was also of significance in the diagnosis of lung nodules in the early stages of lung cancer, with an AUC of approximately 0.835 [45]. Although such biomarkers perform well, their high cost prevents screening from being applied to broad populations. On this premise, the current findings enhance the diagnosis of early lung nodules. This study additionally presented the features from various viewpoints and attempted to investigate how much different features contribute. Our proposed method is not only noninvasive, but its cost is also more acceptable than that of biomarkers.

Nonetheless, our study had several limitations. First, because this was a study of small lung nodules, the gold standard could not be obtained. The aim was only to bring the method as close to the physician's diagnostic level as possible. Second, the model was not combined or compared with radiologic features. Although both are related to imaging, the method proposed in this study cannot directly provide clinical indications such as the burr sign. Finally, the technique required a large number of precisely labeled samples, which makes data collection more challenging.

5. Conclusions

In conclusion, the combined machine learning model ResNet50-Ensemble Voting showed remarkable performance in the identification of benign and malignant small pulmonary nodules (<20 mm) from multiple centers. The combined feature visualization process further clarifies the variability in different features. The model can help clinicians accurately diagnose the nature of early-stage small lung nodules in clinical practice.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15225417/s1, Table S1: The ResNet50 model was pre-trained with ImageNet to extract hyper-parameter information of image features; Table S2: The VGG16 model was pre-trained with ImageNet to extract hyper-parameter information of image features.

Author Contributions: Conceptualization, W.L., S.Y. and F.Z.; data curation, T.Z., H.L., D.J., Y.G. and Q.L.; formal analysis, S.Y., R.Y., Y.T. and L.T.; funding acquisition, X.G.; investigation, Y.T., T.Z. and D.J.; methodology, W.L. and F.Z.; project administration, J.Z. and X.G.; resources, T.Z., H.L., Y.G. and Q.L.; software, R.Y. and X.L.; supervision, X.L., J.Z. and X.G.; validation, X.L. and L.T.; visualization, W.L.; writing (original draft), W.L.; writing (review and editing), X.L. and L.T. All authors have read and agreed to the published version of the manuscript.

Funding: This study was supported by the National Natural Science Foundation of China (grant numbers 82173617 and 82373683) and Beijing Medical Science and Technology Promotion Center (grant number KCZX-KT-002).

Institutional Review Board Statement: This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Beijing Physical Examination Center (protocol code: 002, approval date: 24 March 2022).

Informed Consent Statement: Considering the privacy of patient data, the section is not applicable.

Data Availability Statement: The data presented in this study are available in this article (and Supplementary Materials).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lung Cancer Screening Considerations During Respiratory Infection Outbreaks, Epidemics or Pandemics: An International Association for the Study of Lung Cancer Early Detection and Screening Committee Report | ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S1556086421033268 (accessed on 25 August 2023).
2. Zeng, H.; Chen, W.; Zheng, R.; Zhang, S.; Ji, J.S.; Zou, X.; Xia, C.; Sun, K.; Yang, Z.; Li, H.; et al. Changing Cancer Survival in China during 2003–15: A Pooled Analysis of 17 Population-Based Cancer Registries. Lancet Glob. Health 2018, 6, e555–e567.
3. Wang, L.; Zhang, M.; Pan, X.; Zhao, M.; Huang, L.; Hu, X.; Wang, X.; Qiao, L.; Guo, Q.; Xu, W.; et al. Integrative Serum Metabolic Fingerprints Based Multi-Modal Platforms for Lung Adenocarcinoma Early Detection and Pulmonary Nodule Classification. Adv. Sci. 2022, 9, 2203786.
4. Eberhardt, R.; Ernst, A.; Herth, F.J.F. Ultrasound-Guided Transbronchial Biopsy of Solitary Pulmonary Nodules Less than 20 Mm. Eur. Respir. J. 2009, 34, 1284–1287.
5. An Assisted Diagnosis System for Detection of Early Pulmonary Nodule in Computed Tomography Images | SpringerLink. Available online: https://link.springer.com/article/10.1007/s10916-016-0669-0?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 25 August 2023).
6. Management of Small Lung Nodules in the Era of Lung Cancer Screening | Lung Cancer | JAMA Surgery | JAMA Network. Available online: https://jamanetwork.com/journals/jamasurgery/fullarticle/2719456 (accessed on 26 August 2023).
7. Huang, P.; Park, S.; Yan, R.; Lee, J.; Chu, L.C.; Lin, C.T.; Hussien, A.; Rathmell, J.; Thomas, B.; Chen, C.; et al. Added Value of Computer-Aided CT Image Features for Early Lung Cancer Diagnosis with Small Pulmonary Nodules: A Matched Case-Control Study. Radiology 2018, 286, 286–295.
8. Kaliyugarasan, S.; Lundervold, A.; Lundervold, A.S. Pulmonary Nodule Classification in Lung Cancer from 3D Thoracic CT Scans Using Fastai and MONAI. Int. J. Interact. Multimed. Artif. Intell. 2021, 6, 83.
9. Zhao, X.; Liu, L.; Qi, S.; Teng, Y.; Li, J.; Qian, W. Agile Convolutional Neural Network for Pulmonary Nodule Classification Using CT Images. Int. J. Comput. Ass. Rad. 2018, 13, 585–595.
10. Cao, K.; Tao, H.; Wang, Z.; Jin, X. MSM-ViT: A Multi-Scale MobileViT for Pulmonary Nodule Classification Using CT Images. J. X-ray Sci. Technol. 2023, 31, 731–744.
11. Mkindu, H.; Wu, L.; Zhao, Y. Lung Nodule Detection of CT Images Based on Combining 3D-CNN and Squeeze-and-Excitation Networks. Multimed. Tools Appl. 2023, 82, 25747–25760.
12. Mkindu, H.; Wu, L.; Zhao, Y. Lung Nodule Detection in Chest CT Images Based on Vision Transformer Network with Bayesian Optimization. Biomed. Signal Process. Control 2023, 85, 104866.
13. Howard, B.A.; Morgan, R.; Thorpe, M.P.; Turkington, T.G.; Oldan, J.; James, O.G.; Borges-Neto, S. Comparison of Bayesian Penalized Likelihood Reconstruction versus OS-EM for Characterization of Small Pulmonary Nodules in Oncologic PET/CT. Ann. Nucl. Med. 2017, 31, 623–628.
14. Incremental Benefit of Maximum-Intensity-Projection Images on Observer Detection of Small Pulmonary Nodules Revealed by Multidetector CT | AJR. Available online: https://www.ajronline.org/doi/10.2214/ajr.179.1.1790149 (accessed on 25 August 2023).
15. Chae, K.J.; Jin, G.Y.; Ko, S.B.; Wang, Y.; Zhang, H.; Choi, E.J.; Choi, H. Deep Learning for the Classification of Small (≤2 cm) Pulmonary Nodules on CT Imaging: A Preliminary Study. Acad. Radiol. 2020, 27, e55–e63.
16. Mei, M.; Ye, Z.; Zha, Y. An Integrated Convolutional Neural Network for Classifying Small Pulmonary Solid Nodules. Front. Neurosci. 2023, 17, 1152222.
17. Liu, R.-S.; Ye, J.; Yu, Y.; Yang, Z.-Y.; Lin, J.-L.; Li, X.-D.; Qin, T.-S.; Tao, D.-P.; Song, W.; Wang, G.; et al. The Predictive Accuracy of CT Radiomics Combined with Machine Learning in Predicting the Invasiveness of Small Nodular Lung Adenocarcinoma. Transl. Lung Cancer Res. 2023, 12, 530–546.
18. Guan, X.; Du, Y.; Ma, R.; Teng, N.; Ou, S.; Zhao, H.; Li, X. Construction of the XGBoost Model for Early Lung Cancer Prediction Based on Metabolic Indices. BMC Med. Inform. Decis. Mak. 2023, 23, 107.
19. Jain, S. Computer-Aided Detection System for the Classification of Non-Small Cell Lung Lesions Using SVM. Curr. Comput.-Aided Drug Des. 2021, 16, 833–840.
20. Srivastava, V.; Gupta, S.; Chaudhary, G.; Balodi, A.; Khari, M.; García-Díaz, V. An Enhanced Texture-Based Feature Extraction Approach for Classification of Biomedical Images of CT-Scan of Lungs. Int. J. Interact. Multimed. Artif. Intell. 2021, 6, 18.
21. Rajinikanth, V.; Kadry, S.; Moreno-Ger, P. ResNet18 Supported Inspection of Tuberculosis in Chest Radiographs with Integrated Deep, LBP, and DWT Features. Int. J. Interact. Multimed. Artif. Intell. 2023, 8, 38.
22. Sharma, A.K.; Nandal, A.; Dhaka, A.; Koundal, D.; Bogatinoska, D.C.; Alyami, H. Enhanced Watershed Segmentation Algorithm-Based Modified ResNet50 Model for Brain Tumor Detection. BioMed Res. Int. 2022, 2022, 7348344.
23. Hossain, M.d.B.; Iqbal, S.M.H.S.; Islam, M.d.M.; Akhtar, M.d.N.; Sarker, I.H. Transfer Learning with Fine-Tuned Deep CNN ResNet50 Model for Classifying COVID-19 from Chest X-ray Images. Inform. Med. Unlocked 2022, 30, 100916.
24. A New Model Based on Improved VGG16 for Corn Weed Identification, Frontiers in Plant Science | X-MOL. Available online: https://www.x-mol.com/paper/1677428630847471616?adv (accessed on 29 August 2023).
25. Circuit Manufacturing Defect Detection Using VGG16 Convolutional Neural Networks. Available online: https://www.hindawi.com/journals/wcmc/2022/1070405/ (accessed on 29 August 2023).
26. Advanced Defensive Distillation with Ensemble Voting and Noisy Logits | SpringerLink. Available online: https://link.springer.com/article/10.1007/s10489-022-03495-3?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 29 August 2023).
27. Shehab, M.A.; Kahraman, N. A Weighted Voting Ensemble of Efficient Regularized Extreme Learning Machine. Comput. Electr. Eng. 2020, 85, 106639.
28. Mantas, C.J.; Castellano, J.G.; Moral-García, S.; Abellán, J. A Comparison of Random Forest Based Algorithms: Random Credal Random Forest versus Oblique Random Forest. Soft Comput. 2019, 23, 10739–10754.
29. Li, J.; An, X.; Li, Q.; Wang, C.; Yu, H.; Zhou, X.; Geng, Y. Application of XGBoost Algorithm in the Optimization of Pollutant Concentration. Atmos. Res. 2022, 276, 106238.
30. Ding, S.; Shi, Z.; Tao, D.; An, B. Recent Advances in Support Vector Machines. Neurocomputing 2016, 211, 1–3.
31. Redivo, E.; Viroli, C.; Farcomeni, A. Quantile-Distribution Functions and Their Use for Classification, with Application to Naïve Bayes Classifiers. Statist. Comput. 2023, 33, 55.
32. Kadara, H.; Tran, L.M.; Liu, B.; Vachani, A.; Li, S.; Sinjab, A.; Zhou, X.J.; Dubinett, S.M.; Krysan, K. Early Diagnosis and Screening for Lung Cancer. Cold Spring Harb. Perspect. Med. 2021, 11, a037994.
33. Huang, H.; You, Z.; Cai, H.; Xu, J.; Lin, D. Fast Detection Method for Prostate Cancer Cells Based on an Integrated ResNet50 and YoloV5 Framework. Comput. Methods Programs Biomed. 2022, 226, 107184.
34. Alshammari, A. Construction of VGG16 Convolution Neural Network (VGG16_CNN) Classifier with NestNet-Based Segmentation Paradigm for Brain Metastasis Classification. Sensors 2022, 22, 8076.
35. A Method for Detecting the Quality of Cotton Seeds Based on an Improved ResNet50 Model. Available online: https://pubmed.ncbi.nlm.nih.gov/36791128/ (accessed on 26 August 2023).
36. VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction. Available online: https://pubmed.ncbi.nlm.nih.gov/37504815/ (accessed on 26 August 2023).
37. Lyu, J.; Bi, X.; Ling, S.H. Multi-Level Cross Residual Network for Lung Nodule Classification. Sensors 2020, 20, 2837.
38. Deep-Learning Model of ResNet Combined with CBAM for Malignant-Benign Pulmonary Nodules Classification on Computed Tomography Images. Available online: https://pubmed.ncbi.nlm.nih.gov/37374292/ (accessed on 26 August 2023).
39. Xie, Y.; Xia, Y.; Zhang, J.; Song, Y.; Feng, D.; Fulham, M.; Cai, W. Knowledge-Based Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT. IEEE Trans. Med. Imaging 2019, 38, 991–1004.
40. Wang, H.; Zhu, H.; Ding, L.; Yang, K. A Diagnostic Classification of Lung Nodules Using Multiple-Scale Residual Network. Sci. Rep. 2023, 13, 11322.
41. Evaluation of the Solitary Pulmonary Nodule: Size Matters, but Do Not Ignore the Power of Morphology | Insights into Imaging. Available online: https://link.springer.com/article/10.1007/s13244-017-0581-2?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 26 August 2023).
42. Patient and Nodule Characteristics Associated with a Lung Cancer Diagnosis Among Individuals with Incidentally Detected Lung Nodules | ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S0012369222039009 (accessed on 26 August 2023).
43. Choi, W.; Oh, J.H.; Riyahi, S.; Liu, C.-J.; Jiang, F.; Chen, W.; White, C.; Rimner, A.; Mechalakos, J.G.; Deasy, J.O.; et al. Radiomics Analysis of Pulmonary Nodules in Low-Dose CT for Early Detection of Lung Cancer. Med. Phys. 2018, 45, 1537–1549.
44. Early Detection of Lung Cancer Using DNA Promoter Hypermethylation in Plasma and Sputum | Clinical Cancer Research | American Association for Cancer Research. Available online: https://aacrjournals.org/clincancerres/article/23/8/1998/123278/Early-Detection-of-Lung-Cancer-Using-DNA-Promoter (accessed on 26 August 2023).
45. Li, X.; Li, X.; Chen, S.; Wu, Y.; Liu, Y.; Hu, T.; Huang, J.; Yu, J.; Pei, Z.; Zeng, T.; et al. TRAP1 Shows Clinical Significance in the Early Diagnosis of Small Cell Lung Cancer. J. Inflamm. Res. 2021, 14, 2507–2514.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Cancers, Multidisciplinary Digital Publishing Institute. http://www.deepdyve.com/lp/multidisciplinary-digital-publishing-institute/machine-learning-model-of-resnet50-ensemble-voting-for-malignant-RT21xbu14k

References (54)

F. Farjah, Sarah Monsell, R. Greenlee, M. Gould, R. Smith-Bindman, Matthew Banegas, Kurt Schoen, A. Ramaprasan, D. Buist (2022)
Patient and Nodule Characteristics Associated with a Lung Cancer Diagnosis Among Individuals with Incidentally Detected Lung Nodules.
Chest
Hassan Mkindu, Longwen Wu, Yaqin Zhao (2023)
Lung nodule detection in chest CT images based on vision transformer network with Bayesian optimization
Biomed. Signal Process. Control., 85
Varun Srivastava, Shilpa Gupta, Gopal Chaudhary, Arun Balodi, Manju Khari, Vicente Díaz (2021)
An Enhanced Texture-Based Feature Extraction Approach for Classification of Biomedical Images of CT-Scan of Lungs
Int. J. Interact. Multim. Artif. Intell., 6
Hongyuan Huang, Zhi You, Huayu Cai, Jianfeng Xu, Dongxu Lin (2022)
Fast detection method for prostate cancer cells based on an integrated ResNet50 and YoloV5 framework
Computer methods and programs in biomedicine, 226
K. Chae, G. Jin, S. Ko, Yi Wang, Hao Zhang, E. Choi, H. Choi (2019)
Deep Learning for the Classification of Small (≤2 cm) Pulmonary Nodules on CT Imaging: A Preliminary Study.
Academic radiology
Wilson Bakasa, Serestina Viriri (2023)
VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction
Journal of Imaging, 9
Incremental Beneﬁt of Maximum-Intensity-Projection Images on Observer Detection of Small Pulmonary Nodules Revealed by Multidetector CT|AJR
Wookjin Choi, J. Oh, S. Riyahi, Chia-Ju Liu, F. Jiang, Wengen Chen, C. White, A. Rimner, J. Mechalakos, J. Deasy, W. Lu (2018)
Radiomics analysis of pulmonary nodules in low‐dose CT for early detection of lung cancer
Medical Physics, 45
A New Model Based on Improved VGG16 for Corn Weed Identiﬁcation
Hassan Mkindu, Longwen Wu, Yaqin Zhao (2023)
Lung nodule detection of CT images based on combining 3D-CNN and squeeze-and-excitation networks
Multimedia Tools and Applications
(2022)
Application of XGBoost Algorithm in the Optimization of Pollutant Concentration
Atmos. Res., 276
Patient and Nodule Characteristics Associated with a Lung Cancer Diagnosis Among Individuals with Incidentally Detected Lung Nodules—ScienceDirect
Mengqing Mei, Zhiwei Ye, Y. Zha (2023)
An integrated convolutional neural network for classifying small pulmonary solid nodules
Frontiers in Neuroscience, 17
Lin Wang, Mengji Zhang, Xu-feng Pan, Ming-Na Zhao, Lin Huang, Xiaomeng Hu, Xueqing Wang, L. Qiao, Qiaomei Guo, Wanxing Xu, Wenli Qian, Tingjia Xue, X. Ye, Ming Li, H. Su, Yinglan Kuang, Xing Lu, Xin Ye, Kun Qian, J. Lou (2022)
Integrative Serum Metabolic Fingerprints Based Multi‐Modal Platforms for Lung Adenocarcinoma Early Detection and Pulmonary Nodule Classification
Advanced Science, 9
H. Zeng, Wanqing Chen, R. Zheng, Siwei Zhang, John Ji, X. Zou, C. Xia, K. Sun, Zhixun Yang, He Li, Ning Wang, R. Han, Shuzheng Liu, Huizhang Li, Hui-juan Mu, Yutong He, Yanjun Xu, Z. Fu, Yan Zhou, Jie Jiang, Yanlei Yang, Jianguo Chen, K. Wei, Dongmei Fan, Jian Wang, F. Fu, De-li Zhao, G. Song, Jianshun Chen, Chunxiao Jiang, Xin Zhou, Xiao-ping Gu, F. Jin, Qi-long Li, Yanhua Li, Tong Wu, Chun-cheng Yan, Jian-mei Dong, Z. Hua, P. Baade, F. Bray, A. Jemal, X. Yu, Jie He (2018)
Changing cancer survival in China during 2003-15: a pooled analysis of 17 population-based cancer registries.
The Lancet. Global health, 6 5
Early Detection of Lung Cancer Using DNA Promoter Hypermethylation in Plasma and Sputum|Clinical Cancer Research|American Association for Cancer Research
Evaluation of the Solitary Pulmonary Nodule: Size Matters, but Do Not Ignore the Power of Morphology | Insights into Imaging
K. Berfield, O. Afolayan, D. Wood (2019)
Management of Small Lung Nodules in the Era of Lung Cancer Screening.
JAMA surgery
(2023)
Early Detection of Lung Cancer Using DNA Promoter Hypermethylation in Plasma and Sputum|Clinical Cancer Research|
M. Shehab, N. Kahraman (2020)
A weighted voting ensemble of efficient regularized extreme learning machine
Comput. Electr. Eng., 85
Deep-Learning Model of ResNet Combined with CBAM for Malignant-Benign Pulmonary Nodules Classiﬁcation on Computed Tomography Images
Xinwu Du, Laiqiang Si, Pengfei Li, Zhihao Yun (2023)
A method for detecting the quality of cotton seeds based on an improved ResNet50 model
PLOS ONE, 18
Hongfeng Wang, Hai-qing Zhu, Lihua Ding, Kaili Yang (2023)
A diagnostic classification of lung nodules using multiple-scale residual network
Scientific Reports, 13
Md. Hossain, S. Hasan, Sazzad Iqbal, Md. Islam, Md. Akhtar, Iqbal Sarker (2022)
Transfer learning with fine-tuned deep CNN ResNet50 model for classifying COVID-19 from chest X-ray images
Informatics in Medicine Unlocked, 30
A. Sharma, Amita Nandal, Arvind Dhaka, Deepika Koundal, D. Bogatinoska, Hashem Alyami (2022)
Enhanced Watershed Segmentation Algorithm-Based Modified ResNet50 Model for Brain Tumor Detection
BioMed Research International, 2022
Xiuliang Guan, Yue Du, R. Ma, Nan Teng, Shu Ou, Hui Zhao, Xiaofeng Li (2023)
Construction of the XGBoost model for early lung cancer prediction based on metabolic indices
BMC Medical Informatics and Decision Making, 23
Jikui Liu, Hongyang Jiang, Mengdi Gao, Chenguang He, Yu Wang, Pu Wang, He Ma, Ye Li (2017)
An Assisted Diagnosis System for Detection of Early Pulmonary Nodule in Computed Tomography Images
Journal of Medical Systems, 41
Yutong Xie, Yong Xia, Jianpeng Zhang, Yang Song, D. Feng, M. Fulham, Weidong Cai (2019)
Knowledge-based Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT
IEEE Transactions on Medical Imaging, 38
Shruti Jain (2020)
Computer Aided Detection system for the Classification of Non Small Cell Lung Lesions using SVM.
Current computer-aided drug design
Juan Lyu, Xiaojun Bi, S. Ling (2020)
Multi-Level Cross Residual Network for Lung Nodule Classification
Sensors (Basel, Switzerland), 20
Edoardo Redivo, C. Viroli, A. Farcomeni (2023)
Quantile-distribution functions and their use for classification, with application to naïve Bayes classifiers
Statistics and Computing, 33
Lung Cancer Screening Considerations During Respiratory Infection Outbreaks, Epidemics or Pandemics: An International Association for the Study of Lung Cancer Early Detection and Screening Committee Report
V. Rajinikanth, Seifedine Kadry, P. Moreno-Ger (2023)
ResNet18 Supported Inspection of Tuberculosis in Chest Radiographs With Integrated Deep, LBP, and DWT Features
Int. J. Interact. Multim. Artif. Intell., 8
Xinzhuo Zhao, Liyao Liu, Shouliang Qi, Yueyang Teng, Jianhua Li, W. Qian (2018)
Agile convolutional neural network for pulmonary nodule classification using CT images
International Journal of Computer Assisted Radiology and Surgery, 13
Peng Huang, Seyoun Park, Rongkai Yan, Junghoon Lee, L. Chu, C. Lin, Amira Hussien, J. Rathmell, Brett Thomas, Chen Chen, R. Hales, D. Ettinger, M. Brock, P. Hu, E. Fishman, E. Gabrielson, S. Lam (2018)
Added Value of Computer-aided CT Image Features for Early Lung Cancer Diagnosis with Small Pulmonary Nodules: A Matched Case-Control Study.
Radiology, 286(1)
A. Snoeckx, P. Reyntiens, D. Desbuquoit, M. Spinhoven, P. Schil, J. Meerbeeck, P. Parizel (2017)
Evaluation of the solitary pulmonary nodule: size matters, but do not ignore the power of morphology
Insights into Imaging, 9
Shifei Ding, Zhongzhi Shi, D. Tao, Bo An (2016)
Recent advances in Support Vector Machines
Neurocomputing, 211
B. Howard, Rustain Morgan, M. Thorpe, T. Turkington, J. Oldan, O. James, S. Borges-Neto (2017)
Comparison of Bayesian penalized likelihood reconstruction versus OS-EM for characterization of small pulmonary nodules in oncologic PET/CT
Annals of Nuclear Medicine, 31
Satheshkumar Kaliyugarasan, A. Lundervold, A. Lundervold (2021)
Pulmonary Nodule Classification in Lung Cancer from 3D Thoracic CT Scans Using fastai and MONAI
Int. J. Interact. Multim. Artif. Intell., 6
Xiaohan Li, Xu Li, Si-Mei Chen, Yang Wu, Yuhan Liu, Tingting Hu, Jiayi Huang, Jianlin Yu, Z. Pei, T. Zeng, L. Tan (2021)
TRAP1 Shows Clinical Significance in the Early Diagnosis of Small Cell Lung Cancer
Journal of Inflammation Research, 14
Yuting Liang, Reza Samavi (2022)
Advanced defensive distillation with ensemble voting and noisy logits
Applied Intelligence, 53
Abdulaziz Alshammari (2022)
Construction of VGG16 Convolution Neural Network (VGG16_CNN) Classifier with NestNet-Based Segmentation Paradigm for Brain Metastasis Classification
Sensors (Basel, Switzerland), 22
H. Kadara, L. Tran, Bin Liu, Anil Vachani, Shuo Li, Ansam Sinjab, X. Zhou, S. Dubinett, K. Krysan (2021)
Early Diagnosis and Screening for Lung Cancer.
Cold Spring Harbor perspectives in medicine
S. Althubiti, Fayadh Alenezi, S. Shitharth, S. K., Chennareddy Reddy (2022)
Circuit Manufacturing Defect Detection Using VGG16 Convolutional Neural Networks
Wireless Communications and Mobile Computing
R. Eberhardt, A. Ernst, F. Herth (2009)
Ultrasound-guided transbronchial biopsy of solitary pulmonary nodules less than 20 mm
European Respiratory Journal, 34
Keyan Cao, Hangbo Tao, Zhiqiong Wang, Xi Jin (2023)
MSM-ViT: A multi-scale MobileViT for pulmonary nodule classification using CT images
Journal of X-Ray Science and Technology, 31
C. Mantas, Francisco Castellano, Serafín Moral-García, J. Abellán (2018)
A comparison of random forest based algorithms: random credal random forest versus oblique random forest
Soft Computing, 23
Jiangtao Li, X. An, Qingyong Li, Chao Wang, Haomin Yu, Xinyuan Zhou, Yangli-ao Geng (2022)
Application of Xgboost Algorithm in the Optimization of Pollutant Concentration
SSRN Electronic Journal
Rong-Sheng Liu, J. Ye, Yang Yu, Zhi-Yan Yang, Jun Lin, Xiao-dong Li, Tian-Shou Qin, Daiqin Tao, Wei Song, G. Wang, Jun Peng (2023)
The predictive accuracy of CT radiomics combined with machine learning in predicting the invasiveness of small nodular lung adenocarcinoma
Translational Lung Cancer Research, 12
A New Model Based on Improved VGG16 for Corn Weed Identification
Frontiers in Plant Science

Publisher: Multidisciplinary Digital Publishing Institute
Copyright: © 1996-2023 MDPI (Basel, Switzerland) unless otherwise stated
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
ISSN: 2072-6694
DOI: 10.3390/cancers15225417

Abstract

Simple Summary: Machine learning methods have shown promise in accurately identifying small lung nodules. However, further exploration is needed to fully harness the potential of machine learning in distinguishing between benign and malignant nodules. This study aimed to develop and evaluate a ResNet50-Ensemble Voting model for determining the nature (benign or malignant) of small pulmonary nodules (less than 20 mm) based on CT images. The study involved 834 CT images from 396 patients with small pulmonary nodules. CT image features were extracted using the ResNet50 and VGG16 algorithms, and classification was performed using XGBoost, SVM, and Ensemble Voting techniques, giving ten different combinations of feature extractor and machine learning classifier. Among the models tested, the ResNet50-Ensemble Voting algorithm demonstrated the highest performance in the test set, achieving an accuracy of 0.943 (0.938, 0.948), with sensitivity and specificity values of 0.964 and 0.911, respectively. The machine learning models, particularly the ResNet50-Ensemble Voting approach, showed excellent performance in accurately identifying benign and malignant small pulmonary nodules (less than 20 mm) from diverse sources. These models have the potential to assist doctors in accurately diagnosing the nature of early-stage lung nodules in clinical practice.

Abstract: Background: The early detection of benign and malignant lung tumors allows lesions to be diagnosed and appropriate health measures to be implemented earlier, dramatically improving lung cancer patients' quality of life. Machine learning methods have performed well in recognizing small benign and malignant lung nodules. However, further investigation is required to fully leverage the potential of machine learning in distinguishing between benign and malignant small lung nodules. Objective: The aim of this study was to develop and evaluate the ResNet50-Ensemble Voting model for determining the benign or malignant nature of small pulmonary nodules (<20 mm) based on CT images. Methods: In this study, 834 CT images from 396 patients with small pulmonary nodules were gathered and randomly assigned to the training and validation sets in an 8:2 ratio. The ResNet50 and VGG16 algorithms were used to extract CT image features, followed by XGBoost, SVM, and Ensemble Voting techniques for classification, for a total of ten different combinations of feature extractor and classifier. Indicators such as accuracy, sensitivity, and specificity were used to assess the models. The extracted features were also visualized to investigate the contrasts between them. Results: The algorithm we presented, ResNet50-Ensemble Voting, performed best in the test set, with an accuracy of 0.943 (0.938, 0.948) and sensitivity and specificity of 0.964 and 0.911, respectively. VGG16-Ensemble Voting had an accuracy of 0.887 (0.880, 0.894), with a sensitivity and specificity of 0.952 and 0.784, respectively. Conclusion: Machine learning models integrating ResNet50 with Ensemble Voting performed exceptionally well in identifying benign and malignant small pulmonary nodules (<20 mm) from various sites, which might help doctors in accurately diagnosing the nature of early-stage lung nodules in clinical practice.

Keywords: ResNet50; ensemble voting; XGBoost; small pulmonary nodules; pulmonary cancer

Citation: Li, W.; Yu, S.; Yang, R.; Tian, Y.; Zhu, T.; Liu, H.; Jiao, D.; Zhang, F.; Liu, X.; Tao, L.; et al. Machine Learning Model of ResNet50-Ensemble Voting for Malignant–Benign Small Pulmonary Nodule Classification on Computed Tomography Images. Cancers 2023, 15, 5417. https://doi.org/10.3390/cancers15225417
Academic Editors: Maria Li Lung and Josephine Ko
Received: 29 August 2023; Revised: 21 September 2023; Accepted: 26 September 2023; Published: 15 November 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Lung cancer remains one of the leading causes of cancer-related mortality worldwide. An estimated 1.8 million deaths from lung cancer occurred in 2020, accounting for 18% of all cancer deaths [1]. Screening for benign and malignant lung nodules in the early stages of lung cancer could increase the 5-year survival rate from 16.1% to 19.7% [2].
Therefore, the early detection and accurate classification of pulmonary nodules are crucial for improving patient prognosis. The gold standard cannot be relied on to distinguish benign from malignant small lung nodules in the early stages of lung cancer, because the nodules are too small for pathologic evidence to be acquired [3]. Transbronchial biopsies of solitary pulmonary nodules (SPNs) smaller than 20 mm are typically performed under fluoroscopic guidance, but the availability of pathologic tissue varies greatly [4]. Owing to its high resolution and non-invasive nature, computed tomography (CT) imaging has emerged as a significant tool in the identification and treatment of lung nodules [5]. One study demonstrated that annual lung cancer screening using CT imaging reduced lung cancer mortality by 20% [6].

However, radiologists continue to struggle to discriminate reliably between malignant and benign small pulmonary nodules based on CT imaging alone. Computer-assisted diagnostic tools have the potential to improve the detection and screening of benign and malignant lung nodules [7]. Kaliyugarasan et al. applied a new extension of the fastai deep learning framework to a 3D medical imaging task and combined it with the MONAI deep learning library, achieving a final classification accuracy of 92.4% [8]. Zhao et al. constructed a hybrid CNN of LeNet and AlexNet to distinguish benign and malignant lung nodules using CT images, with an accuracy of 0.822 and an area under the curve of 0.877 [9]. Cao et al. proposed the MSM-ViT model for lung nodule classification, addressing the poor generalization of the ViT structure and the difficulty of extracting multi-scale features, and obtained a best accuracy of 94.04% [10].
Mkindu proposed combining a 3D-CNN with squeeze-and-excitation networks, with the joint algorithm yielding a highest detection sensitivity of 98.65% [11]. Hassan proposed an automated computer-aided diagnosis (CAD) scheme for lung nodule detection based on the Vision Transformer architecture and Bayesian optimization, obtaining a highest detection sensitivity of 98.39% with a significant reduction in network parameters [12]. However, the preceding research focused mostly on regular-sized lung nodules and did not investigate model performance on small lung nodules (<20 mm).

Some studies have used Bayesian penalized likelihood reconstruction [13] or maximum-intensity projection [14] to enhance the representation of small lung nodules on CT images and thereby improve clinicians' identification and diagnosis of early-stage lung cancer. However, diagnostic accuracy with these approaches remains limited. Kum used a modified AlexNet algorithm to diagnose the nature of lung nodules smaller than 20 mm, with an AUC of 0.82, exploring the application of this method to early lung nodules [15]. Mei et al. introduced the Otsu thresholding algorithm to preprocess the data and filter out interfering information when obtaining nodule features, and added parallel radiomics to a 3D convolutional neural network, reaching an AUC of 0.90 [16]. Nevertheless, the capability to diagnose lung nodules in the early stages remains insufficient, and further exploration and enhancement are desirable. Liu et al. achieved superior outcomes by combining CT radiomics with machine learning to predict the invasiveness of small nodules [17]. Classical machine learning classification algorithms such as XGBoost [18] and SVM (Support Vector Machine) [19] have achieved good performance in diagnosing the nature of lung nodules. However, the diagnosis of small lung nodules in the early stage needs to be explored further.
Furthermore, machine learning classification models based on feature extraction have been developed and explored. The Local Mesh Peak-Valley Edge Pattern (LMePVEP) technique, a splicing-based feature extraction method using dynamic thresholding, could improve classification accuracy by up to 12.56% [20]. However, the accuracy of this method for diagnosing the nature of lung nodules still needs to be improved. The ResNet18 scheme combined with different classifiers helped to achieve better accuracy in lung disease recognition, for example with the SoftMax classifier (95.2%) and the Decision Tree classifier (99%) [21]. The concept of extracting features with deep learning and combining them with different classifiers to build disease classification models has therefore been shown to be feasible. In this study, we combined deep learning feature extraction with different classifiers to construct a fusion model, in order to explore and improve the capability to diagnose the benign or malignant nature of lung nodules (<20 mm) in the early stage.

2. Materials and Methods

2.1. Data Source

From 2015 to 2019, 396 individuals were recruited for this study from four hospitals and two open access databases, and informed consent was obtained. All patients' lung CT images were obtained in DICOM format, with a total of 934 layers involving pulmonary nodules. We used a questionnaire to collect clinician diagnoses and basic demographic information after analyzing patients' medical records and admission data. A checklist of the subjects and the images is shown in Table 1.

Table 1. The checklist of subjects and images.
| Database | Subjects (n, %) | Images (n, %) |
|---|---|---|
| Beijing Chest Hospital | 43 (10.86) | 89 (10.67) |
| Beijing Cancer Hospital | 106 (26.77) | 228 (27.37) |
| Xuanwu Hospital | 96 (24.24) | 204 (24.46) |
| Beijing Physical Examination Center | 79 (19.95) | 175 (20.98) |
| TCGA Public Database | 26 (6.57) | 51 (6.12) |
| LIDC-IDRI | 46 (11.62) | 87 (10.44) |
| Total | 396 (100.00) | 834 (100.00) |

We further analyzed whether age and gender differed between patients with benign and malignant lung nodules. No statistically significant difference was found between the two groups, as shown in Table 2.

Table 2. Comparison of clinical information between benign group and malignant group.

| Clinical information | Benign (n = 154) | Malignant (n = 242) | p |
|---|---|---|---|
| Age (years, mean ± SD) | 61.43 ± 12.38 | 68.42 ± 10.29 | 0.057 (a) |
| Gender (n, %) | | | 0.903 (b) |
| Male | 81 (0.53) | 135 (0.56) | |
| Female | 73 (0.47) | 107 (0.44) | |

(a) t-test was used for the distribution difference of continuous variables. (b) Chi-square test was used for the distribution difference of categorical variables.

2.1.1. Inclusion/Exclusion Criteria

Inclusion criteria:
• The subjects of this study should be adults (age ≥ 18 years);
• In order to ensure the integrity of the information in the lung nodule images, the number of CT images containing nodules should not be less than 2 per patient;
• A clear physician's diagnostic report was available;
• Small pulmonary nodules less than 20 mm in size for which a definitive pathologic diagnosis cannot be made.

Exclusion criteria:
• Patients treated with chemo-radiotherapy or surgery;
• Images of nodules that were difficult to segment;
• The size of the lung nodule was above 20 mm.

2.1.2. Diagnostic Criteria

For small lung nodules with a clear pathologic diagnosis, this study used the pathologic gold standard to determine the nature of the nodule. In instances when a pathologic diagnosis could not be obtained due to the small size of the lung nodule, the diagnostic report based on the clinician's a priori knowledge prevailed. The Chinese Expert Consensus on the Diagnosis and Treatment of Lung Nodules (2018 edition) contains detailed diagnostic criteria for lung nodules.

2.2. Research Design Process

In this study, CT scans of patients with small pulmonary nodules were collected from the six aforementioned sources according to the criteria above. After expert clinicians initially identified the region of interest (ROI) of each lung nodule, image preprocessing techniques such as normalization were applied. Feature extraction was then performed on the CT images of the lung nodule ROI, using ResNet50 and VGG16. The nodules were then categorized as benign or malignant using five different classifiers. The dataset was divided into two parts: the training set (80%) and the validation set (20%). Finally, the model was evaluated in terms of accuracy, AUC value, specificity, and sensitivity. The specific process is shown in Figure 1.

Figure 1. Flowchart for the design of a machine diagnostic model for benign and malignant pulmonary nodules.

2.3. Image Preprocessing

In this study, each CT image of a small lung nodule was taken as a unit of analysis. Two experienced radiologists performed semi-automatic segmentation of the whole CT image in MATLAB 2017, segmenting the region of interest (ROI) with the region growing method. As a result, one ROI image was obtained from each CT image. Each ROI image was then resized to 32 × 32.

The cropped images of small lung nodules were processed with the Adaptive Histogram Equalization (AHE) algorithm. The parameters of the AHE algorithm were set to clipLimit = 2.0 and tileGridSize = (8, 8). clipLimit controls the degree to which contrast enhancement is limited, and tileGridSize defines the equalization regions of the image. The method builds on conventional histogram equalization: the image is divided into small blocks and histogram equalization is performed within each block, while avoiding the introduction of discontinuities between blocks.

Finally, noise reduction was performed using median filtering, a filtering method based on order statistics that replaces the current pixel value with the median value in the neighborhood around the pixel. Median filtering is effective for removing salt-and-pepper noise or impulse noise while preserving edges and details.

2.4. Deep Learning Algorithm

Recognizing benign and malignant lung nodules remains a popular classification task in machine vision. In general, image recognition consists of two crucial stages: image feature extraction and feature classification.
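As a rough illustration of the median filtering step described in Section 2.3, the sketch below implements a 3 × 3 median filter in plain Python. The function name `median_filter`, the window size, the edge-replication border handling, and the toy image are all assumptions made for this example; the paper does not state which implementation or kernel size was used.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighbourhood.

    Borders are handled by clamping coordinates (edge replication),
    which is an assumption of this sketch.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    height, width = len(image), len(image[0])
    radius = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = median(window)
    return out


# Toy example 1: a flat patch with a single impulse ("salt") pixel.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter(noisy)

# Toy example 2: a vertical step edge, which the filter should keep intact.
edge = [[0, 0, 100, 100] for _ in range(4)]
edge_filtered = median_filter(edge)
```

In the first toy image the isolated bright pixel is removed, while in the second the step edge passes through unchanged, which is the edge-preserving behaviour the text attributes to median filtering.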
The goal of image feature extraction is to convert the original image data into a more expressive and identifiable feature representation. Extracting image features reduces the dimensionality of the image data, eliminates extraneous information, and retains the important image information. Deep learning algorithms trained on large-scale datasets extract high-level semantic features from images. ResNet50 and VGG16, two common examples of convolutional neural networks, have a significant advantage in visual feature extraction.

2.4.1. ResNet50

ResNet50 addresses the vanishing-gradient problem in deep neural networks by using residual connections, which allow the network to learn residual mappings [22]. These shortcut connections skip layers, which reduces the degradation seen in deep networks. The network contains 50 layers, which include convolutional, pooling, fully connected, and shortcut layers. ResNet50 is composed of a number of residual blocks with convolutional layers and shortcuts. The shortcut connections facilitate direct gradient flow [23]. The hyperparameters for extracting image features with the ResNet50 model mainly consist of two settings: weights and include_top. Setting weights to 'ImageNet' indicates that weights pre-trained on the ImageNet dataset are used to help improve the performance and generalization of the model. Setting include_top to False indicates that the top fully connected layer is not included. The hyperparameters of the ResNet50 model for extracting image features are listed in Supplementary File, Table S1.

2.4.2. VGG16

VGG16 is a convolutional neural network (CNN) architecture designed to build a deep network with a consistent structure composed of repeated convolutional layers followed by max-pooling layers for spatial downsampling. By gradually increasing the depth while keeping the filter size modest (3 × 3), the network aims to learn hierarchical representations of images [24].
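To make the role of the small 3 × 3 filters concrete, the following sketch compares weight counts for stacked 3 × 3 convolutions against a single larger filter covering the same receptive field. This is standard convolution arithmetic rather than anything taken from the paper; the helper names and the channel count of 64 are hypothetical, and biases are ignored.

```python
def conv_weights(kernel, channels_in, channels_out):
    """Number of weights in one convolutional layer, ignoring biases."""
    return kernel * kernel * channels_in * channels_out


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


C = 64  # hypothetical number of input and output channels

# Two stacked 3x3 layers see the same 5x5 region as one 5x5 layer.
rf_two_3x3 = receptive_field(3, 2)
two_3x3 = 2 * conv_weights(3, C, C)
one_5x5 = conv_weights(5, C, C)

# Three stacked 3x3 layers see the same 7x7 region as one 7x7 layer.
rf_three_3x3 = receptive_field(3, 3)
three_3x3 = 3 * conv_weights(3, C, C)
one_7x7 = conv_weights(7, C, C)

print(rf_two_3x3, two_3x3, one_5x5)
print(rf_three_3x3, three_3x3, one_7x7)
```

For equal channel counts, the stacked 3 × 3 layers need 18C² and 27C² weights, against 25C² and 49C² for the single 5 × 5 and 7 × 7 layers, while also inserting additional non-linearities between the layers.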
Compared with larger filters, the use of small filters allows for a deeper network with fewer parameters. VGG16's structure is distinguished by its depth, as the name suggests. It includes 16 layers: 13 convolutional layers and 3 fully connected layers. The convolutional layers are divided into five blocks, each with several convolutional layers followed by a max-pooling layer. The fully connected layers at the end of the network are responsible for classification [25]. The parameterization of the VGG16 model is consistent with that of ResNet50, and the model is likewise pre-trained on ImageNet. The specific settings of VGG16 are detailed in Supplementary File, Table S2.

2.5. Machine Learning Classifiers

Setting up the classifiers is a vital task in machine learning, which entails categorizing instances based on specified input data. ResNet50 and VGG16 have their own classification capabilities. However, these are frequently insufficient for categorizing fine-grained images such as CT scans of lung nodules. In this work, we used the following five approaches as classifiers on top of ResNet50 and VGG16 to build a fusion model and increase the model's classification capabilities.

2.5.1. Ensemble Voting

Ensemble Voting, a machine learning technique that integrates the predictions of numerous models to produce a final decision, was one of the methods used. It is founded on the idea that combining the predictions of many models frequently results in better overall performance than using a single model alone. Ensemble Voting is widely used in machine learning problems such as classification and regression [26]. There are different types of voting schemes, such as majority voting, weighted voting, and soft voting. In majority voting, each model in the ensemble casts a single vote for its predicted class label, and the class label with the majority of votes is chosen as the final prediction [27].
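The difference between majority (hard) voting and soft voting can be sketched in a few lines of plain Python. Soft voting, in its usual form, averages the per-class probabilities of the base models and picks the class with the highest average. The function names and the four probability vectors below are invented for illustration and are not outputs of the study's models.

```python
from collections import Counter


def hard_vote(labels):
    """Majority voting: each model casts one vote for its predicted label.

    Ties are broken in favour of the label encountered first.
    """
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities, weights=None):
    """Soft voting: (optionally weighted) average of class probabilities,
    followed by an argmax over classes."""
    weights = weights or [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged


# Hypothetical outputs of four base models for one nodule.
# Each row is [P(benign), P(malignant)]; class 0 = benign, 1 = malignant.
probs = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90], [0.52, 0.48]]
labels = [max(range(2), key=p.__getitem__) for p in probs]

hard_label = hard_vote(labels)
soft_label, soft_probs = soft_vote(probs)
```

In this toy case three models lean weakly towards benign and one is strongly confident of malignancy, so hard voting returns class 0 while soft voting returns class 1. Passing non-uniform `weights` to `soft_vote` gives a weighted variant.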
The Ensemble Voting classifier is composed of RandomForestClassifier, XGBClassifier, SVC (Support Vector Machine Classifier), and GaussianNB. Setting voting = 'soft' selects the soft voting mode: when classification is performed, the predictions of the base classifiers are converted into probability estimates for each category, and the category with the highest combined probability is taken as the final classification result.

2.5.2. Random Forest

Random Forest is based on the principle of ensemble learning, in which decision trees are trained separately on various subsets of the data. Each decision tree in a Random Forest is built with a random selection of features and a bootstrapped sample of the original data. The final prediction is formed by combining all of the individual tree predictions via voting (for classification) or averaging (for regression) [28]. This randomness helps to capture different aspects of the data and improves the overall performance of the ensemble. Random Forest reduces overfitting, has good generalization ability, and can evaluate feature importance. It is suitable for classification and regression problems and provides stable and accurate predictions. The parameter settings for the Random Forest classifier were as follows: n_estimators, 100; min_samples_leaf, 1; min_samples_split, 2; and bootstrap, True.

2.5.3. XGBoost

XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm known for its efficiency and performance in both regression and classification tasks. XGBoost builds an ensemble of decision trees sequentially, where each tree corrects the mistakes made by the previous trees. The algorithm focuses on optimizing a specific loss function while regularizing the model to prevent overfitting [29].
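For reference, the regularized objective mentioned above is usually written as follows. The notation comes from the standard XGBoost formulation rather than from this paper: $l$ is the loss between the label $y_i$ and the prediction $\hat{y}_i$, $f_k$ is the $k$-th tree, $T$ is its number of leaves, $w$ is its vector of leaf weights, $\gamma$ and $\lambda$ are regularization strengths, and $\eta$ is the learning rate.

```latex
\mathcal{L} \;=\; \sum_{i=1}^{n} l\bigl(y_i, \hat{y}_i\bigr) \;+\; \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) \;=\; \gamma T + \tfrac{1}{2}\,\lambda\,\lVert w \rVert^{2},
\qquad
\hat{y}_i^{(t)} \;=\; \hat{y}_i^{(t-1)} + \eta\, f_t(x_i)
```

The first term rewards fitting the training labels, the second penalizes complex trees, and the update rule shows how each new tree $f_t$ is added to the running prediction.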
In each iteration, XGBoost calculates a gradient based on the difference between the current model's prediction and the true value, and uses this gradient to adjust the model. Each new decision tree is constructed taking into account the prediction errors of all the previous trees and tries to correct them. This iterative process gradually reduces the error and improves the predictive performance of the model. The parameters of the XGBoost classifier were set as follows: objective, binary logistic regression; max_depth, 10; learning_rate, 0.01; and n_estimators, 100.

2.5.4. SVM

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. It can handle both linearly separable and non-linearly separable data by using different kernel functions to transform the data into a higher-dimensional space [30]. Fitting an SVM involves identifying the support vectors, which are the data points closest to the decision boundary or hyperplane. These support vectors are critical in establishing the decision boundary and making predictions. Depending on the problem at hand, an SVM can have a linear or non-linear decision boundary, which is determined by selecting an appropriate kernel function. The SVM classifier (SVC) parameters were set as follows: regularization parameter C, 1.0; break_ties, False; cache_size, 200; degree, 3; and kernel, rbf (radial basis function).

2.5.5. Naïve Bayes

Naïve Bayes is a simple yet powerful machine learning algorithm based on Bayesian probability. The principle behind Naïve Bayes is to use Bayes' theorem to estimate the probability of a specific class given the observed features [31]. Given the input features, it estimates the conditional probability of each class and chooses the class with the highest probability as the predicted class.
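The prediction rule just described can be sketched as a minimal Gaussian Naïve Bayes classifier in plain Python. This is an illustrative sketch, not the study's implementation: the function names and the two-feature toy data are hypothetical, and the variance floor of 1e-9 follows scikit-learn's documented default for var_smoothing, which is assumed here.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _var(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class priors and per-feature Gaussian parameters."""
    # Variance floor proportional to the largest feature variance (assumption).
    eps = var_smoothing * max(_var(col) for col in zip(*X))
    model = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [_mean(c) for c in cols],
            "var": [_var(c) + eps for c in cols],
        }
    return model


def predict_proba(model, x):
    """Posterior P(class | x) under the conditional-independence assumption."""
    log_post = {}
    for label, params in model.items():
        lp = math.log(params["prior"])
        for value, mu, var in zip(x, params["mean"], params["var"]):
            lp += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        log_post[label] = lp
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


def predict(model, x):
    proba = predict_proba(model, x)
    return max(proba, key=proba.get)


# Hypothetical two-feature training data, invented for this example.
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.7]]
y = ["benign"] * 3 + ["malignant"] * 3
model = fit_gaussian_nb(X, y)
pred_a = predict(model, [1.1, 1.9])
pred_b = predict(model, [3.1, 0.5])
proba_a = predict_proba(model, [1.1, 1.9])
```

Working in log space and normalizing at the end avoids underflow when many features are multiplied together, which matters once the feature vector has hundreds of dimensions.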
The Naïve Bayes structure entails creating a probabilistic model based on the training data. It calculates the prior probability of each class as well as the probability of observing each feature given each class, under the assumption that the features are conditionally independent given the class. This assumption simplifies probability computation and enables efficient training and prediction. The GaussianNB parameters were set as follows: priors, None; var_smoothing, 1 × 10⁻⁹.

2.6. Feature Visualization

The feature information extracted by ResNet50 and VGG16 was displayed in this study using t-SNE and feature ranking algorithms. t-SNE reduced the dimensionality of the global features from 256 to 2, allowing the features to be shown on a two-dimensional scatter plot. Each data point on the plot represented a sample, and examining their spatial arrangement revealed information about the samples' grouping, closeness, or dispersion depending on their learned attributes. This visual analysis proved useful in determining the features' discriminative strength and separability.

2.7. Statistical Analysis

The statistical descriptions of patient information are presented as the mean and standard deviation (SD) or as percentages; R 4.0.3 software was used to perform the χ² test or t-test for the basic clinical data of patients and images. Differences were considered statistically significant at p < 0.05. Because deleting observations containing missing values would result in loss of data and may affect the accuracy and reliability of subsequent analyses or modeling, the Random Forest method was chosen to fill in the missing values of the morphological characteristics. To evaluate the classification performance in the validation set, accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score were calculated. In the present study, the threshold for sensitivity and specificity values was set to 0.5.
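The soft voting rule used by the Ensemble Voting classifier and the threshold-based metrics listed above can be illustrated with a minimal, library-free sketch. The function names (`soft_vote`, `binary_metrics`), the toy probabilities, and the convention that label 1 means malignant are assumptions made for illustration; this is not the study's code or data.

```python
from statistics import mean


def soft_vote(prob_per_classifier):
    """Average P(malignant) across base classifiers, sample by sample.

    prob_per_classifier: one inner list per classifier, each holding that
    classifier's P(malignant) for every sample (uniform weights assumed).
    """
    return [mean(sample_probs) for sample_probs in zip(*prob_per_classifier)]


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Threshold the probabilities and derive the metrics named in the text."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else float("nan")

    sensitivity = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(y_true)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * ppv * sensitivity, ppv + sensitivity),
    }


# Toy example: four hypothetical base classifiers scoring six samples.
probs = [
    [0.9, 0.8, 0.3, 0.2, 0.6, 0.4],
    [0.8, 0.6, 0.4, 0.1, 0.7, 0.7],
    [0.7, 0.9, 0.2, 0.3, 0.4, 0.6],
    [0.6, 0.7, 0.5, 0.2, 0.5, 0.5],
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = malignant, 0 = benign (assumed coding)

ensemble_prob = soft_vote(probs)
results = binary_metrics(labels, ensemble_prob)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

With uniform weights, this averaging is the same rule that scikit-learn's `VotingClassifier(voting='soft')` applies to the `predict_proba` outputs of its base estimators.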
This threshold was chosen because correctly identifying benign and malignant lung nodules was considered equally important, regardless of the category to which a nodule belonged. Mean Absolute Error (MAE) is presented to evaluate the average distance between the model predictions and the true values of the samples. Receiver operating characteristic (ROC) curves were plotted to visually compare the differences between the models.

3. Results

3.1. Combined Machine Learning Models

In this study, we used ResNet50 and VGG16 as feature extractors, which were combined with Ensemble Voting, XGBoost, Random Forest, SVM, and Naïve Bayes classifiers to form a new machine learning classification process. A total of 2048 features were extracted using the last convolutional layer (layer 6) of ResNet50, while a total of 512 features were extracted using the last convolutional layer (layer 5) of VGG16. Feature filtering was performed through the XGBoost process, and 233 and 213 features that favored the model's classification ability were finally retained, respectively. As shown in Table 3, ResNet50-Ensemble Voting achieved the best performance with an accuracy of 0.943 (0.938, 0.948) and sensitivity and specificity of 0.964 and 0.911, respectively. This was not only higher than that of the ResNet50 deep learning model, but also better than those of the comparative models with improved classifiers such as ResNet50-XGBoost. From a global perspective, the classification levels of the fusion models with ResNet50 as the feature extractor were significantly superior to those with VGG16. Then, the features that passed screening were classified. The best AUC value achieved (ResNet50-SVM) was 0.91. In the ROC curve, each of the operating points was optimized, which also indicates the comprehensive standard of the method. The ROC curves of the classification models are plotted in Figure 2.

Table 3. Classification and diagnosis of diabetic nephropathy based on the migration model.

Models                     Accuracy            Sensitivity  Specificity  PPV   NPV   AUC   MAE   F1-Score
ResNet50                   0.75 (0.73, 0.77)   0.82         0.66         0.78  0.71  0.81  0.27  0.80
VGG16                      0.61 (0.59, 0.63)   0.37         0.90         0.82  0.54  0.61  0.40  0.51
VGG16-Ensemble Voting      0.88 (0.88, 0.89)   0.95         0.78         0.74  0.54  0.77  0.11  0.91
VGG16-XGBoost              0.74 (0.72, 0.77)   0.86         0.57         0.73  0.68  0.76  0.25  0.80
VGG16-Random Forest        0.73 (0.71, 0.75)   0.89         0.49         0.72  0.74  0.79  0.27  0.80
VGG16-SVM                  0.72 (0.70, 0.75)   0.90         0.46         0.72  0.76  0.78  0.27  0.80
VGG16-Naïve Bayes          0.63 (0.61, 0.66)   0.69         0.54         0.69  0.55  0.66  0.37  0.69
ResNet50-Ensemble Voting   0.94 (0.93, 0.94)   0.96         0.91         0.85  0.63  0.88  0.06  0.95
ResNet50-XGBoost           0.82 (0.80, 0.83)   0.89         0.70         0.82  0.81  0.90  0.18  0.86
ResNet50-Random Forest     0.82 (0.80, 0.84)   0.92         0.66         0.81  0.86  0.89  0.19  0.86
ResNet50-SVM               0.83 (0.82, 0.85)   0.93         0.69         0.82  0.86  0.91  0.17  0.87
ResNet50-Naïve Bayes       0.71 (0.69, 0.73)   0.75         0.66         0.77  0.63  0.75  0.29  0.76

Figure 2. The ROC curves of different combinations of classification models in the test set. (a) Features extracted by ResNet50. (b) Features extracted by VGG16.

3.2. Feature Visualization

In this study, feature filtering was performed through the XGBoost process, and 233 and 213 features that favored the model's classification ability were finally retained, keeping all features with an importance over 0.001. To further explore the image feature extraction results for small lung nodules, we performed further visualization of specific lung nodules. The t-SNE results demonstrate the variability in the extraction results of ResNet50 and VGG16. As shown in Figure 3, the distinction between benign and malignant lung nodules was not clearly discernible from the features, but the labeled region suggested that the ResNet50 model retrieved more differentiated regions, implying that the final classification result of this model was likewise superior.

Figure 3. Differential feature visualization of small lung nodules. (a) Features extracted by ResNet50. (b) Features extracted by VGG16. The black box is the area of differential feature clustering.

To further demonstrate the feature extraction differences between the two methods, we categorized and presented the differential feature locations using the Identity Mapping method. Through Figure 4, we discovered that there was little differentiation between the two methods of extracting features as a whole; however, for small malignant lung nodules, ResNet50 discovered more diversified features, demonstrating the efficiency of the feature extraction strategy. However, the less relevant particular aspects were considered.

Figure 4. Identity Mapping visualization of the effectiveness of the learned features. (a) Features extracted by ResNet50. (b) Features extracted by VGG16.
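The importance-based screening described in this section (keep every feature whose importance exceeds 0.001, then rank what remains) can be sketched without any library. The importance values below are invented placeholders and `select_features` is a hypothetical helper; in practice the scores would come from a fitted gradient-boosting model, and only the 0.001 cut-off is taken from the text.

```python
def select_features(importances, threshold=0.001, top_k=None):
    """Return (index, importance) pairs above threshold, most important first.

    importances: sequence where position i holds the importance of feature i.
    top_k: optionally keep only the k highest-ranked features.
    """
    kept = [(i, imp) for i, imp in enumerate(importances) if imp > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if top_k is None else kept[:top_k]


# Placeholder scores for eight features (not values from the study).
toy_importances = [0.0, 0.0125, 0.0004, 0.2310, 0.0010, 0.0470, 0.0, 0.0031]

selected = select_features(toy_importances)
top_three = select_features(toy_importances, top_k=3)

print("kept feature indices:", [i for i, _ in selected])
print("top three:", top_three)
```

The comparison is strict (`>`), so a feature whose importance is exactly 0.001 is dropped, matching the "over 0.001" wording; a non-strict cut-off would be an equally plausible reading if the source intended "at least".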
Based on our findings, we ordered the features from most important to least important and selected the top 20 most important features extracted by the ResNet50 and VGG16 algorithms. Among the ResNet50 results, Feature 867, Feature 869, and Feature 438 were determined to be the most important. In the case of VGG16, Feature 228, Feature 277, and Feature 439 were selected as the most important features, in that order. The results are shown in Figure 5.

Figure 5. Feature importance ranking for feature screening via XGBoost. (a) Features extracted by ResNet50 and (b) features extracted by VGG16.

4. Discussion

Accurate evaluation of the benign and malignant nature of small lung nodules (<20 mm) detected in CT is essential for the early diagnosis and management of lung cancer, and it has remained a challenging undertaking in clinical practice [32]. In this study, we developed and validated a classification diagnostic model combining deep learning and machine learning to distinguish between benign and malignant early lung nodules using CT images of small lung nodules from six different databases. Our results demonstrated that our proposed method, ResNet50-Ensemble Voting, achieved superior performance, reaching an accuracy of 0.943 (0.938, 0.948) along with a sensitivity of 0.964 and specificity of 0.911. In addition, ResNet50-SVM achieved an AUC of 0.91, with an accuracy of 0.83 (0.82, 0.85).
In the ROC curve, each of the operating points was optimized, which also indicates the comprehensive standard of the method. This result showed the competence of the model in diagnosing the benign and malignant nature of small lung nodules in the validation set. This study further demonstrated the feature extraction capability of ResNet50 and VGG16, visualized the features, and compared the performance of the combined models in diagnosing the benign and malignant nature of lung nodules. The early detection and identification of lung nodules are particularly critical and challenging, especially when the gold standard of pathological tissue is not available. In this context, the ResNet50-SVM and ResNet50-XGBoost models developed in this study made significant contributions by selecting the best combination of feature extraction and classifiers. This could improve the diagnostic capabilities for small lung nodules and reduce the misdiagnosis and missed diagnosis rates among clinicians. Ultimately, it provides clearer diagnostic guidance for patients in the early stages of lung cancer.

In recent years, ResNet50 and VGG16 have been applied as the most classical CNN models for the diagnosis and recognition of diseases [22,33,34]. ResNet50 is deeper than traditional deep networks, which allows it to better capture details and semantic information in images [35]. VGG16, on the other hand, is able to capture features at different scales by stacking multiple small convolutional kernels and pooling layers to increase the nonlinear expressiveness of the network [36]. Both methods have demonstrated competence in the diagnosis of the nature of pulmonary nodules.

Numerous studies have used residual networks to classify lung cancer. One study reported a multi-level cross residual network for lung nodule classification that reached an accuracy of 85.88% [37].
Zhang used ResNet as the basic framework combined with CBAM to classify conventional lung nodules, and the AUC reached more than 0.95 [38]. Xie used knowledge-based collaborative deep learning to classify benign and malignant lung nodules on chest CT, with an accuracy of up to 95.70% [39]. Wang built a multi-scale residual network (MResNet) to accurately extract the features of lung nodules and classified them in conjunction with deep learning, achieving an accuracy of 99.12% [40]. This shows that the research on regular lung nodules is well established, but the diagnosis of small, early lung nodules needs to be further clarified. In addition, consideration and improvements should be made to the related research methods.

The current study focused more on the nature of small lung nodules in the early stages of lung cancer. Size and growth are crucial factors in evaluating the malignant potential of a nodule. The likelihood of malignancy is positively correlated with nodule diameter, and the importance of morphology in CT images should not be underestimated [41]. Farjah primarily examined the relationship between lung cancer diagnosis and nodule features using multivariate analysis, and the resulting model had an AUC of 0.75, indicating that detection capacity needed to be improved further [42]. Choi classified the early imaging features of lung cancer in low-dose CT lung nodules with an AUC value of 0.89. Although that study targeted nodules in the early stages of lung cancer, it did not account for the specific size of the nodules [43]. DNA promoter hypermethylation was found to be diagnostic for early-stage lung cancer, and specific markers such as SOX17, TAC1, and HOXA7 were shown to be diagnostic at an AUC of 0.89 [44].
Tumor necrosis factor receptor-associated protein 1 (TRAP1) was also of significance in the diagnostic process of lung nodules in the early stages of lung cancer, with an AUC value of approximately 0.835 [45]. Relevant biomarkers, despite displaying good performance, cannot readily be applied to screening in broad populations because of their high cost. On this premise, the current findings enhanced the diagnosis of early lung nodules. This study additionally presented the features from various viewpoints and attempted to investigate the contribution of individual features. Not only is our proposed method noninvasive, but its cost is also readily acceptable compared to biomarkers.

Nonetheless, our study had several limitations. First and foremost, because this was a study of small lung nodules, the gold standard could not be achieved. The aim was just to bring the method as close to the physician's diagnostic level as possible. Second, the model was not combined and compared with radiologic features. Although both are related to imaging, the method proposed in this study cannot directly provide clinical indications such as the burr sign. Finally, one shortcoming of the technique was that it required a large number of precisely labeled samples, making data collection a greater challenge.

5. Conclusions

In conclusion, the combined machine learning model ResNet50-Ensemble Voting showed remarkable performance in the identification of benign and malignant small pulmonary nodules (<20 mm) from multiple centers. The combined feature visualization process further clarifies the variability in different features. The model can help clinicians accurately diagnose the nature of early-stage small lung nodules in clinical practice.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15225417/s1, Table S1: The ResNet50 model was pre-trained with ImageNet to extract hyper-parameter information of image features; Table S2: The VGG16 model was pre-trained with ImageNet to extract hyper-parameter information of image features.

Author Contributions: Conceptualization, W.L., S.Y. and F.Z.; data curation, T.Z., H.L., D.J., Y.G. and Q.L.; formal analysis, S.Y., R.Y., Y.T. and L.T.; funding acquisition, X.G.; investigation, Y.T., T.Z. and D.J.; methodology, W.L. and F.Z.; project administration, J.Z. and X.G.; resources, T.Z., H.L., Y.G. and Q.L.; software, R.Y. and X.L.; supervision, X.L., J.Z. and X.G.; validation, X.L. and L.T.; visualization, W.L.; writing – original draft, W.L.; writing – review and editing, X.L. and L.T. All authors have read and agreed to the published version of the manuscript.

Funding: This study was supported by the National Natural Science Foundation of China (grant numbers 82173617 and 82373683) and Beijing Medical Science and Technology Promotion Center (grant number KCZX-KT-002).

Institutional Review Board Statement: This study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of the Beijing Physical Examination Center (protocol code: 002, approval date: 24 March 2022).

Informed Consent Statement: Considering the privacy of patient data, the section is not applicable.

Data Availability Statement: The data presented in this study are available in this article (and Supplementary Materials).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Lung Cancer Screening Considerations During Respiratory Infection Outbreaks, Epidemics or Pandemics: An International Association for the Study of Lung Cancer Early Detection and Screening Committee Report – ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S1556086421033268 (accessed on 25 August 2023).
2. Zeng, H.; Chen, W.; Zheng, R.; Zhang, S.; Ji, J.S.; Zou, X.; Xia, C.; Sun, K.; Yang, Z.; Li, H.; et al. Changing Cancer Survival in China during 2003–15: A Pooled Analysis of 17 Population-Based Cancer Registries. Lancet Glob. Health 2018, 6, e555–e567.
3. Wang, L.; Zhang, M.; Pan, X.; Zhao, M.; Huang, L.; Hu, X.; Wang, X.; Qiao, L.; Guo, Q.; Xu, W.; et al. Integrative Serum Metabolic Fingerprints Based Multi-Modal Platforms for Lung Adenocarcinoma Early Detection and Pulmonary Nodule Classification. Adv. Sci. 2022, 9, 2203786.
4. Eberhardt, R.; Ernst, A.; Herth, F.J.F. Ultrasound-Guided Transbronchial Biopsy of Solitary Pulmonary Nodules Less than 20 Mm. Eur. Respir. J. 2009, 34, 1284–1287.
5. An Assisted Diagnosis System for Detection of Early Pulmonary Nodule in Computed Tomography Images|SpringerLink. Available online: https://link.springer.com/article/10.1007/s10916-016-0669-0?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 25 August 2023).
6. Management of Small Lung Nodules in the Era of Lung Cancer Screening|Lung Cancer|JAMA Surgery|JAMA Network. Available online: https://jamanetwork.com/journals/jamasurgery/fullarticle/2719456 (accessed on 26 August 2023).
7. Huang, P.; Park, S.; Yan, R.; Lee, J.; Chu, L.C.; Lin, C.T.; Hussien, A.; Rathmell, J.; Thomas, B.; Chen, C.; et al. Added Value of Computer-Aided CT Image Features for Early Lung Cancer Diagnosis with Small Pulmonary Nodules: A Matched Case-Control Study. Radiology 2018, 286, 286–295.
8. Kaliyugarasan, S.; Lundervold, A.; Lundervold, A.S. Pulmonary Nodule Classification in Lung Cancer from 3D Thoracic CT Scans Using Fastai and MONAI. Int. J. Interact. Multimed. Artif. Intell. 2021, 6, 83.
9. Zhao, X.; Liu, L.; Qi, S.; Teng, Y.; Li, J.; Qian, W. Agile Convolutional Neural Network for Pulmonary Nodule Classification Using CT Images. Int. J. Comput. Ass. Rad. 2018, 13, 585–595.
10. Cao, K.; Tao, H.; Wang, Z.; Jin, X. MSM-ViT: A Multi-Scale MobileViT for Pulmonary Nodule Classification Using CT Images. J. X-ray Sci. Technol. 2023, 31, 731–744.
11. Mkindu, H.; Wu, L.; Zhao, Y. Lung Nodule Detection of CT Images Based on Combining 3D-CNN and Squeeze-and-Excitation Networks. Multimed. Tools Appl. 2023, 82, 25747–25760.
12. Mkindu, H.; Wu, L.; Zhao, Y. Lung Nodule Detection in Chest CT Images Based on Vision Transformer Network with Bayesian Optimization. Biomed. Signal Process. Control 2023, 85, 104866.
13. Howard, B.A.; Morgan, R.; Thorpe, M.P.; Turkington, T.G.; Oldan, J.; James, O.G.; Borges-Neto, S. Comparison of Bayesian Penalized Likelihood Reconstruction versus OS-EM for Characterization of Small Pulmonary Nodules in Oncologic PET/CT. Ann. Nucl. Med. 2017, 31, 623–628.
14. Incremental Benefit of Maximum-Intensity-Projection Images on Observer Detection of Small Pulmonary Nodules Revealed by Multidetector CT|AJR. Available online: https://www.ajronline.org/doi/10.2214/ajr.179.1.1790149 (accessed on 25 August 2023).
15. Chae, K.J.; Jin, G.Y.; Ko, S.B.; Wang, Y.; Zhang, H.; Choi, E.J.; Choi, H. Deep Learning for the Classification of Small (≤2 cm) Pulmonary Nodules on CT Imaging: A Preliminary Study. Acad. Radiol. 2020, 27, e55–e63.
16. Mei, M.; Ye, Z.; Zha, Y. An Integrated Convolutional Neural Network for Classifying Small Pulmonary Solid Nodules. Front. Neurosci. 2023, 17, 1152222.
17. Liu, R.-S.; Ye, J.; Yu, Y.; Yang, Z.-Y.; Lin, J.-L.; Li, X.-D.; Qin, T.-S.; Tao, D.-P.; Song, W.; Wang, G.; et al. The Predictive Accuracy of CT Radiomics Combined with Machine Learning in Predicting the Invasiveness of Small Nodular Lung Adenocarcinoma. Transl. Lung Cancer Res. 2023, 12, 530–546.
18. Guan, X.; Du, Y.; Ma, R.; Teng, N.; Ou, S.; Zhao, H.; Li, X. Construction of the XGBoost Model for Early Lung Cancer Prediction Based on Metabolic Indices. BMC Med. Inform. Decis. Mak. 2023, 23, 107.
19. Jain, S. Computer-Aided Detection System for the Classification of Non-Small Cell Lung Lesions Using SVM. Curr. Comput.-Aided Drug Des. 2021, 16, 833–840.
20. Srivastava, V.; Gupta, S.; Chaudhary, G.; Balodi, A.; Khari, M.; García-Díaz, V. An Enhanced Texture-Based Feature Extraction Approach for Classification of Biomedical Images of CT-Scan of Lungs. Int. J. Interact. Multimed. Artif. Intell. 2021, 6, 18.
21. Rajinikanth, V.; Kadry, S.; Moreno-Ger, P. ResNet18 Supported Inspection of Tuberculosis in Chest Radiographs with Integrated Deep, LBP, and DWT Features. Int. J. Interact. Multimed. Artif. Intell. 2023, 8, 38.
22. Sharma, A.K.; Nandal, A.; Dhaka, A.; Koundal, D.; Bogatinoska, D.C.; Alyami, H. Enhanced Watershed Segmentation Algorithm-Based Modified ResNet50 Model for Brain Tumor Detection. BioMed Res. Int. 2022, 2022, 7348344.
23. Hossain, M.d.B.; Iqbal, S.M.H.S.; Islam, M.d.M.; Akhtar, M.d.N.; Sarker, I.H. Transfer Learning with Fine-Tuned Deep CNN ResNet50 Model for Classifying COVID-19 from Chest X-ray Images. Inform. Med. Unlocked 2022, 30, 100916.
24. A New Model Based on Improved VGG16 for Corn Weed Identification, Frontiers in Plant Science – X-MOL. Available online: https://www.x-mol.com/paper/1677428630847471616?adv (accessed on 29 August 2023).
25. Circuit Manufacturing Defect Detection Using VGG16 Convolutional Neural Networks. Available online: https://www.hindawi.com/journals/wcmc/2022/1070405/ (accessed on 29 August 2023).
26. Advanced Defensive Distillation with Ensemble Voting and Noisy Logits|SpringerLink. Available online: https://link.springer.com/article/10.1007/s10489-022-03495-3?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 29 August 2023).
27. Shehab, M.A.; Kahraman, N. A Weighted Voting Ensemble of Efficient Regularized Extreme Learning Machine. Comput. Electr. Eng. 2020, 85, 106639.
28. Mantas, C.J.; Castellano, J.G.; Moral-García, S.; Abellán, J. A Comparison of Random Forest Based Algorithms: Random Credal Random Forest versus Oblique Random Forest. Soft Comput. 2019, 23, 10739–10754.
29. Li, J.; An, X.; Li, Q.; Wang, C.; Yu, H.; Zhou, X.; Geng, Y. Application of XGBoost Algorithm in the Optimization of Pollutant Concentration. Atmos. Res. 2022, 276, 106238.
30. Ding, S.; Shi, Z.; Tao, D.; An, B. Recent Advances in Support Vector Machines. Neurocomputing 2016, 211, 1–3.
31. Redivo, E.; Viroli, C.; Farcomeni, A. Quantile-Distribution Functions and Their Use for Classification, with Application to Naïve Bayes Classifiers. Statist. Comput. 2023, 33, 55.
32. Kadara, H.; Tran, L.M.; Liu, B.; Vachani, A.; Li, S.; Sinjab, A.; Zhou, X.J.; Dubinett, S.M.; Krysan, K. Early Diagnosis and Screening for Lung Cancer. Cold Spring Harb. Perspect. Med. 2021, 11, a037994.
33. Huang, H.; You, Z.; Cai, H.; Xu, J.; Lin, D. Fast Detection Method for Prostate Cancer Cells Based on an Integrated ResNet50 and YoloV5 Framework. Comput. Methods Programs Biomed. 2022, 226, 107184.
34. Alshammari, A. Construction of VGG16 Convolution Neural Network (VGG16_CNN) Classifier with NestNet-Based Segmentation Paradigm for Brain Metastasis Classification. Sensors 2022, 22, 8076.
35. A Method for Detecting the Quality of Cotton Seeds Based on an Improved ResNet50 Model. Available online: https://pubmed.ncbi.nlm.nih.gov/36791128/ (accessed on 26 August 2023).
36. VGG16 Feature Extractor with Extreme Gradient Boost Classifier for Pancreas Cancer Prediction. Available online: https://pubmed.ncbi.nlm.nih.gov/37504815/ (accessed on 26 August 2023).
37. Lyu, J.; Bi, X.; Ling, S.H. Multi-Level Cross Residual Network for Lung Nodule Classification. Sensors 2020, 20, 2837.
38. Deep-Learning Model of ResNet Combined with CBAM for Malignant-Benign Pulmonary Nodules Classification on Computed Tomography Images. Available online: https://pubmed.ncbi.nlm.nih.gov/37374292/ (accessed on 26 August 2023).
39. Xie, Y.; Xia, Y.; Zhang, J.; Song, Y.; Feng, D.; Fulham, M.; Cai, W. Knowledge-Based Collaborative Deep Learning for Benign-Malignant Lung Nodule Classification on Chest CT. IEEE Trans. Med. Imaging 2019, 38, 991–1004.
40. Wang, H.; Zhu, H.; Ding, L.; Yang, K. A Diagnostic Classification of Lung Nodules Using Multiple-Scale Residual Network. Sci. Rep. 2023, 13, 11322.
41. Evaluation of the Solitary Pulmonary Nodule: Size Matters, but Do Not Ignore the Power of Morphology | Insights into Imaging. Available online: https://link.springer.com/article/10.1007/s13244-017-0581-2?utm_source=xmol&utm_medium=affiliate&utm_content=meta&utm_campaign=DDCN_1_GL01_metadata (accessed on 26 August 2023).
42. Patient and Nodule Characteristics Associated with a Lung Cancer Diagnosis Among Individuals with Incidentally Detected Lung Nodules – ScienceDirect. Available online: https://www.sciencedirect.com/science/article/pii/S0012369222039009 (accessed on 26 August 2023).
43. Choi, W.; Oh, J.H.; Riyahi, S.; Liu, C.-J.; Jiang, F.; Chen, W.; White, C.; Rimner, A.; Mechalakos, J.G.; Deasy, J.O.; et al. Radiomics Analysis of Pulmonary Nodules in Low-Dose CT for Early Detection of Lung Cancer. Med. Phys. 2018, 45, 1537–1549.
44. Early Detection of Lung Cancer Using DNA Promoter Hypermethylation in Plasma and Sputum|Clinical Cancer Research|American Association for Cancer Research. Available online: https://aacrjournals.org/clincancerres/article/23/8/1998/123278/Early-Detection-of-Lung-Cancer-Using-DNA-Promoter (accessed on 26 August 2023).
45. Li, X.; Li, X.; Chen, S.; Wu, Y.; Liu, Y.; Hu, T.; Huang, J.; Yu, J.; Pei, Z.; Zeng, T.; et al. TRAP1 Shows Clinical Significance in the Early Diagnosis of Small Cell Lung Cancer. J. Inflamm. Res. 2021, 14, 2507–2514.

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Journal

Cancers – Multidisciplinary Digital Publishing Institute

Published: Nov 15, 2023

Keywords: ResNet50; ensemble voting; XGBoost; small pulmonary nodules; pulmonary cancer
