PLOS ONE: [sortOrder=DATE_NEWEST_FIRST, sort=Date, newest first, q=subject:"Decision analysis"]
PLOS
https://journals.plos.org/plosone/
webmaster@plos.org
accelerating the publication of peer-reviewed science
https://journals.plos.org/plosone/search/feed/atom?sortOrder=DATE_NEWEST_FIRST&unformattedQuery=subject:%22Decision+analysis%22&sort=Date,+newest+first
All PLOS articles are Open Access.
https://journals.plos.org/plosone/resource/img/favicon.ico
2024-03-28T14:29:02Z

Random forest model in tax risk identification of real estate enterprise income tax
10.1371/journal.pone.0300928
2024-03-26T14:00:00Z
<p>by Chunmei Xu, Yan Kong</p>
The text describes improvements made to the random forest model to enhance its distinctiveness in addressing tax risks within the real estate industry, thereby tackling issues related to tax losses. Firstly, the paper introduces the potential application of the random forest model in identifying tax risks. Subsequently, the experimental analysis focuses on the selection of indicators for tax risk. Finally, the paper develops and utilizes actual taxpayer data to test a risk identification model, confirming its effectiveness. The experimental results indicate that the model’s output report includes basic taxpayer information, a summary of tax compliance risks, value-added tax refund situations, directions of suspicious items, and detailed information on common indicators. This paper comprehensively presents detailed taxpayer data, providing an intuitive understanding of tax-related risks. Additionally, the paper reveals the level of enterprise risk registration assessment, risk probability, risk value, and risk assessment ranking. Further analysis shows that enterprise risk points primarily exist in operating income, selling expenses, financial expenses, and total profit. Additionally, the results indicate significant differences between the model’s judgment values and declared values, especially in the high-risk probability of total operating income and profit. This implies a significant underreporting issue concerning corporate income tax for real estate enterprises. Therefore, this paper contributes to enhancing the identification of tax risks for real estate enterprises. 
Using the optimized random forest model makes it possible to accurately assess enterprises’ tax compliance risks and identify specific risk points.

Color polymorphism and mating trends in a population of the alpine leaf beetle <i>Oreina gloriosa</i>
10.1371/journal.pone.0298330
2024-03-26T14:00:00Z
<p>by Angela Roggero, Daniele Alù, Alex Laini, Antonio Rolando, Claudia Palestrini</p>
The bright colors of Alpine leaf beetles (Coleoptera, Chrysomelidae) are thought to act as aposematic signals against predation. Within the European Alps, at least six species display a basal color of either blue or green, likely configuring a classic case of müllerian mimicry. In this context, intra-population color polymorphism is paradoxical as the existence of numerous color morphs might hamper the establishment of a search image in visual predators. Assortative mating may be one of the main factors contributing to the maintenance of polymorphic populations. Due to the marked iridescence of these leaf beetles, the perceived color may change as the viewing or illumination angle changes. The present study, conducted over three years, involved intensive sampling of a population of <i>Oreina gloriosa</i> from the Italian Alps and applied colorimetry and a decision tree method to identify the color morphs in an objective manner. The tertiary sex ratio of the population was biased in favor of males, suggesting that viviparous females hide to give birth. Seven color morphs were identified, and their frequencies varied significantly over the course of the study. Three different analyses of mating (JMating, QInfomating, and Montecarlo simulations) recognized a general trend for random mating which coexists with some instances of positive and negative assortative mating. 
This could help explain the pre-eminence of one morph (which would be favored because of positive selection due to positive assortative mating) in parallel with the persistence of six other morphs (maintained due to negative assortative mating).

Machine learning-based prediction of rheumatoid arthritis with development of ACPA autoantibodies in the presence of non-HLA genes polymorphisms
10.1371/journal.pone.0300717
2024-03-22T14:00:00Z
<p>by Grzegorz Dudek, Sebastian Sakowski, Olga Brzezińska, Joanna Sarnik, Tomasz Budlewski, Grzegorz Dragan, Marta Poplawska, Tomasz Poplawski, Michał Bijak, Joanna Makowska</p>
Machine learning (ML) algorithms can handle complex genomic data and identify predictive patterns that may not be apparent through traditional statistical methods. They have become popular tools for medical applications, including the prediction, diagnosis, and treatment of complex diseases such as rheumatoid arthritis (RA). RA is an autoimmune disease in which genetic factors play a major role. Among the most important genetic factors predisposing to the development of this disease and serving as genetic markers are single nucleotide polymorphisms (SNPs) in HLA-DRB and non-HLA genes. Another marker of RA is the presence of anticitrullinated peptide antibodies (ACPA), which is correlated with the severity of RA. We use genetic data on SNPs in four non-HLA genes (PTPN22, STAT4, TRAF1, CD40 and PADI4) to predict the occurrence of ACPA-positive RA in the Polish population. This work is a comprehensive comparative analysis in which we assess and compare various ML classifiers. Our evaluation encompasses a range of models, including logistic regression, <i>k</i>-nearest neighbors, naïve Bayes, decision tree, boosted trees, multilayer perceptron, and support vector machines. The top-performing models demonstrated closely matched levels of accuracy, each distinguished by its particular strengths. Among these, we recommend the decision tree as the foremost choice, given its performance and interpretability. The sensitivity and specificity of the ML models are about 70%, which is satisfactory. In addition, we introduce a novel feature importance estimation method characterized by its transparent interpretability and global optimality. This method allows us to explore all possible combinations of polymorphisms and to pinpoint those with the highest predictive power. 
Taken together, these findings suggest that non-HLA SNPs make it possible to identify the group of individuals more prone to developing rheumatoid arthritis and to implement more precise preventive approaches.

Analysis of dermoscopy images of multi-class for early detection of skin lesions by hybrid systems based on integrating features of CNN models
10.1371/journal.pone.0298305
2024-03-21T14:00:00Z
<p>by Mohammed Alshahrani, Mohammed Al-Jabbar, Ebrahim Mohammed Senan, Ibrahim Abdulrab Ahmed, Jamil Abdulhamid Mohammed Saif</p>
Skin cancer is one of the most dangerous skin lesions and can be fatal if not detected in its early stages. The characteristics of different skin lesions are similar in many of their early stages. The use of AI to categorize diverse types of skin lesions helps dermatologists preserve patients’ lives. This study introduces a novel approach that combines Convolutional Neural Network (CNN) models, which extract intricate features from dermoscopy images, with Random Forest (RF) and Feed Forward Neural Network (FFNN) classifiers, leading to hybrid systems with superior capabilities for the early detection of all types of skin lesions. By integrating multiple CNN features, the proposed methods aim to improve the robustness and discriminatory capabilities of the AI system. The dermoscopy images of the ISIC 2019 dataset were optimized. Then, the area of the lesions was segmented and isolated from the rest of the image by a Gradient Vector Flow (GVF) algorithm. The first strategy for dermoscopy image analysis for early diagnosis of skin lesions uses CNN-RF and CNN-FFNN hybrid models: CNN models (DenseNet121, MobileNet, and VGG19) receive a region of interest (skin lesions) and produce highly representative feature maps for each lesion. The second strategy analyzes the area of skin lesions and diagnoses their type by means of CNN-RF and CNN-FFNN hybrid models based on the features of the combined CNN models. Hybrid models based on combined CNN features have achieved promising results for diagnosing dermoscopy images of the ISIC 2019 dataset and distinguishing skin cancers from other skin lesions. 
The DenseNet121-MobileNet-RF hybrid model achieved an AUC of 95.7%, an accuracy of 97.7%, a precision of 93.65%, a sensitivity of 91.93%, and a specificity of 99.49%.

Short-term power load forecasting method based on Bagging-stochastic configuration networks
10.1371/journal.pone.0300229
2024-03-19T14:00:00Z
<p>by Xinfu Pang, Wei Sun, Haibo Li, Wei Liu, Changfeng Luan</p>
Accurate short-term load forecasting is of great significance in improving the dispatching efficiency of power grids, ensuring the safe and reliable operation of power grids, and guiding power systems to formulate reasonable production plans and reduce waste of resources. However, the traditional short-term load forecasting method has limited nonlinear mapping ability and weak generalization ability to unknown data, and it is prone to the loss of time series information, which suggests that its forecasting accuracy can still be improved. This study presents a short-term power load forecasting method based on Bagging-stochastic configuration networks (SCNs). First, the missing values in the original data are filled with the average values. Second, the influencing factors, such as the weather- and week-type data, are coded. Then, combined with the coded data of influencing factors, the Bagging-SCNs integration algorithm is used to predict the short-term load. Finally, taking the daily load data of Quanzhou City, Zhejiang Province as an example, the abovementioned method is implemented in Python and then compared with the long short-term memory neural network algorithm and the single-SCNs algorithm. Simulation results show that the proposed method for medium- and short-term load forecasting has a high forecasting accuracy and a significant effect on improving the accuracy of load forecasting.

Machine learning algorithm for ventilator mode selection, pressure and volume control
10.1371/journal.pone.0299653
2024-03-13T14:00:00Z
<p>by Anitha T., Gopu G., Arun Mozhi Devan P., Maher Assaad</p>
Mechanical ventilation techniques are vital for preserving the lives of individuals with serious conditions in long-term hospitalization units. Nevertheless, an imbalance between the patient’s demands and the respiratory system could cause inconsistencies in the patient’s inhalation. To tackle this problem, this study presents an Iterative Learning PID Controller (ILC-PID), a unique current cycle feedback type controller that helps in attaining the correct pressure and volume. The paper also offers a clear and complete examination of an efficient neural approach for generating optimal inhalation strategies. Moreover, machine learning-based classifiers are used to evaluate the precision and performance of the ILC-PID controller. These classifiers are able to forecast and choose the best type among various inhalation modes, eliminating the likelihood that patients will require mechanical ventilation. In pressure control, the suggested neural classifier exhibited an average accuracy of 88.2% in continuous positive airway pressure (CPAP) mode and 91.7% in proportional assist ventilation (PAV) mode, whereas other classifiers such as the ensemble classifier had lower accuracy rates of 69.5% in CPAP mode and 71.7% in PAV mode. The other classifiers had an average accuracy of 78.9% in CPAP mode, compared with the neural network. The neural model had a typical range of 81.6% in CPAP mode and 84.59% in PAV mode for 20 cm <i>H</i><sub>2</sub><i>O</i> of volume created by the neural network classifier in the volume investigation. The other classifiers averaged 72.17% in CPAP mode and 77.83% in PAV mode in volume control. 
Different approaches, such as decision trees, optimizable Bayes trees, naive Bayes trees, nearest neighbour trees, and an ensemble of trees, were also evaluated in terms of confusion-matrix accuracy, training duration, specificity, sensitivity, and F1 score.

Combination prediction method of students’ performance based on ant colony algorithm
10.1371/journal.pone.0300010
2024-03-11T14:00:00Z
<p>by Huan Xu, Min Kim</p>
Students’ performance is an important factor in the evaluation of teaching quality in colleges. The prediction and analysis of students’ performance can guide students’ learning in time. To address the low accuracy of single models in students’ performance prediction, a combination prediction method based on an ant colony algorithm is put forward. First, considering the characteristics of students’ learning behavior and the characteristics of the models, decision tree (DT), support vector regression (SVR) and BP neural network (BP) are selected to establish three prediction models. Then, an ant colony algorithm (ACO) is proposed to calculate the weight of each model in the combination prediction model. The combination prediction method was compared with the single machine learning (ML) models and other methods in terms of accuracy and running time. The combination prediction model, with a mean square error (MSE) of 0.0089, performs better than DT with an MSE of 0.0326, SVR with an MSE of 0.0229 and BP with an MSE of 0.0148. To investigate the efficacy of the combination prediction model, other prediction models are used for a comparative study. The combination prediction model, with an MSE of 0.0089, performs better than GS-XGBoost with an MSE of 0.0131, PSO-SVR with an MSE of 0.0117 and IDA-SVR with an MSE of 0.0092. Meanwhile, the running speed of the combination prediction model is also faster than that of the above three methods.

Evaluation and estimation of compressive strength of concrete masonry prism using gradient boosting algorithm
10.1371/journal.pone.0297364
2024-03-05T14:00:00Z
<p>by Lanh Si Ho, Van Quan Tran</p>
The compressive strength (CS) of the hollow concrete masonry prism is an important parameter for designing masonry structures. In general, the CS is determined using laboratory tests; however, laboratory tests are time-consuming and costly. Thus, it is necessary to evaluate and estimate the CS using other methods, for example, machine learning techniques. This study employed Gradient Boosting (GB) to evaluate and predict the CS of the hollow masonry prism. The database used for modeling consists of 102 hollow concrete specimens taken from previously published literature. The output is the CS of the hollow masonry prism, while the inputs include the compressive strength of mortar (f<sub>m</sub>), the compressive strength of blocks (f<sub>b</sub>), the height-to-thickness ratio (h/t), and the ratio f<sub>m</sub>/f<sub>b</sub>. To reduce overfitting, this study used K-Fold cross-validation, and particle swarm optimization (PSO) was then employed to obtain the optimum hyperparameters. The GB model was then built using the optimum hyperparameters. The results showed that the GB model performed very well in evaluating and predicting the CS of the hollow masonry prisms with a high prediction accuracy: the values of R<sup>2</sup>, RMSE, MAE, and MAPE are 0.977, 0.803 MPa, 0.612 MPa, and 0.036%, respectively. The GB model in this study outperformed six different machine learning models (decision tree, linear regression, random forest regression, ridge regression, Artificial Neural network, and Extreme Gradient Boosting) used in previous studies. The results of sensitivity analysis using SHAP and PDP-2D indicate that the CS is strongly dependent on f<sub>b</sub> (with a mean SHAP value of 3.2) and h/t (with a mean SHAP value of 1.63), while f<sub>m</sub>/f<sub>b</sub> (with a mean SHAP value of 0.57) had a small effect on the CS. 
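The four error measures named in this abstract (R<sup>2</sup>, RMSE, MAE, MAPE) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `regression_metrics` and the sample strengths are made up for the example and are not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and MAPE (in percent) for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "mape": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


# Hypothetical measured and predicted strengths in MPa (illustrative only).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

RMSE and MAE carry the unit of the target (MPa here), whereas R<sup>2</sup> is unitless and MAPE is a percentage, which is why the abstract reports them with different units.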
Thus, it can be stated that this research provides a good method to evaluate and predict the CS of the hollow masonry prism, which can bring good knowledge for practical application in this field.

Pro-cycling team cyclist assignment for an upcoming race
10.1371/journal.pone.0297270
2024-03-04T14:00:00Z
<p>by Maor Sagi, Paulo Saldanha, Guy Shani, Robert Moskovitch</p>
Professional bicycle racing is a popular sport that has attracted significant attention in recent years. The evolution and ubiquitous use of sensors allow cyclists to measure many metrics including power, heart rate, speed, cadence, and more in training and racing. In this paper we explore for the first time the assignment of a subset of a team’s cyclists to an upcoming race. We introduce RaceFit, a model that recommends, based on recent workouts and past assignments, cyclists for participation in an upcoming race. RaceFit consists of binary classifiers that are trained on pairs of a cyclist and a race, described by their relevant properties (features) such as the cyclist’s demographic properties and features extracted from his workout data from recent weeks, as well as additional properties of the race, such as its distance, elevation gain, and more. Two main approaches are introduced: recommending for each stage in a race and aggregating the results to the race level, or recommending for the entire race. The model training is based on a binary label that represents the participation of a cyclist in a race (or in a stage) in past events. We evaluated RaceFit rigorously on a large dataset of three pro-cycling teams’ cyclists and race data, achieving up to 80% precision@i. The first experiment showed that using TP or STRAVA data yields the same performance. The best-performing configuration of the framework used a 5-week time window; imputation was effective, and the CatBoost classifier performed best. However, the model always performed better than the baselines, regardless of the parameters; in the baselines, the cyclists are assigned based on their popularity in historical data. 
Additionally, we present the top-ranked predictive features.

Applicability of machine learning algorithm to predict the therapeutic intervention success in Brazilian smokers
10.1371/journal.pone.0295970
2024-03-04T14:00:00Z
<p>by Miyoko Massago, Mamoru Massago, Pedro Henrique Iora, Sanderland José Tavares Gurgel, Celso Ivam Conegero, Idalina Diair Regla Carolino, Maria Muzanila Mushi, Giane Aparecida Chaves Forato, João Vitor Perez de Souza, Thiago Augusto Hernandes Rocha, Samile Bonfim, Catherine Ann Staton, Oscar Kenji Nihei, João Ricardo Nickenig Vissoci, Luciano de Andrade</p>
Smoking cessation is an important public health policy worldwide. However, as far as we know, there is a lack of screening of variables related to the success of therapeutic intervention (STI) in Brazilian smokers by machine learning (ML) algorithms. To address this gap in the literature, we evaluated the ability of eight ML algorithms to correctly predict the STI in Brazilian smokers who were treated at a smoking cessation program in Brazil between 2006 and 2017. The dataset was composed of 12 variables and the efficacies of the algorithms were measured by accuracy, sensitivity, specificity, positive predictive value (PPV) and area under the receiver operating characteristic curve. We plotted a decision tree flowchart and also measured the odds ratio (OR) between each independent variable and the outcome, and the importance of the variable for the best model based on PPV. The mean global values for the metrics described above were, respectively, 0.675±0.028, 0.803±0.078, 0.485±0.146, 0.705±0.035 and 0.680±0.033. Support vector machines were the best-performing algorithm, with a PPV of 0.726±0.031. Smoking cessation drug use was the root of the decision tree, with an OR of 4.42 and a variable importance of 100.00. An increase in the number of relapses also promoted a positive outcome, while higher consumption of cigarettes resulted in the opposite. In summary, the best model predicted 72.6% of positive outcomes correctly. Smoking cessation drug use and a higher number of relapses contributed to quitting smoking, while higher consumption of cigarettes showed the opposite effect. Increasing services and drug treatment for smokers are important strategies to reduce the number of smokers and increase STI.

Stratum-specific health outcome estimation in Pakistan using double goal CART
10.1371/journal.pone.0294736
2024-02-29T14:00:00Z
<p>by Muhammad Hamza, Shakeel Ahmed</p>
Post-stratification is applied when the subpopulation membership is observed only for sampled values and the goal is to estimate stratum-specific parameters. This leads survey statisticians towards two primary goals, i.e., classification of non-sampled units into different strata and prediction of the values of the study variables. Regression models, on one side, optimize the prediction of the study variable’s non-sampled values, while classification algorithms, on the other side, look for the classification of non-sampled cases into different strata. Hence, it is crucial to deal with these two goals simultaneously for the estimation of stratum-specific parameters. This study introduces the idea of a double-objective classification and regression trees (CARTs) approach for estimating stratum-specific parameters. Theoretical properties of the total estimator are derived. An application to the estimation of health outcomes in different domains is given to demonstrate the practical significance as well as the efficiency of the proposed CART-based method. The proposed estimator of the population total performs better than the existing stratum-specific estimator in terms of relative efficiency for all choices of parameters. As an ensemble model, the random forest CART outperforms the other competing tree-based models and the homogeneous population model without using any auxiliary variable.

Significant duration prediction of seismic ground motions using machine learning algorithms
10.1371/journal.pone.0299639
2024-02-28T14:00:00Z
<p>by Xinle Li, Pei Gao</p>
This study aims to predict the significant duration (<i>D</i><sub>5-75</sub>, <i>D</i><sub>5-95</sub>) of seismic motion by employing machine learning algorithms. Building on three parameters (moment magnitude, fault distance, and average shear wave velocity), two additional parameters (fault top depth and epicenter mechanism parameters) were introduced in this study. The XGBoost algorithm is utilized for characteristic parameter optimization analysis to obtain the optimal combination of four parameters. We compare the prediction results of four machine learning algorithms (random forest, XGBoost, BP neural network, and SVM) and develop a new method of significant duration prediction by constructing two fusion models (stacking and weighted averaging). The fusion model demonstrates an improvement in the prediction accuracy and generalization ability for the significant duration when compared to single-algorithm models, based on evaluation indicators and residual values. The accuracy and rationality of the fusion model are validated through comparison with existing research.

Diagnostic performance of the WHO definition of probable dengue within the first 5 days of symptoms on Reunion Island
10.1371/journal.pone.0295260
2024-02-15T14:00:00Z
<p>by Yves Marie Diarra, Olivier Maillard, Adrien Vague, Bertrand Guihard, Patrick Gérardin, Antoine Bertolotti</p>
The relevance of the World Health Organization (WHO) criteria for defining probable dengue had not yet been evaluated in the context of dengue endemicity on Reunion Island. The objective of this retrospective diagnostic study was to evaluate the diagnostic performance of the 2009 WHO definition of probable dengue and to propose an improvement thereof. From the medical database, we retrieved the data of subjects admitted to the emergency department of the University Hospital of Reunion Island in 2019 with suspected dengue fever (DF) within a maximum of 5 days post symptom onset, and whose diagnosis was confirmed by a Reverse Transcriptase Polymerase Chain Reaction (RT-PCR). The intrinsic characteristics of probable dengue definitions were reported in terms of sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR-), using RT-PCR as the gold standard. Of the 1,181 subjects who exhibited a positive RT-PCR, 652 (55%) were classified as probable dengue. The WHO definition of probable dengue yielded a sensitivity of 64% (95%CI 60–67%), a specificity of 57% (95%CI 52–61%), a LR+ of 1.49 (95%CI 1.33–1.67), and a LR- of 0.63 (95%CI 0.56–0.72). The sensitivity and LR- for diagnosing and ruling out probable dengue could be improved by the addition of lymphopenia on admission (74% [95%CI: 71–78%] and 0.54 [95%CI: 0.46–0.63] respectively), at the cost of slight reductions of specificity and LR+ (48% [95%CI: 44–53%] and 1.42 [95%CI: 1.29–1.57], respectively). 
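The likelihood ratios quoted above follow from the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The short sketch below recomputes them from the rounded sensitivity and specificity values given in the abstract; the function name `likelihood_ratios` is an illustrative choice, and the confidence intervals are not reproduced because they require the underlying counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded point estimates taken from the abstract.
who_2009 = likelihood_ratios(0.64, 0.57)         # original WHO definition
with_lymphopenia = likelihood_ratios(0.74, 0.48)  # definition plus lymphopenia

print([round(x, 2) for x in who_2009])
print([round(x, 2) for x in with_lymphopenia])
```

To two decimals, the recomputed values agree with the point estimates reported in the abstract (1.49 and 0.63 for the original definition, 1.42 and 0.54 with lymphopenia added).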
In the absence of rapid diagnostic testing, or when it is unreliable, the use of the improved 2009 WHO definition of probable dengue could facilitate the identification of subjects who require further RT-PCR testing, which should encourage the development of patient management, while also optimizing the count and quarantine of cases, and guiding disease control.

Ensemble learning based transmission line fault classification using phasor measurement unit (PMU) data with explainable AI (XAI)
10.1371/journal.pone.0295144
2024-02-12T14:00:00Z
<p>by Simon Bin Akter, Tanmoy Sarkar Pias, Shohana Rahman Deeba, Jahangir Hossain, Hafiz Abdur Rahman</p>
A large volume of data is being captured through the Phasor Measurement Unit (PMU), which opens new opportunities and challenges for the study of transmission line faults. Specifically, the PMU data represents many different states of the power networks. The states of the PMU device help to identify different types of transmission line faults. For a precise understanding of transmission line faults, parameters that contain only voltage and current magnitude estimations are not sufficient. This requirement has been addressed by generating data with more parameters, such as frequencies and phase angles, utilizing the PMU for data acquisition. The data has been generated through the simulation of a transmission line model on ePMU DSA tools and Matlab Simulink. Different machine learning models have been trained with the generated synthetic data to classify transmission line fault cases. The individual models Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (K-NN) have outperformed other models in fault classification, acquiring cross-validation accuracies of 99.84%, 99.83%, and 99.76%, respectively, across 10 folds. Soft voting has been used to combine these best-performing models. Accordingly, the constructed ensemble model has acquired a cross-validation accuracy of 99.88% across 10 folds. The performance of the combined models in the ensemble learning process has been analyzed through explainable AI (XAI), which increases the interpretability of the input parameters in terms of making predictions. Consequently, the developed model has been evaluated with several performance metrics, such as precision, recall, and F1 score, and also tested on the IEEE 14 bus system. 
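Soft voting, as mentioned above, combines classifiers by averaging their predicted class probabilities and choosing the class with the highest mean. The sketch below shows only that averaging step in plain Python; the function name `soft_vote` and the probability vectors are hypothetical and do not come from the study, whose actual implementation is not described in the abstract.

```python
def soft_vote(probabilities_per_model):
    """Average per-class probabilities across models and return (argmax, averages)."""
    n_models = len(probabilities_per_model)
    n_classes = len(probabilities_per_model[0])
    averaged = [
        sum(model[c] for model in probabilities_per_model) / n_models
        for c in range(n_classes)
    ]
    predicted_class = max(range(n_classes), key=lambda c: averaged[c])
    return predicted_class, averaged


# Hypothetical class probabilities for one sample from three models.
dt_probs = [0.6, 0.3, 0.1]   # decision tree alone would pick class 0
rf_probs = [0.2, 0.5, 0.3]   # random forest picks class 1
knn_probs = [0.3, 0.6, 0.1]  # k-nearest neighbors picks class 1

label, averaged = soft_vote([dt_probs, rf_probs, knn_probs])
print(label, averaged)
```

Unlike hard (majority) voting, this rule takes each model's confidence into account, so a model that is only weakly in favor of a class contributes less to the final decision.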
To sum up, this article has demonstrated the classification of six scenarios, including no-fault and fault cases from transmission lines, with a significant number of training parameters, and has also interpreted the effect of each parameter on the predictions of different fault cases with great success.

Machine learning-based identification of contrast-enhancement phase of computed tomography scans
10.1371/journal.pone.0294581
2024-02-02T14:00:00Z
<p>by Siddharth Guha, Abdalla Ibrahim, Qian Wu, Pengfei Geng, Yen Chou, Hao Yang, Jingchen Ma, Lin Lu, Delin Wang, Lawrence H. Schwartz, Chuan-miao Xie, Binsheng Zhao</p>
Contrast-enhanced computed tomography (CECT) scans are routinely used in the evaluation of different clinical scenarios, including the detection and characterization of hepatocellular carcinoma (HCC). Quantitative medical image analysis has been an exponentially growing scientific field. A number of studies have reported on the effects of variations in the contrast enhancement phase on the reproducibility of quantitative imaging features extracted from CT scans. The identification and labeling of phase enhancement is a time-consuming task, and there is a current need for an accurate automated labeling algorithm to identify the enhancement phase of CT scans. In this study, we investigated the ability of machine learning algorithms to label the phases in a dataset of 59 HCC patients scanned with a dynamic contrast-enhanced CT protocol. The ground truth labels were provided by expert radiologists. Regions of interest were defined within the aorta, the portal vein, and the liver. Mean density values were extracted from those regions of interest and used for machine learning modeling. Models were evaluated using accuracy, the area under the curve (AUC), and the Matthews correlation coefficient (MCC). We tested the algorithms on an external dataset (76 patients). Our results indicate that several supervised learning algorithms (logistic regression, random forest, etc.) performed similarly, and our developed algorithms can accurately classify the phase of contrast enhancement.
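As a small illustration of two of the evaluation measures named in this abstract, the sketch below computes accuracy and the Matthews correlation coefficient from a binary confusion matrix. It is a simplification under stated assumptions: the study labels several enhancement phases, whereas this example is binary, and the function names `mcc` and `accuracy` as well as the counts are made up for the example.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (illustrative only, not data from the study).
tp, tn, fp, fn = 45, 45, 5, 5
score_mcc = mcc(tp, tn, fp, fn)
score_acc = accuracy(tp, tn, fp, fn)
print(score_mcc, score_acc)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it is less easily inflated by class imbalance than accuracy alone.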