Intelligent System for the Prediction of Heart Diseases Using Machine Learning Algorithms with a New Mixed Feature Creation (MFC) Technique

  • Received : 2023.05.05
  • Published : 2023.05.30

Abstract

Classification systems can significantly assist the medical sector by enabling precise and quick diagnosis of diseases, which saves time for both doctors and patients. Machine learning algorithms offer one way to identify risk factors. Non-surgical approaches such as machine learning are reliable and effective at distinguishing healthy individuals from heart-disease patients, and they save time and effort. The goal of this study is to create an intelligent medical decision support system based on machine learning for the diagnosis of heart disease. We use a mixed feature creation (MFC) technique to generate new features from the UCI Cleveland Cardiology dataset. We select the most suitable features using the Least Absolute Shrinkage and Selection Operator (LASSO), Recursive Feature Elimination with Random Forest (RFE-RF), and the best features of both LASSO and RFE-RF (BLR). Cross-validation and grid search are used to optimize the parameters of the estimators used with these algorithms. For each classification model, performance metrics including classification accuracy, specificity, sensitivity, precision, and F1-score, along with execution time and RMSE, are reported separately for comparison. The proposed approach identifies the best-performing configuration among the evaluated prediction models and improves the system's performance, helping physicians diagnose heart patients more accurately.
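The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes binary labels (1 = heart disease, 0 = healthy) and the standard confusion-matrix definitions. The function name `evaluate_binary`, the helper `_safe_div`, and the toy labels are hypothetical and are not taken from the paper, and the RMSE here is computed on the hard 0/1 predictions, which is an assumption about how the study applies it.

```python
from math import sqrt


def _safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate_binary(y_true, y_pred):
    """Compute common binary-classification metrics from 0/1 labels.

    Assumption: 1 marks a heart-disease patient, 0 a healthy subject.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    n = len(pairs)

    sensitivity = _safe_div(tp, tp + fn)  # recall on the positive class
    specificity = _safe_div(tn, tn + fp)  # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    # RMSE on hard labels (assumption): square root of the error rate.
    rmse = sqrt(_safe_div(sum((t - p) ** 2 for t, p in pairs), n))

    return {
        "accuracy": _safe_div(tp + tn, n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "rmse": rmse,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    for name, value in evaluate_binary(y_true, y_pred).items():
        print(f"{name}: {value:.4f}")
```

On hard 0/1 predictions the RMSE equals the square root of the misclassification rate, so it carries the same information as accuracy; it becomes a distinct measure only if it is computed on predicted probabilities.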
