Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil (Department of Information Technology, Malaysia University of Science and Technology)
  • Sellappan, Palaniappan (Department of Information Technology, Malaysia University of Science and Technology)
  • Sanjoy Kumar, Debnath (Chitkara University Institute of Engineering and Technology, Chitkara University)
  • Muhammad, Naseem (Department of Software Engineering, Syed University of Engineering and Technology)
  • Susama, Bagchi (Chitkara University Institute of Engineering and Technology, Chitkara University)
  • Asiah, Lokman (Department of Information Technology, Malaysia University of Science and Technology)
  • Received : 2023.01.05
  • Published : 2023.01.30

Abstract

Considerable research effort is directed at reducing the incidence of cancer, and of Gastric Cancer (GC) in particular. The five-year survival rate for GC is generally 5-25%, whereas for Early Gastric Cancer (EGC) it is almost 90%. Predicting the onset of stomach cancer from risk factors would therefore allow earlier diagnosis and more effective treatment. Several models for predicting stomach cancer exist, but most are built on imbalanced datasets, which biases them towards the majority class. Correctly identifying cancer patients, who form the minority class, is nevertheless imperative. This research applies three class-balancing approaches to the NHS dataset before training supervised learning models: oversampling (Synthetic Minority Oversampling Technique, SMOTE), undersampling (SpreadSubsample), and a hybrid approach (SMOTE + SpreadSubsample). The classifiers used are Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5). We measured their efficacy using Receiver Operating Characteristic (ROC) curves, sensitivity, and specificity. The balancing approaches were compared on the validation data, and the final prediction model was built on the one that performed best overall.
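The three balancing approaches and two of the evaluation measures named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the study's implementation: the function names, the `max_spread` parameter, and the toy data are all hypothetical, the oversampler is a simplified SMOTE-style interpolation, and the undersampler is plain random undersampling that only approximates the idea behind SpreadSubsample (capping the majority-to-minority ratio).

```python
import random
from math import dist


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        target = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> target
        synthetic.append(tuple(b + gap * (t - b) for b, t in zip(base, target)))
    return synthetic


def undersample(majority, minority_size, max_spread=1.0, seed=0):
    """Randomly keep at most max_spread * minority_size majority rows.
    (Assumed stand-in for a spread-capping undersampler.)"""
    rng = random.Random(seed)
    keep = min(len(majority), int(max_spread * minority_size))
    return rng.sample(majority, keep)


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 4 minority (cancer) rows, 12 majority rows.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 3)) for i in range(3, 15)]

# Hybrid balancing: oversample the minority, then undersample the majority.
synthetic = smote(minority, n_new=4)
balanced_minority = minority + synthetic
balanced_majority = undersample(majority, len(balanced_minority))

# Illustrative labels only (1 = cancer, 0 = no cancer).
sens, spec = sensitivity_specificity([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

Calling only `smote` corresponds to the oversampling approach, only `undersample` to the undersampling approach, and both in sequence to the hybrid. Sensitivity is the measure that reflects how well the minority (cancer) class is recovered, which is why accuracy alone can be misleading on imbalanced data.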

Acknowledgement

We sincerely thank the NHS Liverpool hospital in England for making the dataset available for this study. We also extend our gratitude to Dr. Shakil Ahmed, a consultant surgeon at the Royal Liverpool hospital.
