
Hepatitis C Stage Classification with hybridization of GA and Chi2 Feature Selection

  • Received : 2021.12.05
  • Published : 2022.01.30

Abstract

In metaheuristic algorithms such as the Genetic Algorithm (GA), the initial population has a significant impact: it affects the time the algorithm takes to reach an optimal solution to a given problem, and it may also influence the quality of the solution obtained. In machine learning, feature selection is an important step towards a well-performing model, and researchers have used the GA for this purpose. However, a characteristic of the GA, namely the random generation of the initial population from a vector of feature elements, may influence both the solution and the execution time. In this paper, a statistical algorithm (Chi2) is introduced to check feature relevance, considering the p-values of conditional independence tests. Features with low p-values were discarded, and the remaining relevant subset of features was passed to the GA. This provides a level of certainty about the fitness of the features that are randomly selected. An ensemble-based learning model was developed for Hepatitis C stage classification, using 1,385 samples from the Egyptian dataset obtained from the UCI repository. The comparative evaluation confirms a decrease in execution time and an increase in model accuracy from 56% to 63%.
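The filter-then-search idea described above can be sketched in Python. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' implementation: features are assumed to be discrete (or already discretized), each feature's p-value comes from a chi-square test of independence on its contingency table with the class label, and the GA uses truncation selection, one-point crossover and bit-flip mutation, all of which are illustrative choices. The sketch keeps features whose p-value is at most a threshold `alpha` (default 0.05, an assumption), i.e. those for which independence from the label is rejected; this is the conventional reading of a chi-square relevance test and an assumption about how the paper's filter is applied. The names `chi2_filter`, `genetic_selection` and the `fitness` callable are hypothetical; in the paper's setting the fitness would be the performance of the ensemble model on the chosen features.

```python
import math
import random
from collections import Counter


def chi2_sf(x, dof):
    """P(X >= x) for X ~ chi-square(dof), i.e. the regularized upper incomplete
    gamma Q(dof/2, x/2): series for small x, continued fraction for large x."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    prefix = math.exp(a * math.log(z) - z - math.lgamma(a))
    if z < a + 1:
        # Series for the lower function P(a, z); return Q = 1 - P.
        term = total = 1.0 / a
        n = 1
        while term > total * 1e-15:
            term *= z / (a + n)
            total += term
            n += 1
        return max(0.0, 1.0 - prefix * total)
    # Modified Lentz evaluation of the continued fraction for Q(a, z).
    tiny = 1e-300
    b = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return prefix * h


def chi2_pvalue(feature, labels):
    """Chi-square test of independence between one discrete feature and the label."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))
    stat = 0.0
    for f, fc in f_counts.items():
        for y, yc in y_counts.items():
            expected = fc * yc / n
            stat += (joint[(f, y)] - expected) ** 2 / expected
    dof = (len(f_counts) - 1) * (len(y_counts) - 1)
    return chi2_sf(stat, dof) if dof > 0 else 1.0


def chi2_filter(X, y, alpha=0.05):
    """Column indices whose p-value is <= alpha (independence from y rejected)."""
    return [j for j in range(len(X[0]))
            if chi2_pvalue([row[j] for row in X], y) <= alpha]


def genetic_selection(candidates, fitness, pop_size=20, generations=30,
                      mutation_rate=0.05, seed=0):
    """Bit-mask GA whose chromosomes range only over the filtered candidates."""
    rng = random.Random(seed)
    k = len(candidates)

    def decode(mask):
        return [col for col, bit in zip(candidates, mask) if bit]

    def score(mask):
        cols = decode(mask)
        return fitness(cols) if cols else float("-inf")

    # Random initial population, but drawn from the chi2-filtered subset only.
    pop = [[rng.random() < 0.5 for _ in range(k)] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    return decode(max(pop, key=score))


if __name__ == "__main__":
    # Toy data with 4 hypothetical stages: column 1 is constant, column 2 is
    # independent of the stage, columns 0 and 3 depend on it.
    y = [i % 4 for i in range(200)]
    X = [[s, 7, (i // 4) % 2, s % 2] for i, s in enumerate(y)]
    kept = chi2_filter(X, y)

    def toy_fitness(cols):
        # Hypothetical stand-in for a validated (e.g. ensemble) accuracy score.
        return (1.0 if 0 in cols else 0.0) - 0.01 * len(cols)

    print(kept, genetic_selection(kept, toy_fitness))
```

Because the GA's chromosomes cover only the columns that survive the filter, every randomly generated initial individual is built from features that have already passed the relevance check. Replacing `toy_fitness` with a cross-validated classifier score would be the natural way to connect this sketch to a real staging model.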
