Finding Biomarker Genes for Type 2 Diabetes Mellitus using Chi-2 Feature Selection Method and Logistic Regression Supervised Learning Algorithm

  • Alshamlan, Hala M (Information Technology Department, College of Computer and Information, King Saud University)
  • Received : 2021.02.05
  • Published : 2021.02.28

Abstract

Type 2 diabetes mellitus (T2D) is a complex disease characterized by high blood sugar, insulin resistance, and a relative lack of insulin. Many studies attempt to identify the variant genes that cause this disease by using a sample disease model. In this paper, we classify diabetic and normal individuals using Fisher score feature selection, chi-2 feature selection, and a logistic regression supervised learning algorithm, achieving a best accuracy of 90.23%.
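As a rough illustration of the kind of pipeline the abstract describes, the sketch below scores features with a chi-squared statistic, keeps the top-scoring ones, and fits a logistic regression classifier on them. It is not the author's implementation: the data are synthetic (six hypothetical "genes", two of them made informative on purpose), the function names, learning rate, epoch count, and `k = 2` are all assumptions chosen for the example, and the Fisher score step is omitted.

```python
import math
import random


def chi2_scores(X, y):
    """Chi-squared statistic of each non-negative feature against the class labels."""
    n, d = len(X), len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(d):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, scores, k):
    """Keep the k highest-scoring columns; return the reduced data and kept indices."""
    keep = sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [[row[j] for j in keep] for row in X], keep


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.3, epochs=1000):
    """Logistic regression fitted by plain batch gradient descent (assumed settings)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0
            for row in X]


# Synthetic stand-in for expression data: 40 samples, 6 hypothetical genes.
# Genes 1 and 4 are shifted upward for the "diabetic" class (label 1).
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.random() + (2.0 * label if j in (1, 4) else 0.0) for j in range(6)]
     for label in y]

scores = chi2_scores(X, y)
X_selected, kept = select_top_k(X, scores, k=2)
w, b = fit_logistic(X_selected, y)
accuracy = sum(p == t for p, t in zip(predict(X_selected, w, b), y)) / len(y)

print("selected feature indices:", kept)
print("training accuracy:", accuracy)
```

The accuracy printed here is measured on the same synthetic samples used for fitting, so it says nothing about the 90.23% reported above; a real evaluation would use held-out data or cross-validation.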
