Performance Analysis of Real-Time Big Data Search Platform Based on High-Capacity Persistent Memory


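  • Eunseo Lee (Department of Computer Science, Sookmyung Women's University)
  • Dongchul Park (Department of Industrial Security, Chung-Ang University)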
  • Received : 2023.05.02
  • Accepted : 2023.07.11
  • Published : 2023.08.31

Abstract

Advances in big data technologies have had a tremendous impact on many industries, and numerous studies have examined how to process and analyze massive data quickly. Against this background, emerging technologies such as high-capacity persistent memory (PMEM) and Compute Express Link (CXL) have recently attracted significant attention. However, big data "search" platforms have received little investigation, and most big data software platforms are still optimized for traditional DRAM-based computing systems. This paper first evaluates the basic performance of Intel Optane PMEM and then investigates both the indexing and searching performance of Elasticsearch, a widely known enterprise big data search platform, on a PMEM-based computing system to explore its effectiveness and potential. Extensive experiments show that the proposed Optane PMEM-based Elasticsearch improves indexing and searching performance by an average of 1.45 times and 3.2 times, respectively, compared with a DRAM-based system. Consequently, this paper demonstrates that PMEM-based computing systems, which offer high I/O performance, high capacity, and non-volatility, are very promising for big data search platforms.

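Advances in various big data technologies are having a major impact on many industries, and a range of studies are under way to process and analyze massive volumes of data quickly. Against this background, new memory and computing technologies such as Intel's next-generation high-capacity persistent memory modules and CXL are drawing considerable attention. However, most existing big data software platforms are still optimized for conventional DRAM-based environments, and research on real-time big data search platforms in particular remains relatively scarce. This study evaluates the basic performance of Intel Optane persistent memory, a next-generation persistent memory, and verifies the usefulness and potential of high-capacity persistent memory through various performance analyses of Elasticsearch, a well-known real-time big data search platform, on an Optane persistent memory-based system. The results show that the high-capacity persistent memory-based system improves indexing and search performance by 1.45 times and 3.2 times, respectively, over a conventional DRAM-based system, confirming that next-generation persistent memory, with advantages such as high-performance I/O, large capacity, and non-volatility, can be a good alternative for big data search platforms such as Elasticsearch.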
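The "times" improvement figures quoted in the abstract are ratios between two systems. The abstract does not state which metric or averaging method was used, so the following is only a minimal sketch of one common way to compute such figures, assuming throughput is used for indexing, latency for searching, and an arithmetic mean across workloads. All function names and measurement values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def speedup_from_throughput(baseline_ops_per_s, candidate_ops_per_s):
    """Higher throughput is better, so speedup = candidate / baseline."""
    return candidate_ops_per_s / baseline_ops_per_s


def speedup_from_latency(baseline_ms, candidate_ms):
    """Lower latency is better, so speedup = baseline / candidate."""
    return baseline_ms / candidate_ms


# Hypothetical placeholder measurements (NOT taken from the paper).
# Each pair is (baseline system, candidate system) for one workload.
indexing_throughput = [(1000.0, 1200.0), (2000.0, 3000.0)]  # docs per second
search_latency = [(30.0, 10.0), (50.0, 20.0)]               # milliseconds

# Arithmetic mean of the per-workload speedups (an assumed averaging method).
avg_indexing_speedup = mean(
    speedup_from_throughput(b, c) for b, c in indexing_throughput
)
avg_search_speedup = mean(
    speedup_from_latency(b, c) for b, c in search_latency
)

print(f"average indexing speedup: {avg_indexing_speedup:.2f}x")
print(f"average search speedup:   {avg_search_speedup:.2f}x")
```

The two helper functions invert the ratio depending on whether a larger or a smaller value is the better outcome, so a speedup above 1.0 always means the candidate system performed better.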

Acknowledgement

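This research was supported by the Ministry of Science and ICT (MSIT) and the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the University ICT Research Center support program (IITP-2023-2018-0-01799).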
