A Survey on the Performance Comparison of Map Reduce Technologies and the Architectural Improvement of Spark

  • Raghavendra, GS (Computer Science and Engineering, RVR & JC College of Engineering) ;
  • Manasa, Bezwada (Computer Science and Engineering, RVR & JC College of Engineering) ;
  • Vasavi, M. (Computer Science and Engineering, RVR & JC College of Engineering)
  • Received : 2022.05.05
  • Published : 2022.05.30

Abstract

Hadoop and Apache Spark are open source projects of the Apache Software Foundation, and both are leading tools for large-scale data analytics. Hadoop has led the big data industry for five years. Spark's processing speed can differ significantly, running up to 100 times faster. However, the amount of data each can handle varies: Hadoop MapReduce can process data sets far larger than Spark can. This article compares the performance of Spark and MapReduce and discusses the advantages and disadvantages of both technologies.

Acknowledgement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors would like to thank the editor and anonymous reviewers for their comments, which helped improve the quality of this work.
