A Fault-Tolerant Scheme Based on Message Passing for Mission-Critical Computers

  • Kim, Taehyon (The 1st Research and Development Institute, Agency for Defense Development) ;
  • Bae, Jungil (The 1st Research and Development Institute, Agency for Defense Development) ;
  • Shin, Jinbeom (The 1st Research and Development Institute, Agency for Defense Development) ;
  • Cho, Kilseok (The 1st Research and Development Institute, Agency for Defense Development)
  • Received : 2015.06.25
  • Accepted : 2015.11.06
  • Published : 2015.12.05

Abstract

Fault tolerance is a crucial design requirement for a mission-critical computer, such as an engagement control computer, that must keep operating throughout a long mission. In recent years, software-based fault-tolerant design has become increasingly important because of its cost-effectiveness and efficiency. In this paper, we propose MPCMCC, a model-based software component that implements fault tolerance in mission-critical computers. MPCMCC synchronizes shared data between two computers using a one-way message-passing scheme, which is easier to use and more stable than a shared-memory scheme. In addition, because it is built with a model-based development methodology, MPCMCC can be easily reused in future work. We verified the functions of the software component and analyzed its performance in a simulation environment with two mission-critical computers. The results show that MPCMCC is a suitable software component for fault tolerance in mission-critical computers.
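
The one-way synchronization idea in the abstract can be illustrated with a minimal sketch. This is not the MPCMCC implementation, whose details the abstract does not give. It assumes an in-process queue standing in for the link between the two computers, and every name in it (`SyncMessage`, `ActiveNode`, `StandbyNode`, `demo`) is hypothetical. The active side publishes each change to the shared data without waiting for a reply, and the standby side applies the changes to its own replica so that it can take over.

```python
import queue
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SyncMessage:
    """One update to the shared data (hypothetical message layout)."""
    seq: int        # monotonically increasing sequence number
    key: str
    value: object


@dataclass
class ActiveNode:
    """Owns the shared data and publishes every change one way."""
    channel: queue.Queue
    data: dict = field(default_factory=dict)
    seq: int = 0

    def update(self, key, value):
        self.data[key] = value
        self.seq += 1
        # Fire-and-forget: the active node never waits for a reply.
        self.channel.put(SyncMessage(self.seq, key, value))


@dataclass
class StandbyNode:
    """Applies received updates to a private replica of the shared data."""
    channel: queue.Queue
    replica: dict = field(default_factory=dict)
    last_seq: int = 0

    def drain(self):
        """Apply all pending messages, skipping stale or duplicate ones."""
        while True:
            try:
                msg = self.channel.get_nowait()
            except queue.Empty:
                return
            if msg.seq <= self.last_seq:
                continue
            self.replica[msg.key] = msg.value
            self.last_seq = msg.seq

    def take_over(self):
        """Assumed failover step: continue from the synchronized replica."""
        self.drain()
        return dict(self.replica)


def demo():
    link = queue.Queue()  # stands in for the link between the two computers
    active = ActiveNode(link)
    standby = StandbyNode(link)
    active.update("track_id", 7)
    active.update("mode", "engage")
    active.update("track_id", 8)
    # Simulated failure of the active computer: the standby takes over.
    return standby.take_over(), standby.last_seq
```

Because messages flow in one direction only, neither side reads or writes the other's memory, which is the property the abstract contrasts with a shared-memory scheme. A real system would also need failure detection and a transport that tolerates message loss, both of which this sketch leaves out.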
