Software Pipeline-Based Partitioning Method with Trade-Off between Workload Balance and Communication Optimization

  • Huang, Kai (Institute of VLSI Design, Zhejiang University)
  • Xiu, Siwen (College of Optical and Electronic Technology, China Jiliang University)
  • Yu, Min (Institute of VLSI Design, Zhejiang University)
  • Zhang, Xiaomeng (Institute of VLSI Design, Zhejiang University)
  • Yan, Rongjie (Institute of Software, Chinese Academy of Sciences)
  • Yan, Xiaolang (Institute of VLSI Design, Zhejiang University)
  • Liu, Zhili (C-Sky Microsystem Co., Ltd.)
  • Received: 2014.04.22
  • Accepted: 2015.01.02
  • Published: 2015.05.01

Abstract

For a multiprocessor System-on-Chip (MPSoC) to achieve high performance through parallelism, a given application must be partitioned into components, and those components must be mapped onto multiple processors. In this paper, we propose a software pipeline-based partitioning method with cyclic-dependent task management and communication optimization. When computation load balance and communication optimization are considered simultaneously during task partitioning, the two objectives can interfere with each other, which leads to performance loss. To address this issue, we formulate their constraints and apply an integer linear programming approach to find an optimal partitioning result that trades off these two factors. Experimental results on a reconfigurable MPSoC platform demonstrate the effectiveness of the proposed method, with 20% to 40% performance improvements compared to a traditional software pipeline-based partitioning method.
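
The tension between load balance and communication cost can be illustrated with a toy example. The sketch below is not the paper's formulation: it replaces the integer linear program with exhaustive enumeration, and the cost function (maximum processor load plus `alpha` times the communication crossing processors), the function name `partition`, and all task loads and edge weights are assumptions chosen for illustration only.

```python
from itertools import product


def partition(loads, edges, num_procs, alpha=1.0):
    """Exhaustively search task-to-processor assignments.

    Hypothetical cost model (an assumption, not the paper's ILP):
        cost = max processor load + alpha * communication across processors
    Returns (cost, assignment), where assignment[i] is the processor of task i.
    Ties keep the first assignment found in enumeration order.
    """
    best = None
    for assign in product(range(num_procs), repeat=len(loads)):
        # Computation load accumulated on each processor.
        proc_load = [0] * num_procs
        for task, proc in enumerate(assign):
            proc_load[proc] += loads[task]
        # Communication is paid only when an edge crosses processors.
        cut = sum(c for (u, v), c in edges.items() if assign[u] != assign[v])
        cost = max(proc_load) + alpha * cut
        if best is None or cost < best[0]:
            best = (cost, assign)
    return best


# Four equal tasks in a chain; the middle edge carries heavy traffic.
loads = [4, 4, 4, 4]
edges = {(0, 1): 1, (1, 2): 10, (2, 3): 1}

balanced_only = partition(loads, edges, 2, alpha=0.0)
with_comm = partition(loads, edges, 2, alpha=1.0)

print(balanced_only)  # (8.0, (0, 0, 1, 1)): balanced, but cuts the heavy edge
print(with_comm)      # (10.0, (0, 1, 1, 0)): same balance, heavy edge kept local
```

With `alpha=0.0` the search balances load only and happens to separate tasks 1 and 2, which exchange the most data. With `alpha=1.0` it keeps that pair on one processor at no cost in balance. Exhaustive search is exponential in the number of tasks, so it is only practical for very small examples like this one.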

References

  1. C. Bienia and K. Li, "Characteristics of Workloads Using the Pipeline Programming Model," Comput. Archit., vol. 6161, 2012, pp. 161-171.
  2. S. Eyerman and L. Eeckhout, "Modeling Critical Sections in Amdahl's Law and its Implications for Multicore Design," ACM SIGARCH Comput. Archit. News, New York, NY, USA, vol. 38, no. 3, June 2010, pp. 362-370.
  3. R. Yan et al., "Communication Pipelining for Code Generation from Simulink Models," IEEE Int. Conf. Trust, Security Privacy Comput. Commun., Melbourne, Australia, July 16-18, 2013, pp. 1893-1900.
  4. G. Kahn and D. MacQueen, "Information Processing: Coroutines and Networks of Parallel Processes," Amsterdam, Netherlands: Gilchrist, B. eds., 1977, pp. 993-998.
  5. E.A. Lee and T.M. Parks, "Dataflow Process Networks," Proc. IEEE, vol. 83, no. 5, May 1995, pp. 773-801. https://doi.org/10.1109/5.381846
  6. UML, Object Management Group, Inc. Accessed Apr. 1, 2014. http://www.uml.org/
  7. Simulink, Mathworks. Accessed Apr. 1, 2014. http://www.mathworks.com
  8. Real-Time Workshop, Mathworks. Accessed Apr. 1, 2014. http://www.mathworks.com
  9. RTI-MP, dSPACE, Inc. Accessed Apr. 1, 2014. http://www.spaceinc.com/ww/en/inc/home/products/sw/impsw/rtimpblo.cfm
  10. A. Canedo, T. Yoshizawa, and H. Komatsu, "Automatic Parallelization of Simulink Applications," Proc. Annual IEEE/ACM Int. Symp. Code Generation Optimization, Toronto, Canada, Apr. 24-28, 2010, pp. 151-159.
  11. A. Canedo, T. Yoshizawa, and H. Komatsu, "Skewed Pipelining for Parallel Simulink Simulations," Des., Automation Test Europe Conf. Exhibition, Dresden, Germany, Mar. 8-12, 2010, pp. 891-896.
  12. S.-I. Han et al., "Memory-Efficient Multithreaded Code Generation from Simulink for Heterogeneous MPSoC," Des., Autom. Embedded Syst., vol. 11, no. 4, Dec. 2007, pp. 249-283. https://doi.org/10.1007/s10617-007-9009-4
  13. H. Orsila et al., "Automated Memory-Aware Application Distribution for Multi-processor System-on-Chips," J. Syst. Archit., vol. 5, no. 11, Nov. 2007, pp. 795-815.
  14. Y. Yi et al., "An ILP Formulation for Task Mapping and Scheduling on Multi-core Architectures," IEEE Des., Autom. Test Europe Exhibition, Nice, France, Apr. 20-24, 2009, pp. 33-38.
  15. A.K. Singh et al., "Mapping on Multi/Many-core Systems: Survey of Current and Emerging Trends," ACM/EDAC/IEEE Des., Autom. Conf., Austin, TX, USA, Article no. 1, May 29-June 7, 2013, pp. 1-10.
  16. Y. Wang et al., "Overhead-Aware Energy Optimization for Real-Time Streaming Applications on Multiprocessor System-on-Chip," ACM Trans. Des. Autom. Electron. Syst., vol. 16, no. 2, Mar. 2011, Article 14.
  17. H. Yang and S. Ha, "Pipelined Data Parallel Task Mapping/Scheduling Technique for MPSoC," Des., Autom. Test Europe Exhibition, Nice, France, Apr. 20-24, 2009, pp. 69-74.
  18. J. Cong, G. Han, and W. Jiang, "Synthesis of an Application-Specific Soft Multiprocessor System," Proc. ACM/SIGDA Int. Symp. Field Programmable Gate Arrays, Monterey, CA, USA, Feb. 18-20, 2007, pp. 99-107.
  19. M.I. Gordon, W. Thies, and S. Amarasinghe, "Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs," Proc. Int. Conf. Archit. Support Programming Language Operation Syst., San Jose, CA, USA, Oct. 21-25, 2006, pp. 151-162.
  20. M. Kudlur and S. Mahlke, "Orchestrating the Execution of Stream Programs on Multicore Platforms," ACM SIGPLAN Notices (PLDI'08), vol. 43, no. 6, June 2008, pp. 114-124. https://doi.org/10.1145/1379022.1375596
  21. H. Javid and S. Parameswaran, "A Design Flow for Application Specific Heterogeneous Pipelined Multiprocessor Systems," Proc. Annual Des. Autom. Conf., San Francisco, CA, USA, July 26-31, 2009, pp. 250-253.
  22. D. Cordes et al., "Automatic Extraction of Pipeline Parallelism for Embedded Software Using Linear Programming," IEEE Int. Conf. Parallel Distrib. Syst., Tainan, Taiwan, Dec. 7-9, 2011, pp. 699-706.
  23. S.-I. Han, S.-I. Chae, and A.A. Jerraya, "Functional Modeling Techniques for Efficient SW Code Generation of Video Codec Applications," Proc. Asia South Pacific Des. Autom. Conf., Yokohama, Japan, Jan. 24-27, 2006, pp. 935-940.
  24. S.-I. Han et al., "Simulink-Based Heterogeneous Multiprocessor SoC Design Flow for Mixed Hardware/Software Refinement and Simulation," J. Integr. VLSI, vol. 42, no. 2, Feb. 2009, pp. 227-245. https://doi.org/10.1016/j.vlsi.2008.08.003
  25. C.E. Leiserson and J.B. Saxe, "Retiming Synchronous Circuitry," J. Algorithmica, vol. 6, no. 1-6, June 1991, pp. 5-35. https://doi.org/10.1007/BF01759032
  26. L. Brisolara et al., "Reducing Fine-Grain Communication Overhead in Multithread Code Generation for Heterogeneous MPSoC," Proc. Int. Workshop Softw. Compilers Embedded Syst., Nice, France, Apr. 20, 2007, pp. 81-89.
  27. S.-I. Han et al., "Buffer Memory Optimization for Video Codec Application Modeled in Simulink," Proc. Annual Des. Autom. Conf., San Francisco, CA, USA, July 24-28, 2006, pp. 689-694.
  28. C-SKY Inc. Accessed Apr. 1, 2014. http://www.c-sky.com
  29. S.-I. Han et al., "An Efficient Scalable and Flexible Data Transfer Architecture for Multiprocessor SoC with Massive Distributed Memory," Proc. Annual Des. Autom. Conf., San Diego, CA, USA, June 7-11, 2004, pp. 250-255.

Cited by

  1. Adaptive Internet of Things and Web of Things convergence platform for Internet of reality services, vol. 72, pp. 1, 2015, https://doi.org/10.1007/s11227-015-1489-6