Study of an In-order SMT Architecture and Grouping Schemes

  • Moon, Byung-In (SP Division of System IC, Hynix Semiconductor Inc.) ;
  • Kim, Moon-Gyung (Department of Electrical & Electronic Engineering, Yonsei University) ;
  • Hong, In-Pyo (Department of Electrical & Electronic Engineering, Yonsei University) ;
  • Kim, Ki-Chang (School of Information & Communication Engineering, Inha University) ;
  • Lee, Yong-Surk (Department of Electrical & Electronic Engineering, Yonsei University)
  • Published : 2003.09.01

Abstract

In this paper, we propose a simultaneous multithreading (SMT) architecture that improves instruction throughput by exploiting instruction-level parallelism (ILP) and thread-level parallelism (TLP). The proposed architecture issues and completes instructions belonging to the same thread in exact program order. This in-order issue and completion policy greatly reduces the design complexity and hardware cost of our architecture compared with architectures that employ out-of-order issue and completion. Instructions that belong to different threads, however, need not issue or complete in the order in which they were fetched. The processor issues instructions from multiple threads to the functional units simultaneously by exploiting ILP and TLP and by sharing resources dynamically. This parallel execution notably improves performance and resource utilization at minimal additional hardware cost over conventional superscalar processors. This paper proposes an SMT architecture with grouping as well as one without grouping. Without grouping, all threads dynamically and flexibly share most resources. With grouping, resources and threads are divided into several groups for design simplification, and each resource is shared only among the threads in its own group. Simulation results show that our processors with four and eight threads improve performance by a factor of three or more over a conventional superscalar processor with comparable execution resources and policies, and that reasonable grouping reduces the design complexity of SMT processors with little negative effect on performance.
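
The issue policy described in the abstract can be illustrated with a toy model. The following is a minimal sketch and not the authors' simulator: the function name `issue_cycle`, the unit types `"alu"` and `"mem"`, the fixed thread priority order, and the omission of data dependencies and latencies are all assumptions made for illustration. It shows only two properties stated above: a thread issues strictly from the head of its own instruction queue (in-order within a thread), and a thread may use only the functional units of its own group.

```python
from collections import deque


def issue_cycle(queues, thread_group, units_per_group):
    """Issue instructions for one cycle in a toy in-order SMT model.

    queues:          thread id -> deque of required unit types, in program order
    thread_group:    thread id -> group id (hypothetical grouping map)
    units_per_group: group id -> {unit type: units available per cycle}

    Returns the list of (thread id, unit type) pairs issued this cycle.
    Data dependencies and latencies are ignored (an assumption of this sketch).
    """
    # Copy the per-cycle unit counts so the caller's configuration is untouched.
    free = {g: dict(units) for g, units in units_per_group.items()}
    issued = []
    progress = True
    while progress:
        progress = False
        # Fixed priority by thread id; a real design could rotate priority.
        for tid in sorted(queues):
            queue = queues[tid]
            if not queue:
                continue
            unit = queue[0]  # only the head may issue: in-order per thread
            pool = free[thread_group[tid]]
            if pool.get(unit, 0) > 0:
                pool[unit] -= 1
                issued.append((tid, queue.popleft()))
                progress = True
    return issued


def fresh_queues():
    # Two hypothetical threads with two instructions each.
    return {0: deque(["alu", "alu"]), 1: deque(["mem", "alu"])}


# Without grouping: both threads share two ALUs and one memory unit.
print(issue_cycle(fresh_queues(), {0: 0, 1: 0}, {0: {"alu": 2, "mem": 1}}))
# -> [(0, 'alu'), (1, 'mem'), (0, 'alu')]

# With grouping: the same units split into two groups, one thread per group.
print(issue_cycle(fresh_queues(), {0: 0, 1: 1},
                  {0: {"alu": 1, "mem": 1}, 1: {"alu": 1}}))
# -> [(0, 'alu')]
```

In the grouped run, thread 1 stalls because its group has no memory unit, even though one sits idle in the other group. This split is deliberately unfavorable to make the sharing restriction visible; it says nothing about the grouping configurations or the performance results reported in the paper.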
