Exploiting instruction level parallelism in geometry processing for three dimensional graphics applications
Three-dimensional (3D) graphics applications have become important workloads on today's computer systems. A cost-effective graphics solution is to perform the geometry processing of 3D graphics on the host CPU and have specialized hardware handle the rendering task. In this paper, we analyze microarchitecture and SIMD instruction set enhancements to a RISC superscalar processor for exploiting instruction-level parallelism (ILP) in geometry processing for 3D computer graphics. Our results show that 3D geometry processing has inherent parallelism. When cycle-time effects are ignored, an 8-issue processor can achieve up to 60% performance improvement over a 4-issue processor. However, certain application attributes can hinder the exploitation of ILP on a superscalar processor. Adding SIMD operations improves performance by 8% to 28% on a 4-issue processor that can issue at most 2 floating-point operations per cycle. If processor cycle time scales with the number of ports to the register file, doubling only the floating-point issue width of a 4-issue processor with SIMD instructions gives the best performance among the architecture configurations that we examine, the most aggressive of which is an 8-issue processor with SIMD instructions.