Slide View : Parallel Programming

Previous | Next --- Slide 30 of 52

dhrubamahanta

Sometimes for performance boost, we compile our programs with specific switches (eg, AVX-512 for Intel). Does it work as directive to Superscalar execution?

ILP seems to be a functionality designed and implemented for a CPU core. Does it works in the same way for a virtual core, when hyper threading is enabled?

kayvonf

@dhrubamahanta. We'll get into answers to both of those question in Lecture 2. I'll refer to superscalar execution as an implementation where the processor executes multiple instructions from a single instruction stream in parallel (these instructions that are logically executed sequentially, but have no dependencies and thus can be executed in parallel with no impact on program output).

On an x86 CPU, a SIMD instruction, such as AVX2, AVX-512 (on ARM CPUs feature "Neon" SIMD instructions) is a single instruction that explicitly describes an operation performed on a fixed-width vector of values. For example, an AVX add instruction adds two 256-bit registers, each containing eight floating point values to produce an 8-vector as a result. Executing an explicit, 8-wide vector instruction present in the instruction stream is a different approach to achieving parallel execution than a superscalar processor implementation that dynamically inspects a sequence of eight instructions in the instruction stream, determines that they are independent, and then executes them in parallel on eight execution units.

truenorthwest

When optimizing code, do compilers attempt to "superscalarfy" the code when choosing how to translate a high level language into assembly? If so, would the compiler also be taking into account the number of available execution units on the underlying hardware?