danieltc

My understanding is that for ISPC, we can summarize the following:

Abstractions: From the C code's point of view, sinx() is an ordinary function call that computes sine for every element of the input. Calling it spawns a gang of ISPC program instances, each of which runs the ISPC code. When the call returns, all instances have completed and the results are in the output array.

Within the ISPC function, we use programCount and programIndex to explicitly define the assignment of inputs to program instances (interleaved vs. blocked), and uniform to mark values that are shared by the whole gang rather than held per instance. Alternatively, we can raise the level of abstraction and use foreach, which leaves the assignment up to the implementation.
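To make the interleaved vs. blocked distinction concrete for myself, here is a minimal sketch in plain C (not ISPC) that records which program instance would own each element under the two assignments. GANG_SIZE, assign_interleaved, and assign_blocked are names I made up for illustration; GANG_SIZE stands in for programCount, and the loop variable program_index stands in for programIndex.

```c
#include <assert.h>

/* Hypothetical gang size for illustration; in ISPC this role is played
 * by programCount. */
enum { GANG_SIZE = 8 };

/* Interleaved: on each pass, the gang covers GANG_SIZE consecutive
 * elements, and instance program_index takes element i + program_index. */
void assign_interleaved(int n, int owner[]) {
    for (int i = 0; i < n; i += GANG_SIZE) {
        for (int program_index = 0; program_index < GANG_SIZE; program_index++) {
            int idx = i + program_index;
            if (idx < n) {
                owner[idx] = program_index;
            }
        }
    }
}

/* Blocked: each instance takes one contiguous chunk of n / GANG_SIZE
 * elements. This sketch assumes n is a multiple of GANG_SIZE. */
void assign_blocked(int n, int owner[]) {
    assert(n % GANG_SIZE == 0);
    int per_instance = n / GANG_SIZE;
    for (int program_index = 0; program_index < GANG_SIZE; program_index++) {
        int start = program_index * per_instance;
        for (int j = 0; j < per_instance; j++) {
            owner[start + j] = program_index;
        }
    }
}
```

With 16 elements, the interleaved version gives element 8 back to instance 0, while the blocked version gives elements 0 and 1 to instance 0. The practical difference is that in the interleaved case the gang touches consecutive elements at the same time, whereas in the blocked case the elements touched at the same time are n / GANG_SIZE apart.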

Implementation: The number of instances in a gang is fixed at compile time and is tied to the SIMD width of the target hardware (equal to it, or a small multiple of it, depending on the compilation target). The ISPC compiler generates SIMD vector instructions and performs masking as necessary. The whole gang executes on one core.
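My mental model of "masking as necessary" is roughly the following plain-C emulation of one gang running `if (x < 0) y = -x; else y = x;`. This is only a sketch of the idea, not what the ISPC compiler actually emits; LANES, gang_abs, and active_lanes are hypothetical names, and active_lanes models a partially filled gang (for example, the tail of an array whose length is not a multiple of the gang size).

```c
#include <stdbool.h>

/* Hypothetical gang width for illustration. */
enum { LANES = 8 };

/* Emulates one gang executing: if (x < 0) y = -x; else y = x;
 * Only the first active_lanes lanes are allowed to write results. */
void gang_abs(const float x[LANES], float y[LANES], int active_lanes) {
    bool exec[LANES];     /* lanes that are running at all */
    bool negative[LANES]; /* running lanes that take the "then" branch */

    for (int lane = 0; lane < LANES; lane++) {
        exec[lane] = lane < active_lanes;
        negative[lane] = exec[lane] && x[lane] < 0.0f;
    }

    /* "then" branch: the whole gang steps through it, but only lanes
     * with the mask set store a result. */
    for (int lane = 0; lane < LANES; lane++) {
        if (negative[lane]) {
            y[lane] = -x[lane];
        }
    }

    /* "else" branch: same again with the complementary mask. */
    for (int lane = 0; lane < LANES; lane++) {
        if (exec[lane] && !negative[lane]) {
            y[lane] = x[lane];
        }
    }
}
```

Both branches are walked by the whole gang, so a divergent conditional costs roughly the sum of the two branches rather than the cost of just one.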

If the program does not specify an assignment (e.g. if the program uses foreach), then the implementation decides how inputs are assigned to program instances. In practice, the current ISPC implementation uses the interleaved assignment.

Is there anything I'm missing / misunderstanding?