Slide 28 of 88
kayvonf

A fun, and very useful site for performance programmers.

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Drew

I am wondering if there is a common way to leverage SIMD processors as a programmer without such intense intrinsics. As I'm writing this comment, I am realizing that languages can provide syntax (similar to the forall loop from slide 24) to show that the same instructions are being run on vectorized input. However, I'm still wondering if this is common in practice, or if SIMD pretty much always requires intrinsics like this. Or, do compilers/processors usually just recognize element-wise computations and use SIMD without any additional input from the programmer?

itoen

Related to Drew's question: is there a portable way to write SIMD code? It seems that whoever compiles this code will need to know exactly what type of processor the code will run on and what its SIMD width is (8 here).

wanze

I replayed the video recording of this slide a couple of times, but I'm still not sure I understand in what cases we would want to use the types defined in the AVX intrinsics.

For example, is _mm256_mul_ps explicitly telling the processor to execute all eight multiplications in parallel within this single call?

kayvonf

@wanze: Use of the _mm256_mul_ps(a,b) intrinsic function tells the compiler to emit a vector multiply instruction (vmulps) that multiplies the 8-wide FP32 vector a with the 8-wide FP32 vector b.
