Slide 77 of 88

mziv

I'm not sure I'm quite properly conceptualizing where and how this computation is broken up. Is it that each iteration of the forall loop can be run on a different core, and then the additional ALUs help us with the inner for loop? Or is it that the ALUs let us chunk up parts of the outer for loop (since we'll be executing the same sequence of instructions anyway) and we just divide our forall loop into 4 to utilize the 4 cores?

ufxela

@mziv my guess is that, since the loop iterations are independent, the compiler may be able to bundle 8 iterations (of the outer for loop) together using SIMD and assign one bundle to each core, so that 32 iterations are being executed at once. I don't think SIMD can help us with the inner for loop b/c each iteration depends on prior iterations.
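To make that mapping concrete, here is a rough Python sketch. It only emulates the idea and is not what a compiler would actually emit. The 4 cores and 8 lanes come from the numbers in this thread. The blocked assignment in `placement`, the names `scalar_body` and `simd_bundle`, and the loop body itself are all hypothetical stand-ins, since the slide's actual code isn't quoted here.

```python
NUM_CORES = 4    # from the thread: "4 cores"
SIMD_WIDTH = 8   # from the thread: "bundle 8 iterations"


def placement(i):
    """Return (batch, core, lane) for outer-loop iteration i.

    Assumes a simple blocked mapping: consecutive groups of SIMD_WIDTH
    iterations go to consecutive cores. A real runtime may choose differently.
    """
    batch, offset = divmod(i, NUM_CORES * SIMD_WIDTH)
    core, lane = divmod(offset, SIMD_WIDTH)
    return batch, core, lane


def scalar_body(x, steps):
    # Hypothetical loop body: each inner step needs the previous value of
    # acc, so the inner loop is inherently sequential.
    acc = 0.0
    for _ in range(steps):
        acc = acc * x + 1.0
    return acc


def simd_bundle(xs, steps):
    # Emulates one core running the same instruction stream on several
    # independent outer iterations in lockstep, one per SIMD lane.
    accs = [0.0] * len(xs)
    for _ in range(steps):  # inner loop still runs step by step
        # One "vector instruction": the same operation applied to every lane.
        accs = [a * x + 1.0 for a, x in zip(accs, xs)]
    return accs


if __name__ == "__main__":
    for i in (0, 7, 8, 31, 32):
        print(i, placement(i))
```

So the parallelism is all across outer iterations: the lanes within a core and the cores themselves each work on different outer iterations, while the inner loop runs sequentially inside every lane.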
