Previous | Next --- Slide 6 of 63

felixw17

I forget whether this was mentioned in lecture, but I think it'd be great to make the observation that the actual computation of the sinx function (the computation being performed by the inner loop) has approximately the same runtime for each input element x[i]. This means that an 8-way parallelization through SIMD/ISPC will actually achieve an 8x speedup, since no amount of vector operations are wasted. This idea is seen very clearly in the first assignment. For example, even though the sqrt function can be parallelized, it can fail to achieve an 8x speedup when the 8 different calculations being performed take very different amounts of time.

Please log in to leave a comment.