jxx1998

So is it safe to say that this parallelization paradigm is never efficient, in both a CPU context and a GPU context?

trip

That's a good question. I think it's tricky, because we saw later on that this seemingly work-inefficient algorithm can actually perform better on chips with enough parallel execution resources, thanks to its smaller span. I think it's conditional on both your chip's capabilities and the amount of work you're processing.

kayvonf

Correct. In a few slides I show that on a SIMD machine, a work-inefficient algorithm can take fewer instructions than a work-efficient one, because the work-efficient one leaves many vector lanes idle.

So asymptotic work efficiency is not the whole story when working with finite-sized N (in particular, small N).

Copyright 2020 Stanford University
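Editor's note: assuming the algorithm under discussion is parallel prefix sum (scan) — the classic example of this work-vs-span trade-off — the instruction-count argument can be made concrete with a small sketch. It counts total additions (work) and parallel steps (span) for the step-efficient but work-inefficient Hillis-Steele scan versus the work-efficient Blelloch scan. The names, the n = 16 example, and the power-of-two input length are illustrative choices, not taken from the lecture.

```python
def hillis_steele_scan(a):
    """Step-efficient, work-INEFFICIENT inclusive scan:
    O(n log n) total additions, but only log2(n) parallel steps."""
    x = list(a)
    n = len(x)
    ops = steps = 0
    d = 1
    while d < n:
        # One parallel step: every lane i >= d adds x[i - d] into x[i].
        x = [x[i] + x[i - d] if i >= d else x[i] for i in range(n)]
        ops += n - d
        steps += 1
        d *= 2
    return x, ops, steps


def blelloch_scan(a):
    """Work-EFFICIENT scan (Blelloch): O(n) total additions,
    but about 2 * log2(n) parallel steps. Assumes len(a) is a
    power of two. Computes an exclusive scan, then converts it
    to an inclusive one for comparison."""
    n = len(a)
    x = list(a)
    ops = steps = 0
    # Upsweep (reduce) phase: each step touches half as many lanes
    # as the previous one -- most vector lanes sit idle.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
            ops += 1
        steps += 1
        d *= 2
    # Downsweep phase.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t = x[i - d]
            x[i - d] = x[i]
            x[i] += t
            ops += 1
        steps += 1
        d //= 2
    inclusive = [x[i] + a[i] for i in range(n)]
    return inclusive, ops, steps


data = list(range(1, 17))  # n = 16
hs, hs_ops, hs_steps = hillis_steele_scan(data)
bl, bl_ops, bl_steps = blelloch_scan(data)
assert hs == bl  # both compute the same inclusive prefix sum
print(f"Hillis-Steele: {hs_ops} adds in {hs_steps} steps")
print(f"Blelloch:      {bl_ops} adds in {bl_steps} steps")
```

On a hypothetical 16-lane SIMD machine where each parallel step is one vector instruction, the work-inefficient scan finishes in 4 steps while the work-efficient one needs 8, even though the latter performs fewer total additions (30 vs 49 here) — which is the point made above about idle vector lanes. For large n relative to the lane count, the extra work dominates and the work-efficient scan wins instead.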