jxx1998

So is it safe to say that this parallelization paradigm is never efficient, in both a CPU context and a GPU context?

trip

That's a good question. I think it's tricky, because we saw later on that this seemingly work-inefficient algorithm can actually perform better on chips with enough parallel execution resources, thanks to its smaller span. I think it's conditional on both your chip's capabilities and the amount of work you're processing.

kayvonf

Correct. In a few slides I show that on a SIMD machine, a work-inefficient algorithm can take fewer instructions than a work-efficient one, because the work-efficient one leaves many vector lanes idle.

So asymptotic work efficiency is not the whole story when working with finite-sized N (in particular, small N).

Copyright 2020 Stanford University
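Editor's note: assuming the algorithm under discussion is parallel prefix sum (scan) — the classic example of this work-vs-span trade-off — the instruction-count argument can be made concrete with a small sketch. It counts total additions (work) and parallel steps (span) for the step-efficient but work-inefficient Hillis-Steele scan versus the work-efficient Blelloch scan. The names, the n = 16 example, and the power-of-two input length are illustrative choices, not taken from the lecture.

```python
def hillis_steele_scan(a):
    """Step-efficient, work-INEFFICIENT inclusive scan:
    O(n log n) total additions, but only log2(n) parallel steps."""
    x = list(a)
    n = len(x)
    ops = steps = 0
    d = 1
    while d < n:
        # One parallel step: every lane i >= d adds x[i - d] into x[i].
        x = [x[i] + x[i - d] if i >= d else x[i] for i in range(n)]
        ops += n - d
        steps += 1
        d *= 2
    return x, ops, steps


def blelloch_scan(a):
    """Work-EFFICIENT scan (Blelloch): O(n) total additions,
    but about 2 * log2(n) parallel steps. Assumes len(a) is a
    power of two. Computes an exclusive scan, then converts it
    to an inclusive one for comparison."""
    n = len(a)
    x = list(a)
    ops = steps = 0
    # Upsweep (reduce) phase: each step touches half as many lanes
    # as the previous one -- most vector lanes sit idle.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            x[i] += x[i - d]
            ops += 1
        steps += 1
        d *= 2
    # Downsweep phase.
    x[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            t = x[i - d]
            x[i - d] = x[i]
            x[i] += t
            ops += 1
        steps += 1
        d //= 2
    inclusive = [x[i] + a[i] for i in range(n)]
    return inclusive, ops, steps


data = list(range(1, 17))  # n = 16
hs, hs_ops, hs_steps = hillis_steele_scan(data)
bl, bl_ops, bl_steps = blelloch_scan(data)
assert hs == bl  # both compute the same inclusive prefix sum
print(f"Hillis-Steele: {hs_ops} adds in {hs_steps} steps")
print(f"Blelloch:      {bl_ops} adds in {bl_steps} steps")
```

On a hypothetical 16-lane SIMD machine where each parallel step is one vector instruction, the work-inefficient scan finishes in 4 steps while the work-efficient one needs 8, even though the latter performs fewer total additions (30 vs 49 here) — which is the point made above about idle vector lanes. For large n relative to the lane count, the extra work dominates and the work-efficient scan wins instead.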