Previous | Next --- Slide 21 of 50

jxx1998

Why are we still calling this "work-efficient" formulation, when it has a higher constant term for the work big-O wise?

MasonLlewellyn

@jxx1998 I think that the "work-efficient" here is referring to the fact that each thread in this case handles fewer elements in the array than the two-processor sequential scan implementation.

weimin

Because of lane masking, even if we implemented the work efficient version, there would still be work done except the result is masked. So in SIMT we do not save work but if we were actually using pthreads then the work efficient version would make sense.

xhe17

My current understanding of the reason that this implementation with run time O(nlog(n)) is better compared to the work-efficient formulation of scan mentioned in page 16 with run time O(n), in this specific problem, is that the data processed here is only 32 bytes wide, which are able to be processed completely parallely due to 32-wide SIMD computation. In more general cases, the work-efficient version would be better.

Please log in to leave a comment.