Slide 20 of 50

icebear101

Why don't we need __syncthreads() between each if? Is it because this function is executed within a single warp (SIMD), so the first if is guaranteed to complete before the second if starts?

pmp

So is the work still O(N log N)? In practice many of those operations run in parallel, which gives an important constant factor (e.g. 1/16 of N log N), but that factor disappears in Big-O notation.
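To put numbers on the work question, here is a rough host-side sketch (plain C++, not the slide's CUDA code) that counts operator applications. It assumes the usual log-step pattern where, at each offset 1, 2, 4, ..., every lane at or above the offset applies the operator once. The function names are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of operator applications performed by a
// log-step scan over n elements (offsets 1, 2, 4, ... below n).
std::size_t log_step_scan_work(std::size_t n) {
    std::size_t ops = 0;
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        ops += n - offset;  // lanes offset..n-1 each apply the operator once
    }
    return ops;
}

// Hypothetical helper: a serial scan applies the operator once per element
// after the first.
std::size_t serial_scan_work(std::size_t n) {
    return n == 0 ? 0 : n - 1;
}
```

For n = 32 this gives 31 + 30 + 28 + 24 + 16 = 129 applications against 31 for the serial scan, so the total work does grow like N log N. Running 32 lanes at once shortens the time to 5 steps but does not reduce the work.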

mziv

@icebear101 Yes, I'm pretty sure this is a 32-wide SIMD computation, so all 32 lanes execute each line in lockstep and every line completes before the next one starts.
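A minimal host-side model of that lockstep assumption, in plain C++ rather than CUDA. It assumes addition as the operator and a 32-lane warp; `warp_scan_exclusive` and `kWarpSize` are illustrative names, not the slide's code. The snapshot copy stands in for "every lane reads before any lane writes", which is what lockstep execution is assumed to provide here.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // assumed warp width

// Hypothetical model of a warp-level exclusive scan with addition.
std::array<int, kWarpSize> warp_scan_exclusive(std::array<int, kWarpSize> data) {
    for (std::size_t offset = 1; offset < kWarpSize; offset *= 2) {
        // Lockstep model: all lanes read the old values before any lane writes.
        const std::array<int, kWarpSize> snapshot = data;
        for (std::size_t lane = offset; lane < kWarpSize; ++lane) {
            data[lane] = snapshot[lane - offset] + snapshot[lane];
        }
    }
    // data now holds the inclusive scan; shift by one lane for the exclusive
    // result, with the identity (0 for addition) in lane 0.
    std::array<int, kWarpSize> result{};
    for (std::size_t lane = 1; lane < kWarpSize; ++lane) {
        result[lane] = data[lane - 1];
    }
    return result;
}
```

Without the snapshot, a sequential loop would let a lane read a neighbor's value that was already updated in the same step. That is the kind of hazard a barrier would otherwise have to prevent.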

suninhouse

So would a more general approach be to put a barrier between every if, e.g., when the number of elements is larger than one SIMD computation can cover? Or do we break the input into multiple segments, each of which can be scanned by one 32-wide SIMD computation, as on the next slide?

sagoyal

@icebear101 I think this is due to the volatile qualifier on ptr, which makes sure that each change made in memory is visible to the other threads. Also, we don't need __syncthreads() since that is only necessary when multiple warps are involved; in this case the function is called on a per-warp basis.

sagoyal

Also, I think this function could easily return the inclusive scan instead of the exclusive one if we just replaced the last conditional (the lane > 0 check) and returned ptr[idx]?
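A small host-side sketch of that idea, in plain C++ with addition as the operator (the names `ScanResults`, `scan_both` and `kLanes` are hypothetical, and this is a model rather than the slide's code). After the log-step passes the array already holds the inclusive scan, and the exclusive result is the same data shifted by one lane with the identity in lane 0.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 32;  // assumed warp width

struct ScanResults {
    std::array<int, kLanes> inclusive;
    std::array<int, kLanes> exclusive;
};

// Hypothetical model: run the log-step passes once, then derive both results.
ScanResults scan_both(std::array<int, kLanes> data) {
    for (std::size_t offset = 1; offset < kLanes; offset *= 2) {
        const std::array<int, kLanes> snapshot = data;  // reads before writes
        for (std::size_t lane = offset; lane < kLanes; ++lane) {
            data[lane] = snapshot[lane - offset] + snapshot[lane];
        }
    }
    ScanResults out{};
    out.inclusive = data;  // analogous to returning ptr[idx]
    for (std::size_t lane = 1; lane < kLanes; ++lane) {
        out.exclusive[lane] = data[lane - 1];  // analogous to ptr[idx - 1]
    }
    // out.exclusive[0] stays 0, the identity for addition.
    return out;
}
```

With input 1, 2, 3, 4, ... lane 3 gets 10 in the inclusive result and 6 in the exclusive one.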
