Here the warp-level and block-level pictures are algorithmically accomplishing similar subtasks, right? We aren't switching back and forth between different strategies for the two scales?
kaiang
If there were more blocks than the number of elements that can fit in a block, would we do this recursively?
If this were a larger application where each block was a machine, how would we handle failures in single machines?
suninhouse
Block here does not mean thread block; it means a block of the elements to scan.
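To make the "block of elements" reading and the recursion question concrete, here is a minimal sequential sketch in Python. It is only an illustration, not the slide's implementation: the names `exclusive_scan_serial` and `blocked_exclusive_scan` and the `block_size` parameter are made up for this example. Each block of elements is scanned on its own, the per-block sums are scanned (recursively, if there are more block sums than fit in one block), and the resulting offsets are added back.

```python
def exclusive_scan_serial(xs):
    """Plain sequential exclusive prefix sum over one block of elements."""
    out, running = [], 0
    for x in xs:
        out.append(running)
        running += x
    return out


def blocked_exclusive_scan(xs, block_size=4):
    """Exclusive scan over blocks of elements (hypothetical helper).

    If the number of block sums exceeds block_size, the scan of the
    block sums recurses on itself.
    """
    if block_size < 2:
        raise ValueError("block_size must be at least 2 so the recursion shrinks")
    if len(xs) <= block_size:
        return exclusive_scan_serial(xs)

    # Split the input into blocks of elements (not thread blocks).
    blocks = [xs[i:i + block_size] for i in range(0, len(xs), block_size)]

    # Step 1: scan each block independently (this is the parallel part on a GPU).
    local = [exclusive_scan_serial(b) for b in blocks]

    # Step 2: scan the per-block totals; recurse if they do not fit in one block.
    block_sums = [sum(b) for b in blocks]
    offsets = blocked_exclusive_scan(block_sums, block_size)

    # Step 3: add each block's offset to its locally scanned values.
    return [off + v for off, scanned in zip(offsets, local) for v in scanned]
```

With 20 elements and `block_size=4`, there are 5 block sums, which is more than one block holds, so the scan of the block sums goes one level deeper before the offsets are added back.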