Slide 78 of 82
ufxela

How is atomic add implemented? How many clocks does an atomic add take?

nickbowman

This is not reasonable CUDA code to run on a single-core GPU that only has resources for one thread block per core. If thread block 0 runs first, there are no issues: the flag gets incremented/set before thread block 1 runs, so thread block 1 sees the updated value when it evaluates its while loop condition and exits right away. However, if thread block 1 runs first, we have a major problem, because it will enter its while loop and never exit. Threads in CUDA are not preemptible and the core only has room for one thread block at a time, so thread block 0 will never get a chance to run and set the flag, leaving thread block 1 stuck in its while loop forever.
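The slide's code isn't reproduced in this thread, so here is a rough sketch of the kind of pattern I mean, not the actual code from the lecture. The names `flag_wait`, `run_flag_demo`, `saw_flag`, and `max_spins` are all made up for illustration. I added the `max_spins` bound only so the sketch always terminates; in the unbounded version, the spin loop is exactly where thread block 1 would hang if it got scheduled first on a GPU that can only hold one block.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: block 0 sets a flag, block 1 spins until it sees it.
// max_spins is an assumption added here so the sketch cannot hang; the
// problematic pattern discussed above has no such bound.
__global__ void flag_wait(int *flag, int *saw_flag, long long max_spins) {
    if (threadIdx.x != 0) return;          // one thread per block does the work

    if (blockIdx.x == 0) {
        atomicAdd(flag, 1);                // producer: increment/set the flag
    } else {
        int seen = 0;
        // atomicAdd(flag, 0) acts as an atomic read of the flag.
        for (long long s = 0; s < max_spins && !seen; ++s) {
            seen = atomicAdd(flag, 0);
        }
        *saw_flag = (seen != 0);           // 1 if block 0's update was observed
    }
}

// Host helper: returns 1 if block 1 saw the flag, 0 if it gave up,
// and -1 if a CUDA call failed (for example, no device available).
int run_flag_demo(long long max_spins) {
    int *flag = nullptr;
    int *saw = nullptr;
    if (cudaMalloc(&flag, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&saw, sizeof(int)) != cudaSuccess) {
        cudaFree(flag);
        return -1;
    }
    cudaMemset(flag, 0, sizeof(int));
    cudaMemset(saw, 0, sizeof(int));

    flag_wait<<<2, 32>>>(flag, saw, max_spins);   // two thread blocks

    int result = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&result, saw, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(flag);
    cudaFree(saw);
    return err == cudaSuccess ? result : -1;
}
```

Calling something like `run_flag_demo(1 << 20)` from `main` on a GPU with room for both blocks at once would most likely return 1, since block 0 gets to run alongside block 1. The point of the comment above is that CUDA makes no promise about that: correctness here depends on both blocks being resident at the same time, which the programming model does not guarantee.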
