^As a follow-up question, does that mean that each CUDA thread is effectively just a different piece of work (a "task," for lack of a better word) that will at some point be executed by the processor, constrained by whatever dependencies exist between those tasks?
@yonkus I think of it this way: CUDA threads are logical, but the chip maintains hardware execution contexts for the threads it is currently running, so we can also think of them as "tasks" that get mapped onto that hardware when a workload runs.
Why is the answer to the question "no"? It seems we would have 1 million local variables and per-thread stacks for the 1 million threads, plus 8K instances of shared variables (the support variable). Am I missing something?
@donquixote No, because the threads in a block execute against the same instance of the shared variables rather than a private copy per thread, similar to "uniform" variables in ISPC.
To answer the question on the slide: no, running the CUDA program will not simultaneously create 1 million instances of local variables and per-thread stacks. Instead, we see behavior similar to launching ISPC tasks: each thread block defines an independent portion of work that can be parallelized. The GPU is then responsible for assigning blocks to its cores in a way that makes the best use of available hardware resources, so only the threads currently resident on the chip have live state.
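A minimal sketch of the kind of kernel this discussion is about may help. The kernel name, sizes, and exact body below are assumptions (the original slide is not shown here); what matters is the distinction between per-thread local variables, the per-block `__shared__` `support` array, and the launch of roughly 1M logical threads as 8K blocks:

```
#define THREADS_PER_BLK 128

__global__ void convolve(int N, float* input, float* output) {
    // One instance of `support` per thread *block* (8K blocks -> 8K
    // instances), shared by all 128 threads in the block.
    __shared__ float support[THREADS_PER_BLK + 2];

    // Per-thread local variables. These exist only for threads that are
    // currently resident on the GPU, not for all 1M threads at once.
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    support[threadIdx.x] = input[index];
    if (threadIdx.x < 2)
        support[THREADS_PER_BLK + threadIdx.x] = input[index + THREADS_PER_BLK];

    __syncthreads();  // wait until the whole block has filled `support`

    float result = 0.0f;  // another per-thread local
    for (int i = 0; i < 3; i++)
        result += support[threadIdx.x + i];
    output[index] = result / 3.0f;
}

// Host side: launch 8K blocks of 128 threads (~1M logical threads).
// The GPU schedules blocks onto cores as resources free up; it never
// allocates contexts for all 1M threads simultaneously.
// convolve<<<N / THREADS_PER_BLK, THREADS_PER_BLK>>>(N, d_input, d_output);
```

The key point is that the "1 million threads" describe the logical decomposition of the problem, while the hardware only allocates execution contexts (and their local variables) for the blocks it has chosen to run at a given moment.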