Nian

Can anyone explain to me why the number is 163,840 CUDA threads/chip? Thanks.

nickbowman

I believe that number comes from the fact that a warp is composed of 32 CUDA threads – given that we have 5,120 warps/chip, multiplying by 32 gives us 163,840 CUDA threads/chip.
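
For anyone who wants to sanity-check the arithmetic, here is a minimal sketch. The 80 SMs and 64 resident warps per SM are assumptions inferred from the figures quoted in this thread (5,120 warps/chip over 80 SMs), not taken from a datasheet:

```c
#include <stdio.h>

int main(void) {
    // Figures assumed from this thread (V100-class GPU).
    const int sms_per_chip     = 80;  // streaming multiprocessors per chip
    const int warps_per_sm     = 64;  // resident warp execution contexts per SM (5,120 / 80)
    const int threads_per_warp = 32;  // CUDA threads per warp

    int warps_per_chip   = sms_per_chip * warps_per_sm;        // 80 * 64    = 5,120
    int threads_per_chip = warps_per_chip * threads_per_warp;  // 5,120 * 32 = 163,840

    printf("warps/chip   = %d\n", warps_per_chip);
    printf("threads/chip = %d\n", threads_per_chip);
    return 0;
}
```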

anon33

Is it correct to understand that, of these 163,840 threads, only 5,120 can be executing an fp16 operation at once? (80 SMs * 4 sub-cores/SM * 16-wide fp16 SIMT)

itoen

I think that's correct. Another way to compute it: only one out of every 32 resident warps in each SM is executing at any moment, which gives (1/32) * 163,840 = 5,120 threads when all the warps happen to be executing fp16 rather than any other operation.
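
A quick check of the peak-issue arithmetic from the two comments above. The per-chip counts are taken as quoted in this thread (treat them as assumptions about the specific chip, not verified specs):

```c
#include <stdio.h>

int main(void) {
    // Figures as quoted in this thread (assumed, not checked against a datasheet).
    const int sms_per_chip     = 80;     // SMs per chip
    const int subcores_per_sm  = 4;      // sub-cores (warp schedulers) per SM
    const int fp16_lanes       = 16;     // fp16 SIMT width per sub-core, per the question above
    const int resident_threads = 163840; // total resident CUDA threads per chip

    int executing = sms_per_chip * subcores_per_sm * fp16_lanes;  // 80 * 4 * 16 = 5,120
    printf("threads executing fp16 at once = %d\n", executing);
    printf("fraction of resident threads   = 1/%d\n", resident_threads / executing);  // 1/32
    return 0;
}
```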

ishangaur

How do they decide the sizes for these various sub-categories, like threads per warp, warps per core, etc.? It seems like they dumped as much stuff as they could reasonably control at each level of abstraction and then made it powers of two... Looking back at the early lectures, where normal programs only had about 4/5-way ILP, are there metrics, at each granularity of memory access and instruction replication, for how many execution contexts are actually useful? Or is it the other way around, that workloads are scaling to the capacity of these GPUs?
