Slide 59 of 82
blipblop

With 64 KB of registers per sub-core, 16 warps per sub-core, and 32 threads per warp, does this mean that each thread has 128 bytes of private register space?
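The arithmetic in the question can be checked with a short sketch. It uses the figures quoted above (64 KB of registers per sub-core, 16 warps per sub-core, 32 threads per warp) and additionally assumes 32-bit (4-byte) registers and that every warp slot is occupied.

```python
# Register budget per thread on one V100 sub-core, using the figures from the question.
REGISTER_FILE_BYTES = 64 * 1024   # 64 KB register file per sub-core
WARPS_PER_SUBCORE = 16
THREADS_PER_WARP = 32
BYTES_PER_REGISTER = 4            # assumption: 32-bit registers

def bytes_per_thread():
    """Register-file bytes available to each thread when all warp slots are full."""
    threads = WARPS_PER_SUBCORE * THREADS_PER_WARP   # 512 resident threads
    return REGISTER_FILE_BYTES // threads

def registers_per_thread():
    """The same budget expressed as a count of 32-bit registers."""
    return bytes_per_thread() // BYTES_PER_REGISTER

print(bytes_per_thread(), registers_per_thread())  # 128 32
```

Under these assumptions 128 bytes (32 registers) per thread holds only when all 16 warps are resident; a kernel that needs more registers per thread cannot keep every warp slot filled.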

rosalg

After pondering for quite a while, I came to this conclusion about the connection between CUDA threads, CUDA thread blocks, warps and execution contexts:

Each core contains a number of warps, and each warp holds a certain number of execution contexts (32 in this case), one per CUDA thread. Instead of mapping individual threads, we map whole thread blocks onto a core, and the block's threads are then grouped into warps of 32. Mapping at block granularity is what makes per-block shared memory possible, since every thread in a block runs on the same core.
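The grouping of a block's threads into warps can be sketched as follows. This assumes a one-dimensional thread block and a warp size of 32; `warp_of` and `warps_needed` are hypothetical helpers, not CUDA APIs, mirroring the usual `threadIdx.x / warpSize` and `threadIdx.x % warpSize` computation.

```python
WARP_SIZE = 32  # threads per warp (assumed, as on the slide)

def warp_of(thread_idx):
    """Return (warp index within the block, lane within the warp) for a 1-D block."""
    return thread_idx // WARP_SIZE, thread_idx % WARP_SIZE

def warps_needed(threads_per_block):
    """Warps a block occupies; a partially filled last warp still uses a full warp slot."""
    return (threads_per_block + WARP_SIZE - 1) // WARP_SIZE

# A 100-thread block occupies 4 warps; thread 70 runs in warp 2, lane 6.
print(warps_needed(100), warp_of(70))  # 4 (2, 6)
```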

Ethan

A sanity check: can a single V100 SM support up to 64 warps × 32 threads = 2048 CUDA threads? And in the extreme case, could a launch like kernel<<<64, 32>>> (64 blocks of 32 threads each) be mapped onto a single SM?
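The sanity check can be worked out with a small occupancy sketch. It assumes the per-SM limits the CUDA C++ Programming Guide lists for compute capability 7.0 (V100): at most 64 resident warps, 2048 resident threads, and 32 resident blocks. It deliberately ignores register and shared-memory limits, which can only lower the result; `resident_blocks` and `resident_threads` are hypothetical helpers.

```python
# Per-SM residency limits assumed for compute capability 7.0 (V100).
MAX_WARPS_PER_SM = 64
MAX_THREADS_PER_SM = 2048
MAX_BLOCKS_PER_SM = 32
WARP_SIZE = 32

def resident_blocks(threads_per_block):
    """Blocks of this size that can be resident on one SM at once,
    considering only the warp-count and block-count limits."""
    warps_per_block = (threads_per_block + WARP_SIZE - 1) // WARP_SIZE
    return min(MAX_BLOCKS_PER_SM, MAX_WARPS_PER_SM // warps_per_block)

def resident_threads(threads_per_block):
    """Total resident threads for blocks of this size."""
    return resident_blocks(threads_per_block) * threads_per_block

# 64 warps * 32 threads matches the 2048-thread limit.
assert MAX_WARPS_PER_SM * WARP_SIZE == MAX_THREADS_PER_SM

# kernel<<<64, 32>>>: only 32 one-warp blocks fit, i.e. 1024 threads, not 2048.
print(resident_blocks(32), resident_threads(32))    # 32 1024
# 256-thread blocks: 8 blocks of 8 warps each fill all 64 warp slots.
print(resident_blocks(256), resident_threads(256))  # 8 2048
```

Under these limits the first answer is yes, but the 32-block cap means the extreme launch would not be fully resident on one SM. In practice the hardware block scheduler, not the programmer, decides which SM each block runs on.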

pintos

This slide shows shared memory and the L1 cache as a single storage unit. Does that mean a program that uses more shared memory gets a smaller L1 cache?
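The trade-off can be sketched with figures assumed from NVIDIA's Volta Tuning Guide: each V100 SM has 128 KB of combined L1/shared storage, and the shared-memory portion can be configured to 0, 8, 16, 32, 64, or 96 KB, with the remainder serving as L1. `l1_for_shared` is a hypothetical helper; in practice the driver chooses the configuration, and a kernel can only state a preference (for example through the `cudaFuncAttributePreferredSharedMemoryCarveout` attribute).

```python
# Combined L1/shared storage per V100 SM and the supported shared-memory sizes
# (assumed figures from the Volta Tuning Guide).
UNIFIED_KB = 128
SHARED_CONFIGS_KB = [0, 8, 16, 32, 64, 96]

def l1_for_shared(shared_kb_needed):
    """Pick the smallest supported shared-memory carveout covering the request
    and return (carveout_kb, remaining_l1_kb)."""
    for carveout in SHARED_CONFIGS_KB:
        if carveout >= shared_kb_needed:
            return carveout, UNIFIED_KB - carveout
    raise ValueError("request exceeds the 96 KB shared-memory maximum")

print(l1_for_shared(0))   # (0, 128)
print(l1_for_shared(20))  # (32, 96)
print(l1_for_shared(96))  # (96, 32)
```

Under these assumptions, the more shared memory the resident blocks need, the less of the 128 KB remains for L1.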
