nickbowman
weimin
Within the core, does it further divide the block into 4 warps and interleave the warps on the execution units?
kevtan
Is there a way to upvote @weimin's answer? I'm very curious as well.
blipblop
@weimin On the NVIDIA V100 sub-core, I think yes, to take advantage of pipelining. But on this fictional simple GPU without the core/sub-core/warp structure, I don't think warps are even part of our consideration.
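A rough sketch of the division being asked about, assuming NVIDIA's fixed warp width of 32 (the 128-thread block size is just an example, not something fixed by the lecture):

```python
WARP_SIZE = 32  # NVIDIA's fixed warp width

def warps_in_block(block_threads):
    # A block of N threads is split into ceil(N / 32) warps;
    # warp i holds consecutive threads [32*i, 32*(i+1)).
    return (block_threads + WARP_SIZE - 1) // WARP_SIZE

# A 128-thread block yields 4 warps, which the core's scheduler
# can interleave on the execution units to hide pipeline latency.
print(warps_in_block(128))  # 4
```

Note the ceiling division: a 100-thread block still occupies 4 warps, with the last warp partially empty.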
chamusyuan
I was initially confused about whether a thread block, which can be broken down into several warps, would be placed on the same core or spread across different cores. Then I realized that a thread block has a chunk of shared block-level memory, so all of its warps must be placed on the same core.
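This constraint also bounds how many blocks a core can host at once: each resident block claims its own shared-memory allocation from the core's fixed pool. A minimal sketch of that arithmetic (the 96 KB pool is roughly a V100 SM's shared memory; the per-block figure is a hypothetical kernel parameter):

```python
def blocks_per_core(smem_per_core, smem_per_block):
    # All warps of a block share one block-level memory allocation,
    # so the whole block must live on one core; the core can host
    # only as many blocks as its shared-memory pool allows.
    if smem_per_block == 0:
        return None  # no shared-memory limit on residency
    return smem_per_core // smem_per_block

# Hypothetical kernel using 32 KB of shared memory per block
# on a core with a 96 KB pool: at most 3 blocks resident at once.
print(blocks_per_core(96 * 1024, 32 * 1024))  # 3
```

(Real hardware also caps residency by register count, warp slots, and a per-core block limit; this shows only the shared-memory term.)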
In this case, because the threads in a block are completely independent of one another (no communication is required), it is better to schedule the second block onto the empty core to take advantage of as many compute resources as possible.

Copyright 2020 Stanford University