nickbowman
weimin
Within the core, does it further divide the block into 4 warps and interleave the warps on the execution units?
kevtan
Is there a way to upvote @weimin's answer? I'm very curious as well.
blipblop
@weimin On the NVIDIA V100 sub-core, I think yes, to take advantage of pipelining. But on this fictional simple GPU without the core/sub-core/warp structure, I don't think warps are even part of our consideration.
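A rough sketch of the division being asked about, assuming NVIDIA's fixed warp width of 32 (the 128-thread block size is just an example, not something fixed by the lecture):

```python
WARP_SIZE = 32  # NVIDIA's fixed warp width

def warps_in_block(block_threads):
    # A block of N threads is split into ceil(N / 32) warps;
    # warp i holds consecutive threads [32*i, 32*(i+1)).
    return (block_threads + WARP_SIZE - 1) // WARP_SIZE

# A 128-thread block yields 4 warps, which the core's scheduler
# can interleave on the execution units to hide pipeline latency.
print(warps_in_block(128))  # 4
```

Note the ceiling division: a 100-thread block still occupies 4 warps, with the last warp partially empty.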
chamusyuan
I was initially confused about whether a thread block, which can be broken down into several warps, would be placed on the same core or spread across different cores. Then I realized that a thread block has a chunk of shared block-level memory, so all of its warps must be placed on the same core.
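This constraint also bounds how many blocks a core can host at once: each resident block claims its own shared-memory allocation from the core's fixed pool. A minimal sketch of that arithmetic (the 96 KB pool is roughly a V100 SM's shared memory; the per-block figure is a hypothetical kernel parameter):

```python
def blocks_per_core(smem_per_core, smem_per_block):
    # All warps of a block share one block-level memory allocation,
    # so the whole block must live on one core; the core can host
    # only as many blocks as its shared-memory pool allows.
    if smem_per_block == 0:
        return None  # no shared-memory limit on residency
    return smem_per_core // smem_per_block

# Hypothetical kernel using 32 KB of shared memory per block
# on a core with a 96 KB pool: at most 3 blocks resident at once.
print(blocks_per_core(96 * 1024, 32 * 1024))  # 3
```

(Real hardware also caps residency by register count, warp slots, and a per-core block limit; this shows only the shared-memory term.)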
In this case, because the threads in a block are completely independent of one another (no communication is required), it is better to schedule the second block onto the empty core to take advantage of as many compute resources as possible.

Copyright 2020 Stanford University