fzh

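The left program performs worse because of false sharing: different elements of the array counter may reside in the same cache line, so a write by one thread invalidates that line in the other threads' caches even though each thread only touches its own element.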
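A minimal sketch of the two layouts, for anyone who wants to try this outside the slide. It is not the slide's code: the names PackedCounter, PaddedCounter, and run_counters are made up for illustration, and the 64-byte line size is an assumption that holds on most x86 machines but not everywhere.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kCacheLine = 64;  // ASSUMED cache line size in bytes
constexpr int kThreads = 4;             // hypothetical thread count

// Packed layout: adjacent counters are 4 bytes apart, so several of them
// can land in one cache line (the false-sharing case).
struct PackedCounter { int value; };

// Padded layout: alignas rounds each element up to a full cache line,
// so no two counters share a line.
struct alignas(kCacheLine) PaddedCounter { int value; };

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each padded counter should occupy exactly one line");

// Each thread increments only its own counter `iters` times.
// Returns the sum of all counters so the result can be checked.
template <typename Counter>
long long run_counters(int iters) {
    Counter counters[kThreads] = {};  // zero-initialized
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) {
        threads.emplace_back([&counters, t, iters] {
            // volatile access stops the compiler from folding the loop
            // into a single add, so every iteration really writes memory.
            volatile int* p = &counters[t].value;
            for (int i = 0; i < iters; ++i) {
                *p = *p + 1;
            }
        });
    }
    for (auto& th : threads) th.join();

    long long total = 0;
    for (const auto& c : counters) total += c.value;
    return total;
}
```

Both run_counters<PackedCounter>(n) and run_counters<PaddedCounter>(n) compute the same totals; only the memory layout differs. Timing the two calls with a large n is one way to see the effect, though how big the gap is depends on the machine and on whether the threads end up on different cores.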

sagoyal

I was having a hard time understanding how this content overlapped with what Kunle was talking about with shared memory bank conflicts, and I realized that in CUDA these two issues can actually overlap. This video (at 11:47) gives a good explanation of how padding can reduce bank conflicts.
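To make the padding idea concrete, here is a small host-side sketch in plain C++ (not CUDA, and not taken from the video) that only models the address-to-bank mapping. It assumes the commonly described layout of 32 banks of 4-byte words, where word w maps to bank w % 32, and a hypothetical 32x32 tile in which thread t of a warp reads element [t][col]. The function name max_conflict_degree is made up for illustration.

```cpp
#include <algorithm>
#include <array>

constexpr int kBanks = 32;  // ASSUMED: 32 banks, one 4-byte word wide each
constexpr int kWarp  = 32;  // threads in one warp

// Thread t reads word (t * stride + col) of a tile whose rows are
// `stride` words long. Returns the largest number of threads that
// map to the same bank (1 means conflict-free).
int max_conflict_degree(int stride, int col) {
    std::array<int, kBanks> hits{};  // accesses per bank, zero-initialized
    for (int t = 0; t < kWarp; ++t) {
        int word = t * stride + col;
        ++hits[word % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

With an unpadded row length of 32 words, max_conflict_degree(32, 0) is 32: every thread in the column access hits the same bank. Padding each row by one word, so the stride is 33, gives max_conflict_degree(33, 0) == 1, because consecutive rows now start in consecutive banks. Note the direction of the trick relative to the slide: here padding spreads accesses across banks, whereas the cache-line padding above keeps unrelated data apart.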
