Slide 10 of 90
haiyuem

It is called "false sharing" because the processors are not actually sharing any data; they just happen to write to the same cache line.

nickbowman

Kayvon's key takeaway in the chat for this series of slides on false sharing is that cache coherence works at the granularity of cache lines (64 bytes), not at the granularity of ints (4 bytes). This difference can cause a lot of artifactual communication in a seemingly normal memory access pattern, which is why programmers must be aware of how the underlying caches on a chip are implemented.

nickbowman

Also, there was a note made in class that there is no performance impact when multiple cores are reading different data from the same cache line, due to the "multiple reader" part of the SWMR (single writer, multiple reader) invariant. False sharing issues only arise when different cores are writing to different data that happens to be in the same cache line.

potato

Clarification question: "The same cache line" refers to the same address chunk, and not just to any two addresses that happen to map to the same place in the cache? (ie. different tag but same index/offset)

kayvonf

@potato. Correct. When I write "cache line" here, I am referring to a cache-line-sized chunk of the address space. On x86, that's 64 bytes.

andykhuu

The key takeaway I got from this portion of the lecture is that false sharing arises as a problem only because caches exist. Just as cache coherence would not be a problem without caches, false sharing would not hurt performance without them either. A common scenario in which false sharing can hinder performance is a partial-result array whose elements are each modified by a distinct thread. If the entire partial-result array fits within one cache line, the threads modifying the array constantly contend for the exclusive state, needlessly invalidating one another's copies and slowing the program down.

tspint

Another realistic case where false sharing can occur is a global array of per-thread "statistics" (think an array of structs, where one entry holds the statistics for one thread). If these structs are not cache-line-aligned (sized to a multiple of 64 bytes on x86), there can be a noticeable performance hit when threads update their own stats structs that happen to share a cache line. An easy fix is to pad and align each thread's struct to the cache-line size.
