Memory Consistency + Implementation Synchronization

suninhouse

Note that between memory fences, the memory system is free to reorder operations in any way the consistency model permits, which lets it fully optimize for performance.

nickbowman

In this case, a fence is similar to the "barrier" we saw for synchronizing multiple threads (all threads must reach the barrier before any proceed), except that it applies within a single thread: all memory operations the thread issued before the fence must complete before the thread continues execution and issues more memory operations.
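To make the single-thread ordering concrete, here is a minimal C++ sketch of message passing with explicit fences. The names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are made up for illustration and do not come from the lecture; the only library calls are the standard `std::atomic_thread_fence`, `std::atomic`, and `std::thread`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, chosen only for this illustration.
int payload = 0;                  // ordinary (non-atomic) data
std::atomic<bool> ready{false};   // flag that publishes the data

void producer() {
    payload = 42;                                         // write the data
    std::atomic_thread_fence(std::memory_order_release);  // the write above may not be reordered after the store below
    ready.store(true, std::memory_order_relaxed);         // publish
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {      // wait until the flag is observed
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // the read below may not be reordered before the load above
    return payload;                                       // observes the producer's write
}

int run_message_passing() {
    payload = 0;
    ready.store(false);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    assert(seen == 42);  // holds because the two fences synchronize through `ready`
    return seen;
}
```

Without the two fences, the relaxed store and load on `ready` would place no ordering on the accesses to `payload`, so the consumer's read would be a data race.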

gpu

In lecture, it was mentioned that fences can have significant overhead, enough that code written for a relaxed memory consistency model can end up running slower than the same code under a stricter model. It was explained that it is hard to know exactly where fences are and are not needed, and as a result, the final code can take a performance hit. This is why it's generally recommended to use a synchronization library: experts have already implemented the complex fencing operations and largely abstracted those calls away from the application programmer.
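As a rough illustration of the library approach, the sketch below protects a shared counter with `std::mutex` and never writes a fence by hand; the ordering guarantees come from the lock and unlock operations themselves. The names `counter`, `counter_mutex`, `add_many`, and `run_counter` are hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical names, chosen only for this illustration.
std::mutex counter_mutex;
long counter = 0;

void add_many(int n) {
    for (int i = 0; i < n; ++i) {
        // Locking acquires and unlocking releases, so no explicit fence is needed here.
        std::lock_guard<std::mutex> guard(counter_mutex);
        ++counter;
    }
}

long run_counter(int num_threads, int n) {
    counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back(add_many, n);
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;  // every increment is visible once all threads have joined
}
```

Calling `run_counter(4, 10000)` returns 40000 on any conforming implementation, regardless of how relaxed the underlying hardware memory model is.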

Why would fencing incur such a performance hit compared to strict memory consistency? Would it not be a spectrum, with full memory consistency on one end and no memory consistency on the other?
