nickbowman

An important concept illustrated on this slide is that the core that owns the lock is not necessarily the processor whose cache holds the line corresponding to the lock's memory address. In particular, since test-and-set is an atomic operation with a potential write, an attempt to acquire a lock that is already held by another core is treated just like a normal write by the cache coherence protocol (it results in sending a BusRdX). This means that every attempted acquire by a core not currently holding the lock sends a BusRdX that invalidates the cache line in every other cache currently storing that line. As illustrated in the slide, this sort of spin lock can generate a lot of unnecessary cache coherence bus traffic!
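
To make the traffic pattern concrete, here is a minimal test-and-set spin lock sketch using C11 atomics (the names tas_lock_t, tas_lock, and tas_unlock are illustrative, not from the slide). The key point is that every spin iteration is an atomic read-modify-write, so the coherence protocol has to treat it as a write even when the acquire fails:

```c
#include <stdatomic.h>

typedef struct { atomic_int held; } tas_lock_t;   /* 0 = free, 1 = held */

static void tas_lock(tas_lock_t *l) {
    /* Spin: each iteration issues an atomic read-modify-write, i.e. a
     * write from the coherence protocol's point of view, so the line is
     * invalidated in every other cache even when the lock is not free. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 1)
        ;  /* lock was already held; try again */
}

static void tas_unlock(tas_lock_t *l) {
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```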

fizzbuzz

Could one design a cache coherence system where a test-and-set instruction only tried to upgrade to the shared state on success?

chii

In a poorly designed system, after the line is updated in the cache, if execution is not transferred to another processor, it is possible that Processor 1 will acquire the lock again and again, leading to a loss of parallelism.

jchen

@fizzbuzz Do you mean request the line in the shared state to test it, and then upgrade to the modified state on success? That seems like it would require splitting the test-and-set instruction into two parts, since you would need to send two messages on the bus, so it would probably be difficult to define a single atomic instruction that works that way. The idea of test-and-set is that it groups the check and the set into one atomic operation, so for the whole instruction to execute, the cache line needs to be in the modified state.
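
For reference, a hedged sketch of the test-and-test-and-set idea this exchange points toward (the ttas_lock name and type are illustrative, not from the slide): waiters spin on a plain load, so the line can stay in the shared state in every waiter's cache, and the atomic exchange, which needs the line in the modified state, is only attempted once the lock looks free:

```c
#include <stdatomic.h>

typedef struct { atomic_int held; } ttas_lock_t;  /* 0 = free, 1 = held */

static void ttas_lock(ttas_lock_t *l) {
    for (;;) {
        /* "test": a plain load; all waiters can hold the line shared,
         * so spinning here generates no invalidation traffic. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed) == 1)
            ;
        /* "test-and-set": only now issue the atomic read-modify-write,
         * which needs the line in the modified state. The exchange is
         * still required because the load above and this set are not a
         * single atomic step, so another core may win the race. */
        if (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0)
            return;  /* acquired */
    }
}
```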

sagoyal

@chii I believe that in this model all three processors are running concurrently, so I don't think it's possible for a single processor to keep acquiring the lock over and over again. And even if, for the sake of argument, the same processor did keep acquiring the lock over and over again, in the case where the lock was attached to some thread pool, that thread would still be making progress doing useful work. So I'm not sure if the same processor acquiring the lock over and over again would even be a problem.

danieljm

@sagoyal actually I believe that the issue of whether the same processor can keep acquiring the lock over and over again is not inherently solved by having all processors run concurrently. This is a matter of how the lock decides its next owner. If the lock uses a priority-based scheme where one thread has higher priority than the rest, I believe a single thread could potentially starve all other threads (though whether this actually happens in practice is a different story).
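
As an aside on the fairness point, a FIFO-ordered lock such as a ticket lock is one arbitration scheme that rules out this kind of starvation. The sketch below is illustrative only and not something shown on the slide:

```c
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* ticket handed to the next arriving thread */
    atomic_uint now_serving;   /* ticket currently allowed to hold the lock */
} ticket_lock_t;

static void ticket_lock(ticket_lock_t *l) {
    /* Take a ticket; threads acquire the lock in the order tickets
     * were taken, so no single thread can starve the others. */
    unsigned my_ticket = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                                   memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving, memory_order_acquire)
           != my_ticket)
        ;  /* spin until it is this thread's turn */
}

static void ticket_unlock(ticket_lock_t *l) {
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}
```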
