In this implementation, test_and_set is only called when there is a reasonable chance that no other processor holds the lock. This greatly reduces the bus/interconnect traffic caused by repeated test_and_set calls.
While the lock is held, the other processors now spin by reading the value of lock. Unlike test_and_set, a plain read does not invalidate the other processors' cached copies of the line, so bus contention is reduced.
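A minimal sketch of the test-and-test-and-set pattern described above, written in C11 with `<stdatomic.h>` and POSIX threads and assuming a cache-coherent multiprocessor. The names `ttas_lock_t`, `ttas_acquire`, `ttas_release`, and `run_counter_demo` are hypothetical, and `test_and_set` is modeled here with `atomic_exchange_explicit` rather than any particular hardware instruction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} ttas_lock_t;

/* test_and_set modeled as an atomic exchange (an assumption, not a specific
   instruction): store 1 and return the old value. Being a write, it needs the
   cache line in exclusive state, invalidating other processors' copies. */
static int test_and_set(atomic_int *p) {
    return atomic_exchange_explicit(p, 1, memory_order_acquire);
}

static void ttas_acquire(ttas_lock_t *l) {
    for (;;) {
        /* "Test": spin on plain loads. Each waiter reads its own shared copy
           of the line, so spinning causes no invalidations while the lock is held. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        }
        /* "Test-and-set": attempted only once the lock looked free. Another
           processor may still win the race, in which case we go back to spinning. */
        if (test_and_set(&l->locked) == 0) {
            return;
        }
    }
}

static void ttas_release(ttas_lock_t *l) {
    /* Sanity check: only the holder should release the lock. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Demo state: several threads increment a shared counter under the lock. */
static ttas_lock_t demo_lock = { 0 };
static long demo_counter = 0;
static int demo_iters = 0;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        ttas_acquire(&demo_lock);
        demo_counter++;
        ttas_release(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (1..16), each doing iters increments; returns the total. */
static long run_counter_demo(int nthreads, int iters) {
    pthread_t threads[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++) {
        pthread_create(&threads[i], NULL, demo_worker, NULL);
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
    }
    return demo_counter;
}
```

Calling `run_counter_demo(4, 100000)` should return 400000, since every increment happens under the lock. Note that the reduction in traffic applies while the lock is held: the releasing store still invalidates every waiter's copy, so the waiters briefly contend again with test_and_set at each handoff.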