Performance Optimization II: Locality, Communication, and Contention

Slide 79 of 92

cmchiang

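If there are not enough processors, each processor's working set might be bigger than its per-processor cache. If we keep iterating over the data in a loop, then by the time we reach the end of the data, the beginning has already been evicted from the cache. This causes cache misses in every iteration. With enough processors, each processor's working set fits in its own cache, so those repeated misses go away and each processor runs faster than it would on a proportional share of the original workload. That is why we can get super-linear speedup with enough processors.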
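
To make the eviction pattern concrete, here is a small simulation sketch. It is only a toy model under stated assumptions: a fully associative cache with LRU replacement, data measured in whole cache lines, and the data split evenly across processors. The names `count_misses`, `misses_per_processor`, `TOTAL_LINES`, `CACHE_LINES`, and `SWEEPS` are made up for illustration and do not come from the slide.

```python
from collections import OrderedDict

def count_misses(working_set, cache_lines, sweeps):
    """Count misses for repeated in-order sweeps over `working_set` lines,
    using a fully associative LRU cache of `cache_lines` lines (assumed model)."""
    cache = OrderedDict()  # keys are line ids, ordered from least to most recently used
    misses = 0
    for _ in range(sweeps):
        for line in range(working_set):
            if line in cache:
                cache.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1                    # miss: bring the line in
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return misses

# Hypothetical sizes chosen only for illustration.
TOTAL_LINES = 1024   # total data, in cache lines
CACHE_LINES = 256    # per-processor cache capacity, in cache lines
SWEEPS = 10          # number of passes over the data

def misses_per_processor(num_procs):
    """Misses seen by one processor when the data is split evenly."""
    return count_misses(TOTAL_LINES // num_procs, CACHE_LINES, SWEEPS)

if __name__ == "__main__":
    for p in (1, 2, 4, 8):
        print(p, "processors:", misses_per_processor(p), "misses per processor")
```

In this model, with 1 or 2 processors the per-processor working set exceeds the cache, so every access in every sweep misses. With 4 or more processors the working set fits, and only the cold misses of the first sweep remain. Going from 2 to 4 processors halves the work per processor but cuts its misses by far more than half, which is the effect behind the super-linear speedup.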
