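If there are not enough processors, each processor's working set may be larger than its per-processor cache. If we iterate over the data repeatedly in a loop, then by the time we reach the end of the data, the beginning of the data has already been evicted from the cache, so every iteration incurs cache misses. With enough processors, each processor's share of the data fits in its cache, and after the first pass all of its accesses hit in the cache. Each processor therefore does its share of the work faster than the per-element cost seen with fewer processors, which is why the speedup can be super-linear (greater than the number of processors).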
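A toy simulation can make this effect concrete. The sketch below counts cache misses per processor as the number of processors grows. It assumes a fully associative LRU cache, one data element per cache line, a private cache per processor, and data split evenly across processors; the functions `count_misses` and `per_processor_misses` and all sizes are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def count_misses(working_set: int, cache_lines: int, passes: int) -> int:
    """Count misses when one processor loops `passes` times over
    `working_set` distinct cache lines, using a fully associative
    LRU cache holding `cache_lines` lines (a simplifying assumption)."""
    cache = OrderedDict()  # keys are line addresses; order tracks recency
    misses = 0
    for _ in range(passes):
        for addr in range(working_set):
            if addr in cache:
                cache.move_to_end(addr)  # hit: mark as most recently used
            else:
                misses += 1
                cache[addr] = None
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict least recently used line
    return misses


def per_processor_misses(total_lines: int, cache_lines: int,
                         passes: int, processors: int) -> int:
    """Misses seen by one processor when the data is split evenly
    (assumes total_lines is divisible by processors)."""
    chunk = total_lines // processors
    return count_misses(chunk, cache_lines, passes)


if __name__ == "__main__":
    TOTAL_LINES = 16  # hypothetical data size, in cache lines
    CACHE_LINES = 4   # hypothetical per-processor cache capacity
    PASSES = 10       # how many times the loop sweeps over the data
    for p in (1, 2, 4):
        m = per_processor_misses(TOTAL_LINES, CACHE_LINES, PASSES, p)
        print(f"P={p}: working set={TOTAL_LINES // p} lines, "
              f"misses per processor={m}")
```

With these illustrative sizes, P=1 and P=2 miss on every access (160 and 80 misses per processor) because the working set is larger than the cache, while P=4 takes only the 4 compulsory misses on the first pass. Quadrupling the processors cuts each processor's misses by a factor of 40, not 4. Cyclic access is the worst case for LRU because each line is evicted before it is reused; real caches are set-associative and prefetch, so actual numbers will differ.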