dkumazaw

I'm not quite convinced that having fewer threads necessarily means lower latency-hiding ability. I suppose having more cache space per thread implies a lower chance of a cache miss. So yes, we won't have as many threads to interleave with when the current thread gets stalled by a memory access, but don't we also gain from a higher likelihood of not having to fetch data from main memory (i.e., a lower chance that the thread stalls at all)? Is the implicit assumption that the loss from having fewer threads to interleave with outweighs the gain from fewer stalls?

kayvonf

@dkumazaw: Your description above is correct, but what you are saying is that you can envision a situation where fewer threads and larger per-thread caches result in fewer stalls due to memory latency. Caches don't hide latency; they reduce it. (Let's ignore the fact that a large cache line effectively prefetches all the elements in the line.)

However, you'll need a workload with temporal locality for the system to realize benefit from the caches. For example, a workload like the sinx() running example from this class has plenty of parallelism but zero temporal locality in its accesses to the input and output arrays.
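To make that concrete, here is a sketch of a sinx()-style kernel (reconstructed from memory of the running example, so the exact signature and constants may differ from the lecture code). Each iteration reads x[i] and writes result[i] exactly once, so a cache finds no reuse to exploit, yet the N iterations are independent, giving hardware threads plenty of work to interleave:

```c
// Taylor-series approximation of sin(x) applied to each array element.
// Streaming access pattern: every element of x and result is touched
// exactly once, so the workload has zero temporal locality, but each
// of the N iterations is independent (abundant parallelism).
void sinx(int N, int terms, float* x, float* result) {
    for (int i = 0; i < N; i++) {
        float value = x[i];
        float numer = x[i] * x[i] * x[i];
        int denom = 6;   // 3!
        int sign = -1;
        for (int j = 1; j <= terms; j++) {
            value += sign * numer / denom;        // add the next series term
            numer *= x[i] * x[i];                 // now x^(2j+3)
            denom *= (2 * j + 2) * (2 * j + 3);   // now (2j+3)!
            sign *= -1;                           // terms alternate in sign
        }
        result[i] = value;
    }
}
```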

Multi-threading hides the latency of one thread with work done by others, so more hardware threads definitely yield more latency-hiding ability. However, as you suggest, it's possible for workloads to run better on a system with fewer threads and big caches: the caches reduce latency, leaving little latency to hide in the first place.
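For a hypothetical example of such a workload (the name and constants here are mine, not from the lecture), consider repeated smoothing passes over an array small enough to fit in cache. After the first pass loads the data, subsequent passes hit in cache, so there is little stalling left for extra hardware threads to cover, while per-thread cache capacity matters a lot:

```c
// Hypothetical workload with strong temporal locality: the same small
// array is re-read on every pass. If `data` fits in cache, all passes
// after the first hit in cache, so the caches have removed most of the
// memory latency and there is little left for multi-threading to hide.
void smooth(int N, int passes, float* data) {
    for (int p = 0; p < passes; p++)
        for (int i = 1; i < N - 1; i++)
            data[i] = 0.25f * data[i - 1] + 0.5f * data[i] + 0.25f * data[i + 1];
}
```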