Slide 49 of 87
belce

I know that in this case, there is a tradeoff between preventing the core from sitting idle (throughput) and the average (or overall) time to completion for each thread running. However, I don't understand why we are distinguishing between latency hiding vs latency reducing. Can someone help explain?

truenorthwest

I believe the distinction is that by interleaving the threads, we are able to "hide" the latency from the core by allowing it to continue doing meaningful work. We are not reducing the latency, because the time it takes to fetch that original data has not physically changed.

kayvonf

@truenorthwest. That is correct.

In general, let's go back to first principles. The overall goal is to minimize processor stalls, since if a processor stalls it's not doing useful work, and thus it is not running at peak efficiency.

One way to reduce stalls is to reduce the time it takes to perform long-latency operations, such as a memory fetch. A cache is one technique to reduce the time to fetch a value from a memory address, and thus caches can reduce the number of cycles a processor is stalled waiting on data from memory. Caches are effective when a program exhibits high temporal locality in its data accesses -- that is, it accesses the same memory address multiple times in a short period of time.
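To make the temporal-locality point concrete, here is a toy cache model (a sketch with made-up parameters, not something from the lecture): repeated accesses to the same address hit in the cache, so only the first access pays the long memory latency.

```python
# Toy LRU cache model: count how many accesses miss (and thus pay the
# full memory latency) for a given sequence of addresses.
def count_misses(addresses, cache_size=4):
    cache = []  # LRU list of cached addresses, most recently used last
    misses = 0
    for addr in addresses:
        if addr in cache:
            cache.remove(addr)  # hit: refresh this address's LRU position
        else:
            misses += 1  # miss: must fetch from memory
            if len(cache) == cache_size:
                cache.pop(0)  # evict the least recently used address
        cache.append(addr)
    return misses

# High temporal locality: the same two addresses are reused repeatedly,
# so only the first access to each one misses.
print(count_misses([0, 1, 0, 1, 0, 1, 0, 1]))  # 2

# No temporal locality: every access touches a new address, so the
# cache never helps and every access misses.
print(count_misses([0, 1, 2, 3, 4, 5, 6, 7]))  # 8
```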

Hardware multi-threading is another technique for preventing processor stalls due to waiting on memory. However, instead of reducing the latency of a memory access, hardware multi-threading provides the opportunity for the processor to do other useful work while a high-latency operation, such as a memory access, is taking place. No stall occurs, even though the latency of the memory access remains high. It's common to say multi-threading is a mechanism to "hide memory latency", but what we really mean is that multi-threading is a mechanism to keep high memory access latency from resulting in processor stalls. In other words, the high latency of memory access is still there, but its effect on performance is "hidden".
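A back-of-the-envelope model may help (the cycle counts below are made up for illustration, not taken from the lecture). Each thread alternates between `work` cycles of arithmetic and a memory access with `mem_latency` cycles of latency. With enough threads, the other threads' work exactly covers one thread's memory latency, so the core never stalls, even though each individual access is just as slow as before.

```python
def utilization(work, mem_latency, n_threads):
    # Fraction of cycles the core spends doing useful work. In one
    # "round", the core runs `work` cycles for each of the n threads
    # in turn. The first thread's memory access is outstanding during
    # the other threads' work; the core stalls only if that work does
    # not cover the access latency.
    busy = n_threads * work
    round_len = max(busy, work + mem_latency)
    return busy / round_len

# One thread: the core stalls for the full memory latency every round.
print(utilization(10, 30, 1))  # 0.25 -> stalled 75% of the time

# Four threads: 3 * 10 cycles of other threads' work exactly covers
# the 30-cycle latency, so the core never stalls.
print(utilization(10, 30, 4))  # 1.0 -> latency fully hidden
```

Note that nothing in this model made the memory access faster; the 30-cycle latency is unchanged, but its effect on the core's throughput disappears once there is enough independent work to overlap with it.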

Hardware multi-threading can reduce stalls even if a program does not have temporal locality, but using this technique comes at the cost of needing at least two threads of work available in a program (the programmer must expose additional concurrency). There are also hardware costs, such as the need to maintain execution contexts for two threads on chip at once.
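Extending the toy model above also illustrates the throughput vs. per-thread-completion-time tradeoff raised in the original question (again, the numbers are made up for illustration): interleaving raises throughput, but once there are more threads than needed to hide the latency, each individual thread finishes later than it otherwise would.

```python
def completion_time(work, mem_latency, n_threads, iters):
    # Cycles until one particular thread finishes `iters` iterations,
    # assuming the core switches threads round-robin on each memory
    # access. Each round is lengthened either by other threads' work
    # or by a stall, whichever is longer.
    round_len = max(n_threads * work, work + mem_latency)
    return iters * round_len

# Alone, a thread spends 10 work + 30 stall cycles per iteration.
print(completion_time(10, 30, 1, 100))  # 4000 cycles

# With 4 threads, each thread finishes at the same time as before,
# but the core completed 4x the total work: pure throughput win.
print(completion_time(10, 30, 4, 100))  # 4000 cycles

# With 8 threads, the core is already at peak utilization, so extra
# threads only delay each individual thread's completion.
print(completion_time(10, 30, 8, 100))  # 8000 cycles
```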