x86tso

Simpler cores: historically, processor designers were overly optimistic about instruction-level parallelism, but in practice its speedup tops out at roughly 4-5 instructions per clock, and clock speeds are limited by power and heat. This encouraged a move to multicore thread-level parallelism and SIMD, both of which require software to be written explicitly for parallelism (whereas superscalar exploitation of ILP does not).

SIMD: for programs with high instruction stream coherence (the same instruction sequence applied to different data), the programmer and/or the compiler (either case is referred to as "explicit SIMD") emits instructions that make the processor run the same operation on different data across multiple ALUs.
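
As a rough sketch of what explicit SIMD looks like when written by hand, here is a loop using x86 AVX intrinsics (the function name and the assumption that n is a multiple of 8 and the CPU supports AVX are mine; a compiler auto-vectorizing a plain scalar loop would emit similar instructions):

#include <immintrin.h>

/* out[i] = s * a[i] + b[i], processing 8 floats per iteration on 8-wide ALUs */
void scale_add(float *out, const float *a, const float *b, float s, int n) {
    __m256 vs = _mm256_set1_ps(s);              /* broadcast s into all 8 lanes */
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);     /* load 8 elements of a */
        __m256 vb = _mm256_loadu_ps(b + i);     /* load 8 elements of b */
        __m256 vr = _mm256_add_ps(_mm256_mul_ps(va, vs), vb);  /* 8 mul-adds at once */
        _mm256_storeu_ps(out + i, vr);          /* store 8 results */
    }
}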

(Interleaved or temporal) Multi-threading: each core keeps an execution context per thread -- when one thread stalls on a memory access, the core switches to another thread, hiding (but not reducing) the latency of such long operations. Note that this might increase the latency of any single thread, but overall throughput increases.
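
To make the latency-hiding idea concrete, here is a toy software simulation of one core with several hardware contexts (the context count, memory latency, and ops-per-thread are made-up numbers for illustration, not real hardware parameters): the core issues from any context that is not waiting on memory, and stalled contexts count down their outstanding loads in the background.

#include <stdio.h>

enum { NUM_CONTEXTS = 4, MEM_LATENCY = 8, OPS_PER_THREAD = 4 };

int main(void) {
    int stall[NUM_CONTEXTS] = {0};              /* cycles until outstanding load returns */
    int ops_left[NUM_CONTEXTS];
    for (int i = 0; i < NUM_CONTEXTS; i++) ops_left[i] = OPS_PER_THREAD;

    int cycle = 0, remaining = NUM_CONTEXTS;
    while (remaining > 0) {
        int ran = -1;
        for (int i = 0; i < NUM_CONTEXTS; i++)           /* find a ready context */
            if (ops_left[i] > 0 && stall[i] == 0) { ran = i; break; }
        if (ran >= 0) {                                  /* issue one op; it then waits on memory */
            if (--ops_left[ran] == 0) remaining--;
            else stall[ran] = MEM_LATENCY;
        } else {
            printf("cycle %d: all contexts stalled -- core idle\n", cycle);
        }
        for (int i = 0; i < NUM_CONTEXTS; i++)           /* memory system progresses every cycle */
            if (stall[i] > 0) stall[i]--;
        cycle++;
    }
    printf("completed in %d cycles\n", cycle);
    return 0;
}

With these numbers the core still idles occasionally (4 contexts x 1 cycle of work cannot cover an 8-cycle latency), which is exactly why more contexts or more work per thread hide latency better.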

Loads/stores to and from RAM can be at least two orders of magnitude slower than arithmetic, so if the working set is larger than the L1/L2/L3 caches, memory bandwidth becomes the bottleneck.
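
A back-of-the-envelope check of this (the peak GFLOP/s and GB/s below are assumed example numbers, not measurements): for a streaming kernel like SAXPY (y[i] = a*x[i] + y[i]), each element does 2 flops but moves 12 bytes of DRAM traffic when nothing is cached, so the memory system, not the ALUs, sets the speed limit.

#include <stdio.h>

int main(void) {
    double peak_gflops = 500.0;   /* assumed peak arithmetic throughput, GFLOP/s */
    double peak_bw_gbs = 50.0;    /* assumed peak DRAM bandwidth, GB/s */

    /* SAXPY per element: 2 flops (mul + add), 12 bytes of traffic
     * (load x, load y, store y; 4 bytes each). */
    double flops_per_byte = 2.0 / 12.0;

    /* flop rate the memory system can feed vs. what the ALUs could sustain */
    double achievable_gflops = peak_bw_gbs * flops_per_byte;
    printf("memory-limited: %.1f GFLOP/s vs. peak %.1f GFLOP/s (%.0fx gap)\n",
           achievable_gflops, peak_gflops, peak_gflops / achievable_gflops);
    return 0;
}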
