Slide 12 of 88
fizzbuzz

If we don't have ILP in our program but do have coherence, can we use one of the fetch/decode units and issue SIMD anyway?

suninhouse

What is out-of-order control logic responsible for? Switching/managing between threads cached in execution context? Does this out-of-order control logic require its own (small) part of dedicated hardware on the chip?

wooloo

@suninhouse, if I remember correctly, yes and yes, because Kayvon pointed out the areas indicated in black on slide 13 as used in out-of-order execution.

l-henken

@suninhouse Using the word "thread" here is misleading. ILP is a property of a single instruction stream (a single thread) that allows the processor to run multiple instructions from that stream at the same time. The out-of-order control logic is used to determine whether the current single instruction stream (read: thread) has any exploitable ILP, and then to arrange for the independent instructions to be run in their own fetch/decode => exec => context path.
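To make "exploitable ILP" a bit more concrete, here is a rough sketch (not from the lecture) that estimates the ILP of a tiny instruction stream. It assumes every instruction takes one cycle, there are unlimited execution units, and only read-after-write dependencies matter. The register names and the `critical_path` helper are made up for illustration.

```python
# Each instruction is (dest, [sources]); register names are hypothetical.
program = [
    ("a", ["x", "y"]),   # a = x + y
    ("b", ["z", "w"]),   # b = z + w   (independent of a)
    ("c", ["a", "b"]),   # c = a * b   (needs both a and b)
    ("d", ["c", "x"]),   # d = c + x   (needs c)
]

def critical_path(program):
    """Length of the longest chain of read-after-write dependencies."""
    depth = {}  # register -> cycle in which its value becomes available
    for dest, srcs in program:
        # Sources not produced by an earlier instruction are ready at cycle 0.
        depth[dest] = 1 + max((depth.get(s, 0) for s in srcs), default=0)
    return max(depth.values())

cycles = critical_path(program)
print(cycles)                 # minimum cycles under the assumptions above
print(len(program) / cycles)  # average instructions per cycle (the ILP)
```

Here the first two instructions are independent of each other, so they could run in the same cycle, while the last two have to wait on earlier results.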

l-henken

@fizzbuzz SIMD is not something issued by multiple fetch-decode stages. The notion of "single instruction" means that a single instruction is fetched and decoded, and then dispatched to >1 execution units (ALU, etc). If you recall the vectorized instructions in lecture (__m256 or something), these are examples of SIMD. Each is a single instruction in the overall instruction stream. Each gets fetched/decoded as one instruction, but then that instruction is propagated to multiple ALUs. So even if the stream is coherent, an idle fetch/decode stage wouldn't really get us anything. I assume there could be a way to utilize the extra ALU in the case of minimal ILP, but I don't know how chips do that.
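As a toy model of the "fetched/decoded once, executed on many ALUs" idea, the sketch below counts how many instructions get issued when adding two arrays eight elements at a time. It is plain Python rather than real intrinsics. `LANES`, `vector_add`, and `add_arrays` are hypothetical names, and the array length is assumed to be a multiple of the lane count.

```python
LANES = 8  # assumption: eight 32-bit values per 256-bit vector register

def vector_add(a, b):
    """Models one SIMD add: a single instruction producing LANES results."""
    assert len(a) == len(b) == LANES
    return [x + y for x, y in zip(a, b)]

def add_arrays(a, b):
    """Add two arrays LANES elements at a time, counting instructions issued."""
    out, issued = [], 0
    for i in range(0, len(a), LANES):
        out.extend(vector_add(a[i:i + LANES], b[i:i + LANES]))
        issued += 1  # one fetch/decode covers LANES element-wise adds
    return out, issued

a = list(range(32))
b = [10] * 32
result, issued = add_arrays(a, b)
print(issued)  # number of vector instructions for 32 element-wise adds
```

The point is only that the instruction count drops by the lane width while a single fetch/decode unit stays in charge of the whole stream.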

kevtan

@suninhouse @wooloo I don't think the out-of-order control logic is in charge of implementing hardware-level multithreading. This slide is in the section where we talk about superscalar processors, and the out-of-order control logic is (I think) in charge of constructing, at least implicitly, the dependency graph of instructions and discovering independent branches of that graph that could be executed "out of order". Then, it feeds this information to the fetch/decode units later in the pipeline, which will actually fetch those independent instructions and execute them.
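Here is a minimal sketch of that idea, not a description of how real hardware does it: each cycle, a greedy scheduler issues up to `width` instructions whose inputs are already available, even if an earlier instruction is still waiting. It assumes one-cycle latency, that each register is written once, and that only read-after-write dependencies matter. The `schedule` function and the register names are hypothetical.

```python
def schedule(program, width):
    """Return a list of cycles; each cycle lists the instruction indices issued."""
    produced = {dest for dest, _ in program}  # registers written by the program
    ready = set()                             # results from earlier cycles
    pending = list(range(len(program)))
    cycles = []
    while pending:
        issued = []
        for i in pending:
            _, srcs = program[i]
            # A source is usable if it is a program input or already computed.
            inputs_ok = all(s not in produced or s in ready for s in srcs)
            if len(issued) < width and inputs_ok:
                issued.append(i)
        ready.update(program[i][0] for i in issued)
        pending = [i for i in pending if i not in issued]
        cycles.append(issued)
    return cycles

program = [
    ("a", ["x"]),       # 0
    ("b", ["a"]),       # 1: depends on 0
    ("c", ["y"]),       # 2: independent
    ("d", ["z"]),       # 3: independent
    ("e", ["b", "c"]),  # 4: depends on 1 and 2
]
print(schedule(program, width=2))
```

With two issue slots, instruction 2 is issued ahead of instruction 1, which is the "out of order" part. With `width=1` the same program falls back to plain program order.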

kevtan

Also, there is more discussion about this exact topic on slide 14.
