kayvonf

Answering these questions here is a good opportunity for student comments. Note that comments support Markdown syntax.

kostun

Software POV: ILP has been tapped out. Hardware POV: processor clock frequencies have stopped increasing. Also, the power draw as a function of frequency becomes really tough to sustain.
running out of ILP

weimin

Self Check 1: A single instruction stream is the sequence of instructions from one thread.

Self Check 2: A single instruction stream is sped up by executing multiple instructions at the same time when there are no hazards between them, or when the hazards can be resolved by bypassing.

tspint

Is the term "thread" that Kayvon used in lecture synonymous with "single-instruction stream"?

ishangaur

Self Check 1: The main speedup mechanism of single-instruction stream superscalar architectures was to identify instructions that could be run independently at the same time, increasing the number of instructions completed per clock cycle. However, this can only exploit as much instruction-level parallelism as a user's program actually contains. Realistically, programs see the speedup plateau once the processor can issue 4 or 5 instructions per clock cycle. So even if more transistors were spent on fancier control logic to issue more instructions, most programs wouldn't benefit from the extra capability. The remaining option is to speed up the clock. This introduces a problem: the same amount of energy as before, used to toggle current on the chip's wires, now has to be expended in less time to keep up with the clock. This raises the chip's power consumption, and with it, its heat production. Further progress then depended on innovations in materials science and cooling.
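To see why wider issue stops helping, here is a toy scheduling model. It is only a sketch under loud assumptions: every instruction takes exactly one cycle, fetch is perfect, there is no instruction-window limit, and `cycles_needed` and `make_chains` are hypothetical helpers, not anything from lecture. The program is four independent dependency chains (think four separate running sums), so its ILP is 4 no matter how wide the core is.

```python
def make_chains(num_chains, chain_len):
    """Hypothetical toy program: num_chains independent dependency chains.

    Returns deps, where deps[i] lists the earlier instructions that i needs.
    """
    deps = []
    for c in range(num_chains):
        for k in range(chain_len):
            i = c * chain_len + k
            deps.append([i - 1] if k > 0 else [])  # depends on previous op in its chain
    return deps


def cycles_needed(deps, issue_width):
    """Greedy scheduler: each cycle, issue up to issue_width ready instructions.

    Assumes unit latency and an acyclic dependency graph.
    """
    n = len(deps)
    finish = [None] * n  # cycle in which each instruction executes
    cycle = 0
    done = 0
    while done < n:
        cycle += 1
        issued = 0
        for i in range(n):
            if finish[i] is not None or issued == issue_width:
                continue
            # ready only if every dependency finished in an earlier cycle
            if all(finish[d] is not None and finish[d] < cycle for d in deps[i]):
                finish[i] = cycle
                issued += 1
                done += 1
    return cycle


if __name__ == "__main__":
    prog = make_chains(4, 4)  # 16 instructions, ILP of 4
    for width in (1, 2, 4, 8):
        print(width, cycles_needed(prog, width))
```

Going from 1 to 2 to 4 issue slots cuts the run from 16 to 8 to 4 cycles, but 8 slots still take 4 cycles: the extra hardware sits idle because the program has no more independent work to offer. This is the sense in which ILP is a property of the program, not of the core.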

kevtan

There was a comment above that the answer to self check #2 was that ILP tapped out. I don't think that this was what the question was getting at. The reason why we weren't able to obtain the maximum speedup was because of (1) communication/synchronization overhead and (2) unbalanced workloads.
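Both effects can be seen in a very simplified cost model. Assume parallel time is the time of the slowest worker plus a fixed communication/synchronization cost; the `speedup` function and its parameters are hypothetical, and real overheads often grow with the number of workers.

```python
def speedup(work_per_worker, comm_cost=0.0):
    """Hypothetical cost model for a parallel program.

    serial time   = total work done by one processor
    parallel time = slowest worker's share + communication/sync overhead
    """
    serial_time = sum(work_per_worker)
    parallel_time = max(work_per_worker) + comm_cost
    return serial_time / parallel_time


if __name__ == "__main__":
    print(speedup([25, 25, 25, 25]))       # balanced, no overhead: ideal 4x
    print(speedup([40, 20, 20, 20]))       # same total work, unbalanced
    print(speedup([25, 25, 25, 25], 5.0))  # balanced, but pays communication
```

With four workers and 100 units of work, perfect balance gives 4x, giving one worker 40 units drops it to 2.5x, and adding 5 units of communication to the balanced case drops it to about 3.3x.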

haiyuem

Regarding the comment above about whether "thread" = "single-instruction stream": I think a thread is more of an available resource that can handle one instruction at a time, while the stream is the workload, which depends on the program input.

Aitous

Could someone please explain again the difference between a single-instruction stream and a thread? From my understanding, a process can have multiple threads, so I tend to think of the single instruction stream as being able to juggle between different threads. Is that right?

blipblop

Self check 1: Because a) most programs have limited ILP, superscalar processors do not benefit from being able to issue more than about 4 instructions per clock tick, and b) power density scales superlinearly with clock frequency, and we have become power-density limited.
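Point b) is usually explained with the standard first-order approximation for dynamic (switching) power in CMOS logic. The step from the first line to the second is a simplification I'm assuming here: that supply voltage has to rise roughly in proportion to frequency to keep the circuit switching reliably.

```latex
P_{\text{dynamic}} \approx \alpha \, C \, V^{2} f
% alpha: activity factor, C: switched capacitance, V: supply voltage, f: clock frequency
% assumption: V grows roughly in proportion to f, i.e. V \propto f, which gives
P_{\text{dynamic}} \propto f^{3}
```

Under this approximation, a 20% higher clock costs about 1.2^3 ≈ 1.73 times the dynamic power, i.e. roughly 73% more power (and heat) for 20% more clock.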

Self check 2: a) Communications overhead and b) Unequal workloads

blipblop

@Aitous, from Kayvon's lecture verbatim: "you can think of single instruction stream as instructions in a single thread of program/control". So apparently, one thread corresponds to one instruction stream. On the other hand, there is SIMT, which just seems to be marketing speak for SIMD and is rather misleading. Please correct me if I'm wrong!

donquixote

@Aitous, I think the view that a single instruction stream is able to juggle between different threads is correct. Specifically, it's controlling which threads get to execute their arithmetic operations on an available ALU at any given time.

I'm assuming you mean "processor," not "process." CS110 processes and threads are completely independent from the processors and hardware threads we talk about here. Here's how I'm thinking about it: one instruction stream corresponds to one core, while one thread corresponds to one execution context. I think these correspondences hold regardless of what features exist in the core (e.g. SIMD, simultaneous vs. interleaved multithreading, superscalar, etc.). Let's take a simple bare-bones example of a processor with two execution contexts (two hardware threads), one Fetch/Decode (so 1 instruction per clock), and one ALU (to execute that instruction). The processor receives the instruction stream and issues independent instructions to each of the two threads. At any one point in time (measured in clocks), only one of the two threads can be using the single ALU available. In this way, the threads are pulling in instructions from the instruction stream and managing those instructions in their own execution contexts.
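Here is a toy simulation of the bare-bones core described above: one fetch/decode, one ALU, and two execution contexts, so at most one instruction issues per clock. How the instructions end up in each context is exactly the open question in this thread; the sketch simply assumes each context already holds its own in-order list. The function name, the `'alu'`/`'load'` opcodes, and the load latency are all hypothetical.

```python
def run_interleaved(contexts, load_latency):
    """Toy single-issue core whose one ALU is shared by several execution contexts.

    contexts: one list of instructions per context, each 'alu' or 'load'.
    Within a context every instruction depends on the previous one, so after a
    'load' that context must wait load_latency cycles; an 'alu' op takes 1 cycle.
    Returns (total_cycles, instructions_issued).
    """
    n = len(contexts)
    pc = [0] * n          # next instruction index in each context
    ready_at = [0] * n    # earliest cycle each context may issue again
    last = n - 1          # round-robin pointer: the first search starts at context 0
    cycle = 0
    issued = 0
    while any(pc[t] < len(contexts[t]) for t in range(n)):
        for step in range(1, n + 1):
            t = (last + step) % n
            if pc[t] < len(contexts[t]) and ready_at[t] <= cycle:
                op = contexts[t][pc[t]]
                pc[t] += 1
                ready_at[t] = cycle + (load_latency if op == "load" else 1)
                issued += 1
                last = t
                break  # single fetch/decode: at most one instruction per clock
        cycle += 1
    # the run ends when the last issued instruction has completed
    return max(ready_at), issued


if __name__ == "__main__":
    program = ["load", "alu", "alu", "alu"]
    print(run_interleaved([program], 4))           # one context
    print(run_interleaved([program, program], 4))  # two contexts sharing the ALU
```

One context takes 7 cycles for 4 instructions because it sits idle waiting on its load. Two contexts finish all 8 instructions in 10 cycles instead of 7 + 7 = 14, because the second context uses clocks the first one would otherwise waste while stalled, while still never putting two instructions on the ALU in the same clock.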

This means threads and instruction streams are not synonymous, and instruction streams are higher than threads in the hierarchy. The slide titled "Review: four SIMD, multi-threaded cores" implies this is not the case ("can switch to processing the other instruction stream when faced with stall"). I'm going to ask about this on Piazza.

donquixote

@blipblop, I think that usage of "thread" is different from the hardware threads we discussed in the last class. Here, I think it's being used more as an English phrase: "thread of control." That makes three usages of "thread"! Unless I'm missing something...

donquixote

@blipblop, as for SIMT, it seems it just refers to SIMD + multithreading, which is distinct from only SIMD. The difference between SIMD and SIMT would be the difference between the "four, 8-wide SIMD cores" slide and the "four SIMD, multi-threaded cores" slide, towards the end of last lecture.

Nian

ILP is a property of a program; superscalar execution is the technique that computer architects use to exploit it.

rubensan

@donquixote, you explained the difference between SIMD and SIMT very well. By the way, Problem 3 on Assignment 1 goes into more depth about this difference (i.e., speedup, resource usage, etc.)!