Slide 60 of 88
yhgkm

How should I interpret this line: "(when one thread has insufficient ILP)" and this line: "Requires additional independent work ... than ALUs)"?

weimin

I think when one thread has insufficient ILP, the superscalar processor tries to execute instructions from another thread.

For the "Requires additional independent work" part: if the program is a process with multiple threads, then the threads running at the same time on the different ALUs need to have no data dependencies between them. If the number of independent threads is less than the number of ALUs, then some ALUs will be idle.

nickbowman

I can take a crack at trying to explain the first line "(when one thread has insufficient ILP)", but I'm not so sure about the second.

Assume you have a chip whose architecture enables superscalar execution. I believe the in-class example of this that we did breakout rooms for was a chip with one core, composed of two fetch/decode units, two ALUs, and one large region dedicated to storing execution contexts. Then assume you're running a program that has very little ILP (not many instructions can be executed in parallel due to dependencies). Without hardware threads, only one fetch/decode unit and one ALU would be utilized at a time. By using multiple hardware threads, you could context switch between different threads of execution within the program and have both ALUs utilized at the same time, even though there is little potential for ILP/superscalar execution within any one instruction stream of that program.
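Here's a minimal sketch of what I mean (not from the lecture; the loop body and constants are made up). Each thread runs a serially dependent chain, so within one thread there's almost no ILP for a superscalar core to exploit, but the two threads give the hardware two independent instruction streams it can interleave onto its ALUs:

```c
/* Sketch: two independent streams, each with a long dependency chain.
 * Compile with: gcc -O2 sketch.c -lpthread */
#include <pthread.h>
#include <stdio.h>

#define N 100000000L

/* Every iteration depends on the previous result, so a superscalar
 * core cannot overlap these instructions within one thread. */
void *dependent_chain(void *arg) {
    volatile double x = *(double *)arg;
    for (long i = 0; i < N; i++)
        x = x * 1.0000001 + 0.5;   /* needs the previous x */
    printf("result: %f\n", x);
    return NULL;
}

int main(void) {
    double seed0 = 1.0, seed1 = 2.0;
    pthread_t t0, t1;

    /* Two independent threads: the hardware (not this code) decides
     * how to interleave their instructions onto the two ALUs. */
    pthread_create(&t0, NULL, dependent_chain, &seed0);
    pthread_create(&t1, NULL, dependent_chain, &seed1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}
```

The point is just that the extra ALU utilization comes from having a second independent stream available, not from finding parallelism inside either stream.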

nickbowman

Oops, looks like @weimin beat me to it!

yhgkm

Thank you, but I am still a bit confused about how instructions are distributed among ALUs vs. threads. Is it that all threads share the ALUs?

nickbowman

@yhgkm I think that's a good question. The definition of a "thread" is still pretty hazy to me because we can use it to mean a couple of different things. There's the idea of a hardware thread, which is a single execution context and sequential instruction stream being executed by the core, and the idea of a software thread or "program" (which also consists of a set of sequential operations being executed), and I'm not exactly sure whether those map 1:1 in this discussion or not.

l-henken

@nickbowman Hardware and software threads are both abstractions over the architecture of the hardware. The meaning of each depends on the architecture at hand (single core vs. multi-core, hyperthreaded vs. non-hyperthreaded, etc.).

Regardless of the architecture, at the highest level of abstraction you have a software thread (a pthread, etc) which is an OS-controlled structure used to encapsulate some computation.

The next level down you have hardware threads. With N non-hyperthreaded cores, you can say that you have N hardware threads (N possible concurrent threads of execution). If you have N 2-way hyperthreaded (HT) cores, you have 2N hardware threads (2N possible "concurrent" threads of execution). Hardware threads are thought of as logical cores because, to the OS, there are 2N possible logical threads of execution. But it is the hardware that controls the context switching that actually creates this abstraction of a "logical" core over the physical core.

There can be more software threads than hardware threads because of OS scheduling and the abstraction it provides. There can be more hardware threads than physical cores because of hyperthreading and the abstraction it provides.

The OS schedules software threads to run on the exposed hardware threads.
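You can see this split directly from a program. A small sketch (assuming Linux/POSIX; the thread counts are just illustrative): the OS reports one "CPU" per hardware thread (logical core), and it will happily schedule more software threads than that by time-multiplexing them.

```c
/* Sketch: query hardware threads, then deliberately oversubscribe
 * with software threads. Compile with: gcc sketch.c -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

void *work(void *arg) {
    long id = (long)arg;
    printf("software thread %ld running\n", id);
    return NULL;
}

int main(void) {
    /* On a 4-core, 2-way hyperthreaded chip this reports 8. */
    long hw_threads = sysconf(_SC_NPROCESSORS_ONLN);
    long sw_threads = 4 * hw_threads;   /* more threads than the hardware exposes */
    printf("hardware threads (logical CPUs): %ld\n", hw_threads);

    pthread_t tids[sw_threads];
    for (long i = 0; i < sw_threads; i++)
        pthread_create(&tids[i], NULL, work, (void *)i);
    for (long i = 0; i < sw_threads; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

All the software threads complete even though only `hw_threads` of them can run at once; the OS scheduler handles the rest, which is exactly the "more software threads than hardware threads" point above.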
