Slide 69 of 88
kayvonf

A good opportunity for comments is to try and define these terms in your own words.

fizzbuzz

memory latency: the amount of time it takes to get data from memory to the processor

memory bandwidth: the rate at which memory can deliver data to the processor (amount of data per unit time)

bandwidth bound application: an application whose performance is limited by memory bandwidth; it needs high memory bandwidth to execute efficiently
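
To make "bandwidth bound" concrete, here is a minimal C++ sketch of my own (not from the slides): each element is touched once and only one arithmetic operation is done per element, so on most machines the memory system, not the ALUs, limits how fast this loop runs.

    #include <cstddef>
    #include <vector>

    // Element-wise scale: 1 multiply per element, but ~8 bytes of traffic
    // per element (4 bytes loaded + 4 bytes stored). That is roughly
    // 0.125 FLOP/byte, so this loop is typically limited by memory
    // bandwidth, not by how fast the ALUs can multiply.
    void scale(std::vector<float>& x, float a) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] = a * x[i];
        }
    }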

kaiang

Bandwidth bound application: Programs can take a long time to execute, and I imagine a "diagnostic tree" of reasons for this to occur. At the top level, I've heard people often use the broad categories of "compute-bound" and "I/O-bound." In the latter category, the program may spend a lot of time trying to read/write memory, e.g. because each request takes a long time or because many requests are being made. A bandwidth bound program exhibits the latter characteristic: a single request can be fast, but we are asking for more data than the memory system can actually deliver concurrently.

yonkus

SIMD execution: Single Instruction, Multiple Data. As I understand it, this is a form of parallel computing where the same instruction or operation is performed on multiple data points at once. As shown in previous lecture slides, this fits poorly with conditional execution because it strains the "single instruction" aspect: with conditional execution, different data elements may need different instruction sequences.
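
To show how conditionals map onto SIMD in practice, here is a small sketch of my own using AVX intrinsics (compile with -mavx): both sides of the if/else are computed and a per-lane mask selects the result, which is why divergence costs efficiency rather than being literally impossible.

    #include <immintrin.h>

    // abs-like kernel: y[i] = (x[i] < 0) ? -x[i] : x[i], for n a multiple of 8.
    // There is no per-lane branch: both outcomes are computed and a mask
    // blends them, so lanes that "didn't take" a path still occupy slots.
    void abs8(const float* x, float* y, int n) {
        const __m256 zero = _mm256_setzero_ps();
        for (int i = 0; i < n; i += 8) {
            __m256 v    = _mm256_loadu_ps(x + i);
            __m256 neg  = _mm256_sub_ps(zero, v);             // "then" side: -x[i]
            __m256 mask = _mm256_cmp_ps(v, zero, _CMP_LT_OS); // lanes where x[i] < 0
            __m256 r    = _mm256_blendv_ps(v, neg, mask);     // pick neg where mask is set
            _mm256_storeu_ps(y + i, r);
        }
    }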

thread17

Simultaneous multi-threading: When multiple threads can issue independent instructions in the same cycle to be processed by the processor. This can be achieved by being able to fetch instructions from multiple threads and having enough execution context storage to hold the state of multiple threads at the same time. The processor also needs to be superscalar so that it has multiple issue slots for these independent instructions.

Interleaved multi-threading: When instructions from multiple threads are interleaved and the core chooses an instruction from one thread to be executed by the ALUs in one clock cycle. The core can then choose to execute an instruction from a different thread in the next clock cycle.
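
Here is a toy simulation of interleaved multi-threading (my own illustration with made-up latencies, not real hardware): one "core" picks a non-stalled thread context each cycle, and a thread that has just issued a load is skipped until its assumed memory latency has elapsed, so the core stays busy.

    #include <cstdio>

    // Toy model: 4 hardware thread contexts, each instruction includes a
    // load that takes MEM_LATENCY cycles. The core issues from any thread
    // that is not stalled; with enough threads, the latency is hidden.
    int main() {
        const int NUM_THREADS    = 4;
        const int MEM_LATENCY    = 3;   // assumed latency, in cycles
        const int OPS_PER_THREAD = 6;

        int done[NUM_THREADS]  = {0};   // ops completed per thread
        int ready[NUM_THREADS] = {0};   // cycle at which each thread may issue again

        for (int cycle = 0; ; ++cycle) {
            bool all_done = true;
            for (int t = 0; t < NUM_THREADS; ++t)
                if (done[t] < OPS_PER_THREAD) all_done = false;
            if (all_done) { std::printf("finished at cycle %d\n", cycle); break; }

            // Pick the next non-stalled, unfinished thread (round-robin).
            for (int t = 0; t < NUM_THREADS; ++t) {
                int id = (cycle + t) % NUM_THREADS;
                if (done[id] < OPS_PER_THREAD && ready[id] <= cycle) {
                    std::printf("cycle %2d: issue from thread %d\n", cycle, id);
                    ++done[id];
                    ready[id] = cycle + MEM_LATENCY;  // this thread now stalls
                    break;
                }
            }
        }
        return 0;
    }

With 4 threads and a 3-cycle latency the core issues something every cycle; drop NUM_THREADS to 1 and it issues only every third cycle, which is exactly the stall that interleaving hides.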

kostun

arithmetic intensity: how much a core is using its ALU(s) (doing computations on data that is already available) relative to how much it is fetching new data from memory.
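
For a concrete example (my own, using the usual back-of-envelope accounting that ignores caches), take SAXPY:

    #include <cstddef>

    // SAXPY: y[i] = a * x[i] + y[i]
    // Per element: 2 FLOPs (one multiply, one add)
    //              12 bytes of traffic (load x[i], load y[i], store y[i]; 4 bytes each)
    // Arithmetic intensity = 2 FLOPs / 12 bytes ~= 0.17 FLOP/byte -- low,
    // so on most machines this kernel ends up bandwidth bound.
    void saxpy(std::size_t n, float a, const float* x, float* y) {
        for (std::size_t i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }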

pslui88

Coherent control flow: In general, control flow refers to the flow of instruction execution as the program runs. Coherent control flow is control flow in which the same instruction sequence applies to many data elements. Another way to think of it is as the control flow of a program without a lot of if statements; if statements create branches where, for a given data element, only the code on the branch taken is executed rather than all of it.

pslui88

To extend my previous comment, coherent execution is necessary for SIMD to work efficiently because the point of SIMD is to apply a single instruction to multiple data elements simultaneously. If you had incoherent control flow with many branches, each dealing with different data elements, then when the control flow takes one branch it would apply to only a portion of the data elements rather than all of them together -- SIMD would not work well here.
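
A small sketch contrasting the two cases (my own example, in plain C++): in the first loop the branch depends only on a uniform value, so every element follows the same path and maps cleanly onto SIMD lanes; in the second, the path depends on each element's value, so elements packed into one SIMD vector may want different paths and must be masked.

    // Coherent: the branch condition is the same for every element
    // (it depends only on the uniform value 'a'), so all SIMD lanes
    // execute the same instruction sequence.
    void scale_or_clear(float* x, int n, float a) {
        if (a != 0.0f) {
            for (int i = 0; i < n; ++i) x[i] *= a;
        } else {
            for (int i = 0; i < n; ++i) x[i] = 0.0f;
        }
    }

    // Incoherent (divergent): the branch depends on each element's data,
    // so neighboring elements in the same SIMD vector may want different
    // paths, and the hardware/compiler must mask lanes.
    void clamp_or_double(float* x, int n) {
        for (int i = 0; i < n; ++i) {
            if (x[i] < 0.0f) x[i] = 0.0f;          // some lanes take this path
            else             x[i] = x[i] * 2.0f;   // others take this one
        }
    }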

rosalg

Going to try and say each meaning in my own words without looking back at notes:

Instruction stream: a stream of instructions that is fetched, decoded, and executed by a processor.

Multi-core processor: a processor with multiple cores (multiple workers to do work).

SIMD execution: Single Instruction, Multiple Data, where multiple ALUs perform the same instruction at the same time on different data.

Coherent control flow: a property of a program where the same instructions are run on different data.

Interleaved multi-threading: when waiting for memory (stalling), you perform work on another thread instead.

Simultaneous multi-threading: a single core issues instructions from multiple threads in the same clock cycle.

Memory latency: the amount of time before a memory request gets served by memory.

Memory bandwidth: the rate at which memory can provide data to a processor.

Bandwidth bound application: an application that can't speed up any more because there isn't enough bandwidth (memory can't deliver data to the processor fast enough).

Arithmetic intensity: how many mathematical operations the code must do relative to how much memory traffic is needed to do those operations.

jle

Really like @rosalg’s idea and am trying it on my own too:

Instruction stream: a stream of instructions to be fetched, decoded, and performed by the processor;

Multi-core processor: a processor with multiple cores, where each core fetches/decodes and has its own execution units and its own execution context, so each core runs its own instruction stream;

SIMD execution: with multiple execution units, the processor can apply the same instruction to multiple data elements at once within a core; this is particularly helpful when control flow is coherent;

Coherent control flow: when many data elements follow the same instruction sequence;

Hardware multi-threading: multiple threads being run within one core;

Interleaved multi-threading: a way to hide memory latency; other threads run when one thread stalls;

Simultaneous multi-threading: when a core executes instructions from multiple threads at the same time;

Memory latency: the amount of time it takes for a memory request to be serviced;

Memory bandwidth: the rate at which data is transferred between memory and the processor;

Bandwidth bound application: when a low bandwidth rate, relative to the speed of everything else, is the limitation, particularly when a lot of data is being requested;

Arithmetic intensity: the ratio of the number of math operations to the amount of data accessed (in an instruction stream)

yayoh

A bandwidth bound application is one in which the amount of time taken to fetch data from memory is the application's bottleneck. Improvements to computational efficiency won't benefit bandwidth-bound applications.

bayfc

Arithmetic intensity is the ratio of mathematical operations to memory operations in a program. Memory operations are much slower than arithmetic ones: it can take dozens of cycles to get a response to a memory access (memory latency), and the rate at which data can be delivered to the processor (memory bandwidth) is lower than the rate at which the ALUs can consume it. So programs with low arithmetic intensity will likely fail to take advantage of much of the hardware resources available to them. They are likely to become bandwidth bound, meaning that their execution time is limited by the memory system's ability to supply data. Interleaved multithreading can help address some of these issues by allowing other threads to make progress while one thread is waiting on a data access.
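
A back-of-envelope check of that idea in code, with made-up machine numbers (the peak FLOP rate and bandwidth below are assumptions for illustration, not any particular processor): a kernel is roughly bandwidth bound when its arithmetic intensity falls below the machine's ratio of peak compute to peak bandwidth.

    #include <cstdio>

    int main() {
        // Assumed machine parameters (illustrative only).
        const double peak_gflops    = 500.0;  // GFLOP/s the ALUs could sustain
        const double peak_bandwidth = 50.0;   // GB/s the memory system can deliver
        const double ridge_point    = peak_gflops / peak_bandwidth;  // FLOP/byte

        // SAXPY from the sketch above: 2 FLOPs per 12 bytes of traffic.
        const double saxpy_intensity = 2.0 / 12.0;

        // Attainable performance = min(peak compute, bandwidth * intensity).
        const double attainable = (saxpy_intensity < ridge_point)
            ? peak_bandwidth * saxpy_intensity   // bandwidth bound
            : peak_gflops;                       // compute bound

        std::printf("ridge point: %.1f FLOP/byte\n", ridge_point);
        std::printf("SAXPY attainable: %.2f GFLOP/s (%s bound)\n",
                    attainable,
                    saxpy_intensity < ridge_point ? "bandwidth" : "compute");
        return 0;
    }

On these assumed numbers the ridge point is 10 FLOP/byte, so a kernel at ~0.17 FLOP/byte is deep in bandwidth-bound territory no matter how fast the ALUs are.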
