jyeung27

The stalls shown on the slide help prevent a backlog of work (like loads of laundry waiting to dry): the memory bus is fully utilized and is the limiting factor in executing operations.

Drew

We are limited here by memory bandwidth, so if we increased the ratio of math operations to load instructions, we would potentially no longer be bandwidth limited, because the time spent executing instructions would get closer to matching the time the memory bus is occupied. So, processor utilization would go up.
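
To put rough numbers on that, here is a minimal sketch of a steady-state model. It assumes every instruction takes one cycle to issue, every load occupies the memory bus for a fixed number of cycles, and enough loads are in flight that the bus never sits idle while work is pending. The function name and the 8-cycle bus time are made up for illustration and are not taken from the slide.

```python
def processor_utilization(math_per_load, bus_cycles_per_load):
    """Fraction of cycles the core spends issuing instructions (toy model)."""
    # Cycles the core needs per load: the math ops plus the load instruction itself.
    instr_cycles = math_per_load + 1
    # In steady state, one iteration takes as long as the slower of core and bus.
    cycles_per_iteration = max(instr_cycles, bus_cycles_per_load)
    return instr_cycles / cycles_per_iteration


# Hypothetical bus time of 8 cycles per load.
for ratio in (1, 3, 7, 15):
    print(ratio, processor_utilization(ratio, bus_cycles_per_load=8))
```

Under these assumptions, utilization climbs until the instruction time per load matches the bus time, after which the core rather than the bus is the bottleneck.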

suninhouse

One basic question: why does the "Stall!" happen before loading the instructions instead of following the instructions?

Drew

suninhouse - I think it's because just like the laundry analogy, we want to not have a huge queue of load requests (wet laundry) piling up in front of the memory bus (dryer). By waiting to make the load request, we ensure this queue is bounded in size.
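
A small simulation can illustrate the bounded-queue idea. This is only a sketch under my own assumptions: the core does a fixed amount of compute between loads, the bus serves requests one at a time in order, and the core stalls before issuing a load once `max_outstanding` requests are already in flight. All names and numbers here are hypothetical, not the slide's exact model.

```python
def simulate(num_loads, compute_cycles, bus_cycles, max_outstanding):
    """Return (total_cycles, stall_cycles, peak_outstanding) for a toy core + bus."""
    completion_times = []  # finish time of each issued load, in issue order
    now = 0                # current time on the core
    stall = 0              # cycles the core spent waiting to issue a load
    peak = 0               # largest number of loads in flight at once
    bus_free_at = 0        # time at which the bus finishes its current backlog

    for _ in range(num_loads):
        now += compute_cycles  # math instructions before the next load
        in_flight = [t for t in completion_times if t > now]
        if len(in_flight) >= max_outstanding:
            # Stall until enough older loads finish to free a slot.
            wait_until = in_flight[len(in_flight) - max_outstanding]
            stall += wait_until - now
            now = wait_until
            in_flight = [t for t in completion_times if t > now]
        # The bus is FIFO: this request starts once earlier ones are done.
        start = max(now, bus_free_at)
        bus_free_at = start + bus_cycles
        completion_times.append(bus_free_at)
        peak = max(peak, len(in_flight) + 1)

    return bus_free_at, stall, peak


print(simulate(4, 2, 8, max_outstanding=2))    # bounded queue, core stalls
print(simulate(4, 2, 8, max_outstanding=100))  # effectively unbounded queue
```

In this toy model both runs finish at the same time because the bus is the bottleneck either way; the limit only changes whether the waiting shows up as core stalls or as a growing queue of requests.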

swkonz

I think you can tell from the diagram that the memory bus is fully utilized, since stalls are inserted whenever our arithmetic instructions need to wait for data to arrive from memory before they can continue.

thread17

I think we can see that memory is fully utilized for each load so another memory access will need to wait for the previous one to complete.

One question that comes to mind: do real processors share the memory bus among several load instructions, so that each one makes progress at a slower rate, or do they prioritize a specific instruction and let it occupy the whole bus?

wzz

In the diagram as provided, how can we execute the first two "add" instructions here before the first "load" finishes fetching memory?

wanze

I might be missing something but I am not sure if I understand the point of the question here.

It seems to me that the question is suggesting that if the ratio of math instructions to load instructions is increased, then we would have better processor utilization because the stalls will be shorter. But won't the total time we need still be the same?

danieljm

@wanze I think the point is that, as you said, the total time is the same but the number of math instructions has increased so we are improving the percentage of that time which is actually spent executing instructions instead of waiting for the memory bus.

bayfc

Does the statement that the occupancy of the memory bus equals the size of the cache line divided by the memory bus bandwidth imply there is no relevant constant time required for data to be fetched from memory?
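
To make the question concrete, here is a sketch that separates the two quantities. It assumes a hypothetical fixed latency per request that can overlap with the transfers of earlier requests, so only the per-line transfer time occupies the bus. The 64-byte line, 8 bytes/cycle bandwidth, and 100-cycle latency are made-up numbers.

```python
def bus_occupancy_cycles(line_bytes, bytes_per_cycle):
    """Cycles the bus is busy transferring one cache line."""
    return line_bytes / bytes_per_cycle


def time_for_loads(n, line_bytes, bytes_per_cycle, fixed_latency):
    """Finish time of n back-to-back loads when the fixed latency is overlapped.

    Assumption: the fixed latency is paid once up front, and after that the
    bus delivers one line per occupancy period.
    """
    return fixed_latency + n * bus_occupancy_cycles(line_bytes, bytes_per_cycle)


one = time_for_loads(1, 64, 8, fixed_latency=100)
many = time_for_loads(1000, 64, 8, fixed_latency=100)
print(one, many / 1000)  # single-load time vs. average time per load
```

If that assumption holds, a constant latency would still matter for an isolated load but would mostly wash out of the steady-state rate, which would be set by the occupancy alone.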

gpu

To confirm: the actual time taken for the "add" operations is not shown here, right? Wouldn't the add instructions after a load ostensibly require the value from load to have arrived in the register?

haofeng

If the ratio of math instructions to load instructions is increased, the processor spends more time computing after the data is loaded, so processor utilization increases. If the math instructions take significantly more time than the load instructions, it is likely that the memory bus will no longer be fully utilized, so memory utilization will decrease.

xhe17

I think we can see that this instruction stream is memory bound, since there is no waiting period between two adjacent memory loads (i.e. the "blue squares" are "continuous" in time).

tp

@wzz I'm pretty sure this diagram is missing some parts of the instruction pipeline, since normally it would be impossible to perform operations on data before it has been loaded from memory. However, some of the first steps of the pipeline can be executed before the required data has been loaded, such as the instruction fetch that actually determines that the instruction depends on memory that must be loaded.

msere

Why doesn't load miss 1 stall like the later loads? Is it to allow instructions to keep executing after the load, up to a certain point?

Ethan

@msere Just from observing the pattern, it seems the implementation here can keep issuing instructions once a load request has been sent to memory. Hence there is no stall between load miss 0 and load miss 1.