fizzbuzz

Another major motivation for caches is that data and (especially) instruction fetches from memory are often very predictable. If you fetch an instruction, you're almost certainly going to execute instructions near it in memory next, so the hardware can prefetch those as well. If you're doing array processing, the data you want to work with will be spatially local in memory. If data and instruction accesses were totally random, caching wouldn't be of much use.
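
To make the spatial-locality point concrete, here's a minimal sketch; the stride of 16 ints (64 bytes) is just a typical cache-line size, not any particular processor's:

```c
#include <stdio.h>

#define N (1 << 20)

static int a[N];

int main(void) {
    long sum = 0;

    /* Sequential: adjacent elements share a cache line, so after the
     * first miss per line the remaining accesses to that line hit. */
    for (int i = 0; i < N; i++)
        sum += a[i];

    /* Strided by 16 ints (64 bytes): every access likely lands on a
     * different cache line, so almost every access misses. */
    for (int i = 0; i < N; i += 16)
        sum += a[i];

    printf("%ld\n", sum);
    return 0;
}
```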

atad

How does the system select what data to store in the caches? The instruction prefetching is interesting, but I want to figure out what non-instruction data gets cached. Some of the other slides seem to suggest that the entire execution context effectively is the cache, but this slide seems to suggest that they are independent (which is what I would have thought). If I have a load instruction, is that data copied into all of the caches and then evicted from each at different times, or is there something else going on?

dishpanda

@atad During a memory request, the first level of cache (L1) is checked for the corresponding data. If it hits, the data is sent back immediately. If it misses, the request is forwarded to the next level (L2). The process continues up the chain of cache levels until the request has to go to main memory. When the memory response is received, the corresponding cache line at each level is filled. The system generally doesn't pick and choose which data gets cached: every access is handled the same way and goes through the cache hierarchy.
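
Here's a toy software sketch of that lookup chain, just to make the flow concrete; the three levels, four lines per level, and round-robin eviction are illustrative assumptions, not how real hardware is organized:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_LEVELS  3    /* L1, L2, L3 */
#define LINE_SIZE   64   /* bytes per cache line (typical) */
#define LEVEL_LINES 4    /* toy capacity per level */

static uint64_t lines[NUM_LEVELS][LEVEL_LINES];
static bool     valid[NUM_LEVELS][LEVEL_LINES];
static int      victim[NUM_LEVELS];              /* round-robin eviction */

static bool level_has(int level, uint64_t line) {
    for (int i = 0; i < LEVEL_LINES; i++)
        if (valid[level][i] && lines[level][i] == line)
            return true;
    return false;
}

static void level_fill(int level, uint64_t line) {
    lines[level][victim[level]] = line;           /* evict whatever was there */
    valid[level][victim[level]] = true;
    victim[level] = (victim[level] + 1) % LEVEL_LINES;
}

/* Returns which level served the request; NUM_LEVELS means main memory. */
static int mem_access(uint64_t addr) {
    uint64_t line = addr / LINE_SIZE;
    for (int level = 0; level < NUM_LEVELS; level++) {
        if (level_has(level, line)) {
            for (int l = level - 1; l >= 0; l--)  /* fill the levels that missed */
                level_fill(l, line);
            return level;
        }
    }
    for (int l = NUM_LEVELS - 1; l >= 0; l--)     /* missed everywhere: fill all */
        level_fill(l, line);
    return NUM_LEVELS;
}

int main(void) {
    printf("first access served by level %d\n", mem_access(0x1000));  /* 3 = memory */
    printf("second access served by level %d\n", mem_access(0x1000)); /* 0 = L1 hit */
    return 0;
}
```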

The process of forwarding requests up the cache levels until one hits is definitely faster than going all the way to main memory, but it still causes the hardware to stall. Given the predictable nature of programs (e.g., looping through an array), it makes a lot of sense to prefetch a block of data...
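
For software-controlled prefetching specifically, GCC and Clang expose a `__builtin_prefetch` intrinsic. A minimal sketch follows; the prefetch distance of 16 elements is a guess, and a real hardware prefetcher would likely handle a simple sequential loop like this on its own:

```c
#include <stdio.h>
#include <stddef.h>

/* Sum an array, hinting the cache to start fetching a line that will
 * be needed a few iterations from now. */
static long sum_with_prefetch(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16]);   /* hint only; safe to drop */
        sum += a[i];
    }
    return sum;
}

int main(void) {
    static long a[1024];
    for (size_t i = 0; i < 1024; i++) a[i] = (long)i;
    printf("%ld\n", sum_with_prefetch(a, 1024));
    return 0;
}
```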

Caches are costly and add area to the chip, so they have to stay small. This means that the mapping of address -> cache line will have conflicts. Depending on the cache type (direct-mapped vs. set-associative vs. fully associative), the hardware decides which previous cache entry to evict and replace with the new data. This also means that cache entries may be evicted at different times at the different levels.
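
As a rough sketch of where conflicts come from, here's how an address might map onto a set in a direct-mapped cache; the 64-byte lines and 64 sets are made-up numbers for illustration:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define LINE_SIZE 64   /* bytes per line */
#define NUM_SETS  64   /* lines in this toy direct-mapped cache */

static uint64_t set_index(uint64_t addr) { return (addr / LINE_SIZE) % NUM_SETS; }
static uint64_t tag(uint64_t addr)       { return (addr / LINE_SIZE) / NUM_SETS; }

int main(void) {
    /* Two addresses whose low bits match: same set, different tags, so in
     * a direct-mapped cache the second load evicts the first one's line. */
    uint64_t a = 0x12340, b = 0x56340;
    printf("a: set %" PRIu64 " tag %" PRIu64 "\n", set_index(a), tag(a));
    printf("b: set %" PRIu64 " tag %" PRIu64 "\n", set_index(b), tag(b));
    return 0;
}
```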

The concept of caching is not restricted to data, either. Virtual address => physical address translations require page-table lookups, which can be expensive. The TLB is a hardware caching mechanism built specifically to speed up this translation process.
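
Here's a toy sketch of what the TLB caches (virtual page number -> physical frame number); the 4 KB pages, the 4-entry fully-associative TLB, and the fake `page_table_walk()` are all illustrative assumptions:

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <inttypes.h>

#define PAGE_SIZE   4096
#define TLB_ENTRIES 4

struct tlb_entry { bool valid; uint64_t vpn, pfn; };
static struct tlb_entry tlb[TLB_ENTRIES];
static int next_victim;

/* Hypothetical stand-in for the slow page-table walk the TLB avoids. */
static uint64_t page_table_walk(uint64_t vpn) { return vpn + 100; }

static uint64_t translate(uint64_t vaddr) {
    uint64_t vpn = vaddr / PAGE_SIZE, offset = vaddr % PAGE_SIZE;

    for (int i = 0; i < TLB_ENTRIES; i++)              /* TLB hit: fast path */
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return tlb[i].pfn * PAGE_SIZE + offset;

    uint64_t pfn = page_table_walk(vpn);               /* TLB miss: slow path */
    tlb[next_victim] = (struct tlb_entry){ true, vpn, pfn };
    next_victim = (next_victim + 1) % TLB_ENTRIES;
    return pfn * PAGE_SIZE + offset;
}

int main(void) {
    printf("0x%" PRIx64 "\n", translate(0x2345));  /* miss: walk, then cache it */
    printf("0x%" PRIx64 "\n", translate(0x2345));  /* hit: served from the TLB */
    return 0;
}
```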

steliosr

I understand that caches dramatically decrease memory latency, but is it meaningful to talk about cache bandwidth? Is the rate of transfer so quick that we can assume it's effectively instantaneous/enough for the CPU to perform computations at its maximum rate? Is it something we would talk more about in a hardware class, or is there an obvious answer I am missing?
