Slide 44 of 88

mhchin

While reviewing this slide, I got curious about how long the actual stalls would take, and here is one of the Google search results I found: https://stackoverflow.com/questions/4087280/approximate-cost-to-access-various-caches-and-main-memory

Core i7 Xeon 5500 Series Data Source Latency (approximate) [Pg. 22]

- local L1 CACHE hit: ~4 cycles (2.1 - 1.2 ns)
- local L2 CACHE hit: ~10 cycles (5.3 - 3.0 ns)
- local L3 CACHE hit, line unshared: ~40 cycles (21.4 - 12.0 ns)
- local L3 CACHE hit, shared line in another core: ~65 cycles (34.8 - 19.5 ns)
- local L3 CACHE hit, modified in another core: ~75 cycles (40.2 - 22.5 ns)
- remote L3 CACHE (Ref: Fig.1 [Pg. 5]): ~100-300 cycles (160.7 - 30.0 ns)
- local DRAM: ~60 ns
- remote DRAM: ~100 ns
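Each ns range looks like the same cycle count converted at two different clock rates. The sketch below assumes roughly 1.87 GHz for the slow end and 3.33 GHz for the fast end. That is my inference from the numbers, not something the source states. With those assumed clocks, ns = cycles / GHz lands within about 0.1 ns of every local cache range listed (the remote L3 and DRAM rows are left out).

```python
# Approximate latencies (in core cycles) quoted above for the Core i7 / Xeon 5500.
LATENCY_CYCLES = {
    "local L1 hit": 4,
    "local L2 hit": 10,
    "local L3 hit, line unshared": 40,
    "local L3 hit, shared line in another core": 65,
    "local L3 hit, modified in another core": 75,
}

# Assumed clock rates (inferred from the quoted ns ranges, not stated in the source).
SLOW_GHZ = 1.87
FAST_GHZ = 3.33


def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a cycle count to nanoseconds: one cycle lasts 1 / clock_ghz ns."""
    return cycles / clock_ghz


for name, cycles in LATENCY_CYCLES.items():
    slow = cycles_to_ns(cycles, SLOW_GHZ)
    fast = cycles_to_ns(cycles, FAST_GHZ)
    print(f"{name:45s} ~{cycles:3d} cycles  ({slow:5.1f} - {fast:5.1f} ns)")
```

The same cycle count can therefore cost almost twice as many nanoseconds on a slower part, which is why the source gives ranges rather than single ns values.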

Note that the cost of an L3 CACHE hit depends on previous operations and on the state of the line: an unshared line costs ~40 cycles, while a line shared with or modified in another core costs ~65 or ~75 cycles. From a network standpoint these numbers are very small, so I guess it makes sense to run locally if we need latency below a few hundred nanoseconds.
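To get a feel for how much the line state matters, here is a small sketch that averages the three L3 hit latencies from the list over a mix of line states. The two mixes (`private_heavy`, `contended`) are hypothetical fractions picked only for illustration, not measurements of any real workload.

```python
# L3 hit latencies (cycles) by line state, as quoted in the list above.
L3_HIT_CYCLES = {
    "unshared": 40,
    "shared in another core": 65,
    "modified in another core": 75,
}


def expected_latency(mix: dict[str, float], latencies: dict[str, int]) -> float:
    """Weighted average latency for a mix of outcomes whose fractions sum to 1."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions must sum to 1, got {total}")
    return sum(frac * latencies[state] for state, frac in mix.items())


# Hypothetical mixes of line states, chosen only to illustrate the effect of sharing.
private_heavy = {"unshared": 0.9, "shared in another core": 0.05, "modified in another core": 0.05}
contended = {"unshared": 0.3, "shared in another core": 0.3, "modified in another core": 0.4}

for label, mix in [("mostly private data", private_heavy), ("heavily shared data", contended)]:
    print(f"{label}: ~{expected_latency(mix, L3_HIT_CYCLES):.1f} cycles per L3 hit")
```

With these made-up mixes the average L3 hit goes from about 43.0 to about 61.5 cycles, so sharing alone adds roughly 40% to the cost of an L3 hit, while still staying far below anything a network round trip could offer.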
