Previous | Next --- Slide 18 of 62
Back to Lecture Thumbnails
haiyuem

For each element of C, we need to access entire row of A and entire column of B. Since these are too big and cannot fit into cache, every time we reuse an element in A/B, we need to re-fetch it.

blipblop

The matrix-matrix multiplication algorithm fundamentally has high arithmetic intensity. We load and store O(N^2) data, but need to do O(N^3) work, so fundamentally, we do O(N) arithmetic ops per output element. And if the entire matrices A and B both fit in cache, that is indeed the case, since we only ever need to load elements of A and B from memory once, and repeated accesses to the same elements are cache hits. But as we discussed in class, for large matrices this algorithm suffers from cache thrashing and thus has low arithmetic intensity in practice. So it seems "arithmetic intensity" depends on both properties of the algorithm and on the underlying system implementation.

Please log in to leave a comment.