Back to Lecture Thumbnails
haiyuem
blipblop
The matrix-matrix multiplication algorithm fundamentally has high arithmetic intensity. We load and store O(N^2) data, but need to do O(N^3) work, so fundamentally, we do O(N) arithmetic ops per output element. And if the entire matrices A and B both fit in cache, that is indeed the case, since we only ever need to load elements of A and B from memory once, and repeated accesses to the same elements are cache hits. But as we discussed in class, for large matrices this algorithm suffers from cache thrashing and thus has low arithmetic intensity in practice. So it seems "arithmetic intensity" depends on both properties of the algorithm and on the underlying system implementation.
Please log in to leave a comment.
Copyright 2020 Stanford University
For each element of C, we need to access entire row of A and entire column of B. Since these are too big and cannot fit into cache, every time we reuse an element in A/B, we need to re-fetch it.