Slide 21 of 62
lonelymoon

As the next slide explains, if BLOCKSIZE_I is small (much smaller than SIMD_WIDTH), we cannot keep all of the SIMD ALUs busy. With a small BLOCKSIZE_I it is therefore better to use the alternative scheme shown on the next slide: by transposing matrix B, we can still use the full SIMD width by taking the vector elements along the BLOCKSIZE_K dimension.
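
To make the idea concrete, here is a rough sketch in C of what I mean. It is not the code from the slides: the block sizes, the flat row-major layout, and the names `transpose_block` and `block_matmul_bt` are all my own assumptions for illustration. The inner loop walks along k, and because B has been transposed, both operands are contiguous there, so each group of SIMD_WIDTH multiplies could map onto one vector instruction even though BLOCKSIZE_I is tiny.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes for illustration: BLOCKSIZE_I is much smaller than
 * SIMD_WIDTH, and BLOCKSIZE_K is a multiple of SIMD_WIDTH. */
enum { SIMD_WIDTH = 8, BLOCKSIZE_I = 2, BLOCKSIZE_J = 4, BLOCKSIZE_K = 16 };

/* Copy a row-major BLOCKSIZE_K x BLOCKSIZE_J block of B into Bt, which is
 * row-major BLOCKSIZE_J x BLOCKSIZE_K, so each row of Bt runs along k. */
static void transpose_block(const float *B, float *Bt)
{
    assert(B && Bt);
    for (size_t k = 0; k < BLOCKSIZE_K; k++)
        for (size_t j = 0; j < BLOCKSIZE_J; j++)
            Bt[j * BLOCKSIZE_K + k] = B[k * BLOCKSIZE_J + j];
}

/* C (BLOCKSIZE_I x BLOCKSIZE_J) += A (BLOCKSIZE_I x BLOCKSIZE_K) * B,
 * where B is supplied in transposed form as Bt. */
static void block_matmul_bt(const float *A, const float *Bt, float *C)
{
    assert(A && Bt && C);
    for (size_t i = 0; i < BLOCKSIZE_I; i++) {
        for (size_t j = 0; j < BLOCKSIZE_J; j++) {
            /* One partial sum per SIMD lane. */
            float lanes[SIMD_WIDTH] = {0};
            for (size_t k = 0; k < BLOCKSIZE_K; k += SIMD_WIDTH) {
                /* Both reads are contiguous in k, so this lane loop is the
                 * part that could become a single vector multiply-add. */
                for (size_t l = 0; l < SIMD_WIDTH; l++)
                    lanes[l] += A[i * BLOCKSIZE_K + k + l] *
                                Bt[j * BLOCKSIZE_K + k + l];
            }
            /* Horizontal reduction of the lane sums into one scalar. */
            float sum = 0.0f;
            for (size_t l = 0; l < SIMD_WIDTH; l++)
                sum += lanes[l];
            C[i * BLOCKSIZE_J + j] += sum;
        }
    }
}
```

Whether a compiler actually vectorizes the lane loop depends on the compiler and flags, so treat this as a picture of the access pattern rather than a tuned kernel. The price of this scheme is the extra horizontal reduction per output element, which the scheme that vectorizes along another dimension does not need.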
