In practice, what kind of performance advantages do we get by using this direct implementation, as opposed to a highly optimised matrix multiplication? Obviously the lecture explained the linear improvements in memory footprints, but how does this translate to real-world speed?
In practice, what kind of performance advantages do we get by using this direct implementation, as opposed to a highly optimised matrix multiplication? Obviously the lecture explained the linear improvements in memory footprints, but how does this translate to real-world speed?