Slide 29 of 73

suninhouse

This approach has two costs: 1) a temporary buffer has to be allocated; 2) the intermediate result is written to memory and read back, so there are more memory reads and writes.

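This approach is also only possible when the convolution is separable, that is, when the 2D filter can be written as the product of a horizontal 1D filter and a vertical 1D filter. For a tutorial on this, see: https://towardsdatascience.com/a-basic-introduction-to-separable-convolutions-b99ec3102728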
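To make the two costs concrete, here is a minimal sketch of a two-pass 3x3 box blur. The function name `blur_two_pass`, the one-pixel padding of the input, and the buffer layout are assumptions for illustration, not necessarily what the slide's code uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a two-pass 3x3 box blur.
// Assumption: `input` holds (width + 2) x (height + 2) pixels in row-major
// order, i.e. the image is padded by one pixel on every side.
std::vector<float> blur_two_pass(const std::vector<float>& input,
                                 int width, int height) {
    const float weight = 1.0f / 3.0f;  // 1D box filter weights: [1/3, 1/3, 1/3]

    // Cost 1: the temporary buffer, width x (height + 2) pixels.
    std::vector<float> tmp(static_cast<std::size_t>(width) * (height + 2));
    std::vector<float> output(static_cast<std::size_t>(width) * height);

    // Pass 1: horizontal blur over every padded row, written to tmp.
    for (int j = 0; j < height + 2; ++j) {
        for (int i = 0; i < width; ++i) {
            float sum = 0.0f;
            for (int ii = 0; ii < 3; ++ii)
                sum += weight * input[j * (width + 2) + i + ii];
            tmp[j * width + i] = sum;  // Cost 2: extra write ...
        }
    }

    // Pass 2: vertical blur, reading tmp back.
    for (int j = 0; j < height; ++j) {
        for (int i = 0; i < width; ++i) {
            float sum = 0.0f;
            for (int jj = 0; jj < 3; ++jj)
                sum += weight * tmp[(j + jj) * width + i];  // ... and extra reads
            output[j * width + i] = sum;
        }
    }
    return output;
}
```

Each output pixel now takes 3 + 3 multiply-adds instead of 9, but every pixel of `tmp` is written once and then read back in the second pass.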

jgrace

In this case we essentially want to reduce arithmetic intensity: the two-pass version does less arithmetic per pixel at the price of more memory traffic, and that is worth it because computation is the bigger limitation for this kind of filter on high-resolution images. So far we have usually seen it the other way around, where we want to increase arithmetic intensity to maximize throughput.
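A rough back-of-the-envelope count for an $N \times N$ separable filter shows the trade. The assumption here (mine, not a figure from the slide) is ideal caching, so each pixel moves to or from memory exactly once per pass; $W$ is multiply-adds per output pixel and $Q$ is memory accesses per output pixel.

```latex
\begin{align*}
\text{single pass:} \quad & W = N^2, & Q &= 1_{\text{read}} + 1_{\text{write}} = 2, & W/Q &= N^2/2 \\
\text{two pass:}    \quad & W = 2N,  & Q &= 2_{\text{read}} + 2_{\text{write}} = 4, & W/Q &= N/2 \\
N = 3:              \quad & 9 \to 6 \text{ multiply-adds}, & & 2 \to 4 \text{ accesses}, & W/Q &: 4.5 \to 1.5
\end{align*}
```

The count ignores the extra padded rows in the temporary buffer, so the real two-pass numbers are slightly higher.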

mziv

The main issue with this strategy is that caching gets much worse for big images: the extra buffer certainly won't fit in cache, so we take many more cache misses and end up reloading the same data multiple times. This sets us up for the approach on the next slide.
