If we fuse those three computations, we no longer have to write the temporary results to memory and read them back, which increases the arithmetic intensity (the number of operations performed per byte of memory traffic).
Fusion is also a way to implement and improve parallelism.
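To make the memory-traffic argument concrete, here is a minimal sketch that only counts element loads and stores. It does not measure real hardware behavior. The three computations are a hypothetical stand-in (`t1 = a * x`, `t2 = t1 + b`, `y = max(t2, 0)`) because the actual ones are not specified here. The 4-byte element size (float32) is also an assumption. The unfused version runs each step as a separate pass, so every intermediate is stored and then loaded again. The fused version does all three steps in one pass and keeps the temporaries in a local variable, which plays the role that registers play in a fused GPU kernel.

```python
from typing import List, Tuple

# Hypothetical example: the three element-wise computations are assumed to be
#   t1 = a * x,  t2 = t1 + b,  y = max(t2, 0)
OPS_PER_ELEMENT = 3  # one multiply, one add, one max


def unfused(x: List[float], a: float, b: float) -> Tuple[List[float], int]:
    """Run each computation as its own pass; return (result, element loads + stores)."""
    traffic = 0
    t1 = []
    for v in x:              # pass 1: load x, store t1
        t1.append(a * v)
        traffic += 2
    t2 = []
    for v in t1:             # pass 2: load t1, store t2
        t2.append(v + b)
        traffic += 2
    y = []
    for v in t2:             # pass 3: load t2, store y
        y.append(max(v, 0.0))
        traffic += 2
    return y, traffic


def fused(x: List[float], a: float, b: float) -> Tuple[List[float], int]:
    """Run all three computations in one pass; temporaries never leave the loop body."""
    traffic = 0
    y = []
    for v in x:              # single pass: load x, store y
        y.append(max(a * v + b, 0.0))
        traffic += 2
    return y, traffic


def arithmetic_intensity(n: int, traffic: int, bytes_per_elem: int = 4) -> float:
    """Operations per byte moved, assuming 4-byte (float32) elements by default."""
    return OPS_PER_ELEMENT * n / (traffic * bytes_per_elem)


if __name__ == "__main__":
    x = [-2.0, -1.0, 0.5, 3.0]
    y_u, traffic_u = unfused(x, 2.0, 1.0)
    y_f, traffic_f = fused(x, 2.0, 1.0)
    print(y_u == y_f)                      # True: same result either way
    print(traffic_u, traffic_f)            # 24 8
    print(arithmetic_intensity(len(x), traffic_u),
          arithmetic_intensity(len(x), traffic_f))  # 0.125 0.375
```

Both versions perform the same 3 operations per element. Fusion cuts memory traffic from 6 accesses per element to 2, so under this simple model the arithmetic intensity triples.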