I was wondering the same thing and tried to learn about this. I could be incorrect (somebody please correct me) but I think it means that either one multiplication or one addition operation can be computed per clock cycle on each execution unit.
@suninhouse @jgrace It seems that (from the algebra) one mul and one add (= 2 operations) can both be executed simultaneously in one clock cycle. How the ALU achieves this, I am not sure.
Can anybody show the exact calculation behind the 268 GFLOPS figure? I'm trying to reproduce the number but haven't managed to yet.
Is it 4x8x2x4.2 = 268.8 GFLOPs?
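To make that arithmetic easy to check, here is a minimal Python sketch. The meaning attached to each factor (4 cores, 8 SIMD lanes per core, 2 floating-point operations per lane per clock for mul + add, 4.2 GHz clock) is my reading of the product above, not something stated in this thread, and the function and parameter names are made up for illustration.

```python
def peak_gflops(cores, simd_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Peak throughput in GFLOPS as a plain product of the four factors.

    Assumed interpretation (not confirmed by the thread):
      cores                    - number of cores
      simd_lanes               - 32-bit SIMD lanes per core
      flops_per_lane_per_cycle - 2 if a mul and an add both count each clock
      clock_ghz                - clock frequency in GHz
    """
    return cores * simd_lanes * flops_per_lane_per_cycle * clock_ghz


total = peak_gflops(cores=4, simd_lanes=8, flops_per_lane_per_cycle=2, clock_ghz=4.2)
print(f"{total:.1f} GFLOPS")  # 4 * 8 * 2 * 4.2
```

Dropping the factor of 2 (counting only a mul or only an add per clock) halves the result, which is why the mul + add question matters for reproducing the number.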
I think mul+add refers to the fused multiply-add (FMA) instructions on Intel processors. A fused multiply-add performs both a multiplication and an addition in a single instruction, so an execution unit that can issue one FMA per clock is credited with 2 floating-point operations per cycle. For instance, _mm256_fmadd_ps(a, b, c) computes dst[i] = (a[i] * b[i]) + c[i] for each of its 8 single-precision lanes.
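As a rough illustration of the per-lane formula dst[i] = (a[i] * b[i]) + c[i], here is a small Python sketch. It only mirrors the arithmetic: `fmadd_lanes` is a made-up helper, Python floats are double precision rather than the single precision used by the `_ps` intrinsics, and the multiply and add here are rounded separately, whereas a real fused multiply-add rounds once.

```python
def fmadd_lanes(a, b, c):
    """Per-lane a[i] * b[i] + c[i] over equal-length lists (illustration only)."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("all inputs must have the same number of lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


# Eight lanes, matching the width of a 256-bit register of 32-bit floats.
a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [2.0] * 8
c = [0.5] * 8

dst = fmadd_lanes(a, b, c)
print(dst)

# Each lane does one multiply and one add, so one 8-lane call counts as 16 FLOPs.
flops_per_call = 2 * len(dst)
print(flops_per_call)
```

The count at the end is where the factor of 2 in the peak-FLOPS product comes from: one instruction, two floating-point operations per lane.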
Fused multiply-add saves cycles compared to a separate multiply followed by an add, since the multiplication and the addition share some of the same circuitry. If the compiler can find enough multiply-add patterns in the code, the program can run faster.
This may have come up as a question in the lecture, but why (mul + add)? Does it mean that a mul and an add can be executed at the same time in each clock cycle?