
zecheng

So how do we get the 12.7 TFLOPS? It seems that (80 * 4 * 16) * 1.245 * 10^9 = 6.37 TFLOPS. Why do we need to multiply by an extra 2?

itoen

@zecheng It's because each mul-add performs one floating-point multiplication and one floating-point addition, so it counts as two separate floating-point operations.
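To make the arithmetic concrete, here is a quick sketch in Python. The factors 80, 4, 16 and the 1.245 GHz clock come straight from the question; the variable names, and my reading of their product as the number of mul-add lanes issuing once per clock, are assumptions for illustration.

```python
# Factors from the question above. Treating 80 * 4 * 16 as the number of
# mul-add lanes that each issue once per clock is an assumption.
LANES = 80 * 4 * 16          # 5120 mul-adds per clock
CLOCK_HZ = 1.245e9           # 1.245 GHz


def peak_tflops(flops_per_mul_add: int) -> float:
    """Peak throughput in TFLOPS for a given FLOP count per mul-add."""
    return LANES * CLOCK_HZ * flops_per_mul_add / 1e12


# Counting a mul-add as one operation gives the number in the question.
print(f"{peak_tflops(1):.2f} TFLOPS")   # 6.37 TFLOPS

# Counting the multiply and the add separately doubles it.
print(f"{peak_tflops(2):.2f} TFLOPS")   # 12.75 TFLOPS
```

The second result is about 12.75, which matches the slide's 12.7 TFLOPS figure if the slide truncates rather than rounds.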

blipblop

The red text seems to have a typo. I think 163,840 threads per chip is right.
