jxx1998

I didn't understand Kayvon's comment on this slide about changing the if statement's condition to (i == 0). Why does this change allow maximum efficiency?

kayvonf

@jxx1998. I'll pose this as a question to the group. In class I asked for a single if/else statement that leads to worst-case performance. Can someone give an example? (Recall that we support Markdown code syntax.)

lonelymoon

@kayvonf I'm not sure I understood this correctly, so I'd like to lay out my understanding and check it with everyone. From today's lecture, my understanding is that the worst case looks like this: one lane evaluates the condition to T and the other seven evaluate to F, and the T branch takes an extraordinarily long time to compute. Because of that one expensive computation on the T branch, the overall process becomes slower. Do I understand it correctly?

l-henken

@jxx1998 I think of the worst case in terms of what each vectorized lane is doing at any given time. The worst case is when most lanes are doing useless work (i.e., the results they compute will not end up being stored, because their lane needs the other half of the conditional). So doing:

if (i == 0) {
    ... lots and lots of work (1000000 cycles)...
} else {
    ... minimal work (1 cycle) ...
}

would force 7 of the 8 lanes to do useless work for 1000000 of the 1000001 cycles and only one of the lanes to do useless work for 1 of the 1000001 cycles. The ratio of useless work to useful work approaches 1/8.

l-henken

*Correction: that should be the ratio of useful work to total work. (The ratio of useful work to useless work approaches 1/7.)
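For anyone who wants to check the arithmetic, here is a minimal sketch of the lane-cycle accounting for the `if (i == 0)` example. It assumes an 8-wide gang in which only lane 0 takes the 1000000-cycle branch, and that the gang steps through both branches one after the other. The function name `lane_cycle_breakdown` and its parameters are made up for illustration.

```python
def lane_cycle_breakdown(lanes, long_cycles, short_cycles, lanes_on_long_path):
    """Count useful and wasted lane-cycles for one divergent if/else.

    Assumption: when lanes disagree on the condition, the whole gang
    executes the long branch and then the short branch, and each lane
    keeps only the results of the branch it actually needed.
    """
    lanes_on_short_path = lanes - lanes_on_long_path
    total_cycles = long_cycles + short_cycles

    # A lane-cycle is useful only while the lane runs its own branch.
    useful = (lanes_on_long_path * long_cycles
              + lanes_on_short_path * short_cycles)
    wasted = lanes * total_cycles - useful
    return useful, wasted


# The example from the thread: 8 lanes, only lane 0 takes the long branch.
useful, wasted = lane_cycle_breakdown(8, 1_000_000, 1, 1)
print("useful / total :", useful / (useful + wasted))
print("useful / wasted:", useful / wasted)
```

Under these assumptions the useful fraction of all lane-cycles comes out very close to 1/8, while useful divided by wasted is closer to 1/7.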

user1234

Is it possible for SIMD processors to be less efficient than processors with a single ALU? My understanding is that in the worst case SIMD behaves almost the same as a single ALU. For example, if the instructions depend on the index, only one ALU does useful work for each index. If that's the case, is SIMD the default for modern processors? If not, what are the concerns (power consumption? chip area?)?

ipadpro

@user1234 I agree with you that in the worst case, the SIMD approach should have the same runtime performance as a single ALU.

I'm not sure about the default, but almost all modern CPUs have SIMD nowadays; see Intel Instruction Set Extensions Technology as an example.
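To make the "same as a single ALU" worst case concrete, here is a rough cost model, not a description of any particular processor. It assumes that a gang runs every code path needed by at least one lane, one path after another, and it ignores any overhead for managing masks. The names `simd_cycles`, `scalar_cycles`, and the 10-cycle path cost are hypothetical.

```python
def simd_cycles(path_costs, lane_paths):
    """Cycles for one gang: each distinct path taken by some lane runs once."""
    return sum(path_costs[p] for p in set(lane_paths))


def scalar_cycles(path_costs, lane_paths):
    """Cycles for a single ALU processing the same elements one at a time."""
    return sum(path_costs[p] for p in lane_paths)


# Hypothetical: 8 possible code paths, each costing 10 cycles.
costs = [10] * 8

divergent = list(range(8))  # lane i takes path i (work depends on the index)
coherent = [0] * 8          # every lane takes path 0

print("divergent:", simd_cycles(costs, divergent), "vs scalar", scalar_cycles(costs, divergent))
print("coherent: ", simd_cycles(costs, coherent), "vs scalar", scalar_cycles(costs, coherent))
```

In this model full divergence makes the 8-wide gang take as long as one scalar ALU working through all eight elements, while the coherent case is 8 times faster.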

pmp

Would it be right to answer @jxx1998's comment with the following? I separated my reasoning out so it is easier to say if one of my steps is wrong.

  1. All of the ALUs have to be working on the conditional at the same time (unsure about this).

  2. Imagine ALUs 1-7 evaluated the conditional to False and took one cycle to do their job.

  3. Imagine ALU 8 evaluated to True and takes aaaages to do its job.

  4. This means that ALUs 1-7 have to wait a long time without doing useful work until ALU 8 is ready, so that they can move onto the unconditional code.
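The four steps above can be sketched as a tiny simulation of masked execution. This is a simplified model under stated assumptions: all lanes share one instruction stream, a mask decides which lanes keep their results, and a branch is skipped only when no lane needs it. The function `run_masked_if_else` and the 1000-cycle cost standing in for "aaaages" are made up for illustration.

```python
def run_masked_if_else(condition_per_lane, true_cycles, false_cycles):
    """Return (total_cycles, idle_cycles_per_lane) for one gang's if/else."""
    true_mask = list(condition_per_lane)
    false_mask = [not c for c in true_mask]
    idle = [0] * len(true_mask)
    total = 0

    for mask, cost in ((true_mask, true_cycles), (false_mask, false_cycles)):
        if not any(mask):
            continue  # no lane needs this branch, so the gang skips it
        total += cost
        for lane, active in enumerate(mask):
            if not active:
                idle[lane] += cost  # masked-off lane does no useful work

    return total, idle


# Steps 2 and 3: lanes 0-6 evaluate to False, lane 7 evaluates to True.
total, idle = run_masked_if_else([False] * 7 + [True], true_cycles=1000, false_cycles=1)
print("total cycles:", total)
print("idle cycles per lane:", idle)
```

In this model the seven False lanes sit idle for the whole True branch, and the gang reaches the unconditional code only after both branches have run, which matches step 4.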
