kevtan

To answer the questions on this slide:

  1. I don't think CUDA is a data-parallel programming model. From my understanding, data-parallel programming models include constructs like foreach that let the programmer think in terms of parallel operations over collections of data. As far as I know, CUDA C/C++ doesn't have these kinds of features.

  2. I think CUDA has support for both the shared address space and message passing models. For the former, the __shared__ attribute in front of a variable makes it accessible to all the threads in a block (a sketch of this is below). For the latter, I found this documentation that makes explicit reference to message passing.
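
A minimal sketch of the __shared__ point (hypothetical kernel name blockSum, assuming it is launched with 256 threads per block, a power of two): each block gets its own copy of the scratch buffer, and that copy is visible to every thread in the block but to no thread outside it.

```cuda
__global__ void blockSum(const float* in, float* blockSums, int n) {
    __shared__ float scratch[256];  // one copy per thread block, shared by its threads

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    scratch[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // make every thread's write visible to the whole block

    // Tree reduction over the shared buffer (assumes blockDim.x is a power of two).
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride)
            scratch[threadIdx.x] += scratch[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        blockSums[blockIdx.x] = scratch[0];  // one result per block
}
```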

bmo

I think CUDA threads are similar to ISPC program instances (which form a gang), and CUDA blocks are similar to ISPC tasks.

haiyuem

@kevtan I think CUDA is a data-parallel programming model. Its kernel-launch syntax plays a role similar to a foreach: it launches many parallel hardware threads (SIMT, which is similar to SIMD, as Kayvon said in lecture). Launching a kernel over the input and output arrays is effectively a foreach over their elements. For example, "add<<<1, 256>>>(N, x, y);" in the CUDA programming guide here (the kernel behind that launch is sketched below): https://developer.nvidia.com/blog/even-easier-introduction-cuda/
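
Roughly the kernel behind that launch, from the linked post: with <<<1, 256>>> there is one block of 256 threads, and each thread starts at its own index and strides by the block width, so together they cover all n elements.

```cuda
__global__ void add(int n, float *x, float *y) {
    int index = threadIdx.x;   // this thread's starting element
    int stride = blockDim.x;   // 256 threads stride together over the array
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}
```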

viklassic

Also answering the questions on this slide: I think CUDA is a data-parallel programming model because its purpose is to run programs on the GPU using concepts such as SIMT. In addition, CUDA has a message-passing flavor: the CPU and GPU have separate address spaces, so you must explicitly copy your data to the GPU before a kernel runs, and copy the results back to your computer afterward (a sketch of those copies is below). However, I also agree with @kevtan that the __shared__ attribute gives CUDA some form of a shared address space model within the GPU for thread blocks.
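
A sketch of that host-to-device "message passing", using the standard CUDA runtime calls (the names h_x and d_x are just placeholders for host and device copies):

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

int main() {
    int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_x = (float*)malloc(bytes);           // host (CPU) copy
    for (int i = 0; i < n; i++) h_x[i] = 1.0f;    // something to send

    float *d_x;
    cudaMalloc(&d_x, bytes);                      // device (GPU) copy

    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);  // "send" data to the GPU
    // ... launch kernels that read/write d_x ...
    cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost);  // "receive" the results

    cudaFree(d_x);
    free(h_x);
    return 0;
}
```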

rosalg

I would say CUDA is a data-parallel programming model because it runs a user-defined computation, applied uniformly across vast numbers of processing elements.

donquixote

The difference between a pthread and a CUDA thread seems subtle. As for similarities, both get an execution context on hardware and represent a single thread of control. The difference seems to be that a pthread runs on a CPU core, and there is no construct that runs many pthreads in a SIMD manner on a CPU. Instead, one CPU thread can issue SIMD instructions that utilize many ALUs (or one SIMD-wide ALU), but it's all within one thread. In contrast, GPU hardware can detect that the instructions across many CUDA threads are the same, and if so it can run them all concurrently in 1-2 cycles as part of one warp. They are run in a SIMD manner here, but each thread still has its own logical instruction stream, with its own associated execution context. It just happens that all those instruction streams contain the same instructions operating on different data (see the sketch below for what happens when they don't)!
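
A small sketch of that last point (hypothetical kernel): each CUDA thread is its own logical instruction stream, but the hardware maps groups of 32 threads (a warp) onto SIMD execution. When all 32 take the same path, the warp runs in lockstep; when they diverge, the two paths are executed one after the other with the inactive lanes masked off.

```cuda
__global__ void scaleByParity(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Adjacent threads in a warp take different branches here, so the warp
    // executes both paths serially, masking off the lanes not on each path.
    if (i % 2 == 0)
        data[i] *= 2.0f;   // even-indexed threads
    else
        data[i] *= 0.5f;   // odd-indexed threads
}
```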
