We can parallelize this in two ways: by running the iterations of each reduce controller's body in parallel, and by performing each controller's reduction as a parallel reduce. We could also pipeline the operations, including the loads from DRAM, to gain some additional parallelism.
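To make the structure of nested parallel reduces concrete, here is a minimal software sketch in Python. It is only an analogy and not the accelerator code itself. The dot-product workload, the tile size, and the names `tree_reduce`, `inner_reduce`, and `nested_reduce` are assumptions made for illustration. The thread pool shows which pieces of work are independent; it does not model hardware pipelining or the DRAM loads.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add


def tree_reduce(values, op=add):
    """Combine values pairwise, level by level.

    All pairs within one level are independent, so parallel hardware
    could evaluate a whole level at once.
    """
    values = list(values)
    if not values:
        raise ValueError("tree_reduce needs at least one value")
    while len(values) > 1:
        paired = [op(values[i], values[i + 1])
                  for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:  # carry an unpaired last element to the next level
            paired.append(values[-1])
        values = paired
    return values[0]


def inner_reduce(tile_a, tile_b):
    """Inner reduce: the body iterations (multiplies) do not depend on each other."""
    products = [x * y for x, y in zip(tile_a, tile_b)]
    return tree_reduce(products)


def nested_reduce(a, b, tile_size=4, workers=4):
    """Outer reduce over tiles (hypothetical workload: a dot product)."""
    tiles = [(a[i:i + tile_size], b[i:i + tile_size])
             for i in range(0, len(a), tile_size)]
    # Each tile's inner reduce is independent, so the tiles can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda t: inner_reduce(*t), tiles))
    # The outer reduction combines the per-tile partial results.
    return tree_reduce(partials)
```

Calling `nested_reduce(list(range(1, 11)), [2] * 10)` splits the ten elements into tiles of four, four, and two, reduces each tile independently, and then combines the three partial sums. In Python the threads give no real speedup for this arithmetic; the point is only which operations have no dependencies between them.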