Stanford CS149, Fall 2020
PARALLEL COMPUTING

This page contains lecture slides and recommended readings for the Fall 2020 offering of CS149.

(Motivations for parallel chip decisions, challenges of parallelizing code)
(Forms of parallelism: multicore, SIMD, threading + understanding latency and bandwidth)
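To make the SIMD form of parallelism concrete, here is a minimal sketch (an illustration added here, not course code) assuming an x86-64 CPU with AVX: one vector instruction processes eight floats per iteration, with a scalar loop handling the tail.

```cpp
// Minimal SIMD sketch: assumes x86-64 with AVX (compile with -mavx).
#include <immintrin.h>

void vec_add(const float* a, const float* b, float* out, int n) {
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);               // load 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb)); // 8 adds at once
    }
    for (; i < n; ++i)                                    // scalar tail
        out[i] = a[i] + b[i];
}
```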
(Ways of thinking about parallel programs, and their corresponding hardware implementations, ISPC programming)
(Thought process of parallelizing a program in data parallel and shared address space models)
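A minimal shared-address-space sketch (illustrative; the function names and static partitioning are choices made here, not course code): loop iterations are divided into contiguous chunks, and each std::thread works on its chunk of the same shared arrays.

```cpp
// Shared address space sketch: statically partition loop iterations
// across threads; all threads read and write the same shared arrays.
#include <algorithm>
#include <thread>
#include <vector>

void saxpy_chunk(float a, const float* x, float* y, int lo, int hi) {
    for (int i = lo; i < hi; ++i)
        y[i] = a * x[i] + y[i];   // each thread owns a disjoint index range
}

void saxpy_parallel(float a, const float* x, float* y, int n, int nthreads) {
    std::vector<std::thread> workers;
    int chunk = (n + nthreads - 1) / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        int lo = t * chunk;
        int hi = std::min(n, lo + chunk);
        if (lo < hi) workers.emplace_back(saxpy_chunk, a, x, y, lo, hi);
    }
    for (auto& w : workers) w.join();
}
```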
(Achieving good work distribution while minimizing overhead, scheduling Cilk programs with work stealing)
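A fork-join sketch in the spirit of Cilk's cilk_spawn/cilk_sync, using std::async as a stand-in (illustrative only: a real work-stealing runtime schedules spawned tasks onto a fixed pool of workers rather than potentially creating a thread per spawn, so this is practical only for small n).

```cpp
// Fork-join sketch: std::async plays the role of cilk_spawn.
#include <future>

long fib(long n) {
    if (n < 2) return n;
    // "spawn" the first subproblem; the caller continues with the second
    auto x = std::async(std::launch::async, fib, n - 1);
    long y = fib(n - 2);
    return x.get() + y;   // "sync": join the spawned child
}
```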
(CUDA programming abstractions, and how they are implemented on modern GPUs)
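To keep all sketches on this page in plain C++, here is a serial emulation of CUDA's thread hierarchy (illustrative only, not how CUDA is written): each (blockIdx, threadIdx) pair corresponds to one logical CUDA thread executing the kernel body, and a GPU runs these logical threads in parallel across its cores.

```cpp
// Serial emulation of CUDA's grid/block/thread indexing (illustrative).
// In real CUDA, the two outer loops disappear: each logical thread runs
// the body once, with blockIdx/threadIdx supplied by the hardware.
void scale_grid(const float* x, float* y, int n, int gridDim, int blockDim) {
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx)
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx) {
            int i = blockIdx * blockDim + threadIdx;  // global thread id
            if (i < n) y[i] = 2.0f * x[i];            // "kernel" body
        }
}
```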
(Data parallel thinking: map, reduce, scan, prefix sum, groupByKey)
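These primitives map directly onto the C++17 standard library; a minimal sketch (sequential here, though each primitive has a well-known parallel implementation):

```cpp
// Data-parallel primitives via the C++17 standard library.
#include <algorithm>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> v{3, 1, 4, 1, 5};
    std::vector<int> sq(v.size()), pre(v.size());

    // map: apply a function independently to every element
    std::transform(v.begin(), v.end(), sq.begin(),
                   [](int x) { return x * x; });

    // reduce: combine all elements with an associative operator
    int sum = std::reduce(v.begin(), v.end(), 0);

    // scan (exclusive prefix sum): pre[i] = sum of v[0..i-1]
    std::exclusive_scan(v.begin(), v.end(), pre.begin(), 0);

    return sum == 14 ? 0 : 1;   // 3+1+4+1+5 = 14
}
```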
(Producer-consumer locality, RDD abstraction, Spark implementation and scheduling)
(Definition of memory coherence, invalidation-based coherence using MSI and MESI, false sharing)
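A minimal false-sharing sketch (the 64-byte cache-line size is an assumption about the target machine): two threads increment independent counters, but if the counters sit on the same cache line, the line ping-pongs between cores under invalidation-based coherence; alignas-based padding avoids this.

```cpp
// False-sharing sketch: the threads never touch each other's data, yet
// the Unpadded layout forces coherence traffic on every increment.
#include <atomic>
#include <thread>

struct Unpadded { std::atomic<long> a{0}, b{0}; };  // likely share a line
struct Padded {
    alignas(64) std::atomic<long> a{0};             // 64 B = assumed
    alignas(64) std::atomic<long> b{0};             // cache-line size
};

template <typename Counters>
void bump(Counters& c) {
    std::thread t1([&] { for (int i = 0; i < 1'000'000; ++i) c.a++; });
    std::thread t2([&] { for (int i = 0; i < 1'000'000; ++i) c.b++; });
    t1.join(); t2.join();    // Padded typically runs markedly faster
}
```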
(Consistency vs. coherence, relaxed consistency models and their motivation, acquire/release semantics, implementing locks and atomic operations)
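A minimal sketch of acquire/release semantics (illustrative, not course code): a test-and-set spinlock whose lock() uses an acquire and whose unlock() uses a release, which is exactly the ordering a critical section needs, without imposing sequential consistency on every operation.

```cpp
// Test-and-set spinlock with acquire/release ordering.
#include <atomic>

class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // acquire: reads/writes in the critical section cannot move above
        while (flag.test_and_set(std::memory_order_acquire))
            ;  // spin (a production lock would back off or yield)
    }
    void unlock() {
        // release: reads/writes in the critical section cannot move below
        flag.clear(std::memory_order_release);
    }
};
```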
(Fine-grained synchronization via locks, basics of lock-free programming: single-reader/writer queues, lock-free stacks, the ABA problem, hazard pointers)
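A minimal lock-free (Treiber) stack sketch showing CAS-based push; pop is deliberately omitted, since a naive CAS-based pop is exactly where the ABA problem arises, which is what motivates hazard pointers.

```cpp
// Treiber stack: push retries a compare-and-swap until no other thread
// has changed head between our read and our swap.
#include <atomic>

template <typename T>
class LockFreeStack {
    struct Node { T value; Node* next; };
    std::atomic<Node*> head{nullptr};
public:
    void push(T v) {
        Node* n = new Node{v, head.load(std::memory_order_relaxed)};
        // on failure, compare_exchange_weak reloads head into n->next
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed))
            ;  // lost a race; retry with the updated head
    }
};
```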
(Motivation for transactions, design space of transactional memory implementations, lazy-optimistic HTM)
(Energy-efficient computing, motivation for heterogeneous processing, fixed-function processing, FPGAs, mobile SoCs)
(Motivation for DSLs, case study on Halide image processing DSL)
(GraphLab, Ligra, and GraphChi, streaming graph processing, graph compression)
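A minimal vertex-centric BFS step over a CSR graph, written in the spirit of Ligra's edgeMap (illustrative; shown sequentially, with comments noting what a parallel version must change):

```cpp
// One frontier-expansion step of BFS over a graph in CSR form.
#include <vector>

// CSR graph: edges of vertex v are neighbors[offsets[v] .. offsets[v+1]).
struct Graph { std::vector<int> offsets, neighbors; };

std::vector<int> bfs_step(const Graph& g, const std::vector<int>& frontier,
                          std::vector<int>& parent) {
    std::vector<int> next;
    for (int u : frontier)                        // parallel in a real system
        for (int e = g.offsets[u]; e < g.offsets[u + 1]; ++e) {
            int v = g.neighbors[e];
            if (parent[v] == -1) {                // unvisited
                parent[v] = u;                    // needs a CAS if parallel
                next.push_back(v);
            }
        }
    return next;                                  // the new frontier
}
```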
(Performance programming for FPGAs and CGRAs)
(Scheduling convolution layers, exploiting precision and sparsity, DNN accelerators (e.g., GPU Tensor Cores, TPU))
(Enjoy your Winter holiday break!)