About the Course
Module 1: Parallel Computing Architectures
Advanced CPU/GPU/TPU architectures (SIMD, MIMD, vectorization)
Cluster design (InfiniBand, high-throughput networking)
Performance benchmarking (FLOP/s, latency, scalability)
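As a taste of the benchmarking arithmetic in this module (the figures below are hypothetical, not vendor specifications): a 64-core CPU at 2.0 GHz retiring 32 double-precision floating-point operations per core per cycle has a theoretical peak of 64 × 2.0×10^9 × 32 ≈ 4.1 TFLOP/s. Measured results, for example from a LINPACK run, are then reported as a fraction of that peak.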
Module 2: Parallel Programming Models
MPI-4 standard features (non-blocking communication, one-sided RMA)
CUDA optimization (kernel fusion, memory coalescing)
Hybrid programming (OpenMP + MPI for multi-node GPU systems)
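A minimal sketch of the hybrid style covered in this module, assuming a simple ring of ranks: non-blocking MPI messages are posted, OpenMP threads work on the interior while the exchange is in flight, and MPI_Waitall completes it. The domain size and the "compute" loop are hypothetical placeholders; one-sided RMA and the newer MPI-4 additions are not shown.

    // Hedged sketch: non-blocking MPI halo exchange overlapped with
    // OpenMP-threaded interior work. Ring layout and sizes are hypothetical.
    #include <mpi.h>
    #include <omp.h>
    #include <vector>
    #include <cstdio>

    int main(int argc, char **argv) {
        int provided;
        // Request thread support so OpenMP threads can coexist with MPI.
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int N = 1 << 20;                      // hypothetical local domain size
        std::vector<double> field(N, rank), halo(1, 0.0);
        int left  = (rank - 1 + size) % size;
        int right = (rank + 1) % size;

        // Post non-blocking receive/send, then compute the interior while
        // the communication is in flight (communication/computation overlap).
        MPI_Request reqs[2];
        MPI_Irecv(halo.data(), 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(&field[N - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)  // threaded interior work
        for (int i = 1; i < N - 1; ++i)
            sum += field[i];

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);   // complete the exchange
        sum += halo[0];

        printf("rank %d partial sum %.1f\n", rank, sum);
        MPI_Finalize();
        return 0;
    }

Requesting MPI_THREAD_FUNNELED at initialization is what makes it safe to mix OpenMP threads with MPI calls issued from the main thread; the code would be built with an MPI C++ wrapper and OpenMP enabled, e.g. mpicxx -fopenmp.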
Module 3: Large-Scale Simulations
Distributed-memory algorithms for CFD/FEA
Quantum computing integration with HPC workflows
Fault tolerance in exascale systems
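One common ingredient of fault tolerance at scale is application-level checkpoint/restart: each rank periodically writes enough state to disk that the job can resume after a node failure. The sketch below is illustrative only, with hypothetical file names, state layout, and checkpoint interval; production codes usually rely on MPI-IO or a library such as SCR rather than one POSIX file per rank.

    // Hedged sketch: periodic per-rank checkpointing of solver state.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>
    #include <string>

    static void write_checkpoint(int rank, int step, const std::vector<double> &state) {
        // One file per rank; the naming scheme is a placeholder.
        std::string path = "ckpt_rank" + std::to_string(rank) + ".bin";
        FILE *f = std::fopen(path.c_str(), "wb");
        if (!f) { std::perror(path.c_str()); return; }
        std::fwrite(&step, sizeof(step), 1, f);
        std::fwrite(state.data(), sizeof(double), state.size(), f);
        std::fclose(f);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        std::vector<double> state(1024, 0.0);        // hypothetical solver state
        const int checkpoint_interval = 100;         // steps between checkpoints

        for (int step = 1; step <= 500; ++step) {
            for (double &x : state) x += 0.001;      // stand-in for a solver step
            if (step % checkpoint_interval == 0) {
                MPI_Barrier(MPI_COMM_WORLD);         // all ranks checkpoint the same step
                write_checkpoint(rank, step, state);
            }
        }

        MPI_Finalize();
        return 0;
    }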
Hands-on Labs:
Optimizing matrix multiplication on GPU clusters (see the kernel sketch below)
Benchmarking HPC workloads on Slurm-managed systems
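For the first lab, a single-GPU starting point might look like the tiled kernel below: each thread block stages TILE x TILE tiles of A and B in shared memory so that global-memory loads are coalesced. Matrix size and tile width are hypothetical, n is assumed to be a multiple of TILE, and the multi-GPU/multi-node distribution layer (e.g. MPI) is omitted.

    // Hedged sketch: shared-memory tiled matrix multiplication in CUDA.
    #include <cuda_runtime.h>
    #include <cstdio>

    #define TILE 16

    __global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
        __shared__ float As[TILE][TILE];
        __shared__ float Bs[TILE][TILE];

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < n / TILE; ++t) {
            // Adjacent threads load adjacent addresses: coalesced global reads.
            As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
            __syncthreads();
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * n + col] = acc;
    }

    int main() {
        const int n = 1024;                    // assumes n is a multiple of TILE
        size_t bytes = n * n * sizeof(float);
        float *A, *B, *C;
        cudaMallocManaged(&A, bytes);
        cudaMallocManaged(&B, bytes);
        cudaMallocManaged(&C, bytes);
        for (int i = 0; i < n * n; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

        dim3 block(TILE, TILE), grid(n / TILE, n / TILE);
        matmul_tiled<<<grid, block>>>(A, B, C, n);
        cudaDeviceSynchronize();

        printf("C[0] = %.1f (expected %.1f)\n", C[0], 2.0f * n);
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

On a Slurm-managed cluster (the second lab), a binary like this would typically be submitted with sbatch and launched with srun, then timed and scaled across nodes as part of the benchmarking workflow.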
Your Instructor
Camilla Jones
