Systems for ML
CSE 599W: Systems for ML by Tianqi Chen, Haichen Shen, and Arvind Krishnamurthy at the University of Washington.
Lecture 1: Introduction to Deep Learning
These notes are not about the learning aspect of deep learning (except for the first two lectures) but about the systems aspect: faster training, efficient serving, and lower memory consumption.
Lecture 3: Components Overview of Deep Learning System
Typical Deep Learning System Stack
User API: Programming API; Gradient Calculation (Differentiation API)
System Components: Computational Graph Optimization and Execution; Runtime Parallel Scheduling
Architecture: GPU Kernels, Optimizing Device Code; Accelerators and Hardware
Lecture 4: Backprop and Automatic Differentiation
Numerical differentiation
- A tool to check the correctness of a gradient implementation
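The bullet above is how numerical differentiation is used in practice: a slow but simple finite-difference gradient that checks a hand-derived or generated one. A minimal sketch (the name `numerical_grad` and the test function are illustrative, not from the course):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of scalar f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = eps
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad

# Check an analytic gradient: f(x) = sum(x**2) has gradient 2x.
x = np.array([1.0, -2.0, 3.0])
analytic = 2 * x
approx = numerical_grad(lambda v: np.sum(v ** 2), x)
assert np.allclose(approx, analytic, atol=1e-4)
```

Since it needs two function evaluations per input coordinate, it is far too slow for training, which is exactly why it is relegated to checking implementations.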
Backpropagation
- Easy to understand and implement
- Bad for memory use and schedule optimization
Automatic differentiation
- Generates gradient computation for the entire computation graph
- Better for system optimization
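The reverse traversal underlying both backprop and autodiff can be sketched in a few lines. The sketch below runs gradients eagerly; in the lecture's autodiff formulation the same gradient computation would instead be emitted as new nodes of the graph, which is what enables the system-level optimizations mentioned above. The `Node` class and operator names are illustrative, not the course's API:

```python
class Node:
    """One node in a computation graph (illustrative sketch)."""
    def __init__(self, value, parents=(), grad_fns=()):
        self.value = value
        self.parents = parents    # input nodes
        self.grad_fns = grad_fns  # map output gradient -> gradient w.r.t. each input
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, (a, b), (lambda g: g, lambda g: g))

def mul(a, b):
    return Node(a.value * b.value, (a, b),
                (lambda g: g * b.value, lambda g: g * a.value))

def backward(out):
    # Topologically order the graph, then propagate gradients in reverse.
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p in n.parents:
                visit(p)
            order.append(n)
    visit(out)
    out.grad = 1.0
    for n in reversed(order):
        for p, fn in zip(n.parents, n.grad_fns):
            p.grad += fn(n.grad)

x, y = Node(3.0), Node(4.0)
z = add(mul(x, x), mul(x, y))  # z = x^2 + x*y
backward(z)
# dz/dx = 2x + y = 10, dz/dy = x = 3
```

Accumulating into `p.grad` with `+=` is what handles variables used more than once, such as `x` above.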
Lecture 5: Hardware backends: GPU
Tips for high performance
- Use existing libraries, which are highly optimized, e.g. cuBLAS, cuDNN.
- Use nvprof or nvvp (visual profiler) to debug the performance.
- Use a high-level language to write GPU kernels.
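The first tip applies beyond GPUs: a hand-written loop is easily orders of magnitude slower than a tuned library routine. A CPU-side sketch of the gap using NumPy, whose `@` operator dispatches to an optimized BLAS (the function name and sizes are illustrative):

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Straightforward loops: correct, but far slower than a tuned library."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(n):
        for j in range(m):
            out[i, j] = np.sum(a[i, :] * b[:, j])
    return out

a = np.random.rand(128, 128).astype(np.float32)
b = np.random.rand(128, 128).astype(np.float32)

t0 = time.perf_counter()
c_naive = naive_matmul(a, b)
t_naive = time.perf_counter() - t0

t0 = time.perf_counter()
c_blas = a @ b
t_blas = time.perf_counter() - t0

# Same result, very different cost.
assert np.allclose(c_naive, c_blas, atol=1e-3)
```

On a GPU the gap is even larger, since libraries like cuBLAS and cuDNN are tuned per architecture; profiling with nvprof or nvvp is how you find which kernels are worth that treatment.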
Lecture 6: Optimize for hardware backends
Optimizations lead to too many variants of operators:
- Different tiling patterns
- Different fuse patterns
- Different data layout
- Different hardware backends
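Each bullet multiplies the search space. As one concrete instance of the first bullet, here is a blocked (tiled) matrix multiply sketched in NumPy: the tile size alone already yields many variants of the same operator, before fuse patterns, data layouts, and hardware backends are considered. The function name and sizes are illustrative:

```python
import numpy as np

def tiled_matmul(a, b, tile=32):
    """Blocked matmul: one tiling variant among many an optimizer must choose from."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m), dtype=a.dtype)
    # Iterate over output tiles, accumulating partial products per k-block.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                out[i0:i0 + tile, j0:j0 + tile] += (
                    a[i0:i0 + tile, k0:k0 + tile] @ b[k0:k0 + tile, j0:j0 + tile])
    return out

a = np.random.rand(64, 64).astype(np.float32)
b = np.random.rand(64, 64).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```

Every variant computes the same result; what differs is locality and parallelism on a given backend, which is why no single hand-written kernel wins everywhere.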