CUDA Memory


CUDA is widely used in deep learning. Although many deep learning practitioners never work with CUDA directly, most of them already use it, because frameworks like PyTorch provide GPU support through CUDA.

To optimize the computational efficiency of our models, it is crucial to understand how data is transferred between the host (CPU) and the device (GPU). In this note, we build up the fundamentals of memory transfer in CUDA.

Segmented Memory and Paged Memory

CUDA Cannot Directly Use Paged Memory

Host (CPU) memory is pageable by default. However, the GPU cannot directly access data in pageable host memory¹. To transfer such data, the CUDA driver first allocates a temporary page-locked (“pinned”) host buffer, copies the data into it, and then transfers the data from the pinned buffer to the device². Pinned memory stays in physical memory and is never moved to secondary storage, so the GPU can read it without the CPU having to page memory in or out.

Harris M. How to Optimize Data Transfers in CUDA C/C++. In: NVIDIA Technical Blog [Internet]. 5 Dec 2012 [cited 19 Oct 2022]. Available: https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/
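In PyTorch, page-locked host memory can also be requested explicitly, which skips the driver's temporary staging copy on every transfer. The sketch below is a minimal illustration, assuming a CUDA-enabled PyTorch install; the helper name `to_device_pinned` is hypothetical. It uses `Tensor.pin_memory()`, which returns a copy of the tensor in pinned memory, `torch.empty(..., pin_memory=True)` to allocate a pinned buffer directly, and `Tensor.is_pinned()` to check the result. Without a CUDA GPU it falls back to the ordinary pageable tensor.

```python
import torch

def to_device_pinned(host_tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: stage a host tensor in pinned memory, then copy it to the GPU.

    Returns the unchanged (pageable) CPU tensor when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        return host_tensor
    # pin_memory() returns a copy that lives in page-locked host memory.
    pinned = host_tensor if host_tensor.is_pinned() else host_tensor.pin_memory()
    # From pinned memory, the copy can run asynchronously with respect to the host.
    return pinned.to("cuda", non_blocking=True)

# Ordinary allocations are pageable.
pageable = torch.randn(1024, 1024)
print("pageable tensor pinned?", pageable.is_pinned())

# Pinned memory can also be allocated up front instead of pinning afterwards.
if torch.cuda.is_available():
    buffer = torch.empty(1024, 1024, pin_memory=True)
    print("buffer pinned?", buffer.is_pinned())

on_device = to_device_pinned(pageable)
print("result lives on:", on_device.device)
```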

Pinned Memory is Fast

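Transfers from pinned memory skip the extra copy through the temporary staging buffer that pageable memory requires, so they reach a higher bandwidth. The two screenshots below are taken from a video by CoffeeBeforeArch.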

Figure: Unpinned Memory
CoffeeBeforeArch. CUDA Crash Course (v2): Pinned Memory. YouTube. 2019. Available: https://www.youtube.com/watch?v=ShT7raBPP8k
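A rough way to reproduce this kind of comparison is to time host-to-device copies from pageable and from pinned memory. The sketch below is an illustrative micro-benchmark, not the one used in the video; the helper names `time_h2d_copy` and `bandwidth_gib_s` and the 256 MiB tensor size are arbitrary choices. It needs a CUDA GPU to do the timing, and the numbers depend on the hardware (PCIe generation, host memory bandwidth), so treat them as indicative only.

```python
import time
import torch

def bandwidth_gib_s(num_bytes: int, seconds: float) -> float:
    """Convert a transfer of `num_bytes` that took `seconds` into GiB/s."""
    return num_bytes / 2**30 / seconds

def time_h2d_copy(host_tensor: torch.Tensor, repeats: int = 10) -> float:
    """Average wall-clock seconds for one host-to-device copy of `host_tensor`."""
    host_tensor.to("cuda")      # warm-up: CUDA context creation, allocator caching
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        host_tensor.to("cuda")
    torch.cuda.synchronize()    # wait until every copy has finished before stopping the clock
    return (time.perf_counter() - start) / repeats

if torch.cuda.is_available():
    x = torch.randn(64, 1024, 1024)            # 64 Mi float32 values = 256 MiB, pageable
    num_bytes = x.numel() * x.element_size()
    pageable_s = time_h2d_copy(x)
    pinned_s = time_h2d_copy(x.pin_memory())   # same data, page-locked copy
    print(f"pageable: {bandwidth_gib_s(num_bytes, pageable_s):.1f} GiB/s")
    print(f"pinned:   {bandwidth_gib_s(num_bytes, pinned_s):.1f} GiB/s")
```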

Why Don’t We Pin Memory All the Time in the PyTorch DataLoader?

The PyTorch DataLoader provides the option pin_memory. When it is True, the DataLoader copies each batch into pinned memory before returning it. By default this option is False, and it is tempting to always set it to True.

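However, pinning is not free: copying every batch into page-locked memory costs CPU time, and because pinned memory cannot be paged out, pinning too much of it reduces the physical memory available to the operating system and other processes, which may cause issues³⁴.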
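When pinning is worth it, enabling it in PyTorch usually involves two parts: `pin_memory=True` on the `DataLoader`, and `non_blocking=True` when moving each batch to the GPU, so that the copy from pinned memory can overlap with host-side work. The sketch below uses a small synthetic `TensorDataset` as a stand-in for real data (an assumption for illustration) and only enables pinning when a CUDA device is present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Synthetic stand-in dataset: 1000 samples with 32 features and an integer label.
dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=0,                  # pinning also works without worker processes
    pin_memory=(device == "cuda"),  # pin only when there is a GPU to copy to
)

n_batches = 0
for features, labels in loader:
    # With pinned batches, non_blocking=True lets the host continue while the copy runs.
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    n_batches += 1  # a real training step would consume features and labels here

print("batches:", n_batches)
```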

Planted: 2022-10-19 by L Ma;

References:


L Ma (2022). 'CUDA Memory', Datumorphism, 10 April. Available at: https://datumorphism.leima.is/cards/machine-learning/practice/cuda-memory/.