Practices in Machine Learning

Introduction: My Knowledge Cards

CUDA Memory

Published:
Category: { ML Practice }
Tags:
References:
- Mohan A. Pipelining data processing and host-to-device data transfer. In: Telesens [Internet]. [cited 17 Oct 2022]. Available: https://www.telesens.co/2019/02/16/efficient-data-transfer-from-paged-memory-to-gpu-using-multi-threading/
- Harris M. How to Optimize Data Transfers in CUDA C/C++. In: NVIDIA Technical Blog [Internet]. 5 Dec 2012 [cited 19 Oct 2022]. Available: https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/
- Contributors to Wikimedia projects. Memory paging. In: Wikipedia [Internet]. 7 Oct 2022 [cited 19 Oct 2022]. Available: https://en.wikipedia.org/wiki/Memory_paging
- Computer Science. Segmented, Paged and Virtual Memory. YouTube. 2019. Available: https://www.youtube.com/watch?v=p9yZNLeOj4s
- CoffeeBeforeArch. CUDA Crash Course (v2): Pinned Memory. YouTube. 2019. Available: https://www.youtube.com/watch?v=ShT7raBPP8k
- torch.utils.data — PyTorch 1.8.1 documentation. [cited 19 Oct 2022]. Available: https://pytorch.org/docs/1.8.1/data.html#memory-pinning
- Mao L. Page-Locked Host Memory for Data Transfer. In: Lei Mao’s Log Book [Internet]. 26 Jun 2021 [cited 19 Oct 2022]. Available: https://leimao.github.io/blog/Page-Locked-Host-Memory-Data-Transfer/
- Gao Y. What is the disadvantage of using pin_memory? In: PyTorch Forums [Internet]. 6 Apr 2017 [cited 19 Oct 2022]. Available: https://discuss.pytorch.org/t/what-is-the-disadvantage-of-using-pin-memory/1702
Summary: Optimizing host-to-device data transfers in CUDA with pinned (page-locked) memory; a minimal PyTorch usage sketch follows below.
Pages: 3
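
The pin_memory references above concern copying batches from pageable host memory into page-locked (pinned) memory so that host-to-device transfers can run asynchronously. Below is a minimal sketch of how this looks in a PyTorch data-loading loop; the toy `TensorDataset`, batch size, and worker count are illustrative assumptions, not values taken from the card.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for a real one; any map-style dataset works the same way.
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

# pin_memory=True makes the DataLoader place each batch in page-locked host
# memory, so the host-to-device copy can use DMA and overlap with compute
# when non_blocking=True is passed on the device transfer.
loader = DataLoader(dataset, batch_size=32, num_workers=2, pin_memory=True)

device = torch.device("cuda")
for images, labels in loader:
    # non_blocking=True only helps when the source tensor is pinned;
    # otherwise the copy falls back to a synchronous transfer.
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```

The key pairing is `pin_memory=True` on the `DataLoader` together with `non_blocking=True` on the copy; pinning alone does not overlap the transfer with computation.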

PyTorch Data Parallelism

Published:
Category: { ML Practice }
References:
- Wolf T. 💥 Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups. In: HuggingFace [Internet]. 2 Sep 2020 [cited 19 Oct 2022]. Available: https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
- Mao L. Data Parallelism VS Model Parallelism in Distributed Deep Learning Training. In: Lei Mao’s Log Book [Internet]. 23 May 2019 [cited 19 Oct 2022]. Available: https://leimao.github.io/blog/Data-Parallelism-vs-Model-Paralelism/
- Effective Training Techniques — PyTorch Lightning 1.7.7 documentation. In: PyTorch Lightning [Internet]. [cited 19 Oct 2022]. Available: https://pytorch-lightning.readthedocs.io/en/stable/advanced/training_tricks.html#accumulate-gradients
- Jia Z, Zaharia M, Aiken A. Beyond Data and Model Parallelism for Deep Neural Networks. arXiv [cs.DC]. 2018. Available: http://arxiv.org/abs/1807.05358
- Li X, Zhang G, Li K, Zheng W. Chapter 4 - Deep Learning and Its Parallelization. In: Buyya R, Calheiros RN, Dastjerdi AV, editors. Big Data. Morgan Kaufmann; 2016. pp. 95–118. doi:10.1016/B978-0-12-805394-2.00004-0
- Xiandong. Intro Distributed Deep Learning. In: Xiandong [Internet]. 13 May 2017 [cited 19 Oct 2022]. Available: https://xiandong79.github.io/Intro-Distributed-Deep-Learning
- Mohan A. Distributed data parallel training using Pytorch on AWS. In: Telesens [Internet]. [cited 17 Oct 2022]. Available: https://www.telesens.co/2019/04/04/distributed-data-parallel-training-using-pytorch-on-aws/
- Writing Distributed Applications with PyTorch — PyTorch Tutorials 1.12.1+cu102 documentation. In: PyTorch [Internet]. [cited 19 Oct 2022]. Available: https://pytorch.org/tutorials/intermediate/dist_tuto.html#collective-communication
- Getting Started with Distributed Data Parallel — PyTorch Tutorials 1.12.1+cu102 documentation. In: PyTorch [Internet]. [cited 19 Oct 2022]. Available: https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
Summary: Data parallelism and distributed training in PyTorch; a minimal DistributedDataParallel sketch follows below.
Pages: 3
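
The distributed references above center on `DistributedDataParallel` (DDP), where each process holds a full model replica and gradients are averaged across processes during the backward pass. Below is a minimal single-node sketch assuming a `torchrun` launch; the tiny linear model and random batches are placeholders for illustration only.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes launch via `torchrun --nproc_per_node=<num_gpus> script.py`,
    # which sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process holds a full replica of the (toy) model on its own GPU.
    model = nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):
        # Each rank processes its own shard of data (random stand-ins here);
        # DDP all-reduces gradients across ranks during backward().
        x = torch.randn(32, 128, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched as, for example, `torchrun --nproc_per_node=2 script.py`, each process drives one GPU and the NCCL backend performs the gradient all-reduce.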