
Wentao's Blog

PyTorch CUDA Streams Introduction

This article explores the basics of CUDA streams, parallel execution, and multi-GPU synchronization strategies. We analyze the advantages of using multiple CUDA streams and show how to synchronize tasks with CUDA events, using streams to optimize program performance.
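As a taste of what the article covers, here is a minimal sketch of two streams coordinated with an event, assuming a CUDA-capable GPU is available:

```python
import torch

stream_a = torch.cuda.Stream()
stream_b = torch.cuda.Stream()
event = torch.cuda.Event()

x = torch.randn(1024, 1024, device="cuda")  # created on the default stream
stream_a.wait_stream(torch.cuda.current_stream())  # order after the allocation

with torch.cuda.stream(stream_a):
    y = x @ x       # launched asynchronously on stream_a
    event.record()  # mark the matmul's completion on stream_a

with torch.cuda.stream(stream_b):
    stream_b.wait_event(event)  # stream_b will not run past this until the event fires
    z = y + 1                   # safe: ordered after the matmul

torch.cuda.synchronize()  # block the host until all queued work is done
print(z.sum().item())
```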

Overview of PyTorch Distributed Training

This document provides a comprehensive overview of PyTorch's distributed training capabilities. It covers the core components of torch.distributed and delves into Distributed Data-Parallel Training (DDP), RPC-Based Distributed Training, and Collective Communication (c10d).
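For flavor, a minimal DDP training-step sketch; it assumes launch via `torchrun` (which sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment) and one GPU per process with the NCCL backend:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    inputs = torch.randn(32, 10, device=local_rank)
    loss = ddp_model(inputs).sum()
    loss.backward()   # gradients are all-reduced across ranks here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```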

ShellLab

In shell lab, we’ll become more familiar with the concepts of process control and signals by writing a simple Unix shell program that supports job control. Source: https://github.com/yewentao256/CSAPP_15213/tree/main/shelllab
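The lab itself is written in C, but the fork/exec/waitpid pattern a job-control shell is built on can be sketched in a few lines of Python (Unix only; `ls -l` here stands in for an arbitrary job):

```python
import os
import signal

def run_foreground(argv):
    pid = os.fork()
    if pid == 0:
        # Child: restore default Ctrl-C handling, then replace the process
        # image with the requested program.
        signal.signal(signal.SIGINT, signal.SIG_DFL)
        os.execvp(argv[0], argv)
    # Parent (the shell): wait for the foreground job to terminate.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        print(f"job {pid} exited with status {os.WEXITSTATUS(status)}")

run_foreground(["ls", "-l"])
```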