Wentao's Blog

Demystifying Dtype Promotion in PyTorch

yewentao published on 2024-01-06 included in category Pytorch

This article offers an insightful look into dtype promotion in PyTorch, explaining how different data types are handled during tensor operations. It covers the fundamental rules of dtype promotion, the specifics of how scalar values are integrated into tensor operations, and the role of TensorIterator in computing dtypes.

Exploring Structured Kernel and Tensor Iterator in PyTorch

yewentao published on 2023-12-03 included in category Pytorch

This article provides an in-depth examination of the Structured Kernel and TensorIterator in PyTorch, key components for optimizing tensor operations. We will delve into the implementation aspects, including op declaration, meta and impl steps in Structured Kernel, and the construction and computation processes in TensorIterator.

Pytorch Compiler Introduction

yewentao published on 2023-10-15 included in category Pytorch

Pytorch Compiler Introduction

Pytorch Cuda Streams Introduction

yewentao published on 2023-10-03 included in category Pytorch

This article explores the basic concepts of CUDA streams, parallel execution, and multi-GPU synchronization strategies. We analyze the advantages of using multiple CUDA streams, how to synchronize tasks with CUDA events, and how to use streams to optimize program performance.

Overview of PyTorch Distributed Training

yewentao published on 2023-09-17 included in category Pytorch

This document provides a comprehensive overview of distributed training capabilities within PyTorch. Covering the core components of torch.distributed, it delves into Distributed Data-Parallel Training (DDP), RPC-Based Distributed Training, and Collective Communication (c10d).

MallocLab

yewentao published on 2023-09-16 included in category Csapp

In malloc lab, we will implement our own versions of malloc, free, and realloc.

Introduction to Pytorch Broadcast

yewentao published on 2023-09-10 included in category Pytorch

This article introduces the implementation details of PyTorch's broadcast mechanism, including the forward and backward computations.

Unraveling PyTorch: Tensor Indexing and Assignment

yewentao published on 2023-08-21 included in category Pytorch

This article dissects PyTorch’s C++ core to uncover the mechanics of tensor indexing and assignment. From translating Python indices to C++ TensorIndex to the nuances of handleDimInMultiDimIndexing, we explore both basic and advanced tensor operations.

Deep Dive to Pytorch AutoGrad(2)

yewentao published on 2023-07-04 included in category Pytorch

This article introduces the implementation details of PyTorch's autograd mechanism.

Deep Dive to Pytorch AutoGrad(1)

yewentao published on 2023-07-04 included in category Pytorch

This article introduces the implementation details of PyTorch's autograd mechanism.