
Yewentao's Blog

Llama_index Source Code Analysis (1)

This post introduces the basic concepts of RAG and then walks through the RAG pipeline via a source-code reading of llama_index, covering the data loader, transformations, indexing, and querying. It also analyzes the performance of the llama_index RAG pipeline and offers corresponding optimization suggestions.
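As a minimal taste of the pipeline the series walks through, here is a hedged RAG sketch with llama_index. The `data` directory and the question are hypothetical placeholders, import paths vary across llama_index versions (this assumes the `llama_index.core` layout), and an LLM/embedding backend is assumed to be configured:

```python
# Minimal RAG sketch with llama_index (assumes llama_index >= 0.10 import layout
# and a configured LLM/embedding backend, e.g. an OpenAI API key).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # data loader: read files into Documents
index = VectorStoreIndex.from_documents(documents)     # transformation + indexing: chunk, embed, store
query_engine = index.as_query_engine()                 # query interface over the index
print(query_engine.query("What does this corpus say about X?"))
```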

Llama_index Source Code Analysis (2)

The second part of the series: it continues the source-code reading of the llama_index RAG pipeline (data loader, transformations, indexing, and querying), analyzes its performance, and offers corresponding optimization suggestions.

Distributed Training Strategy Introduction

This post explores advanced parallelism techniques in deep learning that optimize computational efficiency and memory usage across GPUs, covering Data Parallelism (DP), the Zero Redundancy Optimizer (ZeRO), Pipeline Parallelism (PP), and Tensor Parallelism (TP).
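To ground the simplest of these strategies, here is a minimal Data Parallelism sketch using PyTorch's DistributedDataParallel; the model, batch, and hyperparameters are hypothetical placeholders, not taken from the post:

```python
# Minimal DP sketch: each rank holds a full model replica; gradients are
# all-reduced across ranks during backward. Launch one process per GPU
# (e.g. with torchrun or torch.multiprocessing.spawn).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    model = torch.nn.Linear(512, 512).to(rank)   # placeholder model
    model = DDP(model, device_ids=[rank])        # wrap for gradient synchronization
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(32, 512, device=rank)        # dummy local batch
    loss = model(x).square().mean()
    loss.backward()                              # gradients all-reduced here
    opt.step()
    dist.destroy_process_group()
```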

ProxyLab

This post presents a structured approach to building a web proxy that supports multi-threaded handling of concurrent requests and implements an in-memory cache with a Least Recently Used (LRU) eviction policy.
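The lab itself is written in C, but the LRU eviction idea it implements can be sketched in a few lines of Python (a hypothetical illustration, not the lab's code):

```python
# LRU cache sketch: an OrderedDict keeps keys in recency order, so eviction
# is just popping the oldest entry once capacity is exceeded.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)         # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry
```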

Demystifying Dtype Promotion in PyTorch

This article takes a close look at dtype promotion in PyTorch, explaining how different data types are handled during tensor operations. It covers the fundamental rules of dtype promotion, how scalar values participate in tensor operations, and the role of TensorIterator in computing dtypes.
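A few lines make these rules concrete; the tensors here are illustrative, and `torch.result_type` is the public helper that exposes the computed dtype:

```python
import torch

a = torch.tensor([1, 2], dtype=torch.int64)
b = torch.tensor([1.0, 2.0], dtype=torch.float32)
print((a + b).dtype)            # torch.float32: floating point wins over integer

# Python scalars participate with a "weak" dtype and do not widen the result:
c = torch.tensor([1.0], dtype=torch.float16)
print((c + 2.5).dtype)          # torch.float16: the scalar does not force float32

print(torch.result_type(a, b))  # torch.float32: promotion computed directly
```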

Exploring Structured Kernel and Tensor Iterator in PyTorch

This article provides an in-depth examination of the Structured Kernel and TensorIterator in PyTorch, key components for optimizing tensor operations. We delve into the implementation details, including op declaration, the meta and impl steps of a Structured Kernel, and the construction and computation processes in TensorIterator.
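While this machinery lives in C++, the effect of the meta step is observable from Python: output shape and dtype are derived and validated before the implementation runs. A small illustrative example (the tensors are placeholders):

```python
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)

out = torch.empty(3, 4)
torch.add(a, b, out=out)      # meta step derives the broadcast shape (3, 4) and checks `out`

bad = torch.empty(3, 4, dtype=torch.int64)
try:
    torch.add(a, b, out=bad)  # rejected by the dtype check before the impl step runs
except RuntimeError as e:
    print(e)                  # "result type Float can't be cast to ... Long"
```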

PyTorch CUDA Streams Introduction

This article explores the basic concepts of CUDA streams, parallel execution, and multi-GPU synchronization strategies. We analyze the advantages of using multiple CUDA streams, how to enforce task ordering with CUDA events, and how to use streams to optimize program performance.
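A minimal sketch of the pattern the article describes: two streams with an event enforcing ordering between them (sizes and operations are placeholders, and a CUDA device is assumed):

```python
import torch

assert torch.cuda.is_available()
s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
done = torch.cuda.Event()

a = torch.randn(4096, 4096, device="cuda")
with torch.cuda.stream(s1):
    b = a @ a            # matmul enqueued on stream s1
    done.record(s1)      # event marks completion of s1's work
with torch.cuda.stream(s2):
    s2.wait_event(done)  # s2's kernels start only after the event fires
    c = b + 1            # safe: b is fully computed by now
torch.cuda.synchronize() # block the host until all streams finish
```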