This document presents a structured approach to building a web proxy that supports multi-threading for concurrent request handling and implements an in-memory cache with a Least Recently Used (LRU) eviction policy.
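As a minimal sketch of the caching idea only (not the proxy implementation itself, and written in Python purely for illustration), an LRU cache can be built on an ordered dictionary that evicts the least recently used entry once capacity is exceeded:

```python
from collections import OrderedDict

class LRUCache:
    """Tiny illustrative LRU cache: most recently used keys live at the end."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)           # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)    # evict the least recently used entry
```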
This article offers an insightful look into dtype promotion in PyTorch, explaining how different data types are handled during tensor operations. It covers the fundamental rules of dtype promotion, the specifics of how scalar values are integrated into tensor operations, and the role of TensorIterator in computing dtypes.
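For a quick taste of the promotion rules (a sketch, assuming a recent PyTorch build), integer tensors are promoted when combined with floating-point tensors, and Python scalars participate with a "weaker" dtype than tensors:

```python
import torch

a = torch.ones(3, dtype=torch.int32)
b = torch.ones(3, dtype=torch.float64)

print((a + b).dtype)            # torch.float64: int32 is promoted to the float dtype
print((a + 1.5).dtype)          # torch.float32: a Python float scalar promotes to the default float dtype
print(torch.result_type(a, b))  # torch.float64: query the promotion rule without running the op
```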
This article provides an in-depth examination of the Structured Kernel and TensorIterator in PyTorch, key components for optimizing tensor operations. We will delve into the implementation aspects, including op declaration, the meta and impl steps in a Structured Kernel, and the construction and computation processes in TensorIterator.
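As a rough Python-level way to observe the meta step in isolation (a sketch, not the C++ code the article walks through), tensors on the "meta" device carry only shape and dtype, so running an op on them exercises the shape/dtype inference while skipping the impl computation:

```python
import torch

# "meta" tensors hold metadata only: no real storage, no kernel launch.
a = torch.empty(3, 1, dtype=torch.int32, device="meta")
b = torch.empty(1, 4, dtype=torch.float32, device="meta")

out = torch.add(a, b)                    # only shape/dtype inference runs
print(out.shape, out.dtype, out.device)  # torch.Size([3, 4]) torch.float32 meta
```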
PyTorch Compiler Introduction
This article explores the basic concepts of CUDA streams, parallel execution, and multi-GPU synchronization strategies. We analyze the advantages of using multiple CUDA streams and how to ensure task synchronization through CUDA events, using streams to optimize program performance.
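As a small sketch of the pattern (assuming a CUDA-capable device and using PyTorch's stream/event wrappers rather than the raw CUDA API), two streams can run independent work concurrently while an event enforces ordering between them:

```python
import torch

s1, s2 = torch.cuda.Stream(), torch.cuda.Stream()
done = torch.cuda.Event()

with torch.cuda.stream(s1):
    a = torch.randn(1024, 1024, device="cuda") @ torch.randn(1024, 1024, device="cuda")
    done.record(s1)              # mark completion of the work queued on s1

with torch.cuda.stream(s2):
    s2.wait_event(done)          # s2 will not start until the event on s1 fires
    b = a * 2                    # safe: ordered after the matmul on s1

torch.cuda.synchronize()         # wait for all streams before reading results on the host
```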
This document provides a comprehensive overview of distributed training capabilities within PyTorch. Covering the core components of torch.distributed, it delves into Distributed Data-Parallel Training (DDP), RPC-Based Distributed Training, and Collective Communication (c10d).
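A minimal DDP sketch (assuming a torchrun launch that sets RANK/WORLD_SIZE/LOCAL_RANK, and NCCL-capable GPUs) looks roughly like this:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")          # join the default process group (c10d)
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

    out = ddp_model(torch.randn(4, 10, device=f"cuda:{local_rank}"))
    out.sum().backward()                             # backward triggers the collective communication
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Such a script would typically be launched with something like torchrun --nproc_per_node=<num_gpus> script.py, which spawns one process per GPU.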
In malloc lab, we will implement our own versions of malloc, free, and realloc.
This article introduces the implementation details of PyTorch's broadcast mechanism, including both the forward and backward computations.
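As a small example of the behavior in question (a Python-level sketch, not the internal implementation), broadcasting expands shapes in the forward pass, and the backward pass sums gradients back over the broadcast dimensions:

```python
import torch

x = torch.randn(3, 1, requires_grad=True)   # shape (3, 1)
y = torch.randn(1, 4, requires_grad=True)   # shape (1, 4)

z = x + y            # forward: broadcast to shape (3, 4)
z.sum().backward()   # backward: gradients are summed over the broadcast dimensions

print(x.grad.shape)  # torch.Size([3, 1])
print(y.grad.shape)  # torch.Size([1, 4])
```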
This article dissects PyTorch’s C++ core to uncover the mechanics of tensor indexing and assignment. From translating Python indices to C++ TensorIndex to the nuances of handleDimInMultiDimIndexing, we explore both basic and advanced tensor operations.
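For context, these are the Python-level operations whose C++ translation the article examines (a sketch; the specific values are only for illustration):

```python
import torch

t = torch.arange(12).reshape(3, 4)

print(t[1, 2])     # basic integer indexing: selects a single element
print(t[:, 1:3])   # slicing: returns a view that shares storage with t
print(t[t > 5])    # advanced (boolean) indexing: returns a copy
t[0] = -1          # index assignment writes through to t's storage
```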
This article introduces the implementation details of PyTorch's autograd mechanism.
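A tiny illustration of the mechanism (a sketch, not the article's internals walkthrough): the forward pass records a graph of operations, and backward() traverses it in reverse to accumulate gradients:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3 + 2 * x   # forward pass builds the autograd graph
y.backward()         # backward pass traverses the graph in reverse

print(x.grad)        # dy/dx = 3*x**2 + 2 = 14.0
```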