About Me
Contents
Computer Science M.Eng. student at Cornell University with expertise in deep learning, distributed training, and backend development. Former deep learning engineer at SenseTime, with a proven track record of optimizing large-scale LLMs across thousands of chips and contributing to open-source frameworks such as PyTorch. An award-winning developer and two-time startup co-founder, recognized for designing scalable AI solutions and driving innovation in machine learning..
Email: zhyanwentao@outlook.com
/ wy335@cornell.edu
Github: yewentao256
LinkedIn: Wentao Ye
1. Patents
- One-iter Tool (CN117312173A), patented in 2023, reducing model accuracy validation time from hours to minutes.
- Function-Level Task Scheduling Tool (CN115033366A), patented in 2022, streamlining distributed training workflows.
2. Awards
- National Third Prize, China University Computer Contest - WeChat Big Data Challenge, 2021 - Rank 80 / 6,768 teams
- National Third Prize, China University Computer Contest - Huawei Big Data Challenge, 2020 - Rank 27 / 1,491 teams
- National Second Prize, China Service Outsourcing Innovation & Entrepreneurship Competition, 2020 - Top 1% / 6417 teams
- National First Prize, China University Computer Capability Challenge, 2019 - Runner-up / 414 teams
3. Experience
Deep Learning Engineer
- SenseTime | SenseCore
- Jul 2022 - Aug 2024
R&D Intern
- SenseTime | Research Institute (Deep Learning Frameworks)
- Jan 2021 - Jul 2022
Co-founder & CTO
- Wuhan Hongyuan Investment & Technology Services Co., Ltd.
- Nov 2019 - Sep 2020
Co-founder
- Yuye (Wuhan) Technology Development Co., Ltd.
- Jun 2019 - Nov 2019
4. Education
Master of Computer Science
- Cornell University | Cornell Tech | New York, USA
- May 2025
- GPA: 4.0/4.0
Bachelor of Software Engineering (Excellent Engineer Program)
- Wuhan UniversityChina | Wuhan, China
- Jun 2022
- GPA: 3.91/4.0
5. OpenSource Projects
Feature Developer
PyTorch
- May 2023 - Present
- Optimized the CuDNN Convolution operator in PyTorch, achieving a 15% performance boost in CNN training and inference for computer vision tasks; successfully merged into the PyTorch codebase.
- Authored a blog series with 10+ articles, providing the developer community with insights into PyTorch’s core architecture and optimizations.
MMCV & PAVI Logger
- Jan 2021 - Dec 2022
- Rebuilt the PAVI data collection SDK, achieving a 10× improvement in data upload efficiency through optimized parallel processing, significantly reducing ingestion time and enhancing performance for large-scale datasets.
- Integrated the proprietary PAVI Logger system into the MMCV library, enabling efficient and customizable logging for deep learning workflows, with the core system remaining private.
Main Contributor
LazyLLM
- May 2024 - Aug 2024
- Independently built a Retrieval-Augmented Generation (RAG) system in LazyLLM with a specialized tree architecture, which improved query performance by 50% over LlamaIndex by enhancing the efficiency of parent/child node retrieval, optimizing response times for large language models.
DeepLink
- May 2023 - May 2024
- Designed and implemented the Op Inferrer, bypassing PyTorch’s TensorIterator to increase the inference speed of binary, unary, and reduction operators by 5% across 40+ models, including large language models (LLMs).
- Identified and resolved CUDA performance bottlenecks by optimizing implementations within DeepLink and DIOPI, achieving an average 20% performance improvement across 30+ models; enhanced computational efficiency allowed ResNet50 to surpass PyTorch’s benchmark performance, providing significant speedups for high-demand tasks.
DIOPI
- Apr 2023 - May 2024
- Developed 30+ machine learning operators in DIOPI, enabling advanced functionalities across diverse hardware; implemented multi-chip adaptations to support CUDA, Cambricon, and Ascend architectures, enhancing cross-platform compatibility, reducing integration time and enhancing operational efficiency for large-scale systems.
Owned Projects
GAN-Paint
- Nov 2024 - Jan 2025
- Developed a lightweight GAN (generative adversarial network) for large-area image completion and cross-scene stitching, achieving realistic outputs on a single RTX 2070 GPU.
- Implemented an end-to-end training pipeline with efficient data preprocessing, masking strategies, and evaluation, completing model training within hours.
MicroTorch
- Jun 2023 - Aug 2024
- Developed a minimalistic deep learning framework inspired by PyTorch, implementing core functionalities such as AutoGrad, dynamic computation graphs, and tensor operations.
- Designed to be lightweight and modular, making it ideal for educational purposes, with extensive examples to facilitate learning.
CMU CSAPP
- Dec 2022 - Feb 2024 (C, Assembly)
- Self-studied the CMU CSAPP-15213 course and completed its associated labs, covering core concepts such as assembly optimization, multi-level cache, compiling and linking, exception control flow, virtual memory, and system-level I/O.
- Blogs
TinyNN
- Nov 2022 - Dec 2022 (Python)
- Built TinyNN, a minimal implementation of Fully Connected Neural Networks and Convolutional Neural Networks, designed for educational and experimental purposes.
You After Taking Drugs
- Aug 2021 (Python)
- Independently developed this system in 7 days using computer vision algorithms; optimized for smooth performance on a single i3 CPU, ensuring a seamless user experience and earning client approval in the first review.
- Software Copyright: “After Taking Drugs (Facial Human Morphing Experience)” (2022SR0021854).
Sicpy Compiler
- Nov 2020 - Dec 2020 (C, Flex, Bison)
- Designed and implemented an untyped programming language Sicpy and its corresponding compiler using flex and bison.
- Developed features including lexical, syntax, and semantic analysis, as well as type inference and automatic garbage collection via reference counting, providing a complete custom language framework for functional and imperative programming experimentation.
New Super Mario
- Apr 2020 (C#, unity)
- Group project with Jifeng Wu, Jinran Tang and Taihe Li.