About Me

Computer Science M.Eng. student at Cornell University with expertise in deep learning, distributed training, and backend development. Former deep learning engineer at SenseTime with a proven track record of optimizing large-scale LLMs across thousands of chips and contributing to open-source frameworks such as PyTorch. An award-winning developer and two-time startup co-founder, recognized for designing scalable AI solutions and driving innovation in machine learning.

Email: zhyanwentao@outlook.com / wy335@cornell.edu

GitHub: yewentao256

LinkedIn: Wentao Ye


Patents

  • One-iter Tool (CN117312173A), patented in 2023, reducing model accuracy validation time from hours to minutes.
  • Function-Level Task Scheduling Tool (CN115033366A), patented in 2022, streamlining distributed training workflows.

Awards

  • National Third Prize, China University Computer Contest - WeChat Big Data Challenge, 2021 - Rank 80 / 6,768 teams
  • National Third Prize, China University Computer Contest - Huawei Big Data Challenge, 2020 - Rank 27 / 1,491 teams
  • National Second Prize, China Service Outsourcing Innovation & Entrepreneurship Competition, 2020 - Top 1% / 6,417 teams
  • National First Prize, China University Computer Capability Challenge, 2019 - Runner-up / 414 teams

Experience

  • Deep Learning Engineer

    • SenseTime | SenseCore
    • Jul 2022 - Aug 2024
  • R&D Intern

    • SenseTime | Research Institute (Deep Learning Frameworks)
    • Jan 2021 - Jul 2022
  • Co-founder & CTO

    • Wuhan Hongyuan Investment & Technology Services Co., Ltd.
    • Nov 2019 - Sep 2020
  • Co-founder

    • Yuye (Wuhan) Technology Development Co., Ltd.
    • Jun 2019 - Nov 2019

Education

  • M.Eng. in Computer Science

    • Cornell University | Cornell Tech | New York, USA
    • May 2025
    • GPA: 4.0/4.0
  • Bachelor of Software Engineering (Excellent Engineer Program)

    • Wuhan University | Wuhan, China
    • Jun 2022
    • GPA: 3.91/4.0

Projects

  • May 2023 - Present
  • Optimized the CuDNN Convolution operator in PyTorch, achieving a 15% performance boost in CNN training and inference for computer vision tasks; successfully merged into the PyTorch codebase.
  • Authored a blog series with 10+ articles, providing the developer community with insights into PyTorch’s core architecture and optimizations.
  • Jan 2021 - Dec 2022
  • Rebuilt the PAVI data-collection SDK, improving upload throughput 10× through parallelized processing and significantly cutting ingestion time for large-scale datasets.
  • Integrated the proprietary PAVI Logger system into the MMCV library, enabling efficient and customizable logging for deep learning workflows, with the core system remaining private.
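The parallel-upload pattern behind that speedup can be sketched as follows; `upload` is a stand-in for the real network call, and all names here are illustrative rather than taken from the PAVI SDK.

```python
from concurrent.futures import ThreadPoolExecutor

def upload(record):
    # Placeholder for a real network upload; here we just echo the record id.
    return f"uploaded:{record}"

def upload_all(records, workers=8):
    # Fan records out across a thread pool; map preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, records))

print(upload_all(range(3)))  # ['uploaded:0', 'uploaded:1', 'uploaded:2']
```

Threads suit I/O-bound upload work; a CPU-bound pipeline would use processes instead.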
  • May 2024 - Aug 2024
  • Independently built a Retrieval-Augmented Generation (RAG) system in LazyLLM around a specialized tree architecture; more efficient parent/child node retrieval improved query performance by 50% over LlamaIndex and reduced response times for large language models.
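The tree-retrieval idea can be sketched as storing direct parent pointers and child lists on each node, so moving between a retrieved chunk and its surrounding context is pointer-chasing rather than an index scan; the class and names below are illustrative, not from the LazyLLM codebase.

```python
class Node:
    def __init__(self, text, parent=None):
        self.text = text
        self.parent = parent          # direct pointer: O(1) parent lookup
        self.children = []            # direct list: O(1) child access
        if parent is not None:
            parent.children.append(self)

def ancestors(node):
    # Walk up from a retrieved leaf to gather wider context.
    out = []
    while node.parent is not None:
        node = node.parent
        out.append(node.text)
    return out

root = Node("document")
sec = Node("section", parent=root)
leaf = Node("chunk", parent=sec)
print(ancestors(leaf))  # ['section', 'document']
```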
  • May 2023 - May 2024
  • Designed and implemented the Op Inferrer, bypassing PyTorch’s TensorIterator to increase the inference speed of binary, unary, and reduction operators by 5% across 40+ models, including large language models (LLMs).
  • Identified and resolved CUDA performance bottlenecks in DeepLink and DIOPI, delivering an average 20% speedup across 30+ models; the improvements let ResNet50 surpass PyTorch's benchmark performance on high-demand tasks.
  • Apr 2023 - May 2024
  • Developed 30+ machine learning operators in DIOPI and implemented multi-chip adaptations for CUDA, Cambricon, and Ascend architectures, improving cross-platform compatibility and reducing integration time for large-scale systems.
  • Nov 2024 - Jan 2025
  • Developed a lightweight GAN (generative adversarial network) for large-area image completion and cross-scene stitching, achieving realistic outputs on a single RTX 2070 GPU.
  • Implemented an end-to-end training pipeline with efficient data preprocessing, masking strategies, and evaluation, completing model training within hours.
  • Jun 2023 - Aug 2024
  • Developed a minimalistic deep learning framework inspired by PyTorch, implementing core functionalities such as AutoGrad, dynamic computation graphs, and tensor operations.
  • Designed to be lightweight and modular, making it ideal for educational purposes, with extensive examples to facilitate learning.
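The AutoGrad core such a framework implements can be sketched in a few dozen lines; this is a simplified illustration, not the project's actual implementation, and a real framework would sort the graph topologically before the backward sweep.

```python
class Tensor:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # propagates this node's grad to its parents

    def __mul__(self, other):
        out = Tensor(self.value * other.value, (self, other))
        def _backward(grad):
            self.grad += grad * other.value   # d(x*y)/dx = y
            other.grad += grad * self.value   # d(x*y)/dy = x
        out._backward_fn = _backward
        return out

    def __add__(self, other):
        out = Tensor(self.value + other.value, (self, other))
        def _backward(grad):
            self.grad += grad
            other.grad += grad
        out._backward_fn = _backward
        return out

    def backward(self):
        # Reverse-mode sweep over the dynamically recorded graph
        # (simple DFS; sufficient for this small expression).
        self.grad = 1.0
        stack = [self]
        while stack:
            node = stack.pop()
            if node._backward_fn:
                node._backward_fn(node.grad)
                stack.extend(node._parents)

x = Tensor(3.0)
y = Tensor(4.0)
z = x * y + x          # dz/dx = y + 1 = 5, dz/dy = x = 3
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```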
  • Dec 2022 - Feb 2024 (C, Assembly)
  • Self-studied the CMU CSAPP-15213 course and completed its associated labs, covering core concepts such as assembly optimization, multi-level cache, compiling and linking, exception control flow, virtual memory, and system-level I/O.
  • Blogs
  • Nov 2022 - Dec 2022 (Python)
  • Built TinyNN, a minimal implementation of Fully Connected Neural Networks and Convolutional Neural Networks, designed for educational and experimental purposes.
  • Aug 2021 (Python)
  • Independently developed this system in 7 days using computer vision algorithms; optimized for smooth performance on a single i3 CPU, earning client approval at the first review.
  • Software Copyright: “After Taking Drugs (Facial Human Morphing Experience)” (2022SR0021854).
  • Nov 2020 - Dec 2020 (C, Flex, Bison)
  • Designed and implemented an untyped programming language Sicpy and its corresponding compiler using flex and bison.
  • Developed features including lexical, syntax, and semantic analysis, as well as type inference and automatic garbage collection via reference counting, providing a complete custom language framework for functional and imperative programming experimentation.
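Reference-counting garbage collection, as used in Sicpy, can be illustrated with a small simulation: every binding increments an object's count, every release decrements it, and the object is reclaimed when the count hits zero. The names below are illustrative, not from the Sicpy runtime.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refcount = 0

def retain(obj):
    # A new binding (variable, field, parameter) takes a reference.
    obj.refcount += 1

def release(obj, freed):
    # A binding goes out of scope; at zero references the object is reclaimed.
    obj.refcount -= 1
    if obj.refcount == 0:
        freed.append(obj.name)

freed = []
a = Obj("a")
retain(a)          # bound to one variable
retain(a)          # aliased by a second binding
release(a, freed)  # first binding leaves scope
release(a, freed)  # last binding dropped -> collected
print(freed)       # ['a']
```

One known limitation of this scheme is reference cycles, which plain counting never reclaims; cycle detection or weak references are the usual remedies.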
  • Apr 2020 (C#, Unity)
  • Group project with Jifeng Wu, Jinran Tang, and Taihe Li.
