
Kaifeng Lyu

Machine Learning Theory, AI Safety/Alignment, Optimization

I am a postdoctoral research fellow at the Simons Institute for the Theory of Computing at UC Berkeley. Before that, I obtained my Ph.D. in Computer Science from Princeton University in 2024, where I was very fortunate to be advised by Prof. Sanjeev Arora. I will be joining the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University as a Tenure-Track Assistant Professor in June 2025.

I did my undergraduate studies at Tsinghua University and received a B.Eng. in Computer Science and Technology in 2019. At Tsinghua, I was a student in the Yao Class headed by Prof. Andrew Chi-Chih Yao, and I was very fortunate to be advised by Prof. Jian Li.

Research Interests

I am primarily interested in machine learning theory, AI safety/alignment, and optimization.

Recruitment for Ph.D. Students

Please see the Chinese version of this page for more information.

Conference Papers

ICML 2025

Weak-to-Strong Generalization Even in Random Feature Networks, Provably

Marko Medvedev*, Kaifeng Lyu*, Dingli Yu, Sanjeev Arora, Zhiyuan Li, Nathan Srebro

A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

Kaiyue Wen*, Xingyu Dang*, Kaifeng Lyu

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson

Oral Presentation (Top 1.8%). Outstanding Paper Award (Top 3/3827 ≈ 0.08%).

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

Binghui Li*, Zhixuan Pan*, Kaifeng Lyu, Jian Li

Efficient Stagewise Pretraining via Progressive Subnetworks

Abhishek Panigrahi*, Nikunj Saunshi*, Kaifeng Lyu, Sobhan Miryoosefi, Sashank Reddi, Satyen Kale, Sanjiv Kumar

Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias

Rui Lu*, Runzhe Wang*, Kaifeng Lyu, Xitai Jiang, Gao Huang, Mengdi Wang

Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

Kaifeng Lyu*, Haoyu Zhao*, Xinran Gu*, Dingli Yu, Anirudh Goyal, Sanjeev Arora

A Quadratic Synchronization Rule for Distributed Deep Learning

Xinran Gu*, Kaifeng Lyu*, Sanjeev Arora, Jingzhao Zhang, Longbo Huang


Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

Kaifeng Lyu*, Jikai Jin*, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu

DistillSpec: Improving Speculative Decoding via Knowledge Distillation

Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat, Aditya Krishna Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal

The Marginal Value of Momentum for Small Learning Rate SGD

Runzhe Wang, Sadhika Malladi, Tianhao Wang, Kaifeng Lyu, Zhiyuan Li

Understanding incremental learning of gradient descent: A fine-grained analysis of matrix sensing

Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee

Why (and When) does Local SGD Generalize Better than SGD?

Xinran Gu*, Kaifeng Lyu*, Longbo Huang, Sanjeev Arora

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction

Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora

On the SDEs and Scaling Rules for Adaptive Gradient Algorithms

Sadhika Malladi*, Kaifeng Lyu*, Abhishek Panigrahi, Sanjeev Arora

New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound

Arushi Gupta*, Nikunj Saunshi*, Dingli Yu*, Kaifeng Lyu, Sanjeev Arora

Oral Presentation (Top 1.9%).

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Kaifeng Lyu*, Zhiyuan Li*, Runzhe Wang*, Sanjeev Arora

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

Zhiyuan Li, Yuping Luo, Kaifeng Lyu

(Authors in alphabetical order.)

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Zhiyuan Li*, Kaifeng Lyu*, Sanjeev Arora

Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Kaifeng Lyu, Jian Li

Oral Presentation (Top 1.9%).

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu

(Authors in alphabetical order.)

Fine-grained complexity meets IP = PSPACE

Lijie Chen, Shafi Goldwasser, Kaifeng Lyu, Guy N. Rothblum, Aviad Rubinstein

(Authors in alphabetical order.)

Single-Source Bottleneck Path Algorithm Faster than Sorting for Sparse Graphs

Ran Duan, Kaifeng Lyu, Hongxun Wu, Yuanhang Xie

(Authors in alphabetical order.)

Learning gradient descent: Better generalization and longer horizons

Kaifeng Lv*, Shunhua Jiang*, Jian Li

(Authors are listed in contribution order by default; * denotes equal contribution.)

Students

Ph.D. Students:

  • Haodong Wen (incoming)

  • Kexian Tang (incoming)

Master's Student:

  • Rui Chen (incoming)

Teaching

Planned Courses

  • Fall 2025. Large Language Models from Scratch: Theory and Practice, Tsinghua University.

Teaching Assistant Experience

Service

Professional Services

  • Organizer, NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L 2024).

  • Organizer, NeurIPS 2023 Workshop on Mathematics of Modern Machine Learning (M3L 2023).

  • Conference Reviewer: ICML (2020-2025), NeurIPS (2020-2023), ICLR (2022-2025), COLT (2020, 2025), AAAI (2020), KDD (2022).

  • Journal Reviewer: TMLR, JMLR, TPAMI, AIJ.

  • Organizer, Yao Class Seminar, Tsinghua University (Fall 2019, Fall 2020, Spring 2021).

Universal Online Judge

  • In 2014, I founded the Universal Online Judge (UOJ), a popular online judge system in China.

  • UOJ supports judging both traditional and non-traditional programming problems in OI (Olympiad in Informatics), and a team of top OI players regularly hosts programming contests on it.

  • [Link] [GitHub] [Docs]

Related Information

  • Email
  • GitHub: //github.com/vfleaking
  • Google Scholar: //scholar.google.com/citations?user=843JJtgAAAAJ