Ziniu Li
About me

I am a Ph.D. student at The Chinese University of Hong Kong, Shenzhen (CUHKSZ), advised by Prof. Zhi-Quan (Tom) Luo. I am interested in artificial intelligence, especially reinforcement learning and large language models. I have worked or interned at Tencent, Nanjing University, Cardinal Operations, and elsewhere. Feel free to contact me if you would like to discuss ideas.

Research Statement

My research focuses on designing adaptive and scalable machine learning algorithms and analyzing their theoretical guarantees. In the field of large language models, my work spans several key areas: data selection (NeurIPS 2023, Spotlight), diversity-preserving supervised fine-tuning (NeurIPS 2024 FITML Workshop, Oral), computationally efficient RLHF (ICML 2024), and hallucination mitigation (NeurIPS 2024 AFM Workshop). In the field of imitation learning, I am interested in the theory of sample complexity (NeurIPS 2020, TPAMI 2021, UAI 2023, Oral), as well as applications in robotics (ICLR 2024 Blog) and signal processing (TSP 2024). I also work on optimization-centric topics with other researchers, including understanding Adam in training Transformers (NeurIPS 2024), memory-efficient optimizers (ICML 2024 ES-FoMo Workshop), zeroth-order optimization (IJCAI 2020), and prompt tuning (EMNLP 2024).

Recent Highlights

*: indicates equal contribution or alphabetical ordering.

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
TL;DR: This work shows that PPO overshoots for RLHF in LLMs and introduces ReMax, which requires half the memory of PPO and runs twice as fast (a minimal sketch of the idea appears at the end of this page).

When is RL better than DPO in RLHF? A Representation and Optimization Perspective
TL;DR: This work analyzes reward-modeling quality from the perspective of representations and the sources of optimization error.

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
TL;DR: This work validates that importance sampling is effective for data selection when leveraging multiple imperfect (out-of-distribution and low-quality) data sources (see the sketch at the end of this page).

Service

Reviewer: NeurIPS (Top Reviewer), ICML (Outstanding Reviewer), ICLR (Highlighted Reviewer).
Teaching Assistant
Lecturer
Award
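
Notes

As referenced in Recent Highlights above, the sketch below illustrates the core idea behind ReMax: replace PPO's learned value model with the reward of the greedy (argmax) response, used as a baseline in a REINFORCE-style gradient estimator. This is a minimal sketch, not the authors' implementation; the function name `remax_loss` and the tensor shapes are my own illustrative choices.

```python
import torch

def remax_loss(token_log_probs: torch.Tensor,
               sampled_reward: float,
               greedy_reward: float) -> torch.Tensor:
    """REINFORCE-style loss with a greedy baseline (ReMax idea, sketched).

    token_log_probs: (T,) log-probabilities of the sampled response tokens
                     under the current policy (gradients flow through these).
    sampled_reward:  scalar reward of the sampled response.
    greedy_reward:   scalar reward of the greedy (argmax) response; acting
                     as a baseline, it replaces PPO's learned value model,
                     which is where the memory savings come from.
    """
    advantage = sampled_reward - greedy_reward  # constant w.r.t. the policy
    return -(advantage * token_log_probs.sum())

# Toy usage with made-up numbers standing in for model outputs.
log_probs = torch.tensor([-1.2, -0.7, -2.3], requires_grad=True)
loss = remax_loss(log_probs, sampled_reward=0.8, greedy_reward=0.5)
loss.backward()  # pushes up the probability of the above-baseline response
```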
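
Similarly, for the imitation-learning-from-imperfection result above, the sketch below shows one way importance sampling can reweight imperfect data: each sample from a supplementary source is weighted by an estimated density ratio before a standard behavior-cloning update. This is a sketch under my own assumptions (the name `importance_weighted_bc_loss`, discrete actions, and externally supplied weights); the paper's exact estimator may differ.

```python
import torch
import torch.nn.functional as F

def importance_weighted_bc_loss(policy_logits: torch.Tensor,
                                actions: torch.Tensor,
                                weights: torch.Tensor) -> torch.Tensor:
    """Behavior cloning on imperfect data, reweighted by importance weights.

    policy_logits: (N, A) action logits from the policy being trained.
    actions:       (N,)   actions observed in the (possibly out-of-distribution,
                          low-quality) data sources.
    weights:       (N,)   nonnegative importance weights, e.g. estimated
                          density ratios between expert and source data.
    """
    nll = F.cross_entropy(policy_logits, actions, reduction="none")  # (N,)
    return (weights * nll).mean()  # downweights out-of-distribution samples

# Toy usage: three samples, four discrete actions.
logits = torch.randn(3, 4, requires_grad=True)
acts = torch.tensor([0, 2, 1])
w = torch.tensor([1.0, 0.2, 0.6])  # low weight for a suspect sample
importance_weighted_bc_loss(logits, acts, w).backward()
```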