Publications

*: indicating equal contribution or alphabetical ordering.

2024
- Why Transformers Need Adam: A Hessian Perspective
- Unlocking Black-Box Prompt Tuning Efficiency via Zeroth-Order Optimization
- Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity
- Sensing Jamming Strategy from Limited Observations: An Imitation Learning Perspective
- Adam-mini: Use Fewer Learning Rates To Gain More
- BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
- On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
- ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
- When is RL better than DPO in RLHF? A Representation and Optimization Perspective

2023
- Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
- Provably Efficient Adversarial Imitation Learning with Unknown Transitions
- Deploying Offline Reinforcement Learning with Human Feedback

2022
- Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis
- Rethinking ValueDice: Does It Really Improve Performance?
- A Note on Target Q-learning for Solving Finite MDPs with A Generative Oracle
- HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning

2021
- A Concise Introduction to Imitation Learning (in Chinese)
- Error Bounds of Imitating Policies and Environments for Reinforcement Learning

2020
- Error Bounds of Imitating Policies and Environments
- Efficient Exploration by Novelty-pursuit
- Self-Guided Evolution Strategies with Historical Estimated Gradients
- Solving The Inverse Design Problem of Electrical Fuse with Machine Learning

2019
- On Value Discrepancy of Imitation Learning