News
- [2026.02] One paper accepted by CVPR 2026
- [2025.09] Two papers accepted by NeurIPS 2025
- [2025.06] I graduated from Peking University
- [2025.01] Three papers accepted by ICLR 2025
- [2024.03] One paper accepted by CVPR 2024, one paper accepted by CVPR Workshop
Education
Fine-Grained Understanding
As (Co)First-Author
- PAM: Perceive Anything - Recognize, Explain, Caption, and Segment Anything in Images and Videos
Comprehensive region-level understanding: segment, recognize, explain, and caption in images and videos.
NeurIPS 2025
- MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Renrui Zhang*, Xinyu Wei*, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li
The first specialized LMM for multimodal mathematical problem-solving (CLIP-Math + CoT SFT + DPO).
ICLR 2025
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Enabling MLLMs to interpret visual prompts (points, boxes, shapes) for fine-grained image comprehension.
ICLR 2025
Fine-Grained Generation
As (Co)First-Author
- MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition
A comprehensive dataset for multi-image composition with identity consistency across 7 representative tasks.
CVPR 2026
- TIIF-Bench: How Does Your T2I Model Follow Your Instructions?
Benchmark to systematically assess modern T2I models' ability in following intricate textual instructions.
Under Review
- VideoVerse: How Far is Your T2V Generator from a World Model?
Benchmark evaluating modern T2V models on temporal causality and world knowledge for building world models.
Under Review
- PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin, Xinyu Wei, Renrui Zhang, Le Zhuo, Shitian Zhao, Siyuan Huang, Huan Teng, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li
Unified image-to-image assistant for generation, manipulation, and translation via free-form language instructions.
ICLR 2025
Other Publications
- GENIUS: Generative Fluid Intelligence Evaluation Suite
Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
Benchmark for evaluating generative fluid intelligence: inducing patterns, executing constraints, and adapting to novel scenarios.
Under Review
- UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing
Hongyang Wei, Bin Wen, Yancheng Long, Yankai Yang, Yuhang Hu, Tianke Zhang, Wei Chen, Haonan Fan, Kaiyu Jiang, Jiankang Chen, Changyi Liu, Kaiyu Tang, Haojie Ding, Xiao Yang, Jia Sun, Huaiqing Wang, Zhenyu Yang, Xinyu Wei, Xianglong He, Yangguang Li, Fan Yang, Tingting Gao, Lei Zhang, Guorui Zhou, Han Li
Unified framework for single-image editing and multi-image composition with scalable and consistent multi-reference inputs.
Under Review
- Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
First comprehensive study of DPO vs. GRPO in autoregressive image generation with CoT reasoning.
NeurIPS 2025
- Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang
Enhancing compressed device-deployed MLLMs via cloud collaboration and adapter-based knowledge distillation.
CVPR 2024
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Tuning-free personalization for text-to-image models from a single face image with identity preservation.
CVPR 2024
Hobbies
Photography, Bodybuilding, Movies, Basketball, Video Games, Snorkeling
I read history and philosophy
I travel all around the world
Service
- Reviewer for ICLR, CVPR, ECCV
- Cluster & API Quota Administrator, VCLab