News
[2026.06] LocateAnything surpassed 100K Hugging Face downloads within one week of release and topped the Hugging Face model trends. Give it a try!
[2026.02] One paper accepted by CVPR 2026
[2025.09] Two papers accepted by NeurIPS 2025
[2025.06] I graduated from Peking University
[2025.01] Three papers accepted by ICLR 2025
[2024.03] One paper accepted by CVPR 2024 and one paper accepted by a CVPR 2024 Workshop
Education
Fine-Grained Understanding
As (Co-)First Author
PAM: Perceive Anything - Recognize, Explain, Caption, and Segment Anything in Images and Videos
Region-level fine-grained understanding with arbitrary kinds of visual prompts: segment, recognize, explain, and caption in images and videos.
NeurIPS 2025
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Renrui Zhang*, Xinyu Wei*, Dongzhi Jiang, Ziyu Guo, Shicheng Li, Yichi Zhang, Chengzhuo Tong, Jiaming Liu, Aojun Zhou, Bin Wei, Shanghang Zhang, Peng Gao, Chunyuan Li, Hongsheng Li
The first specialized VLM for multimodal math problem-solving (CLIP-Math + CoT SFT + DPO), with automatic focus on key regions in mathematical figures.
ICLR 2025
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Enabling MLLMs to interpret visual prompts (points, boxes, shapes) for fine-grained image comprehension.
ICLR 2025
Fine-Grained Generation
As (Co-)First Author
MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition
ID-consistent multi-reference image generation, which demands fine-grained reference understanding: a dataset, a benchmark, and a strong baseline model, Qwen-MICo.
CVPR 2026
TIIF-Bench: How Does Your T2I Model Follow Your Instructions?
Benchmark for systematically assessing modern T2I models' ability to follow intricate textual instructions.
Under Review
VideoVerse: How Far is Your T2V Generator from a World Model?
Benchmark evaluating modern T2V models on temporal causality and world knowledge.
Under Review
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin*, Xinyu Wei*, Renrui Zhang*, Le Zhuo, Shitian Zhao, Siyuan Huang, Huan Teng, Junlin Xie, Yu Qiao, Peng Gao, Hongsheng Li
Unified image-to-image assistant for generation, manipulation, and translation via free-form language instructions.
ICLR 2025
Other Publications
GENIUS: Generative Fluid Intelligence Evaluation Suite
Ruichuan An, Sihan Yang, Ziyu Guo, Wei Dai, Zijun Shen, Haodong Li, Renrui Zhang, Xinyu Wei, Guopeng Li, Wenshan Wu, Wentao Zhang
Benchmark for evaluating generative fluid intelligence: inducing patterns, executing constraints, and adapting to novel scenarios.
Under Review
UniRef-Image-Edit: Towards Scalable and Consistent Multi-Reference Image Editing
Hongyang Wei, Bin Wen, Yancheng Long, Yankai Yang, Yuhang Hu, Tianke Zhang, Wei Chen, Haonan Fan, Kaiyu Jiang, Jiankang Chen, Changyi Liu, Kaiyu Tang, Haojie Ding, Xiao Yang, Jia Sun, Huaiqing Wang, Zhenyu Yang, Xinyu Wei, Xianglong He, Yangguang Li, Fan Yang, Tingting Gao, Lei Zhang, Guorui Zhou, Han Li
Unified framework for single-image editing and multi-image composition with scalable and consistent multi-reference inputs.
Under Review
Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO
First comprehensive study of DPO vs. GRPO in autoregressive image generation with CoT reasoning.
NeurIPS 2025
Cloud-Device Collaborative Learning for Multimodal Large Language Models
Guanqun Wang, Jiaming Liu, Chenxuan Li, Junpeng Ma, Yuan Zhang, Xinyu Wei, Kevin Zhang, Maurice Chong, Ray Zhang, Yijiang Liu, Shanghang Zhang
Enhancing compressed device-deployed MLLMs via cloud collaboration and adapter-based knowledge distillation.
CVPR 2024
IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models
Tuning-free personalization for text-to-image models from a single face image with identity preservation.
CVPR 2024 Workshop
Hobbies
Photography, Bodybuilding, Movies, Basketball, Video Games
I read history and philosophy
I travel all around the world
AI is only a brief spark in modern history; modern history is only a blink in agricultural civilization; agricultural civilization is only a thin slice of Homo sapiens' story; and Homo sapiens are only a chapter in the vast history of life. Every age is connected, every phenomenon has a cause.
Service
Reviewer for ICLR, CVPR, ECCV, NeurIPS
Cluster & API Quota Administrator, VCLab