Yilun Chen

I am currently a research scientist at the Embodied AI Center at Shanghai AI Laboratory. Prior to that, I did my Ph.D. in the CSE department of The Chinese University of Hong Kong, under the supervision of Prof. Jiaya Jia. My research interests primarily focus on robotic foundation models.

We are actively seeking motivated research fellows and interns with experience or interest in Robotics and 3D Vision at Shanghai AI Laboratory. If you are interested in robotic foundation models and 3D Vision, please contact me via email.

Email / LinkedIn / Google Scholar / GitHub



Timeline

  • [Nov 2025] CronusVLA is accepted by AAAI 2026 (Oral).
  • [Oct 2025] I am co-organizing the Workshop and Challenge on Multimodal Robot Learning in Physical Worlds. Welcome to join the Manipulation (built on GenManip) and Navigation (built on VLN-PE) tracks.
  • [Sept 2025] We're excited to release InternVLA-M1, a spatially guided vision-language-action framework for generalist robot policy!
  • [March 2025] GenManip and RoboGround were accepted by CVPR 2025.
  • [Oct 2024] Our paper PointLLM is an ECCV 2024 Best Paper Candidate!👏 Check out the Demo!
  • [Sep 2024] Three papers are accepted by NeurIPS 2024 and one paper is accepted by CoRL 2024.
  • [July 2024] One paper is accepted by ECCV 2024.
  • [Aug 2023] Code for FocalFormer3D is released!
  • [July 2023] Our paper FocalFormer3D is accepted by ICCV 2023.
  • [Mar 2023] FocalFormer3D ranks 1st on the nuScenes LiDAR 3D Detection and 3D Tracking leaderboards.
  • [Sep 2022] One paper is accepted by NeurIPS 2022.
  • [Aug 2022] DSGN++ is accepted by T-PAMI 2022 and code is available.
  • [March 2022] Two papers are accepted by CVPR 2022.
  • [April 2020] Code for DSGN is released!
  • [March 2020] DSGN is accepted by CVPR 2020.
  • [June 2019] Fast Point R-CNN is accepted by ICCV 2019.
  • [February 2018] Our CPN is accepted by CVPR 2018.
  • [Oct 2017] Won 1st Place in COCO 2017 Keypoint Challenge.


Selected Publications
InternVLA-M1

InternVLA-M1: A Spatially Grounded Foundation Framework for Generalist Robot Policy
InternVLA-M1 Team
Technical Report, 2025
Dominated Hugging Face Robotics Trending with 6 of the top 8 models. (Sept. 2025)
[Project Page] [Paper] [Code]

CronusVLA

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Hao Li*, Shuai Yang*, Yilun Chen†, Xinyi Chen, Xiaoda Yang, Yang Tian, Hanqing Wang, Tai Wang, Dahua Lin, Feng Zhao, Jiangmiao Pang†
AAAI 2026 (Oral)
[Code] [Paper] [Simpler-OR Benchmark]

InstructVLA

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
Shuai Yang*, Hao Li*, Bin Wang, Yilun Chen†, Yang Tian, Tai Wang, Hanqing Wang, Feng Zhao, Yiyi Liao, Jiangmiao Pang†
arXiv Preprint, 2025
[Code] [Paper] [Simpler-Instruct Benchmark]

GenManip

GenManip: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao*, Yilun Chen*, Shuai Yang*, Xinyi Chen*, Yang Tian, Hao Li, Haifeng Huang, Hanqing Wang, Tai Wang, Jiangmiao Pang
CVPR 2025
[Code] [Project Page] [Paper]

Chat-Scene

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
Haifeng Huang*, Yilun Chen*, Zehan Wang*, Rongjie Huang, Runsen Xu, Tai Wang, Yang Zhao, Jiangmiao Pang, Zhou Zhao
NeurIPS 2024
Ranked 1st place on the ScanRefer benchmark. (Sept. 2024)
Ranked 1st place on the Scan2Cap benchmark. (Sept. 2024)
[Code] [Paper]

PointLLM

PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
ECCV 2024 (Oral)
Best Paper Award Candidate
[Paper] [Code] [Project Page]

FocalFormer3D

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Anima Anandkumar, Jiaya Jia, Jose M. Alvarez
ICCV 2023
Ranked 1st place in nuScenes LiDAR 3D detection leaderboard. (Mar. 2023)
Ranked 1st place in nuScenes LiDAR 3D tracking leaderboard. (Mar. 2023)
[PDF] [Code]

DSGN++

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia
T-PAMI 2022
Ranked 1st place among all camera-based approaches on KITTI 3D detection leaderboard (All categories, Nov. 2021).
Its multi-modal variant [VoCo] ranked 1st place among all approaches on KITTI 3D detection leaderboard (Car, May 2022).
[PDF] [Code]

UVTR

Unifying Voxel-based Representation with Transformer for 3D Object Detection
Yanwei Li, Yilun Chen, Xiaojuan Qi, Zeming Li, Jian Sun, Jiaya Jia
NeurIPS 2022
[PDF] [Code]

Multi-View Transformer for 3D Visual Grounding
Shijia Huang, Yilun Chen, Jiaya Jia, Liwei Wang
CVPR 2022
[PDF] [Code]

Efficient Neural Radiance Fields
Tao Hu, Shu Liu, Yilun Chen, Tiancheng Shen, Jiaya Jia
CVPR 2022
[PDF] [Code]

DSGN

DSGN: Deep Stereo Geometry Network for 3D Object Detection
Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia
CVPR 2020
Ranked 1st place among all camera-based approaches on KITTI 3D detection leaderboard (All categories, Nov. 2019).
[PDF] [Project Page] [Code]

Fast Point R-CNN

Fast Point R-CNN
Yilun Chen, Shu Liu, Xiaoyong Shen, Jiaya Jia
ICCV 2019
[PDF] [project page] [bibtex]

CPN

Cascaded Pyramid Network for Multi-Person Pose Estimation
Yilun Chen*, Zhicheng Wang*, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
CVPR 2018
Champion of MS-COCO 2017 Keypoint Detection Challenge
Ranked 1st place on COCO Keypoint detection leaderboard (Oct. 2017).
[PDF] [project page] [Code] [bibtex]

R-FCN++

R-FCN++: Towards Accurate Region-based Fully Convolutional Networks for Object Detection
Zeming Li, Yilun Chen, Gang Yu, Yangdong Deng
AAAI 2018 (Oral)
[PDF] [bibtex]

Experience
March 2023 - Now, Shanghai AI Lab
Collaborator: Jiangmiao Pang
June 2022 - Feb. 2023, NVIDIA Research
Intern Mentor: Zhiding Yu, Jose M. Alvarez
Mar. 2020 - June 2022, SmartMore Inc.
Intern Mentor: Shu Liu
Mar. 2018 - Jan. 2020, Tencent Youtu Lab
Intern Mentor: Shu Liu
Nov. 2016 - Nov. 2017, Megvii Face++
Intern Mentor: Gang Yu


Education
Aug. 2018 - Present, The Chinese University of Hong Kong

Ph.D. Student, Computer Science & Engineering

Aug. 2013 - July 2017, Beihang University

Bachelor's Degree, Computer Science & Engineering

Service
  • Conference Reviewer: CVPR, ECCV, ICCV, ICLR, NeurIPS, ICML, CoRL, IROS, ICRA

  • Journal Reviewer: T-PAMI, IJCV, RA-L

  • Teaching: CSCI3310, CSCI3180, CSCI1120, ENGG1100