Research
UrbanVLA: A Vision-Language-Action Model for Urban Micromobility
Anqi Li*, Zhiyong Wang*, Jiazhao Zhang*, Minghan Li,
Yunpeng Qi, Zhibo Chen, Zhizheng Zhang†, He Wang†
arXiv preprint
UrbanVLA is a route-conditioned Vision-Language-Action model for urban micromobility. It aligns noisy routes from navigation tools with visual observations to enable scalable, long-horizon navigation. Trained via a two-stage SFT-then-RFT pipeline, UrbanVLA outperforms baselines by over 55% on MetaUrban and achieves robust real-world navigation across routes longer than 500 m.
Embodied Navigation Foundation Model
Jiazhao Zhang*, Anqi Li*, Yunpeng Qi*, Minghan Li*, Jiahang Liu,
Shaoan Wang, Haoran Liu, Gengze Zhou, Yuze Wu, Xingxing Li, Yuxin Fan,
Wenjun Li, Zhibo Chen, Fei Gao, Qi Wu, Zhizheng Zhang†, He Wang†
arXiv preprint
NavFoM is a cross-embodiment and cross-task navigation model trained on 8 million samples encompassing quadrupeds, drones, wheeled robots, and vehicles, spanning tasks including vision-and-language navigation, object searching, target tracking, and autonomous driving.
TrackVLA: Embodied Visual Tracking in the Wild
Shaoan Wang*, Jiazhao Zhang*, Minghan Li, Jiahang Liu,
Anqi Li, Kui Wu, Fangwei Zhong, Junzhi Yu,
Zhizheng Zhang†, He Wang†
CoRL 2025
TrackVLA is a vision-language-action model capable of simultaneous object recognition and visual tracking, trained on a dataset of 1.7 million samples. It demonstrates robust, long-horizon tracking and cross-domain generalization across diverse and challenging environments.
Experience
Galbot
China
2025.02 - Present
Research Intern
Research Advisors: Prof. He Wang and Dr. Zhizheng Zhang
Peking University
China
2023.09 - Present
Undergraduate Student
Research Advisor: Prof. He Wang