RPG Seminar – Can 3D Vision-Language Models Truly Understand Natural Language?
Rapid advancements in 3D vision-language (3D-VL) tasks, such as 3D Visual Question Answering (3D-VQA) and 3D Visual Grounding (3D-VG), have opened up new avenues for human interaction with embodied agents […]
