Robot Representations

For Scene Understanding, Reasoning and Planning


Audience Q&A:

Robots now have advanced perception, navigation, grasping and manipulation capabilities, so why is it still exceedingly difficult to bring these skills together to get a robot to autonomously tidy a room? A key limiting factor is that robots still lack the contextual scene understanding that allows humans to reason efficiently and compactly about the world and our actions within it. Metric (where) and semantic (what) representations are now common, but contextual (how) representations are still missing: how do objects interrelate, and how can a robot interact with objects to achieve a task? How should we formulate these representations and, crucially, how can we enable robots, as embodied agents, to learn and update their contextual scene understanding from live experience? Researchers in AI knowledge representation and reasoning, as well as in the more distant field of linguistics, have long grappled with similar questions. The goal of this workshop is to bring together those experts with researchers in robot scene understanding and long-horizon planning to discuss the state of the art and uncover synergies across these currently disparate disciplines.


Time Speakers/Authors
09:00-09:10 Welcome
09:10-09:40 Invited talk: Shuran Song (Columbia University)
Hierarchical Representations for Language-Based Reasoning
09:40-10:10 Invited talk: Jiayuan Mao (MIT)
Neuro-Symbolic Concepts for Robotic Manipulation
10:10-10:30 Spotlight presentations
10:30-11:00 Coffee break and posters
11:00-11:30 Invited talk: Janet Wiles (The University of Queensland)
Social Robots and Language Technologies: What can Robotics Learn from the Language Sciences?
11:30-12:00 Panel discussion: Reasoning and representations
12:00-13:30 Lunch
13:30-14:00 Invited talk: Rajat Talak (MIT)
Spatial Perception for Robotics: Representation, Structure, and Real-Time Systems
14:00-14:20 Best paper talk: Stefano Ferraro, Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt
FOCUS: Object-Centric World Models for Robotics Manipulation
14:20-15:00 Spotlight presentations
15:00-15:30 Coffee break and posters
15:30-16:00 Invited talk: Manolis Savva (Simon Fraser University)
3D Simulation for Embodied AI: Emerging Challenges & Opportunities
16:00-16:30 Invited talk: Helisa Dhamo (Huawei)
Scene Understanding via Semantic Scene Graphs
16:30-17:00 Panel discussion: 3D scene understanding
17:00-17:30 Closing remarks


🥇: Best paper award

Call for Papers


Submission link:

Participants are invited to submit extended abstracts or short papers (up to 4 pages in RSS format) focusing on novel advances in 3D scene understanding, predicate/affordance reasoning, high-level planning, and work at the boundary between these research areas.

Important dates: (deadlines are AoE on the respective date)

  • Submission deadline: June 30 (extended from May 22)
  • Acceptance notification
    • For papers submitted before May 22: June 16
    • For papers submitted after May 22: on a rolling basis
  • Camera ready submission: June 30 (extended from June 23)
  • Workshop date: July 10, 2023

Topics of interest include but are not limited to:

  • Novel algorithms for spatial perception that combine geometry, semantics, and context;
  • Approaches to learning and structuring contextual knowledge from complex sensory inputs;
  • Techniques for reasoning over spatial, semantic, and temporal aspects for long-horizon planning;
  • Approaches that combine learning-based techniques with geometric and model-based estimation methods; and
  • Position papers and unconventional ideas on how to reach human-level performance in robot scene understanding, task planning and execution.

Contributed papers will be reviewed by the organizers and a program committee of invited reviewers. Accepted papers will be published on the workshop website and will be featured in spotlight presentations and poster sessions.

LaTeX template link:

Instructions for Authors of accepted papers:

You will have the opportunity to present your work in a spotlight presentation, which should last no longer than 5 minutes and will be followed by a short time for questions from the audience. For more in-depth discussions, we invite you to prepare a poster for the poster session.

Important dates:

  • By June 30: Submit camera ready paper on CMT
  • By July 8: Submit presentation slides via email to fjulian AT ethz DOT ch
  • By July 10: Prepare poster (A0 or digital) and bring it to the workshop


Jen Jen Chung

Associate Professor

The University of Queensland

Luca Carlone

Associate Professor

Massachusetts Institute of Technology

Julian Förster

Doctoral candidate

ETH Zürich

Federico Tombari

Lecturer, Research Scientist

TUM, Google