Summarizing https://arxiv.org/pdf/2308.01399.pdf

Here's my try:

Dynalang is an embodied question answering agent that uses model rollouts to predict future text and video observations and rewards. In the paper's example, the agent has explored various rooms while receiving video and language observations from the environment. From the past text "the bottle is in the living room", the agent predicts at timesteps 61-65 that it will see the bottle in the final corner of the living room. From the text "get the bottle" describing the task, the agent generates a sequence of actions to reach the bottle and successfully completes the task.

The agent's goal is to choose actions that maximize the expected discounted sum of rewards $\mathbb{E}\left[\sum_{t=1}^{T} \gamma^{t} r_t\right]$, where $r_t$ is the reward at step $t$, $T$ is the episode length, $c_T = 0$ signals the episode end, and $\gamma < 1$ is a discount factor. In most of the paper's experiments, the actions are integers in a categorical action space. However, the paper also considers factorized action spaces where the agent outputs both a discrete movement command and a language token.
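
To make the objective concrete, here is a minimal sketch of the discounted sum of rewards for one finished episode. It is my own illustration, not code from the paper. The function name `discounted_return` and the default `gamma=0.99` are assumptions, and so is the indexing convention that the reward at step t is weighted by gamma**t (some texts use gamma**(t-1), which only rescales the total by a constant).

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return sum over t = 1..T of gamma**t * r_t, with T = len(rewards).

    Hypothetical helper for illustration; the indexing convention is an
    assumption, not taken from the paper's code.
    """
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    total = 0.0
    # Walk backwards through the episode: G_t = gamma * (r_t + G_{t+1}).
    for reward in reversed(rewards):
        total = gamma * (reward + total)
    return total


if __name__ == "__main__":
    # A 3-step episode with a single reward of 1 at the final step.
    print(discounted_return([0.0, 0.0, 1.0], gamma=0.5))
```

With a sparse reward like this, a smaller gamma shrinks the value of reaching the goal late, which is why the agent is pushed toward shorter action sequences.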

The setting is an embodied environment with multiple rooms and objects. The agent interacts with it through sensors and actuators: it receives observations from the environment and produces actions to carry out tasks or reach goals.
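
The observe-then-act loop described above can be sketched with a toy example. Everything here is hypothetical and mine, not the paper's: `ToyCorridor`, `Observation`, `Action`, and `run_episode` are made-up names, a 1-D position stands in for the video frame, I assume one language token arrives per step, and the "policy" is a hard-coded rule rather than anything learned.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    position: int   # stand-in for the video frame
    token: str      # one language token per step (assumption of this sketch)
    reward: float
    cont: bool      # False signals the episode end


@dataclass
class Action:
    move: int       # discrete movement command: -1, 0, or +1
    token: str      # language token emitted by the agent


class ToyCorridor:
    """Hypothetical 1-D world: the episode ends when the goal cell is reached."""

    def __init__(self, goal: int = 3, hint=("bottle", "is", "right")):
        self.goal = goal
        self.hint = hint

    def reset(self) -> Observation:
        self.pos, self.t = 0, 0
        return Observation(self.pos, self.hint[0], 0.0, True)

    def step(self, action: Action) -> Observation:
        self.t += 1
        self.pos += action.move
        done = self.pos == self.goal
        token = self.hint[self.t % len(self.hint)]
        return Observation(self.pos, token, 1.0 if done else 0.0, not done)


def run_episode(env: ToyCorridor, max_steps: int = 20) -> list:
    """Collect rewards while following a hand-written rule (not a learned policy)."""
    obs = env.reset()
    heard, rewards = [], []
    for _ in range(max_steps):
        heard.append(obs.token)
        # Stay put until the hint "right" has been heard, then walk right.
        move = 1 if "right" in heard else 0
        obs = env.step(Action(move=move, token="<pad>"))
        rewards.append(obs.reward)
        if not obs.cont:
            break
    return rewards


if __name__ == "__main__":
    print(run_episode(ToyCorridor()))
```

The `Action` dataclass mirrors the factorized action space from the summary: a movement command plus a language token, emitted together at each step.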
