We evaluate our animation model against the Playable Environments baseline (PE) on the task of reconstructing a video from the initial state and actions for each player.
Note the unrealistic player animations and the mismatch between text prompts and generated results.
The full version of our model, trained with a reduced amount of computational resources matching the amount used for the baselines.
The full version of our model
Note the unrealistic player animations resulting from the model's inability to capture the multimodal distribution of player poses conditioned on text.
The full version of our model, trained with a reduced amount of computational resources matching the amount used for the baselines.
The full version of our model