We evaluate the choice of the diffusion framework by comparing our method with an equivalent one trained using a reconstruction objective rather than the diffusion objective.
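As a rough illustration of the difference between the two objectives, the sketch below contrasts a plain reconstruction loss with a generic DDPM-style noise-prediction loss. It is not the training code used for either model: the `model(x, t, cond)` callable, the simplified cosine noise schedule, and the choice of noise prediction as the diffusion target are all assumptions made for this example.

```python
import numpy as np

def reconstruction_loss(model, target, cond):
    # Reconstruction objective: the (hypothetical) model maps the conditioning
    # directly to the output, which is compared with the target using MSE.
    pred = model(np.zeros_like(target), 0.0, cond)
    return float(np.mean((pred - target) ** 2))

def diffusion_loss(model, target, cond, rng):
    # Diffusion objective (assumed DDPM-style noise prediction): corrupt the
    # target at a random noise level and ask the model to predict the noise.
    t = rng.uniform(0.0, 1.0)
    alpha_bar = np.cos(0.5 * np.pi * t) ** 2  # simplified cosine schedule (assumption)
    noise = rng.standard_normal(target.shape)
    noisy = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * noise
    pred_noise = model(noisy, t, cond)
    return float(np.mean((pred_noise - noise) ** 2))
```

In this sketch both variants can share the same network and conditioning; only the training target and the input corruption differ.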
Note the unrealistic player animations and the poor match between text prompts and generated results.
The full version of our model, trained with reduced computational resources matching those used for the baselines.
The full version of our model
Note the unrealistic player animations and player sliding artifacts.
The full version of our model, trained with reduced computational resources matching those used for the baselines.
The full version of our model