Plotting Behind the Scenes: Towards Learnable Game Engines

Animation module ablation

We evaluate the choice of the diffusion framework by comparing our method with an equivalent one trained using a reconstruction objective rather than the diffusion objective.
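To make the contrast concrete, the following is a minimal sketch of the two training objectives in their generic form. It is not the implementation evaluated here: the function names, the linear noise schedule, and the noise-prediction parameterization are all assumptions borrowed from the standard DDPM formulation, and the model is stood in for by an arbitrary callable.

```python
import numpy as np

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def reconstruction_loss(predict, x0, cond):
    """Reconstruction objective: regress the clean target directly from the conditioning."""
    return float(np.mean((predict(cond) - x0) ** 2))

def diffusion_loss(predict_noise, x0, cond, alpha_bar, rng):
    """Diffusion objective (noise-prediction form, assumed): corrupt the target
    at a random timestep and regress the noise that was added."""
    t = int(rng.integers(0, len(alpha_bar)))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return float(np.mean((predict_noise(x_t, t, cond) - eps) ** 2))
```

A reconstruction model trained with a squared error is pushed toward the average of all sequences compatible with the conditioning, whereas a diffusion model learns to sample from that distribution. This general property is one plausible explanation for the differences visible in the results below.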

Minecraft

Reconstruction Transformer

Note the unrealistic player animations and the poor correspondence between text prompts and generated results.

Ours small

The full version of our model, trained with a reduced computational budget that matches the one used for the baselines.

Ours

The full version of our model.

Tennis

Reconstruction Transformer

Note the unrealistic player animations and the player sliding artifacts.

Ours small

The full version of our model, trained with a reduced computational budget that matches the one used for the baselines.

Ours

The full version of our model.