Abstract
We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation of Daniel & Tamar (2022a). In contrast to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform "what-if" generation: predicting the consequences of changing the properties of objects in the initial frames. DLP's compact structure also enables efficient diffusion-based unconditional video generation. Videos, code, and pre-trained models are available at https://taldatech.github.io/ddlp-web/.
| Original language | English |
|---|---|
| Journal | Transactions on Machine Learning Research |
| Volume | 2024 |
| State | Published - 2024 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Vision and Pattern Recognition