DDLP: Unsupervised Object-centric Video Prediction with Deep Dynamic Latent Particles

Tal Daniel, Aviv Tamar

Research output: Contribution to journal › Article › peer-review

Abstract

We propose a new object-centric video prediction algorithm based on the deep latent particle (DLP) representation of Daniel & Tamar (2022a). In comparison to existing slot- or patch-based representations, DLPs model the scene using a set of keypoints with learned parameters for properties such as position and size, and are both efficient and interpretable. Our method, deep dynamic latent particles (DDLP), yields state-of-the-art object-centric video prediction results on several challenging datasets. The interpretable nature of DDLP allows us to perform “what-if” generation – predict the consequence of changing properties of objects in the initial frames – and DLP’s compact structure enables efficient diffusion-based unconditional video generation. Videos, code and pre-trained models are available: https://taldatech.github.io/ddlp-web/.
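To make the representation concrete: the abstract describes DLPs as a set of keypoints, each carrying learned parameters for properties such as position and size. The sketch below is a minimal, hypothetical illustration of such a particle set in Python; the field names, value ranges, and helper functions are assumptions for exposition and do not reproduce the paper's actual parameterization or models.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LatentParticle:
    """One latent particle: a keypoint plus learned per-object attributes.

    Field names and ranges are illustrative, not the paper's exact scheme.
    """
    position: np.ndarray   # (2,) normalized (x, y) keypoint location in [-1, 1]
    scale: np.ndarray      # (2,) object extent (width, height)
    depth: float           # scalar ordering attribute for occlusion
    opacity: float         # transparency attribute in [0, 1]
    features: np.ndarray   # (d,) appearance latent for the object glimpse


def random_particle_set(num_particles: int, feature_dim: int,
                        rng: np.random.Generator) -> list[LatentParticle]:
    """Sample a toy particle set of the kind a DLP-style encoder would output."""
    return [
        LatentParticle(
            position=rng.uniform(-1.0, 1.0, size=2),
            scale=rng.uniform(0.05, 0.5, size=2),
            depth=float(rng.uniform(0.0, 1.0)),
            opacity=float(rng.uniform(0.0, 1.0)),
            features=rng.standard_normal(feature_dim),
        )
        for _ in range(num_particles)
    ]


def shift_particle(particles: list[LatentParticle], index: int,
                   delta: np.ndarray) -> None:
    """A "what-if" style edit: move one object's keypoint before predicting forward."""
    particles[index].position = np.clip(particles[index].position + delta, -1.0, 1.0)
```

In this reading, "what-if" generation amounts to editing an interpretable attribute (here, `position`) of a particle in the initial frames and letting the learned dynamics model roll the modified set forward; the dynamics model itself is omitted here.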

Original language: English
Journal: Transactions on Machine Learning Research
Volume: 2024
State: Published - 2024

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
