CLIPAG: Towards Generator-Free Text-to-Image Generation

Roy Ganz, Michael Elad

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings. While this phenomenon has gained significant research attention, it was solely studied in the context of unimodal vision-only architectures. In this work, we extend the study of PAG to Vision-Language architectures, which form the foundations for diverse image-text tasks and applications. Through an adversarial robustification finetuning of CLIP, we demonstrate that robust Vision-Language models exhibit PAG in contrast to their vanilla counterparts. This work reveals the merits of CLIP with PAG (CLIPAG) in several vision-language generative tasks. Notably, we show that seamlessly integrating CLIPAG in a "plug-n-play"manner leads to substantial improvements in vision-language generative applications. Furthermore, leveraging its PAG property, CLIPAG enables text-to-image generation without any generative model, which typically requires huge generators.

Original languageEnglish
Title of host publicationProceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Pages3831-3841
Number of pages11
ISBN (Electronic)9798350318920
DOIs
StatePublished - 2024
Event2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024 - Waikoloa, United States
Duration: 4 Jan 20248 Jan 2024

Publication series

NameProceedings - 2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024

Conference

Conference2024 IEEE Winter Conference on Applications of Computer Vision, WACV 2024
Country/TerritoryUnited States
CityWaikoloa
Period4/01/248/01/24

Keywords

  • adversarial attack and defense methods
  • Adversarial learning
  • Algorithms
  • Algorithms
  • Vision + language and/or other modalities

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Fingerprint

Dive into the research topics of 'CLIPAG: Towards Generator-Free Text-to-Image Generation'. Together they form a unique fingerprint.

Cite this