Point2pix-Zero: Point-driven refined diffusion for multi-object image editing
Semantic image editing methods built on large-scale diffusion models have made significant strides in precise, controlled image editing with text prompts as guidance. However, these models struggle with complex images that contain hard-to-describe objects and/or multiple objects. In this work, we introduce Point2pix-Zero, a novel inference-time multi-object image editing strategy that edits a single object with the simple guidance of clicked points and the text of the target object. We employ an interactive methodology, point-discovery, as text-free guidance to identify the semantic information of the objects intended for editing and to generate text prompts automatically. Instead of exploiting the internal cross-attention maps of diffusion models as a guide, we inject external attention maps to rectify the visual-and-semantic pairing mismatches in cross-attention maps during the denoising process. Extensive empirical evaluations demonstrate the effectiveness of our proposed inference-time method in ensuring precise editing while maintaining image fidelity. Our method shows superior performance in single- and multi-object image editing, positioning it as a new state of the art.
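The idea of injecting an external attention map can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `inject_external_attention`, the `strength` parameter, the linear blending rule, and the row renormalization are all assumptions made for illustration, and the external map is assumed to be a flattened object mask (for example, one obtained from clicked points).

```python
import numpy as np


def inject_external_attention(cross_attn, external_map, token_index, strength=1.0):
    """Blend an external spatial map into one token's cross-attention column.

    Hypothetical helper, not taken from the paper.

    cross_attn:   (num_pixels, num_tokens) array; each row sums to 1.
    external_map: (num_pixels,) non-negative array, e.g. a flattened object mask.
    token_index:  column of the text token whose attention is rectified.
    strength:     0 keeps the original column, 1 replaces it entirely.
    """
    attn = np.asarray(cross_attn, dtype=np.float64).copy()
    ext = np.asarray(external_map, dtype=np.float64)
    if ext.shape != (attn.shape[0],):
        raise ValueError("external_map must hold one value per pixel")

    # Scale the external map to [0, 1] so it is comparable to attention weights.
    peak = ext.max()
    if peak > 0:
        ext = ext / peak

    # Assumed injection rule: linear blend of the internal and external columns.
    attn[:, token_index] = (1.0 - strength) * attn[:, token_index] + strength * ext

    # Renormalize so each pixel's attention over tokens sums to 1 again.
    row_sums = attn.sum(axis=1, keepdims=True)
    return attn / np.clip(row_sums, 1e-12, None)


# Toy example: 4 pixels, 3 text tokens, uniform internal attention.
attn = np.full((4, 3), 1.0 / 3.0)
mask = np.array([1.0, 1.0, 0.0, 0.0])  # object covers the first two pixels
rectified = inject_external_attention(attn, mask, token_index=1)
```

In the toy example, the target token's attention rises at the masked pixels and drops to zero elsewhere, which is the kind of visual-and-semantic pairing correction the abstract describes. In an actual diffusion pipeline such a step would be applied inside the cross-attention layers at each denoising step.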
Duke Scholars
Published In
DOI
ISSN
Publication Date
Volume
Related Subject Headings
- Artificial Intelligence & Image Processing
- 4611 Machine learning
- 4605 Data management and data science
- 4603 Computer vision and multimedia computation
- 0906 Electrical and Electronic Engineering
- 0806 Information Systems
- 0801 Artificial Intelligence and Image Processing