Brain-like Computing and Machine Intelligence (BCMI)
Image Composition Demo

[Demo interface: a composite-image canvas with a Foreground Scale slider, plus panels of selectable background and foreground thumbnails.]
Usage:
  1. Click to select a background image and a foreground image; the composite image appears in the left window.
  2. Click the green buttons to run different algorithms on the composite image.
  3. Move the foreground in the left window with the mouse, or drag the slider to scale the foreground.
  4. Click the buttons to run the algorithms again.
Introduction
Image composition aims to cut the foreground from one image and paste it onto another image to form a composite image. As a common image editing operation, image composition has a wide range of applications, such as augmented reality, artistic creation, and e-commerce. Many issues (e.g., unreasonable location and size, incompatible color and illumination, missing shadow) can make the obtained composite image unrealistic. These issues can be addressed by image composition techniques (e.g., object placement, image blending, image harmonization, shadow generation). More in-depth discussion and resources can be found in [1].
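As a concrete illustration, the cut-and-paste step itself can be sketched in a few lines: given a background image, a foreground image, and a binary mask of the foreground object, the object is scaled, positioned, and pasted onto the background. The function and file names below are illustrative and are not part of the demo code.

from PIL import Image

def simple_composite(background, foreground, fg_mask, top_left, scale=1.0):
    """Naive cut-and-paste composition.

    background : PIL.Image in RGB mode
    foreground : PIL.Image in RGB mode
    fg_mask    : PIL.Image in L mode, 255 on the object and 0 elsewhere
    top_left   : (x, y) position of the pasted object on the background
    scale      : uniform scaling applied to the foreground before pasting
    """
    # Resize the foreground and its mask to the requested scale.
    w, h = foreground.size
    new_size = (max(1, int(w * scale)), max(1, int(h * scale)))
    fg = foreground.resize(new_size, Image.BILINEAR)
    mask = fg_mask.resize(new_size, Image.BILINEAR)

    # Paste with the mask as alpha: pixels outside the mask keep the background.
    composite = background.copy()
    composite.paste(fg, top_left, mask)
    return composite

# Example usage (file names are placeholders):
# bg = Image.open("background.jpg").convert("RGB")
# fg = Image.open("foreground.png").convert("RGB")
# m = Image.open("foreground_mask.png").convert("L")
# simple_composite(bg, fg, m, top_left=(120, 80), scale=0.6).save("composite.jpg")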
Image harmonization adjusts the foreground appearance of a composite image to be consistent with the background region. Here we adopt the image harmonization method CDTNet [2] to generate the harmonized image; it is trained on the iHarmony4 dataset [3].
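For orientation, the following sketch shows one way to wrap a harmonization network for inference. It assumes the model maps a (composite image, foreground mask) pair to a harmonized image in [0, 1], which matches the general convention of CDTNet-style models, but the `model` object and its exact call signature are placeholders rather than the released CDTNet API.

import numpy as np
import torch
from PIL import Image

def harmonize(model, composite_img, fg_mask, device="cpu"):
    """Generic inference wrapper for a harmonization network (interface assumed)."""
    # Composite image -> float tensor of shape (1, 3, H, W) in [0, 1].
    comp = torch.from_numpy(np.asarray(composite_img, dtype=np.float32) / 255.0)
    comp = comp.permute(2, 0, 1).unsqueeze(0).to(device)

    # Foreground mask -> float tensor of shape (1, 1, H, W) in [0, 1].
    mask = torch.from_numpy(np.asarray(fg_mask, dtype=np.float32) / 255.0)
    mask = mask.unsqueeze(0).unsqueeze(0).to(device)

    with torch.no_grad():
        harmonized = model(comp, mask)            # assumed output: (1, 3, H, W)

    out = harmonized.squeeze(0).permute(1, 2, 0).clamp(0, 1).cpu().numpy()
    return Image.fromarray((out * 255).astype(np.uint8))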
Harmony level reflects how harmonious the composite image is. In particular, given a composite image, we adopt BargainNet [4] to extract the domain codes of the foreground region and the background region, and then assess the harmony level based on the Euclidean distance between the two domain codes.
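The harmony-level measure can be summarized as below. Here `encoder` is a stand-in for BargainNet's domain-code extractor (assumed to map an image plus a region mask to a fixed-length code), and the distance-to-score mapping is only illustrative; the demo may calibrate the score differently.

import torch

def harmony_level(encoder, composite, fg_mask, bg_mask):
    """Harmony level from the distance between two domain codes."""
    with torch.no_grad():
        fg_code = encoder(composite, fg_mask)   # domain code of the foreground region, e.g. (1, D)
        bg_code = encoder(composite, bg_mask)   # domain code of the background region, e.g. (1, D)

    # Euclidean distance between the two domain codes.
    distance = torch.norm(fg_code - bg_code, p=2, dim=1).item()

    # Map the distance to a bounded score in (0, 1]; this mapping is illustrative only.
    score = 1.0 / (1.0 + distance)
    return distance, score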
Shadow generation aims to generate a plausible shadow for the foreground object in the composite image. Here we adopt a diffusion model to generate the shadow for the foreground object according to background information; it is built upon ControlNet [11] and trained on the DESOBAv2 dataset [12]. The DESOBAv2 dataset is an extended version of our DESOBA dataset [5] with more images and object-shadow pairs.
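A minimal sketch of ControlNet-style conditional generation is given below. The `shadow_sampler` callable and its arguments are assumptions for illustration, not the released interface: the model is assumed to take a spatial condition built from the composite image (background illumination and geometry) and the foreground-object mask (which object needs a shadow) and to denoise a latent into an image with a plausible shadow.

import torch

def generate_shadow(shadow_sampler, composite, fg_mask, steps=50):
    """ControlNet-style shadow generation (hypothetical interface)."""
    # Stack the composite (1, 3, H, W) and mask (1, 1, H, W) into one condition tensor.
    condition = torch.cat([composite, fg_mask], dim=1)   # (1, 4, H, W)

    with torch.no_grad():
        shadowed = shadow_sampler(condition, num_inference_steps=steps)
    return shadowed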
Object placement assessment verifies whether a composite image is plausible in terms of object placement. A reasonable placement requires the foreground object to be placed plausibly on the background, considering location, size, occlusion, semantics, and so on. The first object placement assessment dataset was constructed in [6] and used in GracoNet [9], in which the object placement assessment task is treated as a binary classification problem.
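In code, the assessment reduces to a forward pass of a binary classifier. The `classifier` below is a stand-in for an OPA/GracoNet-style network; its input convention (composite image plus foreground mask) and scalar-logit output are assumptions for this sketch.

import torch

def placement_plausibility(classifier, composite, fg_mask):
    """Object placement assessment as binary classification (interface assumed)."""
    with torch.no_grad():
        logit = classifier(composite, fg_mask)     # assumed output: (1, 1) logit
        prob = torch.sigmoid(logit).item()         # probability of a plausible placement
    return prob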
Generative composition aims to generate realistic composite images based on a foreground image of a specific object, a background image, and a bounding box indicating the foreground placement. Here we use the “Composition” version of Controllable Composition (ControlCom) [10] to produce the composite image; it is built on pretrained Stable Diffusion and takes the masked background and a noisy latent image as input.
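The inputs and output of generative composition can be summarized with the sketch below. The `pipeline` callable and its keyword arguments are placeholders: the model is assumed to take a background image, a foreground object image, and a bounding box in background coordinates, and to return the synthesized composite. See the ControlCom repository for the actual inference script.

def generative_composite(pipeline, background, foreground, bbox):
    """Generative composition with a ControlCom-style diffusion pipeline (hypothetical call)."""
    composite = pipeline(
        background=background,   # scene to insert the object into
        foreground=foreground,   # object to be composited
        bbox=bbox,               # (x1, y1, x2, y2) region where the object should appear
    )
    return composite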
The background and foreground images used in our demo are obtained from the RealHM [7] and AIM-500 [8] datasets, respectively.
More resources (e.g., papers, code, datasets) for image composition can be found at https://github.com/bcmi/Awesome-Image-Composition.
References

[1] Li Niu, Wenyan Cong, Liu Liu, Yan Hong, Bo Zhang, Jing Liang, Liqing Zhang, "Making Images Real Again: A Comprehensive Survey on Deep Image Composition", arXiv preprint:2106.14490, 2021. [pdf]

[2] Wenyan Cong, Xinhao Tao, Li Niu, Jing Liang, Xuesong Gao, Qihao Sun, Liqing Zhang, "High-Resolution Image Harmonization via Collaborative Dual Transformations", CVPR, 2022. [pdf] [dataset]

[3] Wenyan Cong, Jianfu Zhang, Li Niu, Liu Liu, Zhixin Ling, Weiyuan Li, Liqing Zhang, "DoveNet: Deep Image Harmonization via Domain Verification", CVPR, 2020. [pdf] [dataset&code]

[4] Wenyan Cong, Li Niu, Jianfu Zhang, Jing Liang, Liqing Zhang, "BargainNet: Background-Guided Domain Translation for Image Harmonization", ICME, 2021. [pdf] [code]

[5] Yan Hong, Li Niu, Jianfu Zhang, Liqing Zhang, "Shadow Generation for Composite Image in Real-world Scenes", AAAI, 2022. [pdf] [dataset&code]

[6] Liu Liu, Bo Zhang, Jiangtong Li, Li Niu, Qingyang Liu, Liqing Zhang, "OPA: Object Placement Assessment Dataset", arXiv preprint:2107.01889, 2021. [pdf] [dataset]

[7] Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang, "SSH: A Self-Supervised Framework for Image Harmonization", ICCV, 2021. [pdf] [dataset]

[8] Jizhizi Li, Jing Zhang, Dacheng Tao, "Deep Automatic Natural Image Matting", IJCAI, 2021. [pdf] [dataset]

[9] Siyuan Zhou, Liu Liu, Li Niu, Liqing Zhang, "Learning Object Placement via Dual-path Graph Completion", ECCV, 2022. [pdf] [code]

[10] Bo Zhang, Yuxuan Duan, Jun Lan, Yan Hong, Huijia Zhu, Weiqiang Wang, Li Niu, "ControlCom: Controllable Image Composition using Diffusion Model", arXiv preprint:2308.10040, 2023. [pdf] [code]

[11] Lvmin Zhang, Anyi Rao, Maneesh Agrawala, "Adding Conditional Control to Text-to-Image Diffusion Models", ICCV, 2023. [pdf] [code]

[12] Qingyang Liu, Junqi You, Jianting Wang, Xinhao Tao, Bo Zhang, Li Niu, "Shadow Generation for Composite Image Using Diffusion Model", CVPR, 2024. [pdf] [dataset]