Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization

Realistic image super-resolution is the task of producing an image with perceptually realistic details from a lower-quality input image.

However, detail enhancement often introduces unwanted artifacts, and the results can fail to look realistic for some natural scenes. Recent generative methods based on Stable Diffusion face this problem because they cannot faithfully preserve fine structures at the pixel level. In addition, state-of-the-art methods rely on extra skip connections to enhance details, which makes training more intensive and makes tasks performed in the latent space, such as image stylization, more challenging.

The work in [1] introduces a pixel-aware stable diffusion (PASD) network that addresses both realistic image super-resolution and personalized stylization.

Fig. 1  The architecture of the PASD network introduced in [1]

A pixel-aware cross attention (PACA) module lets the network perceive fine local details at the pixel level, and a degradation removal module is introduced alongside it. The PASD network can also perform personalized stylization by switching the base model to one of a particular style.
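To make the idea of pixel-aware cross attention more concrete, the following is a minimal NumPy sketch of a generic cross attention step in which the queries come from the diffusion latent features and the keys and values come from pixel-level features of the low-quality input. The function name, the tensor shapes, and the random projection matrices are illustrative assumptions, not the implementation from [1].

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pixel_aware_cross_attention(x, y, w_q, w_k, w_v):
    """Generic cross attention (illustrative sketch, hypothetical names).

    x: (n_x, c_x) flattened latent features of the diffusion model
    y: (n_y, c_y) flattened pixel-level features of the input image
    w_q, w_k, w_v: projection matrices mapping to a shared dimension d
    """
    q = x @ w_q                      # queries from the latent features
    k = y @ w_k                      # keys from the pixel-level features
    v = y @ w_v                      # values from the pixel-level features
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each latent position attends over all pixel positions
    return weights @ v, weights

# Toy example: a 4x4 grid flattened to 16 positions (assumed sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
y = rng.normal(size=(16, 12))
d = 8
w_q = rng.normal(size=(8, d))
w_k = rng.normal(size=(12, d))
w_v = rng.normal(size=(12, d))

out, weights = pixel_aware_cross_attention(x, y, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

In this sketch every latent position receives a weighted mix of pixel-level features, which is one way local structure from the input image could guide the generation. In a trained network the projections would be learned rather than sampled at random.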

Bibliography

[1] Yang, Tao, et al. “Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization.” arXiv preprint arXiv:2308.14469 (2023)