Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Large text-to-image models perform well across a wide range of prompts. However, they often lack subject fidelity: they cannot reliably reproduce a specific subject from a few reference images, nor place that subject in a new context. Both capabilities are needed to generate images in a more personalized way.

Figure 1: Image generation using DreamBooth (https://arxiv.org/pdf/2208.12242.pdf).

DreamBooth is a fine-tuning method proposed in the paper cited in Figure 1. It produces high-quality results while requiring only a few images of the subject (typically 3-5) for training. As a result, you get a final image that has:

  • the subject in a completely different environment (new context);
  • the subject's key features preserved (fidelity).
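
The two properties above can be pictured at the prompt level. The sketch below assumes the prompt scheme described in the DreamBooth paper, where the subject is bound to a rare identifier token followed by its class noun. The token `sks`, the class noun `dog`, the helper `subject_prompt`, and the context strings are hypothetical placeholders chosen for illustration; they do not come from this article. The sketch only builds prompt strings and does not perform any fine-tuning.

```python
# Hypothetical placeholders: a rare identifier token and the subject's class noun.
IDENTIFIER = "sks"
CLASS_NOUN = "dog"


def subject_prompt(context: str = "") -> str:
    """Build a prompt that refers to the subject, optionally in a new context."""
    base = f"a photo of {IDENTIFIER} {CLASS_NOUN}"
    return f"{base} {context}".strip()


# Prompt paired with the few (typically 3-5) training images of the subject.
training_prompt = subject_prompt()

# Prompts used after fine-tuning: same subject reference (fidelity),
# different surroundings (new context).
new_contexts = ["on a beach", "in the snow", "in front of the Eiffel Tower"]
generation_prompts = [subject_prompt(context) for context in new_contexts]

print(training_prompt)
for prompt in generation_prompts:
    print(prompt)
```

Every generation prompt reuses the same identifier and class noun, which is what ties the new scene back to the subject learned from the training images.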