Generative AI for Specialized Dataset Enhancement and Expansion

An important challenge in applying machine learning and deep learning methods where data collection is difficult or costly is the limited amount of annotated data.

This is the motivation behind various methods for enhancing and extending existing datasets. The need is especially pronounced in the medical field, where data acquisition and annotation are expensive and time-consuming, so making the most of existing data matters even more.

One way to extend annotated data that has gained popularity in recent months, especially but not solely in applications that deal with images, is to use state-of-the-art generative AI models, such as Deep Image Prior and Denoising Diffusion Models. Denoising Diffusion Models, such as Stable Diffusion, are among the most dominant paradigms for such tasks, because the generated samples can also be conditioned on specific inputs, which can be provided as natural language, among other forms. As a result, a large part of the literature currently focuses on them. The ability to add highly realistic, synthetically generated data to existing datasets opens up new opportunities for further expanding the use of deep learning models in medical applications, not only for image classification and detection problems but also for other data modalities, such as clinical and even genetic data, on which the iToBoS project is focusing.