Data augmentation and normalization in Skin Cancer AI

Skin Cancer AI is a novel AI solution for early detection of skin cancer.

This AI solution has been developed by Torus AI (previously Torus Actions) and is supported by ITOBOS, one of the biggest European projects for fighting skin cancer. At present, Skin Cancer AI can detect up to 12 types of melanoma (the deadliest skin cancers), 38 types of non-melanoma skin cancer, and more than 70 other types of skin lesions.

Here is an example of a Skin Cancer AI prediction:

The test image is a malignant melanoma, taken from https://dermnetnz.org/

Several components contribute to the success of Skin Cancer AI. In previous blogs we have learnt about quality check, the detailed classification system, the cancer risk system, and the use of synthetic collision images in the training process.

In this blog we will learn about data augmentation and normalization, another important component in the success of Skin Cancer AI.

First of all, what is data augmentation?

“Data augmentation in data analysis are techniques used to increase the amount of data by adding slightly modified copies of already existing data or newly created synthetic data from existing data. It acts as a regularizer and helps reduce overfitting when training a machine learning model.” (from Wikipedia, https://en.wikipedia.org/wiki/Data_augmentation)

Simply speaking, data augmentation is a set of techniques that make data more diverse so that the trained model generalizes better. Simple data augmentation techniques for images include rotation, flipping, cropping, and changes in color and contrast. The use of synthetic collision images in training is another kind of data augmentation.
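To make these simple techniques concrete, here is a minimal sketch in Python. It is only an illustration, not the code used in Skin Cancer AI: the function names are hypothetical, an image is assumed to be a plain list of rows of (r, g, b) tuples so that the example needs no external library, and the probabilities and contrast range are arbitrary choices.

```python
import random

# Assumption: an image is a list of rows, each row a list of (r, g, b)
# tuples with integer values in 0-255. Real pipelines would normally
# use an image library instead of nested lists.

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    # Reverse the row order, then transpose rows and columns.
    return [list(column) for column in zip(*image[::-1])]

def adjust_contrast(image, factor):
    """Scale each channel's distance from mid-gray (128) by `factor`."""
    def scale(value):
        return max(0, min(255, round(128 + (value - 128) * factor)))
    return [[tuple(scale(c) for c in pixel) for pixel in row] for row in image]

def random_augment(image, rng):
    """Return a randomly flipped, rotated, and contrast-shifted copy."""
    if rng.random() < 0.5:
        image = flip_horizontal(image)
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter turns
        image = rotate_90_clockwise(image)
    return adjust_contrast(image, rng.uniform(0.8, 1.2))
```

Calling `random_augment(image, random.Random())` several times on the same training image gives several slightly different copies, which is the "slightly modified copies" idea from the definition above.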

So what is data normalization, and what is it for?

Data normalization is the inverse process of data augmentation: it makes data more uniform so that it can be interpreted more easily during the inference (prediction) phase. Simple data normalization techniques include center cropping, value normalization, and size padding. Two typical data normalization techniques used in Skin Cancer AI are center crop and average color padding. Here are examples of original images and their normalized versions:
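In code, center crop and average color padding could be sketched roughly as follows. This is an assumption-laden illustration rather than the Skin Cancer AI implementation: the function names are hypothetical, images are again plain lists of rows of (r, g, b) tuples, and "average color padding" is interpreted here as padding the shorter side to a square with the mean color of the image.

```python
# Assumption: an image is a list of rows, each row a list of (r, g, b)
# tuples with integer values in 0-255.

def center_crop(image, size):
    """Keep the central size x size region (image must be at least that large)."""
    height, width = len(image), len(image[0])
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

def average_color(image):
    """Mean (r, g, b) over all pixels, rounded to integers."""
    pixels = [pixel for row in image for pixel in row]
    return tuple(
        round(sum(pixel[c] for pixel in pixels) / len(pixels)) for c in range(3)
    )

def pad_to_square(image):
    """Pad the shorter side with the image's average color, keeping it centered."""
    height, width = len(image), len(image[0])
    side = max(height, width)
    fill = average_color(image)
    top = (side - height) // 2
    left = (side - width) // 2
    padded = [[fill] * side for _ in range(side)]
    for y, row in enumerate(image):
        padded[top + y][left:left + width] = row
    return padded
```

Padding with the average color rather than black or white keeps the added border close to the image's overall tone, so every input reaches the model with the same shape without a sharp artificial edge.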

Together, data augmentation and data normalization make training more robust and inference more accurate. They are good practices in building AI solutions, and their effectiveness has been demonstrated by the success of Skin Cancer AI.