Creation of a test dataset for automatic lesion detection

Test datasets serve as a benchmark to evaluate the performance of algorithms and models.

During the development of the Machine Learning models, the full pipeline for creating manual ground truth was not yet available, while the team was already developing solutions for automatic lesion detection in regional images. It was therefore crucial to provide our partners with a reduced test set for evaluating their models.

The test dataset was created with volunteers from outside the iToBoS clinical trials, who gave permission for their images to be used. The images went through the anonymization steps described in previous entries. First, inpainting techniques were applied to the face and head areas. Tattoo anonymization was not needed, since there were no tattoos or other relevant marks. Next, the regional images were tiled so that each tile represents roughly a 10 cm by 10 cm square of skin.
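The tiling step can be illustrated with a minimal sketch. This is not the iToBoS implementation, which is not described here. It assumes the physical pixel size of the regional image is known in millimetres per pixel, that tiles do not overlap, and that tiles at the right and bottom edges may be smaller than the rest. The function name tile_boxes and its parameters are hypothetical.

```python
def tile_boxes(width_px, height_px, pixel_spacing_mm, tile_mm=100.0):
    """Return (left, top, right, bottom) pixel boxes that cover the image.

    Hypothetical helper: pixel_spacing_mm is the physical size of one
    pixel in millimetres, tile_mm the target tile side (100 mm = 10 cm).
    """
    if pixel_spacing_mm <= 0:
        raise ValueError("pixel_spacing_mm must be positive")

    # Convert the physical tile side into a whole number of pixels.
    tile_px = max(1, round(tile_mm / pixel_spacing_mm))

    boxes = []
    for top in range(0, height_px, tile_px):
        for left in range(0, width_px, tile_px):
            # Assumption: edge tiles are clipped to the image bounds.
            right = min(left + tile_px, width_px)
            bottom = min(top + tile_px, height_px)
            boxes.append((left, top, right, bottom))
    return boxes


if __name__ == "__main__":
    # Illustrative values only: a 4000 x 3000 pixel image at 0.1 mm per pixel.
    for box in tile_boxes(4000, 3000, 0.1):
        print(box)
```

Each box can then be used to crop one tile from the regional image, and its position gives a simple way to keep track of where the tile came from.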

Finally, the images were stored in DICOM format, including pixel size information and the body location of each tile. The test set includes images from two different scans. The ground truth was generated using Canfield's automatic lesion detection as a starting point, which was then refined manually on the V7 Darwin platform by editing and correcting the original detections. The test set contains around 300 images with their corresponding ground truth files.