Datasets for machine-learning-based approaches to non-contact dermoscopy

We present an overview of online datasets suitable for training machine learning algorithms. As the saying goes, “An algorithm can only be as good as its dataset”.

Dermoscopy is a visual science, so image quality plays a critical role in machine learning tools. Many such datasets are available to the online community for research purposes. At LUH, a team of professors, PhD candidates and master's students is pursuing a machine learning approach to distinguishing malignant melanoma from benign nevus.

Choosing the right dataset is critical for machine learning, so we explored a variety of online datasets. We screened them and considered only datasets whose labels were derived from certified dermatologists. The majority of datasets consist of widefield images, and only a few are purely dermatoscopic in nature. Developing an image classifier requires a large number of images for each skin disease. In the course of data exploration, it appeared that only the skin diseases "Melanoma" and "Nevus" have enough images to train an image classifier with acceptable accuracy.
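The screening described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used: the record fields `diagnosis` and `labelled_by_dermatologist`, the function name and the threshold are all hypothetical, and the toy records stand in for real dataset metadata.

```python
from collections import Counter


def classes_with_enough_images(records, min_images):
    """Count dermatologist-labelled images per diagnosis and keep
    only the diagnoses that reach the minimum image count."""
    counts = Counter(
        r["diagnosis"] for r in records if r.get("labelled_by_dermatologist")
    )
    return {label: n for label, n in counts.items() if n >= min_images}


# Toy metadata for illustration only (hypothetical field names and values).
toy_records = [
    {"diagnosis": "melanoma", "labelled_by_dermatologist": True},
    {"diagnosis": "melanoma", "labelled_by_dermatologist": True},
    {"diagnosis": "nevus", "labelled_by_dermatologist": True},
    {"diagnosis": "nevus", "labelled_by_dermatologist": True},
    {"diagnosis": "nevus", "labelled_by_dermatologist": True},
    {"diagnosis": "psoriasis", "labelled_by_dermatologist": True},
    {"diagnosis": "psoriasis", "labelled_by_dermatologist": False},
]

# The threshold of 2 is arbitrary; a real classifier needs far more images.
selected = classes_with_enough_images(toy_records, min_images=2)
print(selected)
```

In practice the threshold would be set by the accuracy the classifier has to reach, and the metadata would come from each dataset's own label files.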

ISIC Archive

The data were collected by the International Skin Imaging Collaboration (ISIC), with images taken from the University of Athens Medical School, Melanoma Institute Australia, the Medical University of Vienna, Memorial Sloan Kettering Cancer Center, Hospital Clínic de Barcelona and the University of Queensland. ISIC is supported by philanthropic contributions to Memorial Sloan Kettering Cancer Center. ISIC facilitates a partnership between industry and academia that aims to reduce melanoma mortality through digital skin imaging.

Together, the following datasets contain 5,598 Melanoma and 27,878 Nevus images.

  • 2018 JID Editorial Images
  • BCN_20000
  • Brisbane ISIC Challenge 2020
  • BCN_2020_Challenge
  • Dermoscopedia (CC-BY)
  • HAM10000
  • ISIC 2020 Challenge - MSKCC contribution
  • ISIC_2020_Vienna_part2
  • ISIC_2020_Vienna_part_1
  • MSK-1
  • MSK-2
  • MSK-3
  • MSK-4
  • SONIC
  • Sydney (MIA / SMDC) 2020 ISIC challenge contribution
  • UDA-1
  • MSK-5
  • UDA-2
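The two class counts given for the ISIC Archive are far from balanced: there are roughly five Nevus images for every Melanoma image. One common way to compensate during training is inverse-frequency class weighting. The sketch below computes such weights from the stated counts. It is an assumption about how the imbalance could be handled, not a description of the method used here, and the function name is hypothetical.

```python
# Image counts as stated in the text for the ISIC Archive datasets.
CLASS_COUNTS = {"melanoma": 5598, "nevus": 27878}


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * n_class).

    Rarer classes receive larger weights, so each class contributes
    equally to a weighted loss."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
imbalance_ratio = CLASS_COUNTS["nevus"] / CLASS_COUNTS["melanoma"]

for label, weight in weights.items():
    print(f"{label}: weight {weight:.2f}")
print(f"nevus-to-melanoma ratio: {imbalance_ratio:.2f}")
```

The resulting dictionary can be passed to the loss function of most training frameworks. Oversampling the Melanoma class is an alternative with a similar effect.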

Derm7pt

This dataset contains over 2,000 images, with both clinical and dermoscopic images for every case.

PAD-UFES-20

For this skin lesion dataset, patient information and clinical images were collected with a smartphone. It contains 2,298 samples covering six different types of lesions. Of these samples, 58% are biopsy-proven.

MED-NODE

This dataset includes images of melanomas and nevi from the Department of Dermatology of the University Medical Center Groningen.

PH2

The PH2 dataset was designed for research and benchmarking studies, permitting a comparison between different algorithms for segmenting and classifying dermoscopic images. The dermoscopic image database was acquired at the Dermatology Department of Hospital Pedro Hispano in Matosinhos, Portugal.

SD-198

SD-198 contains 6,584 clinical images of 198 skin conditions, captured with digital cameras and mobile phones.