Data security and privacy at MPNE consensus 2024

IBM recently took part in the MPNE Consensus conference in Berlin, Germany.

The conference was organized by the Melanoma Patient Network Europe (MPNE) in the framework of the iToBoS project, and focused on data, AI and data-dependent business models. One of the main outcomes of the iToBoS project is an AI-based diagnostic platform for early detection of melanoma. It is not surprising, then, that the use of AI in the context of healthcare scenarios was in the spotlight, with talks on both the promise of AI and its risks and additional considerations. This consensus meeting addressed topics that MPNE has identified as critical to the overall discussion on the usage of data, including secondary use, artificial intelligence and business models.

There is a known tension between the need to analyze personal data to drive business and privacy concerns. Many data protection regulations, including the EU General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA), set out strict restrictions and obligations on companies that collect or process personal data. The dramatic increase in AI usage in healthcare in recent years, especially with the rise of large language models (LLMs), together with the ability of AI systems to train on large amounts of data, significantly raises both the rewards and the risks involved.

At the conference, Ariel Farkash presented work done by IBM as part of the iToBoS project on Model Anonymization and Data Minimization, focusing on their relevance to healthcare scenarios, and to melanoma in particular in the context of iToBoS.

Model Anonymization – Since anonymous data is exempt from data protection principles and obligations, the goal is to re-build ML models based on anonymized data. However, learning on anonymized data typically results in a significant degradation in accuracy. IBM addressed this challenge by guiding the anonymization using the knowledge encoded within the model and targeting it to minimize the impact on the model's accuracy, a process we call accuracy-guided anonymization.
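
To make the idea more concrete, here is a minimal sketch in Python of anonymization guided by a model's outputs. It is a simplified illustration and not the method presented at the conference or the code in ai-privacy-toolkit: the function name, the single "age" quasi-identifier and the sample data are all assumptions made for this example. Records are first grouped by the label the existing model predicts for them, and only then generalized in groups of at least k, so that records merged together already agree on the model's output.

```python
from collections import defaultdict
from statistics import median


def anonymize_ages(ages, predictions, k):
    """Generalize ages so every released value is shared by at least k records.

    Hypothetical helper: groups are formed inside each predicted class, so
    generalization never merges records the model labels differently.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(predictions):
        by_label[label].append(idx)

    released = list(ages)
    for label, idxs in by_label.items():
        if len(idxs) < k:
            raise ValueError(f"class {label!r} has fewer than {k} records")
        idxs.sort(key=lambda i: ages[i])
        n_groups = len(idxs) // k
        for g in range(n_groups):
            start = g * k
            # The last group absorbs any remainder, so it may exceed k.
            end = start + k if g < n_groups - 1 else len(idxs)
            group = idxs[start:end]
            representative = median(ages[i] for i in group)
            for i in group:
                released[i] = representative
    return released


# Illustrative data only: ages and the labels an existing model predicts.
ages = [31, 35, 38, 42, 55, 58, 61, 64, 70]
predictions = [0, 0, 0, 0, 1, 1, 1, 1, 1]
anonymized = anonymize_ages(ages, predictions, k=2)
```

A new model would then be trained on the generalized values instead of the exact ones. A fuller approach would handle several quasi-identifiers at once and choose the groups more carefully; this sketch only shows how model outputs can steer which records are generalized together.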

Data Minimization – GDPR mandates the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as neural networks. IBM presented a method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features. This method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy.
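
As a rough illustration of generalizing an input feature, the Python sketch below replaces an exact age with the midpoint of an age bucket and checks how often a model's prediction stays the same. Everything in it is assumed for the example: the toy rule-based `predict` function stands in for a trained model, and the thresholds, bucket widths and records are invented. It is not the algorithm implemented in ai-privacy-toolkit.

```python
def predict(age, diameter_mm):
    """Hypothetical stand-in for a trained model: 1 = flag for review."""
    return int(diameter_mm >= 6.0 and age >= 50)


def generalize_age(age, width):
    """Replace an exact age with the midpoint of its width-year bucket."""
    low = (age // width) * width
    return low + width / 2


def agreement(records, width):
    """Fraction of records whose prediction is unchanged after generalization."""
    same = sum(
        predict(age, d) == predict(generalize_age(age, width), d)
        for age, d in records
    )
    return same / len(records)


def widest_safe_bucket(records, widths, min_agreement=1.0):
    """Return the widest bucket that keeps agreement at or above the threshold."""
    safe = [w for w in widths if agreement(records, w) >= min_agreement]
    return max(safe) if safe else None


# Illustrative records only: (age in years, lesion diameter in mm).
records = [(34, 4.0), (47, 7.5), (52, 6.2), (58, 3.1), (63, 8.0), (71, 6.5)]
best_width = widest_safe_bucket(records, widths=[5, 10, 20])
```

With these sample records, 5-year and 10-year buckets leave every prediction unchanged, while a 20-year bucket changes one of the six, so only a 10-year age range would need to be collected rather than the exact age.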

The speaker also mentioned ai-privacy-toolkit, the open-source project we initiated, which contains these two technologies and more, with the goal of providing tools and techniques related to the privacy and compliance of AI models.

The presentation of privacy mitigations for AI models drew a surprising number of questions and a great deal of interest, from both software engineers and physicians in the audience. Both groups were surprised by the risks involved and by the fact that some mitigations not only exist but are even practical.

More details at MPNE consensus 2024 on data, AI and data-dependent business models.