Medical data from patient records, intensive care monitoring, or clinical studies are used in machine learning processes to train neural networks. These AI models support doctors in interpreting ultrasound, X-ray, MRI, or CT images, in diagnostics and therapy planning, or in medical research. In hospital management, for example, they make it possible to predict how patients will recover, enabling more precise capacity planning or a seamless transition to follow-up care. Image, video, and audio recordings, for example of the social interaction between doctors and patients, are a further source of data and can provide important information about the condition of patients with psychiatric or psychosomatic illnesses.
All this health data is highly sensitive and is subject to the strict provisions of the General Data Protection Regulation (GDPR), the data protection laws of the federal and state governments, and a range of other legal regulations in the field of healthcare and medical research. In addition to medical confidentiality, this framework is intended to ensure the privacy and integrity of health data, prevent its misuse, and protect patients' fundamental rights and freedoms. Even in de-identified or pseudonymized form, health data is particularly worthy of protection, because additional information can make it possible to draw conclusions about individual persons, for example in the case of rare diseases or studies with small numbers of participants.
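The re-identification risk described above can be illustrated with a small sketch. It uses k-anonymity, a common privacy measure that the original text does not mention, purely as an illustration: the records, field names, and the helper `smallest_group_size` are hypothetical and not part of any DFKI system.

```python
from collections import Counter


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    values for the given quasi-identifiers (the "k" in k-anonymity).
    Returns 0 for an empty data set."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values(), default=0)


# Hypothetical pseudonymized records: no names, only coarse attributes.
records = [
    {"age_band": "40-49", "region": "A", "diagnosis": "hypertension"},
    {"age_band": "40-49", "region": "A", "diagnosis": "hypertension"},
    {"age_band": "40-49", "region": "A", "diagnosis": "hypertension"},
    {"age_band": "30-39", "region": "A", "diagnosis": "rare disease"},
]

k = smallest_group_size(records, ["age_band", "region", "diagnosis"])
print(k)  # a value of 1 means at least one record is unique in the data set
```

In this made-up data set, the single record with the rare diagnosis forms a group of its own. Anyone who already knows that a particular person in that age band and region has this diagnosis could single out the record, even though no name is stored.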
In order to turn this highly sensitive data into AI models, a trusted research environment (TRE) is required that ensures compliance with the GDPR and other data protection regulations, as well as the principles of good scientific practice. On July 17, DFKI will open such a research environment for processing sensitive personal data to train neural networks. SEMLA – Secure Machine Learning Architecture – is an internal DFKI research infrastructure that meets the requirements of the GDPR and other legal regulations. At its core is the implementation of so-called technical and organizational measures (TOMs) for data protection and security, which are required in data-sensitive research projects.
"SEMLA enables us to work with highly sensitive data in the first place. We first evaluate data from medical projects in which DFKI is involved and then train neural networks on this data. There is already a great need for this internally. In the future, SEMLA will be made available as open source so that other research institutes and market participants can easily adapt and use the SEMLA solution," says SEMLA project manager Dr. Jan Alexandersson.
SEMLA enables scientists to conduct secure research with highly sensitive personal data. In contrast to cloud solutions, SEMLA stores and processes the data exclusively on-premises, i.e., at DFKI. SEMLA consists of a computing infrastructure (CPU, GPU, memory), which is operated and protected in Kaiserslautern, and a biometrically secured annotation and experimentation laboratory in Saarbrücken, the SEMLAb. The new research infrastructure is designed so that research can be carried out with data of the second-highest sensitivity, class 4, according to the classification scheme of the Alan Turing Institute.
In the future, third parties will be able to compute models via the Internet on the data sets hosted by SEMLA. Certification in accordance with ISO 2700X and TISAX, as well as EuroPriSe – the European data protection seal of approval (EuroPriSe, 2022) – is also being sought for this purpose.