To enable deep neural networks to be trained and used more quickly for new tasks in the future, DFKI in Saarbrücken and Inria in Nancy are working on High-Performance Computing (HPC) environments for AI. ENGAGE focuses on investigating how these environments can be optimized and used efficiently in conjunction with other hardware environments.
"Our project will be a blueprint for optimised and flexible HPC deployment for machine learning and for extremely fast adaptation and computation of deep neural networks. An important application is the validation and certification of AI systems through targeted testing with synthetically generated data from simulations. This will help to increase the trustworthiness of AI and thus increase the acceptance of AI in areas such as autonomous driving or industrial production," says Prof. Dr. Philipp Slusallek, ENGAGE Project Manager and Scientific Director at DFKI Saarbrücken.
ENGAGE is a project within the framework of the strategic research and innovation agenda of DFKI and Inria in the field of artificial intelligence. The two institutions signed a Memorandum of Understanding on 22 January 2020, the first anniversary of the Aachen Treaty, in which they agreed to significantly strengthen their collaboration in AI and to structure and formalize their long-standing scientific cooperation. Both are firmly anchored in the European research landscape and bring complementary expertise and experience in High-Performance Computing (HPC), Big Data, and AI, allowing them to focus directly on the project goals. The results of ENGAGE will in turn flow into other joint projects; a DFKI proposal on the related topic of large-scale AI is currently under review by German funding agencies. With Grid'5000, Inria operates a large-scale and flexible testbed for experimental research in all areas of computer science, with a focus on parallel and distributed computing, including cloud, HPC, Big Data, and AI.
The project's design as a Franco-German collaboration also makes it possible to use computing time on European resources such as the Jean-Zay supercomputer in France or the supercomputing infrastructure PRACE (Partnership for Advanced Computing in Europe). In addition, synergies are planned with the German-French initiative GAIA-X to define common requirements for the European data infrastructure.
To ensure that deep neural networks can be trained and used more quickly for new tasks in the future, the ENGAGE project team is working on three levels:
Neural networks need large amounts of data as training material to do their job. In many cases, such data is unavailable or available only in insufficient quantity. Particularly dangerous situations in road traffic, which an autonomous vehicle must recognize, do not occur frequently enough; for new types of machine parts in industrial production, whose service life is to be predicted with models, real data may not even exist at the time of market launch. To handle such tasks nevertheless, computer science resorts to synthetic data. Conventionally, the artificial data is generated first, and only then is the neural network trained with it. This two-phase procedure costs computing resources and time. In ENGAGE, the researchers instead want to train neural networks with a continuous and parallel data stream: generating training data on demand enables the spontaneous training of neural networks for unforeseen tasks.
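The difference between the two-phase procedure and an on-demand stream can be sketched in a few lines. This is a minimal illustration, not the project's actual framework: `generate_sample` stands in for a hypothetical simulator, and the model is a toy one-parameter regression trained by stochastic gradient descent directly on the stream.

```python
import random

def generate_sample():
    """Hypothetical simulator: emits one synthetic (input, label) pair
    on demand. Toy 1-D regression task: label = 2*x + small noise."""
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, 0.01)
    return x, y

def synthetic_stream(n_samples):
    """On-demand data stream: each sample is generated only when the
    training loop consumes it, so no pre-built data set is needed."""
    for _ in range(n_samples):
        yield generate_sample()

def train_on_stream(stream, lr=0.1):
    """Toy SGD loop for a single weight w in y ~ w*x, fed by the stream."""
    w = 0.0
    for x, y in stream:
        grad = 2.0 * (w * x - y) * x  # d/dw of the squared error (w*x - y)^2
        w -= lr * grad
    return w

random.seed(0)
w = train_on_stream(synthetic_stream(5000))
print(w)  # should approach the true slope 2.0
```

Because generation and training are interleaved, a new task only requires swapping the sample generator, not rebuilding a data set first.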
With AI algorithms that are specially and automatically adapted to the respective HPC environment and optimized at the hardware level, neural networks can be adapted or recomputed much more quickly. Beyond the faster creation of complex neural networks, the HPC infrastructure enables the parallel, and thus very fast, generation of large amounts of synthetic data for training and testing. Because synthetically generated training data can be varied freely, it makes neural networks, and the decisions based on them, more robust and more comprehensible in their respective fields of application.
The second goal in ENGAGE is to explore different deployment strategies for complex AI workflows on hybrid execution infrastructures, e.g., a combination of supercomputers, cloud, and edge systems. The main expected deliverable is a software framework for large-scale experiment deployment, monitoring, and execution on various relevant scalable infrastructures.
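The kind of placement decision such a framework must make can be sketched as a simple dispatcher. The thresholds and the `place` rule below are purely illustrative assumptions, not part of the ENGAGE deliverables:

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    gpu_hours: float        # estimated compute demand
    latency_critical: bool  # must the task run close to the data source?

def place(task):
    """Toy placement rule for a hybrid HPC/cloud/edge continuum.
    Real schedulers would also weigh data locality, cost, and queues."""
    if task.latency_critical:
        return "edge"           # keep latency-sensitive inference near the sensors
    if task.gpu_hours > 100:
        return "supercomputer"  # large training jobs go to the HPC system
    return "cloud"              # everything else runs on elastic cloud nodes

jobs = [
    Task("train_large_model", gpu_hours=5000, latency_critical=False),
    Task("onboard_inference", gpu_hours=0.1, latency_critical=True),
    Task("hyperparam_sweep", gpu_hours=20, latency_critical=False),
]
plan = {t.name: place(t) for t in jobs}
print(plan)
```

Even this crude rule shows why a unified framework is needed: each workflow step may land on a different tier of the infrastructure, and the framework must deploy and monitor all of them together.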
The third work package focuses on optimizing resource utilization for AI workflows by improving the exploitation of parallel computing operations. To this end, the team is developing a set of methodological and algorithmic tools for memory management and the efficient use of heterogeneous computing resources.
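A memory-aware assignment across heterogeneous devices can be illustrated with a greedy heuristic. This is a deliberately simple stand-in for the memory-management tools described above; the device names and capacities are invented for the example:

```python
def assign(tasks, devices):
    """Greedy assignment of memory demands (in GiB) to devices: place the
    largest demand first, always on the fitting device with the most free
    memory (a worst-fit heuristic)."""
    plan = {name: [] for name in devices}
    free = dict(devices)  # remaining memory per device
    for task, need in sorted(tasks.items(), key=lambda kv: -kv[1]):
        fitting = [d for d in free if free[d] >= need]
        if not fitting:
            raise MemoryError(f"no device can hold {task} ({need} GiB)")
        best = max(fitting, key=lambda d: free[d])
        plan[best].append(task)
        free[best] -= need
    return plan

devices = {"gpu0": 40, "gpu1": 24, "cpu": 128}
tasks = {"optimizer_state": 60, "activations": 30, "weights": 20}
plan = assign(tasks, devices)
print(plan)
```

Production-grade tools would additionally model data-transfer costs between devices and recompute-versus-store trade-offs, but the core resource-allocation question is the same.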
So that an HPC infrastructure can be used efficiently, including in conjunction with other hardware, the Franco-German project focuses on specialized tools: for data and model creation, for managing the model versions created and the data sets used to train them, and for virtualizing the concrete hardware infrastructure.
ENGAGE is a project at the interface of Big Data, HPC, and AI. The goal is to be able to address a hybrid computing infrastructure of high-performance, cloud, and edge computing in a flexible, intelligent, and automated way. By proposing relevant deployment and planning solutions, this project will be one of the first to advance the understanding of how to best leverage this computing continuum.