Despite astonishing progress in the field of Machine Learning (ML), the robustness of high-performance models, especially those based on Deep Learning, has fallen short of initial expectations. These networks do not generalize as well as expected and remain vulnerable to small adversarial perturbations (also known as adversarial attacks). Such shortcomings pose a critical obstacle to deploying Deep Learning models in safety-critical scenarios such as autonomous driving, medical imaging, and credit rating.
Moreover, the gap between performance and robustness underscores the severe lack of explainability in modern AI approaches: even experts cannot reliably explain the predictions of otherwise well-performing models.
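To make the notion of an adversarial perturbation concrete, the following is a minimal sketch of the Fast Gradient Sign Method (FGSM; Goodfellow et al., 2015) in PyTorch. The `model`, input batch `x` (pixel values assumed in [0, 1]), labels `y`, and the perturbation budget `epsilon` are illustrative assumptions, not part of this project's codebase.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft an adversarial example by taking one signed-gradient step
    in the direction that maximally increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Add a small, imperceptible perturbation and clip to the valid pixel range.
    x_adv = (x + epsilon * x.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()
```

Even a perturbation this simple and this small is often enough to flip the prediction of a state-of-the-art classifier, which is precisely the fragility described above.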
Hence, the goals of this project are threefold:
- Investigate methods of explainability and interpretability for existing AI approaches, focusing on Deep Neural Networks (a sketch of one such method follows this list).
- Develop novel architectures and training schemes that are more interpretable by design.
- Analyze the trade-offs between explainability, robustness, and performance.
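As an illustration of the kind of post-hoc explainability method the first goal refers to, here is a minimal sketch of vanilla gradient saliency (Simonyan et al., 2014) in PyTorch; the classifier `model`, single-image input `x`, and `target_class` are hypothetical placeholders rather than project deliverables.

```python
import torch

def gradient_saliency(model, x, target_class):
    """Vanilla gradient saliency: the magnitude of d(class score)/d(input)
    highlights the input pixels that most influence the prediction."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]  # assumes a single input with batch dim 0
    score.backward()
    # Per-pixel importance: maximum absolute gradient across colour channels.
    saliency = x.grad.abs().max(dim=1).values
    return saliency
```

Such gradient-based attributions are a natural baseline for the first goal and a reference point against which architectures that are interpretable by design (the second goal) can be compared.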