Out-of-Distribution Detection, Sharpness, and Unlearning: Advancing Robust and Trustworthy Deep Learning


Citable link (URI): http://hdl.handle.net/10900/176056
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1760560
http://dx.doi.org/10.15496/publikation-117381
Document type: Dissertation
Date of publication: 2026-02-23
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Computer Science
Referee: Hein, Matthias (Prof. Dr.)
Date of oral examination: 2026-02-13
DDC classification: 004 - Computer Science
Keywords: Neural network, Robustness
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Deep neural networks have achieved outstanding performance across a variety of domains and tasks, often exceeding human-level performance. Given their widespread application and continued adoption in real-world scenarios, including safety-critical ones, an obvious requirement is that they should be robust and trustworthy. In this work, we investigate three perspectives on robustness and trustworthiness for deep learning systems.

First, we focus on out-of-distribution (OOD) detection, which addresses the known issue of classifiers making overly confident predictions on data that does not belong to any of their training classes. We demonstrate that commonly used OOD detection benchmarks for ImageNet-1k are flawed due to contamination with in-distribution objects, leading to incorrect results. As a solution, we introduce NINCO, a new OOD test dataset for ImageNet-1k that is manually verified to be free of in-distribution objects, enabling precise analysis of OOD detection failure modes. We observe that the popular Mahalanobis distance method, while state-of-the-art with some models, is brittle, a problem we link to violations of its underlying Gaussian assumptions. We introduce Mahalanobis++, a simple remedy based on l2-normalization that mitigates this brittleness and yields state-of-the-art OOD detection results on ImageNet-1k.

We then turn to the sharpness of the loss landscape, which can be seen as robustness in weight space, and its relation to generalization. Sharpness has long been hypothesized to be predictive of, or even causally responsible for, the generalization of neural networks. We first conduct an empirical study of several sharpness measures in a setting that goes beyond previously investigated setups, both in scale and in the kinds of models and measures considered. Overall, we find that sharpness cannot reliably predict generalization. We then turn to Sharpness-Aware Minimization (SAM), a method that aims to explicitly seek flat minima by modifying the optimization objective. We show that the generalization benefits of SAM can be achieved with a strongly modified objective that only considers the normalization layers of neural networks, which casts further doubt on the sharpness narrative.

Finally, we investigate unlearning in large language models (LLMs). After training, LLMs often contain knowledge or exhibit behaviour that should be removed from the model. Such removal is referred to as unlearning and is typically achieved via fine-tuning on targeted datasets. We show that existing unlearning evaluations are brittle and propose a unified evaluation protocol, along with the novel Lesser-Known-Facts (LKF) dataset, which resembles a realistic scenario in which lesser-known concepts acquired during pretraining should be unlearnt. We also propose unlearning via the Jensen-Shannon divergence (JensUn) and demonstrate that it leads to a better trade-off between unlearning quality and general model utility, and that it is more robust in settings where the unlearnt information is recovered by fine-tuning the model on unrelated data.
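To make the Mahalanobis-based detection concrete, the following is a minimal sketch of class-conditional Mahalanobis OOD scoring on l2-normalized features, in the spirit of the Mahalanobis++ remedy mentioned above. The function names, the exact placement of the normalization, and the covariance regularization are illustrative assumptions, not the thesis implementation.

```python
# Sketch: Mahalanobis OOD scoring with l2-normalized features (assumed setup).
import numpy as np

def fit_mahalanobis(features, labels, num_classes, eps=1e-6):
    """Fit per-class means and a shared (tied) covariance on l2-normalized features."""
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    means = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = feats - means[labels]                  # subtract each sample's class mean
    cov = centered.T @ centered / len(feats)          # shared covariance across classes
    precision = np.linalg.pinv(cov + eps * np.eye(cov.shape[0]))
    return means, precision

def ood_score(features, means, precision, eps=1e-6):
    """Negative minimum class-conditional Mahalanobis distance: higher means more in-distribution."""
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    diffs = feats[:, None, :] - means[None, :, :]     # shape (N, C, D)
    dists = np.einsum('ncd,de,nce->nc', diffs, precision, diffs)
    return -dists.min(axis=1)
```

In this sketch, the only change relative to the standard Mahalanobis detector is that features are l2-normalized before the Gaussian statistics are fit and before scoring, which is the kind of simple remedy the abstract describes.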
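Likewise, a minimal sketch of a SAM-style training step in which the adversarial weight perturbation is restricted to normalization-layer parameters, illustrating the "normalization layers only" idea discussed above. The perturbation radius rho, the parameter-name heuristic, and the loop structure are assumptions for illustration.

```python
# Sketch: one SAM-like step perturbing only normalization-layer parameters (assumed setup).
import torch

def sam_step_norm_only(model, loss_fn, batch, optimizer, rho=0.05):
    x, y = batch
    # Heuristic selection of normalization-layer parameters by name (assumption).
    norm_params = [p for n, p in model.named_parameters()
                   if ('norm' in n.lower() or 'bn' in n.lower()) and p.requires_grad]

    # 1) Gradient at the current weights, used to build the weight perturbation.
    loss_fn(model(x), y).backward()
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in norm_params]))
    eps = []
    with torch.no_grad():
        for p in norm_params:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)                       # ascend to the perturbed point
            eps.append(e)
    optimizer.zero_grad()

    # 2) Gradient at the perturbed weights; this gradient drives the actual update.
    loss = loss_fn(model(x), y)
    loss.backward()
    with torch.no_grad():
        for p, e in zip(norm_params, eps):
            p.sub_(e)                       # restore the original weights
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

The perturbation is computed and applied only for normalization-layer parameters, while the subsequent optimizer step updates the full model; whether this matches the thesis's exact variant is an assumption of this sketch.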
