Abstract:
Recent years have witnessed remarkable breakthroughs across scientific and technological pursuits, including visual recognition and language understanding approaching human-level capabilities, as well as protein folding. These breakthroughs have been driven in part by the availability of data and computational resources at unprecedented scales, and by rapid progress in designing and optimizing highly complex learning algorithms.
Despite the impressive success of modern machine learning, the real-world applicability of these algorithms is hindered by two interrelated problems. First, a theoretical understanding of these complex methods is far from complete. Second, from a practical standpoint, these methods are brittle under adversarial examples and domain shifts, and often fail to satisfy quality attributes such as fairness and privacy. Many of these problems can be characterized in the framework of learning under extreme non-identifiability and constitute key challenges of contemporary research in machine learning. Unsurprisingly, the lack of a theoretical understanding of algorithms in this framework is particularly severe.
There are three key aspects any theory should consider in this framework:
1. Identification: The problem of learning from finite samples fundamentally suffers from the issue of non-identifiability. The key to obtaining any theoretical justification is to define a model (through assumptions based on prior knowledge) and seek a model-relative justification. This is even more crucial for learning under extreme non-identifiability, where the issue of non-identifiability persists even in the infinite sample limit.
2. Estimation: Once the parameters or quantities of interest are precisely identified, the problem of learning reduces to a problem of statistical estimation and can admit theoretical guarantees, such as consistency or rates of convergence (a minimal illustrative sketch of these two notions follows this list).
3. Computation: Computation is the other key aspect of learning. While statistics aims to provide guarantees under constraints on the sample size, the computational perspective seeks guarantees under constraints on the available computation.
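To make the distinction between the first two aspects concrete, the following minimal LaTeX sketch states the two textbook notions in generic notation (a parameter \theta, its induced distribution P_\theta, and an estimator \hat{\theta}_n; this is standard notation for illustration, not notation taken from the thesis): identification asks whether the parameter is pinned down by the population distribution, while estimation asks whether it can be recovered from finite samples.

```latex
% Identifiability: distinct parameters induce distinct data distributions,
% so the parameter is in principle recoverable in the infinite-sample limit.
\[
  \theta_1 \neq \theta_2 \;\Longrightarrow\; P_{\theta_1} \neq P_{\theta_2}.
\]
% Consistency of an estimator \hat{\theta}_n computed from n samples,
% often refined to a rate of convergence such as O_P(n^{-1/2}):
\[
  \hat{\theta}_n \xrightarrow{\;P\;} \theta^{\ast} \quad \text{as } n \to \infty.
\]
```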
In this thesis, we contribute to the theoretical foundations of the problem of learning under extreme non-identifiability with emphasis (to varying degrees) on each of the three aspects. In particular, we consider the problems of statistical clustering and causal learning and make the following contributions.
1. Theory of kernel clustering: We provide recovery guarantees for kernel-based clustering under parametric and non-parametric assumptions on the data-generating process. We study phase transitions in the problem of high-dimensional Gaussian clustering and show that kernel-based clustering algorithms can be information-theoretically optimal within their respective computational classes (a minimal illustrative sketch of kernel-based clustering follows this list).
2. Theory of causal learning: We introduce the framework of causal learning theory for forecasting and provide finite-sample uniform convergence guarantees for the causal risk of the class of vector autoregressive models. Under a linear causal model with potential latent confounders, we study the problem of causal generalization through the lens of interpolation and regularization (a minimal sketch of the vector autoregressive model class also follows this list).
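As a point of reference for what kernel-based clustering refers to, the following is a minimal sketch of kernel k-means with a Gaussian (RBF) kernel on a toy Gaussian mixture. It is a generic textbook algorithm, not the specific estimators or data-generating assumptions analyzed in the thesis; all function names, parameters, and the toy data are illustrative.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def kernel_kmeans(K, n_clusters, n_iter=100, seed=0):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, n_clusters))
        for c in range(n_clusters):
            mask = labels == c
            size = max(mask.sum(), 1)
            # Squared distance to the cluster mean in feature space:
            # K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size**2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy example: two well-separated Gaussian blobs in 5 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])
labels = kernel_kmeans(rbf_kernel(X, gamma=0.1), n_clusters=2)
```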
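Similarly, the second contribution concerns vector autoregressive (VAR) models. The causal risk studied in the thesis is a specific quantity defined there; the sketch below, offered only as a hedged illustration of the model class, shows ordinary least-squares fitting of a VAR(1) model and its empirical one-step-ahead forecast risk on simulated data. The function names and the toy process are illustrative assumptions, not the thesis's definitions.

```python
import numpy as np

def fit_var1(X):
    """Least-squares estimate of A in the VAR(1) model x_t = A x_{t-1} + noise."""
    past, future = X[:-1], X[1:]
    B, *_ = np.linalg.lstsq(past, future, rcond=None)  # future ≈ past @ B
    return B.T  # so that x_t ≈ A @ x_{t-1}

def empirical_forecast_risk(A, X):
    """Average squared one-step-ahead prediction error of the fitted model."""
    preds = X[:-1] @ A.T
    return float(np.mean(np.sum((X[1:] - preds) ** 2, axis=1)))

# Toy example: simulate a stable 3-dimensional VAR(1) process.
rng = np.random.default_rng(0)
A_true = 0.5 * np.eye(3) + 0.1 * rng.standard_normal((3, 3))
X = np.zeros((500, 3))
for t in range(1, 500):
    X[t] = A_true @ X[t - 1] + rng.standard_normal(3)

A_hat = fit_var1(X)
print(empirical_forecast_risk(A_hat, X))
```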