Information Theory, Robust Uncertainty Quantification and Predictive Guarantees.
Mathematics & Statistics, UMass Amherst
Time: September 30, 2019 @ 1:30 PM to 2:30 PM
Location: Ewing Hall, Room 336
We discuss connections between information theory, statistical learning, uncertainty quantification and predictive modeling, and in particular how to systematically select probabilistic metrics for enhanced learning and prediction. In machine learning and uncertainty quantification, as well as in model selection, reduction and approximate inference, we typically use a variety of probability metrics and information divergences, e.g. the Wasserstein, Kullback-Leibler (KL), Rényi, χ^2 or Hellinger metrics. Although some choices are natural, e.g. the relation between the KL divergence and Maximum Likelihood, the selection of a probability metric often appears arbitrary or is justified only a posteriori, based on the success of the final goal. To address these questions we focus instead on the impact of probability metrics on the tasks at hand, e.g. on predicting given observables or carrying out designated statistical learning tasks such as coarse-graining. This perspective requires relating probability metrics and divergences to observables. Here we discuss some recently derived information inequalities that clarify and classify the connection between metrics and the tasks to be performed. For example, the KL divergence (the average of the log-likelihood ratio between probabilities) leads to tight and computable information inequalities that control “typical” observables, e.g. expected values and variances. The family of Rényi divergences (related to the cumulant generating function of the log-likelihood ratio) allows for information inequalities for rare events and related risk-sensitive observables. In the context of sensitivity analysis, the Fisher Information (the covariance of the score function) controls sensitivities of expected values, while the cumulant generating function of the score controls the sensitivity of rare events. All these metrics can be used in conjunction with concentration inequalities for easier implementation or to account for finite data.
Finally, we demonstrate these methods on complex, high-dimensional reaction networks and on graphical models for the multiscale modeling of energy storage devices.
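
As a rough illustration of how a KL divergence can control an expected value, the sketch below checks one standard inequality that follows from the Donsker-Varadhan variational formula: for any c > 0, E_Q[f] - E_P[f] <= (1/c) * ( log E_P[exp(c (f - E_P[f]))] + KL(Q||P) ). This is not necessarily the exact form of the inequalities presented in the talk. The distributions p and q, the observable f, and the function names are hypothetical and chosen only for the example.

```python
import numpy as np

def kl_divergence(q, p):
    """KL(Q||P) for two discrete distributions on the same finite support."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0  # terms with q_i = 0 contribute nothing
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def kl_upper_bound(f, p, kl, c_grid):
    """Upper bound on E_Q[f] - E_P[f] that uses only P and KL(Q||P).

    For each c > 0 the quantity (cgf(c) + kl) / c is a valid bound, where
    cgf is the cumulant generating function of the centered observable
    under P. Taking the minimum over a grid of c values keeps it valid.
    """
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    centered = f - np.dot(p, f)
    cgf = np.array([np.log(np.dot(p, np.exp(c * centered))) for c in c_grid])
    return float(np.min((cgf + kl) / c_grid))

# Hypothetical baseline model P, alternative model Q, and observable f.
p = np.array([0.4, 0.3, 0.2, 0.1])
q = np.array([0.3, 0.3, 0.2, 0.2])
f = np.array([0.0, 1.0, 2.0, 3.0])

kl = kl_divergence(q, p)
bias = float(np.dot(q, f) - np.dot(p, f))  # true model bias E_Q[f] - E_P[f]
bound = kl_upper_bound(f, p, kl, np.linspace(0.01, 5.0, 500))

print(f"KL(Q||P) = {kl:.4f}, bias = {bias:.4f}, bound = {bound:.4f}")
```

The bound depends on Q only through KL(Q||P), so it holds simultaneously for every alternative model within the same KL distance of the baseline P. The grid search over c is a simplification; a one-dimensional optimizer could be used instead.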