## January 24, 2018, 12:00pm – 1:00pm

*Special Guest Lecture: Data Assimilation for High Dimensional Systems: Role of Unstable Subspace*

**Location:** SAMSI Classroom

**Speaker:** *Amit Apte*, International Centre for Theoretical Sciences

## Abstract

Nonlinear filtering problems for estimating the state of a high-dimensional chaotic system given noisy, partial observations of the system are widely known as data assimilation in the context of earth sciences. The main object of interest in these problems is the conditional distribution, called the posterior, of the state conditioned on the observations. The characteristics of the dynamics of the system, in particular the unstable subspace, play a crucial role in determining the asymptotic-in-time properties of this posterior, as discussed extensively by Anna Trevisan and collaborators when introducing a method known as assimilation in the unstable subspace (AUS). This talk will focus on our recent work related to the convergence of the Kalman filter covariance matrix onto the unstable-neutral subspace for a linear, deterministic dynamical system with a linear observation operator. I will also discuss implications for a widely used method known as the ensemble Kalman filter (EnKF).

Joint work with Marc Bocquet, Karthik Gurumoorthy, Alberto Carrassi, Colin Grudzien, and Chris Jones.
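
The convergence result mentioned in the abstract can be illustrated with a minimal sketch. It is not the speakers' code: the two-dimensional system, the fully observed state, the unit error covariances, and the function name `kalman_covariance_cycle` are all assumptions chosen for illustration. With deterministic linear dynamics (no model noise), the covariance in the stable direction decays to zero while the covariance in the unstable direction settles at a positive value.

```python
import numpy as np


def kalman_covariance_cycle(P, M, H, R):
    """One forecast/analysis cycle of the Kalman filter covariance, no model noise."""
    P_f = M @ P @ M.T                         # forecast step, deterministic dynamics
    S = H @ P_f @ H.T + R                     # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)          # Kalman gain
    return (np.eye(P.shape[0]) - K @ H) @ P_f  # analysis covariance


# Hypothetical system: one unstable direction (growth 1.5), one stable direction (0.5).
M = np.diag([1.5, 0.5])
H = np.eye(2)   # assumption: the full state is observed
R = np.eye(2)   # assumption: unit observation-error covariance
P = np.eye(2)   # assumption: unit initial covariance

for _ in range(200):
    P = kalman_covariance_cycle(P, M, H, R)

unstable_var, stable_var = P[0, 0], P[1, 1]
print("unstable direction:", unstable_var)
print("stable direction:  ", stable_var)
```

In this decoupled example the unstable variance approaches the fixed point 1.25/2.25 of the scalar recursion p → 2.25p/(2.25p + 1), and the stable variance shrinks by at least a factor of four per cycle.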

## References

No references provided at this time

## January 31, 2018, 1:15pm – 2:15pm

*Lecture: Sum-of-Squares Optimization Without Semi-definite Programming*

**Location:** SAMSI Classroom

**Speaker:** Sercan Yildiz, Second-Year SAMSI Postdoctoral Fellow and Researcher at the University of North Carolina at Chapel Hill

## Abstract

Sum-of-squares polynomials are polynomials which can be expressed as a finite sum of squared polynomials. They are instrumental in polynomial optimization and the related problem of deciding polynomial nonnegativity. These problems are of fundamental importance in many mathematical areas, including probability theory, control theory, power systems engineering, design of experiments, and statistical estimation. In this talk, we propose a homogeneous primal-dual interior-point method for optimization over sum-of-squares polynomials, combining techniques for non-symmetric conic optimization and polynomial interpolation. Our approach optimizes directly over sum-of-squares polynomials, circumventing the semidefinite programming (SDP) reformulation which requires a large number of auxiliary variables. As a result, it has substantially lower theoretical time and space complexity than the conventional SDP-based approach. Computational results confirm that, for problems that involve high-degree polynomials, the proposed method is several orders of magnitude faster than the SDP-based approach.
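
For readers unfamiliar with the conventional formulation, the sketch below shows the Gram-matrix characterization that underlies the SDP reformulation: a polynomial p is a sum of squares if p(x) = z(x)ᵀ Q z(x) for some positive semidefinite matrix Q, where z(x) is a vector of monomials. This is not the interior-point method proposed in the talk. The example polynomial, the matrix `Q`, and the helper `gram_value` are assumptions chosen for illustration.

```python
import numpy as np

# Example polynomial p(x) = x^4 - 2x^3 + 3x^2 - 2x + 1 = (x^2 - x + 1)^2,
# coefficients listed from highest degree to lowest for np.polyval.
coeffs = [1.0, -2.0, 3.0, -2.0, 1.0]

# Candidate Gram matrix in the monomial basis z(x) = [1, x, x^2].
# Here Q = v v^T with v = [1, -1, 1], so z^T Q z = (1 - x + x^2)^2.
Q = np.array([[1.0, -1.0, 1.0],
              [-1.0, 1.0, -1.0],
              [1.0, -1.0, 1.0]])


def gram_value(Q, x):
    """Evaluate z(x)^T Q z(x) for the monomial vector z(x) = [1, x, x^2]."""
    z = np.array([1.0, x, x ** 2])
    return z @ Q @ z


# Check 1: the quadratic form reproduces p at a set of sample points.
xs = np.linspace(-3.0, 3.0, 13)
max_mismatch = max(abs(gram_value(Q, x) - np.polyval(coeffs, x)) for x in xs)

# Check 2: Q is positive semidefinite (up to round-off).
min_eig = np.linalg.eigvalsh(Q).min()

is_sos_certificate = bool(max_mismatch < 1e-9 and min_eig > -1e-9)
print("valid sum-of-squares certificate:", is_sos_certificate)
```

In the SDP-based approach the entries of Q are the auxiliary variables, and their number grows quickly with the degree and the number of variables, which is the cost the abstract refers to.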

## References

- D. Papp and S. Yildiz. Sum-of-Squares Optimization Without Semidefinite Programming. Available at https://arxiv.org/abs/1712.01792.
- D. Papp and S. Yildiz. On “A Homogeneous Interior-Point Algorithm for Non-Symmetric Convex Conic Optimization”. Available at https://arxiv.org/abs/1712.00492.

## February 7, 2018, 1:15pm – 2:15pm

*Special Guest Lecture: Bayesian Modeling of Non-stationary Spatial Processes via Domain Partitioning*

**Location:** SAMSI Classroom

**Speaker:** *Veronica Berrocal*, Associate Professor of Biostatistics, University of Michigan

## Abstract

A key component of statistical models for spatial processes is the spatial covariance function, which is traditionally assumed to belong to a parametric class of stationary models whose parameters are estimated using observed data. While convenient, the assumption of stationarity is often not realistic. A rich literature on alternative strategies to model non-stationary spatial processes has been proposed.

In this talk, we will discuss two Bayesian statistical approaches to model non-stationary spatial processes where the common assumption is that the process is globally non-stationary but locally stationary. In the first model, the spatial dependence of the process is assumed to be captured locally by a parametric, stationary covariance function whose parameters vary across subregions of local stationarity. The definition of these subregions is informed by covariate processes. To account for uncertainty in the spatial partitioning of the domain into regions of local stationarity, we use Bayesian Model Averaging. In the second model, the non-stationarity of the spatial process is due to inhomogeneities in the decay of the spatial correlation. To identify such regions of varying spatial correlation decay, we expand the Multi-Resolution Approximation (M-RA) method of Katzfuss (JASA 2017) and define a non-stationary model by providing the basis function coefficients with a mixture shrinkage prior.

We illustrate the two models through applications in soil science and air pollution.

## References

No references provided at this time

## February 14, 2018, 1:15pm – 2:15pm

*Lecture: Convergence of Representation Learning and Network Analysis*

**Location:** SAMSI Classroom

**Speaker:** Peter Diao, Second-Year SAMSI Postdoctoral Fellow and Researcher at the University of North Carolina at Chapel Hill

## Abstract

This talk will report on an early-stage exploration into the use of network analysis techniques for understanding the convergence of representation learning approaches in machine learning. We follow an approach from systems neuroscience called representational similarity analysis and apply it to representations learned in artificial neural networks. The network science we use will be based on our previous work on graph limits and the cutnorm metric. Much of the talk will go over basic ideas in neural networks and representation learning.

## References

- Bengio, Yoshua, Aaron Courville, and Pascal Vincent. “Representation learning: A review and new perspectives.” IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.
- Kriegeskorte, Nikolaus, Marieke Mur, and Peter A. Bandettini. “Representational similarity analysis-connecting the branches of systems neuroscience.” Frontiers in systems neuroscience 2 (2008): 4.
- Python Software: https://pypi.python.org/pypi/cutnorm/0.1.5

## February 21, 2018, 1:15pm – 2:15pm

*Lecture: Probabilistic Templates for Astronomical Lightcurves*

**Location:** SAMSI Classroom

**Speaker:** David Jones, Second-Year SAMSI Postdoctoral Fellow and Researcher at the University of North Carolina at Chapel Hill

## Abstract

The intensity of many of the most interesting astronomical sources (e.g., RR Lyrae stars) varies periodically as a function of time, producing a “lightcurve” that can be used to classify the type of source. Since telescope time is limited, real-time source classification involves a number of decisions, including carefully selecting which sources to observe, the instrument(s) to observe them with, and the future time points at which to observe them.

In this talk we introduce a Bayesian non-parametric hierarchical lightcurve model and use it to construct probabilistic templates for the lightcurve shapes characteristic of each source class. The next step in our ongoing work is to use our probabilistic templates to perform soft source classification and to find the optimal times at which new observations should be collected in order to improve classification. We will also discuss the construction of probabilistic templates for lightcurve classes which are not completely homogeneous but rather vary in a systematic way according to some physical parameter or covariate. We illustrate our ideas using lightcurves from the Catalina Real-Time Transient Survey.

## References

No references provided at this time

## February 28, 2018, 1:15pm – 2:15pm

*Lecture: How Proper are Bayesian models in Astronomical Literature?*

**Location:** SAMSI Classroom

**Speaker:** Hyungsuk Tak, SAMSI Postdoctoral Fellow

## Abstract

The well-known Bayes’ theorem assumes that a posterior distribution is a probability distribution. However, the posterior distribution may no longer be a probability distribution if an improper prior distribution (non-probability measure) such as an unbounded uniform prior is used. Improper priors are often used in the astronomical literature to reflect a lack of prior knowledge, but checking whether the resulting posterior is a probability distribution is sometimes neglected. It turns out that 24 of 75 articles (32%) published online in two renowned astronomy journals (ApJ and MNRAS) between Jan 1, 2017 and Oct 15, 2017 make use of Bayesian analyses without rigorously establishing posterior propriety. A disturbing aspect is that a Gibbs-type Markov chain Monte Carlo (MCMC) method can produce a seemingly reasonable posterior sample even when the posterior is not a probability distribution (Hobert and Casella, 1996). In such cases, researchers may erroneously make probabilistic inferences without noticing that the MCMC sample is from a non-existent probability distribution. We review why checking posterior propriety is fundamental in Bayesian analyses when improper priors are used and discuss how we can set up scientifically motivated proper priors to avoid the pitfalls of using improper priors.
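
A small textbook-style example, not taken from the talk, shows how posterior propriety can fail. Assume a binomial likelihood with $y$ successes in $n$ trials and the improper Haldane prior on the success probability $p$; both choices are assumptions made only for illustration.

```latex
\pi(p) \propto p^{-1}(1-p)^{-1}, \qquad
L(p \mid y) \propto p^{y}(1-p)^{n-y}, \qquad 0 < p < 1,
\\[4pt]
\pi(p \mid y) \propto p^{\,y-1}(1-p)^{\,n-y-1}.
\\[4pt]
\text{If } y = n:\quad
\int_0^1 p^{\,n-1}(1-p)^{-1}\,dp = \infty ,
```

so the posterior cannot be normalized when every trial is a success (and, by symmetry, when every trial is a failure). It is a proper Beta$(y, n-y)$ distribution only when $0 < y < n$.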

## References

- Hobert, J. P. and Casella, G. (1996). “The Effect of Improper Priors on Gibbs Sampling in Hierarchical Linear Mixed Models.” *Journal of the American Statistical Association*, 91(436):1461–1473.

## March 7, 2018, 1:15pm – 2:15pm

*Lecture: Estimating Extreme Storm Surge Levels: A Statistical Perspective*

**Location:** SAMSI Classroom

**Speaker:** Whitney Huang, SAMSI Postdoctoral Fellow

## Abstract

Storm surge is an abnormal rise of water, largely induced by the strong winds of a hurricane, that can cause tremendous damage in coastal areas. It is therefore critically important to estimate surge levels, especially extreme ones. However, the estimation of surge levels poses a unique statistical challenge due to the rarity of hurricanes in space and time. To overcome this difficulty, joint probability modeling of hurricane characteristics combined with hydrodynamic simulations is currently the method recommended by the Federal Emergency Management Agency (FEMA) for calculating extreme surges in terms of 10-, 50-, 100-, and 500-year return levels.

In this talk, I will present FEMA’s approach from a statistical perspective, starting from the estimation of the distributions of hurricane characteristics and continuing to the design and analysis of the hydrodynamic simulations. I will highlight the challenges and how we might improve the current practice in terms of estimation and uncertainty quantification.
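
To make the term "T-year return level" concrete, the sketch below computes return levels from a generalized extreme value (GEV) distribution for annual maxima: the T-year level is the quantile exceeded with probability 1/T in any given year. This is not FEMA's joint probability method. The GEV form, the parameter values, and the function names are assumptions chosen only to illustrate the definition.

```python
import math


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function for xi != 0, valid where 1 + xi*(z - mu)/sigma > 0."""
    t = 1.0 + xi * (z - mu) / sigma
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded with probability 1/T per year, for xi != 0."""
    y = -math.log(1.0 - 1.0 / T)
    return mu + sigma / xi * (y ** (-xi) - 1.0)


# Hypothetical GEV parameters for annual maximum surge, in meters.
mu, sigma, xi = 1.0, 0.5, 0.1

return_periods = [10, 50, 100, 500]
levels = [gev_return_level(T, mu, sigma, xi) for T in return_periods]

for T, z in zip(return_periods, levels):
    print(f"{T:>3}-year return level: {z:.2f} m")
```

Evaluating `gev_cdf` at each computed level recovers 1 − 1/T, which is a quick consistency check on the quantile formula.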

## References

No references provided at this time

## March 14, 2018, 1:15pm – 2:15pm

*Lecture: Inference on the Future State of the Climate Through Combining Multiple Interdependent Climate Model Outputs with Observations using Bayesian Hierarchical Models*

**Location:** SAMSI Classroom

**Speaker:** Huang Huang, SAMSI Postdoctoral Fellow

## Abstract

Climate scientists have developed many climate models, based on physical dynamics, for variables of interest such as temperature and pressure. Because the models are implemented with different techniques and the climate system itself is uncertain, different models produce different values for the same variable. Each output can be treated as a representation of the climate system. Many statisticians have worked on statistical models that combine different climate model outputs, but most of these assume that all the outputs are exchangeable. In fact, many climate models have similar origins or share common components, which leads to dependence among the models. In this work, we present a Bayesian hierarchical model that accounts for this model dependence and supports inference on the underlying process for the variable of interest. In addition, we use a spatial Gaussian random field to allow for spatial correlation in the modeling, which yields a map for inference on the future state of the climate.

## References

No references provided at this time

## March 15, 2018, 11:00am – 12:00pm

*Special Guest Lecture: Adaptive covariance inflation in the EnKF by Gaussian scale mixtures*

**Location:** SAMSI Classroom

**Speaker:** *Patrick N. Raanes*, Postdoctoral Fellow at Nansen Environmental and Remote Sensing Center

## Abstract

We study inflation: the complementary scaling of the state covariance in the ensemble Kalman filter (EnKF). Firstly, error sources in the EnKF are catalogued and discussed in relation to inflation; nonlinearity is given particular attention. Then, the “finite-size” refinement known as the EnKF-N is shown to be a Gaussian scale mixture, again demonstrating its connection to inflation. Existing methods for adaptive inflation estimation are reviewed. One such method is selected to complement the EnKF-N to make a hybrid that is suitable for contexts with model error. Benchmarks are obtained from experiments with the two-scale Lorenz model where only the slow scale is resolved. The proposed hybrid EnKF-N method of adaptive inflation is found to yield systematic accuracy improvements in comparison with the existing methods, albeit to a moderate degree.
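
As background for the term "inflation", the sketch below applies fixed multiplicative inflation to an ensemble: anomalies about the ensemble mean are scaled by a factor, which scales the sample covariance by the square of that factor and leaves the mean unchanged. This is not the EnKF-N or the adaptive hybrid described in the abstract. The ensemble size, the inflation factor, and the function name `inflate` are assumptions for illustration.

```python
import numpy as np


def inflate(ensemble, factor):
    """Scale anomalies about the ensemble mean by `factor` (rows are members)."""
    mean = ensemble.mean(axis=0, keepdims=True)
    return mean + factor * (ensemble - mean)


rng = np.random.default_rng(0)
ensemble = rng.normal(size=(20, 3))   # hypothetical: 20 members, 3 state variables
factor = 1.05                         # hypothetical fixed inflation factor

inflated = inflate(ensemble, factor)

cov_before = np.cov(ensemble, rowvar=False)
cov_after = np.cov(inflated, rowvar=False)

# The sample covariance is scaled by factor**2; the mean is unchanged.
print(np.allclose(cov_after, factor ** 2 * cov_before))
print(np.allclose(inflated.mean(axis=0), ensemble.mean(axis=0)))
```

Adaptive schemes, such as those reviewed in the talk, estimate the factor from the data instead of fixing it in advance.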

## References

No references provided at this time

## April 4, 2018, 11:00am – 12:00pm

*Special Guest Lecture: Cooking the GOOS: Our increasingly sophisticated network of ocean observations and its use in the detection of climate change*

**Location:** SAMSI Classroom

**Speaker:** *Frederick M. Bingham*, Professor of Physics, University of North Carolina Wilmington

## Abstract

The ocean is difficult to observe. Past attempts to do so have been ad hoc and fraught with sampling bias that made it difficult to reliably detect long-term changes. Fortunately, over the past decade or two, ocean observing systems have developed to remedy this gaping hole in our knowledge of the climate system. In this talk I will review current systems to observe ocean temperature, salinity, mass, sea level, surface productivity, and surface winds and currents. I will also discuss some of the important results that have emerged over the past few years and the new directions in which the global ocean observing system is headed.

## References

No references provided at this time

## April 4, 2018, 1:15pm – 2:15pm

*Lecture: Multivariate Spectral Downscaling for Multiple Air Pollutants*

**Location:** SAMSI Classroom

**Speaker:** Yawen Guan, SAMSI Postdoctoral Fellow

## Abstract

Fine particulate matter (PM2.5) is a mixture of air pollutants that, at high concentrations, has adverse effects on human health. The components of speciated fine PM have complex spatial-temporal and cross-dependence structures that should be accounted for in estimating the spatial-temporal distribution of each component. Two major sources of air quality data are used: monitoring data and the Community Multiscale Air Quality (CMAQ) model. The monitoring stations provide fairly accurate measurements of the pollutants; however, they are sparse in space and take measurements at a coarse time resolution, typically 1-in-3 or 1-in-6 days. On the other hand, the CMAQ model provides daily concentration levels of each component with complete spatial coverage on a grid; these model outputs, however, need to be evaluated and calibrated to the monitoring data.

In this talk, I will provide a brief introduction to the data and present a statistical method to combine these two data sources for estimating speciated PM2.5 concentration. Our method models the complex relationships between monitoring data and CMAQ output at different spatial resolutions, as well as the spatial dependence and cross dependence among the components of speciated PM2.5. We apply the method to compare CMAQ model output with speciated PM2.5 measurements in the United States in 2011.

## References

No references provided at this time

## April 18, 2018, 11:00am – 12:00pm

*Lecture: From Numerical Integration on Smooth Manifolds to Sampling in a Big Data Context*

**Location:** SAMSI Classroom

**Speaker:** Matthias Sachs, SAMSI Postdoctoral Fellow

## Abstract

In this talk I will present results from two projects which I have been working on during my time at SAMSI.

(i) Support point construction via heat kernel repulsion: I discuss the classical problem of how to pick N weighted points on a d-dimensional manifold so as to obtain a reasonable quadrature rule. This is joint work with Jianfeng Lu and Stefan Steinerberger.

(ii) Hypercoercivity estimates for adaptive Langevin dynamics (AdLD): In recent years adaptive Langevin dynamics has become a popular method for scalable Bayesian posterior sampling, where it is typically used in combination with sub-sampling of the data to reduce the computational cost per gradient evaluation. While AdLD has proven to be a reliable sampling method in many practical applications, the ergodic properties of the underlying SDE dynamics are only poorly understood. In this talk I will present novel results on the convergence properties of AdLD which in particular allow the derivation of a central limit theorem for AdLD. This is joint work with Gabriel Stoltz and Benedict Leimkuhler.

## References

No references provided at this time

## April 18, 2018, 1:15pm – 2:15pm

*Lecture: Nonsubsampled Graph Filter Banks and Distributed Implementation*

**Location:** SAMSI Classroom

**Speaker:** Cheng Cheng, SAMSI Postdoctoral Fellow

## Abstract

Graph signal processing provides an innovative framework to process data on graphs. A proper definition of the down-sampling and up-sampling procedures is not obvious, especially when the underlying graph is of large order and has a complicated topological structure. In this talk, I consider nonsubsampled graph filter banks (NSGFBs), which do not include down-sampling and up-sampling procedures, to process data on a graph in a distributed manner. For an NSGFB on a graph of large order, a distributed implementation has significant advantages, since data processing and communication demands for the agent at each vertex depend mainly on the topological structure of its small neighborhood. I will introduce an iterative distributed algorithm to implement the proposed NSGFBs. Based on NSGFBs, we also develop a distributed denoising technique which is demonstrated to have satisfactory performance on noise suppression.

## References

No references provided at this time

## April 25, 2018, 1:15pm – 2:15pm

*Lecture: Data Driven Explorations Aimed at the Improvement of Sea Ice Modeling*

**Location:** SAMSI Classroom

**Speaker:** Christian Sampson, SAMSI Postdoctoral Fellow

## Abstract

In this talk I will highlight some recent work investigating novel data sets and analysis techniques for sea ice at several scales, with the aim of improving sea ice modeling at those scales. I will discuss classifying the complexity of sea ice micro-structure and its relation to transport properties through the porous micro-structure, the evolution of melt ponds and sea ice topography, and large-scale ice deformation and cracking, along with metrics for the comparison between models and satellite data.

## References

No references provided at this time

## May 2, 2018, 11:00am – 12:00pm

*Lecture: Multisensor Fusion of Remotely Sensed Vegetation Indices using Space-Time Dynamic Linear Models*

**Location:** SAMSI Classroom

**Speaker:** Maggie Johnson, SAMSI Postdoctoral Fellow

## Abstract

Characterizing growth cycle events in vegetation, such as spring green-up, from massive spatiotemporal remotely sensed vegetation index datasets is desirable for a wide range of applications. For example, the timings of plant life cycle events are very sensitive to weather conditions, and are often used to assess the impacts of changes in weather and climate. Likewise, quantifying and predicting changes in crop greenness can have a large impact on agricultural strategies. However, due to the current limitations of imaging spectrometers, remote sensing datasets of vegetation with high temporal frequency of measurements have lower spatial resolution, and vice versa. In this research, we propose a space-time dynamic linear model to fuse high temporal frequency data (MODIS) with high spatial resolution data (Landsat) to create daily, 30-meter resolution data products of a vegetation greenness index. The method models spatiotemporal dependence within and across different landcover types with a multivariate Matérn latent process and is able to handle the spatial change-of-support problem, as well as the high percentage of missing values present in the data. To handle the massive size of the data, we utilize a fast variogram/crossvariogram estimation procedure and a moving window Kalman smoother to produce a daily, 30-meter resolution product with associated uncertainty.

## References

No references provided at this time

## May 2, 2018, 1:15pm – 2:15pm

*Lecture: New Statistical Methods for Ocean Heat Content Estimation with Argo Profiling Floats*

**Location:** SAMSI Classroom

**Speaker:** Mikael Kuusela, SAMSI Postdoctoral Fellow

## Abstract

Over 90% of the net energy increase of the Earth’s climate system is stored as heat energy in the oceans. The Argo array of profiling floats is uniquely capable of measuring these changes in the ocean heat content. A number of previous Argo-based analyses have established robust warming of the global ocean, but the observed rates vary greatly between the different analyses. We study the sensitivity of the Argo ocean heat content estimates to the underlying statistical assumptions and show preliminary results that indicate that careful statistical modeling is needed in order to avoid biases in the heat content estimates.

## References

No references provided at this time