## September 13, 2017, 1:15pm – 2:15pm

*Lecture: Detecting Planets in the Presence of Stellar Activity*

**Location:** SAMSI Classroom

**Speaker:** David Jones, Duke University

## Abstract

The radial velocity technique is one of the two main approaches for detecting planets outside our solar system, known in astronomy as exoplanets. When a planet orbits a star, it causes the star to move, and this motion induces a Doppler shift (i.e., the starlight appears redder or bluer than expected); it is this effect that the radial velocity method attempts to detect. Unfortunately, these Doppler signals are typically contaminated by various “stellar activity” phenomena, such as dark spots on the stellar surface. A principled approach to recovering planet Doppler signals in the presence of stellar activity was proposed by Rajpaul et al. (2015); it uses dependent Gaussian processes to jointly model the corrupted Doppler signal and multiple proxies for stellar activity.
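To first order in v/c, the Doppler shift described above relates a star's line-of-sight (radial) velocity v_r to the fractional change in wavelength by Δλ/λ ≈ v_r/c. The sketch below assumes this non-relativistic approximation and uses an illustrative function name; it only converts an observed line position into a radial velocity and is not part of the Rajpaul et al. (2015) model.

```python
# Non-relativistic Doppler relation: v_r ≈ c * (lambda_obs - lambda_rest) / lambda_rest.
# Positive v_r means a redshift (the star is moving away from the observer).
SPEED_OF_LIGHT_M_S = 299_792_458.0  # exact SI value


def radial_velocity(lambda_obs: float, lambda_rest: float) -> float:
    """Return the line-of-sight velocity in m/s implied by a wavelength shift."""
    return SPEED_OF_LIGHT_M_S * (lambda_obs - lambda_rest) / lambda_rest


if __name__ == "__main__":
    # A fractional shift of 1e-5 corresponds to roughly 3 km/s.
    print(radial_velocity(500.005, 500.0))
```

Inverting the relation shows why precise spectroscopy is needed: a radial velocity of 1 m/s corresponds to a fractional wavelength shift of only about 3.3 × 10⁻⁹.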

We build on this work in two ways: (i) we propose using dimension reduction techniques to construct more informative stellar activity proxies; (ii) we extend the Rajpaul et al. (2015) model to a larger class of models and use a model comparison procedure to select the best model for the particular stellar activity proxies at hand. Our approach yields substantially greater statistical power for planet detection than existing stellar activity models in the astronomy literature. Future work will move beyond our current class of models by making use of kernel-learning methods.

## References

No references provided at this time

## September 20, 2017, 1:15pm – 2:15pm

*Lecture: Disjunctive Cuts for Mixed-Integer Conic Programs*

**Location:** SAMSI Classroom

**Speaker:** Sercan Yildiz, SAMSI Postdoctoral Fellow

## Abstract

Mixed-integer linear programming (MILP) provides a powerful and versatile framework for optimization problems that require discrete decisions. However, many optimization problems of practical interest cannot be modeled with linear constraints alone. Mixed-integer conic programming (MICP) captures nonlinear relationships between the decision variables with conic constraints and thereby extends the representation power of MILP. Inspired by the practical success of cutting planes in MILP, in this talk we consider inequalities derived from two-term disjunctions on regular cones. These inequalities can be used to strengthen generic problem formulations in MICP. When the cone under consideration is the second-order cone or the positive semidefinite cone, we show that, under certain conditions, the convex hull of the disjunction admits a simple, tractable description in the original space. We also provide low-complexity convex relaxations that can be used as cuts when these conditions are not satisfied.
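To make the setting concrete: a two-term disjunction on a cone K is the set of points x ∈ K with c₁ᵀx ≥ c₁₀ or c₂ᵀx ≥ c₂₀. The classic MILP instance is the split disjunction x_j ≤ π₀ or x_j ≥ π₀ + 1 on an integer variable x_j. The sketch below, with hypothetical helper names, checks membership in the second-order cone {x : ‖(x₁, …, x_{n−1})‖₂ ≤ x_n} and in such a disjunction. It only illustrates the feasible set; it does not compute the convex hull or the cuts discussed in the talk.

```python
import numpy as np


def in_second_order_cone(x, tol=1e-9):
    """Membership in the Lorentz cone {x : ||x[:-1]||_2 <= x[-1]}."""
    x = np.asarray(x, dtype=float)
    return bool(np.linalg.norm(x[:-1]) <= x[-1] + tol)


def in_two_term_disjunction(x, c1, c10, c2, c20, tol=1e-9):
    """True if x is in the cone and (c1^T x >= c10 or c2^T x >= c20)."""
    x = np.asarray(x, dtype=float)
    term1 = np.dot(c1, x) >= c10 - tol
    term2 = np.dot(c2, x) >= c20 - tol
    return in_second_order_cone(x, tol) and bool(term1 or term2)


def split_disjunction(n, j, pi0):
    """Encode x_j <= pi0 OR x_j >= pi0 + 1 as two '>=' terms (hypothetical helper)."""
    e_j = np.zeros(n)
    e_j[j] = 1.0
    return -e_j, -float(pi0), e_j, float(pi0) + 1.0


if __name__ == "__main__":
    c1, c10, c2, c20 = split_disjunction(n=3, j=0, pi0=0)
    # (0.5, 0, 1) lies in the cone but has a fractional first coordinate,
    # so it violates the split disjunction and could be separated by a cut.
    print(in_two_term_disjunction([0.5, 0.0, 1.0], c1, c10, c2, c20))
    print(in_two_term_disjunction([1.0, 0.0, 2.0], c1, c10, c2, c20))
```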

## References

- Fatma Kilinc-Karzan and Sercan Yildiz. Two-term disjunctions on the second-order cone. Mathematical Programming, Ser. B, 154(1):463–491, 2015.
- Sercan Yildiz and Fatma Kilinc-Karzan. Low-complexity relaxations and convex hulls of disjunctions on the positive semidefinite cone and general regular cones. Optimization Online preprint (2016). http://www.optimization-online.org/DB_HTML/2016/04/5398.html.

## September 27, 2017, 1:15pm – 2:15pm

*Lecture: Two novel statistical methods from astro-statistics*

**Location:** SAMSI Classroom

**Speaker:** Hyungsuk Tak, SAMSI Postdoctoral Fellow

## Abstract

I introduce two statistical methods motivated by two astrophysical problems. The first is a new Markov chain Monte Carlo method for multimodal distributions, called the repelling-attracting Metropolis (RAM) algorithm, which maintains the simple-to-implement nature of the Metropolis algorithm but is more likely to jump between modes. The RAM algorithm is a Metropolis-Hastings algorithm whose proposal consists of a downhill move in density, which aims to make local modes repelling, followed by an uphill move in density, which aims to make local modes attracting. The downhill move is achieved via a reciprocal Metropolis ratio, so that the algorithm prefers downward movement. The uphill move does the opposite, using the standard Metropolis ratio, which prefers upward movement.
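A minimal one-dimensional sketch of a RAM iteration, assuming a symmetric Gaussian random-walk proposal. The downhill and uphill steps follow the description above. The auxiliary downhill draw z and the final acceptance ratio follow our reading of the Tak, Meng, and van Dyk paper: z is carried as part of the chain's state, and the ratio uses the stored z in the numerator and the newly drawn z in the denominator, which cancels the intractable normalizing constants of the forced moves. The bimodal target, `eps`, the step size, and the function names are illustrative assumptions; the paper remains the authoritative description.

```python
import numpy as np


def bimodal(x):
    """Unnormalized equal mixture of N(-3, 1) and N(3, 1) (illustrative target)."""
    return np.exp(-0.5 * (x - 3.0) ** 2) + np.exp(-0.5 * (x + 3.0) ** 2)


def ram_sampler(target, x0, n_iter, scale=1.0, eps=1e-308, seed=0):
    """Repelling-attracting Metropolis with a symmetric Gaussian random-walk proposal."""
    rng = np.random.default_rng(seed)

    def forced_move(start, p_start, downhill):
        # Propose repeatedly until a candidate is accepted. Downhill uses the
        # reciprocal Metropolis ratio (favours lower density); uphill the usual one.
        while True:
            cand = start + scale * rng.standard_normal()
            p_cand = target(cand)
            if downhill:
                ratio = (p_start + eps) / (p_cand + eps)
            else:
                ratio = (p_cand + eps) / (p_start + eps)
            if rng.uniform() < min(1.0, ratio):
                return cand, p_cand

    x = float(x0)
    px = target(x)
    # The auxiliary variable z is part of the chain's state; start it from q(. | x0).
    z = x + scale * rng.standard_normal()
    pz = target(z)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        x_down, p_down = forced_move(x, px, downhill=True)            # repel from current mode
        x_star, p_star = forced_move(x_down, p_down, downhill=False)  # attract toward a mode
        z_new, pz_new = forced_move(x_star, p_star, downhill=True)    # auxiliary downhill draw
        # Old z in the numerator, new z in the denominator.
        num = p_star * min(1.0, (px + eps) / (pz + eps))
        den = px * min(1.0, (p_star + eps) / (pz_new + eps))
        if rng.uniform() * den < num:  # accept with probability min(1, num / den)
            x, px, z, pz = x_star, p_star, z_new, pz_new
        samples[i] = x
    return samples


if __name__ == "__main__":
    draws = ram_sampler(bimodal, 0.0, 4000, scale=2.0, seed=1)
    print("fraction of draws in the left mode:", np.mean(draws < 0))
```

Because each forced move repeats proposals until one is accepted, a downhill step from a mode readily reaches the low-density region between modes, and the following uphill step can then land in either mode.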

The second is a mixture of Gaussian and Student’s t measurement errors for robust and accurate inference. A Gaussian error assumption, i.e., an assumption that the data are observed up to Gaussian noise, can bias parameter estimates in the presence of outliers. A heavy-tailed error assumption based on Student’s t-distribution helps reduce the bias, but it may be less efficient for parameter estimation if the heavy-tailed assumption is applied uniformly to data that are mostly well described by Gaussian errors. The proposed mixture error assumption selectively converts Gaussian errors into Student’s t errors according to latent outlier indicators, leveraging the best of both; parameter estimates become not only robust but also accurate.
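A minimal sketch of how a latent outlier indicator can be inferred for a single residual, assuming a known location, a common error scale `sigma`, a fixed prior outlier probability, and fixed degrees of freedom (all illustrative choices, not the paper's settings). Each residual is scored under both error distributions, and Bayes' rule gives the posterior probability that its indicator selects the Student's t error.

```python
import math


def normal_pdf(r, sigma):
    """Gaussian error density N(0, sigma^2) evaluated at residual r."""
    return math.exp(-0.5 * (r / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def student_t_pdf(r, sigma, df):
    """Student's t error density with df degrees of freedom and scale sigma."""
    log_c = (math.lgamma((df + 1.0) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi) - math.log(sigma))
    return math.exp(log_c - (df + 1.0) / 2.0 * math.log1p((r / sigma) ** 2 / df))


def outlier_probability(residual, sigma=1.0, df=4.0, prior_outlier=0.1):
    """Posterior P(indicator = 1 | residual), where indicator = 1 means a t error."""
    t_part = prior_outlier * student_t_pdf(residual, sigma, df)
    g_part = (1.0 - prior_outlier) * normal_pdf(residual, sigma)
    return t_part / (t_part + g_part)


if __name__ == "__main__":
    for r in (0.0, 2.0, 8.0):
        print(r, outlier_probability(r))
```

Under these settings a residual of 0 gets a posterior outlier probability of roughly 0.09, while a residual of 8 scale units gets a probability very close to 1, so only the latter is effectively handled by the heavy-tailed error.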

## References

- H. Tak, X.-L. Meng, and D. A. van Dyk (2017+) “A Repelling-Attracting Metropolis Algorithm for Multimodality,” Journal of Computational and Graphical Statistics, to appear (arXiv preprint 1601.05633).
- H. Tak, J. A. Ellis, and S. K. Ghosh (2017+) “Robust and Accurate Inference via a Mixture of Gaussian and Student’s $t$ Errors,” submitted (arXiv preprint 1707.03057).

## October 4, 2017, 1:15pm – 2:15pm

*Lecture: Analysis of Brain Functional Connectivity using Dense Graph Limit Theory*

**Location:** SAMSI Classroom

**Speaker:** Peter Diao, SAMSI Postdoctoral Fellow

## Abstract

Dense graphs arise naturally when computing correlation or similarity matrices between large numbers of time series. In neuroscience, such correlation matrices arise from the analysis of activity patterns in brain recordings. Our work explores the use of the cut norm, a norm defined for the analysis of large dense networks, for comparing families of dense graphs. Our main contribution is a practical algorithm for approximating the cut norm. We will present our method, the background theory, and results on neuroscience data.
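For reference, the cut norm of a real matrix A is ‖A‖_□ = max over row subsets S and column subsets T of |Σ_{i∈S, j∈T} A_ij| (graph-limit papers often also divide by n²). It is NP-hard to compute in general, which is why approximations such as that of Alon and Naor (2006) are needed. The brute-force sketch below is practical only for small matrices and is not the algorithm presented in the talk. It uses the fact that, for a fixed row set S, the best column set takes either all columns with positive column sums or all columns with negative ones.

```python
import itertools

import numpy as np


def cut_norm_bruteforce(A):
    """Exact (unnormalized) cut norm by enumerating row subsets; O(2^rows * rows * cols)."""
    A = np.asarray(A, dtype=float)
    best = 0.0
    for mask in itertools.product([False, True], repeat=A.shape[0]):
        col_sums = A[np.array(mask, dtype=bool)].sum(axis=0)
        # For fixed S, the optimal T keeps only positive (or only negative) column sums.
        best = max(best, col_sums[col_sums > 0].sum(), -col_sums[col_sums < 0].sum())
    return float(best)


if __name__ == "__main__":
    A1 = np.array([[1.0, 0.2], [0.2, 1.0]])
    A2 = np.array([[1.0, -0.3], [-0.3, 1.0]])
    print(cut_norm_bruteforce(A1 - A2))
```

Two dense graphs on the same labeled vertex set can be compared through the cut norm of the difference of their adjacency (or correlation) matrices, as in the usage line above.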

## References

- Alon, N., & Naor, A. (2006). Approximating the cut-norm via Grothendieck’s inequality. SIAM Journal on Computing, 35(4), 787-803.
- Ansariola, M., Megraw, M., & Koslicki, D. (2017). IndeCut evaluates performance of network motif discovery algorithms. bioRxiv, 156836.
- Poldrack, R. A., Laumann, T. O., Koyejo, O., Gregory, B., Hover, A., Chen, M. Y., … & Hunicke-Smith, S. (2015). Long-term neural and physiological phenotyping of a single human. Nature Communications, 6, 8885.
- Wen, Z., & Yin, W. (2013). A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142(1-2), 397-434.

## October 11, 2017, 1:15pm – 2:15pm

*Lecture: Computational and Statistical Tradeoffs in Analyzing Highly Distributed Data: An Introduction to “Theory of Data Systems”*

**Location:** SAMSI Classroom

**Speaker:** Maggie Johnson, SAMSI Postdoctoral Fellow

## Abstract

Massive datasets, for example those produced by remote sensing and climate models, are often highly distributed across data centers around the world. This presents a unique challenge for the statistical methods used to analyze such datasets: methods may be subject to computational and storage constraints due to the size of the data, and to architectural (or movement) constraints related to the topology of a distributed data system. In particular, spatial and spatiotemporal statistical methods for analyzing such data usually require moving the data to a centralized location, and the cost and feasibility of this movement are often not taken directly into account. The purpose of this talk is to give an overview of the problem we aim to address: how to optimize the tradeoffs between distributed system/software architecture (i.e., the costs of computing and moving data), time constraints (e.g., analysis of streaming data), and the quality of statistical inference. We introduce the problem using an example of kriging in a distributed data setting, and discuss the challenges posed by the problem and possible directions for our research.
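As a point of reference for the distributed setting, simple kriging predicts at a new location s₀ with ŷ(s₀) = c₀ᵀC⁻¹y, where C is the covariance matrix of the observations and c₀ holds their covariances with s₀. Forming C needs all observations in one place; a naive distributed alternative is to krige with only one data center's subset. The sketch below assumes a known zero mean, an exponential covariance with illustrative parameters, one-dimensional locations, and a hypothetical two-center split; it is not the method of the talk.

```python
import numpy as np


def exp_cov(a, b, sill=1.0, range_=1.0):
    """Exponential covariance sill * exp(-|a - b| / range_) for 1-D locations."""
    return sill * np.exp(-np.abs(np.subtract.outer(a, b)) / range_)


def simple_krige(locs, y, s0, sill=1.0, range_=1.0):
    """Simple (known zero-mean) kriging prediction and kriging variance at s0."""
    locs = np.asarray(locs, dtype=float)
    y = np.asarray(y, dtype=float)
    C = exp_cov(locs, locs, sill, range_)
    c0 = exp_cov(locs, np.array([s0], dtype=float), sill, range_)[:, 0]
    w = np.linalg.solve(C, c0)  # kriging weights C^{-1} c0
    return float(w @ y), float(sill - w @ c0)


if __name__ == "__main__":
    locs = np.linspace(0.0, 10.0, 21)
    y = np.sin(locs)
    in_a = locs <= 5.0  # hypothetical split: center A holds locations <= 5
    print("pooled data:  ", simple_krige(locs, y, 5.25))
    print("center A only:", simple_krige(locs[in_a], y[in_a], 5.25))
```

Using only center A's data gives a larger kriging variance at 5.25 than pooling, which is the statistical side of the tradeoff against the cost of moving data.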

## References

No references provided at this time

## October 18, 2017, 1:15pm – 2:15pm

*Lecture: Metrics for Evaluating Sea Ice Models*

**Location:** SAMSI Classroom

**Speaker:** Yawen Guan, SAMSI Postdoctoral Fellow

## Abstract

Arctic sea ice plays an important role in the global climate. Sea ice models governed by physical equations have been used to simulate the state of the ice, including features such as ice thickness, concentration, and motion. Recent satellite observations with high spatio-temporal resolution have also provided unique opportunities to examine ice motion and deformation. These multiple disparate data sources prompted the research questions of our working group: How do we evaluate the skill of models in simulating ice features? How do we identify the region of model parameter space that produces a realistic state of the ice? I will discuss some current approaches and potential methods for validating sea ice models.

## References

No references provided at this time

## October 25, 2017, 1:15pm – 2:15pm

*Lecture: Locally stationary spatio-temporal interpolation of Argo profiling float data*

**Location:** SAMSI Classroom

**Speaker:** Mikael Kuusela, SAMSI Postdoctoral Fellow

## Abstract

Argo floats measure sea water temperature and salinity in the upper 2,000 m of the global ocean. The statistical analysis of the resulting spatio-temporal dataset is challenging due to its non-stationary structure and large size. We propose mapping these data using locally stationary Gaussian process regression where covariance parameter estimation and spatio-temporal prediction are carried out in a moving-window fashion. This yields computationally tractable non-stationary anomaly fields without the need to explicitly model the non-stationary covariance structure. We also investigate Student-t distributed microscale variation as a means to account for non-Gaussian heavy tails in Argo data. We use cross-validation to study the point prediction and uncertainty quantification performance of the proposed approach. We demonstrate clear improvements in the point predictions and show that accounting for the non-stationarity and non-Gaussianity is crucial for obtaining well-calibrated uncertainties. The approach also provides data-driven local estimates of the spatial and temporal dependence scales which are of scientific interest in their own right.
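The moving-window idea can be sketched in one dimension: for each prediction location, take the observations inside a window, remove their mean, fit a stationary covariance to that subset alone, and krige. The version below assumes an exponential covariance whose range is chosen by grid search over the profile Gaussian log-likelihood (the variance is profiled out in closed form), plus a small fixed nugget for numerical stability. The window width, range grid, synthetic data, and function names are illustrative, and the sketch ignores the space-time structure and Student-t variation of the actual method.

```python
import numpy as np


def exp_corr(a, b, rho):
    """Exponential correlation exp(-|a - b| / rho) between 1-D location arrays."""
    return np.exp(-np.abs(np.subtract.outer(a, b)) / rho)


def fit_local_range(x, y, ranges, nugget=1e-6):
    """Grid-search MLE of the range for a zero-mean GP; the variance is profiled out."""
    n = len(x)
    best_ll, best_rho, best_s2 = -np.inf, None, None
    for rho in ranges:
        R = exp_corr(x, x, rho) + nugget * np.eye(n)
        _, logdet = np.linalg.slogdet(R)
        s2 = y @ np.linalg.solve(R, y) / n          # closed-form MLE of the variance
        ll = -0.5 * n * np.log(s2) - 0.5 * logdet   # profile log-likelihood (up to a constant)
        if ll > best_ll:
            best_ll, best_rho, best_s2 = ll, rho, s2
    return best_rho, best_s2


def moving_window_krige(x, y, x_new, half_width, ranges, nugget=1e-6):
    """Refit a stationary model inside each window, then krige at the window centre."""
    preds, pvars, rhos = [], [], []
    for s0 in x_new:
        keep = np.abs(x - s0) <= half_width
        xw = x[keep]
        mu = y[keep].mean()       # local mean; anomalies are modelled as a zero-mean GP
        anom = y[keep] - mu
        rho, s2 = fit_local_range(xw, anom, ranges, nugget)
        R = exp_corr(xw, xw, rho) + nugget * np.eye(len(xw))
        r0 = exp_corr(xw, np.array([s0]), rho)[:, 0]
        w = np.linalg.solve(R, r0)
        preds.append(mu + w @ anom)
        pvars.append(s2 * max(1.0 - w @ r0, 0.0))
        rhos.append(rho)
    return np.array(preds), np.array(pvars), np.array(rhos)


if __name__ == "__main__":
    x = np.linspace(0.0, 20.0, 81)
    # Synthetic signal that becomes rougher for x > 10 (illustrative only).
    y = np.sin(x) + np.where(x > 10.0, 0.5 * np.sin(4.0 * x), 0.0)
    print(moving_window_krige(x, y, np.array([5.1, 15.3]), 3.0, [0.5, 1.0, 2.0, 4.0]))
```

Because each window has its own fitted range, the returned `rhos` are a crude analogue of the local dependence-scale estimates mentioned in the abstract.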

## References

Joint work with Michael L. Stein (UChicago)

## November 8, 2017, 1:15pm – 2:15pm

*Lecture: Some Thoughts on Joint Probability Method for Estimating Storm Surges*

**Location:** SAMSI Classroom

**Speaker:** Whitney Huang, SAMSI Postdoctoral Fellow

## Abstract

The estimation of hurricane-induced storm surges is critically important to quantifying risks in coastal areas. The joint probability method (JPM), combined with hydrodynamic simulations, is currently the method recommended by the Federal Emergency Management Agency (FEMA) for calculating the magnitude of surges in terms of 10-, 50-, 100-, and 500-year return levels. The purpose of this talk is to (i) introduce the main idea of JPM in the hurricane surge context; (ii) describe optimal sampling strategies in JPM as a means to reduce the computational burden of running hydrodynamic simulations; and (iii) make a clearer link between JPM and the tail-modeling aspects of extreme value analysis.
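In a discretized JPM, each synthetic storm j carries a probability weight p_j from the joint distribution of storm parameters, the hydrodynamic model gives its surge η_j at a site, and the annual exceedance rate is estimated as λ(s) = Λ Σ_j p_j 1{η_j ≥ s}, where Λ is the annual storm rate. Taking the common definition that the T-year return level is the surge with λ(s) = 1/T, the sketch below implements that bookkeeping with linear interpolation between simulated surges. The storm set, weights, and rate in the usage line are made-up illustrative inputs, not results.

```python
import numpy as np


def jpm_return_level(surges, weights, storm_rate, return_period):
    """Surge s at which storm_rate * P(surge >= s) equals 1 / return_period."""
    surges = np.asarray(surges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(surges)
    s_sorted = surges[order]
    # Exceedance rate at each simulated surge: tail sum of the (normalized) weights.
    tail_w = np.cumsum(weights[order][::-1])[::-1]
    rates = storm_rate * tail_w / weights.sum()
    # Rates decrease as surge increases; np.interp needs increasing x-coordinates.
    # np.interp clamps outside the simulated range, so no tail extrapolation happens here.
    return float(np.interp(1.0 / return_period, rates[::-1], s_sorted[::-1]))


if __name__ == "__main__":
    surges_m = [1.0, 2.0, 3.0, 4.0]   # hypothetical simulated peak surges (m)
    probs = [0.4, 0.3, 0.2, 0.1]      # hypothetical storm probability weights
    for T in (2, 5, 10):
        print(T, jpm_return_level(surges_m, probs, storm_rate=1.0, return_period=T))
```

Return periods whose rate 1/T falls below the smallest simulated exceedance rate are clamped to the largest simulated surge, which is exactly where extreme value tail modeling becomes relevant.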

## References

No references provided at this time

## November 15, 2017, 1:15pm – 2:15pm

*Lecture: On the Construction of Scalable Markov Chain Monte Carlo Methods via Ergodic Stochastic Differential Equations*

**Location:** SAMSI Classroom

**Speaker:** Matthias Sachs, SAMSI Postdoctoral Fellow

## Abstract

In this talk I will present the general context of the research I am carrying out as part of the SAMSI QMC program. I will primarily focus on Monte Carlo methods and give a general introduction to how Markov chain Monte Carlo methods can be constructed as discretisations of ergodic stochastic differential equations. I will discuss how including or omitting a Metropolis acceptance-rejection criterion affects the performance of the resulting samplers, and their scalability to high-dimensional sampling problems, both in terms of the variance of estimates and in terms of asymptotic bias. I will also discuss applications of the theory in machine learning, namely stochastic gradient methods for Bayesian inference on large datasets.
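A standard instance of this construction is the overdamped Langevin SDE dX_t = ∇log π(X_t) dt + √2 dW_t, which has π as its invariant distribution. Its Euler-Maruyama discretisation with step h is the unadjusted Langevin algorithm (ULA); adding a Metropolis-Hastings accept/reject step to the same proposal gives MALA. The sketch below, for an assumed standard normal target with an illustrative step size, shows the asymptotic bias introduced by omitting the Metropolis step: for this target ULA's update is x ← (1 − h)x + √(2h)ξ, whose stationary variance is 1/(1 − h/2) rather than 1.

```python
import numpy as np


def log_target(x):
    return -0.5 * x * x  # standard normal, up to a constant


def grad_log_target(x):
    return -x


def ula(n, h, rng, x0=0.0):
    """Unadjusted Langevin: Euler-Maruyama steps of dX = grad log pi(X) dt + sqrt(2) dW."""
    x, out = x0, np.empty(n)
    for i in range(n):
        x = x + h * grad_log_target(x) + np.sqrt(2.0 * h) * rng.standard_normal()
        out[i] = x
    return out


def mala(n, h, rng, x0=0.0):
    """Same Langevin proposal as ULA, followed by a Metropolis-Hastings accept/reject step."""
    def log_q(to, frm):
        # Log density, up to a constant, of the Gaussian Langevin proposal frm -> to.
        return -((to - frm - h * grad_log_target(frm)) ** 2) / (4.0 * h)

    x, out = x0, np.empty(n)
    for i in range(n):
        prop = x + h * grad_log_target(x) + np.sqrt(2.0 * h) * rng.standard_normal()
        log_alpha = log_target(prop) - log_target(x) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        out[i] = x
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = 0.5
    print("ULA variance: ", ula(40_000, h, rng).var(), "theory:", 1.0 / (1.0 - h / 2.0))
    print("MALA variance:", mala(40_000, h, rng).var(), "target: 1.0")
```

Shrinking h reduces ULA's bias but slows mixing, while the Metropolis correction removes the bias at the price of rejections; this is the tradeoff the talk examines in high dimensions.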

## References

No references provided at this time

## November 29, 2017, 1:15pm – 2:15pm

*Lecture: To Be Determined*

**Location:** SAMSI Classroom

**Speaker:** Huang Huang, SAMSI Postdoctoral Fellow

## Abstract

To Be Announced

## References

To Be Announced

## December 6, 2017, 1:15pm – 2:15pm

*Lecture: To Be Determined*

**Location:** SAMSI Classroom

**Speaker:** Christian Sampson, SAMSI Postdoctoral Fellow

## Abstract

To Be Announced

## References

To Be Announced

## December 13, 2017, 1:15pm – 2:15pm

*Lecture: To Be Determined*

**Location:** SAMSI Classroom

**Speaker:** Cheng Cheng, SAMSI Postdoctoral Fellow

## Abstract

To Be Announced

## References

To Be Announced