Postdoctoral Fellow Seminars: Spring 2020

January 8, 2020

Lecture: Kriging: Beyond Matérn

Location: SAMSI Classroom
Speaker: Pulong Ma, Second-Year SAMSI Postdoctoral Fellow

Abstract

Satellite instruments and computer models that simulate physical processes of interest often lead to massive amounts of data with complicated structures. Statistical analysis of such data needs to deal with a wide range of challenging problems such as high-dimensionality and nonstationarity. To understand and predict real-world processes, kriging, which originated in geostatistics in the 1960s, has been widely used for prediction in spatial statistics and uncertainty quantification (UQ). In the first part of my talk, I shall give a brief overview of my research related to kriging or Gaussian process regression to tackle these challenging issues in various real-world applications. In the second part of my talk, I shall introduce a new family of covariance functions to perform kriging. Over the past several decades, the Matérn covariance function has been a popular choice to model dependence structures. A key benefit of the Matérn class is that it is possible to get precise control over the degree of differentiability of the process realizations. However, the Matérn class possesses exponentially decaying tails, and thus may not be suitable for modeling long-range dependence. This problem can be remedied using polynomial covariances; however, one loses control over the degree of differentiability of the process realizations, in that the realizations using polynomial covariances are either infinitely differentiable or not differentiable at all. To overcome this dilemma, a new family of covariance functions is constructed using a scale mixture representation of the Matérn class where one obtains the benefits of both Matérn and polynomial covariances. The resultant covariance contains two parameters: one controls the degree of differentiability near the origin and the other controls the tail heaviness, independently of each other. This new covariance function also enjoys nice theoretical properties under infill asymptotics including equivalence of measures, asymptotic behavior of the maximum likelihood estimators, and asymptotically efficient prediction under misspecified models. The improved theoretical properties in predictive performance of this new covariance class are verified via extensive simulations. An application using NASA’s Orbiting Carbon Observatory-2 satellite data confirms the advantage of this new covariance class over the Matérn class, especially in extrapolative settings. This talk concludes with discussions on extrapolation in UQ studies.
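
To make the tail-behavior contrast in this abstract concrete, here is a minimal Python sketch. It does not implement the speaker's new covariance family. It only evaluates the standard closed-form Matérn correlations for smoothness 1/2, 3/2 and 5/2 and compares them with one polynomially decaying correlation. The function names, the unit range parameter, and the choice of the Cauchy-type form (1 + h^2)^(-1) as the polynomial example are assumptions made for illustration.

```python
import math

def matern(h, nu, ell=1.0):
    """Matern correlation at distance h, closed forms for nu in {0.5, 1.5, 2.5}."""
    r = abs(h) / ell
    if nu == 0.5:
        return math.exp(-r)
    if nu == 1.5:
        s = math.sqrt(3.0) * r
        return (1.0 + s) * math.exp(-s)
    if nu == 2.5:
        s = math.sqrt(5.0) * r
        return (1.0 + s + s * s / 3.0) * math.exp(-s)
    raise ValueError("this sketch only implements nu in {0.5, 1.5, 2.5}")

def cauchy(h, ell=1.0):
    """A polynomially decaying correlation, (1 + (h/ell)^2)^(-1) (illustrative choice)."""
    r = abs(h) / ell
    return 1.0 / (1.0 + r * r)

# Compare tails: the Matern values shrink exponentially, the Cauchy-type values polynomially.
for h in (0.5, 2.0, 10.0):
    print(h, matern(h, 0.5), matern(h, 2.5), cauchy(h))
```

At large distances the Matérn values become negligible for every smoothness level, while the polynomial form retains visible correlation, which is the limitation the abstract describes.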


January 15, 2020

Lecture: Multi-Resolution Functional ANOVA (MRFA) Emulation

Location: SAMSI Classroom
Speaker: Wenjia Wang, Second-Year SAMSI Postdoctoral Fellow

Abstract

The Gaussian process is a standard tool for building emulators for both deterministic and stochastic computer experiments. However, application of Gaussian process models is greatly limited in practice, particularly for large-scale and many-input computer experiments that have become typical. In this talk, a multi-resolution functional ANOVA model will be introduced as a computationally feasible emulation alternative. More generally, this model can be used for large-scale and many-input non-linear regression problems.


January 22, 2020

Lecture: Data-driven Methods for Multi-scale Models of Cell Migration

Location: SAMSI Classroom
Speaker: John Nardini, Second-Year SAMSI Postdoctoral Fellow

Abstract

Human skin cells collectively migrate into a wound area for healthy wound repair; failure of this process leads to so-called non-healing wounds. There is little consensus on why non-healing wounds occur, but they are a significant burden to the US healthcare system, as they occur in up to 2% of the population and cost $18 billion annually.

In this talk, I will discuss the derivation and analysis of multi-scale mathematical models that can be used to better understand the wound healing process. These modeling formulations include a biochemically-stage structured reaction-diffusion equation to incorporate how biochemical signaling pathways may alter cell behavior and a nonlinear diffusion equation model to capture the effects of cell-cell interactions on population-wide migration. Analysis of these equations allows for insight into both healthy and impaired wound dynamics, and comparison of these models to biological data allows for parameterization of the models to ensure they are biologically realistic. I will also present some results on how machine learning can aid us in the model development process with either continuum or agent-based models.


January 29, 2020

Lecture: New Classes of Priors Based on Stochastic Orders: Theory and Applications in Reliability

Location: SAMSI Classroom
Speaker: Fabrizio Ruggeri, Italian National Research Council (CNR)

Abstract

In the context of robust Bayesian analysis, we first introduce a new class of univariate prior distributions based on stochastic orders and distortion functions. Then we introduce a new class of multivariate priors based on stochastic orders, multivariate total positivity of order 2 (MTP2) and weighted distributions. We provide the new definitions, their interpretation and the main properties and we also study the relationship with other classical classes of prior beliefs. We also consider metrics (Kolmogorov and Kantorovich in the former case, Hellinger and Kullback-Leibler in the latter) to measure the uncertainty induced by such classes. Finally, we present the application of the former class in the context of fault tree analysis for a spacecraft re-entry example, whereas the latter will be illustrated with an example about train door reliability.


February 5, 2020

Lecture: Adversarial Machine Learning: An Adversarial Risk Analysis Perspective

Location: SAMSI Classroom
Speaker: David Rios, AXA-ICMAT Chair in Adversarial Risk Analysis and Member of the Spanish Royal Academy of Sciences

Abstract

Adversarial Machine Learning (AML) is emerging as a major field aimed at the protection of automated decision systems against potential security threats. The majority of work in this area has built upon the framework of Game Theory by modelling a conflict between an attacker and a defender. In the talk, after reviewing game-theoretic approaches to AML, we discuss the benefits that an Adversarial Risk Analysis framework brings when defending ML-based systems. A research agenda is included.


February 12, 2020

Lecture: Health Data Science to Understand Cancer Patients’ Survival and Survivorship Issues: Competing Risks Survival and Comorbidity

Location: SAMSI Classroom
Speaker: Hyunsoon Cho, Associate Professor, Health Data Science, Department of Cancer Control and Population Health, Graduate School of Cancer Science and Policy, National Cancer Center, Korea

Abstract

In this talk, I will demonstrate data science examples of health care big data utilization to understand cancer patients’ survival and health. Early detection through the national cancer screening program and improvements in treatment have resulted in better chances of survival. Thus, understanding both cancer and non-cancer mortality patterns experienced by patients is critical. First, I will address the estimation of competing risks survival analysis based on the population-based cancer registry data and show how the results can inform cancer survivorship policy. To assess non-cancer health status and mortality patterns, we further adapted network analysis and data visualization. I will show the estimation of non-cancer mortality patterns from the national mortality database, and analysis of comorbidity patterns from the Korean National Health Insurance Claims big data. My final example will be the utilization of electronic health records from a clinical data warehouse to estimate and predict cardiotoxicity, the major treatment-related adverse event, in cancer patients. I will close this talk with a discussion of opportunities and challenges in this area.


February 19, 2020

Lecture: Sampling of Bayesian posterior distributions in the presence of large data: Adaptive Langevin dynamics and its Hypocoercive properties

Location: SAMSI Classroom
Speaker: Matthias Sachs, SAMSI Postdoctoral Fellow

Abstract

Adaptive Langevin dynamics is a method for sampling the Boltzmann–Gibbs distribution at prescribed temperature in cases where the potential gradient is subject to stochastic perturbation of unknown magnitude. The method replaces the friction in underdamped Langevin dynamics with a dynamical variable, updated according to a negative feedback loop control law as in the Nosé–Hoover thermostat. Using a hypocoercivity analysis, we show that the law of Adaptive Langevin dynamics converges exponentially rapidly to the stationary distribution, with a rate that can be quantified in terms of the key parameters of the dynamics. This allows us in particular to obtain a central limit theorem with respect to the time averages computed along a stochastic path. Our theoretical findings are illustrated by numerical simulations involving classification of the MNIST data set of handwritten digits using Bayesian logistic regression.
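
The feedback mechanism described in this abstract can be sketched on a toy problem. The Python code below is an assumption-laden illustration, not the speaker's implementation: it uses a one-dimensional quadratic potential U(q) = q^2 / 2, unit mass, a simple Euler-type discretization, and Gaussian noise of hypothetical scale grad_noise added to the gradient. The friction variable xi follows the usual thermostat law d(xi)/dt = (p^2 - kT) / mu, so it rises when the kinetic energy is too high and falls when it is too low.

```python
import random

def adaptive_langevin(n_steps=200_000, dt=0.01, grad_noise=10.0, kT=1.0, mu=1.0, seed=0):
    """Toy adaptive Langevin sampler for U(q) = q^2 / 2 with a noisy gradient.

    The sampler is never told the noise scale; the friction xi adapts to it.
    Returns the time averages of p^2 and q^2 over the second half of the run.
    """
    rng = random.Random(seed)
    q, p, xi = 0.0, 0.0, 0.0
    sum_p2 = sum_q2 = 0.0
    count = 0
    for step in range(n_steps):
        noisy_grad = q + grad_noise * rng.gauss(0.0, 1.0)  # grad U(q) = q, plus unknown noise
        p += dt * (-noisy_grad - xi * p)                    # momentum update with adaptive friction
        q += dt * p                                         # position update
        xi += dt * (p * p - kT) / mu                        # negative feedback on kinetic energy
        if step >= n_steps // 2:                            # discard the first half as burn-in
            sum_p2 += p * p
            sum_q2 += q * q
            count += 1
    return sum_p2 / count, sum_q2 / count

mean_p2, mean_q2 = adaptive_langevin()
print("time average of p^2:", mean_p2, "(target kT = 1)")
print("time average of q^2:", mean_q2)
```

Because the friction only stops drifting when p^2 averages to kT, the time average of p^2 stays close to the prescribed temperature even though the gradient noise level was never supplied to the sampler.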


February 26, 2020

Lecture: Multivariate spectral downscaling for PM2.5 species

Location: SAMSI Classroom
Speaker: Yawen Guan, Assistant Professor, Department of Statistics, University of Nebraska

Abstract

Fine particulate matter (PM2.5) is a mixture of air pollutants that has adverse effects on human health. Understanding the health effects of the PM2.5 mixture and its individual species has been a research priority over the past two decades. However, the limited availability of speciated PM2.5 measurements continues to be a major challenge in exposure assessment for conducting large-scale population-based epidemiology studies. The PM2.5 species have complex spatial-temporal and cross dependence structures that should be accounted for in estimating the spatiotemporal distribution of each component. Two major sources of air quality data are commonly used for deriving exposure estimates: point-level monitoring data and gridded numerical computer model simulations, such as those from the Community Multiscale Air Quality (CMAQ) model. We propose a statistical method to combine these two data sources for estimating speciated PM2.5 concentrations. Our method models the complex relationships between monitoring measurements and the numerical model output at different spatial resolutions, and we model the spatial dependence and cross dependence among PM2.5 species. We apply the method to combine CMAQ model output with major PM2.5 species measurements in the contiguous United States in 2011.


March 4, 2020 – 11:00am – noon

Lecture: Augmented Probability Simulation Methods for Decisions and Games

Location: SAMSI Classroom
Speaker: Tahir Ekin, Associate Professor of Quantitative Methods, McCoy College of Business, Texas State University

Abstract

Expectation-based decision and game theoretic models require both computation/estimation of the objective/utility function and its optimization. This can be computationally challenging, especially in cases with continuous and multi-modal sources of uncertainty or complex objective function surfaces. We propose augmented simulation approaches that treat the decision variable(s) as random and construct an augmented distribution in the space of both decisions and random variables. Simulation from this distribution simultaneously solves for the expectation of the objective function and the optimization problem. In doing so, we sample more frequently from the regions of the marginal decision space where the objective function has higher values in a maximization problem. This talk introduces augmented probability simulation and its extensions to solve stochastic programming problems and game theoretic models. There will be a discussion and illustration of a variety of applications such as news-vendor type models, service systems and cybersecurity.
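
The idea of sampling decisions and uncertainties jointly can be sketched on a small news-vendor example. Everything specific here is hypothetical and chosen only for illustration: the price, cost, demand distribution (uniform on 0..20), and the independence Metropolis sampler are assumptions, not the speaker's setup. The sampler targets an augmented distribution proportional to utility(d, theta) times p(theta), so the marginal frequency of each decision d is proportional to its expected utility.

```python
import random
from collections import Counter

PRICE, COST, MAX_Q = 10.0, 4.0, 20      # hypothetical news-vendor parameters
DECISIONS = range(MAX_Q + 1)            # candidate order quantities 0..20
DEMANDS = range(MAX_Q + 1)              # demand assumed uniform on 0..20
SHIFT = COST * MAX_Q + 1.0              # shift that keeps the utility strictly positive

def utility(d, theta):
    """Shifted profit for order quantity d and realized demand theta."""
    return PRICE * min(d, theta) - COST * d + SHIFT

def expected_utility(d):
    """Exact expected (shifted) utility, used only as a reference for comparison."""
    return sum(utility(d, t) for t in DEMANDS) / len(DEMANDS)

def augmented_simulation(n_iter=50_000, seed=0):
    """Independence Metropolis sampler on (d, theta) for the augmented distribution."""
    rng = random.Random(seed)
    d, theta = 0, 0
    counts = Counter()
    for _ in range(n_iter):
        d_new = rng.randint(0, MAX_Q)       # uniform proposal over decisions
        theta_new = rng.randint(0, MAX_Q)   # demand proposed from its own distribution
        # With this proposal the acceptance ratio reduces to a ratio of utilities.
        if rng.random() < utility(d_new, theta_new) / utility(d, theta):
            d, theta = d_new, theta_new
        counts[d] += 1                      # record the current decision
    return counts

counts = augmented_simulation()
print("most visited decision:", counts.most_common(1)[0][0])
print("exact optimum:", max(DECISIONS, key=expected_utility))
```

Because the additive shift flattens the marginal, the most visited decision may land near the exact optimum rather than exactly on it in a finite run. Sharpening the marginal, for example by raising the utility to a power, is a common refinement that this sketch omits.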


March 4, 2020 – 1:15pm-2:15pm

Lecture: Conformal Spatial Prediction Intervals

Location: SAMSI Classroom
Speaker: Maggie Mao, First-Year SAMSI Postdoctoral Fellow

Abstract

Predicting the response at an unobserved location is a fundamental problem in spatial statistics applications. Traditional methods are model-based, so there is a risk of model misspecification bias, as spatial dependence can be difficult to assess, especially in non-stationary cases. Model-free prediction has been achieved in other contexts using the conformal prediction machinery, which requires the data to be exchangeable. While exchangeability is a mild assumption in some applications, it is apparently incompatible with spatial dependence in general. However, in this talk, I will show that a wide class of spatial processes are locally approximately exchangeable, which suggests that near-valid predictions can be achieved by using conformal prediction on a suitably dense subset of data points closest to the point at which prediction is desired. We prove that the proposed local conformal spatial prediction interval is approximately valid, and numerical examples on both real and simulated data, across a range of non-stationary and non-Gaussian settings, confirm that the predictions are both valid and efficient.
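
A rough Python sketch of the local idea follows. It is not the speaker's algorithm: it applies a generic full conformal construction to the k nearest observations only, with an assumed nonconformity score (absolute deviation from the local mean), a one-dimensional transect instead of a spatial domain, and simulated data. The function name, k, alpha, and the candidate grid are all illustrative choices.

```python
import math
import random

def local_conformal_interval(locs, ys, s_new, k=30, alpha=0.1, grid_size=200):
    """Full conformal interval at s_new computed from the k nearest observations."""
    nearest = sorted(range(len(locs)), key=lambda i: abs(locs[i] - s_new))[:k]
    y_loc = [ys[i] for i in nearest]
    lo, hi = min(y_loc), max(y_loc)
    pad = hi - lo
    # Candidate responses on a grid covering the local range plus padding on each side.
    candidates = [lo - pad + j * (3 * pad) / (grid_size - 1) for j in range(grid_size)]
    accepted = []
    for y in candidates:
        aug = y_loc + [y]                           # neighbors plus the candidate value
        mean = sum(aug) / len(aug)
        scores = [abs(v - mean) for v in aug]       # nonconformity scores
        p_value = sum(s >= scores[-1] for s in scores) / len(aug)
        if p_value > alpha:                         # keep candidates that look typical
            accepted.append(y)
    return min(accepted), max(accepted)

# Simulated transect: smooth trend plus independent noise (illustrative data only).
rng = random.Random(1)
locs = [rng.random() for _ in range(500)]
ys = [math.sin(2 * math.pi * s) + 0.1 * rng.gauss(0.0, 1.0) for s in locs]

lower, upper = local_conformal_interval(locs, ys, s_new=0.5)
print("approximate 90% interval at s = 0.5:", (lower, upper))
```

Restricting to nearby points is what makes the exchangeability assumption plausible here: within a small neighborhood the trend varies little, so the neighbors and the new response behave roughly alike.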


March 11, 2020

** SPRING BREAK – No Seminar Scheduled **


March 18, 2020

** SPRING BREAK – No Seminar Scheduled **


March 25, 2020

Lecture: Quantifying and Detecting Individual Level ‘Always Survivor’ Causal Effects Under ‘Truncation by Death’ and Censoring Through Time

Location: SAMSI Classroom
Speaker: Jaffer Zaidi, First-Year SAMSI Postdoctoral Fellow

Abstract

The analysis of causal effects when the outcome of interest is possibly truncated by death has a long history in statistics and causal inference. The survivor average causal effect is commonly identified with more assumptions than those guaranteed by the design of a randomized clinical trial or using sensitivity analysis. This paper demonstrates that individual level causal effects in the ‘always survivor’ principal stratum can be identified with no stronger identification assumptions than randomization. We illustrate the practical utility of our methods using data from a clinical trial on patients with prostate cancer. Our methodology is the first and, as of yet, only proposed procedure that enables detecting individual level causal effects in the presence of truncation by death using only the assumptions that are guaranteed by design of the clinical trial. This methodology is applicable to all types of outcomes. We answer policy relevant questions for informed decision making.


April 1, 2020

Lecture: Normal-bundle Bootstrap

Location: Virtual Presentation
Speaker: Ruda Zhang, First-Year SAMSI Postdoctoral Fellow

Abstract

I will present a method that, given a data set from a probability distribution with salient geometric structure, generates new data sets that preserve the structure. From regression, to deep learning, to topological data analysis, a common feature of the data sets can be summarized by the manifold distribution hypothesis, that natural high-dimensional data concentrate close to a nonlinear low-dimensional manifold. Our method is inspired by constructions in differential geometry and algorithms for nonlinear dimension reduction. As a variant of the bootstrap resampling method, it is useful for the inference of statistical estimators. Our method is also useful for data augmentation, where one wants to increase training data diversity to reduce overfitting, without collecting new data. I’ll also talk about a spin-off paper project for sampling on manifolds.


April 8, 2020

Lecture: Constrained Bayesian Inference through Posterior Projections

Location: Virtual Presentation
Speaker: Deborshee Sen, First-Year SAMSI Postdoctoral Fellow

Abstract

Bayesian approaches are appealing for constrained inference problems in allowing a probabilistic characterization of uncertainty while providing a computational machinery for incorporating complex constraints in hierarchical models. However, the usual Bayesian strategy of placing a prior on the constrained space and conducting posterior computation with Markov chain Monte Carlo algorithms is often intractable. An alternative is to conduct inference for a less constrained posterior and project samples to the constrained space through a minimal distance mapping. We formalize and provide a unifying Bayesian framework for such posterior projections. For theoretical tractability, we initially focus on constrained parameter spaces corresponding to closed and convex subsets of the original space. We then consider non-convex Stiefel manifolds. We provide a general formulation of the projected posterior and show that it corresponds to a valid posterior distribution on the constrained space for particular classes of priors and likelihood functions. We also show that asymptotic properties of the unconstrained posterior are transferred to the projected posterior. Posterior projections are illustrated through multiple examples, both in simulation studies and real data applications.
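
The project-after-sampling step can be illustrated with a small Python sketch. The setup is hypothetical and not taken from the talk: the unconstrained posterior is a normal distribution centered at a sample mean y_bar with variance 1/n_obs per coordinate (a normal mean model with a flat prior), and the constrained space is the probability simplex, a closed convex set. Each draw is mapped to its nearest point on the simplex using the standard sort-based Euclidean projection.

```python
import random

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:          # holds for the first rho sorted entries; keep the last threshold
            tau = t
    return [max(x - tau, 0.0) for x in v]

def projected_posterior_draws(y_bar, n_obs, n_draws=1000, seed=0):
    """Draw from the unconstrained posterior N(y_bar, I / n_obs), then project each draw."""
    rng = random.Random(seed)
    sd = (1.0 / n_obs) ** 0.5
    draws = []
    for _ in range(n_draws):
        theta = [rng.gauss(m, sd) for m in y_bar]   # unconstrained posterior draw
        draws.append(project_to_simplex(theta))     # minimal-distance mapping to the simplex
    return draws

draws = projected_posterior_draws(y_bar=[0.6, 0.3, 0.2], n_obs=25)
post_mean = [sum(d[i] for d in draws) / len(draws) for i in range(3)]
print("projected posterior mean:", post_mean)
```

Every projected draw satisfies the constraint exactly, so summaries such as the mean printed above automatically respect it as well.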


April 15, 2020

Lecture: RNN-Based Counterfactual Prediction

Location: Virtual Presentation
Speaker: Jason Poulos, First-Year SAMSI Postdoctoral Fellow

Abstract

This paper proposes using recurrent neural networks (RNNs) for estimating the effect of a binary treatment on a continuous outcome in panel data settings where a subset of units is exposed to treatment after an initial time period. The RNNs learn a useful representation of control unit outcomes in previous periods for predicting future outcomes. The model trained on controls is used to predict the counterfactual (untreated) outcomes of treated units. The causal effect of treatment is estimated by contrasting the counterfactual predictions to the observed outcomes of the treated. Under this approach, an unbiased estimate of the treatment effect requires that the control and treated unit outcomes are drawn from the same distribution. We weight the training loss by an estimated treatment propensity score to emphasize the fit of the model for control units most likely to be treated in each time period. We conduct a battery of placebo test experiments to evaluate the performance of RNNs with different network architectures under a variety of panel data settings.


April 22, 2020

Lecture: Causal Inference in the Presence of Interference

Location: Virtual Presentation
Speaker: Michael Hudgens, Professor, Department of Biostatistics, UNC Gillings School of Global Public Health

Abstract

A fundamental assumption usually made in causal inference is that of no interference between individuals (or units), i.e., the potential outcomes of one individual are assumed to be unaffected by the treatment assignment of other individuals. However, in many settings, this assumption obviously does not hold. For example, in infectious diseases, whether one person becomes infected may depend on who else in the population is vaccinated. In this talk, we will discuss recent approaches to assessing treatment effects in the presence of interference.


April 29, 2020

Lecture: Statistical Inference for Mean Functions of 3D Functional Objects

Location: Virtual Presentation
Speaker: Xinyi Li, SAMSI

Abstract

Functional data analysis has become a powerful tool for conducting statistical analysis for complex objects, such as curves, images, shapes and manifold-valued data. Among these data objects, 2D or 3D images obtained using recently emerging medical imaging technologies have been attracting researchers’ attention. Examples are functional magnetic resonance imaging (fMRI) and positron emission tomography (PET), which provide a very detailed characterization of brain activity. In general, 3D complex objects are usually collected within an irregular boundary, whereas the majority of existing statistical methods have focused on a regular domain. To address this problem, we model the complex data objects as functional data and propose trivariate spline smoothing based on tetrahedralizations for estimating the mean functions of 3D functional objects. The asymptotic properties of the proposed estimator are systematically investigated, and consistency and asymptotic normality are established. We also provide a computationally efficient estimation procedure for the covariance function and the corresponding eigenvalues and eigenfunctions and derive uniform consistency. Motivated by the need for statistical inference for complex functional objects, we then present a novel approach for constructing simultaneous confidence corridors to quantify estimation uncertainty. Extension of the procedure to the two-sample case is discussed together with numerical experiments and a real-data application using the Alzheimer’s Disease Neuroimaging Initiative database.


May 6, 2020

Lecture: Design Identifiability in Regression Methods for Respondent-driven Sampling

Location: Virtual Presentation
Speaker: Mamadou Yauck, First-Year SAMSI Postdoctoral Fellow

Abstract

Respondent-Driven Sampling (RDS) is a form of link-tracing sampling, a technique for sampling ‘hard-to-reach’ communities that aims to leverage members’ social relationships to reach potential participants. Current analyses of RDS for multivariate modeling suffer (mainly) from three challenges. First, there is no clear guidance on the type of inference (design-based or model-based) to conduct and the framework within which it can be achieved. Second, there is a lack of guidelines on how to deal with homophilic covariates, for which units are more likely to recruit peers with similar traits. Third, there is no consensus on how to properly model dependence within the RDS network. In this presentation, we address these issues.