*How to give a research talk*

**September 14, 2011, 1:30pm – 2:30pm**

Speaker: Richard Smith

### Abstract

The first talk will be about how to give a research talk. Although the seminar is intended primarily for the SAMSI and NISS postdocs, everyone is welcome to attend.

*1. David Sivakoff (SAMSI & Duke): Contact process on modular random graphs*

*2. Sylvie Tchumtchoua (SAMSI & Duke): Flexible Online Bayesian Methods for High-dimensional Data*

**September 28, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: David Sivakoff and Sylvie Tchumtchoua

### Abstract 1

We study the contact process (or SIS epidemic) on a pair of dense networks with sparse connections between them. I will give an intuitive derivation of the distribution of the time at which the contact process jumps from one part of the network to the other.
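Here is a minimal simulation sketch of this setting. Its assumptions are not taken from the talk: each part is a complete graph on `n` nodes, and the two parts are joined by `cross_edges` random links. Infected nodes recover at rate 1 and infect each susceptible neighbour at rate `lam`, and part A starts fully infected. The helper names `two_block_graph` and `crossing_time` are hypothetical. Repeating the run over many seeds gives an empirical distribution of the jump time, which can be compared with an analytical derivation.

```python
import random


def two_block_graph(n, cross_edges, rng):
    """Two complete graphs on n nodes each (A = 0..n-1, B = n..2n-1),
    joined by `cross_edges` distinct random A-B edges."""
    if cross_edges > n * n:
        raise ValueError("at most n*n distinct cross edges exist")
    adj = {v: set() for v in range(2 * n)}
    for block in (range(n), range(n, 2 * n)):
        nodes = list(block)
        for i, u in enumerate(nodes):
            for w in nodes[i + 1:]:
                adj[u].add(w)
                adj[w].add(u)
    added = 0
    while added < cross_edges:
        u, w = rng.randrange(n), rng.randrange(n, 2 * n)
        if w not in adj[u]:
            adj[u].add(w)
            adj[w].add(u)
            added += 1
    return adj


def crossing_time(n=30, cross_edges=2, lam=0.1, max_time=1e4, seed=0):
    """Gillespie simulation of the contact process: each infected node
    recovers at rate 1 and infects each susceptible neighbour at rate lam.
    Starts with block A fully infected; returns the time of the first
    infection in block B, or None if the infection dies out first."""
    rng = random.Random(seed)
    adj = two_block_graph(n, cross_edges, rng)
    infected = set(range(n))
    t = 0.0
    while infected and t < max_time:
        nodes = list(infected)
        # total event rate attached to each infected node: recovery + infections
        weights = [1.0 + lam * sum(w not in infected for w in adj[v]) for v in nodes]
        t += rng.expovariate(sum(weights))
        v = rng.choices(nodes, weights=weights)[0]
        susceptible = [w for w in adj[v] if w not in infected]
        if rng.random() < 1.0 / (1.0 + lam * len(susceptible)):
            infected.discard(v)  # recovery event
        else:
            w = rng.choice(susceptible)  # infection event
            infected.add(w)
            if w >= n:
                return t  # first infection in block B
    return None
```

For example, `times = [crossing_time(seed=s) for s in range(100)]` collects 100 draws. Runs in which the infection dies out first return `None`. The loop recomputes every rate at each event. That keeps the code short, but it is only suitable for small `n`.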

### Abstract 2

High-dimensional data with hundreds of thousands of observations are becoming commonplace in many disciplines. For example, in medical research, neuroscience and psychology, images consisting of hundreds of thousands of voxels/pixels are collected at several time points on multiple subjects.

Statistical methods that analyze the data all at once are computationally infeasible because they require storing the entire data set in memory, which is impossible with most statistical packages. We consider two online Bayesian methods for analyzing such data: online variational Bayes approximations and compressive sensing. These ideas were originally developed in other contexts in the machine learning community. We apply these approaches to widely used classes of statistical models, such as hierarchical regression models and structural equation models.
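The memory argument can be illustrated with a simpler, swapped-in technique: exact conjugate Bayesian updating for linear regression with known noise variance, processed one mini-batch at a time. This is neither online variational Bayes nor compressive sensing. It only shows the underlying principle: a posterior can be carried forward in a fixed-size summary (here a d × d precision matrix and a d-vector) while the data stream past. Online variational methods apply a similar batch-by-batch update to approximate posteriors when no closed form exists. The class name and the default prior are assumptions for illustration.

```python
import numpy as np


class OnlineBayesLinReg:
    """Conjugate Gaussian linear regression with known noise variance,
    updated one mini-batch at a time. Only a d x d matrix and a d-vector
    are stored, so the full data set never has to sit in memory."""

    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var  # prior N(0, prior_var * I)
        self.shift = np.zeros(dim)                # accumulates X^T y / noise_var

    def update(self, X, y):
        """Absorb one mini-batch (X: rows x dim, y: rows)."""
        self.precision += X.T @ X / self.noise_var
        self.shift += X.T @ y / self.noise_var

    def posterior(self):
        """Return the posterior mean and covariance of the coefficients."""
        cov = np.linalg.inv(self.precision)
        return cov @ self.shift, cov
```

The rows can be fed in any batch sizes and the result is the same posterior as processing them all at once, because both accumulated quantities are sums over rows.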

*Your Career – The Big Picture of Networking & LinkedIn*

**October 5, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speaker: Dan Galloway

### Abstract

Dan Galloway is a professional speaker and life coach (http://www.thinkbigpicture.com/).

*Snakes and Ladders: Building a Quantitative Career*

**October 12, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speaker: David Banks (Duke University)

### Abstract

Charting your career is like preparing a five-course dinner. You have to decide what you want to create, balance the menu, and plan the timing carefully. As researchers who understand stochasticity, we know that the smart cook has backup plans and reserves to draw upon. This talk discusses some of the strategies you can use to craft a lifetime of activity in your profession, under the premise that you want to progress into different roles over time. Specific topics include professional societies, networking, time management, skill building, and research collaboration.

### Bio

David Banks is a professor in the Department of Statistical Science at Duke University. He received his Ph.D. in statistics from Virginia Tech in 1984 and worked at Berkeley, Cambridge, Carnegie Mellon, the National Institute of Standards and Technology, the Department of Transportation, and the Food and Drug Administration before joining Duke.

*1. Nathaniel Burch: Data Assimilation and Smart Material Models*

*2. Alexander Chen: Modeling the Diffusion of Viruses Through Mucus*

**October 26, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: Nathaniel Burch (SAMSI) and Alexander Chen (SAMSI)

### Abstract 1

This talk is a brief introduction to both data assimilation and smart material models. Data assimilation combines information from imperfect model forecasts and noisy observations to produce an optimal estimate of the state of a physical system. Model discrepancy is often the main source of uncertainty, due largely to the nonlinear and chaotic nature of realistic models, yet no general framework for dealing with model discrepancy exists. One application of data assimilation is the control of systems involving smart materials. We introduce the homogenized energy model for smart materials and identify an area where data assimilation can be applied.
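As a concrete instance of combining a forecast with an observation, here is a minimal sketch of the Kalman filter, one standard data assimilation method, for a scalar state under linear, Gaussian assumptions. The forecast model `x_{t+1} = a x_t` and the values of `a`, `q`, and `r` are hypothetical. The forecast-error variance `q` is where model discrepancy enters in this crude form, as extra uncertainty added to each forecast.

```python
def kalman_analysis(x_f, p_f, y, r):
    """Combine a forecast x_f (error variance p_f) with an observation y
    (error variance r) of the same scalar state."""
    k = p_f / (p_f + r)          # Kalman gain: weight given to the observation
    x_a = x_f + k * (y - x_f)    # analysis (posterior) mean
    p_a = (1.0 - k) * p_f        # analysis variance
    return x_a, p_a


def assimilate(observations, x0, p0, a=0.9, q=0.1, r=0.5):
    """Forecast with the hypothetical linear model x_{t+1} = a * x_t,
    inflating the variance by q for model error, then correct with each
    observation in turn."""
    x, p = x0, p0
    history = []
    for y in observations:
        x, p = a * x, a * a * p + q          # forecast step
        x, p = kalman_analysis(x, p, y, r)   # analysis step
        history.append((x, p))
    return history
```

The analysis variance `p_f * r / (p_f + r)` is always smaller than both the forecast variance and the observation variance `r`. This is the sense in which combining the two sources improves on either one alone.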

### Abstract 2

The mucus layer constitutes one of the most important immune defense mechanisms in the human body. Acting as a buffer between invasive microorganisms and important tissue inside the body, mucus also houses mucin and antibodies, which combine to attack and immobilize viruses. Of great interest is a model, based primarily on diffusion, for the movement of viruses through the mucus layer and for their “survival probability”. Potential applications include engineering antibodies that lower the survival probability of viruses.
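A toy version of the survival question can be sketched with Monte Carlo. The assumptions here are hypothetical and are not the speaker's model. A virus starts at the mucus surface `x = 0`, which is treated as reflecting, and diffuses with coefficient `D`. Antibodies immobilize it at a constant rate `k`, and it survives if it reaches the epithelium at `x = L` first. For this simplified model, the survival probability has the closed form 1/cosh(L√(k/D)), the solution of D u'' = k u with u'(0) = 0 and u(L) = 1, evaluated at 0. The closed form gives a check on the simulation.

```python
import numpy as np


def survival_probability(D=1.0, L=1.0, k=1.0, n=20000, dt=1e-3, seed=0, max_steps=100000):
    """Fraction of n simulated viruses that cross a layer of thickness L
    before immobilization. Hypothetical model: reflecting Brownian motion
    from x = 0 (Euler steps of size dt), killed at constant rate k."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    active = np.ones(n, dtype=bool)
    reached = np.zeros(n, dtype=bool)
    p_kill = 1.0 - np.exp(-k * dt)        # immobilization probability per step
    for _ in range(max_steps):
        idx = np.flatnonzero(active)
        if idx.size == 0:
            break
        # diffusion step, reflected at the mucus surface x = 0
        x[idx] = np.abs(x[idx] + np.sqrt(2 * D * dt) * rng.standard_normal(idx.size))
        killed = rng.random(idx.size) < p_kill
        hit = (x[idx] >= L) & ~killed
        reached[idx[hit]] = True
        active[idx[hit | killed]] = False
    return reached.mean()


def survival_exact(D, L, k):
    """Closed form for the same toy model."""
    return 1.0 / np.cosh(L * np.sqrt(k / D))
```

In this toy model, a larger `k` (stronger antibody binding) lowers the survival probability. The Euler time step slightly underestimates survival, because crossings of `L` between steps are missed.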

*1. John Jakeman: A comparison of Gaussian Process models and Polynomial Chaos methods for Uncertainty Quantification*

*2. Stan Young: Principal components analysis versus non-negative matrix factorization*

**November 2, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: John Jakeman (Purdue) and Stan Young (NISS)

### Abstract 1

Gaussian Process (GP) models and Polynomial Chaos (PC) methods are frequently used to quantify uncertainty in numerical models. When a model is computationally intensive to evaluate, many uncertainty quantification (UQ) methods, such as Monte Carlo sampling, become infeasible. GP and PC methods have been frequently used to build model surrogates that can, in many cases, be constructed from relatively few model runs. These surrogates are extremely cheap to evaluate and thus drastically increase the speed of UQ analyses. PC and GP models vary in their construction and in the information they provide. In this talk I will discuss the differences between these two methods in the context of two simple problems.
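The following is a small side-by-side sketch of the two surrogate types for a single input. Its assumptions are chosen for illustration: the input is Uniform(-1, 1), and the PC surrogate uses Legendre polynomials fitted by least squares. The GP uses a zero-mean squared-exponential kernel with fixed, hypothetical hyperparameters. The sketch shows one difference in the information each method provides. The PC coefficients give the output mean and variance directly, while the GP gives a predictive variance at every input.

```python
import numpy as np
from numpy.polynomial import legendre


def pc_surrogate(x, y, degree):
    """Least-squares Legendre polynomial chaos for an input X ~ Uniform(-1, 1).
    Returns the coefficients plus the output mean and variance they imply."""
    V = legendre.legvander(x, degree)
    c, *_ = np.linalg.lstsq(V, y, rcond=None)
    k = np.arange(1, degree + 1)
    mean = c[0]                             # E[P_0] = 1, E[P_k] = 0 for k >= 1
    var = np.sum(c[1:] ** 2 / (2 * k + 1))  # E[P_k^2] = 1 / (2k + 1)
    return c, mean, var


def gp_predict(x_train, y_train, x_test, length=0.5, nugget=1e-6):
    """Zero-mean GP regression with a unit-amplitude squared-exponential
    kernel (fixed hyperparameters): posterior mean and variance at x_test."""
    def kern(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = kern(x_train, x_train) + nugget * np.eye(len(x_train))
    Ks = kern(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var
```

With `x = np.linspace(-1, 1, 10)` and `y = np.exp(x)`, the PC mean should be close to sinh(1), the exact mean of e^X for X ~ Uniform(-1, 1).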

### Abstract 2

Principal components analysis (PCA) is a standard exploratory technique for getting a sense of what is going on in a two-way table of data. The underlying math of PCA is the singular value decomposition (SVD). If the data contain interpretable structure, the projection onto a lower-dimensional space (the scores) makes sense. Interpreting the PCA loadings (the elements of the right singular vectors), however, is often problematic. If the two-way table consists of non-negative numbers, then non-negative matrix factorization (NMF) offers interpretive advantages. I review a simple way to compute an SVD that shows how the problems with PCA/SVD arise. An analysis of a small simulated data set illustrates the advantages of NMF, and I give some more complex examples from metabolomics. PCA is widely used; where possible, replacing it with NMF will typically greatly simplify the interpretation of the two-way table. I may also include a few slides on Variable Importance at the end of the talk.
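The contrast can be sketched with NumPy. PCA is computed from the SVD of the column-centred table, and its loadings can take either sign. NMF is computed with the Lee and Seung multiplicative updates, and its factors stay non-negative. The choice of NMF algorithm, the iteration count, and the function names are assumptions; the talk may use a different NMF algorithm.

```python
import numpy as np


def nmf(X, rank, n_iter=500, seed=0, eps=1e-12):
    """Lee and Seung multiplicative updates for X ~ W @ H with W, H >= 0
    (Frobenius loss). Non-negativity is preserved because every update
    multiplies by a ratio of non-negative quantities."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def pca_loadings(X, rank):
    """PCA via the SVD of the column-centred table.
    Returns (scores, loadings); rows of the loadings are right singular vectors."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank]
```

Each NMF component is a non-negative profile added to the others, so a row of the table can be read as a mixture of parts. PCA components can cancel one another, which is one source of the interpretive difficulty described above.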

*1. Ying Sun: Functional boxplots for visualization of complex curve or image data: an application to precipitation and climate model output*

*2. Chia Ying Lee: Techniques for rare event simulation*

**November 9, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: Ying Sun (SAMSI) and Chia Ying Lee (SAMSI)

### Abstract 1

In many statistical experiments, the observations are functions by nature, such as temporal curves or spatial surfaces/images, where the basic unit of information is the entire observed function rather than a string of numbers. For example, the temporal evolution of several cells, the intensity of medical images of the brain from MRI, the spatio-temporal records of precipitation in the U.S., and the output from climate models are all such complex data structures. Our interest lies in the visualization of such data and the detection of outliers. With this goal in mind, we have defined functional boxplots and surface boxplots. Based on the center-outward ordering induced by band depth for functional or surface data, the descriptive statistics of such boxplots are the envelope of the 50% central region, the median curve/image, and the maximum non-outlying envelope. In addition, outliers can be detected in a functional/surface boxplot by the empirical rule of 1.5 times the 50% central region, analogous to the 1.5 IQR rule for classical boxplots. We illustrate the construction of a functional boxplot on a series of sea surface temperatures related to the El Niño phenomenon and explore its outlier detection performance through simulations. As applications, the functional boxplot is demonstrated on spatio-temporal U.S. precipitation data for nine climatic regions and on climate general circulation model (GCM) output. Further adjustments of the functional boxplot for outlier detection in spatio-temporal data are also discussed.
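Here is a compact sketch of the functional boxplot for curves observed on a common grid. It ranks curves with the modified band depth, a common variant of band depth. Bands are formed by pairs of curves, and the depth counts the proportion of grid points at which each curve lies inside each band. The choice of variant, the function names, and the dictionary output are assumptions. The 50% central region is the envelope of the deepest half of the curves. A curve is flagged when, at any grid point, it leaves the central region inflated by 1.5 times the region's range.

```python
import numpy as np


def modified_band_depth(Y):
    """Modified band depth (bands from pairs of curves) of each row of Y
    (n curves x T grid points). At each grid point, a curve of rank r lies
    inside the bands of (r-1)(n-r) pairs with one curve below and one above,
    plus the n-1 pairs that contain the curve itself."""
    n = Y.shape[0]
    ranks = Y.argsort(axis=0).argsort(axis=0) + 1   # 1-based ranks; ties ignored
    inside = (ranks - 1) * (n - ranks) + (n - 1)
    return inside.mean(axis=1) / (n * (n - 1) / 2)


def functional_boxplot(Y, factor=1.5):
    """Median, 50% central region, outliers and non-outlying envelope."""
    depth = modified_band_depth(Y)
    order = np.argsort(-depth)                       # deepest curve first
    central = Y[order[: int(np.ceil(Y.shape[0] / 2))]]
    lower, upper = central.min(axis=0), central.max(axis=0)
    width = upper - lower
    lo_fence, hi_fence = lower - factor * width, upper + factor * width
    outliers = np.where(((Y < lo_fence) | (Y > hi_fence)).any(axis=1))[0]
    keep = np.setdiff1d(np.arange(Y.shape[0]), outliers)
    return {
        "median": order[0],
        "central_region": (lower, upper),
        "outliers": outliers,
        "max_envelope": (Y[keep].min(axis=0), Y[keep].max(axis=0)),
    }
```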

### Abstract 2

Monte Carlo sampling has been widely applied to compute expected values of quantities of interest, such as the probability that an event occurs. The efficiency of Monte Carlo sampling depends on the variance of the estimator relative to the quantity being computed, and for rare events, which have very small probabilities of occurrence, this relative variance can be extremely large. Consequently, methods to speed up Monte Carlo sampling have been actively studied, including importance sampling and particle splitting. In this talk, we discuss several such methods for computing rare event probabilities and, in particular, appeal to the theory of large deviations to guide the design of importance sampling and particle splitting schemes. We will also discuss some open problems and possible directions in rare event simulation.
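Here is a minimal importance-sampling sketch for the textbook rare event P(Z > a), with Z standard normal. The example and function names are hypothetical and are not from the talk. Large deviations theory points to the dominating point `a`, so samples are drawn from N(a, 1) and reweighted by the likelihood ratio. For a = 4 the probability is about 3.2 × 10⁻⁵, so naive sampling with 10⁵ draws expects only about three hits. The tilted estimator, by contrast, has a relative standard error below 1% at the same sample size.

```python
import math

import numpy as np


def tail_prob_is(a, n=100_000, seed=0):
    """Estimate P(Z > a), Z ~ N(0, 1), by sampling from the mean-shifted
    law N(a, 1) and reweighting with phi(x) / phi(x - a) = exp(-a x + a^2 / 2).
    Returns the estimate and its estimated relative standard error."""
    rng = np.random.default_rng(seed)
    x = rng.normal(a, 1.0, n)
    w = np.exp(-a * x + 0.5 * a * a) * (x > a)
    est = w.mean()
    rel_err = w.std(ddof=1) / (math.sqrt(n) * est)
    return est, rel_err


def tail_prob_naive(a, n=100_000, seed=0):
    """Plain Monte Carlo: fraction of standard normal draws above a."""
    rng = np.random.default_rng(seed)
    return np.mean(rng.standard_normal(n) > a)
```

The exact value for comparison is `0.5 * math.erfc(a / math.sqrt(2))`.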

*1. Andreas Aristotelous: Towards A Multiscale Agent Based Cartilage Regeneration Model (Overview)*

*2. Facundo Muñoz: Productivity tools for research*

**November 16, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: Andreas Aristotelous (SAMSI) and Facundo Muñoz (València)

### Abstract 1

Osteoarthritis is one of the most common degenerative joint diseases, resulting in the erosion of joint surfaces and loss of mobility. Recent research efforts have focused on tissue engineering as a promising approach to cartilage regeneration and repair. Mathematical modeling is important in the development and testing of various tissue restoration techniques. Agent-based models are widely used in the biomedical field. Of interest to us is the development of an agent-based model that describes the interactions occurring at different scales inside the cartilage tissue. Information from such a model could potentially be used to improve cartilage tissue engineering methods in the lab.

### Abstract 2

A review of some of the most useful tools that I have found for everyday research work. Some of these tools are software and others are web services, but they all make our work easier, faster, more organized, or simply better. They fall into four groups: tools aimed at Production, Collaboration, Management, and Synchronization. I expect that participants will also share their own tools, tips, and tricks that they have found helpful.

*1. Jenný Brynjarsdóttir: Calibration and model discrepancy*

*2. Jessi Cisewski: Inverse function-based inference: inverse sensitivity analysis*

**November 30, 2011, 1:30pm – 2:30pm**

SAMSI, room 150

Speakers: Jenný Brynjarsdóttir (SAMSI) and Jessi Cisewski (UNC Chapel Hill)

### Abstract 1

When quantifying uncertainty in computer models, it is important to account for all sources of uncertainty. One important source of uncertainty is model discrepancy: the difference between reality and the computer model output. However, incorporating model discrepancy into a statistical analysis of computer models is challenging because the discrepancy is confounded with the calibration parameters. In this talk we explore a simple example, consider ways to account for model discrepancy in a Bayesian setting, and examine the effect on the calibration of a physical parameter.
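A hypothetical toy example, not necessarily the one in the talk, makes the confounding concrete. The simulator is `theta * x`, while "reality" bends away from it as `theta * x / (1 + x / 20)` with `theta = 0.65`. Calibrating `theta` while ignoring the discrepancy gives a biased value. Next, add a discrepancy term `b * x` with independent Gaussian priors on `theta` and `b`; this is a deliberately crude stand-in for the Gaussian-process discrepancy commonly used. Then `theta` and `b` enter the model identically, so the data pin down only their sum, and the posterior correlation between them is close to -1.

```python
import numpy as np

# Hypothetical toy set-up: simulator eta(x, theta) = theta * x, while
# "reality" is zeta(x) = theta * x / (1 + x / 20).
THETA_TRUE = 0.65
x = np.linspace(0.1, 4.0, 20)
y = THETA_TRUE * x / (1 + x / 20)        # noise-free field data


def calibrate_ignoring_discrepancy(x, y):
    """Least-squares theta for y ~ theta * x (no discrepancy term)."""
    return float(x @ y / (x @ x))


def posterior_with_linear_discrepancy(x, y, noise_sd=0.01, prior_sd=1.0):
    """Gaussian posterior of (theta, b) for y = theta*x + b*x + noise with
    independent N(0, prior_sd^2) priors; b*x plays the role of a (too
    simple) discrepancy function."""
    X = np.column_stack([x, x])          # identical columns: not identifiable
    prec = X.T @ X / noise_sd**2 + np.eye(2) / prior_sd**2
    cov = np.linalg.inv(prec)
    mean = cov @ X.T @ y / noise_sd**2
    return mean, cov
```

In the second function, the posterior variance of `theta` stays at roughly half its prior variance no matter how many data points are added. The data constrain only `theta + b`.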

### Abstract 2

The Inverse Function-Based Inference working group will explore the inverse sensitivity problem within a generalized fiducial framework. In this talk, I will describe the specific inverse sensitivity problem the group will be exploring. The setting involves an inverse mapping onto the input space that is completely determined by the deterministic model. An existing computational, measure-theoretic solution will be introduced.
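One simple measure-theoretic idea can be sketched as follows. It is an illustration under stated assumptions, not the working group's method. Given a probability for each output bin, each input sample that maps into a bin receives an equal share of that bin's probability. The observed output distribution cannot say how probability is spread within a contour (the set of inputs giving the same output), so some ansatz must fix it. Here the ansatz is "uniform with respect to the sampling distribution". The model `q`, the bins, and the function name are hypothetical.

```python
import numpy as np


def inverse_sensitivity(q, samples, bin_edges, bin_probs):
    """Assign a probability to each input sample so that the push-forward
    through the model q matches bin_probs on the output bins. Within each
    output bin, probability is spread uniformly over the samples that land
    there (an ansatz: the data do not determine it)."""
    vals = q(samples)
    bins = np.digitize(vals, bin_edges) - 1   # output-bin index per sample
    probs = np.zeros(len(samples))
    for b, p in enumerate(bin_probs):
        in_bin = bins == b
        count = int(in_bin.sum())
        if count == 0:
            if p > 0:
                raise ValueError(f"no input samples reach output bin {b}")
            continue
        probs[in_bin] = p / count
    return probs
```

By construction, pushing the resulting input probabilities forward through `q` reproduces the prescribed bin probabilities exactly.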