Past Working Groups

2015

Working Group Description
Multiple Sources of Bias Contact: bias@samsi.info
Working Group Leaders: Sandy Zabell, Cliff Spiegelman
Login: https://sakai.duke.edu/sakai-login-tool/container
Pattern Evidence Contact: patternevidence@samsi.info
Working Group Leaders: Karen Kafadar, Anil Jain
Login: https://sakai.duke.edu/sakai-login-tool/container
Forensic Experiments Contact: experiments@samsi.info
Working Group Leader: Dennis Lin
Login: https://sakai.duke.edu/sakai-login-tool/container
Statistical Evidence Contact: statisticalevidence@samsi.info
Working Group Leaders: Colin Aitken, Anjali Mazumder
Login: https://sakai.duke.edu/sakai-login-tool/container
Possible Matches Contact: matches@samsi.info
Working Group Leader: Len Stefanski
Login: https://sakai.duke.edu/sakai-login-tool/container
Ballistic Images Contact: ballistic@samsi.info
Working Group Leaders: Nell Sedransk, Cliff Spiegelman, Sarena Wiesner
Login: https://sakai.duke.edu/sakai-login-tool/container
Forensic Evidence Contact: evidence@samsi.info
Working Group Leader: Cedric Neumann
Login: https://sakai.duke.edu/sakai-login-tool/container
Clinical Brain Imaging Contact: brain@samsi.info
Working Group Leader: Ciprian Crainiceanu, Johns Hopkins University
Login: https://sakai.duke.edu/sakai-login-tool/container
Computational Approaches to Large-scale Inverse Problems with Applications to Neuroscience Contact: inverse@samsi.info
Working Group Leader: Arvind Saibaba, North Carolina State University
Login: https://sakai.duke.edu/sakai-login-tool/container
Understanding Neuromechanical Processes in Locomotion with Physical Modeling and Network Analysis Contact: neuromechanical@samsi.info
Working Group Leaders: Laura Miller, UNC and Katie Newhall, UNC
Login: https://sakai.duke.edu/sakai-login-tool/container
Mathematical and Statistical Approaches to Modeling Brain Networks Contact: networks@samsi.info
Working Group Leaders: Rob Kass, Carnegie Mellon University; Uri Eden, Boston U; Mark Kramer, Boston U
Login: https://sakai.duke.edu/sakai-login-tool/container
Theory of Neural Networks: Structure and Dynamics Contact: theoretical@samsi.info
Working Group Leaders: Carina Curto, PSU; Brent Doiron, U. of Pittsburgh; Chris Hillar, MSRI
Login: https://sakai.duke.edu/sakai-login-tool/container
Acquisition, Reconstruction, and Processing of MRI Data Contact: mri@samsi.info
Working Group Leader: Daniel Rowe, Marquette University
Login: https://sakai.duke.edu/sakai-login-tool/container
Imaging Genetics Contact: genetics@samsi.info
Working Group Leader: Hongtu Zhu, UNC
Login: https://sakai.duke.edu/sakai-login-tool/container
Structural Connectivity Contact: connectivity@samsi.info
Working Group Leaders: David Dunson, Duke University; Hongtu Zhu, UNC
Login: https://sakai.duke.edu/sakai-login-tool/container
Functional Imaging Methods and Functional Connectivity Contact: functional@samsi.info
Working Group Leaders: Hernando Ombao, UCI; John Aston, University of Cambridge
Login: https://sakai.duke.edu/sakai-login-tool/container
Big Data Integration in Neuroimaging Contact: neurobigdata@samsi.info
Working Group Leaders: Martin Lindquist and Timothy Johnson
Login: https://sakai.duke.edu/sakai-login-tool/container
Analysis of Optical Imaging Data Contact: optical@samsi.info
Working Group Leader: Mark Reimers
Login: https://sakai.duke.edu/sakai-login-tool/container

2014

Working Group Description
Fall Course 2014-2015: Stochastic Process Modeling for Ecological Processes
Schedule: Wednesdays 4:30 pm at SAMSI, Research Triangle Park, NC, beginning September 3, 2014. No class during the week of Thanksgiving (Wednesday, November 26, 2014). Last class: December 3, 2014.
Instructors: L. Miller, A. Lloyd, D. Adalsteinsson, J. Clark, A. Gelfand
The central theme of this proposed course is the use of stochastic process models to introduce desired behaviors into models for complex ecological processes. Objectives include both emulation and data analysis. Specifically, there will be a focus on (stochastic) PDEs for dynamical systems with application to invasive and emerging infections (Lloyd) and to dispersal and spatial effects in competition models (Miller). There can be demanding coding aspects to this work with the possibility of some flipped classroom “lectures” (Adalsteinsson, Miller). On the more statistical side, the emphasis would be on hierarchical modeling with examples of hierarchical models for inference on seed dispersal and PDEs for population growth (Clark) supplemented with space-time diffusions for invasive species (Gelfand). Model development here would be complemented with some computer labs in R.
Registration for this course is being processed through your respective university: UNC-CH: STOR 930 Section 001 (cross-listed with MATH 892 Section 001); Duke: STA 790.04; NCSU: MA 810.002
For additional information about this course, send e-mail to eco@samsi.info

2013

Working Group Description
Kepler Working Group This group is specifically geared toward starting discussions to help people prepare for the actual meeting in June 2013. There are already discussion threads where you are encouraged to: introduce yourself, discuss ideas for focused working groups, and suggest background reading that may be helpful for other participants. Please contribute via the comments section of each discussion. Also, feel free to start a discussion topic of your own. (Membership is limited to invited participants.)
Online Streaming and Sketching
Purpose: Methodology and fast algorithms for computing leverage scores, with application to astronomy and genomics.
Preliminary list of topics:
* Leverage scores: Computation, fast approximation, sensitivity, numerical stability of algorithms, behavior under sketching
* Randomized low-rank approximations: Subset selection, CUR, Nyström, PCA, robust PCA, subsampled regression, regression on manifolds, construction of robust linear models, windowed and online streaming approaches
* Randomized sketching and importance sampling strategies: Methodology and numerical computation
* Local vs global: Eigenvector and invariant subspace localization, eigen-analysis of data connectivity matrices, numerical stability of streaming and updating methods, relation to generalized eigenvalue problems
* Data fusion/integration: Robust and fast/streaming methods with application to galaxy formation and evolution
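As a small illustration of the central quantity in this topic list, the sketch below computes exact leverage scores for a tiny matrix. It assumes the standard definition (the leverage score of row i is the squared norm of row i of Q in a thin QR factorization of the matrix) and uses modified Gram-Schmidt in plain Python. The function name `leverage_scores` and the example matrix are hypothetical, and this is not one of the fast or streaming algorithms the group set out to study.

```python
import math


def leverage_scores(rows):
    """Leverage scores of the rows of a tall matrix with full column rank.

    Uses modified Gram-Schmidt to build an orthonormal basis Q for the
    column space; score i is the squared norm of row i of Q.
    Suitable for small illustrative inputs only.
    """
    n, d = len(rows), len(rows[0])
    # Store the matrix column by column as floats.
    cols = [[float(rows[i][j]) for i in range(n)] for j in range(d)]
    q_cols = []
    for v in cols:
        # Remove the components along the basis vectors found so far.
        for q in q_cols:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm < 1e-12:
            raise ValueError("matrix is numerically rank deficient")
        q_cols.append([vi / norm for vi in v])
    return [sum(q[i] ** 2 for q in q_cols) for i in range(n)]


# Hypothetical design matrix: an intercept column and one predictor,
# where the last row is far from the others.
A = [[1, 0], [1, 1], [1, 2], [1, 10]]
scores = leverage_scores(A)
```

The scores sum to the number of columns, and the outlying last row receives the largest score, which is why leverage scores are a natural importance-sampling distribution for rows.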
2013-14 Course: Geometric and Topological Summaries of Data and Inference
The course will focus on geometric and topological summaries computed from data that are routinely generated across science and engineering. The focus is on modeling objects that have geometric or topological structure. Examples include curves, or surfaces such as bones or teeth, or objects of higher dimension such as positive definite matrices, or subspaces that describe variation in phenotypic traits due to genetic variation, or the geometry of multivariate trajectories generated from cellular processes. Specific topics will include the following.
(1) Geometry in statistical inference: Material covered will include recent work in machine learning and statistics on the topics of manifold learning, subspace inference, factor models, and inferring covariance/positive definite matrices. Applications will be used to highlight methodologies. The focus will be on methods used to reduce high-dimensional data to low-dimensional summaries using geometric ideas.
(2) Topology in statistical inference: Material covered will focus on probabilistic perspectives on topological summaries such as persistent homology and on inference of topological summaries based on the Hodge operator and the Laplacian on forms. Again, applications will be used to highlight methodologies.
(3) Random geometry and topology: Material will cover the geometry and topology induced by random processes. Topics include the topology of random clique complexes, random geometric complexes, and limit theorems for Betti numbers of random simplicial complexes.
(4) Applications of the Laplacian operator in data analysis: Material will cover the various uses of the Laplacian in data analysis, including manifold learning, spectral clustering, and Cheeger inequalities. More advanced topics will include the Hodge operator or combinatorial Laplacian and applications to data analysis including decomposing ranked data into consistent and inconsistent components, inference of structure in social networks, and decomposing games into parts that have Nash equilibria and parts that cycle.
Prerequisites: Background in calculus and linear algebra and some reasonable foundation in statistics and probability.
Course Format: The main instructor will be Sayan Mukherjee but there will be several guest lecturers, with material and instructors paralleling certain of the major themes in the 2013-2014 year-long SAMSI program on Low-Dimensional Structure in High-Dimensional Systems (LDHD). All course updates including example projects, reading material, and lecture slides will be posted at http://www.stat.duke.edu/~sayan/SAMSI/.
Fall 2013 Course: Computational Methods for Social Sciences Coordinating Instructor: Richard L. Smith (Department of STOR, UNC) Lead Instructors: David Banks, Department of Statistical Science, Duke; Thomas Carsey, Department of Political Science and Odum Institute, UNC; Peter Mucha, Departments of Mathematics and Applied Physical Sciences, UNC; Jerry Reiter, Department of Statistical Science, Duke Time and Place: Wednesdays from 4:30pm-7:00pm beginning August 28, 2013; Statistical and Applied Mathematical Sciences Institute, 19 T.W. Alexander Drive, RTP, N.C. First Class is Wednesday, August 28, 2013 Course description: The Statistical and Applied Mathematical Sciences Institute (SAMSI) is hosting a year-long research program on Computational Methods for Social Sciences. As part of this program, there will be an advanced graduate course on the topics of the program. The syllabus will cover the three main research themes in the program: (a) Social Networks; (b) Statistical Methods for Censuses and Surveys; (c) Agent-based models. The course will meet once a week at SAMSI and will consist primarily of lectures by senior researchers. The course will be suitable for advanced graduate students in Mathematics, Statistics, Biostatistics or quantitative social sciences (e.g. Sociology, Political Science, Psychology). There are no specific prerequisites. Assessment will be by class presentations or a written project, the exact format to be determined partly based on the number of students participating. For additional information about the course, send e-mail to cmss@samsi.info.
CMSS: Social Networks
CMSS: Causal Inference
CMSS: Censuses and Surveys This working group is investigating several methods related to computation in surveys and censuses. The group is particularly interested in combining information from multiple sources. One specific application is record linkage. How does one account for uncertainty in imperfect linkage? How effective is creating a joint distribution from available information compared to imperfect, multi-way record linkage? What should one do with survey weights in linked files? Another specific application is merging big data with probability samples. Can we use information from surveys to help generalize analyses from large-scale administrative/private or organic data?
CMSS: Weighting in Surveys
CMSS: Agent-based Models
High-dimensional Graphical Models Graphical models are now a standard tool in statistics, and high-dimensional graphical models have been studied extensively. One of the ideas behind graphical models is to break down a high-dimensional problem into several low-dimensional ones. The difficult problem is model selection: one way to do that is to identify the low-dimensional components and see how they fit together. Meinshausen and Bühlmann (2006) were among the first to introduce the local method of neighborhood regression for model selection among graphical Gaussian models. More recent approaches include local l_1 regularized logistic regression for model selection among discrete Ising models, the concept of sparse local separators, and the usage of neighborhood structure. One of the activities of the working group will be a review of the current literature on local methods for structure estimation in high dimension. Another will be the exploration of geometric and topological local methods for the identification of structure in graphical models. Suggestions for other activities are, of course, welcome.
Data Analysis on Hilbert Manifolds and their Applications The theoretical focus of this Working Group is nonparametric statistics on Hilbert manifolds and dimensionality reduction from infinite dimension to low dimensions, as small as 1. This will include an extension of the CLT for iid variables to infinite dimensional Hilbert manifolds and beyond (to infinite dimensional stratified spaces), as well as the neighborhood hypothesis testing methodology, extending recent results to two or multiple samples on Hilbert manifolds or even to infinite dimensional stratified spaces. Applications of data analysis on Hilbert manifolds that may be discussed, depending on the WG participants, include any of the following: i. MRI imaging, including MRI, DTI, or fMRI; ii. CT imaging; iii. eye medical imaging, including stereo imaging; iv. 2D and 3D scene recognition from similarity shapes of curves and surfaces; v. projective shape of 3D scenes reconstructed from digital camera imaging; vi. color and texture imaging data; vii. spatial and temporal data on the geoid; viii. plate tectonics data and continental drift; ix. paths of eyes of storms on planet Earth; x. volcanic activity, earthquakes, and other Earth Sciences data; xi. astronomy data; xii. solar system data; xiii. DNA based data.
Nonlinear Low-dimensional Structures in High-dimensions for Biological Data Many problems in biology, particularly at the molecular level, involve very high dimensions. Traditional methods, such as principal component analysis, provide an important first step toward understanding the underlying lower dimensional structure. Nevertheless, finding even lower-dimensional and possibly nonlinear structures requires more qualitative procedures. Some possible candidates, among others, include geodesic principal component analysis, SiZer analysis, and persistent homology. The aim of this Working Group will be to describe and understand the nature of certain biological data coming from areas such as brain artery tree networks, gene networks, biomechanical motion data, and metagenomics, to name a few. We will explore and formulate strategies to best apply these qualitative procedures. In addition, a recent insight into a connection between SiZer analysis and persistent homology will be discussed.
Inference: Dimension Reduction Dimension reduction methods search for low dimensional structure in high dimensional data. Although unsupervised methods such as PCA have been used for over 80 years, new issues in robustness, very high dimensionality p >> n, nonlinear structure, and so on continue to provide rich sources of statistical and computational issues. Supervised methods, such as Sliced Inverse Regression, which look for low dimensional association between predictors and response have been developed within the past 35 years, and many useful extensions continue to be developed. Performing valid inference to correctly account for the impact of dimension reduction including variable selection also poses problems that have not been thoroughly explored. This working group will work on problems of extending unsupervised and supervised methods to some of the interesting new areas such as machine learning, functional data, discrete data, and complex data structures such as networks. As well, we will work on inferential issues.

2012

Working Group Description
SAMSI Fall Course – Operations Research Methods in Healthcare
Principal Instructor: V. Kulkarni
Course Day and Time: Course will be held at SAMSI in RTP on Wednesdays, 4:30-7:00 p.m. in Room 150.
Schedule: First class Wednesday, September 5, 2012; last class day, Thursday, December 6, 2012
Course Description: This is a seminar-style course treating application of operations research methods such as stochastic modeling, queuing theory (including fluid models), optimization and simulation to problems in healthcare. Potential problems to be studied are data-based design of healthcare operations, patient flow, scheduling of facilities and personnel, management of transplant lists, mass casualty events and comparative effectiveness research. Students will be expected to read and make presentations of material from the relevant literature.
Registration for this course is being processed through your respective university: Duke: STA 790-02; NCSU: MA 810.002; UNC: STOR 892.1
Questions about the course or the Healthcare program should be emailed to healthcare@samsi.info
MD Imaging The focus of the Imaging Working Group will be on methodological and computational questions of statistics, mathematics and computer science posed by imaging science and technology, with applications to either astronomy, high energy physics, the environment, health sciences or other areas. This will be a venue to discuss the challenges, motivate and develop innovative approaches towards computing environments, analyses, methods, algorithms and tools, in relation to imaging science and innovative applications.
MD Online Streaming & Sketching
Working Group Leaders: Petros Drineas, Ilse Ipsen, and Michael Mahoney
Webmasters: John Holodnak and Kevin Penner
Purpose: Development and analysis of fast randomized algorithms for computing leverage scores, and their application
Preliminary, Partial List of Topics:
* Approximating leverage scores for L2 and other regression problems: Online, streaming, incremental streaming algorithms
* Numerical analysis: Sensitivity of leverage scores, numerical stability of algorithms
* Applications in astronomy: Characterization of streaming & time dependent aspects of low rank approximations; incremental computation of leverage scores
* Applications in feature selection: How to distinguish among almost identical columns with high leverage scores (RRQR factorization, clustering); derivation of formal bounds

2011

Working Group Description
Statistics of Extremes – Climate and Methodology UQ This group will examine the characterization of extreme events from a statistical point of view. The overall group will be composed of several project groups, each with a specific focus. The project groups may be application-driven, may work to develop new statistical methodologies, or may work to further the theory on which extreme value analyses rely. The overall group will provide a structure for the project groups as well as provide an environment for investigating aspects which have general interest.
Parallel Computing Issues – Climate UQ
* adaptive design of experiments, and resolution issues
* embedding of emulated sub-models to resolve sub-processes that are now computationally prohibitive
* Python open source software open platform
Simulation of Rare Events – Methodology UQ This group will focus on methods for simulating rare events in high-dimensional physical systems, especially PDE models. We will explore the use of importance sampling and large deviation theory in order to identify important mechanisms or configurations of the parameters that lead to rare events. We also will consider the role of asymptotic analysis in constructing effective sampling weights for such computations.
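To make the role of importance sampling concrete, here is a minimal sketch on a toy problem rather than a PDE model: estimating the probability that a standard normal variable exceeds a high threshold. It assumes the classical mean-shift proposal suggested by large deviation arguments (sample from a normal centered at the threshold and reweight by the likelihood ratio). The function name `rare_event_probability`, the threshold 4.0, and the sample size are illustrative choices, not taken from the working group's material.

```python
import math
import random


def rare_event_probability(threshold, n_samples, seed=0):
    """Estimate P(Z > threshold) for Z ~ N(0, 1) by importance sampling.

    Samples are drawn from N(threshold, 1), so roughly half of them land
    in the rare region; each hit is weighted by the density ratio
    phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


estimate = rare_event_probability(4.0, 20000)
# Closed-form reference value for comparison.
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
```

Plain Monte Carlo with the same 20,000 samples would usually see no exceedances at all, since the exact probability is about 3.2e-5; the shifted proposal is what makes the estimate usable.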
Data Assimilation – Methodology UQ Data assimilation is the process of fusing information from imperfect models, noisy measurements, and priors to produce an optimal representation of the state of a physical system. Data assimilation can be interpreted and carried out in a Bayesian framework. Practical methods for large-scale systems include suboptimal and ensemble Kalman filter approaches, optimal interpolation, and three- and four-dimensional variational methods. This working group will focus on emerging problems that include, but are not limited to: new computational algorithms, modeling of model errors, impact of observations, and quantification of posterior uncertainties. There will be strong ties between theory and applications investigated throughout the program.
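The Bayesian interpretation can be sketched in its simplest form: a scalar state with a Gaussian prior (the model forecast) and a direct, noisy Gaussian observation. Under those assumptions the posterior is Gaussian and is given by the Kalman update below. The function name `gaussian_update` and the numbers are hypothetical, and real systems involve high-dimensional states, observation operators, and model error that this sketch ignores.

```python
def gaussian_update(prior_mean, prior_var, obs, obs_var):
    """One Bayesian (Kalman) analysis step for a scalar state.

    prior_mean, prior_var: forecast mean and variance from the model.
    obs, obs_var: a direct measurement of the state and its noise variance.
    Returns the posterior mean and variance.
    """
    # The gain weighs the observation by the relative uncertainty of the prior.
    gain = prior_var / (prior_var + obs_var)
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var


# Equal confidence in forecast and measurement: the analysis lands halfway
# between them and the variance is halved.
mean, var = gaussian_update(0.0, 1.0, 2.0, 1.0)  # -> (1.0, 0.5)
```

Ensemble Kalman filters replace `prior_var` with a sample covariance estimated from an ensemble of model runs, and variational methods obtain the same posterior mode by minimizing a cost function.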
Stochastic to Deterministic Models and Back Again – Methodology UQ
Models of complex multiscale and/or multiphysics phenomena often require combining stochastic and deterministic models:
* Direct coupling of stochastic and deterministic models, e.g. molecular dynamics with a continuum model
* Stochastic parameterization, with parameters determined by a stochastic model simulation or other statistical models
Such models are often used to predict “engineering scale” questions from limited microscale information. Of course, this is a classic analysis/modeling problem. The working group will focus on computational issues, including:
* Rigorous formulation and analysis of coupling mechanisms and their discretizations
* Numerical treatment of averaging and computed expectations and the effect of approximations
* A posteriori error analysis, resolutions required in different components, adaptive computation
* Rigorous treatment of feedback between stochastic and deterministic models, e.g. nonlinear iterative methods, convergence
Model Validation – Methodology UQ Model validation refers to the process of assessing the accuracy with which mathematical models can predict physical events, or, more specifically, quantities of interest observed in physical phenomena. Validation should be a prerequisite for predictive modeling, which often forms the basis for decision-making. This working group will study the principles, merits, and limitations of various probabilistic approaches to model validation. Special emphasis will be laid on methods for splitting datasets for calibration and validation purposes, on the analysis of model discrepancies, on the development of rejection metrics, and on any other issues of interest raised during the working group meetings. The working group is organized and managed by Serge Prudhomme (ICES, UT Austin), Sujit Ghosh (Statistics, NCSU), and Jan Hannig (STOR, UNC).
Multiphysics – Methodology UQ
Multiphysics models comprising compositions of models of several physical processes, often at different scales, dominate many areas of science and engineering. The working group will study UQ topics for MP models (including both forward and inverse topics).
Research issues:
* Complex feedback between physical processes, highly nonlinear responses
* Complex and unresolved coupling mechanisms
* Different kinds and representations of uncertainty for different components, and complex interactions between sources of uncertainty and error
* Complex, high dimension parameter space
* Bifurcations and discontinuous model changes
* High performance computational issues
Approximating Computationally Intensive Functions and Sampling Design in High Dimensions – Methodology UQ
This is a core problem in constructing surrogates, with application to uncertainty propagation, inference, prediction, and design.
Relevant research questions:
* Cross-examination of different methods: Projection, regression, interpolation, L1 minimization, Gaussian process/kriging, etc.
* Appropriate measures of performance/accuracy and their dependence on the intended use of the surrogate model
* Error analysis and convergence properties
* Sparse representations: l1 minimization, pursuit algorithms, low-rank approximation
* “Optimal” choices of nodes/design points for different methods
* Adaptive approaches: a posteriori error estimates; derivative properties; dimension reduction; additional optimization of nodes; ANOVA; relation to sequential design of experiments
* Interpreting and combining uncertainty information from stochastic surrogates (e.g., Gaussian process variance) and deterministic error bounds
* Deriving optimality criteria and search algorithms that are good for high dimensions
* Borrowing existing theoretical results in high dimensional statistics (Donoho, etc.) to shed light on the structure of “optimal” designs in high dimension
Additional issues:
* Lack of regularity: discovering and approximating discontinuities in high dimensions
* Incorporating gradient information
* Enforcing constraints on output and input domains
* Fault tolerance and missing samples
Inverse Function-based Inference – Methodology UQ
Jan Hannig (lead), D. Estep, Troy Butler (U. Texas), Simon Tavener (CSU)
The working group will study the use of set-valued inversion of models for inference.
Research issues:
* Approximation of set-valued inverses in complex spaces
* Computation of inverse measures in parameter space
* Convergence and accuracy of computed inverse measures
* Theoretical issues regarding inversion of multiple observations
* Relation to fiducial inference and Dempster and Shafer calculus
* Intrusive and non-intrusive algorithms, dimension-benign computational algorithms
Surrogate Models – Methodology UQ This working group focuses on the exploration of properties, utility, and performance of two classes of model surrogates, namely Polynomial Chaos and Gaussian Process surrogates. The study will be done in the context of specific model problems with a range of difficulty involving nonlinearity and dimensionality. Test problems will include both algebraic functions as well as simple ODE/PDE problems.
Engineered Systems – Engineering UQ
Sustainability – Engineering UQ
Materials – Engineering UQ
Renewable Energy – Engineering UQ The group will consider uncertainty quantification issues arising in specific applications linked to renewable energy. In particular, we will study biofuels and wind farms. Other aspects involve the inclusion, in a UQ framework, of factors such as technological advances and/or regulations.
Nuclear Energy – Engineering UQ
Geosciences – Geosciences UQ
Data Assimilation in IPCC Level Models – Climate UQ In the IPCC AR4, the results were based on runs from ca. 24 models. These were built and run at climate research centers around the world and are each integrated Earth system models that comprise many components, including atmosphere, ocean, ice and land. The so-called dynamical core of such models is a computational model covering both the atmosphere and ocean and based on the primitive equations of GFD. While this is essentially computational, data come into the process of forming the final model. This incorporation occurs at a number of stages of the model development, including parametrization of sub-grid scale effects and model tuning. The process is not, however, done systematically and current practice is not thought of as “data assimilation.” There seems to be a growing realization that DA will have a significant role to play in future climate model development. This is, in part, driven by the need to quantify uncertainty in the model predictions. Nevertheless, there is not a consensus as to how DA should be used in these large-scale climate models. This working group will consider the issues involved in formulating a plan for DA in such models. The first step will be to understand how such a model is put together and uncover all the steps where data is currently used in the model formation. For this purpose, we will look at the latest CESM from NCAR.
Numerical Methods for Uncertainty Quantification – Spring Part 2
Principal Instructors: Various
Course Day and Time: Course will be held at SAMSI in RTP on Wednesdays, 4:30-7:00 p.m. in Room 150.
Schedule: First class Wednesday, January 18, 2012; last class Wednesday, April 25, 2012
This course focuses on numerical methods for stochastic computation and uncertainty quantification (UQ). It is a two-semester course, where the first semester focuses on fundamental materials for UQ computing and the second semester on more advanced research materials. The main topics covered in the second semester include:
* advanced numerical techniques for SPDE: adaptive methods and compressive sensing
* propagation of probability distributions
* Bayesian inference
* data assimilation
* model calibration
Prerequisites: Numerical linear algebra, numerical methods for ordinary and partial differential equations; programming skill in one language, e.g., C/C++, FORTRAN, or Matlab.
The course is open to all levels of graduate students in Mathematics, Statistics, as well as to those in other departments of sciences and engineering. Senior level undergraduate students with outstanding background are also considered.
Registration for this course is being processed through your university. Duke: STA 294-01; NCSU: MA 810.002; UNC: STOR 891-001
Questions about the course or the UQ program should be emailed to uq@samsi.info.

2010

Working Group Description
Dynamics OF Networks Complex Networks Program working group.

The dynamics of networks working group is exploring a variety of mathematical and statistical approaches for describing and understanding the changing connection topology of networks over time, the interplay of these network dynamics with other dynamic processes on the network, and the connections between these different mathematical and statistical methodologies.

Sampling / Modeling / Inference CN Program working group.

The Working Group on Sampling/Modeling/Inference in networks aims to place the current state of knowledge on these inter-related tasks, in the specific context of networks, on a more principled and integrated mathematical and statistical foundation. We are pursuing this goal by focusing on a handful of specific prototype problems in the context of certain application areas, ranging from information networks to animal communities to neuroscience.

Dynamics ON Networks CN Program working group.

Random graphs are useful models of social and technological networks. To date most of the research in this area has concerned geometric properties of the graphs. This working group will focus on processes taking place ON the network. In particular we are interested in how their behavior on networks differs from that in homogeneously mixing populations or on regular lattices of the type commonly used in ecology and physics.

Geometrical / Spectral Analysis CN Program working group.

This working group is concerned with the following topics: detection of communities in networks, multiscale spectral methods for the analysis of the geometry of networks, algorithms that reduce graphs to simpler graphs in order to speed up certain optimization problems, metrics for comparing graphs, and multiscale homogenization of random walks. These topics have applications to biology and to the spread of “epidemics” in financial networks.