Opening Workshop, Massive Datasets Program – September 9-12, 2012

Partial list of research topics

Data visualization and analytics

High-speed visualization of high-dimensional datasets; data representation, extraction, integration and transformation; real-time visual interaction; spatio-temporal data mining

Online streaming and sketching

Algorithm paradigms for massive datasets (streaming, online, randomized); scalability; filtering; anomaly detection; data structures for fast computation of statistics; database-enabled machine learning tools; computing environments and programming models (GPUs, cloud computing, custom chips)

Large-scale optimization

Convex optimization (sparse modeling and compressed sensing, matrix completion); online optimization (streaming data, online learning, control theory); distributed optimization (parallel and GPU computation, data fusion); machine learning; high-dimensional models

Inference

Dimension reduction for high-dimensional data (feature selection, sub-sampling and screening, sparse PCA); predictive inference and multiple testing (false discovery rates, uncertainty in prediction); high-dimensional MCMC methods for posterior inference (particle filters, hybrids with optimization methods)

Imaging

Rapid registration and segmentation (GPUs, distributed computing); multiple testing and inference for large-scale imaging data (sky surveys, satellite images, false discovery rate with dependence); dynamic imaging (streaming data, spatio-temporal models)

Systems and architectures

Reliability; resilience; probabilistic computing, multiple precision; real-time methods; variable data flows; hardware platforms

High-energy physics

Reconstruction and analysis of particle collisions from the LHC; pattern recognition and parameter extraction; simulations to estimate error rates; parameter estimation for large numbers of parameters; maximum likelihood estimators

Astronomy

Statistics on remote resources; computations on special purpose architectures and GPUs; communication avoiding methods; randomized and online algorithms; detection and classification of transient events and outliers; Bayesian inference and machine learning; high-dimensional models with empirical priors; non-parametric models; visualization of large high-dimensional datasets

Environment and climate

Production, validation, processing, distribution and integration of data; data fusion and remote sensing; algorithms for large distributed datasets; spatial or spatio-temporal statistics

Schedule and Supporting Media

Sunday, September 9, 2012
Radisson RTP

Time Description Speaker
8:30-9:00 Registration and Continental Breakfast
8:50-9:00 Welcome and Introduction – Ilse Ipsen, N.C. State University/SAMSI
9:00-10:00 Statistical Methods in Astronomy – Tamas Budavari, Johns Hopkins University
10:00-10:30 Break
10:30-11:30 Mining Massive Datasets: A (randomized) Linear Algebraic Perspective – Petros Drineas, Rensselaer Polytechnic Institute
11:30-1:00 Lunch
1:00-2:00 Visual Analytics for Knowledge Discovery in High Dimensional Data – Haesun Park, Georgia Institute of Technology
2:00-2:30 Break
2:30-3:30 Optimization Techniques for Statistical Analysis on Large Datasets – Stephen Wright, University of Wisconsin
3:30-4:00 Break
4:00-5:00 Resampling Methods for Massive Data – Michael Jordan, University of California, Berkeley

Monday, September 10, 2012
Radisson RTP

Time Description Speaker
8:30-8:55 Registration and Continental Breakfast
8:55-9:00 Welcome
Session: Inference
9:00-9:45 Stability – Bin Yu, University of California, Berkeley
9:45-10:30 On Personalized Information Filtering – Xiaotong Shen, University of Minnesota
10:30-11:00 Break
11:00-11:45 Resting State Brain Functional Connectivity Data: Progress, Future Challenges and Data – Brian Caffo, Johns Hopkins University
11:45-12:15 Panel
Chair: Bill Eddy, Carnegie Mellon University
Alex Gray, Georgia Tech
Karen Kafadar, Indiana University
Bo Li, Purdue University
12:15-1:30 Lunch
Session: Imaging
1:30-2:15 Numerical Methods for Large Scale Inverse Problems in Image Reconstruction – Jim Nagy, Emory University
2:15-3:00 Iterative Screening and Estimation – Jianqing Fan, Princeton University
3:00-3:30 Break
3:30-4:15 Supernova Discovery in the Era of Data-Intensive Science – Rollin Thomas, Lawrence Berkeley National Lab
4:15-4:45 Panel
Co-Chairs: Daniela Ushizima, Lawrence Berkeley National Lab, and Jiayang Sun, Case Western Reserve University
Peihua Qiu, University of Minnesota
Erkki Somersalo, Case Western Reserve University
4:45-5:15 Poster blitz (2 minutes per poster)
5:15-5:30 Break
5:30-7:30 Poster Session and Reception

Tuesday, September 11, 2012
Radisson RTP

Time Description Speaker
8:30-9:00 Registration and Continental Breakfast
Session: Environment & Climate
9:00-9:45 A Bird’s Eye View of the Carbon Cycle: Spatiotemporal tools for constraining the CO2 budget from atmospheric observations – Anna Michalak, Stanford University
9:45-10:30 Architecting Highly Scalable Scientific Data Management and Discovery Systems – Dan Crichton, Jet Propulsion Lab
10:30-11:00 Break
11:00-11:45 Uncertainty Quantification for Regional-Climate-Model Output – Noel Cressie, University of Wollongong and The Ohio State University
11:45-12:15 Panel
Jessica Matthews, CICS-NC
Amy Braverman, Jet Propulsion Lab
Steve Sain, NCAR
Richard Smith, SAMSI/UNC-CH
12:15-1:30 Lunch
Session: High Energy Physics
1:30-2:15 Recreating the Big Bang in the Laboratory: The Scientific, Computational and Data Challenges of High Energy Nuclear Physics – Steffen Bass, Duke University
2:15-3:00 Statistical Aspects of the Discovery of the Higgs Boson at the Large Hadron Collider – Kyle Cranmer, New York University
3:00-3:30 Break
3:30-4:15 Searches and Measurements in High Energy Physics – Luc Demortier, Rockefeller University
4:15-4:45 Panel
Robert Wolpert, Duke University
Mandeep Gill, SLAC
Cosma Shalizi, Carnegie Mellon University
Daniel Whiteson, University of California, Irvine
4:45-6:00 Open Mic and Refreshments

Wednesday, September 12, 2012
Radisson RTP

Time Description Speaker
8:30-9:00 Registration and Continental Breakfast
Session: Streaming, Sketching & Data Mining
9:00-9:45 Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments – Michael Mahoney, Stanford University
9:45-10:30 Convex Relaxations for Recovery of Models with Simultaneous Structures – Maryam Fazel, University of Washington
10:30-11:00 Break
11:00-11:45 Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation – Inderjit Dhillon, University of Texas, Austin
11:45-12:15 Panel
Piotr Indyk, MIT
Graham Cormode, AT&T Labs-Research
Ashish Goel, Stanford University
Michael Mahoney, Stanford University
12:15-1:30 Lunch
Working Groups
1:30-3:00 Working Group Formation and Initial Meeting
3:00 Adjourn