Opening Workshop, Massive Datasets Program - September 9-12, 2012

Workshop Information

September 9, 2012, 8:30am - September 12, 2012, 4:30pm

The Opening Workshop was held Sunday-Wednesday, 9-12 September 2012, at the Radisson Hotel Research Triangle Park, NC. The hotel is close to SAMSI.

Schedule

Sunday, September 9, 2012
Radisson RTP

8:30-9:00 Registration and Continental Breakfast
8:50-9:00 Welcome and Introduction
Ilse Ipsen, N.C. State University/SAMSI
  Tutorials
9:00-10:00 Tamas Budavari, Johns Hopkins University
Statistical Methods in Astronomy
VIDEO
10:00-10:30 Break
10:30-11:30 Petros Drineas, Rensselaer Polytechnic Institute
Mining Massive Datasets: A (randomized) Linear Algebraic Perspective
VIDEO
11:30-1:00 Lunch
1:00-2:00 Haesun Park, Georgia Institute of Technology
Visual Analytics for Knowledge Discovery in High Dimensional Data
VIDEO
2:00-2:30 Break
2:30-3:30 Stephen Wright, University of Wisconsin
Optimization Techniques for Statistical Analysis on Large Datasets
3:30-4:00 Break
4:00-5:00 Michael Jordan, University of California, Berkeley
Resampling Methods for Massive Data
VIDEO

Monday, September 10, 2012
Radisson RTP

8:30-8:55 Registration and Continental Breakfast
8:55-9:00 Welcome
  Session: Inference
9:00-9:45 Bin Yu, University of California, Berkeley
Stability
9:45-10:30 Xiaotong Shen, University of Minnesota
On Personalized Information Filtering
10:30-11:00 Break
11:00-11:45 Brian Caffo, Johns Hopkins University
Resting State Brain Functional Connectivity Data: progress, future challenges and data
11:45-12:15 Panel
Chair: Bill Eddy, Carnegie Mellon University
Panelists: Alex Gray, Georgia Tech; Karen Kafadar, Indiana University; Bo Li, Purdue University
12:15-1:30 Lunch
  Session: Imaging
1:30-2:15 Jim Nagy, Emory University
Numerical Methods for Large Scale Inverse Problems in Image Reconstruction
2:15-3:00 Jianqing Fan, Princeton University
Iterative Screening and Estimation
3:00-3:30 Break
3:30-4:15 Rollin Thomas, Lawrence Berkeley National Lab
Supernova Discovery in the Era of Data-Intensive Science
4:15-4:45 Panel
Co-Chairs: Daniela Ushizima, Lawrence Berkeley National Lab; Jiayang Sun, Case Western Reserve University
Panelists: Peihua Qiu, University of Minnesota; Erkki Somersalo, Case Western Reserve University
4:45-5:15 Poster blitz (2 minutes per poster)
5:15-5:30 Break
5:30-7:30 Poster Session and Reception

SAMSI will provide poster presentation boards and tape. Each board is 4 ft. wide by 3 ft. high and is tri-fold: the two side panels are each 1 ft. wide and the center panel is 2 ft. wide. Please make sure your poster fits the board. The boards can accommodate up to 16 pages of paper measuring 8.5 inches by 11 inches.

Tuesday, September 11, 2012
Radisson RTP

8:30-9:00 Registration and Continental Breakfast
  Session: Environment & Climate
9:00-9:45 Anna Michalak, Stanford University
A Bird’s Eye View of the Carbon Cycle: Spatiotemporal tools for constraining the CO2 budget from atmospheric observations
9:45-10:30 Dan Crichton, Jet Propulsion Lab
Architecting Highly Scalable Scientific Data Management and Discovery Systems
10:30-11:00 Break
11:00-11:45 Noel Cressie, University of Wollongong and The Ohio State University
Uncertainty Quantification for Regional-Climate-Model Output
11:45-12:15 Panel
Chair: Jessica Matthews, CICS-NC
Panelists: Amy Braverman, Jet Propulsion Lab; Steve Sain, NCAR; Richard Smith, SAMSI/UNC-CH
12:15-1:30 Lunch
  Session: High Energy Physics
1:30-2:15 Steffen Bass, Duke University
Recreating the Big Bang in the Laboratory: The Scientific, Computational and Data Challenges of High Energy Nuclear Physics
2:15-3:00 Kyle Cranmer, New York University
Statistical Aspects of the Discovery of the Higgs Boson at the Large Hadron Collider
3:00-3:30 Break
3:30-4:15 Luc Demortier, Rockefeller University
Searches and Measurements in High Energy Physics
4:15-4:45 Panel
Chair: Robert Wolpert, Duke University
Panelists: Mandeep Gill, SLAC; Cosma Shalizi, Carnegie Mellon University; Daniel Whiteson, University of California, Irvine
4:45-6:00 Open Mic and Refreshments

Wednesday, September 12, 2012
Radisson RTP

8:30-9:00 Registration and Continental Breakfast
  Session: Streaming, Sketching & Datamining
9:00-9:45 Michael Mahoney, Stanford University
Implementing Randomized Matrix Algorithms in Parallel and Distributed Environments
9:45-10:30 Maryam Fazel, University of Washington
Convex Relaxations for Recovery of Models with Simultaneous Structures
10:30-11:00 Break
11:00-11:45 Inderjit Dhillon, University of Texas, Austin
Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
11:45-12:15 Panel
Chair: Piotr Indyk, MIT
Panelists: Graham Cormode, AT&T Labs-Research; Ashish Goel, Stanford University; Michael Mahoney, Stanford University
12:15-1:30 Lunch
  Working Groups
1:30-3:00 Working Group Formation and Initial Meeting
3:00 Adjourn

Partial list of research topics:

* Data visualization and analytics:
High-speed visualization of high-dimensional datasets; data representation, extraction, integration and transformation; real-time visual interaction; spatio-temporal data mining

* Online streaming and sketching:
Algorithm paradigms for massive datasets (streaming, online, randomized); scalability; filtering; anomaly detection; data structures for fast computation of statistics; database-enabled machine learning tools; computing environments and programming models (GPUs, cloud computing, custom chips)

* Large-scale optimization:
Convex optimization (sparse modeling and compressed sensing, matrix completion); online optimization (streaming data, online learning, control theory); distributed optimization (parallel and GPU computation, data fusion); machine learning; high-dimensional models

* Inference:
Dimension reduction for high-dimensional data (feature selection, sub-sampling and screening, sparse PCA); predictive inference and multiple testing (false discovery rates, uncertainty in prediction); high-dimensional MCMC methods for posterior inference (particle filters, hybrids with optimization methods)

* Imaging:
Rapid registration and segmentation (GPUs, distributed computing); multiple testing and inference for large-scale imaging data (sky surveys, satellite images, false discovery rate with dependence); dynamic imaging (streaming data, spatio-temporal models)

* Systems and architectures:
Reliability; resilience; probabilistic computing, multiple precision; real-time methods; variable data flows; hardware platforms

* High-energy physics:
Reconstruction and analysis of particle collisions from the LHC; pattern recognition and parameter extraction; simulations to estimate error rates; parameter estimation for large numbers of parameters; maximum likelihood estimators

* Astronomy:
Statistics on remote resources; computations on special-purpose architectures and GPUs; communication-avoiding methods; randomized and online algorithms; detection and classification of transient events and outliers; Bayesian inference and machine learning; high-dimensional models with empirical priors; non-parametric models; visualization of large high-dimensional datasets

* Environment and climate:
Production, validation, processing, distribution and integration of data; data fusion and remote sensing; algorithms for large distributed datasets; spatial or spatio-temporal statistics