2012-13 Program on Statistical and Computational Methodology for Massive Datasets

This year-long SAMSI program focuses on fundamental methodological questions in statistics, mathematics and computer science posed by massive datasets, with applications to astronomy, high energy physics, and the environment. The central challenges posed by massive datasets concern "scalability" and "data streaming". Techniques developed for small or moderate-sized datasets simply do not translate to modern massive datasets. Data acquisition rates on the order of gigabytes per second necessitate innovative approaches to computing environments, analysis, and algorithms.

Representative research questions include:

  • How to reduce the amount of data (spatial sampling, temporal sampling, feature extraction)?
  • How to distribute the analysis over available computing resources (at the data collection points, in a cloud, on custom chips, or on the desktop)?
  • How to guarantee resilience and fault tolerance of hardware and software?
  • How to cope with missing values in a data set?
  • How to estimate error rates and increase the signal-to-noise ratio in recorded data?
  • How to detect anomalies, outliers and transient events?
  • How to perform on-the-fly analyses and real time computations?
  • How to efficiently use high-dimensional or non-parametric models?

Research foci will include inference, large-scale nonlinear optimization, online streaming and sketching, imaging, data visualization, and systems and architectures, as well as applications to astronomy, high energy physics, and the environment.

Program Leaders: Michael Jordan (University of California, Berkeley), Karen Kafadar (Indiana University, Bloomington), Michael Mahoney (Stanford University), Steve Sain (NCAR), Jiayang Sun (Case Western Reserve University), Alexander Szalay (Johns Hopkins University)

Local scientific coordinator: Yufeng Liu (University of North Carolina)

Directorate Liaison: Ilse Ipsen (North Carolina State University)

National Advisory Committee Liaison: Bin Yu (University of California, Berkeley)


Description of Activities

Workshops: The program will start with an opening workshop on September 9-12, 2012.

Courses: Several graduate-level courses will also be taught at SAMSI throughout the year.

Working Groups: Working groups will meet throughout the program to pursue particular research topics identified at the opening workshop (or subsequently chosen by the working group participants). The working groups consist of SAMSI visitors, postdoctoral fellows, graduate students, and local faculty and scientists. These groups are at the very heart of the scientific activities at SAMSI.

Further Information

For additional information about the program, send email to md@samsi.info