2012-13 Program on Statistical and Computational Methodology for Massive Datasets

This year-long SAMSI program focused on fundamental methodological questions of statistics, mathematics and computer science posed by massive datasets, with applications to astronomy, high energy physics, and the environment.

The most serious challenges posed by massive datasets concern “scalability” and “data streaming”. Techniques developed for small or moderate-sized datasets simply do not translate to modern massive datasets. Data acquisition rates on the order of gigabytes per second call for innovative approaches to computing environments, analysis, and algorithms.

Research Working Groups

Working groups are at the very heart of the scientific activities at SAMSI. They consist of SAMSI visitors, postdoctoral fellows, graduate students, local faculty, and other scientists. The working groups met every week throughout the program year to pursue the following research topics, which were identified at the Planning Workshop and at the Opening Workshop or subsequently chosen by the working group participants:

  • Inference
  • Online streaming and sketching
  • Imaging
  • Data mining and clustering
  • Multi-scale modeling
  • Graphical models and graphics processors
  • Stochastic processes and astrophysical inference
  • Discovery and classification in synoptic surveys
  • High energy physics
  • Environment and climate

Graduate Course

The two-semester graduate course “Computational and inferential methods for high dimensions and massive datasets” (Fall 2012 and Spring 2013) covered fundamental methodological questions of statistics, mathematics and computer science posed by massive datasets, with applications to astronomy, high energy physics, and environment and climate.