Geometric and Topological Summaries of Data and Inference II – Spring 2014


First class: Tuesday, January 14, 2014
No class on March 11, 2014
Last class: Tuesday, April 15, 2014

Course Description: This course is a continuation of the fall offering, but the material covered does not necessarily require the previous course as a prerequisite. Part of the aim is to provide some of the background needed for the following scheduled LDHD workshops:

  • Topological Data Analysis: February 3-7, 2014
  • Statistical inference in sparse high-dimensional models: theoretical and computational challenges: February 24-26, 2014
  • SAMSI-CRM workshop on geometric aspects of high-dimensional inference: March 31-April 2, 2014

Topics covered:

(1) Persistent Homology: In computational algebraic topology, one attempts to recover qualitative global features of the underlying data (such as connectedness, the number of holes, or the existence of obstructions to certain constructions) from a random sample. In other words, one hopes to recover the underlying topology. An advantage of topology is that it is stable under deformations and thus can potentially lead to robust statistical procedures. A combinatorial construction converts the data into an object for which it is possible to compute the topology. Because the result depends on the scale at which the construction is carried out, persistent homology offers a multi-scale solution: it quantifies the persistence of topological features as the scale changes. Persistent homology is useful for visualization, feature detection, and object recognition.
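
To make the multi-scale idea concrete, here is a minimal sketch of the simplest case: tracking connected components (dimension-0 features) of a point cloud as the scale grows. It assumes a Vietoris-Rips style construction in which an edge appears once the scale reaches its length; the function name `h0_persistence` and the toy points are illustrative choices, not part of the course material.

```python
import itertools
import math


def h0_persistence(points):
    """Scales at which connected components merge as the scale grows.

    Every point starts as its own component at scale 0. Two components
    merge (one of them "dies") when the scale reaches the length of the
    shortest edge joining them. Assumption: an edge enters the
    construction at a scale equal to its Euclidean length.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first (Kruskal's algorithm).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    # One component survives at every scale and is not listed.
    return deaths


# Toy data: two well-separated clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
deaths = h0_persistence(points)
```

For these toy points, three components die at a scale near 0.1 (merges within a cluster) and one dies at a scale near 7 (the two clusters joining). The long-lived component is the kind of persistent feature the paragraph above describes; the short-lived ones reflect sampling detail.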

(2) Morse Theory: The geometry of Morse functions can characterize the topology of an object through the way in which topological characteristics of sub-level sets change at critical points. Indeed, classical Morse theory tells us that when the level passes a nondegenerate critical point, the homotopy type of the sub-level set changes by the attachment of a cell whose dimension is the number of negative eigenvalues of the Hessian at that critical point. This is a pathway that connects geometry with topology, and one that also serves as a bridge to statistics.
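
As a small illustration of the count of negative Hessian eigenvalues (the index of a critical point), the sketch below works through a function of two variables. The example function f(x, y) = x^3 - 3x + y^2 and the helper names are our own assumptions, chosen only because the critical points and Hessian can be written down by hand.

```python
import math


def morse_index_2d(a, b, c):
    """Number of negative eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    eigenvalues = (mean - radius, mean + radius)
    if any(abs(ev) < 1e-12 for ev in eigenvalues):
        # A zero eigenvalue means the critical point is degenerate.
        raise ValueError("degenerate critical point")
    return sum(ev < 0 for ev in eigenvalues)


def hessian_f(x, y):
    """Hessian entries (f_xx, f_xy, f_yy) of f(x, y) = x**3 - 3*x + y**2."""
    return 6.0 * x, 0.0, 2.0


# The gradient (3x^2 - 3, 2y) vanishes at (1, 0) and (-1, 0).
critical_points = [(1.0, 0.0), (-1.0, 0.0)]
indices = {p: morse_index_2d(*hessian_f(*p)) for p in critical_points}
```

Here (1, 0) has index 0 (a local minimum, so a 0-cell is attached) and (-1, 0) has index 1 (a saddle, so a 1-cell is attached).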

(3) SiZer: In its original form, SIgnificant ZERo crossings of derivatives (SiZer) is a graphical tool that helps one visually understand the features of a surface as the scale, resolution, or bandwidth changes. What began mainly as a graphical aid turns out to have deep geometric structure that is only beginning to be understood and that has precise ties to topics (1) and (2). We will examine this connection.
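
The following is a simplified sketch of the scale-space idea behind SiZer, not the SiZer procedure itself: it smooths a small one-dimensional sample with a Gaussian kernel at several bandwidths and counts the downward zero crossings of the derivative (that is, the modes). The significance assessment that gives SiZer its name is omitted, and the data, grid, and function names are assumptions made for illustration.

```python
import math


def kde_derivative_sign(x, data, h):
    """Sign of the derivative of a Gaussian kernel density estimate at x."""
    # The derivative of sum_i exp(-((x - X_i) / h)**2 / 2) with respect
    # to x is a positive multiple of the sum below.
    s = sum((xi - x) * math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return (s > 0) - (s < 0)


def count_modes(data, h, grid):
    """Count zero crossings of the derivative from positive to negative."""
    modes, last_sign = 0, 0
    for x in grid:
        sign = kde_derivative_sign(x, data, h)
        if sign == 0:
            continue  # ignore exact zeros; compare with the last nonzero sign
        if last_sign > 0 and sign < 0:
            modes += 1
        last_sign = sign
    return modes


# Toy sample with two groups of observations.
data = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
grid = [-4.0 + 0.01 * i for i in range(801)]
modes_by_bandwidth = {h: count_modes(data, h, grid) for h in (0.05, 0.5, 3.0)}
```

For this toy sample the mode count falls from 6 to 2 to 1 as the bandwidth grows from 0.05 to 0.5 to 3.0: the smallest bandwidth resolves every observation, the middle one shows the two groups, and the largest merges everything into a single bump.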

(4) Metagenomic Data Analysis: The data come from massively parallel sequencing (MPS), also referred to as high-throughput sequencing, next-generation sequencing, or pyrosequencing. One important observation made repeatedly in studies of the microbiome is that it has the structure of a singular mathematical object embedded in a high-dimensional space. This is mathematically novel, but it also poses major technical challenges. The goal here is to produce meaningful quantitative descriptors, particularly as they relate to certain gastrointestinal diseases. Although the statistical methods currently used in microbial ecology are standard for statisticians, the microbiology terminology requires some background. We will provide this background so that topics (1), (2), and (3) can be applied to this promising field.

Prerequisites: Background in calculus and linear algebra, and a reasonable foundation in statistics and probability.

Course Format: The main instructor is Peter Kim, joined by several guest lecturers. The material and instructors parallel some of the workshops in the 2013-2014 year-long SAMSI program on Low-Dimensional Structure in High-Dimensional Systems (LDHD).

Registration for this course is being processed through your respective university:

  • Duke: STA 790.01
  • NCSU: MA 810.001
  • UNC: STOR-892.1, cross-listed with MATH 892.1

Questions: email