Title: Sparse Signal Recovery for Discrete or Continuous Data
Abstract: Sparse signal detection has been one of the most important challenges in the analysis of large-scale data sets arising from many disciplines, e.g., genomics, finance and astronomy. In this talk, I will focus on two key aspects of inference on a high-dimensional sparse mean vector: (1) how to provide theoretical justification for existing methods that perform well empirically, and (2) how to use this theoretical insight to develop new approaches that can outperform current methods in the ‘ultra-sparse’ regime. In the first half of the talk, I will discuss multiple testing optimality for continuous data and prove oracle properties of the popular ‘Horseshoe’ prior [1]. I will then introduce a novel prior, the ‘Horseshoe+’ prior [2], which sharpens the signal detection abilities of the ‘Horseshoe’ prior. I will show that the Horseshoe+ prior outperforms existing methods both in theory and in practice, and that it correctly identifies the ‘differentially expressed’ genes in microarray data. In the second half, I will briefly discuss inference on high-dimensional sparse count data, which is fundamentally different from the high-dimensional Gaussian case. Motivated by the growing interest in analyzing sparse count data, I will present the ‘Gauss-Hypergeometric’ prior for sparse Poisson means [3], and end with an application to detecting mutational hotspots in whole-exome sequencing data.
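For readers unfamiliar with the two shrinkage priors named in the abstract, the following is a minimal sketch that draws from their standard hierarchies as given in [2]. The Horseshoe sets θ_i | λ_i, τ ~ N(0, λ_i² τ²) with λ_i ~ C⁺(0, 1), a half-Cauchy. The Horseshoe+ adds a second half-Cauchy layer: θ_i | λ_i ~ N(0, λ_i²), λ_i | η_i, τ ~ C⁺(0, τ η_i), η_i ~ C⁺(0, 1). It only simulates prior draws. It is not the posterior computation or multiple testing procedure discussed in the talk. The function names, the fixed global scale τ = 0.1 and the sample size are hypothetical choices, and the sketch assumes NumPy is available.

```python
import numpy as np


def half_cauchy(rng, scale, size):
    """Draw from C+(0, scale), the absolute value of a Cauchy(0, scale) variate."""
    return np.abs(scale * rng.standard_cauchy(size))


def sample_horseshoe(rng, tau, n):
    """Horseshoe: theta_i ~ N(0, lambda_i^2 tau^2) with lambda_i ~ C+(0, 1)."""
    lam = half_cauchy(rng, 1.0, n)          # local scales lambda_i
    return rng.normal(0.0, lam * tau)       # one theta_i per local scale


def sample_horseshoe_plus(rng, tau, n):
    """Horseshoe+: theta_i ~ N(0, lambda_i^2), lambda_i ~ C+(0, tau * eta_i),
    eta_i ~ C+(0, 1)."""
    eta = half_cauchy(rng, 1.0, n)          # extra half-Cauchy layer eta_i
    lam = half_cauchy(rng, tau * eta, n)    # local scales, each scaled by tau * eta_i
    return rng.normal(0.0, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tau, n = 0.1, 200_000                   # hypothetical global scale and sample size
    for name, sampler in [("horseshoe", sample_horseshoe),
                          ("horseshoe+", sample_horseshoe_plus)]:
        theta = sampler(rng, tau, n)
        print(f"{name:10s} P(|theta|<0.01)={np.mean(np.abs(theta) < 0.01):.3f} "
              f"P(|theta|>10)={np.mean(np.abs(theta) > 10):.4f}")
```

The extra layer makes the Horseshoe+ local scale a product of two half-Cauchy variables, so its draws are more dispersed on the log scale. Comparing the printed fractions gives a rough feel for the greater concentration near zero and heavier tails that [2] attributes to the Horseshoe+. The exact numbers depend on τ and the seed.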
References:
[1] Datta, J. and Ghosh, J. K. (2013). Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Analysis, 8(1):111–131.
[2] Bhadra, A., Datta, J., Polson, N. G., and Willard, B. (2015). The Horseshoe+ Estimator of Ultra-Sparse Signals. arXiv preprint arXiv:1502.00560.
[3] Datta, J. and Dunson, D. B. (2015). Priors for High-Dimensional Sparse Poisson Means. arXiv preprint arXiv:1510.04320. (Biometrika, under revision)