A Working Group in the Astrostatistics Program
Group Leaders: Tom Loredo (leader), Jogesh Babu (alternate), and Hyunsook Lee (web)
Meeting Time: Wednesdays 3:30 - 5:00 pm (unless otherwise noted)
Remote access for the meetings is available:
Note: Until we can get the bandwidth, equipment, etc. needed to broadcast meetings via webcam, SAMSI will use the following process for remote participation in working groups. A webcam picture will be available for viewing during the meetings. For audio, however, participants will have to call into the conference line.
Astronomers survey events and sources across the full range of astronomical disciplines, from the study of flaring events on the Sun and the distribution of minor planets in the solar system, to the study of the distributions of stars and compact objects in the Galaxy, and extending to the distribution of galaxies and explosive transient sources throughout the visible cosmos. The catalogs produced by such surveys span a factor of tens of millions in size, leading to a great diversity in the range of scientific questions and methodological issues data analysts must address.
Despite the great diversity in the subjects and sizes of astronomical surveys, there are common themes that arise throughout survey data analysis work. The most fundamental is the inherently hierarchical nature of survey data analysis. The scientific questions driving a survey project typically concern inferring properties of a population of sources; yet to address such questions, one must also carefully address statistical issues on the individual source level, such as source detection and measurement uncertainty, or source classification. Source- and population-level inferences often "feed back" to each other, with source uncertainties influencing population inferences, yet population properties influencing quantification of source characteristics. The classic example is Malmquist bias, where correct inference of population averages requires adjustment of individual source properties, but the size of the adjustment depends on the shape of the (unknown!) source distribution.
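The selection effect behind Malmquist bias can be illustrated with a small simulation. The sketch below is not taken from the working group's materials. It assumes a toy population with Gaussian absolute magnitudes, sources spread uniformly in a Euclidean volume, and a sharp apparent-magnitude limit. All parameter values and the function name `simulate_malmquist` are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate_malmquist(n=200_000, mean_abs_mag=0.0, sigma=0.5,
                       mag_limit=10.0, d_max_pc=5000.0, seed=1):
    """Return (mean absolute magnitude of detected sources, number detected).

    Toy assumptions (not from the source text): Gaussian absolute
    magnitudes, uniform space density out to d_max_pc, and a sharp
    flux limit expressed as an apparent-magnitude cut.
    """
    rng = random.Random(seed)
    detected = []
    for _ in range(n):
        abs_mag = rng.gauss(mean_abs_mag, sigma)
        # Uniform in volume: the cumulative fraction scales as d**3.
        # 1 - random() lies in (0, 1], which avoids log10(0) below.
        d_pc = d_max_pc * (1.0 - rng.random()) ** (1.0 / 3.0)
        # Distance modulus: m = M + 5 log10(d / 10 pc).
        app_mag = abs_mag + 5.0 * math.log10(d_pc / 10.0)
        if app_mag < mag_limit:  # only sufficiently bright sources are kept
            detected.append(abs_mag)
    return statistics.fmean(detected), len(detected)


if __name__ == "__main__":
    mean_detected, n_detected = simulate_malmquist()
    print(f"detected {n_detected} sources")
    print(f"population mean M = 0.0, detected-sample mean M = {mean_detected:.3f}")
```

The detected sample is systematically brighter (more negative mean magnitude) than the parent population, because intrinsically bright sources are visible over a larger volume. For this idealized setup the classical result is a shift of about 1.38 σ² magnitudes, so the correction depends on the width of the population distribution, which in practice is itself unknown.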
The SPS working group will explore a wide variety of topics at the interface of survey and sampling analysis in statistics, and the analysis of surveys of various types and sizes in astronomy. The following is a list of initial topics arising in discussions at the SAMSI Astrostatistics Workshop. We plan to identify a few of these for focused study, particularly during our focused research session from 20-31 March 2006.
Source characteristics are often uncertain due to measurement error (e.g. for flux or magnitude distributions), or due to propagated uncertainty when they are inferred from other measured quantities (e.g., luminosities estimated from luminosity indicators, redshifts estimated from photometry). The uncertainties significantly complicate inference about the population. Source uncertainties grow as candidate sources become dim and harder to detect; source uncertainty is thus intimately related to the truncation or selection of the sample. We will study both quantification of source uncertainties and truncation/selection effects, and propagation of uncertainty through population-level inferences.
Generalizing methods for analysis of truncated & censored data: Many methods currently used are adapted from survival analysis and assume negligible measurement error. How can these methods be generalized to account for significant, heteroscedastic measurement error, and for "missing data" that are not really censored, but measured with an upper limit?
Statistical perspective on Malmquist and Lutz-Kelker biases: These astronomical terms refer to biases in population-level inferences related to source uncertainty and selection effects in surveys. Are there counterparts to these effects in the statistics literature? Can current astrostatistical treatment of such effects be improved?
Calibrating indirect indicators: Luminosity indicators and photometric redshifts must be trained/calibrated using data sets that may have significant measurement error or truncation. Cosmologists have developed methods for handling this. Can these methods be improved? Can such methods improve analyses of data in other astronomical subdisciplines (e.g., gamma-ray burst population studies)?
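To make the interplay between measurement error and truncation concrete, the sketch below applies an estimator that assumes negligible measurement error to a flux-limited sample, once with error-free fluxes and once with noisy ones. It is an illustration under stated assumptions, not a method proposed by the group: the population is a pure power law, N(>S) proportional to S^(-alpha), the noise is Gaussian with constant width, and the estimator is the standard maximum-likelihood slope for a power law truncated at a known threshold. The function names and all parameter values are hypothetical.

```python
import math
import random


def pareto_slope_mle(fluxes, threshold):
    """MLE of alpha for N(>S) ~ S**(-alpha), using only fluxes >= threshold.

    This estimator treats the fluxes as exact (no measurement error).
    """
    logs = [math.log(s / threshold) for s in fluxes if s >= threshold]
    return len(logs) / sum(logs)


def simulate_slopes(n=100_000, alpha=1.5, s_min=0.2, threshold=1.0,
                    noise_sigma=0.3, seed=2):
    """Return (slope from true fluxes, slope from noisy fluxes).

    Toy assumptions: power-law true fluxes above s_min, additive
    Gaussian noise of fixed width, and a cut on the *observed* flux.
    """
    rng = random.Random(seed)
    # Inverse-CDF sampling of a Pareto distribution; 1 - random() is in (0, 1].
    true_fluxes = [s_min * (1.0 - rng.random()) ** (-1.0 / alpha)
                   for _ in range(n)]
    observed = [s + rng.gauss(0.0, noise_sigma) for s in true_fluxes]
    alpha_clean = pareto_slope_mle(true_fluxes, threshold)
    alpha_noisy = pareto_slope_mle(observed, threshold)
    return alpha_clean, alpha_noisy


if __name__ == "__main__":
    alpha_clean, alpha_noisy = simulate_slopes()
    print(f"slope from error-free fluxes: {alpha_clean:.3f}")
    print(f"slope from noisy fluxes:      {alpha_noisy:.3f}")
```

In this toy setup the noisy sample typically yields a steeper slope than the error-free one, because faint sources just below the threshold outnumber those just above it and scatter into the sample more often than bright ones scatter out. A generalized method would have to model the noise and the selection jointly rather than treat the thresholded fluxes as exact.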
Astronomical survey data sets have sample sizes ranging from dozens to hundreds of millions. Large data sets offer the promise of detailed inferences, but sound inferences may require computations that are challenging for huge samples. We will examine various tradeoffs one must make between methodological complexity and sample size.
Summaries and approximations: What combinations of data summaries and analysis methodology permit accurate population inference, as a function of the size of the data set?
Nonparametric & semiparametric modeling: As we seek to learn more details about populations, simple parametric models may prove insufficient. What non- and semi-parametric methods are feasible for analysis of large data sets (within both the frequentist and Bayesian approaches)? At what sample sizes can we expect such methods to become preferable to parametric modeling?
Astronomer members of the SPS team are very interested in improving astrostatistical practice for comparing parametric models, including accounting for uncertainty in model selection. This is interesting both for the analysis of particular candidate sources found in a survey (i.e., for source detection and classification within a survey catalog), and for comparison of models for populations.
Within catalogs: Standard astronomical practice is to classify candidate sources via cuts. For some surveys, cuts are conservative, providing security that reported sources are accurately identified, but sacrificing information about numerous weak sources. For other surveys, cuts are optimistic, raising concern about the true properties of weak sources. To what extent can cutting be avoided by weighting? What is the best way to establish weights for candidate sources? Even the best weighting scheme must eventually cut (e.g., due to systematic error in the models producing weights, or practical constraints on catalog size); what determines the optimal cut level?
Population inferences: What model comparison or model selection techniques are most appropriate for typical parametric astronomical models? What techniques are there that allow comparison or assessment of nonparametric models (possibly for multivariate data)?
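The contrast between cutting and weighting can be sketched with a two-component toy model. The example below is one possible weighting scheme, chosen for illustration rather than taken from the group's work: each candidate has a detection statistic drawn from a unit-variance Gaussian (mean 0 for background, mean `signal_mean` for real sources), and the weight is the posterior probability that the candidate is real, computed with the true model parameters. The function names, the parameter values, and the assumption that the mixture model is known exactly are all hypothetical.

```python
import math
import random


def normal_pdf(x, mean):
    """Density of a unit-variance Gaussian centred on `mean`."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def compare_cut_and_weights(n=20_000, real_fraction=0.2, signal_mean=2.0,
                            cut=3.0, seed=3):
    """Return (true number of real sources, count above cut, sum of weights)."""
    rng = random.Random(seed)
    n_real = 0
    cut_count = 0
    weighted_count = 0.0
    for _ in range(n):
        is_real = rng.random() < real_fraction
        n_real += int(is_real)
        x = rng.gauss(signal_mean if is_real else 0.0, 1.0)
        # Conservative cut: keep only candidates with a large statistic.
        if x > cut:
            cut_count += 1
        # Weight: posterior probability of "real" under the assumed mixture.
        p_real = real_fraction * normal_pdf(x, signal_mean)
        p_background = (1.0 - real_fraction) * normal_pdf(x, 0.0)
        weighted_count += p_real / (p_real + p_background)
    return n_real, cut_count, weighted_count


if __name__ == "__main__":
    n_real, cut_count, weighted_count = compare_cut_and_weights()
    print(f"true number of real sources: {n_real}")
    print(f"candidates passing the cut:  {cut_count}")
    print(f"sum of membership weights:   {weighted_count:.1f}")
```

With a conservative cut, most weak real sources are discarded, while the summed weights recover the size of the real population without classifying any individual candidate. That result depends entirely on the weights coming from a correct model, which is the systematic-error concern raised above.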
Date | Topics & Readings | Attendees |
Wed., 8 Feb |
First SPS working group meeting: Read the group overview (above) and the Feigelson & Babu (1998) and Petrosian (2002) reviews. We will discuss topics and readings for upcoming meetings. Please consider preparing a presentation (e.g., shared via a PDF file posted here) for an upcoming meeting. |
Remote: David Chernoff, Martin Hendry, Kuo-Ping Li, and Martin Weinberg
At SAMSI: Jogesh Babu, Pablo de la Cruz, Woncheol Jang, Hyunsook Lee, Ji Meng Loh, Tom Loredo, and Francisco Vera |
Wed., 15 Feb |
Second meeting:
Tom's presentation; references for saddlepoint approximations and the Neyman-Scott problem |
Remote: Ruth Barrera, David Chernoff, Kuo-Ping Li, Tom Loredo, and Martin Weinberg |
Wed., 22 Feb |
Two presentations were given:
Martin's talk will be continued next week. |
Remote: David Chernoff, Martin Hendry, Tom Loredo, Haywood Smith, Martin Weinberg, and unrecognized members |
Wed., 1 March |
Martin Hendry: "What do astronomers mean by Malmquist bias?" We are experiencing a lot of noise during the telecon (other groups have far less). Please consider the following:
|
Remote: Ruth Barrera, David Chernoff, Alanna Connors, Martin Hendry, Woncheol Jang, Tom Loredo, Haywood Smith, Antonio Uribe, Martin Weinberg, Steven (from UT), and unrecognized members |
Wed., 8 March |
Jogesh Babu and Woncheol Jang: Check the Particle Physics group website for some presentation materials. Recommended papers. Possible collaborations between statisticians working on point processes and spatial statistics and astronomers with survey data? |
Remote:
Ruth Barrera, David Chernoff,
Tom Loredo, Haywood Smith, Antonio Uribe |
Wed., 15 March |
Tom Loredo: "Coincidence Assessment" and references (also listed in Tom's email) |
Remote:
Ruth Barrera, David Chernoff, Alanna Connors, Martin Hendry,
and Antonio Uribe |
Mon., 20 March |
Open Colloquium (10:00-11:30 am at SAMSI, Rm. 203)
Michael Woodroofe: "Shape Restricted Estimation with Applications to Astronomy" |
. |
Wed., 22 March |
Martin Hendry: "The Best of Both Worlds?" [in PDF] (the PPT file is small) Jogesh Babu: "Multivariate K-S and other related statistics" (lecture notes from the Mar. 6 Astrostat seminar; see from p. 13 onward) |
Remote:
Ruth Barrera, Haywood Smith, Antonio Uribe, and Martin Weinberg |
Tue., 28 March |
We will meet at 1:30 pm. (Note the date change.) Quite a number of references were discussed |
Remote:
Vicent Martinez, Haywood Smith, and Antonio Uribe |
Wed., 5 April |
Tom Loredo: "Recap on recent meetings"
References mentioned |
Remote:
Ruth Barrera, Martin Hendry, and Ji Meng Loh |
Wed., 12 April |
No meeting | . |
Wed., 19 April |
Woncheol Jang:
"Cluster Analysis of Massive Data with Application to Astronomy"
References mentioned |
Remote:
David Chernoff, Pablo de la Cruz, Martin Hendry, Tom Loredo,
Haywood Smith, and Antonio Uribe |
Wed., 26 April |
Ji Meng Loh: "Some Spatial Statistics for Astronomy" [Paper saving version] | Remote:
Ruth Barrera, David Chernoff, Pablo de la Cruz, Ji Meng Loh, Tom Loredo,
Haywood Smith, and Antonio Uribe |
Wed., 3 May |
Meeting Postponed | . |
Wed., 10 May |
Tom Loredo: | Remote:
David Chernoff, Pablo de la Cruz, Martin Hendry,
Hyunsook Lee, and Tom Loredo |
Name | Affiliation |
Jogesh Babu | Penn State University, Dept. of Statistics |
Ruth Barrera | National University of Colombia |
Brendon Brewer | University of Sydney, School of Physics |
David Chernoff | Cornell University, Dept. of Astronomy |
Alanna Connors | Eureka Scientific |
Pablo de la Cruz | Universitat de València, Dept. of Statistics |
Gauri S. Datta | University of Georgia, Dept. of Statistics |
Sam Finn | Penn State University, Center for Gravitational Wave Physics |
Matthew Fleenor | University of N. Carolina, Department of Physics & Astronomy |
Martin Hendry | University of Glasgow, Dept. of Physics & Astronomy |
Angela Hugeback | University of Chicago, Dept. of Statistics |
Woncheol Jang | Duke University, Dept. of Statistics |
Kristofer Jennings | Purdue University, Dept. of Statistics |
Chunglee Kim | Northwestern University |
Hyunsook Lee | Penn State University, Dept. of Statistics |
Kuo-Ping Li | University of North Carolina |
Ji Meng Loh | Columbia University, Dept. of Statistics |
Tom Loredo | Cornell University, Dept. of Astronomy |
Vicent Martinez | Universitat de València, Observatori Astronomic |
Haywood Smith | University of Florida, Astronomy Dept. |
Antonio Uribe | Observatorio Astronómico Nacional, National University of Colombia |
Francisco Vera | NISS |
Martin Weinberg | University of Massachusetts Amherst, Dept. of Astronomy |
David Wittman | University of California, Davis, Dept. of Astronomy |