2006 Summer Program on Multiplicity and Reproducibility in Scientific Studies
July 10-28, 2006
Research Foci
Concerns over multiplicities in statistical analysis and over the reproducibility of scientific
experiments are becoming increasingly prominent in almost every scientific discipline, as
experimental and computational capabilities have vastly increased in recent years. This 2006 SAMSI
summer program will examine the following key issues.
Reproducibility: A scientist plans and executes an experiment. A clinical-trials
physician runs a trial, assigning patients to treatments at random and blinding who receives which
treatment. A survey statistician conducts a survey. In each case, statistical methods help judge
whether something has happened beyond chance, with the expectation that if others replicate the
work, a similar finding will emerge. To approve a drug, the FDA requires two studies, each
significant at the 0.05 level.
A recent paper by Ioannidis (JAMA 2005; 294:218-228) documented a startling and disconcerting lack
of reproducibility among influential statistical studies published in major medical journals: about
30% of randomized, double-blinded medical trials failed to replicate, and 5 out of 6 non-randomized
studies failed to replicate - roughly an 80% failure rate. We aim to explore and clarify the causes
of failures to reproduce in more detail, not only in the Ioannidis paper but also more broadly,
identifying commonalities that lead to these problems and attempting to estimate their prevalence.
Multiplicities (both obvious and hidden) will receive particular attention, along with selection
biases and regression to the mean. At the conclusion of the program, recommendations for scientific
reporting and publication will be made.
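The selection effect behind such replication failures can be illustrated with a small simulation. This is only a sketch with hypothetical numbers (10,000 tests, 90% true nulls, a standardized effect of 2.0 for the rest), not data from the Ioannidis paper: studies selected for significance are disproportionately chance findings, so most do not replicate.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical screen: 10,000 two-sided z-tests, 90% of them true nulls,
# the rest carrying a modest standardized effect of 2.0.
m = 10_000
true_effect = np.where(rng.random(m) < 0.9, 0.0, 2.0)
z1 = rng.normal(true_effect, 1.0)

# "Findings": studies significant at the two-sided 0.05 level.
found = np.abs(z1) > 1.96

# Independent replications, run under the same true effects.
z2 = rng.normal(true_effect[found], 1.0)
replicated = (np.abs(z2) > 1.96) & (np.sign(z2) == np.sign(z1[found]))

# Selection guarantees many chance findings; most of them fail to replicate.
print(f"findings: {found.sum()}, replication rate: {replicated.mean():.0%}")
```

Under these assumed proportions, well under half of the "findings" replicate, even though every replication is run honestly at full power for its true effect.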
Subgroup Analysis: Large, complex data sets are becoming commonplace, and researchers
want to know which subgroups respond differently from one another and why. The overall sample is
often quite large, but subgroups may be very small, and there are often many questions. Genetic
data are now collected in clinical trials: which patients will respond better to a drug, and which
will have more severe side effects? A disease, a drug response, or a side effect can arise from
different mechanisms, so identifying subgroups of patients who share a common mechanism is useful
for diagnosis and for prescribing treatment. Large educational surveys involve groups with
different demographics and different educational resources, subject to different educational
practices. Which groups differ, and how are the differences related to resources and practices?
What really works, and why? Is a finding the result of chance? There is a need for effective
statistical methods for finding subgroups that respond differently - methods that can identify
complex patterns of response without being fooled by false positives arising from multiple testing.
Our idea is to bring together statisticians and subject-matter experts to develop and explore
statistical strategies for the subgroup problem. The benefit will be credible statistical methods
that are likely to produce results that replicate in future studies.
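The multiplicity risk in subgroup hunting is easy to quantify. If k disjoint subgroups are each tested independently at level 0.05 and no effect exists anywhere, the chance of at least one false-positive "discovery" is 1 - (1 - 0.05)^k; the subgroup counts below are illustrative:

```python
# With k independent subgroup tests at level alpha and no true effects,
# the familywise error rate is 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwe = 1 - (1 - alpha) ** k
    print(f"{k:2d} subgroups -> chance of >=1 false positive: {fwe:.0%}")
# 1 subgroup:   5%
# 20 subgroups: 64%
```

With twenty subgroups, an apparent effect somewhere is more likely than not, which is why unadjusted subgroup findings so often fail to replicate.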
Massive Multiple Testing: The routine use of massively multiple comparisons in inference
for large-scale genomic data has generated controversy and discussion about appropriate ways to
adjust for multiplicities. We will study different approaches to formally describing and addressing
the multiplicity problem, including control of various error rates, decision-theoretic approaches,
hierarchical modeling, probability models on the space of multiplicities, and model selection
techniques. Besides applications in inference for genomic data, we will consider similar problems
arising in clinical trial design and analysis, record matching, classification in spatial
inference, anomaly discovery, and syndromic surveillance. The goal is to identify the relative
merits and limitations of the competing approaches for diverse applications, and to understand
which features of reproducibility each addresses.
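One concrete contrast among these error-rate approaches is between familywise error control (Bonferroni) and false discovery rate control (Benjamini-Hochberg). A minimal sketch on simulated data - the test count, signal fraction, and effect size are hypothetical, not from any study discussed here:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)

# Hypothetical genomic screen: 2,000 tests, about 5% with a real effect.
m, alpha = 2000, 0.05
signal = rng.random(m) < 0.05
z = rng.normal(np.where(signal, 3.5, 0.0), 1.0)

# Two-sided p-values: p = 2 * (1 - Phi(|z|)) = 1 - erf(|z| / sqrt(2)).
p = np.array([1 - erf(abs(zi) / sqrt(2)) for zi in z])

# Bonferroni: control P(any false positive) by testing each at alpha / m.
bonferroni = p < alpha / m

# Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
# the largest rank with p_(k) <= k * alpha / m.
order = np.argsort(p)
below = np.nonzero(p[order] <= alpha * np.arange(1, m + 1) / m)[0]
k = below[-1] + 1 if below.size else 0
bh = np.zeros(m, dtype=bool)
bh[order[:k]] = True

# FDR control rejects more hypotheses than the stricter familywise bound.
print(f"Bonferroni rejections: {bonferroni.sum()}, BH rejections: {bh.sum()}")
```

The trade-off mirrors the program's question about relative merits: Bonferroni guards against any false positive at the cost of power, while Benjamini-Hochberg tolerates a controlled fraction of false discoveries in exchange for detecting more true effects.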
Program Leaders: Peter Mueller (M.D. Anderson Cancer Center), Juliet Shaffer (U. Calif. Berkeley),
Peter Westfall (Texas Tech. Univ., Chair); Stan Young (NISS, Local Scientific Coordinator); James Berger
(SAMSI, Directorate Liaison), and Ray Carroll (Texas A&M, National Advisory Committee Liaison).
Description of Activities
Workshops
The Opening Workshop will be held July 10-12, 2006, and will focus on clear formulation of the
challenges in the area. This will set the stage for the subsequent Program research.
A Transition Workshop will be held July 27-28, 2006, summarizing the results of the Program
research and discussing the remaining challenges.
Working Groups: The working groups, one in each of the three research foci areas listed above,
will meet during the period July 13-26, 2006, to carry out the Program research.
Further Information
For additional information about the program or to apply to participate, write
[email protected]. If interested in participating
during the entire three week period, please send a letter describing your interest, along with a
vita, to the indicated e-mail address. Application forms for workshop participation will be available
later.