# 2009-10 Program on Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change

### Introduction

This 12-month SAMSI program focused on problems encountered in dealing with random space-time fields, both those that arise in nature and those that are used as statistical representations of other processes. The sub-themes of environmental mapping, spatial epidemiology, and climate change are interrelated both in the key issues of the underlying science and in the statistical and mathematical methodologies needed to address that science. Researchers from statistics, applied mathematics, environmental sciences, epidemiology and meteorology were involved, and the program promoted interdisciplinary, methodological and theoretical research.

**Organizing Committee:**

*Program Leaders:* Noel Cressie (Ohio State University), Michael Stein (University of Chicago), Dongchu Sun (University of Missouri), Jim Zidek (University of British Columbia), Chair

*Scientific Advisory Committee:* Peter Diggle (Lancaster University), Peter Guttorp (University of Washington), Jesper Møller (Aalborg)

*Local Scientific Coordinators:* Montse Fuentes (N.C. State University), Alan Gelfand (Duke University), Richard Smith (UNC-Chapel Hill)

*Directorate Liaison:* Jim Berger (SAMSI)

*National Advisory Committee Liaison:* Jun Liu (Harvard University)

Additional leaders will be appointed from each of the theme areas mentioned below, from among those who will be long-term visitors.

**Fall Course:** Spatial Epidemiology

**Fall Course:** Theory of Continuous Space and Space-Time Processes

**Spring Course:** Spatial Statistics in Climate, Ecology and Atmospherics

**Working Groups**

- Paleoclimate
- Spatial Exposures and Health Effects
- Interaction of Deterministic and Stochastic Models
- Computation, Visualization, and Dimension Reduction in Spatio-Temporal Modeling
- Spatial Extremes
- Fundamentals of Spatial Modeling
- Geostats
- Spatial Point Processes
- Non-Gaussian and Non-stationary Spatial Models

### Research Foci

**Environmental Mapping**

Spatial or spatio-temporal statistical analysis in environmetrics often entails the prediction of unobserved random fields over a dense grid of sites in a geographical domain, based on observational data from a limited number of sites and possibly simulated data generated by deterministic physical models. In important special cases, spatial prediction requires statisticians to estimate spatial covariance functions and to apply generalized regression tools (also called geostatistical methods).
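As a rough illustration of the kind of geostatistical prediction described above, the following is a minimal sketch of ordinary kriging onto a grid. It is not a method attributed to the program: the exponential covariance, the function names, and the site coordinates and values are all assumptions chosen for illustration, and the covariance parameters are treated as known rather than estimated.

```python
import numpy as np


def exponential_cov(dist, sill=1.0, range_=1.0):
    """Exponential covariance; `sill` and `range_` are assumed known here."""
    return sill * np.exp(-dist / range_)


def ordinary_kriging(obs_xy, obs_z, grid_xy, sill=1.0, range_=1.0):
    """Predict a field at `grid_xy` from observations at `obs_xy`.

    Returns predictions, kriging variances, and the (n_obs, n_grid) weights.
    """
    obs_xy = np.asarray(obs_xy, dtype=float)
    obs_z = np.asarray(obs_z, dtype=float)
    grid_xy = np.asarray(grid_xy, dtype=float)
    n, m = obs_xy.shape[0], grid_xy.shape[0]

    # Pairwise distances: observation-observation and observation-grid.
    d_obs = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=-1)
    d_cross = np.linalg.norm(obs_xy[:, None, :] - grid_xy[None, :, :], axis=-1)
    cov_obs = exponential_cov(d_obs, sill, range_)      # shape (n, n)
    cov_cross = exponential_cov(d_cross, sill, range_)  # shape (n, m)

    # Augmented system [[C, 1], [1', 0]] enforces weights that sum to one.
    system = np.ones((n + 1, n + 1))
    system[:n, :n] = cov_obs
    system[n, n] = 0.0
    rhs = np.vstack([cov_cross, np.ones((1, m))])
    solution = np.linalg.solve(system, rhs)
    weights, lagrange = solution[:n], solution[n]

    prediction = weights.T @ obs_z
    variance = sill - np.sum(weights * cov_cross, axis=0) - lagrange
    return prediction, variance, weights


# Hypothetical monitoring sites and values, purely illustrative.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([10.0, 12.0, 11.0, 15.0])
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
grid = np.column_stack([gx.ravel(), gy.ravel()])
pred, var, w = ordinary_kriging(sites, values, grid)
```

Because `sill` and `range_` are plugged in as fixed numbers, the returned variances ignore any uncertainty from estimating the covariance function.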

Many commercially available GIS packages include excellent visualization tools but offer few spatial interpolation tools. In particular, the interpolation tools that are available are often not statistically based, and have been shown to perform very poorly compared to geostatistical tools.

Many standard geostatistical packages have the disadvantage that they do not take into account the variability in estimates due to estimating the covariance function. Most also do not incorporate the modern tools available to represent spatial covariance structures for nonstationary processes.

However, such tools for nonstationary processes have not been extended to multivariate fields except through often unrealistic, simple (Kronecker type) structures. Even more complicated are space-time structures that are non-separable, nonstationary in space and in time, or multivariate with structures that are not temporally symmetric.

Methods for spherical data, especially appropriate for climate research, are currently being developed, but they need to address complications similar to those that occur for multivariate random fields.

**Spatial Epidemiology**

Many studies during the past two decades have demonstrated a statistical association between exposure to air pollutants (principally particulate matter and ozone) and various (mostly acute) human health outcomes, including mortality, hospital admissions, and the incidence of specific diseases such as asthma. While a number of different study designs have been used, two dominate. The first, the time series study, relates variations in daily counts of these adverse health outcomes to variations in ambient air pollution concentrations through multiple regression models that include air pollution concentrations while removing the effects of long-term trends and day-of-week effects, as well as possible confounders such as meteorology. However, the relative health risks of air pollution are small compared with, say, those of smoking. Some studies have therefore used Bayesian hierarchical modeling to combine the estimated air pollution coefficients for various urban areas and so borrow strength across them.
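The idea of borrowing strength across cities can be sketched with a simple normal-normal random-effects model. The code below is not a full Bayesian hierarchical analysis: it uses a method-of-moments (DerSimonian-Laird style) estimate of the between-city variance and empirical-Bayes shrinkage as a stand-in. The function name and the input numbers are hypothetical and do not come from any study.

```python
import numpy as np


def pool_city_estimates(beta_hat, se):
    """Pool city-specific coefficients under beta_i ~ N(mu, tau2 + se_i^2).

    Returns the pooled mean, the between-city variance estimate, and
    city estimates shrunk toward the pooled mean.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    v = np.asarray(se, dtype=float) ** 2
    k = beta_hat.size

    # Fixed-effect (inverse-variance) mean and heterogeneity statistic Q.
    w = 1.0 / v
    mu_fixed = np.sum(w * beta_hat) / np.sum(w)
    q = np.sum(w * (beta_hat - mu_fixed) ** 2)

    # Method-of-moments estimate of between-city variance, truncated at 0.
    tau2 = max(0.0, (q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

    # Random-effects pooled mean.
    w_star = 1.0 / (v + tau2)
    mu = np.sum(w_star * beta_hat) / np.sum(w_star)

    # Noisier cities (larger se) are pulled more strongly toward mu.
    keep = tau2 / (tau2 + v)
    beta_shrunk = mu + keep * (beta_hat - mu)
    return mu, tau2, beta_shrunk


# Hypothetical coefficients and standard errors, purely illustrative.
city_beta = np.array([0.4, 0.6, 0.1, 0.9])
city_se = np.array([0.2, 0.3, 0.25, 0.4])
mu, tau2, shrunk = pool_city_estimates(city_beta, city_se)
```

A fully Bayesian version would place priors on `mu` and `tau2` and propagate their uncertainty, which this plug-in sketch does not do.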

A different kind of study design is needed for the more challenging problem of estimating the chronic (as opposed to acute) effects of air pollution. This second kind of design involves prospective studies that follow a specific group of individuals for several years or decades, and then relate health outcomes (including mortality, but also specific measures such as heart rate variability) to air pollution after adjusting for personal factors such as age, previous health history, and smoking.

Recently, both kinds of studies have been paying more attention to spatial effects than in the past. Although spatial correlations between cities have traditionally been ignored, multi-city time series studies now recognize the increasing evidence pointing to spatially nonhomogeneous associations. As datasets become available that spatially resolve both air pollution and human health outcomes at finer scales, this effect is likely to increase in importance, making it highly desirable to develop spatial and spatio-temporal stochastic process models for the joint distributions of air pollution, human health outcomes and other relevant covariates.

In prospective studies, researchers consider the possible effects of spatially defined covariates such as the distance between a residential location and the nearest road. They also recognize the importance of measurement error, in particular the discrepancy between ambient pollution concentrations as measured at monitoring sites and the personal exposure of individuals. In some urban areas, spatial variability in the pollution field is an important component of this error. Some studies have therefore used spatial methods such as kriging and Bayesian prediction to reduce this error by inferring the pollution concentrations at a participant's residence from the ambient measurements. However, much less work has been done on the logical follow-up question, which is the effect of such variability on the health-effect regression coefficients.

Challenges that face the practitioner of spatial epidemiology include issues of data availability and quality, confidentiality, exposure assessment, exposure mapping, and study design. Geographic methods of exposure assessment make a number of key assumptions that may limit their applicability in given situations. These include the following:

- equating modeled estimates of exposure (including distance-based measures, or output of EPA exposure numerical models such as SHEDS) with true exposure;
- equating exposure at a point (e.g., place of residence) with total personal exposure, that is, exposure integrated across space and time over the course of daily activities as the individual moves through the spatial exposure field;
- equating group exposure and group exposure-disease relationships with individual exposure and relationships at the individual level; this phenomenon is known as the "ecologic fallacy".

Key areas in which further work is needed include:

- Developing methods that account for a subject's movement through spatio-temporal exposure space.
- Developing calibration models whereby spatially sparse direct measurements of exposure can be combined with inexpensive, and therefore spatially dense, surrogates or predictors of exposure, to enable more precise estimation of the true exposure surface.
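The calibration idea in the last item can be sketched in its simplest form as a linear regression of direct measurements on a surrogate at the few sites where both exist, followed by prediction wherever only the surrogate is available. This is an illustrative assumption, not a model proposed by the program: the linear form, the function names, and the numbers are hypothetical, and a realistic calibration model would also account for spatial correlation and prediction uncertainty.

```python
import numpy as np


def fit_linear_calibration(surrogate_at_monitors, direct_at_monitors):
    """Least-squares fit of direct = intercept + slope * surrogate."""
    x = np.asarray(surrogate_at_monitors, dtype=float)
    y = np.asarray(direct_at_monitors, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # array([intercept, slope])


def calibrate(surrogate_values, coef):
    """Map surrogate values onto the scale of the direct measurements."""
    return coef[0] + coef[1] * np.asarray(surrogate_values, dtype=float)


# Hypothetical values: a surrogate observed at many sites, with direct
# measurements available at only the first four of them.
surrogate_dense = np.array([3.0, 5.0, 8.0, 10.0, 4.0, 6.5, 9.0, 7.2])
direct_sparse = np.array([3.6, 4.4, 6.1, 7.0])
coef = fit_linear_calibration(surrogate_dense[:4], direct_sparse)
exposure_dense = calibrate(surrogate_dense, coef)
```

The calibrated values in `exposure_dense` could then serve as inputs to a spatial predictor of the exposure surface.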

**Climate Change**

Much of the case for climate change and the estimation of its deleterious effects has relied on deterministic climate models that embrace physical and chemical modeling. The GCM [General Climate (or Circulation) Model] yields simulated climate data at fairly coarse spatial scales that serve as input to the RGCM (Regional GCM), which runs at finer spatial scales.

These models are, at best, approximate representations of the real world and hence must be continually assessed. Model errors must be identified and characterized to provide statements about confidence in results. Further, the computational overhead of these models mandates trade-offs between the number of realizations of a given model and the number of models used, drawing on current techniques of experimental design and design of computer experiments as well as the development of new techniques. The current methods of dealing with this model validity issue, arguably the most important one, are based on statistical spatial modeling techniques, but these techniques have never been tested at the level of complexity of climate models.

The results of climate models are extremely multi-dimensional. It is very difficult to present all of this information concisely in a manner that can be understood by decision makers. *Dimension reduction* and *data presentation techniques* are needed for contrasting spatial data, explaining what is being presented, and determining how to describe the confidence of projections from non-random samples.

Also available for assessing climate change are observational data from different measurement platforms (satellites, weather balloons, surface thermometers, etc.). Like the simulated data, these can represent very different spatial scales. Many historical time series lack early data for South America, Africa, or Southeast Asia. Even in the satellite era, the most observed period in Earth's climate history, key observational datasets such as those for lower tropospheric temperatures involve significant uncertainties. Understanding, modeling, and analyzing these spatial and temporal uncertainties, in the context of the massive (but sparse) data and the impact on climate change, requires significant methodological and theoretical advances.

Another key observational data set is the record of changes in ocean heat content. To estimate changes in the heat content of the world's oceans from sparse data with time-varying biases and coverage, temperature information must be "infilled" over large volumes of the ocean. This is an area where development and fitting of sophisticated space-time models to sparse data is a critical need.

One more crucial need is taking coarse-resolution projections from global and regional climate models down to *estimates for small areas*. [Indeed, downscaling and upscaling issues pervade the study of both simulated and observational data.] This is not the usual small-area estimation problem. It is actually the opposite: the 'average' solution needs to be processed through local climate features, a very poorly understood process.

The potential *effects on humans* from climate change are wide-ranging, especially since evidence suggests that *extreme events* are increasing in frequency as a result of global warming. Possible effects include a rise in infectious diseases such as malaria, and deaths caused by heat waves such as occurred in Europe in 2003, or by wildfires such as occurred in October 2007 in California. The data that suggest these effects are spatial and, again, the scale of the data and the determination of their causal relationship to climate change require new understandings and methodologies.

### Description of Activities

**Workshops:** The *Opening Workshop* was held September 13-16, 2009 at the Radisson RTP in Research Triangle Park, NC. This workshop aimed to engage as large a part of the statistics, mathematics, and relevant scientific communities as possible, with representative sessions from all of the main program topics. The *Transition Workshop* at the end of the program disseminated program results and charted a path for future research in the area. There were also mid-program workshops focused on each of the three key research areas mentioned above.

GEOMED: Spatial Epidemiology 2009 Workshop

November 14-16, 2009

GEOMED 2009 was the 6th international, interdisciplinary conference on geomedical systems. The meeting was jointly sponsored with SAMSI, so it also served as a SAMSI workshop on spatial epidemiology as part of the 2009-10 SAMSI Program on Space-time Analysis for Environmental Mapping, Epidemiology and Climate Change. More and more public health issues involve both geography and medicine. GEOMED brings together statisticians, geographers, epidemiologists, computer scientists, and public health professionals to discuss methods of spatial analysis, as well as to present and debate the results of such analyses.

**Working Groups:** Working groups met throughout the program to pursue particular research topics identified in the kickoff workshop (or subsequently chosen by the working group participants). The working groups consisted of SAMSI visitors, postdoctoral fellows, graduate students, and local faculty and scientists.