# 2008-09 Program on Algebraic Methods in Systems Biology and Statistics

### Introduction

In recent years, methods from algebra, algebraic geometry, and discrete mathematics have found new and unexpected applications in systems biology and in statistics, giving rise to the emerging fields of "algebraic biology" and "algebraic statistics." Applications of algebraic statistics to problems in biology are also beginning to appear. This year-long program provided a focus for the further development and maturation of these two areas of research and of their interconnections. The unifying theme was the common mathematical tool set, together with the increasingly close interaction between biology and statistics. The program brought researchers working in algebra, algebraic geometry, discrete mathematics, and mathematical logic together with statisticians and biologists to make fundamental advances in the development and application of algebraic methods to systems biology and statistics. The essential involvement of biologists and statisticians gave the program its applied focus and provided a sounding board for theoretical research.

*Organizing Committee:*

Peter Beerli (School of Computational Sciences and Department of Biological Sciences, Florida State University)

Andreas Dress (Director, CAS-MPG Partner Institute for Computational Biology, Shanghai)

Mathias Drton (Department of Statistics, University of Chicago)

Christine Heitsch (School of Mathematics, Georgia Tech)

Ina Hoeschele (Department of Statistics, Virginia Tech, and Virginia Bioinformatics Institute)

Serkan Hosten (Department of Mathematics, San Francisco State University)

Reinhard Laubenbacher, *Committee Chair* (Department of Mathematics, Virginia Tech and Virginia Bioinformatics Institute)

Bud Mishra (Departments of Computer Science, Mathematics, and Cell Biology, Courant Institute, NYU)

Don Richards (Department of Statistics, Pennsylvania State University)

Seth Sullivant (Department of Mathematics, N.C. State University)

Brett Tyler (Department of Plant Pathology and Weed Science, Virginia Tech, and Virginia Bioinformatics Institute)

Ruriko Yoshida (Department of Statistics, University of Kentucky)

### Research Foci

**Systems Biology** - The development over the last few decades of revolutionary high-throughput technologies for data generation in molecular biology has made it possible, for the first time, to obtain a system-level view of the molecular networks that govern cellular and organismal function. Whole-genome sequencing is now commonplace, gene transcription can be observed at the system level, and large-scale protein and metabolite measurements are maturing into a quantitative methodology. The field of systems biology has evolved to take advantage of these new types of data to construct large-scale mathematical models. System-level approaches to the analysis and modeling of biochemical networks promise to have a major impact on biomedicine, particularly on drug discovery.

**Statistics** - It has long been recognized that the geometry of a statistical model's parameter space fundamentally determines the behavior of procedures for statistical inference. This connection has been studied in particular in the field of information geometry, where differential-geometric techniques are applied to better understand inference procedures in smooth models. Many statistical models, however, have parameter spaces that are not smooth but have singularities. Typical examples include hidden variable models, such as the phylogenetic tree models and hidden Markov models that are ubiquitous in the analysis of biological data. Algebraic geometry provides the mathematical tools needed to study such non-smooth models and is likely to be an important ingredient of a general statistical theory for them.

**Algebraic methods**

*Algebraic biology* is emerging as a new approach to the modeling and analysis of biological systems using tools from algebra, algebraic geometry, discrete mathematics, and mathematical logic. Its applications span a wide range of molecular biology, including the analysis of DNA and protein sequence data, the study of RNA secondary structure, the assembly of viruses, the modeling of cellular biochemical networks, and algebraic model checking for metabolic networks.

*Algebraic statistics* is a new field, less than a decade old, whose precise scope is still emerging. The term itself was coined by Giovanni Pistone, Eva Riccomagno, and Henry Wynn. Their book of the same name explains how polynomial algebra arises in problems from experimental design and discrete probability, and demonstrates how techniques from computational algebra can be applied to statistics. The first applications focused on categorical data and include the study of Markov bases and conditional inference, disclosure limitation, and parametric inference.

The central idea underlying algebraic statistics is that the parameter spaces of many statistical models are (semi-)algebraic sets, that is, sets defined by polynomial equations (and inequalities). The geometry of such possibly non-smooth sets can be studied using tools from algebraic geometry. As shown in the book by Pachter and Sturmfels, *Algebraic Statistics for Computational Biology*, many problems in computational biology can be described within this framework. This is where algebraic statistics joins algebraic biology as a new methodology for solving problems in systems biology.
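As a minimal illustration of what it means for a parameter space to be a semi-algebraic set, consider the standard textbook example of independence between two binary random variables $X$ and $Y$; the example is chosen here for illustration and is not drawn from the program's own work. Writing $p_{ij} = P(X = i, Y = j)$ for $i, j \in \{0, 1\}$, the independence model is parametrized by $p_{ij} = \alpha_i \beta_j$, where $\alpha$ and $\beta$ are the marginal distributions of $X$ and $Y$. Substituting this parametrization shows that the determinant of the table vanishes:

$$
p_{00}\,p_{11} - p_{01}\,p_{10} = \alpha_0\beta_0\,\alpha_1\beta_1 - \alpha_0\beta_1\,\alpha_1\beta_0 = 0 .
$$

Conversely, a nonnegative $2 \times 2$ table whose entries sum to one and whose determinant is zero has rank one, so it factors as the product of its row and column marginals. The model is therefore exactly

$$
\mathcal{M} = \Bigl\{ p \in \mathbb{R}^{2 \times 2} : p_{00}\,p_{11} - p_{01}\,p_{10} = 0,\ p_{ij} \ge 0,\ \sum_{i,j} p_{ij} = 1 \Bigr\},
$$

where the polynomial equation makes the set algebraic and the inequalities make it semi-algebraic.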

The unifying theme of the program was the development and use of a particular set of tools from algebra, algebraic geometry, and discrete mathematics to solve problems in statistics and biology.

### Description of Activities

**Workshops:** The Kickoff Workshop and Tutorial was held September 14-17, 2008. Its principal goal was to engage a broadly representative segment of the mathematical, statistical, and life sciences communities in determining the research directions to be pursued by working groups during the program.

The working groups also organized mid-program workshops, and a Transition Workshop at the end of the program disseminated program results and charted a path for future research in the area.

**Working Groups:** The working groups met regularly throughout the program to pursue research topics identified at the Kickoff Workshop or subsequently chosen by their participants. They consisted of visitors to the Statistical and Applied Mathematical Sciences Institute (SAMSI), postdoctoral fellows, graduate students, and local faculty and scientists. Participants did not need to be continuously resident at SAMSI to stay involved with a working group. The program's working groups were:

- Dynamics from Structure
- Evolutionary Biology
- Network Inference
- Algebraic Statistics and Experimental Design

### Further Information

If you have questions, send email to algebraicmethods@samsi.info