# 2010-11 Program on Complex Networks

This year-long SAMSI program focused on the emerging area of network science. This highly interdisciplinary field is characterized by novel interactions in the mathematical sciences occurring at the interface of applied mathematics, statistics, computer science, and statistical physics, as well as those areas with network-oriented thrusts in biology, computer networks, engineering, and the social sciences. The program was built around four interconnected research foci:

- network modeling and inference
- flows on networks
- network models for disease transmission
- dynamics of networks

Through empirical, applied and analytical approaches, the researchers in the program worked toward a better understanding of the networked systems we encounter in nature or build for technological purposes.

*Organizing Committee:*

*Program Leaders:* Eric Kolaczyk (Boston University), Alex Vespignani (Indiana University)

*Scientific Advisory Committee:* Pierre Degond (Institut de Mathématiques de Toulouse), Stephen Fienberg (Carnegie Mellon University), Martina Morris (University of Washington)

*Local Scientific Coordinators:* Alun Lloyd (North Carolina State University), Peter Mucha (University of North Carolina)

*Directorate Liaison:* Pierre Gremaud (North Carolina State University)

*National Advisory Committee Liaison:* Bin Yu (University of California-Berkeley)

### Research Foci

### Network modeling and inference

The analysis of network data has become a major endeavor across the sciences, and network modeling plays a key role. Frequently, there is an inferential component to the process of network modeling i.e., inference of network model parameters, of network summary measures, or of the network topology itself. For most standard types of data (e.g., independent and identically distributed, time series, spatial, etc.), there is a well-developed mathematical infrastructure guiding modeling and inference in practice. In the context of network data, however, such an infrastructure is largely lacking.

To date, the majority of the energy on network modeling has been devoted to the specification of network models (e.g., through classes of random graphs or through generative mechanisms). There has been some work in recent years noticeably advancing our understanding about fitting parameters for certain classes of network models (i.e., exponential-family random graph and latent space models primarily, but also a bit involving generative models). There also is a substantial older literature on inference of network summary measures (e.g., triad censuses, centrality measures, etc.), under various sampling designs, and a small but active recent literature picking up some of the older threads in the modern context. Nevertheless, both areas are arguably still in their early and formative stages, falling short of what we would like to demand of them in practice. Moreover, while these two areas have developed along largely distinct paths in the literature to date, the incorporation of the sampling-based perspective of the latter with the model-based perspective of the former is clearly needed in many practical contexts. Finally, there is a large and growing body of literature on the inference of network topology. However, while this area is rich in methodology, it is poor in the supporting concepts and mathematics necessary to carefully quantify issues relating to validation of inferred networks.

Current limitations in this area can perhaps be traced in no small part to the inherent tension between the simplicity of network models needed for tractability (e.g., of simulation, interpretation, and mathematical study) and the complexity needed to accurately describe reality. Realistically, the tasks of model specification and model inference need to be more closely tied together, with each being informed by the other.

### Flows on networks

In their simplest form, network flows are defined on directed graphs. Each edge receives a flow in an amount that cannot exceed the capacity of the edge. Many transport applications correspond to network flows: hydraulics and pipeline flows, rivers, sewer and water systems, traffics and roads, supply chains and cardiovascular systems, to name but a few. Several by now classical problems for network flows such as maximum flow have been solved for static flow. These results only partially carry over to dynamic flows (time extended networks) and much remains to be done. Some applications such as communication systems typically split data into packages. There are obvious technical limitations regarding the fineness of such decompositions that have to be taken into account when seeking (quasi-) optimal solutions. Several of the relevant open questions fall under the umbrella of combinatorial optimization.

Transport problems on networks may behave in unexpected ways due to interactions between their different components. For instance, networks containing closed loops/circuits may exhibit phenomena such as localized and/or sustained oscillations; even simple networks, such as Boolean networks, may exhibit phase transition from ordered to chaotic dynamics. Similarly, the statistical modeling and analysis of various types of network flow measurements includes a number of highly ill-posed but sparse inverse problems. The very topology of the networks therefore plays a fundamental role in the behavior of the problems defined on them. Examples abound such as for instance the properties of 3D networks built by social insects and how the network's topology and geometry influence the traffic organization of insects inside the structure. Neither theory nor numerical methods can/should be devised for such applications by simple "superposition" of existing results or methods for problems in standard domains. The construction of efficient mathematical, numerical and statistical tools for such applications is an important challenge.

### Network models for disease transmission

Network models provide a natural way to model many infectious diseases. Many diseases, such as sexually transmitted infections (STIs), have long been studied in terms of networks, but in recent years the approach has been adopted in a wider range of disease settings, including acute rapidly-spreading infections. Disease transmission networks are highly dependent on the infection of interest: the sexual partnership network across which an STI spreads has a quite different structure to the social network on which a respiratory infection (such as influenza) would spread. Even in the same population, different diseases "see" different networks.

Broadly speaking, network-based disease studies have either involved detailed modeling of a specific infection in a particular setting ('tactical models') or attempted to elucidate general principles of how particular network structures impact disease transmission ('strategic models'). Tactical models require a great deal of information about the structure of the relevant network (in addition to the biological details of the disease transmission process). They draw heavily on statistical efforts to quantify real-world networks. Strategic studies, on the other hand, typically focus on one aspect of network structure (such as distance or clustering) and examine its impact on the spread of infection. Simulation-based studies of this kind are reliant on algorithms that generate prototypical networks having a specified property (e.g. the Watts-Strogatz small world network, the Barabasi-Albert scale-free network or Newman's clustered network). Disease network models stand to benefit from advances in statistical methodologies for sampling and quantifying networks as well as in network-generation algorithms.

A number of analytic approaches have been used to study the spread of infection on networks. The use of percolation and branching process theory has been particularly fruitful, providing results on epidemic thresholds, outbreak probabilities and outbreak size distributions. Moment closure approaches have also been widely used to capture the impact of various aspects of network structure, such as clustering and local spatial structure. Much of this work, however, has assumed that the transmission network is static. In some settings this static picture is unlikely to provide a good description: in a monogamous population, the persistence of a sexually transmitted infection relies upon the break-up and formation of partnerships. Dynamic network structure can be important even for acute, rapidly spreading, infections. The development of analytic techniques that can describe the spread of infection on dynamic networks is a major and important challenge.

### Dynamics of networks

The changing structure of networks over time is inherent in the study of a broad array of phenomena. Examples for which a static transmission network is inadequate abound: from disease transmission to communications networks with changing landscape of connections to political networks where associations and voting similarities vary from one legislative session to the next. While the nature of the underlying processes differs, the flow of generalized information, for all three examples, depends in a nontrivial way on the changes in the node roles, in the structure of communities and in other coarse structural units. The importance of dynamics in networks has been long recognized. The increasing accessibility of network data has led to renewed interest in this area; examples of data include longitudinal data waves and financial correlations with strengths of connection defined over moving windows in time.

So far, most of the theoretical modeling work done on the dynamics of networks has been focused on the statistical equilibria of those models (e.g., growing networks by preferential attachment) or on one-time disruption events (e.g., the effect of knocking out hubs). At the same time, computational tools for analyzing and visualizing time-varying networks remain relatively few in number, especially as compared to the wealth of advances in methods for modeling and analyzing static networks. There is thus both need and opportunity for more thorough mathematical and statistical analysis and modeling of dynamic networks.

### Description of Activities

**Workshops:** The *Opening Workshop* was held August 29-September 1, 2010 at the Radisson RTP. This workshop aimed to engage as large a part of the statistics, mathematics, and relevant scientific communities as possible, with representative sessions from all of the main program topics. The *Transition Workshop* at the end of the program disseminated program results and charted a path for future research in the area. There were, also, mid-program workshops focused on each of the three key research areas mentioned above.

**Working Groups:** Working groups met throughout the program to pursue particular research topics identified in the kickoff workshop (or subsequently chosen by the working group participants). The working groups consist of SAMSI visitors, postdoctoral fellows, graduate students, and local faculty and scientists.

### Further Information

Additional information about the program and opportunities to participate in it:

- For other questions, send E-mail to [email protected]
- Related program: http://www.mitacs.ca/events/index.php?option=com_content&view=article&id=2&Itemid=2