Postdoctoral Fellow Seminars: Spring 2019

January 30, 2019 12:00pm – 2:15pm

Special Guest Lecture: Data Science in the Ad Tech Industry

Location: SAMSI Classroom
Speaker: Gene Ferruzza, Valassis Digital

Bio:
Gene Ferruzza manages the Data Science team at Valassis Digital. Having worked in data science his entire career, he started as a software engineer developing neural network applications in the early days of neural computing. His work over the years has focused on the design and deployment of analytical technologies driving digital consumer communications. At Valassis Digital, he works within a combined data science and engineering team to enhance AI-driven processes that create intelligence and optimize the relevance between consumers and advertisers. Gene has a BS degree in Computer Science and Mathematics from the University of Pittsburgh.

Abstract

This session will outline the functionality and technology challenges of the online display advertising industry and how data science is being leveraged in every aspect of its operation. Over the past 10 years, automated or “programmatic” online advertising has grown from non-existent to a major component, driving 80% of the advertising content that online users see when browsing. Nearly half of the advertising content is delivered through Real Time Bidding (RTB), the buying of ad space via an auction that occurs within the milliseconds it takes for a webpage to load. The session will cover the underlying process by which RTB delivers online ads every time a browser loads a website, and look at how Valassis Digital uses data, expert systems and machine learning to drive intelligence into the operation.


February 6, 2019 1:15pm – 2:15pm

Lecture: Sea Ice and Data Assimilation: Challenges and Proposed Approaches

Location: SAMSI Classroom
Speaker: Christian Sampson, Second-Year SAMSI Postdoctoral Fellow

Abstract

Sea ice dynamics are driven by a complex set of processes from the small to the large scale. The ice has become much more dynamic in recent years, with earlier melt onset and lower sea ice extent in late summer. This has also led to a new Arctic made up of less multi-year ice. As the Arctic opens up, accurate numerical sea ice state prediction will become increasingly important for both scientific and operational applications. Today, most large-scale sea ice models solve the sea ice momentum balance equation on an Eulerian grid. These first-generation models work fairly well for long-run climate studies; however, they typically fail to capture important ice characteristics such as lead formation, which matters for ship navigation and the calculation of heat fluxes. The increasingly dynamic Arctic should instead be modeled as what it is: a Lagrangian set of floes interacting with each other and their environments. However, the Lagrangian view can make data assimilation difficult for most sea ice data products. In this talk, I will describe two relatively new sea ice models that take a Lagrangian approach to simulating ice dynamics, MPM-ice and neXtSIM. I will discuss some of the issues associated with data assimilation in these models, as well as in sea ice modeling in general, and outline some approaches we think will advance our ability to accurately predict sea ice states.

References

No references provided at this time


February 20, 2019 1:15pm – 2:15pm

Lecture: A Method for High-dimensional Non-Gaussian Spatial Data

Location: SAMSI Classroom
Speaker: Yawen Guan, Second-Year SAMSI Postdoctoral Fellow

Abstract

Non-Gaussian spatial data are common in many environmental disciplines. Spatial generalized linear mixed models (SGLMMs) are flexible models for such data, but inference for SGLMMs is computationally expensive, especially when the data are high-dimensional. I will present a new method that replaces high-dimensional spatial random effects with a reduced-dimensional representation based on random projections. I will discuss estimation in a Bayesian framework via Markov chain Monte Carlo (MCMC), as well as maximum likelihood estimation via a Markov chain Monte Carlo Expectation Maximization (MCMC-EM) algorithm.
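
To illustrate the idea of replacing high-dimensional spatial random effects with a reduced-dimensional representation, here is a minimal sketch. It is not the speaker's implementation: the exponential covariance, the range value 0.3, the oversampling amount, and the helper name `random_projection_basis` are all assumptions made for illustration. It uses a Gaussian random projection to approximate the leading eigenvectors of a spatial covariance matrix.

```python
import numpy as np

def random_projection_basis(K, rank, oversample=10, seed=0):
    """Approximate the leading `rank` eigenpairs of a symmetric PSD matrix K
    using a Gaussian random projection (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    omega = rng.standard_normal((n, rank + oversample))  # random projection matrix
    Q, _ = np.linalg.qr(K @ omega)                       # orthonormal basis for the projected range
    small = Q.T @ K @ Q                                  # small (rank + oversample) square matrix
    evals, evecs = np.linalg.eigh(small)
    order = np.argsort(evals)[::-1][:rank]               # keep the largest eigenvalues
    return Q @ evecs[:, order], evals[order]

# Toy example: 200 random locations with an assumed exponential covariance.
coords = np.random.default_rng(1).random((200, 2))
dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = np.exp(-dists / 0.3)

U, lam = random_projection_basis(K, rank=20)
# A 200-dimensional random effect w can then be represented as
# w ~ U @ (np.sqrt(lam) * delta), with delta a 20-dimensional standard normal vector.
```

In a model of this kind, inference would target the 20-dimensional vector `delta` rather than the full 200-dimensional random effect, which is where the computational savings would come from.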

References

No references provided at this time


February 27, 2019 1:15pm – 2:15pm

Special Guest Lecture: Two Extensions of the Stochastic Block Model for Network Clustering Motivated by Three Datasets in Sociology, Ecology and Ethnobiology

Location: SAMSI Classroom
Speaker: Pierre Barbillon, AgroParisTech

Abstract

Clustering nodes, for instance by detecting communities, is a standard task when analyzing network data. The Stochastic Block Model (SBM) is a flexible latent-variable model widely used for unraveling structures in networks. The latent variables are categorical and correspond to a clustering of nodes. Any connection between two nodes is then modeled as a draw of a Bernoulli random variable, the probability of which depends on the latent variables associated with these nodes. In this talk, we will propose two extensions of the SBM to handle, on the one hand, multiplex networks (several possible connections between two nodes) and, on the other hand, multipartite networks (several networks involving the same nodes). The multiplex extension is motivated by a dataset on French cancer researchers, for whom we have direct connections in the form of an advice network and connections through their labs in the form of a resource-sharing network. The multipartite extension is concerned with data on plant – pollinator, plant – ant and plant – bird interactions, and with data on seed sharing among farmers together with inventories of plants for each farmer.

We will present an inference method for these extensions based on a variational version of the Expectation-Maximization algorithm and a model selection procedure for determining the number of clusters based on a penalized likelihood criterion.
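
The basic (single-network) SBM described in this abstract can be sketched as a short simulation. This is only an illustration of the generative model, not of the two extensions or of the variational inference method; the function name `simulate_sbm`, the block proportions, and the connection probabilities are assumptions chosen for the example.

```python
import numpy as np

def simulate_sbm(n_nodes, block_probs, connect_probs, seed=0):
    """Simulate an undirected SBM without self-loops (illustrative helper).

    block_probs:   prior probabilities of the categorical latent variable.
    connect_probs: symmetric matrix; entry (q, l) is the Bernoulli probability
                   of an edge between a node in block q and a node in block l.
    """
    rng = np.random.default_rng(seed)
    block_probs = np.asarray(block_probs)
    connect_probs = np.asarray(connect_probs)
    # Latent categorical variable (cluster label) for every node.
    z = rng.choice(len(block_probs), size=n_nodes, p=block_probs)
    # Edge probability for every pair, determined only by the two labels.
    pair_probs = connect_probs[z][:, z]
    # One Bernoulli draw per unordered pair (strict upper triangle), then symmetrize.
    upper = np.triu(rng.random((n_nodes, n_nodes)) < pair_probs, k=1)
    adjacency = (upper | upper.T).astype(int)
    return z, adjacency

# Two assumed communities: dense within a block, sparse between blocks.
z, A = simulate_sbm(60, [0.5, 0.5], [[0.8, 0.05], [0.05, 0.8]])
```

Inference goes in the opposite direction: given only `A`, recover the labels `z` and the connection probabilities.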

References

No references provided at this time


March 6, 2019 1:15pm – 2:15pm

Lecture: Design and Distributed Algorithms

Location: SAMSI Classroom
Speaker: Cheng Cheng, Second-Year SAMSI Postdoctoral Fellow

Abstract

Graph signal processing provides an innovative framework to process data on graphs. In this talk, I will discuss graph filters for processing data on a sparse graph, from their design to distributed algorithms. High-order Chebyshev polynomial approximation has been widely used to approximate graph multiplier operators. We propose an iterative Chebyshev polynomial approximation (ICPA) algorithm to implement the inverse filtering procedure, which can eliminate the restoration error even when using a Chebyshev polynomial approximation of lower order. I will discuss the distributed implementation of the ICPA algorithm on a spatially distributed network and show how the ICPA algorithm can be used in signal denoising.
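
A rough sketch of how an iterative scheme of this kind might look is given below. It is written under explicit assumptions and is not taken from the talk: the filter is assumed to be H = I + alpha * L for a graph Laplacian L (a common denoising filter), the graph is a toy ring, and the update x <- x + G(b - Hx), with G a low-order Chebyshev approximation of the inverse filter, is one plausible reading of the abstract. The names `ring_laplacian`, `apply_chebyshev`, and `icpa_sketch` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def ring_laplacian(n):
    """Combinatorial Laplacian of an n-node ring graph (toy sparse graph)."""
    idx = np.arange(n)
    A = np.zeros((n, n))
    A[idx, (idx + 1) % n] = 1.0
    A[(idx + 1) % n, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def apply_chebyshev(coeffs, L_scaled, v):
    """Compute sum_k coeffs[k] * T_k(L_scaled) @ v via the three-term
    recurrence (requires order >= 1). Each product with L_scaled only
    combines values at neighboring nodes."""
    t_prev, t_curr = v, L_scaled @ v
    out = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_prev, t_curr = t_curr, 2.0 * (L_scaled @ t_curr) - t_prev
        out = out + c * t_curr
    return out

def icpa_sketch(L, lam_max, b, alpha=1.0, order=3, n_iter=20):
    """Iteratively solve (I + alpha * L) x = b with a low-order Chebyshev
    approximation of the inverse filter (assumed scheme, see lead-in)."""
    n = L.shape[0]
    H = np.eye(n) + alpha * L
    # Inverse filter response on [0, lam_max], re-parameterized to t in [-1, 1].
    inv_response = lambda t: 1.0 / (1.0 + alpha * lam_max * (t + 1.0) / 2.0)
    coeffs = cheb.chebinterpolate(inv_response, order)
    L_scaled = (2.0 / lam_max) * L - np.eye(n)  # spectrum mapped into [-1, 1]
    x = np.zeros_like(b)
    for _ in range(n_iter):
        # Correct the current estimate with the approximate inverse of the residual.
        x = x + apply_chebyshev(coeffs, L_scaled, b - H @ x)
    return x

L = ring_laplacian(50)
b = np.random.default_rng(0).standard_normal(50)
x = icpa_sketch(L, lam_max=4.0, b=b)  # 4.0 bounds the ring Laplacian spectrum
```

The matrices are dense here only for brevity. Every step multiplies by the Laplacian, so on a sparse graph each node would need to exchange values only with its neighbors, which is what makes a distributed implementation plausible.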

References

No references provided at this time


March 20, 2019 1:15pm – 2:15pm

Special Guest Lecture: Opportunities for Collaboration between the Duke Master in Interdisciplinary Data Science Program and SAMSI

Location: SAMSI Classroom
Speakers: Thomas Nechyba and Jana Schaich Borg, Duke University

Abstract

In this informal presentation, I will introduce Duke’s new Master in Interdisciplinary Data Science program. I will share what we believe makes the program unique and discuss several ways in which SAMSI researchers could get involved.


March 27, 2019 1:15pm – 2:15pm

Lecture: Learning Personalized PDEs for Biological Transport Models from Noisy Data

Location: SAMSI Classroom
Speaker: John Nardini, First-Year SAMSI Postdoctoral Fellow

Abstract

The Fisher-KPP partial differential equation (PDE) model has been widely used to predict and diagnose tumor progression in glioblastoma patients. While this equation has proven to be a useful model in describing tumor progression, we do not know if it is the optimal reaction-diffusion equation to do so. Performing a typical model selection study to investigate this would be computationally prohibitive, so we instead consider the problem of learning the dynamics of a given noisy dataset using sparse regression methods. Recent studies in this area have only been successful in the presence of very small amounts of noise. We accordingly develop a method to denoise noisy data for use in an equation learning framework and demonstrate that this method can correctly identify the PDE model that generated noisy spatiotemporal data. This work is a first step towards developing a methodology to generate data-driven models from patient data.
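
For reference, the Fisher-KPP model mentioned above is usually written in the following standard form. The symbols are the conventional ones and are not taken from the talk: u(x, t) is the tumor cell density, D the diffusion coefficient, r the proliferation rate, and K the carrying capacity.

```latex
\frac{\partial u}{\partial t}
  = D \,\nabla^{2} u + r\, u \left( 1 - \frac{u}{K} \right)
```

An equation learning approach of the kind described would aim to recover a right-hand side such as this one from noisy observations of u, rather than assuming it in advance.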

References

No references provided at this time


April 10, 2019 1:15pm – 2:15pm

Lecture: On the Inference of Applying Gaussian Process Modeling to a Deterministic Function

Location: SAMSI Classroom
Speaker: Wenjia Wang, First-Year SAMSI Postdoctoral Fellow

Abstract

We investigate applying Gaussian process modeling to a deterministic function from prediction and uncertainty quantification perspectives. The upper bound and optimal convergence rate of prediction in Gaussian process modeling have been extensively studied in the literature, while a thorough exploration of the convergence rate and the theoretical study of uncertainty quantification are lacking. We prove that, if we use maximum likelihood estimation, then under different choices of nugget parameters the constructed predictor is not optimal and/or the estimated confidence interval is not reliable. The results suggest that, if one applies Gaussian process modeling to a deterministic function, the reliability of the confidence interval and the optimality of predictors cannot be achieved at the same time unless further information about the underlying function is known.
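
As background, a common textbook form of the Gaussian process predictor with a nugget is shown below. The notation is an assumption made for illustration and may differ from the speaker's: observations y = (f(x_1), ..., f(x_n))^T, a correlation function \Phi, a variance parameter \sigma^2, and a nugget parameter \mu \ge 0.

```latex
\hat{f}(x) = r(x)^{\top} \left( R + \mu I_n \right)^{-1} y,
\qquad
R_{ij} = \Phi(x_i - x_j), \quad r(x)_i = \Phi(x - x_i),

\operatorname{Var}\!\left[ Z(x) \mid y \right]
  = \sigma^{2} \left( \Phi(0) - r(x)^{\top} \left( R + \mu I_n \right)^{-1} r(x) \right)
```

The first expression is the predictor whose optimality is in question, and the second is the conditional variance from which a pointwise confidence interval would typically be built once \sigma^2 is estimated, for example by maximum likelihood.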

References

No references provided at this time


April 17, 2019 1:15pm – 2:15pm

Lecture: Additive Partially Linear Models for Ultra‐High‐Dimensional Regression

Location: SAMSI Classroom
Speaker: Xinyi Li, First-Year SAMSI Postdoctoral Fellow

Abstract

We consider a semiparametric additive partially linear regression model (APLM) for analysing ultra‐high‐dimensional data where both the number of linear components and the number of non‐linear components can be much larger than the sample size. We propose a two‐step approach for estimation, selection, and simultaneous inference of the components in the APLM. In the first step, the non‐linear additive components are approximated using polynomial spline basis functions, and a doubly penalized procedure is proposed to select nonzero linear and non‐linear components based on adaptive lasso. In the second step, local linear smoothing is applied to the data with the selected variables to obtain the asymptotic distribution of the estimators of the nonparametric functions of interest. The proposed method selects the correct model with probability approaching one under regularity conditions. The estimators of both the linear part and the non‐linear part are consistent and asymptotically normal, which enables us to construct confidence intervals and make inferences about the regression coefficients and the component functions. The performance of the method is evaluated by simulation studies. The proposed method is also applied to a data set on the shoot apical meristem of maize genotypes.

References

Supplementary A for “Additive Partially Linear Models for Ultra-high-dimensional Regression” – Xinyi Li, Li Wang and Dan Nettleton


April 24, 2019 1:15pm – 2:15pm

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Pulong Ma, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


May 1, 2019 1:15pm – 2:15pm

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Matthias Sachs, Second-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time