Workshop on Distributed and Parallel Data Analysis (DPDA): September 21-23, 2016

Application deadline: August 5, 2016


The workshop venue has changed. It will now be held at North Carolina State University:

Institute for Advanced Analytics
NC State Centennial campus
901 Main Campus Drive, Raleigh, NC 27606


The workshop aims to bring academic researchers and industrial engineers together to explore and discuss the recent challenges that practitioners face in distributed data analytics, along with the related theory and proven best practices from both academia and industry.

In recent work in computational mathematics and machine learning, great strides have been made in distributed optimization and distributed learning. For example, by enforcing ‘consensus’ between local variables and a global variable, the Alternating Direction Method of Multipliers (ADMM) can be used to solve a distributed version of the LASSO problem. Classical statistical methodology, theory, and computation, on the other hand, assume that the entire dataset is available at a central location. This is a significant shortcoming in modern problem solving, because computation on a single machine can be thousands of times faster than data transmission between locations.
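To make the consensus idea concrete, here is a minimal, non-authoritative sketch of the textbook scaled-form consensus ADMM updates for the LASSO. It assumes the rows of the data are split across N nodes, each holding a hypothetical `(A_i, b_i)` block, and that every node solves its local ridge-type subproblem exactly. The function names (`consensus_lasso`, `solve`, `soft_threshold`), the penalty parameter `rho`, and the iteration count are illustrative assumptions, not methods presented by the workshop speakers. The point it illustrates is that only small local vectors (x_i + u_i) are averaged across nodes; the raw data never leave their node.

```python
import math


def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa * ||.||_1."""
    return [math.copysign(max(0.0, abs(x) - kappa), x) for x in v]


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting (small dense M)."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / a[i][i]
    return x


def consensus_lasso(blocks, lam, rho=1.0, iters=500):
    """Consensus ADMM (scaled form) for
        minimize   sum_i 0.5 * ||A_i x_i - b_i||^2 + lam * ||z||_1
        subject to x_i = z for every node i.
    `blocks` is a list of (A_i, b_i) pairs, one per node (hypothetical layout)."""
    n = len(blocks[0][0][0])  # number of features
    N = len(blocks)           # number of nodes
    # Each node precomputes A_i^T A_i + rho*I and A_i^T b_i from its own data only.
    local = []
    for A, b in blocks:
        M = [[sum(row[i] * row[j] for row in A) + (rho if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
        local.append((M, Atb))
    z = [0.0] * n
    us = [[0.0] * n for _ in range(N)]  # scaled dual variables, one per node
    xs = [[0.0] * n for _ in range(N)]  # local primal variables
    for _ in range(iters):
        # x-update (parallel across nodes): local ridge solve pulled toward z - u_i.
        for k, (M, Atb) in enumerate(local):
            rhs = [Atb[i] + rho * (z[i] - us[k][i]) for i in range(n)]
            xs[k] = solve(M, rhs)
        # z-update (aggregator): average x_i + u_i, then soft-threshold.
        avg = [sum(xs[k][i] + us[k][i] for k in range(N)) / N for i in range(n)]
        z = soft_threshold(avg, lam / (rho * N))
        # u-update (parallel across nodes): accumulate the consensus residual x_i - z.
        for k in range(N):
            us[k] = [us[k][i] + xs[k][i] - z[i] for i in range(n)]
    return z


if __name__ == "__main__":
    # Two hypothetical nodes holding noiseless data generated from x = [2, -1].
    node1 = ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [2.0, -1.0, 1.0])
    node2 = ([[1.0, 2.0], [2.0, 1.0], [1.0, -1.0]], [0.0, 3.0, 3.0])
    print(consensus_lasso([node1, node2], lam=0.0))    # should approach [2.0, -1.0]
    print(consensus_lasso([node1, node2], lam=100.0))  # large penalty: all zeros
```

The sketch stays in pure Python so that it runs without dependencies; a practical version would more likely use NumPy and cache a factorization of each node's A_i^T A_i + rho*I, since that matrix does not change between iterations.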

Specific goals of the workshop include (i) exposing academic researchers to both the challenges in industrial applications and the computing tools currently used in industry, (ii) introducing industrial researchers to the frontiers of applied mathematical and statistical methods for distributed inference, and (iii) educating graduate students and early-career researchers about practical computing and theoretical studies in distributed analytics. The workshop will begin with a few tutorial-style lectures, followed by lectures and panels on state-of-the-art, research-based methods by leading researchers and practitioners in this emerging field of mathematics.

The workshop will be limited to about 50 participants, and priority for funding support will be given to U.S.-based researchers.

Schedule and Supporting Media


Wednesday, September 21st
NCSU Centennial Campus


Time | Description | Speaker | Slides
8:50–9:00 | Welcome and Introductory Remarks | Sujit Ghosh, SAMSI | PDF
9:00–9:35 | Scalable Probabilistic Inference from Big and Complex Data | David Dunson, Duke | PDF
9:40–10:15 | Asynchronous Parallel Coordinate Update Algorithms | Wotao Yin, UCLA | PDF
10:50–11:25 | Distributed Hyper-Parameter Optimization for Machine Learning | Yan Xu, SAS | PDF
11:30–12:05 | Interaction Selection and Screening for High Dimensional Data | Helen Zhang, University of Arizona | PDF
1:30–2:05 | Privacy-Preserving Methods for Handling Missing Data in Distributed Health Data Networks | Qi Long, Emory |
2:10–2:45 | DPDA Application in Predix Ecosystem for Real-time Monitoring and Diagnostics of Energy Assets | Xiaomo Jiang, GE Power |
2:50–3:25 | A Sequential Split-Conquer-Combine Approach for Gaussian Process Model in Analysis of Big Spatial Data | Min-ge Xie, Rutgers | PDF
4:00–4:15 | Funding Opportunities at NSF | Yong Zeng, NSF |
4:15–5:00 | Discussion: Lightning Talks | Discussion moderator: Sujit Ghosh, SAMSI; Alexander Liu & Mei |

Thursday, September 22nd
NCSU Centennial Campus

Time | Description | Speaker | Slides
9:00–9:35 | Distributed Estimation and Inference with Statistical Guarantees | Jianqing Fan, Princeton |
9:40–10:15 | HPDA Growth Constraints in Digital Marketing | Samuel Franklin, 360i |
10:50–11:25 | Bayesian Aggregation for Extraordinarily Large Dataset | Guang Cheng, Purdue |
11:30–12:05 | Blessing of Massive Scale | Han Liu, Princeton |
1:30–2:05 | Bayesian Neural Networks for High Dimensional Nonlinear Variable Selection | Faming Liang, University of Florida |
2:10–2:45 | Strategies & Principles for Distributed Machine Learning | Eric Xing, Carnegie Mellon |
2:50–3:25 | Challenges and Opportunities in Automated Driving and Connected Vehicles | Yilu Zhang and Wei Tong, GM | PDF
4:00–4:35 | Parallel Local Graph Clustering | Kimon Fountoulakis, UC Berkeley |

Friday, September 23rd
NCSU Centennial Campus

Time | Description | Speaker | Slides
9:40–10:15 | Scalable and Robust Statistical Estimation: a tale of the geometric median | Stas Minsker, University of Southern California | PDF
10:50–11:25 | Uncover Customer Insights with Apache Spark and ML | Bo Zhang, IBM |
11:30–12:05 | Some Recent Development in Spatial Statistics for Large Datasets | Raj Guhaniyogi, University of California, Santa Cruz | PDF

Questions: email