Postdoctoral Fellow Seminars: Fall 2019

August 28, 2019

Special Guest Lecture: Interpretable Machine Learning: Optimal Decision Trees and Optimal Scoring Systems

Location: SAMSI Classroom
Speaker: Cynthia Rudin, Prof. of Computer Science, Electrical and Computer Engineering, and Statistical Science, Duke University and Assoc. Director, SAMSI

Bio

Cynthia Rudin is a professor of computer science, electrical and computer engineering, and statistical science at Duke University, where she directs the Prediction Analysis Lab, whose main focus is interpretable machine learning. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo and a PhD from Princeton University. She is a three-time winner of the INFORMS Innovative Applications in Analytics Award, was named one of the “Top 40 Under 40” by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. She is past chair of both the INFORMS Data Mining Section and the Statistical Learning and Data Science Section of the American Statistical Association. She has also served on committees for DARPA, the National Institute of Justice, and AAAI. She has served on three committees for the National Academies of Sciences, Engineering, and Medicine: the Committee on Applied and Theoretical Statistics, the Committee on Law and Justice, and the Committee on Analytic Research Foundations for the Next-Generation Electric Grid. She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics. She will be the Thomas Langford Lecturer at Duke University during the 2019-2020 academic year.

Abstract

How do patients and doctors know that they can trust predictions from a model that they cannot understand? Transparency in machine learning models is critical in high-stakes decisions, like those made every day in healthcare. My lab creates machine learning algorithms for predictive models that are interpretable to human experts. I will focus on two historically hard optimization problems whose solutions are important in practice:

(1) Optimal sparse decision trees and optimal sparse rule list models. Our algorithms are highly customized branch-and-bound procedures, offering an alternative to CART and other greedy decision tree methods. The solutions are globally optimal according to accuracy, regularized by the number of leaves (sparsity). This problem is NP-hard, with no polynomial-time approximation. I will present the first practical algorithms for this problem.

(2) Optimal scoring systems. Scoring systems are sparse linear models with integer coefficients. Traditionally, scoring systems have been designed by manual feature elimination on logistic regression models, followed by a post-processing step in which the coefficients are rounded; this process does not produce optimal solutions. I will present a novel cutting-plane method for producing scoring systems from data. The solutions are globally optimal according to the logistic loss, regularized by the number of terms (sparsity), with coefficients constrained to be integers. (Both objectives are sketched below.)
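As a rough schematic of the two problems (the notation here is assumed for illustration, not taken from the talk): problem (1) minimizes a sparsity-regularized loss over decision trees d,

    R(d) = \ell(d; x, y) + \lambda \cdot \mathrm{leaves}(d),

where \ell is the misclassification error and \lambda trades accuracy against the number of leaves; problem (2) minimizes the logistic loss over bounded integer coefficient vectors,

    \min_{\beta \in \mathbb{Z}^{p+1}, \ |\beta_j| \le B} \ \frac{1}{n} \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i \beta^\top x_i}\right) + C_0 \|\beta\|_0,

where the \ell_0 penalty counts the nonzero terms and C_0 controls sparsity.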

These algorithms have been used in many medical and criminal justice applications.

This is joint work with Margo Seltzer and Berk Ustun, as well as Elaine Angelino, Nicolas Larus-Stone, Daniel Alabi, Sean Hu, and Jimmy Lin.

References

Presentation Slides

Video


September 4, 2019

Special Guest Lecture: There is a Kernel Method for That

Location: SAMSI Classroom
Speaker: Ernest Fokoue, Professor of Statistics, Rochester Institute of Technology

Bio

Ernest Fokoué is Professor of Statistics in the School of Mathematical Sciences at Rochester Institute of Technology. He enjoys the honor of being the firstborn of the SAMSI postdoctoral fellows, the cohort that got the institute going with the Data Mining and Machine Learning (DMML) program in 2003. He is one of the co-leaders of the 2019-2020 SAMSI Games, Decisions, Risk and Reliability (GDRR) program and will be spending his entire sabbatical year contributing to its activities. His research and teaching interests are Bayesian statistics, statistical machine learning, computational statistics, epistemology, theology, and linguistics.

Abstract

In this lecture, I will present a general tour of some of the most commonly used kernel methods in statistical machine learning and data mining. I will touch on elements of artificial neural networks and then highlight their intricate connections to general-purpose kernel methods such as Gaussian process learning machines. I will also resurrect the famous universal approximation theorem and will most likely ignite a [controversial] debate around the theme: could it be that [shallow] networks like radial basis function networks or Gaussian processes are all we need for well-behaved functions? Do we really need many hidden layers, as the hype around deep neural network architectures seems to suggest, or should we heed Ockham’s principle of parsimony, namely “Entities should not be multiplied beyond necessity” (“Entia non sunt multiplicanda praeter necessitatem”)? I intend to spend the last 15 minutes of this lecture sharing my personal tips and suggestions with our precious postdoctoral fellows on how to make the most of their experience.
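As a concrete illustration of the kind of general-purpose kernel method the talk surveys, here is a minimal sketch of kernel ridge regression with a Gaussian (RBF) kernel, whose predictor coincides with the posterior mean of a Gaussian process regression model; the toy data, lengthscale, and noise level are illustrative assumptions, not taken from the lecture.

    import numpy as np

    def rbf_kernel(X, Z, lengthscale=0.5):
        """Gram matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * lengthscale^2))."""
        sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
        return np.exp(-sq / (2.0 * lengthscale**2))

    rng = np.random.default_rng(0)
    X_train = rng.uniform(-3, 3, size=(50, 1))                     # toy inputs
    y_train = np.sin(X_train).ravel() + 0.1 * rng.normal(size=50)  # noisy observations

    # Ridge-regularized kernel weights; identical to the posterior mean weights
    # of a Gaussian process with this kernel and noise variance 0.01.
    K = rbf_kernel(X_train, X_train)
    alpha = np.linalg.solve(K + 0.01 * np.eye(len(X_train)), y_train)

    X_test = np.linspace(-3, 3, 200)[:, None]
    y_pred = rbf_kernel(X_test, X_train) @ alpha                   # predicted values

Viewed as a Gaussian process, the same predictor additionally provides posterior variances, one reason such shallow machines remain serious competitors for well-behaved functions.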

Presentation Slides


September 11, 2019

Special Guest Lecture: Attacking the Curse of Dimensionality using Sums of Separable Functions

Location: SAMSI Classroom
Speaker: Martin Mohlenkamp, Assoc. Professor, Dept. of Mathematics, Ohio University

Abstract

Naive computations involving a function of many variables suffer from the curse of dimensionality: the computational cost grows exponentially with the number of variables. One approach to bypassing the curse is to approximate the function as a sum of products of functions of one variable and to compute in this format. When the variables are discrete indices, a function of many variables is called a tensor, and the approach amounts to approximating the tensor in the (so-called) canonical tensor format and computing with it in that format. In this talk I will describe how such approximations can be used in numerical analysis and in machine learning.
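Concretely, in this separated (canonical) format a function of d variables is approximated as a sum of r separable terms,

    f(x_1, \ldots, x_d) \approx \sum_{l=1}^{r} s_l \, \phi_1^{(l)}(x_1) \, \phi_2^{(l)}(x_2) \cdots \phi_d^{(l)}(x_d),

so a tensor with n values per index is stored with O(r d n) numbers instead of the n^d entries of the full array; the separation rank r, rather than the dimension d, then governs the cost of computing in this format.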


September 18, 2019

Lecture: Multifidelity Computer Model Emulation with High-Dimensional Output

Location: SAMSI Classroom
Speaker: Pulong Ma, Second-Year SAMSI Postdoctoral Fellow

Abstract

Hurricane-driven storm surge is one of the deadliest and costliest natural disasters, making precise quantification of the surge hazard of great importance. Physics-based computer models of storm surge can be implemented at a wide range of fidelity levels due to the nature of the system, and the danger posed by surge makes greater fidelity highly desirable. However, such models and their high-dimensional outputs tend to come at great computational cost, which can make highly detailed studies prohibitive. These needs motivate the development of an emulator that combines high-dimensional output from multiple complex computer models at different fidelity levels. We propose a parallel partial autoregressive cokriging model that addresses these issues. Based on a data-augmentation technique, model parameters are estimated via a Monte Carlo expectation-maximization algorithm, and prediction is made in a computationally efficient way even when the input designs across fidelity levels are not nested. With this methodology, high-fidelity storm surges can be generated much more quickly in coastal flood studies, facilitating the risk assessment of storm surge hazards.
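A minimal sketch of the autoregressive structure on which cokriging emulators of this kind are built (following the classical Kennedy-O'Hagan formulation; the notation is assumed here rather than taken from the talk): writing f_t for the output of the fidelity-level-t model,

    f_t(x) = \gamma_{t-1} \, f_{t-1}(x) + \delta_t(x), \qquad t = 2, \ldots, T,

where \gamma_{t-1} is a scale parameter and \delta_t is an independent Gaussian process capturing the discrepancy between successive fidelity levels; the proposed model extends this structure to high-dimensional outputs and non-nested designs.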

References

No references provided at this time


September 25, 2019

Lecture: Analyzing Collective Motion with Machine Learning and Topology

Location: SAMSI Classroom
Speaker: John Nardini, Second-Year SAMSI Postdoctoral Fellow

Abstract

We use topological data analysis and machine learning to study a seminal model of collective motion in biology. This model describes agents interacting nonlinearly via attractive-repulsive social forces, and it gives rise to collective behaviors such as flocking and milling. To classify the emergent collective motion in a large library of numerical simulations, and to recover model parameters from the simulation data, we apply machine learning techniques to two different types of input. First, we input time series of order parameters traditionally used in studies of collective motion. Second, we input topology-based measures that summarize the time-varying persistent homology of the simulation data over multiple scales. This topological approach does not require prior knowledge of the expected patterns, and it outperforms the traditional approach for both unsupervised and supervised machine learning methods.
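A hypothetical sketch of such a topological pipeline in Python, using the ripser package for Vietoris-Rips persistent homology and scikit-learn for classification; the total-persistence summary and the synthetic blob/ring snapshots below are illustrative stand-ins for the actual features and simulations used in the study.

    import numpy as np
    from ripser import ripser                      # Vietoris-Rips persistence
    from sklearn.ensemble import RandomForestClassifier

    def topological_summary(points, maxdim=1):
        """Total persistence (sum of finite bar lengths) in dimensions 0..maxdim."""
        dgms = ripser(points, maxdim=maxdim)['dgms']
        feats = []
        for dgm in dgms:
            finite = dgm[np.isfinite(dgm[:, 1])]   # drop the infinite H0 bar
            feats.append(float(np.sum(finite[:, 1] - finite[:, 0])))
        return feats

    rng = np.random.default_rng(1)
    X, y = [], []
    for label in (0, 1):                           # e.g., clustered vs. mill-like snapshots
        for _ in range(20):
            if label == 0:
                pts = rng.normal(size=(100, 2))    # blob: no prominent loop
            else:
                theta = rng.uniform(0, 2 * np.pi, size=100)
                pts = np.column_stack([np.cos(theta), np.sin(theta)])
                pts += 0.05 * rng.normal(size=(100, 2))  # ring: one prominent H1 loop
            X.append(topological_summary(pts))
            y.append(label)

    clf = RandomForestClassifier(random_state=0).fit(X, y)  # supervised step

The abstract's topological input summarizes persistence as it varies over time, rather than for single snapshots as above; this sketch only shows the snapshot-level mechanics.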

References

No references provided at this time


October 2, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Wenjia Wang, Second-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


October 9, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Matthias Sachs, SAMSI Postdoctoral Fellow and Duke Researcher

Abstract

To be determined

References

No references provided at this time


October 16, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Ruda Zhang, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


October 23, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Xinyi Li, Second-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


October 30, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Bianca Dumitrascu, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


November 6, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Deborshee Sen, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


November 13, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Jaffer Zaidi, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


November 20, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Maggie Mao, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time


November 27, 2019

** NO LECTURE – Thanksgiving Break **


December 4, 2019

Lecture: To be determined

Location: SAMSI Classroom
Speaker: Jason Poulos, First-Year SAMSI Postdoctoral Fellow

Abstract

To be determined

References

No references provided at this time