Spring 2021 Undergraduate Virtual Course: A Survey of Data Science

Registration is by Invitation Only

The registration deadline is February 5, 2021.

Modern data science is driven by applications, which often entail Big Data and machine learning perspectives. This course reviews key ideas and methods in nonparametric regression, starting with cross-validation and light bootstrap asymptotics and then moving on to the additive model, the generalized additive model, and neural networks. It also covers variable selection with the Lasso and the Median Model, and describes the p >> n problem in the context of contributions by Candès and Tao, Donoho and Tanner, and Wainwright. The course next treats classification, with emphasis on Random Forests, support vector machines, and ensemble strategies such as bagging, stacking, and boosting. Additionally, there will be some coverage of cluster analysis, multidimensional scaling, deep learning, and topic modeling. The course is aimed at advanced undergraduates and early graduate students; some knowledge of regression is helpful.

Questions: email [email protected]