
Statistical and Applied Mathematical Sciences Institute
19 T.W. Alexander Drive
P.O. Box 14006
Research Triangle Park, NC 27709-4006
Tel: 919.685.9350 FAX: 919.685.9360
info@samsi.info
Technology Transfer Short Course: Data Mining and Machine Learning
NISS/SAMSI Building
July 25-29, 2005
BACKGROUND INFORMATION
SAMSI is instituting a new summer activity: technology transfer short courses designed to consolidate results from SAMSI programs (in most cases, programs from earlier years) and to make those results available to working professionals in a compact, hands-on format. The first such course is derived from the 2003-04 SAMSI program on Data Mining and Machine Learning (DMML).
The goals of the DMML technology transfer short course are to:
The theoretical component will emphasize ideas over rigor; the software component will sample the major techniques that are now commonly used for visualization, classification, and regression; and the applications component will walk participants through the practical analysis of some famous real-world data sets.
Each morning will consist of three hours of lecture. Each afternoon will start with a 90-minute computer lab that works through an application using real data and relevant software, followed by a 90-minute lecture by a guest speaker. There will be several breaks during the day.
The course begins with an introductory overview of data mining: its scope, classical approaches, and the heuristics that guided the initial development of theory and methods. The course then moves to more modern issues such as boosting, overcompleteness, and large-p, small-n problems. This leads to a survey of currently popular techniques, including random forests, support vector machines, wavelets, and PAC bounds.
The course concentrates on regression inference, a central focus of the SAMSI DMML program and a paradigm that informs many data mining applications, but it also covers clustering, classification, and multidimensional scaling.
The prerequisites for the course are a basic knowledge of applied multivariate inference and a general level of statistical knowledge comparable to a master's degree. The mathematics will emphasize general insight rather than specific details.
COURSE CONTENTS
INSTRUCTOR
The principal instructor for the course will be David L. Banks, Professor of the Practice of Statistics and Decision Sciences at Duke University and co-leader of the SAMSI DMML program.
TENTATIVE SCHEDULE
| Day | Time | Session |
|---|---|---|
| Monday, July 25 | 9:00 AM - 12:00 noon | Introduction, Cross-Validation, the Bootstrap, Search Strategies, and Smoothing |
| | 1:30 - 3:30 PM | Computer lab on GGobi visualization and smoothing |
| | 3:45 - 5:15 PM | Jack Liu, GlaxoSmithKline: "Visualization and Data Mining for Microarrays" |
| Tuesday, July 26 | 9:00 AM - 12:00 noon | Review and comparison of nonparametric regression methods: AM, GAM, PPR, ACE, AVAS, MARS, CART, neural nets; the backfitting algorithm |
| | 1:30 - 3:30 PM | Computer lab on the DRAT package for multivariate nonparametric regression |
| | 3:45 - 5:15 PM | J. S. Marron, UNC: "Issues with High Dimension, Low Sample Size Data" |
| Wednesday, July 27 | 9:00 AM - 12:00 noon | Classification and Clustering: SVMs, random forests, boosting |
| | 1:30 - 3:30 PM | Computer lab on classification and boosting |
| | 3:45 - 5:15 PM | Feng Liang, Duke: "Model Complexity and Regularization" |
| Thursday, July 28 | 9:00 AM - 12:00 noon | Bases and Wavelets |
| | 1:30 - 3:30 PM | Computer lab on SVMs and random forests |
| | 3:45 - 5:15 PM | Merlise Clyde, Duke: "Bayesian Model Averaging" |
| Friday, July 29 | 9:00 AM - 12:00 noon | PAC Bounds and VC Classes |
| | 1:30 - 3:30 PM | Computer lab on wavelets (decimated and nondecimated) |
| | 3:45 - 5:15 PM | David Banks, Duke: "Survey of New Ideas in Data Mining" |
APPLICATION
Enrollment in the short course is limited to 25. Applications, including requests for financial support, should be submitted as soon as possible. ONLY ONLINE APPLICATIONS WILL BE ACCEPTED. The application deadline is July 1, 2005. To ensure that your application is correct, please:
Refresh/reload the application page to ensure you have all updates;
Type in your information (cutting and pasting will distort the information we receive);
Make any clarifications or corrections in the Special Requests section;
Click the submit button only once.
You will be notified as soon as possible (generally within three days) whether your application has been accepted, at which point you will be required to submit payment.
© 2005, Statistical and Applied Mathematical Sciences Institute. All rights reserved.