Sparse Additive Modeling

Start: 10/30/2017 - 4:15pm
End  : 10/30/2017 - 5:15pm

Applied Math Seminar

Prof. Noah Simon (Department of Biostatistics, University of Washington)


With recent advances in high-throughput technology, it is now common to collect enormous quantities of data on a small number of subjects. In the biomedical field in particular, we often collect extremely high-dimensional biomolecular information (e.g., gene expression, DNA sequence, and/or epigenetic information). We are often interested in using this information to predict a phenotype: for example, does a person have a given disease, or is a tumor susceptible to a particular therapy? In these cases we often believe that only a small subset of the biomolecular features is informative for the phenotypic response (though we generally do not know which). Given this, we would like a model-building procedure that selects and uses only a small subset of the available genomic features for predicting phenotype.

The LASSO is one common method that both performs feature selection and fits a linear model on the selected subset. The LASSO is an attractive method: it is simple, has good theoretical behaviour, is computationally straightforward to fit, and empirically performs well in many applications. However, a linear model is sometimes a poor approximation to the true underlying relationship between features and response. In this presentation we will discuss Sparse Additive Modeling, an extension of the LASSO that selects features and fits a more flexible additive model in those features. This more flexible framework shares many of the attractive properties of the LASSO (parsimony, computational tractability, theoretical guarantees, and empirical performance). In addition, in non-linear scenarios it may model our data more adequately.
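
For readers who have not seen these methods, the two optimization problems can be sketched as follows. This is one common textbook formulation and not necessarily the one used in the talk; the symbols (response y, n-by-p feature matrix X, tuning parameter \lambda, component functions f_j) are notation assumed here for illustration.

```latex
% LASSO: squared-error loss plus an l1 penalty, which sets many \beta_j exactly to zero
\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij} \beta_j \Bigr)^2
  + \lambda \sum_{j=1}^{p} |\beta_j|

% Sparse additive model (one common form): each linear term x_{ij}\beta_j is replaced
% by a smooth function f_j(x_{ij}), and the penalty acts on each function as a whole
\hat{f}_1, \dots, \hat{f}_p = \arg\min_{f_1, \dots, f_p}
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} f_j(x_{ij}) \Bigr)^2
  + \lambda \sum_{j=1}^{p} \sqrt{\frac{1}{n} \sum_{i=1}^{n} f_j(x_{ij})^2}
```

In both problems a larger \lambda removes more features from the fitted model; in the additive version a feature is dropped when its entire function f_j is estimated as zero.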

Very little background will (hopefully) be assumed in this talk: we will begin by introducing and discussing the LASSO, non-parametric regression, and additive modeling. In addition, we will touch on convex optimization and numerical algorithms for high-dimensional minimization.

Emmy Noether Rm, Millikan 1021, Pomona College