Prediction intervals for random forests

Start: 04/16/2018 - 4:15pm
End  : 04/16/2018 - 5:15pm

Applied Math Seminar

Jo Hardin (Pomona College)


Although random forests are commonly used for regression, our understanding
of the prediction error associated with random forest predictions of individual
responses is relatively limited. We introduce a novel measure of this error and evaluate
its properties, comparing it with the out-of-bag mean of squared residuals estimator
that, to our knowledge, is the only measure of random forest prediction error that
has been introduced in the literature thus far. We show that our proposed estimator
provides an individualized estimate of the error associated with a particular random
forest prediction, while the out-of-bag mean of squared residuals estimator provides
a more general estimate of the random forest's prediction error as a whole. Through
simulations on benchmark and simulated datasets, we also demonstrate that both
estimators of prediction error may form the bases for valid random forest prediction
intervals. Empirically, these prediction intervals performed as well as quantile
regression forest prediction intervals.
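
For readers unfamiliar with the out-of-bag mean of squared residuals, the sketch below shows one way it could be computed and turned into an interval. It assumes the out-of-bag predictions are already available (for each training point, the average over only those trees whose bootstrap sample excluded that point). The function names, the toy numbers, and the normal-approximation interval are illustrative assumptions; the abstract does not specify how the speaker constructs intervals, and this is not the individualized estimator proposed in the talk.

```python
import math


def oob_mean_squared_residual(y_true, oob_pred):
    """Mean of squared out-of-bag residuals over the training set.

    A single forest-wide error estimate: every prediction made by the
    forest receives the same value.
    """
    if len(y_true) != len(oob_pred) or len(y_true) == 0:
        raise ValueError("y_true and oob_pred must be non-empty and equal in length")
    return sum((y - p) ** 2 for y, p in zip(y_true, oob_pred)) / len(y_true)


def normal_interval(point_pred, mse, z=1.96):
    """Hypothetical interval: point prediction +/- z * sqrt(mse).

    Assumes roughly normal, constant-variance errors. This is a choice
    made for this sketch, not the construction described in the talk.
    """
    half_width = z * math.sqrt(mse)
    return point_pred - half_width, point_pred + half_width


# Toy values, made up for illustration only.
y_train = [3.0, 5.0, 7.0, 9.0]
oob_pred = [2.0, 5.0, 8.0, 9.0]

mse = oob_mean_squared_residual(y_train, oob_pred)
lower, upper = normal_interval(6.0, mse)
print(f"OOB MSE = {mse:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because the same half-width is applied to every new prediction, an interval built this way cannot widen or narrow for individual test points, which is the contrast with an individualized error estimate that the abstract draws.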

Emmy Noether Rm Millikan 1021 Pomona College
