Title: Variable Importance Measures in Classification and Regression Methods
Abstract: Due to the huge success of machine learning methods in various applications, variable importance measures have become increasingly popular. When building models using different kinds of learners, it is often of interest not only to compare these models with respect to their performance on a held-out test set, but also to compare them from a structural point of view, e.g. the “importance” of individual predictors. We are interested in whether there is a conceptual framework that unifies some of the methods for quantifying variable importance in a given regression or classification setting. Ideally, such a variable importance measure should be model-independent.
In my talk I will first highlight desirable properties of a reasonable variable importance measure. Building on these, I will present some of the most common measures used in a linear regression setting, where we will see that, even there, it is not straightforward to define a variable importance measure that fulfils most of the desired axioms. Next, I will introduce random forests and discuss several permutation-based variable importance measures used for this class of models. The permutation-based approach leads to a unifying measure that can be applied to all kinds of supervised learning algorithms. At the end of my talk I will present one possible application of the permutation-based variable importance measure to a credit scoring dataset.
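The permutation-based idea can be sketched in a few lines: shuffle one feature's column, re-evaluate the model, and record how much the prediction error increases. Below is a minimal, model-agnostic sketch of this scheme; the function and variable names are my own, and for self-containment it is illustrated on a toy linear model rather than a random forest.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Model-agnostic permutation importance: the increase in error
    when a single feature's values are randomly shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # break the link between feature j and y
            scores.append(metric(y, predict(X_perm)))
        importances[j] = np.mean(scores) - baseline
    return importances

# Toy data: y depends strongly on x0 and not at all on x1.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=500)

# Any fitted predictor works; here, ordinary least squares.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = lambda y_true, y_pred: np.mean((y_true - y_pred) ** 2)
imp = permutation_importance(lambda X_: X_ @ beta, X, y, mse)
```

On this toy example, the importance of the informative feature `x0` comes out far larger than that of the noise feature `x1`, which is exactly the behaviour a reasonable measure should exhibit regardless of the underlying learner.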