Use of formula in information.gain in R

In the function definition for the FSelector information.gain function,

information.gain(formula, data)

what exactly is the purpose of the formula? I'm trying to use the function to do feature selection for a classification task. In the few examples that I've seen online, it seems like the formula defines some kind of relationship between the class label and the features in the dataset. However, if this is the case, I don't know the exact linear relationship between the features and the labels since I'm performing a classification task, so what would the formula be?

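The formula does not describe a linear relationship. It only names the class variable (left of ~) and the candidate features (right of ~). You can use . on the right-hand side to tell R that you want to analyse the dependency between the class variable and all other variables in the data frame. For example, for the iris dataset: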

> library(FSelector)
> information.gain(Species~., iris)
                attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
Petal.Length       0.9402853
Petal.Width        0.9554360
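
To see which columns a formula like this refers to, here is a small sketch in base R. It does not call FSelector, and FSelector's internals may work differently; it only illustrates how R formulas in general resolve . against a data frame. The variable names (class_column, feature_columns, subset_columns) are made up for this illustration.

```r
# Base R only; FSelector is not needed for this illustration.
f <- Species ~ .

# model.frame() expands "." against the data and puts the response first.
mf <- model.frame(f, data = iris)

class_column    <- names(mf)[1]   # left-hand side of the formula
feature_columns <- names(mf)[-1]  # everything "." expanded to

print(class_column)
print(feature_columns)

# An explicit right-hand side restricts the candidate features.
f_subset <- Species ~ Sepal.Length + Sepal.Width
subset_columns <- names(model.frame(f_subset, data = iris))[-1]
print(subset_columns)
```

So the formula here acts as a column selector: one class column and a set of feature columns to score against it.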

If you want to analyse the dependency on only a subset of the variables, name them explicitly:

> information.gain(Species~Sepal.Length+Sepal.Width, iris)
                attr_importance
Sepal.Length       0.4521286
Sepal.Width        0.2672750
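
Note that each feature gets the same score in both calls, which suggests every feature is scored on its own against the class. A rough base R sketch of that kind of per-feature score, using information gain as H(class) + H(feature) - H(class, feature), might look like the following. This is not FSelector's code: the helper names entropy and info_gain are hypothetical, and numeric features are cut into 5 equal-width bins as a simplifying assumption, so the numbers will not match the output above.

```r
# Shannon entropy (natural log) of one discrete variable, or of the
# joint distribution when several variables are passed.
entropy <- function(...) {
  counts <- table(...)
  p <- counts / sum(counts)
  p <- p[p > 0]              # drop empty cells so log(0) never occurs
  -sum(p * log(p))
}

# Information gain of one feature with respect to the class labels.
# ASSUMPTION: numeric features are discretised into equal-width bins;
# FSelector uses its own discretisation, so results will differ.
info_gain <- function(feature, class, bins = 5) {
  if (is.numeric(feature)) feature <- cut(feature, breaks = bins)
  entropy(class) + entropy(feature) - entropy(class, feature)
}

# Score every non-class column of iris independently.
features <- iris[, names(iris) != "Species"]
scores <- sapply(features, info_gain, class = iris$Species)
print(sort(scores, decreasing = TRUE))
```

Nothing in this calculation fits a model or assumes linearity; it only counts how often class labels and (binned) feature values occur together.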
