How to properly calculate all weights with the FSelector package?

I'm trying to calculate feature weights for a dataset in R using the FSelector package. The data is taken from this location.

data = read.csv("filepath/Indian Liver Patient Dataset (ILPD).csv")
names(data)<-c("Age","Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "A/G Ratio", "Selector")
library(FSelector)
weights <- gain.ratio(Selector ~., data)
print(weights)

I can't calculate all of the weights. When I use the gain.ratio function, the Age weight is NaN. When I use the chi.squared function instead, both Age and A/G Ratio are zero. When I take the first 200 rows of the data and calculate the weights, only five of them are calculated correctly; the others are zero or NaN.

I tried removing rows with missing values using data <- na.omit(data), but it didn't change the result.

How can I calculate weights correctly?

Below is an example of the printed weights (from chi.squared).

Age             0.0000000
Gender          0.1304229
TB              0.3281865
DB              0.3238010
Alkphos         0.2965842
Sgpt            0.2734633
Sgot            0.3120432
TP              0.2504747
ALB             0.3051724
A/G Ratio       0.0000000

Zero is a valid value for feature importance -- it means that the feature does not have any information with respect to the classification target. The NaNs are caused by a bug in FSelector that divides by 0 if a feature carries no information. I've fixed this in the development version.
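To illustrate how a feature with no information can end up as NaN rather than 0, here is a minimal base R sketch that computes a gain ratio by hand. It assumes the usual definition (information gain divided by the entropy of the feature). The helpers `entropy` and `gain_ratio_sketch` are hypothetical names made up for this example and are not FSelector's actual code, which may differ in detail.

```r
# Shannon entropy (in nats) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical helper, not FSelector's implementation:
# gain ratio = (H(class) + H(feature) - H(class, feature)) / H(feature)
gain_ratio_sketch <- function(feature, class) {
  joint <- entropy(paste(feature, class))
  info_gain <- entropy(class) + entropy(feature) - joint
  info_gain / entropy(feature)
}

class <- c("a", "a", "b", "b")
constant <- c("x", "x", "x", "x")     # a single level: carries no information
informative <- c("p", "p", "q", "q")  # separates the two classes perfectly

gr_constant <- gain_ratio_sketch(constant, class)       # 0 / 0
gr_informative <- gain_ratio_sketch(informative, class)
print(c(constant = gr_constant, informative = gr_informative))
```

In this sketch the constant feature has zero entropy and zero information gain, so the ratio is 0 / 0, which R evaluates to NaN. A numeric column could plausibly end up in the same situation if discretisation leaves it with a single level.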

The name "A/G Ratio" is not a valid R identifier and therefore causes problems with some of the methods. Below is the code that fixes this and installs the development version of FSelector.

data = read.csv("Indian Liver Patient Dataset (ILPD).csv")
names(data)<-c("Age","Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot", "TP", "ALB", "AGRatio", "Selector")

library(devtools)
install_github("larskotthoff/fselector")

library(FSelector)
weights = gain.ratio(Selector~., data)
print(weights)

weights = chi.squared(Selector~., data)
print(weights)

Output (gain.ratio first, then chi.squared):

        attr_importance
Age          0.00000000
Gender       0.01539699
TB           0.09711392
DB           0.11547683
Alkphos      0.06593879
Sgpt         0.06566624
Sgot         0.07667241
TP           0.08836895
ALB          0.07766682
AGRatio      0.15403574

        attr_importance
Age           0.0000000
Gender        0.1304229
TB            0.3281865
DB            0.3238010
Alkphos       0.2965842
Sgpt          0.2734633
Sgot          0.3120432
TP            0.2504747
ALB           0.3051724
AGRatio       0.0000000
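As an alternative to renaming the problematic column by hand, base R's make.names() can turn arbitrary strings into syntactically valid names. This is only a sketch of that option; note that it yields A.G.Ratio rather than the AGRatio used in the code above.

```r
cols <- c("Age", "Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot",
          "TP", "ALB", "A/G Ratio", "Selector")

# make.names() replaces characters that are not allowed in identifiers with "."
valid_cols <- make.names(cols, unique = TRUE)
print(valid_cols[10])  # "A.G.Ratio"
```

The result can then be assigned with names(data) <- valid_cols before calling gain.ratio or chi.squared.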
