
How do I create a gain chart in R for a decision tree model?

I have created a decision tree model in R. The target variable is salary (the INCOME column): the goal is to predict whether a person's salary is above or below 50K from the other input variables.

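library(rpart)

df <- salary.data

# Use a random 20% of the rows for training and the rest for testing
train <- sample(1:nrow(df), size = 0.2 * nrow(df))
test <- -train
training_data <- df[train, ]
testing_data <- df[test, ]

# Generate the tree. Write the response as INCOME, not training_data$INCOME:
# otherwise "." also expands to INCOME and the target leaks into the predictors.
fit <- rpart(INCOME ~ ., method = "class", data = training_data)
# Make predictions (type = "class" returns a factor of predicted labels)
testing_data$predictionsOutput <- predict(fit, newdata = testing_data, type = "class")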

After that I tried to create a gain chart as follows:

# Gain Chart
pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)
gain <- performance(pred,"tpr","fpr")
plot(gain, col="orange", lwd=2)

Looking at the reference, I can't work out how to use the ROCR package's prediction() function to build the chart. Does it only work for binary target variables? I get the error 'format of predictions is invalid'.

Any help building a gain chart for the above model would be much appreciated. Thanks!

  AGE          EMPLOYER     DEGREE             MSTATUS            JOBTYPE     SEX C.GAIN C.LOSS HOURS        COUNTRY INCOME
1  39         State-gov  Bachelors       Never-married       Adm-clerical    Male   2174      0    40  United-States  <=50K
2  50  Self-emp-not-inc  Bachelors  Married-civ-spouse    Exec-managerial    Male      0      0    13  United-States  <=50K
3  38           Private    HS-grad            Divorced  Handlers-cleaners    Male      0      0    40  United-States  <=50K

Convert the prediction from a factor to a plain vector, using c(). (In R 4.1 and later, c() keeps a factor as a factor, so use as.numeric() there instead.)

library('rpart')
library('ROCR')
setwd('C:\\Users\\John\\Google Drive\\working\\R\\questions')
df <- read.csv(file = 'salary-class.csv', header = TRUE, stringsAsFactors = TRUE)

# Use a random 20% of the rows for training and the rest for testing
train <- sample(1:nrow(df), size = 0.2 * nrow(df))
test <- -train
training_data <- df[train, ]
testing_data <- df[test, ]

# Generate the tree (INCOME ~ . keeps the target out of the predictors)
fit <- rpart(INCOME ~ ., method = "class", data = training_data)
# Make predictions
testing_data$predictionsOutput <- predict(fit, newdata = testing_data, type = "class")

# Doesn't work: prediction() rejects a factor
# pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)
# c() dropped the factor class before R 4.1; as.numeric() gives the
# integer codes in any R version
v <- as.numeric(testing_data$predictionsOutput)
pred <- prediction(v, testing_data$INCOME)
gain <- performance(pred, "tpr", "fpr")
plot(gain, col = "orange", lwd = 2)
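
To see why the factor is rejected and what the conversion does, here is a minimal sketch on made-up toy vectors (the four labels and predictions below are invented for illustration, not taken from the salary data). Because hard class predictions take only two distinct values, the resulting curve has a single interior point.

```r
library(ROCR)

# Toy data (hypothetical): true labels and hard class predictions
lv <- c("<=50K", ">50K")
labels     <- factor(c("<=50K", ">50K", "<=50K", ">50K"), levels = lv)
class_pred <- factor(c("<=50K", ">50K", ">50K",  ">50K"), levels = lv)

# Passing the factor directly raises an error about the prediction format
res <- try(prediction(class_pred, labels), silent = TRUE)
failed <- inherits(res, "try-error")

# as.numeric() returns the integer level codes as a plain numeric vector
scores <- as.numeric(class_pred)

# A plain numeric vector is accepted
pred <- prediction(scores, labels)
```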


This should work if you change

predict(fit, newdata=testing_data, type="class")

to

predict(fit, newdata=testing_data, type="prob")

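A gains chart rank-orders the cases by model probability, so it needs scores rather than class labels. Note that type="prob" returns a matrix with one column per class; pass only the column for the positive class to prediction().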
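
A minimal sketch of the probability-based approach, assuming the kyphosis data set that ships with rpart as a stand-in for the salary data (the salary file is not available here). It also assumes that "gain chart" means the cumulative gains curve, which ROCR can draw as the true positive rate ("tpr") against the rate of positive predictions ("rpp"). For brevity the model is scored on its own training data; in practice use a held-out test set.

```r
library(rpart)  # also provides the kyphosis example data
library(ROCR)

# Assumption: kyphosis stands in for the salary data; Kyphosis is a
# two-level factor ("absent", "present")
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# type = "prob" returns a matrix with one column per class level
prob <- predict(fit, newdata = kyphosis, type = "prob")
score <- prob[, "present"]  # probability of the positive class

pred <- prediction(score, kyphosis$Kyphosis)

# Cumulative gains: true positive rate vs. rate of positive predictions
gain <- performance(pred, measure = "tpr", x.measure = "rpp")
plot(gain, col = "orange", lwd = 2)
abline(0, 1, lty = 2)  # baseline: selecting cases at random
```

For the salary model, the equivalent score would be the ">50K" column of the probability matrix, assuming that is how the level is spelled in the INCOME column.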
