
How to tune random forest code for quality prediction

I am new to machine learning. I have this dataset: http://archive.ics.uci.edu/ml/datasets/Wine+Quality. I have to predict the quality of the wine, which is the last column in the dataset. I considered applying a neural network or a random forest for this; the NN gave around 55% accuracy, whereas with random forest I have managed to get 73% so far. I want to improve the accuracy further. Below is the code I have written.

wineq <- read.csv("wine-quality.csv",header = TRUE)
str(wineq)

wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good')
wineq$taste[wineq$quality == 6] <- 'normal'
wineq$taste <- as.factor(wineq$taste)
set.seed(54321)
train <- sample(1:nrow(wineq), .75 * nrow(wineq))
wineq_train <- wineq[train, ]
wineq_test  <- wineq[-train, ]

library(randomForest)

rf <- randomForest(taste ~ . - quality, data = wineq_train,
                   importance = TRUE, ntree = 100)

rf_preds = predict(rf,wineq_test)
rf_preds
table(rf_preds, wineq_test$taste)

Output:

table(rf_preds, wineq_test$taste)

rf_preds bad good normal
  bad    302   11     81
  good     7  163     36
  normal  93  101    431
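For reference, the 73% figure can be checked directly against this table. The sketch below re-enters the counts by hand (rows are predictions, columns are true labels) and computes overall accuracy plus per-class recall; the names `cm`, `accuracy` and `per_class_recall` are illustrative and not part of the original code.

```r
# Confusion matrix copied from the output above:
# rows = predicted class, columns = actual class.
cm <- matrix(
  c(302,  11,  81,
      7, 163,  36,
     93, 101, 431),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("bad", "good", "normal"),
                  actual    = c("bad", "good", "normal"))
)

# Overall accuracy: correctly classified cases (the diagonal) over all cases.
accuracy <- sum(diag(cm)) / sum(cm)

# Recall per class: correct predictions divided by the true count of that class.
per_class_recall <- diag(cm) / colSums(cm)

print(round(accuracy, 4))   # 0.7314
print(round(per_class_recall, 3))
```

With the live objects from the code above, `mean(rf_preds == wineq_test$taste)` should give the same accuracy without retyping the table.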

If I want to use tuneRF, it gives me the error below:

   fgl.res <- tuneRF(x = wineq[train, ], y= wineq[-train, ], 
   stepFactor=1.5)

Error in randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry,  :
  length of response must be the same as predictors

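You need to pass tuneRF the feature variables as x and the response variable as y. In your call, x is the training rows of the whole data frame and y is the test rows of the whole data frame, so y is not a response vector and its length does not match x.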

So, first find the column position of your response variable (taste):

resp_pos <- which(colnames(wineq) == "taste")

Then:

# x and y must come from the same (training) rows
fgl.res <- tuneRF(x = wineq[train, -resp_pos], y = wineq[train, resp_pos],
   stepFactor = 1.5)

I also noticed that you derive your "new" response (taste) from the column quality, using wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good'). That is perfectly fine, but you need to remove the column quality before training. Your randomForest formula already does this with - quality, but the x passed to tuneRF above still contains it.

If you don't do this, your model will be overly optimistic, since it will pick up that, for example:

quality < 6 will always mean taste == "bad"
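Putting both points together, here is a minimal sketch of a tuneRF call that takes x and y from the same training rows and drops the leaking column. `tune_mtry` is a hypothetical helper written for this illustration, not a function from the randomForest package, and the `stepFactor`, `plot` and `trace` settings are arbitrary choices rather than recommendations.

```r
library(randomForest)

# Hypothetical helper (not part of randomForest): tune mtry on the training
# rows only, excluding the response and any columns that would leak it.
tune_mtry <- function(data, train_idx, response, drop = character(0)) {
  exclude <- colnames(data) %in% c(response, drop)
  x_train <- data[train_idx, !exclude, drop = FALSE]  # predictors only
  y_train <- data[train_idx, response]                # response, same rows as x
  tuneRF(x = x_train, y = y_train, stepFactor = 1.5,
         plot = FALSE, trace = FALSE)
}

# Usage with the objects from the question (assumes wineq and train exist):
# res <- tune_mtry(wineq, train, response = "taste", drop = "quality")
# best_mtry <- res[which.min(res[, "OOBError"]), "mtry"]
# rf <- randomForest(taste ~ . - quality, data = wineq[train, ], mtry = best_mtry)
```

With its default `doBest = FALSE`, tuneRF returns a matrix of the mtry values it tried and their out-of-bag errors, so the usage lines pick the mtry with the lowest OOB error and pass it back to randomForest.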
