How to tune random forest code for quality prediction
I am new to machine learning. I have this dataset: http://archive.ics.uci.edu/ml/datasets/Wine+Quality. I have to predict the quality of the wine, which is the last column of the dataset. I considered a neural network or a random forest for this: the neural network gave around 55% accuracy, while with the random forest I have managed to get 73% so far.
I want to improve the accuracy further. Below is the code I have written.
wineq <- read.csv("wine-quality.csv", header = TRUE)
str(wineq)

# Collapse the numeric quality score into three classes
wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good')
wineq$taste[wineq$quality == 6] <- 'normal'
wineq$taste <- as.factor(wineq$taste)

# 75/25 train/test split
set.seed(54321)
train <- sample(1:nrow(wineq), .75 * nrow(wineq))
wineq_train <- wineq[train, ]
wineq_test <- wineq[-train, ]

library(randomForest)
# "- quality" keeps the column that taste was derived from out of the predictors
rf <- randomForest(taste ~ . - quality, data = wineq_train, importance = TRUE, ntree = 100)
rf_preds <- predict(rf, wineq_test)
rf_preds
table(rf_preds, wineq_test$taste)
Output:

table(rf_preds, wineq_test$taste)

rf_preds bad good normal
  bad    302   11     81
  good     7  163     36
  normal  93  101    431
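The 73% in the question is the overall accuracy of this table: the diagonal (correct predictions) divided by the number of test rows. A minimal base-R sketch that recomputes it from the printed counts and also breaks it down per class, which shows where the errors are concentrated:

```r
# Confusion matrix copied from the output above
# (rows = predicted class, columns = actual class).
conf <- matrix(c(302,  11,  81,
                   7, 163,  36,
                  93, 101, 431),
               nrow = 3, byrow = TRUE,
               dimnames = list(predicted = c("bad", "good", "normal"),
                               actual    = c("bad", "good", "normal")))

# Overall accuracy: correct predictions (the diagonal) over all test rows
accuracy <- sum(diag(conf)) / sum(conf)
round(accuracy, 3)   # 0.731  (896 / 1225)

# Per-class recall: share of each actual class that was predicted correctly
recall <- diag(conf) / colSums(conf)
round(recall, 3)     # bad 0.751, good 0.593, normal 0.786
```

With the objects from the question, the same number comes from `tab <- table(rf_preds, wineq_test$taste)` followed by `sum(diag(tab)) / sum(tab)`. In this table, 311 of the 329 errors involve the normal class, and good wines are the hardest to recover (about 59% recall).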
If I try to use tuneRF, it gives me the error below:

fgl.res <- tuneRF(x = wineq[train, ], y = wineq[-train, ],
                  stepFactor = 1.5)

Error in randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry,  :
  length of response must be the same as predictors
You need to pass the feature (predictor) variables to tuneRF as x and the response variable as y, and both must come from the same (training) rows.
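To see why the original call fails: `x = wineq[train, ]` is the training rows with every column (taste included), while `y = wineq[-train, ]` is the test rows, again as a whole data frame. The error says the response's length must match the predictors, and `length()` of a data frame is its number of columns, so the two cannot line up. A small illustration on a hypothetical 8-row data frame (`toy`, `x_bad`, `x_ok` and the values are made up for the example):

```r
# Hypothetical 8-row stand-in for wineq: two predictors plus the response.
toy <- data.frame(alcohol = c(9.4, 9.8, 10.1, 11.2, 12.0, 10.5, 9.9, 11.8),
                  pH      = c(3.0, 3.2, 3.1, 3.3, 3.4, 3.2, 3.0, 3.5),
                  taste   = factor(c("bad", "bad", "normal", "good",
                                     "good", "normal", "bad", "good")))
train <- 1:6

# What the failing call passes:
x_bad <- toy[train, ]    # 6 training rows, all columns (taste included)
y_bad <- toy[-train, ]   # 2 test rows, all columns, still a data frame
nrow(x_bad)    # 6
length(y_bad)  # 3: length() of a data frame counts columns, not rows

# What tuneRF expects: predictors and response taken from the same rows
resp_pos <- which(colnames(toy) == "taste")
x_ok <- toy[train, -resp_pos]
y_ok <- toy[train, resp_pos]   # a factor with one label per row of x_ok
nrow(x_ok) == length(y_ok)     # TRUE
```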
So, first find the column position of your response variable (taste):
resp_pos <- which(colnames(wineq) == "taste")
Then:
# x and y must both use the training rows, so their lengths match
fgl.res <- tuneRF(x = wineq[train, -resp_pos], y = wineq[train, resp_pos],
                  stepFactor = 1.5)
I also noticed that you use wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good') to derive your "new" response (taste) from the column quality. That is perfectly fine, but you then need to remove the column quality from the predictors before training. The formula taste ~ . - quality in your randomForest call already does this, but the x passed to tuneRF above still contains quality, so drop it there as well.
If you don't do this, your model will be overly optimistic, because it will pick up that, for example, quality < 6 always means taste == "bad".
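Putting both fixes together, a sketch that drops taste and quality from the predictors, tunes mtry with tuneRF on the training rows only, and refits with the best value. `make_xy` is a hypothetical helper, and `ntreeTry = 100` is an arbitrary choice rather than a value from the question; the tuning part only runs if randomForest is installed and wine-quality.csv is in the working directory.

```r
# Sketch, not the answer's exact code. make_xy() is a hypothetical helper:
# it returns predictors (x) and response (y) for the SAME rows, leaving out
# the response column and the column it was derived from (quality).
make_xy <- function(df, rows, response = "taste", leak = "quality") {
  keep <- setdiff(colnames(df), c(response, leak))
  list(x = df[rows, keep, drop = FALSE],
       y = df[rows, response])
}

# The tuning step only runs when randomForest is installed and the CSV
# from the question is in the working directory.
if (requireNamespace("randomForest", quietly = TRUE) &&
    file.exists("wine-quality.csv")) {
  library(randomForest)

  wineq <- read.csv("wine-quality.csv", header = TRUE)
  wineq$taste <- ifelse(wineq$quality < 6, "bad", "good")
  wineq$taste[wineq$quality == 6] <- "normal"
  wineq$taste <- as.factor(wineq$taste)

  set.seed(54321)
  train <- sample(1:nrow(wineq), .75 * nrow(wineq))
  tr <- make_xy(wineq, train)    # training rows only
  te <- make_xy(wineq, -train)   # held-out rows only

  # With doBest = FALSE (the default), tuneRF returns a matrix:
  # column 1 = mtry values tried, column 2 = their OOB error.
  tuned <- tuneRF(x = tr$x, y = tr$y, ntreeTry = 100, stepFactor = 1.5)
  best_mtry <- tuned[which.min(tuned[, 2]), 1]

  # Refit with the selected mtry (500 trees, the package default)
  rf <- randomForest(x = tr$x, y = tr$y, mtry = best_mtry, ntree = 500)
  rf_preds <- predict(rf, te$x)
  print(table(rf_preds, te$y))
  print(mean(rf_preds == te$y))  # overall test-set accuracy
}
```

Alternatively, `tuneRF(..., doBest = TRUE)` fits a forest with the best mtry it found and returns that forest instead of the matrix.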
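Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.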