How to tune random forest code for quality prediction

Question

I am new to machine learning. I have the Wine Quality dataset (http://archive.ics.uci.edu/ml/datasets/Wine+Quality). I have to predict the quality of the wine, which is the last column in the dataset. I considered applying a neural network or a random forest for this; the neural network gave around 55% accuracy, whereas with a random forest I have managed to get 73% so far. I want to improve the accuracy further. Below is the code I have written.

wineq <- read.csv("wine-quality.csv",header = TRUE)
str(wineq)

wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good')
wineq$taste[wineq$quality == 6] <- 'normal'
wineq$taste <- as.factor(wineq$taste)
set.seed(54321)
train <- sample(1:nrow(wineq), .75 * nrow(wineq))
wineq_train <- wineq[train, ]
wineq_test  <- wineq[-train, ]

library(randomForest)

rf <- randomForest(taste ~ . - quality, data = wineq_train,
                   importance = TRUE, ntree = 100)

rf_preds = predict(rf,wineq_test)
rf_preds
table(rf_preds, wineq_test$taste)

Output:

table(rf_preds, wineq_test$taste)

rf_preds bad good normal
  bad    302   11     81
  good     7  163     36
  normal  93  101    431
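As a cross-check, the 73% figure can be recovered from this table. The sketch below retypes the counts by hand (the name conf_mat is only illustrative) and assumes, as in the table() call above, that rows are predictions and columns are the actual labels.

```r
# Confusion matrix copied from the output above
# (rows = predicted class, columns = actual class).
conf_mat <- matrix(
  c(302,  11,  81,
      7, 163,  36,
     93, 101, 431),
  nrow = 3, byrow = TRUE,
  dimnames = list(predicted = c("bad", "good", "normal"),
                  actual    = c("bad", "good", "normal"))
)

# Overall accuracy: correct predictions (the diagonal) over all test rows.
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Per-class recall: correct predictions over the actual count of each class.
recall <- diag(conf_mat) / colSums(conf_mat)

print(round(accuracy, 3))
print(round(recall, 3))
```

The per-class figures are worth a look before tuning: a single accuracy number hides which of the three classes the forest confuses most often.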

If I try to use tuneRF, it gives me the error below:

   fgl.res <- tuneRF(x = wineq[train, ], y= wineq[-train, ], 
   stepFactor=1.5)

Error in randomForest.default(x, y, mtry = mtryStart, ntree = ntreeTry,  :
  length of response must be the same as predictors

Answer 1

You need to pass the feature variables to tuneRF as x and the response variable as y.
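To see why the original call fails, it helps to compare the shapes of the two arguments. The sketch below uses a small made-up data frame (toy, with hypothetical values) in place of wineq, and assumes the check behind the error message compares nrow(x) with length(y).

```r
# Hypothetical toy data standing in for wineq (not the real dataset).
toy <- data.frame(
  alcohol = c(9.4, 9.8, 10.5, 11.2, 12.0, 12.8, 9.1, 10.0),
  acidity = c(0.70, 0.88, 0.28, 0.30, 0.25, 0.22, 0.66, 0.40),
  taste   = factor(c("bad", "bad", "normal", "normal",
                     "good", "good", "bad", "normal"))
)
train <- c(1, 2, 3, 4, 5, 6)  # row indices used for training

# Call shape from the question: x is the training rows,
# y is the entire test data frame.
x_wrong <- toy[train, ]
y_wrong <- toy[-train, ]
# length() of a data frame is its number of columns,
# so this compares a row count with a column count.
print(c(rows_in_x = nrow(x_wrong), length_of_y = length(y_wrong)))

# Consistent shape: predictors and response both come from the training rows,
# and the response is a single factor vector.
x_ok <- toy[train, colnames(toy) != "taste"]
y_ok <- toy[train, "taste"]
print(c(rows_in_x = nrow(x_ok), length_of_y = length(y_ok)))
```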

So, first find the column position of your response variable (taste):

resp_pos <- which(colnames(wineq) == "taste")

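Then pass the training rows for both arguments. Note that y must be indexed with train, not -train; otherwise x and y have different numbers of rows and the same error is raised:

fgl.res <- tuneRF(x = wineq[train, -resp_pos], y = wineq[train, resp_pos],
                  stepFactor = 1.5)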
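A fuller sketch of the tuning step follows, with x and y both taken from the training rows and quality dropped from the predictors. Since wine-quality.csv is not available here, it builds a synthetic stand-in for wineq (columns V1 to V9 and the scoring rule are invented for illustration). It also assumes that tuneRF, with its default doBest = FALSE, returns a matrix whose first column is mtry and whose second column is the OOB error.

```r
library(randomForest)

# Synthetic stand-in for wineq: nine made-up numeric features and a
# quality score derived from two of them plus noise (illustrative only).
set.seed(54321)
n <- 400
features <- as.data.frame(matrix(rnorm(n * 9), ncol = 9))  # columns V1..V9
score <- 6 + features$V1 - 0.5 * features$V2 + rnorm(n, sd = 0.7)
wineq <- cbind(features, quality = pmin(9, pmax(3, round(score))))

# Same three-class response as in the question.
wineq$taste <- ifelse(wineq$quality < 6, "bad", "good")
wineq$taste[wineq$quality == 6] <- "normal"
wineq$taste <- as.factor(wineq$taste)

train <- sample(1:nrow(wineq), 0.75 * nrow(wineq))

# Drop both the response and the column it was derived from.
drop_cols <- which(colnames(wineq) %in% c("taste", "quality"))

# x and y both use the training rows.
tune_res <- tuneRF(x = wineq[train, -drop_cols],
                   y = wineq[train, "taste"],
                   stepFactor = 1.5, ntreeTry = 100,
                   trace = FALSE, plot = FALSE)

# Pick the mtry with the lowest OOB error (column 1 = mtry, column 2 = error).
best_mtry <- tune_res[which.min(tune_res[, 2]), 1]

# Refit with the chosen mtry and evaluate on the held-out rows.
rf_tuned <- randomForest(x = wineq[train, -drop_cols],
                         y = wineq[train, "taste"],
                         mtry = best_mtry, ntree = 100)
rf_preds <- predict(rf_tuned, wineq[-train, -drop_cols])
print(table(rf_preds, wineq[-train, "taste"]))
```

With the real data, only the block that builds wineq would be replaced by the read.csv call from the question; whether tuning mtry improves on 73% is something to measure, not something this sketch shows.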

I also noticed that you derive your "new" response (taste) from the column quality, using wineq$taste <- ifelse(wineq$quality < 6, 'bad', 'good'). This is perfectly fine, but you need to remove the column quality before training. Your randomForest call already does this through the formula term - quality, but the x passed to tuneRF above still contains it, so drop it there as well.

If you don't do this, your model will be overly optimistic, since it will pick up rules such as:

quality < 6 always means taste == "bad"
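The leak is easy to confirm without training anything. The sketch below applies the labelling rule from the question to an illustrative range of scores (3 to 9 is an assumption about the values that occur) and shows that each quality value maps to exactly one taste label, so a model given quality can simply look the answer up.

```r
# Illustrative quality scores; the real column may cover a different range.
quality <- 3:9

# Labelling rule copied from the question.
taste <- ifelse(quality < 6, "bad", "good")
taste[quality == 6] <- "normal"

# Each row of this table has a single non-zero cell.
print(table(quality, taste))

# Number of distinct labels seen for each quality value.
labels_per_quality <- tapply(taste, quality, function(v) length(unique(v)))
print(labels_per_quality)
```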
