Dynamic response variable for random forest

Question

I'm trying to create a dynamic ML app that lets the user upload a dataset and get a prediction of the first column in that dataset, using a random forest model.

I am having problems with the randomForest() function, specifically when I try to specify the response variable as the first column of the dataset. For the example below, I use the iris dataset with the response variable, Species, moved to the first column.

This was my attempt:

model <- randomForest(names(DATA[1]) ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)

However, this does not work. The error I get is:

Error: variable lengths differ (found for 'Species')

The app and function only seem to work when I specify the response variable manually like this:

model <- randomForest(Species ~ ., data = DATA, ntree = 500, mtry = 3, importance = TRUE)

I have tried to use the paste() function to work some magic, but I didn't succeed.

How should I write the code to get it to work?

Answer 1

It looks like you want to build a formula from a string. You can use eval() and parse() to do that. Something like this should work:

model <- randomForest(eval(parse(text = paste(names(DATA)[1], "~ ."))), 
                      data = DATA, ntree = 500, mtry = 3, importance = TRUE)
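To see why the original attempt fails, it may help to inspect both formulas with base R. The sketch below assumes DATA is iris with Species moved to the first column, as in the question; the names lhs, f_bad and f_ok are made up for illustration. As far as I can tell, the error arises because names(DATA[1]) stays in the formula as an unevaluated call that later yields a single string, while every predictor column has 150 values.

```r
# Assumption: DATA mirrors the question's setup (Species moved to column 1).
DATA <- iris[, c(5, 1:4)]

# names(DATA[1]) is a length-1 character vector, not the column itself.
lhs <- names(DATA[1])
print(lhs)
print(length(lhs))   # one value, versus nrow(DATA) values per column

# Written directly in a formula, the left-hand side is kept as a call.
f_bad <- names(DATA[1]) ~ .
print(is.call(f_bad[[2]]))

# Built from a string (the approach in this answer), the left-hand side
# becomes the symbol Species, which is looked up as a column of the data.
f_ok <- eval(parse(text = paste(names(DATA)[1], "~ .")))
print(is.name(f_ok[[2]]))
print(f_ok)
```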

Example using the original iris dataset:

model <- randomForest(eval(parse(text = paste(names(iris)[5], "~ ."))), 
                      data = iris, ntree = 500, mtry = 3, importance = TRUE)

model

Call:
 randomForest(formula = eval(parse(text = paste(names(iris)[5],      "~ ."))), data = iris, 
              ntree = 500, mtry = 3, importance = TRUE) 
           Type of random forest: classification
                 Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06
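As a possible alternative to eval(parse()), base R's reformulate() can build the same formula from the column name, and randomForest() also accepts predictors and response separately through its x and y arguments. This is only a sketch under the same assumption as the question (response in the first column, here a factor); the names f, model and model_xy are illustrative, and the fitting step is skipped if the randomForest package is not installed.

```r
# Assumption: the response is the first column, as in the question.
DATA <- iris[, c(5, 1:4)]

# reformulate() builds `Species ~ .` from the column name.
f <- reformulate(".", response = names(DATA)[1])
print(f)

if (requireNamespace("randomForest", quietly = TRUE)) {
  # Formula interface with the programmatically built formula.
  model <- randomForest::randomForest(f, data = DATA, ntree = 500,
                                      mtry = 3, importance = TRUE)

  # Formula-free interface: all other columns as x, the first column as y.
  # y must be a factor for classification; iris$Species already is one.
  model_xy <- randomForest::randomForest(x = DATA[-1], y = DATA[[1]],
                                         ntree = 500, mtry = 3,
                                         importance = TRUE)
  print(model$type)
  print(model_xy$type)
}
```

For an uploaded dataset whose first column is read in as character, converting it with as.factor() before fitting would likely be needed to get classification rather than an error.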
