Fill data based on random forest object predicted response

Using randomForest, I get an RF object.
E.g. forest <- randomForest(as.formula(generic), data=train, mtry=2)

Using predict, I can predict the response on a test dataset.
The response is either A, B, or C.

prediction <- predict(forest, newdata=test, type='class')
mytable <- table(test$class_w, prediction)
sum(mytable[row(mytable) != col(mytable)]) / sum(mytable)  # show error

Calling the forest object, I get the confusion matrix:

      A    B    C  class.error
A   498   79  170    0.3333333
B   115  353  237    0.4992908
C    96   99  967    0.1678141

Example test dataset:

id |class_w| valueA | valueB |
1  |  C    |  0.254 |  0.334 |
2  |  A    |  0.654 |  0.334 |
3  |  A    |  0.554 |  0.314 |
4  |  B    |  0.454 |  0.224 |
5  |  C    |  0.354 |  0.332 |
6  |  C    |  0.264 |  0.114 |
7  |  C    |  0.264 |  0.664 |

I would like to know whether I can create a new dataset with two columns: the id from the test dataset and the predicted response given by the RF. E.g.:

row id of test dataset  |  predicted response
1                       |  A  #failed
2                       |  B  #failed
3                       |  B  #failed
4                       |  B  #TRUE!

Thanks in advance for your help.

I think you may simply be looking to create a new data frame:

data.frame(id = test$id,response = prediction)

That assumes that id is in fact a column in test, rather than the row names. If the ids are row names, then you'd want to do:

data.frame(id = rownames(test), response = prediction)
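
To make the result concrete, here is a minimal sketch in base R. It uses only the first four rows of the test table from the question, and a hard-coded `prediction` vector standing in for `predict(forest, newdata = test)`; its values are the four predicted responses listed in the question's desired output. The `correct` column and the name `result` are illustrative additions, not part of the original answer.

```r
# First four rows of the test dataset shown in the question.
test <- data.frame(
  id      = 1:4,
  class_w = factor(c("C", "A", "A", "B"), levels = c("A", "B", "C")),
  valueA  = c(0.254, 0.654, 0.554, 0.454),
  valueB  = c(0.334, 0.334, 0.314, 0.224)
)

# Stand-in for predict(forest, newdata = test): the four predicted
# responses from the question's desired output (an assumption for this sketch).
prediction <- factor(c("A", "B", "B", "B"), levels = levels(test$class_w))

# One row per test observation: id, predicted response, and whether
# the prediction matches the observed class.
result <- data.frame(
  id       = test$id,
  response = prediction,
  correct  = prediction == test$class_w
)
print(result)
```

Because `predict` returns one value per row of `newdata`, in the same order, the ids and predictions line up without any merging.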

Another way to do that would be to just write something like this:

yourNewDataSet$someNewColumnCreated <- predict(forest, yourNewDataSet, type="class")

This should give you a new column in your new dataset, named 'someNewColumnCreated', that will contain all of your model's predictions for this new dataset.
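
As a rough end-to-end sketch of this column-assignment approach, the example below fits a forest on simulated data and appends the predictions to the test set. The data are randomly generated and purely hypothetical (so the predictions carry no meaning); the helper `make_data`, the column name `predicted`, and the explicit formula are assumptions made for illustration. It assumes the randomForest package is installed.

```r
library(randomForest)  # assumed to be installed

set.seed(42)  # reproducible simulated data

# Hypothetical data with the same column layout as the question's table.
make_data <- function(n) {
  data.frame(
    id      = seq_len(n),
    class_w = factor(sample(c("A", "B", "C"), n, replace = TRUE),
                     levels = c("A", "B", "C")),
    valueA  = runif(n),
    valueB  = runif(n)
  )
}
train <- make_data(200)
test  <- make_data(50)

# Fit on the two predictors only, so id is not used as a feature.
forest <- randomForest(class_w ~ valueA + valueB, data = train, mtry = 2)

# Append predictions as a new column; rows stay aligned with test$id.
test$predicted <- predict(forest, newdata = test, type = "response")

head(test[, c("id", "class_w", "predicted")])
```

Keeping the prediction in the same data frame as the observed class makes later comparisons, such as `test$predicted == test$class_w`, a one-liner.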
