Random forest text classification gives extra rows in prediction

Question

I am using random forest for text classification. My input data has 17197 rows.

> nrow(sparse_4testing)
[1] 17197

I am using:

set.seed(123)
tweetRand = randomForest(label ~ ., data = train_sparse, importance=TRUE, ntree=500)

predicrRand_test=predict(tweetRand, data=sparse_4testing)
q1=data.frame(ifelse(predicrRand_test>0.5,1,0))

The issue is that when I do a sanity check, I get extra rows in q1:

> nrow(q1)
[1] 22373

I do not understand the issue. I am new to machine learning. Please help me out. I have run the model multiple times and still get the same result.

> nrow(predicrRand_test)
NULL
> head(predicrRand_test)
            1             3             6             7             9            10 
 1.858321e-01 -8.326673e-17  1.321640e-01  2.222222e-04  2.345304e-02  1.651133e-01 
> head(q1)
   ifelse.predicrRand_test...0.05..1..0.
1                                      1
3                                      0
6                                      1
7                                      0
9                                      0
10                                     1

> length(predicrRand_test)
[1] 22373

Answer 1

The issue is due to a wrong argument name in predict: it should be newdata, not data (docs):

predicrRand_test=predict(tweetRand, newdata=sparse_4testing)

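As it is now, predict ignores the unrecognized data argument and simply returns the predictions on the training set (for randomForest, the out-of-bag predictions stored in the model). This would explain why predicrRand_test has 22373 elements, presumably the number of rows in train_sparse, rather than 17197. Note also that predicrRand_test is a named vector, not a data frame, which is why nrow() returns NULL while length() works.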
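
The same behaviour can be reproduced without the randomForest package. The following is only a minimal sketch: it uses lm() from base R as a stand-in model (not the asker's random forest), and the data and the names train, test, fit, pred_wrong and pred_right are made up for illustration. It shows that predict() with data= returns one value per training row, while newdata= returns one value per test row, and ends with a cheap sanity check.

```r
# Stand-in example: lm() from base R replaces randomForest() so the sketch
# runs without extra packages. All data and names here are hypothetical.
set.seed(123)
train <- data.frame(x = rnorm(30))
train$label <- 2 * train$x + rnorm(30)
test <- data.frame(x = rnorm(10))

fit <- lm(label ~ x, data = train)

# Wrong argument name: `data` is not matched to `newdata`, so predict()
# falls back to the rows the model was trained on.
pred_wrong <- predict(fit, data = test)

# Correct argument name: one prediction per row of `test`.
pred_right <- predict(fit, newdata = test)

length(pred_wrong)  # 30, i.e. nrow(train), not nrow(test)
length(pred_right)  # 10, i.e. nrow(test)

# The result is a named vector, so nrow() gives NULL; use length() instead.
is.null(nrow(pred_right))  # TRUE

# Cheap guard against this kind of mistake.
stopifnot(length(pred_right) == nrow(test))
```

Adding a check like the final stopifnot() right after the real predict call, with sparse_4testing in place of test, would likely have flagged the problem immediately.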

Answer 1
0 · Accepted · 2018-03-16 15:07:25
