Random forest: variable lengths differ

I am trying to run RF using a feature as the response variable, but I am having trouble passing the feature's name, held as a string in a variable, to be used as the response in RF. First I try running RF with the string obtained through a variable as the response, and I get a "variable lengths differ" error. After that, I try typing the actual feature name as the response, and it works fine. Can you shed some light on why the variable lengths differ? Thanks.

> colnames(Data[1])
[1] "feature1"
> rf.file = randomForest(formula =colnames(Data[1])~ ., data = Data, proximity = T,      importance = T, ntree = 500, nodesize = 3)
Error in model.frame.default(formula = colnames(Data[1]) ~ .,  : 
  variable lengths differ (found for 'feature1')

Enter a frame number, or 0 to exit   

1: randomForest(formula = colnames(Data[1]) ~ ., data = Data, proximity = T, importance = T, ntree = 500, nodesize = 3)
2: randomForest.formula(formula = colnames(Data[1]) ~ ., data = Data, proximity = T, importance = T, ntree = 500, nodesize = 3)
3: eval(m, parent.frame())
4: eval(expr, envir, enclos)
5: model.frame(formula = colnames(Data[1]) ~ ., data = Data, na.action = function (object, ...) 
6: model.frame.default(formula = colnames(Data[1]) ~ ., data = Data, na.action = function (object, ...) 

Selection: 0



> rf.file = randomForest(formula =feature1~ ., data = Data, proximity = T,      importance = T, ntree = 500, nodesize = 3)
> rf.file

Call:
 randomForest(formula = feature1 ~ ., data = Data,      proximity = T, importance = T, ntree = 500, nodesize = 3) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 3

          Mean of squared residuals: 0.1536834
                    % Var explained: 34.21
> 

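You are simply misunderstanding how formulas work. Basically, your first attempt isn't supposed to work. In it, colnames(Data[1]) is evaluated as an ordinary R expression, so the response becomes the single string "feature1" (length 1) rather than the column feature1. Because . expands to every column of Data, feature1 itself included, model.frame() then finds a predictor with nrow(Data) values next to a response with one value, which is exactly what "variable lengths differ (found for 'feature1')" is reporting.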
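Here is a minimal sketch that reproduces the error without the randomForest package. It assumes base R's lm() goes through the same model.frame() step that randomForest.formula() uses (frames 5 and 6 of the traceback in the question); the toy data frame d and its columns are hypothetical. The captured message should match the one in the question.

```r
# Hypothetical toy data: a numeric response and one predictor.
d <- data.frame(feature1 = c(1.2, 2.3, 2.9, 4.1, 5.0),
                x        = c(2.0, 4.1, 5.2, 3.9, 5.1))

# colnames(d[1]) is evaluated as an ordinary R expression, so it
# yields the length-1 string "feature1", not the column's values.
colnames(d[1])
length(colnames(d[1]))   # 1, while nrow(d) is 5

# In the formula, "." expands to every column (feature1 included), so
# model.frame() sees a 1-element response next to 5-element predictors.
msg <- tryCatch(lm(colnames(d[1]) ~ ., data = d),
                error = function(e) conditionMessage(e))
print(msg)
```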

Formulas should consist of names of variables, possibly wrapped in simple functions of them, e.g.

var1 ~ var2
var1 ~ log(var2)

Note the lack of quotes. If you didn't quote it, it's not a string, it's a symbol.
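You can check the symbol/string distinction directly in base R. In this sketch var1 and var2 are placeholder names that don't need to exist, since a formula is not evaluated when it is created.

```r
# A formula stores unevaluated symbols; var1 and var2 need not exist yet.
f <- var1 ~ log(var2)
class(f)            # "formula"
all.vars(f)         # "var1" "var2": the symbols the formula refers to

# Unquoted name -> symbol; quoted name -> character string.
class(quote(var1))  # "name"
class("var1")       # "character"

# The left-hand side of f is the symbol var1, not the string "var1".
lhs <- f[[2]]
is.name(lhs)        # TRUE
```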

So avoid raw strings and unusual evaluation demands (like Data[1], or any use of $) in your formulas. To construct a formula from strings, paste it together and then call as.formula on the resulting string.
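A minimal sketch of the paste() + as.formula() approach for the question's situation, where the response name is only known as a string. It uses base R's lm() and a hypothetical data frame d so it runs without extra packages; the resulting formula object is what you would pass as formula = to randomForest(). reformulate() is a base R helper that builds the same formula without manual pasting.

```r
# Hypothetical data: first column is the response, the rest are predictors.
d <- data.frame(feature1 = c(1.2, 2.3, 2.9, 4.1, 5.0),
                x1       = c(2.0, 4.1, 5.2, 3.9, 5.1),
                x2       = c(0.5, 0.7, 0.2, 0.9, 0.4))

response <- colnames(d)[1]                   # the string "feature1"

# Paste the pieces into one string, then convert it to a real formula.
f1 <- as.formula(paste(response, "~ ."))

# Base R helper that builds the same response ~ . formula from strings.
f2 <- reformulate(".", response = response)

# With randomForest this would be randomForest(formula = f1, data = Data, ...)
fit <- lm(f1, data = d)
names(coef(fit))                             # "(Intercept)" "x1" "x2"
```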

Keep in mind that the whole point of a formula is that you provide a symbolic representation of the model, and R then goes looking for the specific columns you named in the data frame provided.
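To illustrate that a formula is only a symbolic description until it meets a data frame, this sketch evaluates one formula against two hypothetical data frames with model.frame(), the function that appears in the question's traceback. It also shows that . is expanded from the columns of whichever data frame is supplied.

```r
f <- y ~ .   # symbolic only; no data is touched here

a <- data.frame(y = 1:3, p = c(2, 4, 6))
b <- data.frame(y = 4:6, q = c(1, 0, 1), r = c(7, 8, 9))

# model.frame() looks up y and the columns behind "." in the data supplied.
names(model.frame(f, data = a))            # "y" "p"
names(model.frame(f, data = b))            # "y" "q" "r"

# The expanded right-hand side, as R sees it for data frame b.
attr(terms(f, data = b), "term.labels")    # "q" "r"
```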

I think some functions will coerce a string representation of a formula (e.g. "var1 ~ var2") for you, but I wouldn't count on it, or expect it.
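Rather than relying on a modelling function to coerce a string, you can normalise the input yourself before the call. The helper as_model_formula() below is hypothetical; it only wraps base R's inherits() and as.formula(), and the data frame d is made up for illustration.

```r
# Hypothetical helper: accept either a formula or a single string such as
# "var1 ~ var2" and always return a formula object.
as_model_formula <- function(f, env = parent.frame()) {
  if (inherits(f, "formula")) return(f)
  if (is.character(f) && length(f) == 1) return(as.formula(f, env = env))
  stop("expected a formula or a single character string")
}

d <- data.frame(var1 = c(1, 3, 2, 5), var2 = c(1, 2, 3, 4))

f_from_string  <- as_model_formula("var1 ~ var2")
f_from_formula <- as_model_formula(var1 ~ var2)

inherits(f_from_string, "formula")             # TRUE
all.equal(coef(lm(f_from_string, data = d)),
          coef(lm(f_from_formula, data = d)))  # TRUE
```

Because the coercion happens before the modelling call, the same code works whether or not the downstream function would have accepted a string.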

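Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.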

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM