R: ada: how to use pairs on a data frame with categorical descriptors?

I'm using the ada package for classification, where the descriptor variables are both categorical and numeric. This leads to a problem when calling the pairs function. Below is an example that illustrates my problem:

require(rpart)
require(ada)

data(car90, package = "rpart")
target = "Country"
input = setdiff(names(car90), target)

# ada only works with two distinct responses
car90 = car90[car90$Country %in% c("USA", "Japan/USA"), ] 

# drop the now-unused factor levels by converting to character
car90$Country = as.character(car90$Country) 

adaCar90 = ada(car90[, input], car90[, target])
pairs(adaCar90, car90[, input], vars = 32:33)

# Error in pairs.default(as.matrix(rbind(train.data, test.x))[, vars],   
# lower.panel = panel.low,  :  non-numeric argument to 'pairs'  

Selecting only numeric descriptors with the vars argument of the pairs function doesn't seem to resolve matters. Does anyone know how I can fix this?

Cheers.

It looks like the maintainer of the ada package didn't anticipate your specific use case. The very last line of ada:::pairs.ada reads:

pairs(as.matrix(rbind(train.data, test.x))[, vars], lower.panel = panel.low, 
    upper.panel = panel.up)

The problem lies in where [, vars] has been placed. The code binds together train.data and test.x, turns the whole thing into a matrix, and only then subsets it. Because your train.data contains a bunch of non-numeric columns, as.matrix returns a character matrix. A matrix can hold only one type, so every column, including the numeric ones, becomes character, and selecting the numeric columns afterwards cannot undo that; pairs.default then stops with "non-numeric argument to 'pairs'". If you change the last line to this:

pairs(as.matrix(rbind(train.data, test.x)[, vars]), lower.panel = panel.low, 
    upper.panel = panel.up)

then as.matrix is called only on the subset that contains numeric data, the result is a numeric matrix, and the function works.
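The coercion can be reproduced in base R without ada. This is a minimal sketch using a small made-up data frame, train_data (hypothetical, standing in for the real training descriptors), with one character column playing the role of a categorical descriptor. It shows that subsetting after as.matrix() still leaves a character matrix that pairs() rejects, while subsetting first yields a numeric matrix.

```r
# Hypothetical stand-in for train.data: two numeric columns plus one categorical column
train_data <- data.frame(
  weight = c(2.1, 3.4, 2.8, 3.9),
  length = c(170, 185, 178, 190),
  type   = c("small", "large", "medium", "large"),
  stringsAsFactors = FALSE
)
vars <- 1:2  # indices of the numeric columns we want to plot

# Original order: convert the whole data frame, then subset.
# A matrix holds a single type, so the character column turns every column into character.
bad <- as.matrix(train_data)[, vars]
print(typeof(bad))   # "character"

# pairs() refuses the character matrix, as in the question's error.
err <- tryCatch(pairs(bad), error = function(e) conditionMessage(e))
print(err)

# Corrected order: subset the data frame first, then convert.
good <- as.matrix(train_data[, vars])
print(typeof(good))  # "double"

# pairs() now accepts the matrix; draw to a null PDF device so no window opens.
pdf(NULL)
pairs(good)
invisible(dev.off())
```

The only difference between bad and good is whether [, vars] is applied inside or outside as.matrix(), which is exactly the change proposed for the last line of pairs.ada.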

EDIT

I think what I suggested above is a good long-term solution, but there could be an easier short-term fix: pass only the columns of your training data that you need for the graph, instead of using the vars option. That way, only numeric data ever reaches that final line of code, which would probably get you your graphs without needing to hack the function.
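A minimal sketch of that workaround, assuming the goal is simply to drop every non-numeric descriptor before the data reaches pairs(). It uses base R's Filter() over the data frame's columns; the helper name numeric_columns and the toy data are hypothetical.

```r
# Hypothetical helper: keep only the numeric columns of a data frame.
# Filter() applies is.numeric to each column and returns a data frame
# holding just the columns for which it is TRUE.
numeric_columns <- function(df) {
  Filter(is.numeric, df)
}

# Toy data standing in for the training descriptors
train <- data.frame(
  price  = c(11950, 6851, 6995, 8845),
  weight = c(2700, 2390, 2000, 2640),
  type   = factor(c("Small", "Small", "Sporty", "Compact"))
)

num_train <- numeric_columns(train)
print(names(num_train))  # "price" "weight"

# Converting the reduced data gives a numeric matrix, which pairs() accepts.
m <- as.matrix(num_train)
print(is.numeric(m))     # TRUE
```

With the question's objects, the reduced frame would be passed in place of car90[, input], for example pairs(adaCar90, numeric_columns(car90[, input])). This has not been checked against ada itself, so whether its default choice of variables lines up with the reduced column set is an open question; passing a data frame that contains only the columns you actually want to plot is the most conservative version.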