Issues with tuneGrid parameter in random forest

I've been dealing with some extremely imbalanced data and I would like to use stratified sampling to create more balanced random forests.

Right now, I'm using the caret package, mainly for tuning the random forests. So I tried to set up a tuneGrid to pass the mtry and sampsize parameters into caret's train method as follows.

mtryGrid <- data.frame(.mtry = 100),.sampsize=80)
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = ctrl,
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               importance = TRUE)

When I run this example, I get the following error:

The tuning parameter grid should have columns mtry

I've come across discussions like this suggesting that passing in these parameters should be possible.

On the other hand, this page suggests that the only parameter that can be passed in is mtry.

Can I even pass sampsize into the random forests via caret?

It looks like there is a bracket issue with your mtryGrid. Alternatively, you can use expand.grid to give the different values of mtry you want to try. By default, the only parameter you can tune for a random forest is mtry. However, you can still pass the other parameters to train; they will have a fixed value and so won't be tuned by train. In particular, you can still ask train to use a stratified sample. Below is how I would do it, assuming that trainY is a boolean variable according to which you want to stratify your samples, and that you want samples of size 80 for each category:

mtryGrid <- expand.grid(mtry = 100) # you can put different values for mtry
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = ctrl,
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               strata = factor(trainY),
               sampsize = c(80, 80), 
               importance = TRUE)
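As a small illustration of the expand.grid suggestion above, the following base-R sketch builds a grid with several mtry candidates and checks its column names. The candidate values are arbitrary placeholders, and the assumption here is that the extra sampsize column in the question's grid is what triggers the "should have columns mtry" error.

```r
# Candidate mtry values are arbitrary; choose ones that suit your number of predictors.
mtryGrid <- expand.grid(mtry = c(10, 50, 100))
print(mtryGrid)

# For method = "rf", mtry is the only tunable parameter,
# so the grid should contain that column and nothing else.
stopifnot(identical(names(mtryGrid), "mtry"))

# A grid shaped like the one in the question carries an extra column.
badGrid <- data.frame(mtry = 100, sampsize = 80)
extraCols <- setdiff(names(badGrid), "mtry")
print(extraCols)
```

Anything listed in extraCols would have to be passed to train as a fixed argument rather than through tuneGrid.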

I doubt one can directly pass sampsize and strata to train. But from here I believe the solution is to set the sampling option through trainControl(), supplied via the trControl argument. That is,

mtryGrid <- data.frame(mtry = 100)  # the grid may only contain mtry for method = "rf"
rfTune<- train(x = trainX,
               y = trainY,
               method = "rf",
               trControl = trainControl(sampling=X),
               metric = "Kappa",
               ntree = 1000,
               tuneGrid = mtryGrid,
               importance = TRUE)

where X can be one of c("up", "down", "smote", "rose").
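For a concrete version of the sampling approach, here is a minimal sketch using "down" on made-up data. The synthetic trainX and trainY, the class labels, the 5-fold cross-validation, and the mtry candidates are all assumptions for illustration, not values from the question; it also assumes the caret and randomForest packages are installed.

```r
library(caret)  # method = "rf" additionally needs the randomForest package

set.seed(42)
n <- 1000

# Synthetic, imbalanced two-class data (hypothetical stand-in for the real trainX/trainY)
trainX <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
trainY <- factor(ifelse(trainX$x1 + rnorm(n) > 1.8, "rare", "common"))
print(table(trainY))

# Down-sample the majority class inside each resample
ctrl <- trainControl(method = "cv", number = 5, sampling = "down")

rfDown <- train(x = trainX,
                y = trainY,
                method = "rf",
                trControl = ctrl,
                metric = "Kappa",
                ntree = 200,
                tuneGrid = data.frame(mtry = c(1, 2)),
                importance = TRUE)

print(rfDown$results)
```

The "smote" and "rose" options rely on additional packages, so "up" or "down" is probably the easiest place to start.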
