
caret package: Is it possible to implement my own bootstrapping method?

I am using the caret package for R to select variables for my model. When using the rfe command, one has to pass an rfeControl object, which has a method parameter. The options for this parameter are boot, cv, LOOCV and LGOCV. Since I am dealing with time series data, I need special bootstrapping/cross-validation techniques, because the standard ones do not apply to time series data (otherwise distributions get corrupted, etc.).

My question is: how can I plug in my own implementation of bootstrapping but still use caret's rfe method, which has everything else I need?

There isn't an easy way. If you study the code for rfe.default() you will note that when method = "boot" the createResample() function is used. This is the function that generates the bootstrap samples. Similar functions are used for the other CV methods.

There is a hard way: take over the create*() function that is most appropriate. Say you want to do a block bootstrap or ME bootstrap: take over the createResample() function and use method = "boot". If you want a special form of CV, use method = "cv" and take over createFolds().

You will need to write your own create*() function and replace the one in the caret namespace with your version. Not easy, but eminently doable. Say you write your own createResample() function. First, note that this function creates n = times bootstrap samples and returns them in a matrix with times columns and as many rows as you have samples. You need to write a custom createResample() function that returns the same kind of object but performs the time series bootstrapping you want to employ. Check what your installed version of caret returns, because your replacement has to match it.
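To make that concrete, here is a minimal sketch of what such a replacement could look like, assuming a moving block bootstrap is the resampling you want (the question does not say, so that choice is mine). The block_length argument and its default are hypothetical and not part of caret. The y, times and list arguments mirror the createResample() signature as I know it from recent caret versions, where list = FALSE gives the matrix form described above; verify this against your installed version.

```r
# Hypothetical replacement for caret's createResample(): a moving block
# bootstrap. Base R only. `block_length` is an assumption of this sketch.
createMyResample <- function(y, times = 10, list = TRUE, block_length = 5) {
  n <- length(y)
  stopifnot(n >= 1, times >= 1, block_length >= 1)
  block_length <- min(block_length, n)
  n_blocks <- ceiling(n / block_length)
  # Valid block starting positions, so every block stays inside the series.
  starts <- seq_len(n - block_length + 1)

  one_resample <- function() {
    chosen <- starts[sample.int(length(starts), n_blocks, replace = TRUE)]
    idx <- unlist(lapply(chosen, function(s) seq(s, length.out = block_length)))
    # Trim the last block so the resample has exactly n row indices.
    as.integer(idx[seq_len(n)])
  }

  # One column of row indices per bootstrap sample.
  out <- matrix(vapply(seq_len(times), function(i) one_resample(), integer(n)),
                nrow = n, ncol = times)
  # Illustrative names only; they are not guaranteed to match caret's.
  nms <- paste0("Resample", formatC(seq_len(times), width = nchar(times), flag = "0"))
  if (list) {
    out <- lapply(seq_len(times), function(j) out[, j])
    names(out) <- nms
  } else {
    colnames(out) <- nms
  }
  out
}
```

Each resample is built from contiguous runs of block_length observations, so short-range dependence within a block is preserved. Whether that is adequate for your series is something you would have to judge.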

Once you have written that function, you need to get it into the caret namespace so that it is used by the functions in the caret package. For this you use assignInNamespace(). Say your new bootstrapping function is called createMyResample() and it is in your workspace; to insert it into the caret namespace, do:

assignInNamespace("createResample", createMyResample, ns = "caret")
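The same pattern would apply to the CV route. As a rough sketch only, a createFolds() replacement that holds out contiguous blocks in time order might look like the following. The name createMyFolds is hypothetical, and the k, list and returnTrain arguments are assumed to mirror caret's createFolds() signature, so check them against your installed version. Note that with returnTrain = TRUE the training rows still include observations that come after the held-out block, which may or may not be acceptable for your series.

```r
# Hypothetical replacement for caret's createFolds(): k contiguous folds in
# time order instead of randomly assigned folds. Base R only.
createMyFolds <- function(y, k = 10, list = TRUE, returnTrain = FALSE) {
  n <- length(y)
  stopifnot(k >= 2, k <= n)
  # Assign each observation, in order, to one of k near-equal blocks.
  fold_id <- as.integer(ceiling(seq_len(n) * k / n))
  if (!list) return(fold_id)

  held_out <- split(seq_len(n), fold_id)
  out <- if (returnTrain) {
    # Training rows for a fold are all rows outside its held-out block.
    lapply(held_out, function(i) setdiff(seq_len(n), i))
  } else {
    held_out
  }
  # Illustrative names only; they are not guaranteed to match caret's.
  names(out) <- paste0("Fold", formatC(seq_along(out), width = nchar(k), flag = "0"))
  out
}

# With caret installed, it would be swapped in like the bootstrap version:
# assignInNamespace("createFolds", createMyFolds, ns = "caret")
```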

Sorry I can't be more specific, but you don't say how you want the bootstrap/CV to be performed, nor what R code you want to use to do the resampling. If you provide further details on how you would do the resampling, I will take another look and see if I can help you write your create*() function.

Failing all of this, contact Max Kuhn, the author and maintainer of caret; he may be able to advise further, or at least you can suggest this feature as a wish-list item for a future version.


 