[英]Error using dplyr::group_by() in function within boot::boot() function during bootstrapping
我目前正在尝试对一些数据进行引导分析,最终结果是获得围绕计数数据比例的引导置信区间。
例如,我尝试引导的当前数据将采用以下形式(字符):
> foo
notes
1 a
2 b
3 c
4 c
5 b
6 c
7 b
8 c
9 a
10 a
11 c
12 b
13 d
14 e
15 f
16 f
17 g
18 a
19 b
20 c
21 c
您可以使用dput()
到这里
structure(list(notes = c("a", "b", "c", "c", "b", "c", "b", "c",
"a", "a", "c", "b", "d", "e", "f", "f", "g", "a", "b", "c", "c"
)), class = "data.frame", row.names = c(NA, -21L))
In trying to set up a function that will output a named vector similar to what is needed for the boot package to run properly ( see example here ), I have composed the following function that uses dplyr
code:
library(dplyr)
notes_bootstrap <- function(d, i){
# get global set
global_set <- d %>% distinct()
# take random rows
sampler <- d#[i,]
proportion_table <- sampler %>%
count(.data$notes) %>%
mutate(proportion = n/sum(n)) %>%
ungroup()
# combine with full set to turn NAs to 0s
combined_table <- proportion_table %>% full_join(global_set)
final_table <- combined_table %>%
select(-n) %>%
mutate(proportion = if_else(is.na(proportion),0,proportion))
output <- setNames(final_table$proportion, final_table$notes)
return(output)
}
当这个版本的 function 与boot()
一起运行时,它运行得很好,关键问题是它只是对整个数据集进行采样(由于代码的注释部分而没有执行引导程序)。 如果你运行这个,你会看到每个估计都是一样的。
bootstrap_analysis <- boot(foo, notes_bootstrap, R = 100)
bootstrap_analysis$t
如果我确实运行 function 并使用随机子集变量进行引导分析的部分,如下面的代码所示(与上面相同,但删除了注释):
notes_bootstrap <- function(d, i){
# get global set
global_set <- d %>% distinct()
# take random rows
sampler <- d[i,]
proportion_table <- sampler %>%
count(.data$notes) %>%
mutate(proportion = n/sum(n)) %>%
ungroup()
# combine with full set to turn NAs to 0s
combined_table <- proportion_table %>% full_join(global_set)
final_table <- combined_table %>%
select(-n) %>%
mutate(proportion = if_else(is.na(proportion),0,proportion))
output <- setNames(final_table$proportion, final_table$notes)
return(output)
}
然后我收到以下错误:
> bootstrap_analysis <- boot(foo, notes_bootstrap, R = 100)
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "character"
该问题的解决方案是运行此代码,以便引导分析按书面方式工作(可能是一个整洁的评估问题?),或者让某人建议一种更有效的方法来进行这种引导分析。
有趣的问题。 我试图在不使用引导包的情况下解决这个问题,而是使用基本功能(主要是出于透明目的)。
我可以弄清楚这一点:
#Assigning the provided structure to an object called "df"
df <- structure(list(notes = c("a", "b", "c", "c", "b", "c", "b", "c",
"a", "a", "c", "b", "d", "e", "f", "f", "g", "a", "b",
"c", "c")),
class = "data.frame", row.names = c(NA, -21L))
#Specifying the bootstrap replications (as far as I know, it's, however,
#rather recommended to use 10K replications and more)
B <- 100
#Number of observations (i.e., 21 in this case)
N <- nrow(df)
#setting the seed to ensure pseudo-randomness for the samples
#we just want to generate and, of course, to ensure general reproducibility
set.seed(42, sample.kind = "Rounding")
#bootstrapping the proportion (aka mean) of the note "a"
boot_note_a <- replicate(B, {
#taking random samples of the same sample size of those notes
#and putting back the taken sample in the urn—for each iteration
notes_star <- sample(df$notes, N, replace = T)
#getting the proportion of the note "a" within each bootstrapped sample.
#Hence, we'll get B (100 in this case) times a proportion of the note "a"
#based on the respective bootstrapped sample.
mean(notes_star == "a")
})
#getting the confidence interval (at a confidence level of 95%) of the
#bootstrapped proportion of the note "a" in the bootstrapped sample
quantile(boot_note_a, prob = c(0.025, 0.975))
最后,我们可以快速 plot(双关语)这个结果是这样的:
#calculating the binwidth according to Freedman & Diaconis (1981); see also
#Hyndman (1995)
binw <- 2 * IQR(boot_note_a) / length(boot_note_a)^(1/3)
#plotting
p1 <- qplot(boot_note_a, binwidth = binw, color = I("red") )
p2 <- qplot(sample = scale(boot_note_a), xlab = "theoretical", ylab = "sample")+
geom_abline()
gridExtra::grid.arrange(p1, p2, ncol = 2)
归根结底,我认为你会得到想要的结果——至少对于音符“a”(诚然,我们必须对剩下的六个音符重复这个策略)。 因此,这种解决方案可能不是最有效的方式,但希望是透明的。 如果这个求解策略运行良好,我们可以调整它并通过使用 apply-family 左右来提高它的效率。
干杯,邱!
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.