R dplyr: Bootstrap or random sampling
I have a dataset like this:
values Pop1
1 611648 Nafr
2 322513 Nafr
3 381089 Jud
4 16941 Jud
5 21454 Jud
6 658802 Jud
I am summarizing the values with this command:
df %>% group_by(Pop1) %>% summarize(Mean = mean(x = values))
so that I have the mean for Pop1=Nafr and for Pop1=Jud.
Before summarizing, I would like to randomly sample the same number of rows (50) in each of the two populations (Pop1).
I found the sample_n() function, which is great.
df %>% group_by(Pop1) %>% sample_n(size=50) %>% summarize(Mean = mean(x = values))
But I would like to run it 100 times, creating one big df, and then summarize.
Is there a way to add something to my command above to create a table that contains 100 samplings of 50 rows each from df, with an added column bs identifying each of the 100 random samplings? Something that looks like this:
bs values Pop1
1 1 611648 Nafr
2 1 322513 Nafr
3 1 381089 Jud
4 1 16941 Jud
5 1 21454 Jud
6 1 658802 Jud
...
1 100 611648 Nafr
2 100 322513 Nafr
3 100 381089 Jud
4 100 16941 Jud
5 100 21454 Jud
6 100 658802 Jud
Then I could run
new_df %>% group_by(bs, Pop1) %>% summarize(Mean = mean(x = values))
to get my summary, and also use the table for making plots.
Thanks!
You can use purrr::map_dfr to create a data.frame of the selected samples, bound together by rows (the .id column, obs here, numbers the samples); then you can use the command you provided to get the summary:
purrr::map_dfr(integer(100), ~ df %>% sample_n(size=50), .id="obs") -> new_df
new_df
#> # A tibble: 5,000 x 3
#> obs values Pop1
#> <chr> <int> <fct>
#> 1 1 381089 Jud
#> 2 1 658802 Jud
#> 3 1 381089 Jud
#> 4 1 611648 Nafr
#> 5 1 381089 Jud
#> 6 1 21454 Jud
#> 7 1 611648 Nafr
#> 8 1 381089 Jud
#> 9 1 21454 Jud
#> 10 1 322513 Nafr
#> # … with 4,990 more rows
new_df %>% group_by(obs, Pop1) %>% summarize(Mean = mean(x = values))
#> `summarise()` regrouping output by 'obs' (override with `.groups` argument)
#> # A tibble: 200 x 3
#> # Groups: obs [100]
#>    obs   Pop1     Mean
#>    <chr> <fct>   <dbl>
#>  1 1     Jud   261302.
#>  2 1     Nafr  451017.
#>  3 10    Jud   303711.
#>  4 10    Nafr  474689.
#>  5 100   Jud   236533.
#>  6 100   Nafr  492592.
#>  7 11    Jud   279812.
#>  8 11    Nafr  425776.
#>  9 12    Jud   279725.
#> 10 12    Nafr  455960.
#> # … with 190 more rows
read.table(text= "values Pop1
611648 Nafr
322513 Nafr
381089 Jud
16941 Jud
21454 Jud
658802 Jud", header=T)->df
tibble(df[rep(1:6, times=5, each=10),])->df
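Note that the map_dfr() call above draws 50 rows from the whole data frame, not 50 per population. If the goal is 50 rows per Pop1 in every resample, a group_by() inside the mapped function is one way to get there. The following is only a sketch under assumptions: the toy data frame is made up to mirror the 300-row example, the seed is arbitrary, and slice_sample() plus map() with bind_rows() are used in place of sample_n() and map_dfr(), which newer dplyr and purrr releases mark as superseded.

```r
library(dplyr)
library(purrr)

# Toy data (assumption): 300 rows, 100 Nafr and 200 Jud
df <- data.frame(
  values = rep(c(611648, 322513, 381089, 16941, 21454, 658802), each = 50),
  Pop1   = rep(c("Nafr", "Nafr", "Jud", "Jud", "Jud", "Jud"), each = 50)
)

set.seed(1)  # arbitrary seed, only for reproducibility

# Draw 50 rows per population, 100 times, and stack the draws.
# bind_rows(.id = "bs") numbers the list elements 1..100 (as character).
new_df <- map(1:100, ~ df %>%
                group_by(Pop1) %>%
                slice_sample(n = 50) %>%
                ungroup()) %>%
  bind_rows(.id = "bs") %>%
  mutate(bs = as.integer(bs))

# Every (bs, Pop1) cell should now hold exactly 50 rows
per_cell <- count(new_df, bs, Pop1)
```

Converting bs to integer also keeps the resamples in numeric order (1, 2, 3, ...) rather than the character order (1, 10, 100, 11, ...) seen in the summary above.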
One way you could do this is working with nested tibbles and map from the purrr package:
library(tidyverse)
df %>% nest(df = everything()) %>%
  slice(rep(1, 100)) %>%
  mutate(bs = 1:100) %>%
  mutate(df_sum = map(df, ~ .x %>%
                        group_by(Pop1) %>%
                        sample_n(size = 50) %>%
                        summarize(Mean = mean(x = values)))) %>%
  unnest(df_sum)
Or, if you just want a way to stack your data 100 times, you can use slice():
df %>% slice(rep(1:n(), 100))
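The stacked result has no column saying which copy a row belongs to. A minimal sketch of adding the bs column from the question is shown below; the names n_rows, n_rep and stacked are introduced here for illustration only, and the six-row data frame is the example from the question. The copies are exact duplicates, so any random sampling still has to happen afterwards, grouped by bs and Pop1.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  values = c(611648, 322513, 381089, 16941, 21454, 658802),
  Pop1   = c("Nafr", "Nafr", "Jud", "Jud", "Jud", "Jud")
)

n_rows <- nrow(df)  # rows in one copy
n_rep  <- 100       # number of copies

# slice() repeats rows 1..n in blocks, so block k gets bs = k
stacked <- df %>%
  slice(rep(1:n(), n_rep)) %>%
  mutate(bs = rep(1:n_rep, each = n_rows)) %>%
  select(bs, values, Pop1)
```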
Try this:
library(tidyr)
df %>% expand(bs = 1:100, nesting(values, Pop1))
Output:
# A tibble: 600 x 3
bs values Pop1
<int> <dbl> <chr>
1 1 16941 Jud
2 1 21454 Jud
3 1 322513 Nafr
4 1 381089 Jud
5 1 611648 Nafr
6 1 658802 Jud
7 2 16941 Jud
8 2 21454 Jud
9 2 322513 Nafr
10 2 381089 Jud
# ... with 590 more rows
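You can then continue your pipeline like this. Note that nesting() keeps only the distinct values/Pop1 combinations, so each bs/Pop1 group must still contain at least 50 rows; otherwise sample_n(size = 50) fails unless you add replace = TRUE: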
df %>%
expand(bs = 1:100, nesting(values, Pop1)) %>%
group_by(bs, Pop1) %>%
sample_n(size = 50) %>%
summarize(Mean = mean(x = values))
Here is a version using a for loop to do the sampling 100 times.
library(dplyr)
# Empty data frame that collects the samples
df2 <- data.frame(values = numeric(), Pop1 = character(), bs = integer())
for (i in 1:100) {
  df2 <- df2 %>%
    bind_rows(df %>%
                group_by(Pop1) %>%
                sample_n(size = 50, replace = TRUE) %>%  # 50 rows per population, with replacement
                mutate(bs = i) %>%                       # label the i-th resample
                ungroup())
}
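Growing df2 inside the loop copies the accumulated rows on every iteration. A possible variant, sketched here under assumptions (made-up toy data mirroring the 300-row example, an arbitrary seed, and the hypothetical names n_boot, samples and boot_means), stores each resample in a pre-allocated list and binds once at the end, then computes the per-resample means the question asks for.

```r
library(dplyr)

# Toy data (assumption): 300 rows, 100 Nafr and 200 Jud
df <- data.frame(
  values = rep(c(611648, 322513, 381089, 16941, 21454, 658802), each = 50),
  Pop1   = rep(c("Nafr", "Nafr", "Jud", "Jud", "Jud", "Jud"), each = 50)
)

n_boot  <- 100                     # number of resamples
samples <- vector("list", n_boot)  # pre-allocated container

set.seed(42)  # arbitrary seed, only for reproducibility
for (i in seq_len(n_boot)) {
  samples[[i]] <- df %>%
    group_by(Pop1) %>%
    sample_n(size = 50, replace = TRUE) %>%  # sampling with replacement
    ungroup() %>%
    mutate(bs = i)
}

df2 <- bind_rows(samples)  # single bind at the end

# One mean per resample and population
boot_means <- df2 %>%
  group_by(bs, Pop1) %>%
  summarize(Mean = mean(values), .groups = "drop")
```

The long table df2 can be used for plots, while boot_means holds the 200 summary rows (100 resamples times 2 populations).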