Random sampling in R with set of groups that are within a group
I am working with R.
I have a data set that looks like this:
df <- structure(
list(
Condition = c(
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1",
"1"
),
category = c(
"work",
"work",
"work",
"work",
"work",
"people",
"people",
"people",
"people",
"people",
"class",
"class",
"class",
"class",
"class",
"beach",
"beach",
"beach",
"beach",
"beach",
"park",
"park",
"park",
"park",
"park",
"house",
"house",
"house",
"house",
"house",
"street",
"street",
"street",
"street",
"street",
"internet",
"internet",
"internet",
"internet",
"internet"
),
Value = c(
7.36,
7.92,
7.66,
6.92,
4.76,
2.82,
3.18,
2.1,
8.28,
7.26,
5.16,
5.72,
7.12,
7.14,
5.06,
5.14,
3.34,
4.74,
NA,
NA,
3.42,
3.87,
5.3,
4.26,
4.46,
5.1,
3.76,
10.4,
3.38,
4.86,
4.14,
4.24,
4.68,
5.18,
4.46,
8.38,
3.92,
4.14,
4.78,
2.94
)
),
row.names = c(NA, -40L),
class = c("tbl_df", "tbl",
"data.frame")
)
As you can see, each word in the category column repeats 5 times. Each "chunk" of five rows acts as a group within Condition 1. I need a random sample of 4 such chunks, which gives a total of 20 observations in the Value column.
I expect something like this:
Condition category Value
1 people #
1 people #
1 people #
1 people ...
1 people
1 street
1 street
1 street
1 street
1 street
1 park
1 park
1 park
1 park
1 park
1 class
1 class
1 class
1 class
1 class
Any help would be great. Thanks!
tidyverse
set.seed(1)
library(tidyverse)
df %>%
  group_nest(Condition, category) %>%  # one row per (Condition, category) chunk
  sample_n(size = 4) %>%               # draw 4 chunks at random
  unnest(data)                         # expand back to the original rows
#> # A tibble: 20 x 3
#> Condition category Value
#> <chr> <chr> <dbl>
#> 1 1 beach 5.14
#> 2 1 beach 3.34
#> 3 1 beach 4.74
#> 4 1 beach NA
#> 5 1 beach NA
#> 6 1 internet 8.38
#> 7 1 internet 3.92
#> 8 1 internet 4.14
#> 9 1 internet 4.78
#> 10 1 internet 2.94
#> 11 1 work 7.36
#> 12 1 work 7.92
#> 13 1 work 7.66
#> 14 1 work 6.92
#> 15 1 work 4.76
#> 16 1 class 5.16
#> 17 1 class 5.72
#> 18 1 class 7.12
#> 19 1 class 7.14
#> 20 1 class 5.06
Created on 2021-06-08 by the reprex package (v2.0.0)
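When the data hold more than one Condition, the same idea can be applied within each Condition. The following is a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later (for `slice_sample()`), and the `toy` data frame and the names `cats`, `picked` and `result` are hypothetical, made up only for illustration.

```r
library(dplyr)

set.seed(1)

# Hypothetical toy data: 2 conditions x 8 categories x 5 rows each
cats <- c("work", "people", "class", "beach",
          "park", "house", "street", "internet")
toy <- data.frame(
  Condition = rep(c("1", "2"), each = 40),
  category  = rep(rep(cats, each = 5), times = 2),
  Value     = round(runif(80, min = 2, max = 10), 2)
)

# Draw 4 distinct categories within each Condition
picked <- toy %>%
  distinct(Condition, category) %>%
  group_by(Condition) %>%
  slice_sample(n = 4) %>%
  ungroup()

# Keep every row that belongs to a selected (Condition, category) chunk
result <- toy %>%
  semi_join(picked, by = c("Condition", "category"))
```

Sampling the category labels first and then filtering with `semi_join()` avoids nesting altogether, so each selected chunk keeps all five of its rows in their original order.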
data.table
set.seed(1)
library(data.table)
library(magrittr)
setDT(df)[, lapply(.SD, list), by = list(Condition, category)] %>%  # nest: one row per chunk
  .[category %in% sample(category, 4)] %>%                          # keep 4 random chunks
  .[, lapply(.SD, unlist), by = list(Condition, category)] %>%      # unnest per chunk so labels stay aligned with their values
  .[order(Condition, category)]
#> Condition category Value
#> 1: 1 beach 5.14
#> 2: 1 beach 3.34
#> 3: 1 beach 4.74
#> 4: 1 beach NA
#> 5: 1 beach NA
#> 6: 1 internet 8.38
#> 7: 1 internet 3.92
#> 8: 1 internet 4.14
#> 9: 1 internet 4.78
#> 10: 1 internet 2.94
#> 11: 1 people 2.82
#> 12: 1 people 3.18
#> 13: 1 people 2.10
#> 14: 1 people 8.28
#> 15: 1 people 7.26
#> 16: 1 work 7.36
#> 17: 1 work 7.92
#> 18: 1 work 7.66
#> 19: 1 work 6.92
#> 20: 1 work 4.76
Created on 2021-06-08 by the reprex package (v2.0.0)
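A shorter data.table variant skips the nest/unnest round trip by sampling the category labels directly. This is a sketch under assumptions rather than the answerer's method: the `dt` table, its values and the names `cats` and `res` are hypothetical, and the sampling is done separately within each Condition.

```r
library(data.table)

set.seed(1)

# Hypothetical toy data: 2 conditions x 8 categories x 5 rows each
cats <- c("work", "people", "class", "beach",
          "park", "house", "street", "internet")
dt <- data.table(
  Condition = rep(c("1", "2"), each = 40),
  category  = rep(rep(cats, each = 5), times = 2),
  Value     = round(runif(80, min = 2, max = 10), 2)
)

# Within each Condition, pick 4 category labels at random and keep
# every row whose category is among them
res <- dt[, .SD[category %in% sample(unique(category), 4)], by = Condition]
```

Because whole rows are filtered rather than rebuilt from list columns, the Condition and category labels cannot drift out of step with the values.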
If I understand you correctly, you want
your_data |>
split(~ category) |>
sample(4) |>
dplyr::bind_rows()
returning
# A tibble: 20 x 3
Condition category Value
<chr> <chr> <dbl>
1 1 house 5.1
2 1 house 3.76
3 1 house 10.4
4 1 house 3.38
5 1 house 4.86
6 1 class 5.16
7 1 class 5.72
8 1 class 7.12
9 1 class 7.14
10 1 class 5.06
11 1 internet 8.38
12 1 internet 3.92
13 1 internet 4.14
14 1 internet 4.78
15 1 internet 2.94
16 1 work 7.36
17 1 work 7.92
18 1 work 7.66
19 1 work 6.92
20 1 work 4.76
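The split-and-sample idea also works without the native pipe, the formula interface of `split()` or dplyr, which may help on R versions before 4.1. The sketch below uses base R only; the `toy` data frame and the names `cats`, `chunks`, `picked` and `res` are hypothetical and exist only for illustration.

```r
set.seed(1)

# Hypothetical toy data: 1 condition x 8 categories x 5 rows each
cats <- c("work", "people", "class", "beach",
          "park", "house", "street", "internet")
toy <- data.frame(
  Condition = "1",
  category  = rep(cats, each = 5),
  Value     = round(runif(40, min = 2, max = 10), 2)
)

# Split into a named list with one data frame per category
chunks <- split(toy, toy$category)

# Sample 4 list elements (chunks) without replacement
picked <- sample(chunks, 4)

# Stack the selected chunks back into a single data frame
res <- do.call(rbind, picked)
rownames(res) <- NULL
```

Here `sample()` operates on the list itself, so each draw is a whole five-row chunk rather than an individual row.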