I have a dataset that looks like the following:
group y x
1 2 0
1 3 0
1 1 0
2 3 1
2 4 1
2 3 1
In the actual dataset, there are 180 groups (though they're not numbered from 1-180). The value of x is either 0 or 1 and is the same within each group. The value of y differs for each individual observation.
I am trying to get a random sample with replacement from the group column. Then, I would like to find a way to combine this with the original data. For example, if I randomly sample the group 1, I would like the final dataset to include all 3 observations included in group 1. If I randomly sample group 1 twice, I would like the final dataset to include each observation from group 1 twice.
Here's a concrete example. Suppose I randomly sample groups 1, 1, and 2. The final dataset should then look like this:
group y x
1 2 0
1 3 0
1 1 0
1 2 0
1 3 0
1 1 0
2 3 1
2 4 1
2 3 1
When I sample as below, I get a vector of group values, and I am not sure what to do next to get the result I am looking for:
clusters <- sample(df$group, 180, replace = TRUE)
In Excel, I would use vlookup() to do something like this.
Base R:
set.seed(42)
do.call(rbind, sample(split(dat, dat$group), size = 3, replace = TRUE))
# group y x
# 2.4 2 3 1
# 2.5 2 4 1
# 2.6 2 3 1
# 2.41 2 3 1
# 2.51 2 4 1
# 2.61 2 3 1
# 1.1 1 2 0
# 1.2 1 3 0
# 1.3 1 1 0
(The row names are not pretty, but they are harmless and ignored by most tools.)
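If the row names bother you, you can clear them after binding. If you later need to tell repeated draws of the same group apart, one option is to tag each drawn group with its draw number before binding. Below is a minimal sketch. It redefines `dat` from the Data section so the block runs on its own, and the `draw` column name is my own choice, not something the question requires:

```r
dat <- data.frame(group = c(1L, 1L, 1L, 2L, 2L, 2L),
                  y     = c(2L, 3L, 1L, 3L, 4L, 3L),
                  x     = c(0L, 0L, 0L, 1L, 1L, 1L))

set.seed(42)
drawn <- sample(split(dat, dat$group), size = 3, replace = TRUE)

# Add a 'draw' column (hypothetical name) holding each group's position in the
# sample, so two copies of the same group stay distinguishable after binding
drawn <- Map(function(d, i) { d$draw <- i; d }, drawn, seq_along(drawn))

res <- do.call(rbind, drawn)
rownames(res) <- NULL   # replace "2.4", "2.41", ... with 1, 2, 3, ...
res
```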
Broken into steps, the same approach looks like this:
dat_spl <- split(dat, dat$group)
inds <- c(1, 1, 2)
### randomly this can be done with:
# inds <- sample(length(dat_spl), size = 3, replace = TRUE)
do.call(rbind, dat_spl[inds])
# group y x
# 1.1 1 2 0
# 1.2 1 3 0
# 1.3 1 1 0
# 1.11 1 2 0
# 1.21 1 3 0
# 1.31 1 1 0
# 2.4 2 3 1
# 2.5 2 4 1
# 2.6 2 3 1
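The `inds` above are positions in `dat_spl`, so they work even though the real group IDs are not numbered 1 to 180. If you start from actual group IDs instead, as with `clusters` in the question, index the split list by name. Also, `sample(df$group, ...)` draws from every row, so groups with more rows would be picked more often. Sampling from `unique()` avoids that. Here is a sketch using made-up group IDs 17 and 42, which are hypothetical and chosen only to show non-consecutive IDs:

```r
dat2 <- data.frame(group = c(17L, 17L, 17L, 42L, 42L, 42L),
                   y     = c(2L, 3L, 1L, 3L, 4L, 3L),
                   x     = c(0L, 0L, 0L, 1L, 1L, 1L))
dat2_spl <- split(dat2, dat2$group)   # list element names are "17" and "42"

set.seed(1)
# Sample from the unique IDs so every group is equally likely; sampling
# dat2$group directly would weight groups by their number of rows
clusters <- sample(unique(dat2$group), size = 3, replace = TRUE)

# Look groups up by name; dat2_spl[clusters] would treat 17 and 42 as
# positions and return NULL elements instead
res <- do.call(rbind, dat2_spl[as.character(clusters)])
rownames(res) <- NULL
res
```

One caveat: with only a single group, `sample(unique(...))` receives a length-one number and samples from `1:n` instead. `sample(length(dat_spl), ...)` as used above does not have that problem.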
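If you want or need a pure tidyverse solution, here is an alternative:
library(dplyr)
library(tidyr)  # nest() and unnest() come from tidyr, not dplyr
set.seed(42)
dat %>%
  nest(dat = -group) %>%           # one row per group; its y and x rows go into a list-column
  sample_n(3, replace = TRUE) %>%  # draw 3 groups with replacement
  unnest(dat)                      # expand each drawn group back into its rows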
# # A tibble: 9 x 3
# group y x
# <int> <int> <int>
# 1 2 3 1
# 2 2 4 1
# 3 2 3 1
# 4 2 3 1
# 5 2 4 1
# 6 2 3 1
# 7 1 2 0
# 8 1 3 0
# 9 1 1 0
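Since the question mentions `vlookup()`, the closest base-R analogue is probably `merge()`. You build a small table of sampled group IDs and join the data onto it. Unlike `vlookup()`, which returns only the first match, `merge()` returns every matching row, which is what is wanted here. This is a sketch that redefines `dat` so it runs alone. The `draw` column is my own addition, used to keep repeated picks of the same group apart:

```r
dat <- data.frame(group = c(1L, 1L, 1L, 2L, 2L, 2L),
                  y     = c(2L, 3L, 1L, 3L, 4L, 3L),
                  x     = c(0L, 0L, 0L, 1L, 1L, 1L))

set.seed(42)
clusters <- sample(unique(dat$group), size = 3, replace = TRUE)

# One row per draw; 'draw' (hypothetical name) distinguishes repeated groups
draws <- data.frame(draw = seq_along(clusters), group = clusters)

# Every row of dat matching a drawn group is returned once per draw
res <- merge(draws, dat, by = "group")
res <- res[order(res$draw), c("draw", "group", "y", "x")]
rownames(res) <- NULL
res
```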
Data:
dat <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L), y = c(2L, 3L,
1L, 3L, 4L, 3L), x = c(0L, 0L, 0L, 1L, 1L, 1L)), row.names = c(NA,
-6L), class = "data.frame")