Using the very simple example data below, my goal is to sample all 3 rows of A and only 5 of the 7 rows of B.
id group
1 A
2 A
3 A
4 B
5 B
6 B
7 B
8 B
9 B
10 B
ex_df <- data.frame(id = 1:10, group = c(rep("A", 3), rep("B", 7)))
Now, normally it'd just be a case of using sample_n from dplyr, with code along the lines of
sel_5 <- ex_df %>%
  group_by(group) %>%
  sample_n(5)
Except this gives the error (for obvious reasons)
Error: size must be less or equal than 2 (size of data), set replace = TRUE to use sampling with replacement
but sampling with replacement isn't an option. Is there any way to set the sample_n size to the minimum of 5 and the size of the group?
Or maybe there's another function I'm unaware of that can do this?
I've had the same problem, and here's what I did.
library(dplyr)
# Split the original data frame into a list of data frames, one per group
split_up <- split(ex_df, f = ex_df$group)
# From each data frame, sample 5 rows, or all of its rows if there are fewer than 5
sel_5 <- lapply(split_up, function(x) sample_n(x, min(nrow(x), 5)))
# Bind the pieces back into a single data frame
sel_5 <- do.call("rbind", sel_5)
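For comparison, here is a minimal sketch of the same "minimum of 5 or the group size" idea without splitting the data frame first. It is not part of the answer above. It assumes dplyr is installed and relies on slice() and n() being evaluated per group after group_by(). The name max_n is a hypothetical cap introduced only for illustration.

```r
library(dplyr)

# Example data from the question: 3 rows in group A, 7 rows in group B
ex_df <- data.frame(id = 1:10, group = c(rep("A", 3), rep("B", 7)))

# Hypothetical cap on the per-group sample size (an assumption for this sketch)
max_n <- 5L

sel_5 <- ex_df %>%
  group_by(group) %>%
  # n() is the current group's row count; draw min(max_n, n()) distinct
  # row positions without replacement and keep those rows
  slice(sample.int(n(), min(max_n, n()))) %>%
  ungroup()

# Count the rows kept per group
table(sel_5$group)
```

With the example data this keeps all 3 rows of A and 5 of the 7 rows of B. Which rows of B are chosen varies between runs unless set.seed() is called first. As far as I know, slice_sample(n = 5) in dplyr 1.0.0 and later also truncates to the group size when sampling without replacement, but that is worth checking against the documentation for the installed version.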