I have a data frame of cell barcodes (V1) and cell types (V2). I want to randomly sample 1000 cells of each cell type, unless a type has fewer than 1000 cells in total, in which case all of them should be selected.
However, slice_sample() gives an error when it encounters a cell type with fewer than 1000 rows, even though the documentation states: "If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size." I don't want to sample with replacement. Am I misunderstanding the docs?
sample_cells <- as.data.frame(all_cells) %>% group_by(V2) %>% slice_sample(n=1000)
Error in slice_sample():
! Problem while computing indices.
ℹ The error occurred in group 10: V2 = "PEC".
Caused by error in sample.int():
! cannot take a sample larger than the population when 'replace = FALSE'
Run rlang::last_error() to see where the error occurred.
I expected to get 1000 rows for cell types where n>1000, and all the rows for cell types where n<1000.
dplyr version 1.0.10
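To take a sample larger than the group, you need to set the parameter replace = TRUE, so:
slice_sample(n = 1000, replace = TRUE)
Note that this samples with replacement, so a cell type with fewer than 1000 rows is filled up to 1000 rows with duplicates rather than truncated to its group size.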
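If sampling with replacement is not acceptable, as in the question, one possible workaround is to cap the sample size at the group size yourself. The sketch below is a minimal example and not necessarily the intended fix: the toy all_cells data frame is made up for illustration, and it uses slice() with sample.int() and n() in place of slice_sample().

```r
library(dplyr)

set.seed(1)

# Hypothetical toy data: 2000 cells of type "A", 500 cells of type "PEC"
all_cells <- data.frame(
  V1 = paste0("cell", 1:2500),
  V2 = rep(c("A", "PEC"), times = c(2000, 500))
)

sample_cells <- all_cells %>%
  group_by(V2) %>%
  # Draw row positions without replacement; never ask for more rows
  # than the current group contains
  slice(sample.int(n(), min(1000L, n()))) %>%
  ungroup()

# Number of sampled rows per cell type
table(sample_cells$V2)
```

Here n() is the size of the current group, so min(1000L, n()) yields 1000 for large groups and the full group size for small ones. I believe newer dplyr releases restore the truncation described in the documentation, so upgrading may also resolve the error, but I have not verified that here.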