Function slice_sample gives error cannot take a sample larger than the population

I have a data frame of cell barcodes (V1) and cell types (V2). I want to randomly sample 1000 cells of each cell type; if a cell type has fewer than 1000 cells, all of them should be selected.

However, slice_sample gives an error when it encounters a cell type with fewer than 1000 rows, even though the documentation states: "If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size." I don't want to sample with replacement. Am I misunderstanding the docs?

sample_cells <- as.data.frame(all_cells) %>% group_by(V2) %>% slice_sample(n=1000)

Error in slice_sample():
! Problem while computing indices.
ℹ The error occurred in group 10: V2 = "PEC".
Caused by error in sample.int():
! cannot take a sample larger than the population when 'replace = FALSE'
Run rlang::last_error() to see where the error occurred.

I expected to get 1000 rows for cell types with more than 1000 rows, and all of the rows for cell types with fewer than 1000.

dplyr version 1.0.10

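To take a sample larger than the group, you need to set the parameter replace = TRUE. Note that this samples with replacement, so any group with fewer than 1000 rows will contain duplicated rows:

slice_sample(n = 1000, replace = TRUE)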
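
Since the question rules out sampling with replacement, one alternative is to cap the requested size at each group's row count explicitly, so that sample.int() is never asked for more rows than the group holds. The following is a minimal sketch of that idea, not a diagnosis of why slice_sample failed in dplyr 1.0.10. The small data frame is a hypothetical stand-in for all_cells (same V1/V2 columns, made-up group sizes), and cap is an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for `all_cells`: V1 = barcode, V2 = cell type.
# Group sizes are made up: two large-ish groups and one small one.
set.seed(1)
all_cells <- data.frame(
  V1 = sprintf("BC%05d", 1:3500),
  V2 = rep(c("A", "B", "PEC"), times = c(2500, 900, 100))
)

cap <- 1000L  # illustrative name: maximum rows to keep per cell type

sample_cells <- all_cells %>%
  group_by(V2) %>%
  # n() is the size of the current group; min() caps the request at that
  # size, so sampling is always without replacement and never oversized.
  slice(sample.int(n(), min(cap, n()))) %>%
  ungroup()

# Rows kept per cell type
count(sample_cells, V2)
```

With these made-up sizes, group A is cut down to 1000 rows, while B and PEC keep all of their 900 and 100 rows, with no barcode appearing twice.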
