Function slice_sample gives error cannot take a sample larger than the population

Question

I have a data frame of cell barcodes (V1) and cell types (V2). I want to randomly sample 1000 cells of each cell type; if a cell type has fewer than 1000 cells in total, all of them should be selected.

However, slice_sample gives an error when it encounters a cell type with fewer than 1000 rows, despite the documentation stating "If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size." I don't want to sample with replacement. Am I misunderstanding the docs?

sample_cells <- as.data.frame(all_cells) %>% group_by(V2) %>% slice_sample(n=1000)

Error in slice_sample():
! Problem while computing indices.
ℹ The error occurred in group 10: V2 = "PEC".
Caused by error in sample.int():
! cannot take a sample larger than the population when 'replace = FALSE'
Run rlang::last_error() to see where the error occurred.

I expected to get 1000 rows for cell types with more than 1000 rows, and all the rows for cell types with 1000 rows or fewer.

dplyr version 1.0.10

Answer 1

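To take a sample larger than the number of rows in a group, you need to set the parameter replace = TRUE, so:

slice_sample(n = 1000, replace = TRUE)

Note that with replace = TRUE, groups with fewer than 1000 rows still return 1000 rows, with some rows repeated.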
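If the goal is to sample without replacement, one possible workaround is to cap the sample size at the group size yourself, using slice() with base sample(). This is only a sketch: it does not explain why slice_sample() fails in dplyr 1.0.10, and the all_cells data frame below is a made-up stand-in with the V1 and V2 columns from the question.

```r
library(dplyr)

# Hypothetical stand-in for all_cells: barcodes in V1, cell types in V2.
# Type "A" has 2000 cells, type "PEC" has only 500.
set.seed(1)
all_cells <- data.frame(
  V1 = paste0("cell", 1:2500),
  V2 = c(rep("A", 2000), rep("PEC", 500))
)

# Within each group, draw min(1000, group size) row positions
# without replacement, then keep those rows.
sample_cells <- all_cells %>%
  group_by(V2) %>%
  slice(sample(n(), min(1000, n()))) %>%
  ungroup()

# Rows kept per cell type
count(sample_cells, V2)
```

With this toy data, type "A" is reduced to 1000 rows and all 500 "PEC" rows are kept, with no barcode appearing twice.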
