How to get 3 lists without duplicates from random sampling? (R)

Question

I have already done the first step and counted:

how many persons have more than 1 point
how many persons have more than 3 points
how many persons have more than 6 points

My goal: I need random samples (with no person appearing more than once)

of 3 persons who have more than 1 point
of 3 persons who have more than 3 points
of 3 persons who have more than 6 points

My dataset looks like this:

id   person   points
201  rt99   NA
201  rt99   3
201  rt99   2
202  kt     4
202  kt     NA
202  kt     NA
203  rr     4
203  rr     NA
203  rr     NA
204  jk     2
204  jk     2
204  jk     NA
322  knm3   5
322  knm3   NA
322  knm3   3
343  kll2   2
343  kll2   1
343  kll2   5
344  kll    NA
344  kll    7
344  kll    1
345  nn     7
345  nn     NA
490  kk     1
490  kk     NA
490  kk     2
491  ww     1
491  ww     1
489  tt     1
489  tt     1
325  ll     1
325  ll     1
325  ll     NA
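
To make the example reproducible, the table above can be read into a data frame. This is a minimal sketch; the name `dataset` is taken from the code further down, and `read.table` is only one of several ways to enter the data.

```r
# Read the table from the question into a data frame called `dataset`.
# "NA" entries are parsed as missing values by default.
dataset <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
id   person   points
201  rt99   NA
201  rt99   3
201  rt99   2
202  kt     4
202  kt     NA
202  kt     NA
203  rr     4
203  rr     NA
203  rr     NA
204  jk     2
204  jk     2
204  jk     NA
322  knm3   5
322  knm3   NA
322  knm3   3
343  kll2   2
343  kll2   1
343  kll2   5
344  kll    NA
344  kll    7
344  kll    1
345  nn     7
345  nn     NA
490  kk     1
490  kk     NA
490  kk     2
491  ww     1
491  ww     1
489  tt     1
489  tt     1
325  ll     1
325  ll     1
325  ll     NA
")
```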

This is what I have tried so far. Here is an example of the code for finding persons who have more than 1 point:

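library(dplyr)

# Persons whose total points (ignoring NA) exceed 1
persons_filtered <- dataset %>%
  group_by(person) %>%
  dplyr::filter(sum(points, na.rm = TRUE) > 1) %>%
  distinct(person) %>%
  pull()
persons_filtered
# Draw 3 of those persons at random, without replacement
more_than_1 <- sample(persons_filtered, size = 3)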

Question: How can I write this code better so that I end up with 3 lists of unique persons? (I need to prevent the same person from appearing in more than one list.)

Answer 1

Here's a tidyverse solution, where the sampling in the three categories of interest is done at the same time.

library(tidyverse)
dataset %>%
  # Group by person
  group_by(person) %>%
  # Get points sum
  summarize(sum_points = sum(points, na.rm = TRUE)) %>%
  # Classify the point sums into the intervals defined by breaks: (0,1], (1,3], (3,6], (6,Inf]
  # Inf is the last break so that every sum above 6 falls into (6,Inf]
  mutate(point_class = cut(sum_points, breaks = c(0,1,3,6,Inf))) %>%
  # ungroup
  ungroup() %>%
  # group by point class
  group_by(point_class) %>%
  # Sample 3 rows per point_class
  sample_n(size = 3) %>%
  # Eliminate the sum_points column
  select(-sum_points) %>%
  # If you need this data in lists you can nest the results in the sampled_data column
  nest(sampled_data= -point_class)
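
The pipeline above puts each person into exactly one interval, so "more than 1 point" effectively means "more than 1 and at most 3". If the thresholds are meant cumulatively, as the question words them, one possible alternative is to sample sequentially and remove already chosen persons from the later pools. The following is only a sketch under that assumption: `sample_unique` is a hypothetical helper, not part of any package, and `totals` holds the per-person point sums worked out by hand from the question's table.

```r
# Per-person point sums from the question's table (NA ignored)
totals <- data.frame(
  person = c("rt99", "kt", "rr", "jk", "knm3", "kll2",
             "kll", "nn", "kk", "ww", "tt", "ll"),
  sum_points = c(5, 4, 4, 4, 8, 8, 8, 7, 3, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw up to n persons per threshold, never reusing a person.
# The strictest threshold is handled first so the smallest pool gets first pick.
sample_unique <- function(totals, thresholds = c(6, 3, 1), n = 3) {
  used <- character(0)
  out <- list()
  for (th in thresholds) {
    # Eligible persons above the threshold who have not been drawn yet
    pool <- setdiff(totals$person[totals$sum_points > th], used)
    picked <- sample(pool, size = min(n, length(pool)))
    used <- c(used, picked)
    out[[paste0("more_than_", th)]] <- picked
  }
  out
}

set.seed(42)  # only for reproducibility of the example
samples <- sample_unique(totals)
samples
```

The result is a named list with one character vector per threshold. If a pool holds fewer than `n` remaining persons, the helper returns a shorter vector rather than failing.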

Answer 1 (accepted, 2020-08-27 18:40:09)