How to get 3 lists with no duplicates in a random sampling? (R)

I have done the first step:

  • how many persons have more than 1 point
  • how many persons have more than 3 points
  • how many persons have more than 6 points

My goal: I need random samples (with no person appearing more than once)

  • of 3 persons that have more than 1 point
  • of 3 persons that have more than 3 points
  • of 3 persons that have more than 6 points

My dataset looks like this:

id   person   points
201  rt99   NA
201  rt99   3
201  rt99   2
202  kt     4
202  kt     NA
202  kt     NA
203  rr     4
203  rr     NA
203  rr     NA
204  jk     2
204  jk     2
204  jk     NA
322  knm3   5
322  knm3   NA
322  knm3   3
343  kll2   2
343  kll2   1
343  kll2   5
344  kll    NA
344  kll    7
344  kll    1
345  nn     7
345  nn     NA
490  kk     1
490  kk     NA
490  kk     2
491  ww     1
491  ww     1
489  tt     1
489  tt     1
325  ll     1
325  ll     1
325  ll     NA

This is what I have tried so far. Here is the code for finding persons that have more than 1 point:

library(dplyr)

persons_filtered <- dataset %>%
  group_by(person) %>%
  # keep persons whose total points (ignoring NA) exceed 1
  filter(sum(points, na.rm = TRUE) > 1) %>%
  distinct(person) %>%
  pull(person)
persons_filtered
more_than_1 <- sample(persons_filtered, size = 3)

Question: How can I write this code better so that I end up with 3 lists of unique persons? (The same person must not appear in more than one list.)

Here's a tidyverse solution in which the three categories of interest are sampled at the same time. Each person's summed points fall into exactly one class, so no person can end up in more than one list.

library(tidyverse)
dataset %>%
  # Group by person
  group_by(person) %>%
  # Get points sum
  summarize(sum_points = sum(points, na.rm = TRUE)) %>%
  # Classify the summed points into the classes defined by breaks: (0,1], (1,3], (3,6], (6,Inf]
  # Inf is the last break, so every sum above 6 is classified as (6,Inf]
  mutate(point_class = cut(sum_points, breaks = c(0,1,3,6,Inf))) %>%
  # ungroup
  ungroup() %>%
  # group by point class
  group_by(point_class) %>%
  # Sample 3 rows per point_class
  sample_n(size = 3) %>%
  # Eliminate the sum_points column
  select(-sum_points) %>%
  # If you need this data in lists you can nest the results in the sampled_data column
  nest(sampled_data= -point_class)
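
The code above treats the three categories as disjoint bins. If the thresholds are instead meant cumulatively (a person with 8 points also counts as "more than 1"), one possible approach is to sample the strictest threshold first and remove the persons already drawn before sampling the next one. The following is only a sketch under that assumption: the names totals, thresholds, picked and samples are made up for illustration, the data frame is rebuilt from the table in the question (the id column is left out because it is not needed), and set.seed() is there only to make the draw repeatable.

```r
library(dplyr)

# Example data copied from the question (id column omitted)
dataset <- data.frame(
  person = rep(c("rt99", "kt", "rr", "jk", "knm3", "kll2",
                 "kll", "nn", "kk", "ww", "tt", "ll"),
               times = c(3, 3, 3, 3, 3, 3, 3, 2, 3, 2, 2, 3)),
  points = c(NA, 3, 2,   4, NA, NA,  4, NA, NA,  2, 2, NA,
             5, NA, 3,   2, 1, 5,    NA, 7, 1,   7, NA,
             1, NA, 2,   1, 1,       1, 1,       1, 1, NA)
)

# One row per person with the total points (NA ignored)
totals <- dataset %>%
  group_by(person) %>%
  summarize(sum_points = sum(points, na.rm = TRUE))

set.seed(1)                # assumption: fixed seed, only for reproducibility

thresholds <- c(6, 3, 1)   # strictest first, because its pool is the smallest
picked  <- character(0)    # persons already drawn for an earlier list
samples <- list()

for (th in thresholds) {
  # Eligible persons: above the threshold and not drawn before
  pool <- setdiff(totals$person[totals$sum_points > th], picked)
  stopifnot(length(pool) >= 3)   # fail loudly if too few persons remain
  chosen <- sample(pool, size = 3)
  samples[[paste0("more_than_", th)]] <- chosen
  picked <- c(picked, chosen)
}

samples
```

Sampling the highest threshold first matters here: drawing "more than 1" first could use up persons from the small "more than 6" pool and leave fewer than 3 candidates for it.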
