[英]How to get 3 lists with no duplicates in a random sampling? (R)
I have done the first step:我已经完成了第一步:
My goal: I need to have random samples (with no duplicates of persons)我的目标:我需要随机样本(没有重复的人)
My dataset looks like this:我的数据集如下所示:
id person points
201 rt99 NA
201 rt99 3
201 rt99 2
202 kt 4
202 kt NA
202 kt NA
203 rr 4
203 rr NA
203 rr NA
204 jk 2
204 jk 2
204 jk NA
322 knm3 5
322 knm3 NA
322 knm3 3
343 kll2 2
343 kll2 1
343 kll2 5
344 kll NA
344 kll 7
344 kll 1
345 nn 7
345 nn NA
490 kk 1
490 kk NA
490 kk 2
491 ww 1
491 ww 1
489 tt 1
489 tt 1
325 ll 1
325 ll 1
325 ll NA
That is what I have already tried to code, here is an example of code for finding persons that have more than 1 point:这就是我已经尝试过的代码,这是一个用于查找超过 1 分的人的代码示例:
persons_filtered <- dataset %>%
group_by(person) %>%
dplyr::filter(sum(points, na.rm = T)>1) %>%
distinct(person) %>%
pull()
person_filtered
more_than_1 <- sample(person_filtered, size = 3)
Question: How to write this code better that I could have in the end 3 lists with unique persons.问题:如何更好地编写此代码,以便我最终可以拥有 3 个包含唯一人员的列表。 (I need to prevent to have same persons in the lists) (我需要防止列表中出现相同的人)
Here's a tidyverse
solution, where the sampling in the three categories of interest is made at the same time.这是一个tidyverse
解决方案,其中三个感兴趣类别的采样是同时进行的。
library(tidyverse)
dataset %>%
# Group by person
group_by(person) %>%
# Get points sum
summarize(sum_points = sum(points, na.rm = T)) %>%
# Classify the sum points into categories defined by breaks, (0-1], (1-3] ...
# I used 100 as the last value so that all sum points between 6 and Inf get classified as (6-Inf]
mutate(point_class = cut(sum_points, breaks = c(0,1,3,6,Inf))) %>%
# ungroup
ungroup() %>%
# group by point class
group_by(point_class) %>%
# Sample 3 rows per point_class
sample_n(size = 3) %>%
# Eliminate the sum_points column
select(-sum_points) %>%
# If you need this data in lists you can nest the results in the sampled_data column
nest(sampled_data= -point_class)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.