Randomly select rows in R using sample_n

df <- data.frame(
  id = c(1:12), 
  day = c(1, 1, 1,1, 2, 2,2, 2, 3,3,3,3), 
  endpoint = c(1, 1, 1,1, 2,2,2,2,1,1,1,1))  
df
#>    id day endpoint
#> 1   1   1        1
#> 2   2   1        1
#> 3   3   1        1
#> 4   4   1        1
#> 5   5   2        2
#> 6   6   2        2
#> 7   7   2        2
#> 8   8   2        2
#> 9   9   3        1
#> 10 10   3        1
#> 11 11   3        1
#> 12 12   3        1

In the data above, some patients (id) reach the endpoint each day. For each day, I am trying to randomly select as many patients as that day's endpoint value and mark them with s = 1. On each day, the ids from that day and from previous days are eligible, as long as they were not previously selected. The following code gives what I expect, but I have to enter the day and endpoint values manually. Any suggestions on how to pick those values directly from the data would be appreciated.

library(dplyr)
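df$s <- 0  # becomes 1 once a patient has been selected
# Day 1: draw 1 id among patients with day <= 1 that are not yet selected
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 1 & df$s == 0, ], 1)$id, 1, df$s)
# Day 2: draw 2 ids among patients with day <= 2 that are not yet selected
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 2 & df$s == 0, ], 2)$id, 1, df$s)
# Day 3: draw 1 id among patients with day <= 3 that are not yet selected
df$s <- ifelse(df$id %in% sample_n(df[df$day <= 3 & df$s == 0, ], 1)$id, 1, df$s)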
df
#>    id day endpoint s pick_day 
#> 1   1   1        1 0 0
#> 2   2   1        1 1 2
#> 3   3   1        1 1 1
#> 4   4   1        1 1 3
#> 5   5   2        2 1 2
#> 6   6   2        2 0 0
#> 7   7   2        2 0 0
#> 8   8   2        2 0 0
#> 9   9   3        1 0 0
#> 10 10   3        1 0 0
#> 11 11   3        1 0 0
#> 12 12   3        1 0 0

EDIT

Is it possible to add a variable showing the day for which a row was picked, like the pick_day variable above? Thanks.

A way in base R using a for loop:

df$s = 0 
set.seed(123)

for (i in unique(df$day)) {
   temp <- subset(df, day <= i & s == 0)
   ids <- with(temp, sample(id, endpoint[day == i][1]))
   df$s[df$id %in% ids] <- 1
}

df

#   id day endpoint s
#1   1   1        1 0
#2   2   1        1 0
#3   3   1        1 1
#4   4   1        1 1
#5   5   2        2 1
#6   6   2        2 0
#7   7   2        2 0
#8   8   2        2 1
#9   9   3        1 0
#10 10   3        1 0
#11 11   3        1 0
#12 12   3        1 0
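
For the pick_day question in the EDIT, one possible extension of the same loop is sketched below. It is not part of the original answer: the column name pick_day comes from the question, while the helper variables eligible, n and picked are names chosen here for illustration. It assumes that endpoint is constant within each day, as in the sample data. Drawing positions with sample.int() rather than calling sample() on the ids also sidesteps the base R behaviour where sample(x, ...) on a single number x samples from 1:x.

```r
df <- data.frame(
  id = 1:12,
  day = rep(1:3, each = 4),
  endpoint = rep(c(1, 2, 1), each = 4)
)

df$s <- 0          # 1 once a row has been selected
df$pick_day <- 0   # day for which the row was selected, 0 if never
set.seed(123)

for (d in sort(unique(df$day))) {
  # Row positions still eligible: day on or before d and not yet selected
  eligible <- which(df$day <= d & df$s == 0)
  # Number of rows to draw for day d, read from the data
  n <- df$endpoint[df$day == d][1]
  # Sample positions within `eligible`, so one remaining candidate is handled correctly
  picked <- eligible[sample.int(length(eligible), n)]
  df$s[picked] <- 1
  df$pick_day[picked] <- d
}

df
```

Rows that are never selected keep pick_day = 0, which matches the layout of the expected output in the question. The loop stops with an error if a day asks for more rows than are still eligible.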


 