Repeated sampling from data.table subsets selects rows with replacement, even though replacement is turned off

I have a data.table in which one column is initially empty (all NA values). I would like to select all rows that still have NA in that column, draw a random sample of two of them, and replace their NA values with an index variable coming from a loop. This step should be repeated 3 times.

My code does not seem to produce the correct subsets, because non-NA values that were already assigned get overwritten.

Desired possible output:

1   3
2   2
3   NA
4   2
5   NA
6   3
7   1
8   NA
9   1

Actual possible output (2 x 3 = 6 values should be non-NA, but only 4 are):

1   3
2   2
3   NA
4   NA
5   NA
6   3
7   NA
8   NA
9   1

MWE:

  library(data.table)

  # Column c starts out entirely NA.
  d <- data.table(a = c(1,2,3,4,5,6,7,8,9), c = NA_real_)
  col_name <- "c"
  for(chunk in seq(1,3)) {
    d[d[is.na(get(col_name)), .I[sample(.N, 2, replace = FALSE)]], toString(col_name) := chunk]
  }

Why is this not working?

Here is one possible solution:

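library(data.table)
set.seed(123)

# Start from a fresh table whose column c is entirely NA.
d <- data.table(a = c(1,2,3,4,5,6,7,8,9), c = NA_real_)
col_name <- "c"

for(chunk in seq(1,3)) {
  # which() returns row numbers in the full table, so only rows that are still NA are drawn.
  d[sample(which(is.na(get(col_name))), 2), (col_name) := chunk]
}
d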

#   a  c
#1: 1 NA
#2: 2  2
#3: 3  1
#4: 4  2
#5: 5  3
#6: 6  1
#7: 7 NA
#8: 8 NA
#9: 9  3
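
As for why the original attempt overwrites values, a likely explanation (an assumption about how data.table evaluates `.I` when `i` is a filter and there is no `by`, not something stated in the question) is that `.I` is then numbered relative to the filtered subset rather than the full table. The inner call would therefore return positions among the NA rows only, and the outer `d[...]` would apply them to all of `d`. The sketch below uses a made-up toy table `dt` with a column `grp` (both names are illustrative) so the indices can be compared against `which()`, which always returns positions in the full table.

```r
library(data.table)

# Toy table: rows 1 and 2 are already assigned, rows 3 to 5 are still NA.
dt <- data.table(a = 1:5, grp = c(1, 1, NA, NA, NA))

# Positions of the NA rows in the full table.
na_rows <- which(is.na(dt$grp))

# .I evaluated after filtering in i (the pattern used in the MWE).
i_after_filter <- dt[is.na(grp), .I]

# .I evaluated on the full table and filtered inside j.
i_full_table <- dt[, .I[is.na(grp)]]

# Print all three side by side; if i_after_filter differs from na_rows,
# using it to index dt would hit rows that are already assigned.
print(na_rows)
print(i_after_filter)
print(i_full_table)
```

If the first and second printed vectors differ on your data.table version, that mismatch is what lets later iterations land on rows that were already filled in.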

This can also be done without a loop, provided the column is still entirely NA:

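n <- 3
d[, (col_name) := NA_real_]  # reset the column so that no earlier assignment is overwritten
# Draw n * 2 distinct rows at once and label them 1, 2, ..., n, 1, 2, ..., n.
d[sample(.N, n * 2), (col_name) := rep(seq_len(n), 2)]
d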
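
Either variant can be checked by confirming that every chunk label occurs exactly twice and that the remaining rows are still NA. The sketch below wraps the loop in a hypothetical helper, `assign_chunks()` (the name and its parameters are illustrative, not from the original post), and picks positions with `sample.int()` on the length of the candidate vector instead of calling `sample()` on the vector itself. That avoids a documented quirk of base R's `sample()`: when its first argument is a single number `x`, it samples from `1:x`. The quirk cannot be triggered with 9 rows and 3 chunks of 2, but it could with other sizes.

```r
library(data.table)

# Hypothetical helper: label `per_chunk` random, still-unassigned rows
# with each chunk number in turn. Modifies `dt` by reference.
assign_chunks <- function(dt, col_name, n_chunks, per_chunk) {
  for (chunk in seq_len(n_chunks)) {
    na_rows <- which(is.na(dt[[col_name]]))   # positions in the full table
    stopifnot(length(na_rows) >= per_chunk)   # enough unassigned rows left
    picked <- na_rows[sample.int(length(na_rows), per_chunk)]
    set(dt, i = picked, j = col_name, value = chunk)
  }
  invisible(dt)
}

d2 <- data.table(a = 1:9, c = NA_real_)
assign_chunks(d2, "c", n_chunks = 3, per_chunk = 2)

# Each label should occur exactly twice, leaving three rows NA.
counts <- table(d2$c, useNA = "ifany")
print(counts)
```

Because `set()` updates the table in place, `d2` carries the labels after the call without any reassignment.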
