Add a new column to a data.frame whose values are a random sample of one column, conditioned on another column

Question

I want to add a new column (category) whose values (a/b) are assigned to a random sample (without replacement) of the id column, conditioned on the value (1/2) in the group column. When I try to do so, however, the values in the id column change, and I don't understand why this is happening.

set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=T))
df$category <- NA

> table(df$group)
1 2 
6 4

df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=F),]$category <- "a" 
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=F),]$category <- "b" 

> df
    id group category
 1   A     1        a
 2   B     1     <NA>
 3   B     1        a
 4   D     2        b
 5   E     1     <NA>
 6   F     2     <NA>
 7   G     2     <NA>
 8   H     2        b
 9   C     1        a
 10  E     1        a

> df$id==LETTERS[1:10]
 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
# this should be all TRUE

(Please feel free to edit the title and question if they are not expressed clearly enough.)

Answer 1

The issue is in how the row index i is built from 'id'. It would have worked if the row.names of the dataset were 'id'. Here, we can instead match against 'id' to get the row positions:

i1 <- with(df, match(sample(id[group == 1], size = 4, replace = FALSE), id))
df$category[i1] <- 'a'

and similarly for the second case:

i2 <- with(df, match(sample(id[group == 2], size = 2, replace = FALSE), id))
df$category[i2] <- 'b'

Output:

df
#   id group category
#1   A     1        a
#2   B     1     <NA>
#3   C     1        a
#4   D     2        b
#5   E     1        a
#6   F     2     <NA>
#7   G     2        b
#8   H     2     <NA>
#9   I     1     <NA>
#10  J     1        a

df$id==LETTERS[1:10]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
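
The two match() calls above can be folded into one small function. The following is only a sketch, not part of the original answer: label_group is a hypothetical helper name, and it assumes each group has at least n rows. It draws the row positions once, stores them, and only then assigns, so the id column is never touched.

```r
set.seed(123)
df <- data.frame(id = LETTERS[1:10],
                 group = sample(c("1", "2"), size = 10, replace = TRUE))
df$category <- NA_character_

# Hypothetical helper: label n randomly chosen rows of one group.
# Assumes the group contains at least n rows.
label_group <- function(data, grp, n, label) {
  rows <- which(data$group == grp)               # row positions in this group
  picked <- rows[sample.int(length(rows), n)]    # drawn once, without replacement
  data$category[picked] <- label                 # only the category column changes
  data
}

df <- label_group(df, "1", 4, "a")
df <- label_group(df, "2", 2, "b")

# Cross-tabulate group against category to check the counts per group
table(df$group, df$category, useNA = "ifany")
```

Using sample.int() on the number of rows, rather than sample() on the ids themselves, also sidesteps the special case where sample() is given a single number.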

Answer 2

This is weird, but it worked when I replaced the $ operator by putting the column name "category" inside the subsetting brackets, like this:

set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=T))
df$category <- NA

df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=F), "category"] <- "a" 
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=F), "category"] <- "b"

This results in:

   id group category
1   A     1        a
2   B     1     <NA>
3   C     1        a
4   D     2     <NA>
5   E     1     <NA>
6   F     2     <NA>
7   G     2        b
8   H     2        b
9   I     1        a
10  J     1        a


df$id==LETTERS[1:10]

# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
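
A possible explanation for why this form behaves differently, offered as an assumption rather than something either answer states: in a nested replacement such as df[i, ]$category <- "a", R appears to evaluate the index expression i twice, once to extract the rows and once to write them back. When i contains sample(), the two evaluations can select different rows, so whole rows (including id) are written to the wrong positions. The sketch below uses a hypothetical counter function, pick_rows, in place of sample() to count the evaluations.

```r
# Hypothetical counter: records how often the row index is computed.
calls <- 0
pick_rows <- function() {
  calls <<- calls + 1
  c(TRUE, FALSE, TRUE)   # fixed here; sample() could differ on each call
}

d1 <- data.frame(id = c("A", "B", "C"), category = NA)
d1[pick_rows(), ]$category <- "a"      # nested form used in the question
calls_nested <- calls

calls <- 0
d2 <- data.frame(id = c("A", "B", "C"), category = NA)
d2[pick_rows(), "category"] <- "a"     # single replacement call, as in this answer
calls_direct <- calls

# Number of times the index expression ran in each form
c(nested = calls_nested, direct = calls_direct)
```

If the nested count comes out higher than the direct one, that is consistent with the changing id values in the question; storing the sampled index in a variable before subsetting avoids the problem in either form.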
