Adding a new column to data.frame whose values are random samples of one column and conditioned on another
I want to add a new column (category) whose values (a/b) are assigned to random samples (without replacement) of the id column, conditioned on the value (1/2 in the example below) in the group column. When I try to do so, however, the values in the id column change, and I don't understand why this is happening.
set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=T))
df$category <- NA
> table(df$group)
1 2
6 4
df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=F),]$category <- "a"
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=F),]$category <- "b"
> df
id group category
1 A 1 a
2 B 1 <NA>
3 B 1 a
4 D 2 b
5 E 1 <NA>
6 F 2 <NA>
7 G 2 <NA>
8 H 2 b
9 C 1 a
10 E 1 a
> df$id==LETTERS[1:10]
[1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
# this should be all TRUE
(Please feel free to edit the title and question if they are not expressed clearly enough.)
The issue is that the row index i is being used for 'id'. It would have worked if the row.names of the dataset were 'id'. Here, we may need to match with 'id':
i1 <- with(df, match(sample(id[group == 1], size = 4, replace = FALSE), id))
df$category[i1] <- 'a'
and similarly for the second case:
i2 <- with(df, match(sample(id[group == 2], size = 2, replace = FALSE), id))
df$category[i2] <- 'b'
Output:
df
# id group category
#1 A 1 a
#2 B 1 <NA>
#3 C 1 a
#4 D 2 b
#5 E 1 a
#6 F 2 <NA>
#7 G 2 b
#8 H 2 <NA>
#9 I 1 <NA>
#10 J 1 a
df$id==LETTERS[1:10]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
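The two assignments above follow the same pattern, so they can be wrapped in a small function. What follows is a minimal sketch, assuming the column names id, group, and category from the question. label_sample() is a hypothetical helper introduced here for illustration, not part of the original answer. It draws row positions once with sample.int() rather than matching on id, which gives the same kind of result.

```r
# Hypothetical helper (not from the original answer): label `n` randomly
# chosen rows of group `grp` with `label`, drawing the sample exactly once.
label_sample <- function(data, grp, label, n) {
  candidates <- which(data$group == grp)                    # row positions in the group
  chosen <- candidates[sample.int(length(candidates), n)]   # without replacement
  data$category[chosen] <- label                            # only `category` is modified
  data
}

# Same setup as in the question
set.seed(123)
df <- data.frame(id = LETTERS[1:10],
                 group = sample(c("1", "2"), size = 10, replace = TRUE))
df$category <- NA

df <- label_sample(df, grp = "1", label = "a", n = 4)
df <- label_sample(df, grp = "2", label = "b", n = 2)
```

Because only the category column is assigned to, the id column cannot be altered by this function.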
This is weird, but it worked when I replaced the $ operator by including the column name "category" inside the subsetting call, like this:
set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=T))
df$category <- NA
df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=F), "category"] <- "a"
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=F), "category"] <- "b"
This results in:
id group category
1 A 1 a
2 B 1 <NA>
3 C 1 a
4 D 2 <NA>
5 E 1 <NA>
6 F 2 <NA>
7 G 2 b
8 H 2 b
9 I 1 a
10 J 1 a
df$id==LETTERS[1:10]
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
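One explanation consistent with the output shown in the question is that R expands a nested replacement such as df[rows, ]$category <- "a" so that the rows expression is evaluated twice: once to extract the rows and once to write them back. When that expression contains sample(), the two evaluations can select different rows, so whole rows (including id) end up written into other positions. With df[rows, "category"] <- "a", the expression is evaluated only once. The sketch below counts the evaluations. pick_rows() is a hypothetical helper introduced purely for illustration.

```r
# Count how many times the row-index expression is evaluated.
calls <- 0
pick_rows <- function() {        # hypothetical helper, for illustration only
  calls <<- calls + 1            # record each evaluation
  c(TRUE, FALSE, TRUE)           # fixed mask, so the result itself stays stable
}

# Form used in the question: `[` followed by `$<-`
d1 <- data.frame(id = c("A", "B", "C"), category = NA)
d1[pick_rows(), ]$category <- "a"
calls_dollar <- calls            # index expression evaluated twice

# Form used in this answer: column named inside `[<-`
calls <- 0
d2 <- data.frame(id = c("A", "B", "C"), category = NA)
d2[pick_rows(), "category"] <- "a"
calls_bracket <- calls           # index expression evaluated once

print(c(dollar = calls_dollar, bracket = calls_bracket))
```

With a deterministic index the double evaluation is harmless. It only becomes visible when the index depends on a random draw, which is why storing the sample in a variable first also avoids the problem.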