Adding a new column to a data.frame whose values are random samples of one column, conditioned on another

I want to add a new column (category) whose values (a/b) are assigned to a random sample (without replacement) of the id column, conditioned on the value (1/2) in the group column. When I try to do so, however, the values in the id column change, and I don't understand why this is happening.

set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=TRUE))
df$category <- NA

> table(df$group)
1 2 
6 4

# label 4 randomly sampled group-1 ids as "a" and 2 randomly sampled group-2 ids as "b"
df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=FALSE),]$category <- "a"
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=FALSE),]$category <- "b"

> df
    id group category
 1   A     1        a
 2   B     1     <NA>
 3   B     1        a
 4   D     2        b
 5   E     1     <NA>
 6   F     2     <NA>
 7   G     2     <NA>
 8   H     2        b
 9   C     1        a
 10  E     1        a

> df$id==LETTERS[1:10]
 [1]  TRUE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
# this should be all TRUE

(Please feel free to edit the title and question if they are not expressed clearly enough.)
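A likely explanation for the changing ids is that df[i, ]$category <- "a" is a nested replacement. R carries it out roughly as df <- "[<-"(df, i, , value = "$<-"(df[i, ], "category", value = "a")). The index expression i therefore appears twice: once to read the rows and once to write them back. When i contains sample(), each appearance can draw a different sample, so rows from the first draw are written into the positions of the second draw. That overwrites id as well as category. The sketch below reproduces this deterministically. The toy data frame and the counting helper next_rows() are hypothetical stand-ins for the question's data and its sample() call.

```r
# Toy data (hypothetical): five rows with unique ids and an empty category column
df <- data.frame(id = LETTERS[1:5], category = NA, stringsAsFactors = FALSE)

# Hypothetical stand-in for sample(): counts its calls and returns
# rows 1:2 on the first call and rows 4:5 on every later call
calls <- 0
next_rows <- function() {
  calls <<- calls + 1
  if (calls == 1) 1:2 else 4:5
}

# Nested replacement, same shape as df[df$id %in% sample(...), ]$category <- "a"
df[next_rows(), ]$category <- "a"

print(calls)  # 2: once to read df[i, ], once to write it back
print(df$id)  # "A" "B" "C" "A" "B": rows 1:2 were copied into rows 4:5
```

This is consistent with the question's output, where rows 3, 9 and 10 ended up with the ids B, C and E.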

The issue is that the row index i is being used for 'id'. It would have worked if the row.names of the dataset were 'id'. Here, we may need to match the sampled values against 'id':

# sample 4 ids from group 1 once, then convert them to row positions with match()
i1 <- with(df, match(sample(id[group == 1], size = 4, replace = FALSE), id))
df$category[i1] <- 'a'

and similarly for the second case:

# sample 2 ids from group 2 once, then convert them to row positions
i2 <- with(df, match(sample(id[group == 2], size = 2, replace = FALSE), id))
df$category[i2] <- 'b'

-output

df
#   id group category
#1   A     1        a
#2   B     1     <NA>
#3   C     1        a
#4   D     2        b
#5   E     1        a
#6   F     2     <NA>
#7   G     2        b
#8   H     2     <NA>
#9   I     1     <NA>
#10  J     1        a

df$id==LETTERS[1:10]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
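The key step in this answer is that sample() is called once and its result is stored in i1/i2 before any assignment, so the same rows are read and written. Here is a sketch of the same idea wrapped in a reusable function. The name label_sample() is hypothetical. It samples row positions with which() and sample.int() instead of matching ids, so it does not rely on id being unique. The fixed 6/4 group split is an assumption that keeps the example independent of the random seed.

```r
# Hypothetical helper: label `size` randomly chosen rows of group `grp` with `label`
label_sample <- function(df, grp, size, label) {
  rows <- which(df$group == grp)                  # row positions belonging to this group
  picked <- rows[sample.int(length(rows), size)]  # drawn once, without replacement
  df$category[picked] <- label                    # index is a stored vector, not a new draw
  df
}

set.seed(123)
df <- data.frame(id = LETTERS[1:10],
                 group = rep(c("1", "2"), times = c(6, 4)),  # assumed fixed groups
                 category = NA,
                 stringsAsFactors = FALSE)

df <- label_sample(df, "1", 4, "a")
df <- label_sample(df, "2", 2, "b")

print(table(df$group, df$category, useNA = "ifany"))
print(all(df$id == LETTERS[1:10]))  # TRUE
```

Sampling positions with sample.int() also avoids the documented sample() behaviour where a single numeric value x is treated as 1:x.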

This is weird, but it worked when I replaced the $ operator by putting the column name "category" inside the subsetting call, like this:

set.seed(123)
df <- data.frame(id=LETTERS[1:10], group=sample(c("1","2"), size=10, replace=TRUE))
df$category <- NA

# name the "category" column inside [ ] instead of using $ on the subset
df[df$id %in% sample(df[df$group=="1",]$id, size=4, replace=FALSE), "category"] <- "a"
df[df$id %in% sample(df[df$group=="2",]$id, size=2, replace=FALSE), "category"] <- "b"

Resulting in this:

   id group category
1   A     1        a
2   B     1     <NA>
3   C     1        a
4   D     2     <NA>
5   E     1     <NA>
6   F     2     <NA>
7   G     2        b
8   H     2        b
9   I     1        a
10  J     1        a


df$id==LETTERS[1:10]
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
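This version probably works because df[i, "category"] <- "a" is a single call to the "[<-" replacement function. The index expression, and the sample() inside it, is therefore evaluated only once. Writing df$category[i] <- "a" should behave the same way, because i is only used in the inner "[<-" step on the extracted column. The sketch below checks both forms with a hypothetical counting helper, next_rows(), which would return different rows if it were called a second time. The toy data frame is also an assumption.

```r
df <- data.frame(id = LETTERS[1:5], category = NA, stringsAsFactors = FALSE)

# Hypothetical stand-in for sample(): rows 1:2 on the first call, rows 4:5 afterwards
calls <- 0
next_rows <- function() {
  calls <<- calls + 1
  if (calls == 1) 1:2 else 4:5
}

# Column named inside [ ]: one "[<-" call, one evaluation of the index
df[next_rows(), "category"] <- "a"
print(calls)  # 1

# $ first, then [ ]: the index only appears in the inner "[<-" on the column
calls <- 0
df$category[next_rows()] <- "b"
print(calls)  # 1

print(df$id)        # ids unchanged: "A" "B" "C" "D" "E"
print(df$category)  # "b" "b" NA NA NA
```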
