How to create a new column using values in an existing column to tell which column the new values will come from?
Here is some example data.
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)
What I want to do is create new columns based on columns A and B (e.g., columns Aprime and Bprime). The values placed in the new columns will come from the D columns (e.g., D0, D1, and D2), and the values in columns A and B tell which D column to pick. For example, for the new column Aprime, the first value will be "B" because the first row of A is 1, so it should take the first row of the D1 column. The first row of Bprime should be "C" because the first value of B is 2, so it should take the first D2 value. The result should be something like this:
A B D0 D1 D2 Aprime Bprime
1 1 2 A B C B C
2 0 0 A C <NA> A A
3 1 0 B C <NA> C B
4 1 0 C A <NA> A C
5 0 0 A B <NA> A A
6 0 1 A B <NA> A B
I used the ifelse statements below to come up with the above results:
testdata$Aprime <- ifelse(testdata$A == 0, testdata$D0, ifelse(testdata$A == 1, testdata$D1, testdata$D2))
testdata$Bprime <- ifelse(testdata$B == 0, testdata$D0, ifelse(testdata$B == 1, testdata$D1, testdata$D2))
However, I would like a more generic solution because the D columns are not fixed (e.g., there can be D3 up to D20). How can I do this without writing a separate ifelse for every D column above D0 (i.e., D1 and so on)?
TIA.
Here is a base R method that uses matrix subsetting to select the values and lapply to loop through columns A and B.
testdata[c("aprime", "bprime")] <-
lapply(testdata[c("A", "B")],
function(x) testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)])
The left side provides names for the new variables. On the right, the first argument of lapply provides the set of variables to run through. The second argument of lapply is a function whose body, testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)], first subsets the data.frame to the lookup columns (D0-D2) and then uses cbind to build a two-column index matrix for subsetting. The row indices come from seq_len(nrow(testdata)), and the column indices come from the variables provided in the first argument of lapply (A or B). The 1 is added because R indexing starts at 1, so a value of 0 selects D0, the first of the lookup columns.
This returns
testdata
A B D0 D1 D2 aprime bprime
1 1 2 A B C B C
2 0 0 A C <NA> A A
3 1 0 B C <NA> C B
4 1 0 C A <NA> A C
5 0 0 A B <NA> A A
6 0 1 A B <NA> A B
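Since the question mentions that the D columns may run up to D20, it may help to avoid hard-coding the column positions 3:5. The following is only a sketch of one way to do that, not part of the original answer: pick_d is a hypothetical helper name, and it assumes the lookup columns are named "D" followed by a number and that every value in A and B has a matching D column.

```r
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)

# Hypothetical helper (not from the original answer).
# df:  the data.frame holding the D columns
# idx: a numeric vector such as testdata$A; a value k means "take column Dk"
pick_d <- function(df, idx) {
  # Find every column named D<number>, however many there are.
  d_cols <- grep("^D[0-9]+$", names(df), value = TRUE)
  # Translate each index value into a position among the D columns by name,
  # so the result does not depend on where the D columns sit in df.
  col_pos <- match(paste0("D", idx), d_cols)
  # Matrix subsetting: one (row, column) pair per row of df.
  as.matrix(df[d_cols])[cbind(seq_len(nrow(df)), col_pos)]
}

testdata[c("aprime", "bprime")] <-
  lapply(testdata[c("A", "B")], function(x) pick_d(testdata, x))
```

Matching by column name rather than by x + 1 means the D columns do not have to be contiguous or in numeric order, at the cost of a slightly longer function.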
For more information on matrix subsetting, take a look at ?"[".
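As a small standalone illustration of matrix subsetting, the sketch below indexes a matrix with a two-column matrix in which each row is one (row, column) pair. The names m, idx, and picked are made up for this example.

```r
# A 2 x 3 character matrix, filled row by row:
#      [,1] [,2] [,3]
# [1,] "a"  "b"  "c"
# [2,] "d"  "e"  "f"
m <- matrix(c("a", "b", "c",
              "d", "e", "f"), nrow = 2, byrow = TRUE)

# Each row of idx names one cell: (1, 3), (2, 1), (2, 2).
idx <- cbind(c(1, 2, 2), c(3, 1, 2))

# Indexing with a two-column matrix returns one value per row of idx.
picked <- m[idx]
print(picked)
```

Here picked holds "c", "d", and "e", one element for each row of idx. The answer's code relies on the same mechanism, with seq_len(nrow(testdata)) as the row column of the index matrix.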