How can I create new columns using values in existing columns that indicate which column the new values come from?

Question

Here is some example data.

testdata <- data.frame(A = c(1,0,1,1,0,0),
                   B = c(2,0,0,0,0,1),
                   D0 = c("A","A","B","C","A","A"),
                   D1 = c("B","C","C","A","B","B"),
                   D2 = c("C", NA,NA,NA,NA,NA),
                   stringsAsFactors = FALSE)

What I want to do is create new columns based on columns A and B (e.g., columns Aprime and Bprime). The values placed in the new columns come from the D columns (e.g., D0, D1, and D2), and the value in column A or B tells which D column to pick. For example, for the new column Aprime, the first value will be "B" because the first row of A is 1, so it should take the first row of the D1 column. The first row of Bprime should be "C", because the first value of B is 2, so it should take the first D2 value. The result should look like this:

  A B D0 D1   D2 Aprime Bprime
1 1 2  A  B    C      B      C
2 0 0  A  C <NA>      A      A
3 1 0  B  C <NA>      C      B
4 1 0  C  A <NA>      A      C
5 0 0  A  B <NA>      A      A
6 0 1  A  B <NA>      A      B

I used the ifelse statements below to produce the above results:

testdata$Aprime <- ifelse(testdata$A == 0, testdata$D0, ifelse(testdata$A == 1, testdata$D1, testdata$D2))
testdata$Bprime <- ifelse(testdata$B == 0, testdata$D0, ifelse(testdata$B == 1, testdata$D1, testdata$D2))

However, I would like a more generic solution because the D columns are not fixed (e.g., there can be D3 up to D20). How can I do this without writing an ifelse for every D column beyond D0 (i.e., D1 and so on)?

TIA.

Answer 1

Here is a base R method using matrix subsetting to select the values and lapply to loop through columns A and B.

testdata[c("aprime", "bprime")] <-
      lapply(testdata[c("A", "B")],
             function(x) testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)])

The left side provides names for the new variables. On the right, the first argument of lapply provides the set of variables to run through. The second argument of lapply is a function whose body, testdata[, 3:5][cbind(seq_len(nrow(testdata)), x + 1)], first subsets the data.frame to the columns being indexed (D0-D2) and then subsets them with a two-column matrix built by cbind. The row indices come from seq_len(nrow(testdata)), and the column indices come from the variables provided in the first argument of lapply, plus 1 because D0 is the first of the selected columns.
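To see the matrix-subsetting step in isolation, here is a minimal sketch on a toy matrix. The names m, idx, and picked are made up for illustration and are not part of the answer above.

```r
# A small character matrix standing in for the D columns (toy values).
m <- matrix(c("a", "b", "c",
              "d", "e", "f"),
            nrow = 2, byrow = TRUE)

# Each row of idx is one (row, column) pair to extract.
idx <- cbind(c(1, 2), c(3, 1))

# Indexing with a two-column matrix returns one element per pair:
# here m[1, 3] followed by m[2, 1].
picked <- m[idx]
print(picked)
```

The answer's code does the same thing, with the row numbers in the first column of the index matrix and x + 1 in the second.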

This returns

testdata
  A B D0 D1   D2 aprime bprime
1 1 2  A  B    C      B      C
2 0 0  A  C <NA>      A      A
3 1 0  B  C <NA>      C      B
4 1 0  C  A <NA>      A      C
5 0 0  A  B <NA>      A      A
6 0 1  A  B <NA>      A      B

For more information on matrix subsetting, take a look at ?"[" .
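Since the question mentions that the D columns may run up to D20, one possible variation is to locate them by name instead of by the fixed positions 3:5. The sketch below is an assumption-laden illustration rather than part of the original answer: pick_d is a hypothetical helper, and it assumes the value columns are named with the prefix "D" followed by digits.

```r
testdata <- data.frame(A = c(1, 0, 1, 1, 0, 0),
                       B = c(2, 0, 0, 0, 0, 1),
                       D0 = c("A", "A", "B", "C", "A", "A"),
                       D1 = c("B", "C", "C", "A", "B", "B"),
                       D2 = c("C", NA, NA, NA, NA, NA),
                       stringsAsFactors = FALSE)

# Hypothetical helper (not from the original answer): for each row, return
# the value of the column named "<prefix><k>", where k is that row's selector.
pick_d <- function(df, selector, prefix = "D") {
  # All columns that look like D0, D1, ..., however many there are.
  d_cols <- grep(paste0("^", prefix, "[0-9]+$"), names(df), value = TRUE)
  # Position of the wanted column among d_cols (NA if it does not exist).
  col_idx <- match(paste0(prefix, selector), d_cols)
  # Matrix subsetting: one (row, column) pair per row of df.
  as.matrix(df[d_cols])[cbind(seq_len(nrow(df)), col_idx)]
}

testdata[c("Aprime", "Bprime")] <-
  lapply(testdata[c("A", "B")], function(x) pick_d(testdata, x))

print(testdata)
```

Matching by name means the D columns do not have to be contiguous or in order, and a selector that points at a missing column gives NA for that row instead of an error.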

Solution 1
3, accepted, 2017-05-21 12:42:39
