简体   繁体   English

使用R中的其他列值来选择data.table分配的LHS列和RHS列

[英]Picking LHS column and RHS column of data.table assignment using other column values in R

Here is the code to produce a sample dataset: 这是产生样本数据集的代码:

require(data.table)
testdata <- data.table(
  X = rep(sample(1:3),5),
  Y = rep(sample(1:3),5),
  X1 = rnorm(15),
  X2 = rnorm(15),
  X3 = rnorm(15),
  Y1 = NA_character_,
  Y2 = NA_character_,
  Y3 = NA_character_
)

Initial data table: 初始数据表:

    X Y         X1          X2          X3 Y1 Y2 Y3
 1: 3 3 -0.7098927  0.63342935  0.94470612 NA NA NA
 2: 1 2  0.3008547 -1.40043977  1.53781754 NA NA NA
 3: 2 1  0.3423140  0.34897695 -0.38402565 NA NA NA
 4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA NA NA
 5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA NA NA
 6: 2 1  0.2493801  0.85806647  0.96488021 NA NA NA
 7: 3 3 -2.0653505  0.05481703  1.75161043 NA NA NA
 8: 1 2 -1.3919774  0.34282832  0.50834289 NA NA NA
 9: 2 1  0.5928025 -1.11899399  0.35967102 NA NA NA
10: 3 3 -0.4704720  0.64004313 -0.17343794 NA NA NA
11: 1 2  0.3056093  2.14544631  0.43740447 NA NA NA
12: 2 1 -0.1568971  1.05091249  1.18884487 NA NA NA
13: 3 3 -1.3078670  1.07482123 -0.65367957 NA NA NA
14: 1 2  0.4622123 -0.60308532 -1.11104235 NA NA NA
15: 2 1 -0.7894978  0.33018926 -0.04700393 NA NA NA

Here is the action I want to perform: In each row, 这是我要执行的操作:在每一行中,

if X = 2 and Y = 3 then Y3 <- X2

Expected Output: 预期产量:

    X Y         X1          X2          X3 Y1                 Y2                 Y3
 1: 3 3 -0.7098927  0.63342935  0.94470612 NA                 NA                 0.94470612
 2: 1 2  0.3008547 -1.40043977  1.53781754 NA                 0.3008547          NA
 3: 2 1  0.3423140  0.34897695 -0.38402565 0.34897695         NA                 NA
 4: 3 3 -0.5726456 -2.24526957 -1.10947867 NA                 NA                 -1.10947867
 5: 1 2 -1.3239474 -0.53924617 -0.04103982 NA                 -1.3239474         NA
 6: 2 1  0.2493801  0.85806647  0.96488021 0.85806647         NA                 NA
 7: 3 3 -2.0653505  0.05481703  1.75161043 NA                 NA                 1.75161043
 8: 1 2 -1.3919774  0.34282832  0.50834289 NA                 -1.3919774         NA
 9: 2 1  0.5928025 -1.11899399  0.35967102 -1.11899399        NA                 NA
10: 3 3 -0.4704720  0.64004313 -0.17343794 NA                 NA                 -0.17343794
11: 1 2  0.3056093  2.14544631  0.43740447 NA                 0.3056093          NA
12: 2 1 -0.1568971  1.05091249  1.18884487 1.05091249         NA                 NA
13: 3 3 -1.3078670  1.07482123 -0.65367957 NA                 NA                 -0.65367957
14: 1 2  0.4622123 -0.60308532 -1.11104235 NA                 0.4622123          NA
15: 2 1 -0.7894978  0.33018926 -0.04700393 0.33018926         NA                 NA

How can I achieve this using simple data.table syntax? 如何使用简单的data.table语法实现此目标? I have tried get, eval(parse) etc but running into trouble each time. 我尝试过get,eval(parse)等,但是每次都遇到麻烦。

Note that my actual dataset is quite large(100 plus columns) so I require a solution that doesn't rely on column numbers. 请注意,我的实际数据集非常大(100列以上),因此我需要一个不依赖列号的解决方案。 I can possible write a large number of if statements as well but it looks like a bad way to do this for about 30 odd columns that need to be assigned in a similar way. 我也可以编写大量的if语句,但是对于需要以类似方式分配的约30个奇数列,这样做似乎是一种糟糕的方法。

data.table version is 1.10.4 and the R version is 3.3.2 data.table版本为1.10.4,R版本为3.3.2

Edit: I solved it using a function. 编辑:我解决了使用功能。 Not sure if this is the best way though as it is very very slow. 不确定这是否是最好的方法,因为它非常非常慢。

populateY <- function(input_table) {

  for(i in 1:nrow(input_table)) {
    k <- X
    j <- Y
    tempX <- paste0("input_table$X",k,"[i]")
    tempY <- paste0("input_table$Y",j,"[i]")
    eval(parse(text = paste0(tempY," <- ",tempX)))
  }    
  return(input_table)
}

If you're open to using the tidyverse and tibble data frames, I would do it this way. 如果您愿意使用tidyverse和tibble数据帧,我会这样做。

require(tibble)
testdata <- as_tibble(testdata)

testdata <- testdata %>%
  mutate(Y3 = ifelse(X == 2 & Y == 3, X2, NA))

You can then add all the lines you need easily and legibly in the mutate function. 然后,您可以在mutate函数中轻松,清晰地添加所需的所有行。

Else if you're going to use data.tables for sure, then I'd go with akrun's suggestion, though you'll need change the data type of column Y3 to double, or just not have it exist when you run that code. 否则,如果您肯定要使用data.tables,那么我会同意akrun的建议,尽管您需要将Y3列的数据类型更改为两倍,或者在运行该代码时不存在它。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM