data.table replace data using values from another data.table, conditionally

This is similar to "Update values in data.table with values from another data.table" and "R data.table replacing an index of values from another data.table", except that in my situation the number of variables is very large, so I do not want to list them explicitly.

What I have is a large data.table (let's call it dt_original) and a smaller data.table (let's call it dt_newdata) whose IDs are a subset of the first table's IDs and which has only some of the first table's variables. I would like to update the values in dt_original with the values from dt_newdata. For an added twist, I only want to update the values conditionally: in this case, only if the values in dt_newdata are larger than the corresponding values in dt_original.

For a reproducible example, here are the data. In the real world the tables are much larger:

library(data.table)
set.seed(0)

## This data.table with 20 rows and many variables is the existing data set
dt_original <- data.table(id = 1:20)
setkey(dt_original, id)

for(i in 2015:2017) {
  varA <- paste0('varA_', i)
  varB <- paste0('varB_', i)
  varC <- paste0('varC_', i)

  dt_original[, (varA) := rnorm(20)]
  dt_original[, (varB) := rnorm(20)]
  dt_original[, (varC) := rnorm(20)]
}

## This table with a strict subset of IDs from dt_original and only a part of
## the variables is our potential replacement data
dt_newdata <- data.table(id = sample(1:20, 3))
setkey(dt_newdata, id)

newdata_vars <- sample(names(dt_original)[-1], 4)

for(var in newdata_vars) {
  dt_newdata[, (var) := rnorm(3)]
}

Here is a way of doing it using a loop and pmax, but there has to be a better way, right?

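## Rows of dt_original whose id also appears in dt_newdata
rows <- dt_original$id %in% dt_newdata$id
for(var in newdata_vars) {
  ## Both tables are keyed (sorted) on id, so the matching rows line up
  k <- pmax(dt_newdata[[var]], dt_original[[var]][rows])
  ## with = FALSE is deprecated together with := ; wrapping the LHS in () is enough
  dt_original[rows, (var) := k]
}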

It seems like there should be a way using join syntax, and maybe the prefix i. and/or .SD or something like that, but nothing I've tried comes close enough to warrant repeating here.

This code should work in the current format given your criteria.

dt_original[dt_newdata, names(dt_newdata) := Map(pmax, mget(names(dt_newdata)), dt_newdata)]

It joins on the IDs that match between the two data.tables and then performs an assignment using :=. Because we want to return a list, I use Map to apply pmax to the columns of the two data.tables, matched by the names of dt_newdata. Note that all column names of dt_newdata must also be present in dt_original.
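To make the mechanics concrete, here is a minimal sketch of the same pattern on two tiny tables with hand-picked values. The names big, small, and cols are made up for this illustration, and it uses on = "id" rather than keys, which is an assumption that differs from the keyed tables in the question.

```r
library(data.table)

# Toy tables; `big` and `small` are hypothetical stand-ins for
# dt_original and dt_newdata
big   <- data.table(id = 1:4, x = c(1, 5, 3, 7), y = c(10, 20, 30, 40))
small <- data.table(id = c(2L, 4L), x = c(9, 0))

cols <- names(small)

# Update join: only the rows of `big` whose id matches `small` are touched.
# mget(cols) fetches the matched rows of `big`; `small` supplies the candidates.
big[small, on = "id", (cols) := Map(pmax, mget(cols), small)]

print(big)
```

For id 2 the candidate 9 is larger than the existing 5, so x is replaced; for id 4 the candidate 0 is smaller than the existing 7, so x is kept. Column y is not in small and is left alone.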

Following Frank's comment, you can drop the first element from both the Map list arguments and the column names using [-1], because that column holds the IDs, which don't need to be computed. Removing the first column from Map avoids one pass of pmax and also preserves the key on id. Thanks to @brian-stamper for pointing out the key preservation in the comments.

dt_original[dt_newdata,
            names(dt_newdata)[-1] := Map(pmax,
                                         mget(names(dt_newdata)[-1]),
                                         dt_newdata[, .SD, .SDcols=-1])]
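Since the question asks about the i. prefix, here is a sketch of a variant that reads the new values through i.-prefixed column names instead of referring to the second table again inside j. It is an alternative to the code above, not the answer's original method, and the toy tables big and small are hypothetical.

```r
library(data.table)

# Toy tables; `big` and `small` are hypothetical stand-ins for
# dt_original and dt_newdata
big   <- data.table(id = 1:4, x = c(1, 5, 3, 7), y = c(10, 20, 30, 40))
small <- data.table(id = c(2L, 4L), x = c(9, 0))

# Value columns, assuming id is the first column of `small`
cols <- names(small)[-1]

# Inside an update join, "x" refers to big's column and "i.x" to small's
big[small, on = "id",
    (cols) := Map(pmax, mget(cols), mget(paste0("i.", cols)))]

print(big)
```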

Note that the use of [-1] assumes that the ID variable is in the first position of dt_newdata. If it is elsewhere, you could change the index manually or use grep.
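One way to avoid depending on the position of the ID column is to select the value columns by name. The sketch below uses setdiff rather than grep, and the names big, small, value_cols, and candidates are made up for the illustration.

```r
library(data.table)

# Toy tables; note that id is NOT the first column of `small`
big   <- data.table(id = 1:4, x = c(1, 5, 3, 7), y = c(10, 20, 30, 40))
small <- data.table(y = c(25, 35), id = c(2L, 4L))

# Pick the value columns by name instead of by position
value_cols <- setdiff(names(small), "id")

# Plain list of the candidate columns, built outside the join
candidates <- as.list(small)[value_cols]

big[small, on = "id",
    (value_cols) := Map(pmax, mget(value_cols), candidates)]

print(big)
```

Here y for id 2 goes from 20 to 25, while y for id 4 stays at 40 because the candidate 35 is smaller.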
