Filling a data.table from smaller data.tables
I am looking for a way to fill a result data.table from smaller data.tables that come from calculations. My approach was the following:
#CREATE EXAMPLE
library(data.table)
# The empty table to be filled
DT <- data.table(
"ID" = c("a", "b", "c", "d"),
"A" = numeric(4),
"B" = numeric(4))
# ID A B
#1: a 0 0
#2: b 0 0
#3: c 0 0
#4: d 0 0
# Table with part of the results
DT_short <- data.table(
"ID" = c("a", "b", "d"),
"A" = 1:3,
"B" = 1:3)
# ID A B
#1: a 1 1
#2: b 2 2
#3: d 3 3
What I would like to do is fill rows and columns according to their names. I managed to access the part of the big data.table I want to change with
nm1 <- names(DT_short)
DT[ID %in% DT_short[, ID], ..nm1]
# Bonus question: why do I have to assign nm1 beforehand, and how do I make it work directly inside []?
Now I would like to replace this part of DT with the small table DT_short, but everything I tried (like <- or :=, or some kind of merge) didn't work. For example,
DT[ID %in% DT_short[, ID], ..nm1] <- DT_short
gives the error object '..nm1' not found.
Please help me by providing a solution or pointing me in the right direction. (The data I am working with is rather small: 10^2 columns, 10^2 rows, ~40 small files to be combined, numbers < 10^9 per field. Since other people will use my code, readability is more important than performance.)
EDIT
In response to Ronak Shah: when I test your solution with the code below, it works perfectly well without any errors or warnings. Before accepting it, I would like to make sure it works for others as well, and to understand why it causes warnings for you but not for me.
library(data.table)
packageVersion('data.table')
#[1] ‘1.12.8’
#the empty table to be filled
DT <- data.table(
"ID" = c("a", "b", "c", "d"),
"A" = numeric(4),
"B" = numeric(4),
"C" = numeric(4)
)
# ID A B C
#1: a 0 0 0
#2: b 0 0 0
#3: c 0 0 0
#4: d 0 0 0
#table with part of the results
DT_short <- data.table(
"ID" = c("a", "b", "d"),
"A" = 1:3,
"B" = 1:3
)
# ID A B
#1: a 1 1
#2: b 2 2
#3: d 3 3
#table with part of the results 2
DT_shorter <- data.table(
"ID" = c("c"),
"A" = 7,
"B" = 70,
"C" = 3.14
)
# ID A B C
#1: c 7 70 3.14
DT[match(DT_short$ID, DT$ID), match(names(DT_short), names(DT))] <- DT_short
DT[match(DT_shorter$ID, DT$ID), match(names(DT_shorter), names(DT))] <- DT_shorter
DT
# ID A B C
#1: a 1 1 0.00
#2: b 2 2 0.00
#3: c 7 70 3.14
#4: d 3 3 0.00
Here is one possible approach. For each column in mycols, you want to assign values from DT_short. To do that, use match() to get the row indices and use them to build a new vector. Once the new data.table is created, replace the NAs with 0.
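library(data.table)
# Value columns to fill (A and B)
mycols <- names(DT)[2:3]
# For each column, look up the DT_short value for every ID in DT
# (NA where the ID is missing from DT_short), then bind the columns together
as.data.table(lapply(mycols, function(x){
  DT_short[match(x = DT$ID, table = DT_short$ID), ..x]}))[,
  # Replace the NAs produced by unmatched IDs with 0
  (mycols) := replace(x = .SD, list = is.na(.SD), values = 0),
  .SDcols = mycols][]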
# A B
#1: 1 1
#2: 2 2
#3: 0 0
#4: 3 3
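The key step above is match(x = DT$ID, table = DT_short$ID): for each row of DT it returns the matching row number in DT_short, or NA when the ID is absent. Indexing DT_short with an NA row number gives a row of NAs, which is why the NAs are replaced with 0 afterwards. A minimal sketch of just that step, using the example tables from the question (idx and filled are illustrative names, not from the answer):

```r
library(data.table)

# Example tables from the question
DT <- data.table(ID = c("a", "b", "c", "d"), A = numeric(4), B = numeric(4))
DT_short <- data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3)

# For every row of DT: the row of DT_short with the same ID, or NA if there is none
idx <- match(x = DT$ID, table = DT_short$ID)
idx
# 1 2 NA 3

# An NA row index yields a row of NAs, so ID "c" gets NA in both columns
filled <- DT_short[idx, .(A, B)]
filled
```

Note that the result keeps only the value columns, in the row order of DT, which is why the answer's output has no ID column.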
Another option is to use an update join:
cols <- setdiff(names(DT_short), "ID")
DT[DT_short, on=.(ID), (cols) := mget(paste0("i.", cols))]
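Because the update join modifies DT by reference and only touches the columns present in the small table, it can be repeated for each partial result, which fits the "~40 small files" use case. A minimal sketch, assuming the partial tables have already been read into a list; fill_from() and parts are hypothetical names, and the data are the tables from the question's EDIT:

```r
library(data.table)

# Result table to be filled (as in the question's EDIT)
DT <- data.table(ID = c("a", "b", "c", "d"),
                 A = numeric(4), B = numeric(4), C = numeric(4))

# Hypothetical list of partial results; in practice these could come from the small files
parts <- list(
  data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3),
  data.table(ID = "c", A = 7, B = 70, C = 3.14)
)

# Hypothetical helper: update join of one partial table into target, by reference.
# Rows are matched on ID; only the columns present in `part` are overwritten.
fill_from <- function(target, part) {
  cols <- setdiff(names(part), "ID")
  target[part, on = .(ID), (cols) := mget(paste0("i.", cols))]
  invisible(target)
}

for (p in parts) fill_from(DT, p)
DT
```

IDs in a partial table that do not already exist in DT are ignored, since an update join never adds rows.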
Since you mentioned that you are OK with other solutions: this is easy to do with base R data.frames. Subset the rows and columns of the bigger data frame that correspond to the smaller one, and assign the smaller data frame to that subset.
df1 <- data.frame(DT)
df2 <- data.frame(DT_short)
df1[match(df2$ID, df1$ID), match(names(df2), names(df1))] <- df2
df1
# ID A B
#1 a 1 1
#2 b 2 2
#3 c 0 0
#4 d 3 3
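Here match() runs in the opposite direction from the lapply() answer: it finds where each row and column of the small data frame sits in the big one. A minimal base R sketch with the question's data (rows and cols are illustrative names); stringsAsFactors = FALSE is an added assumption that keeps ID as character on R versions before 4.0:

```r
# Base R only
df1 <- data.frame(ID = c("a", "b", "c", "d"), A = numeric(4), B = numeric(4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c("a", "b", "d"), A = 1:3, B = 1:3,
                  stringsAsFactors = FALSE)

# Rows of df1 to overwrite: where each ID of df2 sits in df1
rows <- match(df2$ID, df1$ID)          # 1 2 4
# Columns of df1 to overwrite: where each column name of df2 sits in df1
cols <- match(names(df2), names(df1))  # 1 2 3

df1[rows, cols] <- df2
df1
```

Every ID and column name of df2 must exist in df1; otherwise match() returns NA, and [<- on a data.frame does not accept missing values in the subscripts.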
I don't think it is right to do the same with a data.table, but if we run the above code, it works (at least for the example shared):
DT[match(DT_short$ID, DT$ID), match(names(DT_short), names(DT))] <- DT_short
However, it returns a big warning message, which kind of confirms that this isn't the right approach for data.tables.
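If index-based assignment is preferred, data.table's own function for it is set(), which writes into given rows of a column by reference instead of going through [<-. A minimal sketch with the question's example tables; the loop over column names is one possible arrangement and not code from the answers above, and the as.numeric() call assumes DT's value columns are double, as in this example:

```r
library(data.table)

DT <- data.table(ID = c("a", "b", "c", "d"), A = numeric(4), B = numeric(4))
DT_short <- data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3)

# Rows of DT to overwrite (same match() as in the data.frame version)
rows <- match(DT_short$ID, DT$ID)
for (col in setdiff(names(DT_short), "ID")) {
  # set() assigns by reference into the given rows of one column;
  # as.numeric() because DT's columns are double in this example
  set(DT, i = rows, j = col, value = as.numeric(DT_short[[col]]))
}
DT
```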