
Filling a data.table from smaller data.tables

I am looking for a way to fill a result data.table from smaller data.tables that come from calculations. My approach was the following:

#CREATE EXAMPLE

library(data.table)

# The empty table to be filled

DT <- data.table(
   "ID" = c("a", "b", "c", "d"),
   "A" = numeric(4),
   "B" = numeric(4))

#   ID A B
#1:  a 0 0
#2:  b 0 0
#3:  c 0 0
#4:  d 0 0

# Table with part of the results
DT_short <- data.table(
         "ID" = c("a", "b", "d"),
         "A" = 1:3,
         "B" = 1:3)

#   ID A B
#1:  a 1 1
#2:  b 2 2
#3:  d 3 3

What I would like to do is fill rows and columns according to their names. I managed to access the part of the big data.table I want to change with:

nm1 <- names(DT_short)
DT[ID %in% DT_short[, ID], ..nm1]
#Bonus question: Why do I have to assign nm1 before, how do I make it work directly in []?
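On the bonus question: as far as I understand it, the `..` prefix is only recognised in front of a plain variable name, which is why `nm1` has to exist first. A minimal sketch of two ways to pass the column names directly, using the example tables from above (`sel1`, `sel2`, and `sel3` are names made up for this illustration):

```r
library(data.table)

DT <- data.table(ID = c("a", "b", "c", "d"), A = numeric(4), B = numeric(4))
DT_short <- data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3)

# Option 1: with = FALSE makes j an ordinary expression that returns column names
sel1 <- DT[ID %in% DT_short$ID, names(DT_short), with = FALSE]

# Option 2: select through .SD and name the columns in .SDcols
sel2 <- DT[ID %in% DT_short$ID, .SD, .SDcols = names(DT_short)]

# For comparison: the `..` form from the question, which needs the helper variable
nm1 <- names(DT_short)
sel3 <- DT[ID %in% DT_short$ID, ..nm1]
```

All three only read from DT; none of them makes the selection assignable with `<-`.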

Now I would like to replace this part of DT with the small table DT_short, but everything I tried (like <- or :=, or some kind of merge) didn't work. For example, DT[ID %in% DT_short[, ID], ..nm1] <- DT_short fails with the error: object '..nm1' not found.

Please help me by providing a solution or pointing me in the right direction. (The data I am working with is rather small: 10^2 columns, 10^2 rows, ~40 small files to be combined, values < 10^9 per field. Since other people will use my code, readability is more important than performance.)

EDIT

In response to Ronak Shah: when I test your solution with the code below, it works perfectly well without any errors or warnings. Before accepting the solution, I would like to make sure it works for others as well, and to know why it causes warnings for you but not for me.

library(data.table)
packageVersion('data.table')
#[1] ‘1.12.8’

#the empty table to be filled
DT <- data.table(
  "ID" = c("a", "b", "c", "d"),
  "A" = numeric(4),
  "B" = numeric(4),
  "C" = numeric(4)
)
#   ID A B C
#1:  a 0 0 0
#2:  b 0 0 0
#3:  c 0 0 0
#4:  d 0 0 0

#table with part of the results
DT_short <- data.table(
  "ID" = c("a", "b", "d"),
  "A" = 1:3,
  "B" = 1:3
)
#   ID A B
#1:  a 1 1
#2:  b 2 2
#3:  d 3 3

#table with part of the results 2
DT_shorter <- data.table(
  "ID" = c("c"),
  "A" = 7,
  "B" = 70,
  "C" = 3.14
)
#   ID A  B    C
#1:  c 7 70 3.14


DT[match(DT_short$ID, DT$ID), match(names(DT_short), names(DT))] <- DT_short
DT[match(DT_shorter$ID, DT$ID), match(names(DT_shorter), names(DT))] <- DT_shorter
DT
#   ID A  B    C
#1:  a 1  1 0.00
#2:  b 2  2 0.00
#3:  c 7 70 3.14
#4:  d 3  3 0.00

Here is one possible approach. For each column in mycols, you want to assign values from DT_short. To do that, use match() to find the row of DT_short that corresponds to each ID in DT, and use those indices to build a new vector per column. Once you have created the new data.table, replace the NAs with 0.

library(data.table)

mycols <- names(DT)[2:3]

as.data.table(lapply(mycols, function(x){
    DT_short[match(x = DT$ID, table = DT_short$ID), ..x]}))[,
      (mycols) := replace(x = .SD, list = is.na(.SD), values = 0),
      .SDcols = mycols][]

#   A B
#1: 1 1
#2: 2 2
#3: 0 0
#4: 3 3

Another option is to use an update join:

cols <- setdiff(names(DT_short), "ID")
DT[DT_short, on=.(ID), (cols) := mget(paste0("i.", cols))]
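Since the question mentions about 40 small tables, the update join can be wrapped in a small function and applied in a loop. The following is only a sketch under some assumptions: `fill_from` and its argument names are invented here, every small table is assumed to have an `ID` column, and all of its other columns are assumed to already exist in the big table.

```r
library(data.table)

DT <- data.table(ID = c("a", "b", "c", "d"),
                 A = numeric(4), B = numeric(4), C = numeric(4))
DT_short   <- data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3)
DT_shorter <- data.table(ID = "c", A = 7, B = 70, C = 3.14)

# Hypothetical helper: copy every non-ID column of `part` into `target`
# for the rows whose ID matches. `:=` updates `target` by reference.
fill_from <- function(target, part, id_col = "ID") {
  cols <- setdiff(names(part), id_col)
  target[part, on = id_col, (cols) := mget(paste0("i.", cols))]
  invisible(target)
}

for (part in list(DT_short, DT_shorter)) {
  fill_from(DT, part)
}
```

Rows of DT without a match in a small table keep their current values, so the zeros act as the default for anything not yet filled.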

Since you mentioned that you are OK with other solutions, this is easy to do with base R data.frames: subset the rows and columns of the bigger data frame that correspond to the smaller one, and assign the smaller data frame to that subset.

df1 <- data.frame(DT)
df2 <- data.frame(DT_short)
df1[match(df2$ID, df1$ID), match(names(df2), names(df1))] <- df2

df1
#  ID A B
#1  a 1 1
#2  b 2 2
#3  c 0 0
#4  d 3 3

I don't think it is right to do the same with data.table, but if we run the above code on the data.tables it works (at least for the example shared):

DT[match(DT_short$ID, DT$ID), match(names(DT_short), names(DT))] <- DT_short

but it returns a big warning message, which suggests this isn't the right approach for data.tables.
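If the match()-based indexing reads more naturally than a join, the same idea can probably be expressed with data.table's by-reference `set()` instead of `<-`. This is a sketch, not a statement about what causes the warning; it assumes every ID in DT_short is present in DT (so `rows` contains no NA), and `rows` and `cols` are names chosen for this example.

```r
library(data.table)

DT <- data.table(ID = c("a", "b", "c", "d"), A = numeric(4), B = numeric(4))
DT_short <- data.table(ID = c("a", "b", "d"), A = 1:3, B = 1:3)

# Row positions in DT for each ID of DT_short (assumed to all be found)
rows <- match(DT_short$ID, DT$ID)
# Every column except the ID gets copied over
cols <- setdiff(names(DT_short), "ID")

for (col in cols) {
  # set() assigns in place: rows `rows` of column `col` receive the new values
  set(DT, i = rows, j = col, value = DT_short[[col]])
}
```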

