
Joining data.tables within a function

I would like to change a data.table by doing a join within a function. I understand that data.tables work by reference, so I assumed that reassigning a joined version of a data.table to itself would change the original data.table. What simple thing have I misunderstood?

Thanks!

library('data.table')

# function to restrict DT to subset, by join
join_test <- function(DT) {
    test_dt     = data.table(a = c('a', 'b'), c = c('x', 'y'))
    setkey(test_dt, 'a')
    setkey(DT, 'a')

    DT  <- DT[test_dt]
}

DT  = data.table(a = c("a","b","c"), b = 1:3)
print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3
haskey(DT)
# [1] FALSE

join_test(DT)
print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3
haskey(DT)
# [1] TRUE

(The haskey calls are included just to double-check that some of the by-reference changes do take effect.)

You can do it by reference, since a join can assign columns by reference based on the joined values, without actually saving the joined table back. However, you need to explicitly pick the columns you're after:

join_test <- function(DT) {
    test_dt     = data.table(a = c('a', 'b'), c = c('x', 'y'))
    # update join: add column c to DT in place, taking its values from test_dt (i.c)
    DT[test_dt, c := i.c, on = 'a']
}
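
As a rough usage sketch of the update join above (the `invisible(DT)` return and the checks at the end are additions for illustration, not part of the original answer): the caller's table gains column `c` in place, but unlike `DT[test_dt]` it keeps every row, so the unmatched row gets `NA` rather than being dropped.

```r
library(data.table)

join_test <- function(DT) {
    test_dt <- data.table(a = c("a", "b"), c = c("x", "y"))
    # update join: adds column c to the caller's DT by reference
    DT[test_dt, c := i.c, on = "a"]
    invisible(DT)  # assumption: return invisibly so the call prints nothing
}

DT <- data.table(a = c("a", "b", "c"), b = 1:3)
join_test(DT)

names(DT)
# [1] "a" "b" "c"
DT$c
# [1] "x" "y" NA
nrow(DT)
# [1] 3
```

If the goal is also to drop the unmatched rows, that part cannot be done purely by reference; returning the joined table, as in the next answer, is one way to get it.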

Having your function return the data.table and storing the result in DT will get you what you want.

join_test <- function(DT) {
  test_dt     = data.table(a = c('a', 'b'), c = c('x', 'y'))
  setkey(test_dt, 'a')
  setkey(DT, 'a')

  DT  <- DT[test_dt]

  return(DT)
}

DT  = data.table(a = c("a","b","c"), b = 1:3)

DT <- join_test(DT)
print(DT)
#    a b c
# 1: a 1 x
# 2: b 2 y


 