Why does data.table update names(DT) by reference, even if I assign to another variable?

I've stored the names of a data.table as a vector:

library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))
names1 <- names(DT)

As far as I can tell, it's a plain vanilla character vector:

str(names1)
# chr [1:2] "x" "y"

class(names1)
# [1] "character"

dput(names1)
# c("x", "y")

However, this is no ordinary character vector. It's a magic character vector! When I add a new column to my data.table, this vector gets updated!

DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"

I know this has something to do with how := updates by reference, but this still seems like magic to me, as I expect <- to make a copy of the data.table's names.

I can fix this by wrapping the names in c():

library(data.table)
set.seed(42)
DT <- data.table(x = runif(100), y = runif(100))

names1 <- names(DT)
names2 <- c(names(DT))
all.equal(names1, names2)
# [1] TRUE

DT[ , z := runif(100)]
names1
# [1] "x" "y" "z"

names2
# [1] "x" "y"

My question is two-fold:

  1. Why doesn't names1 <- names(DT) create a copy of the data.table's names? In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames.
  2. What's the difference between names1 <- names(DT) and names2 <- c(names(DT))?

Update: This is now documented in ?copy as of version 1.9.3. From NEWS:

  1. Moved ?copy to its own help page, and documented that dt_names <- copy(names(DT)) is necessary for dt_names to not be modified by reference as a result of updating DT by reference (e.g. adding a new column by reference). Closes #512. Thanks to Zach for this SO question and user1971988 for this SO question.
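As a minimal sketch of the documented fix (the table contents and the variable names names_ref and names_copy are illustrative, not taken from the NEWS entry), taking an explicit copy of the names keeps them detached from later by-reference updates:

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6)

names_ref  <- names(DT)        # may share memory with DT's names attribute
names_copy <- copy(names(DT))  # explicit copy, detached from DT

DT[, z := 7:9]                 # add a column by reference

names_copy                     # still "x" "y"
names(DT)                      # "x" "y" "z"
```

Whether names_ref picks up "z" depends on it sharing memory with the table's names, which is the behaviour the question describes; names_copy does not depend on it.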

Part of your first question leaves me a bit unclear about what you really mean regarding the <- operator (at least in the context of data.table), especially this part: "In other instances, we are explicitly warned that <- creates copies, both of data.tables and data.frames."

So, before answering your actual question, I'll briefly touch on it here. For a data.table, a plain <- (assignment) is not sufficient to copy it. For example:

DT <- data.table(x = 1:5, y = 6:10)
# assign DT to DT2
DT2 <- DT # assign by reference, no copy taken.
DT2[, z := 11:15]
# DT will also have the z column

If you want to create a copy, you have to request it explicitly using the copy() command.

DT2 <- copy(DT) # copied content to DT2
DT2[, z := 11:15] # only DT2 is affected

From CauchyDistributedRV, I understand that what you mean is the assignment names(dt) <- ., which will result in the warning. I'll leave it at that.


Now, to answer your first question: it seems that names1 <- names(DT) behaves similarly. I hadn't thought about or known this until now. The .Internal(inspect(.)) command is very useful here:

.Internal(inspect(names1))
# @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
#   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

.Internal(inspect(names(DT)))
# @7fc86a851480 16 STRSXP g0c7 [MARK,NAM(2)] (len=2, tl=100)
#   @7fc86a069f68 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "x"
#   @7fc86a0f96d8 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "y"

Here, you can see that both point to the same memory location, @7fc86a851480. Even the truelength of names1 is 100 (the amount data.table over-allocates by default; see ?alloc.col).

truelength(names1)
# [1] 100

So basically, the assignment names1 <- names(DT) seems to happen by reference. That is, names1 points to the same location as DT's column names pointer.

To answer your second question: c(.) seems to create a copy because it does not check whether the result of the concatenation differs from its input. That is, because a c(.) operation can change the contents of the vector, it immediately makes a "copy" without checking whether the contents were actually modified.
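A rough way to see this is to compare addresses with data.table's address() function, as an alternative to .Internal(inspect(.)). This sketch assumes a small illustrative table rather than the one from the question:

```r
library(data.table)

DT <- data.table(x = 1:3, y = 4:6)

names1 <- names(DT)
names2 <- c(names1)  # c() builds a new vector rather than reusing names1

address(names1) != address(names2)  # TRUE: the two vectors live in separate memory

DT[, z := 7:9]       # add a column by reference
names2               # still "x" "y"
```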
