Using data.table with a replacement function in R

Question

I came across the following problem today and I am wondering if there is a better way to accomplish what I am trying to do.

Let's suppose I have the following data.table (just an hourly timestamp):

library(data.table)
tdt <- data.table(Timestamp = seq(as.POSIXct("1980-01-01 00:00:00"), as.POSIXct("2015-01-01 00:00:00"), '1 hour'))

> tdt
                  Timestamp
     1: 1980-01-01 00:00:00
     2: 1980-01-01 01:00:00
     3: 1980-01-01 02:00:00
     4: 1980-01-01 03:00:00
     5: 1980-01-01 04:00:00
    ---                    
306813: 2014-12-31 20:00:00
306814: 2014-12-31 21:00:00
306815: 2014-12-31 22:00:00
306816: 2014-12-31 23:00:00
306817: 2015-01-01 00:00:00

My goal is to set the minutes of every timestamp to, say, 10.

I know I can use:

library(lubridate)
minute(tdt$Timestamp) <- 10

but this does not take advantage of data.table's speed (which I need). On my laptop this took:

> system.time(minute(tdt$Timestamp) <- 10)
   user  system elapsed 
  11.29    0.16   11.45

So, my question is: can a replacement function be used inside data.table syntax so that the operation benefits from data.table's speed? If the answer is no, any other fast data.table solution would be acceptable.

In case you are wondering, one of the things I tried is:

tdt[, Timestamp2 := minute(Timestamp) <- 10]

which does not work.

Expected output (but with data.table syntax):

> tdt
                  Timestamp
     1: 1980-01-01 00:10:00
     2: 1980-01-01 01:10:00
     3: 1980-01-01 02:10:00
     4: 1980-01-01 03:10:00
     5: 1980-01-01 04:10:00
    ---                    
306813: 2014-12-31 20:10:00
306814: 2014-12-31 21:10:00
306815: 2014-12-31 22:10:00
306816: 2014-12-31 23:10:00
306817: 2015-01-01 00:10:00

Answer 1

A POSIXct object is just a double with some attributes:

storage.mode(as.POSIXct("1980-01-01 00:00:00"))
## [1] "double"
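
To make the "double with attributes" point concrete, here is a minimal sketch that reads the underlying number of seconds and rebuilds a timestamp from it. The variable names and the explicit UTC time zone are assumptions for illustration, not part of the original answer.

```r
# Illustrative value with an explicit time zone (UTC is an assumption here)
x <- as.POSIXct("1980-01-01 00:00:00", tz = "UTC")

# The underlying value is the number of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(x)

# Rebuild a POSIXct from the raw number shifted by 600 seconds (10 minutes)
y <- as.POSIXct(secs + 600, origin = "1970-01-01", tz = "UTC")
format(y, "%Y-%m-%d %H:%M:%S")
```
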

So, to manipulate it efficiently, you can simply treat it as one, for instance:

tdt[, Timestamp := Timestamp + 600L]

This adds 600 seconds (10 minutes) to each row by reference. Because every timestamp in the question's example falls on the hour, the effect is the same as setting the minutes to 10.

Some benchmarks:
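
Adding a constant only works when every timestamp already sits on the hour. For timestamps with arbitrary minutes, one possible arithmetic variant is sketched below. It is not from the original answer: the table `dt` is hypothetical, the approach assumes a time zone whose offset is a whole number of hours (UTC here), and unlike lubridate's `minute<-` it also resets the seconds to zero.

```r
library(data.table)

# Hypothetical data: timestamps with arbitrary minutes and seconds, in UTC
dt <- data.table(Timestamp = as.POSIXct(c("1980-01-01 00:37:12",
                                           "1980-01-01 01:05:00"),
                                         tz = "UTC"))

# Drop minutes and seconds (the remainder modulo 3600 s), then add 10 minutes.
# Assumes the time zone offset is a whole number of hours.
dt[, Timestamp := Timestamp - as.numeric(Timestamp) %% 3600 + 600]

format(dt$Timestamp, "%H:%M:%S")
```
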

tdt <- data.table(Timestamp = seq(as.POSIXct("1600-01-01 00:00:00"), 
                                  as.POSIXct("2015-01-01 00:00:00"), 
                                  '1 hour'))
system.time(minute(tdt$Timestamp) <- 10)
# user  system elapsed 
# 124.86    1.95  127.68 
system.time(set(tdt, j = 1L, value = `minute<-`(tdt$Timestamp, 10)))
# user  system elapsed 
# 124.99    1.83  128.25 
system.time(tdt[, Timestamp := Timestamp + dminutes(10)])
# user  system elapsed 
# 0.39    0.04    0.42 
system.time(tdt[, Timestamp := Timestamp + 600L])
# user  system elapsed 
# 0.01    0.00    0.01

Answer 2

Replacement functions are run in two steps:

1. A function call creates the desired output.
2. That output is then assigned back to the original variable.
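
As a minimal sketch of these two steps, using base R's `names<-` rather than lubridate (the vectors `x` and `y` are illustrative only):

```r
x <- c(a = 1, b = 2)
y <- x

# Usual replacement syntax: both steps happen implicitly
names(x) <- c("p", "q")

# Explicit form: the call to `names<-` builds the new object (step 1),
# and the ordinary assignment stores it back into y (step 2)
y <- `names<-`(y, value = c("p", "q"))

identical(x, y)
```
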

You can run step 1 without running step 2. The result can then be used to set the data.table column (set is used here, but you could use := as well).

library(lubridate)
library(data.table)
tdt <- data.table(Timestamp = seq(as.POSIXct("1980-01-01 00:00:00"), as.POSIXct("2015-01-01 00:00:00"), '1 hour'))
minute(tdt$Timestamp) <- 20
print( `minute<-`(tdt$Timestamp,11) )
set( tdt, j=1L,value=`minute<-`(tdt$Timestamp,11)  )
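
For comparison, the same idea written with `:=` instead of `set` might look like the sketch below. The two-row table `small` and the UTC time zone are assumptions chosen to keep the example quick to run.

```r
library(data.table)
library(lubridate)

# Hypothetical small table so the example runs instantly
small <- data.table(Timestamp = as.POSIXct(c("1980-01-01 00:00:00",
                                              "1980-01-01 01:00:00"),
                                            tz = "UTC"))

# Call the replacement function directly and assign its result by reference
small[, Timestamp := `minute<-`(Timestamp, 10)]

minute(small$Timestamp)
```

This only changes how the result is assigned; the time spent inside `minute<-` itself is unchanged.
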

Edit: Small data.table vs. big data.table benchmarking

library(lubridate)
library(data.table)
library(microbenchmark)

# Config
tms <- 5L

# Sample data, 1 column
tdt <- data.table(Timestamp = seq(as.POSIXct("1980-01-01 00:00:00"), as.POSIXct("2015-01-01 00:00:00"), '1 hour'))
minute(tdt$Timestamp) <- 20

tdf <- as.data.frame( tdt )


# Sample data, lots of columns
bdf <- cbind( tdf, as.data.frame( replicate( 100, runif(nrow(tdt)) ) ) )
bdt <- as.data.table( bdf )

# Benchmark
microbenchmark(
  `minute<-`(tdt$Timestamp,10), # How long does the operation to generate the new vector itself take?
  set( tdt, j=1L,value=`minute<-`(tdt$Timestamp,11)  ), # One column: How long does it take to generate the new vector and replace the contents in the data.table?
  minute( tdf$Timestamp ) <- 12, # One column: How long does it take to do it with a data.frame?
  set( tdt, j=1L,value=`minute<-`(bdt$Timestamp,13)  ), # Many columns: How long does it take to generate the new vector and replace the contents in the data.table? (As written, this reads from bdt but writes into tdt.)
  minute( bdf$Timestamp ) <- 14, #  Many columns: How long does it take to do it with a data.frame?
  times = tms
)

Unit: seconds
                                                    expr      min       lq     mean   median       uq      max neval
                           `minute<-`(tdt$Timestamp, 10) 1.304388 1.385883 1.417616 1.389316 1.459166 1.549327     5
 set(tdt, j = 1L, value = `minute<-`(tdt$Timestamp, 11)) 1.314495 1.344277 1.376241 1.352124 1.389083 1.481225     5
                             minute(tdf$Timestamp) <- 12 1.342104 1.349231 1.488639 1.378840 1.380659 1.992358     5
 set(tdt, j = 1L, value = `minute<-`(bdt$Timestamp, 13)) 1.337944 1.383429 1.402802 1.418211 1.418922 1.455503     5
                             minute(bdf$Timestamp) <- 14 1.332482 1.333713 1.355331 1.335728 1.342607 1.432127     5

It looks like it is no faster, which contradicts my understanding of what is going on. Strange.

Answer 3

I guess this should do the trick for you:

library(data.table)
library(lubridate)

tdt <- data.table(
  Timestamp = seq(as.POSIXct("1980-01-01 00:00:00")
  , as.POSIXct("2015-01-01 00:00:00")
  , '1 hour'))
tdt[, Timestamp := Timestamp + dminutes(10)]
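
A quick sketch of why this behaves like adding 600 seconds: a lubridate duration is a fixed number of seconds. The timestamp `t0` and the UTC time zone are assumptions for illustration.

```r
library(lubridate)

# dminutes(10) is a Duration, stored as a number of seconds
as.numeric(dminutes(10))

# Illustrative timestamp (UTC assumed)
t0 <- as.POSIXct("1980-01-01 00:00:00", tz = "UTC")

# Adding the duration and adding the raw seconds give the same instant
same <- (t0 + dminutes(10)) == (t0 + 600)
same
```
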

3 answers

Answer 1
11 2015-07-13 20:16:51

Answer 2
7 accepted 2015-07-13 20:30:38

Answer 3
3 2015-07-13 20:40:31
