简体   繁体   中英

data.table in R: creating variables from x into i

I have two data tables,

a <- data.table(id=c(1,2,1,2,1,2), time=as.POSIXct(c("2018-01-01 01:10:00","2018-01-01 01:10:00","2018-01-01 01:11:00","2018-01-01 01:11:00","2018-01-01 01:12:00","2018-01-01 01:12:00")), beg=as.POSIXct(c("2018-01-01 01:00:00","2018-01-01 01:05:00","2018-01-01 01:00:00","2018-01-01 01:05:00","2018-01-01 01:01:00","2018-01-01 01:05:00")), end=as.POSIXct(c("2018-01-01 02:00:00","2018-01-01 02:05:00","2018-01-01 02:00:00","2018-01-01 02:05:00","2018-01-01 02:00:00","2018-01-01 02:05:00")))

> a
   id                time                 beg                 end
1:  1 2018-01-01 01:10:00 2018-01-01 01:00:00 2018-01-01 02:00:00
2:  2 2018-01-01 01:10:00 2018-01-01 01:05:00 2018-01-01 02:05:00
3:  1 2018-01-01 01:11:00 2018-01-01 01:00:00 2018-01-01 02:00:00
4:  2 2018-01-01 01:11:00 2018-01-01 01:05:00 2018-01-01 02:05:00
5:  1 2018-01-01 01:12:00 2018-01-01 01:01:00 2018-01-01 02:00:00
6:  2 2018-01-01 01:12:00 2018-01-01 01:05:00 2018-01-01 02:05:00

which has 650m lines by 4 columns, and

b <- data.table(id=c(1,2), abeg=as.POSIXct(c("2018-01-01 01:10:00","2018-01-01 01:11:00")), aend=as.POSIXct(c("2018-01-01 01:11:00","2018-01-01 01:12:00")))

> b
   id                abeg                aend
1:  1 2018-01-01 01:10:00 2018-01-01 01:11:00
2:  2 2018-01-01 01:11:00 2018-01-01 01:12:00

which has about 13m lines by 7 columns.

I would like to join b into a but keep all lines and columns of a . I understand that this is a left-join and would execute it as

b[a, .(id=i.id, time=i.time, beg=i.beg, end=i.end, abeg=x.abeg, aend=x.aend), on=.(id=id, abeg<=time, aend>=time)]

to obtain

   id                time                 beg                 end                abeg                aend
1:  1 2018-01-01 01:10:00 2018-01-01 01:00:00 2018-01-01 02:00:00 2018-01-01 01:10:00 2018-01-01 01:11:00
2:  2 2018-01-01 01:10:00 2018-01-01 01:05:00 2018-01-01 02:05:00                <NA>                <NA>
3:  1 2018-01-01 01:11:00 2018-01-01 01:00:00 2018-01-01 02:00:00 2018-01-01 01:10:00 2018-01-01 01:11:00
4:  2 2018-01-01 01:11:00 2018-01-01 01:05:00 2018-01-01 02:05:00 2018-01-01 01:11:00 2018-01-01 01:12:00
5:  1 2018-01-01 01:12:00 2018-01-01 01:01:00 2018-01-01 02:00:00                <NA>                <NA>
6:  2 2018-01-01 01:12:00 2018-01-01 01:05:00 2018-01-01 02:05:00 2018-01-01 01:11:00 2018-01-01 01:12:00

However, executing this on a Mac takes longer than 7 hours, when I had to abort. I joined on a 50m-rows-subset of a and this took about 8 minutes. I would like to avoid a loop over subsets, so I wonder whether I can make it more efficient.

For example, I suspect the assign command := can be used somehow. In data.table join then add columns to existing data.frame without re-copy it is explained how this can be done when all variables in b are kept and amended by variables from a . However, I seem to have the reverse case: I want to keep all columns in a and amend it by columns from b .

Here's an join with update by reference which I think does what you intend to do:

a[b, on=.(id=id, time>=abeg, time<=aend), `:=`(abeg = i.abeg, aend = i.aend)]

The resulting a is then:

   id                time                 beg                 end                abeg                aend
1:  1 2018-01-01 01:10:00 2018-01-01 01:00:00 2018-01-01 02:00:00 2018-01-01 01:10:00 2018-01-01 01:11:00
2:  2 2018-01-01 01:10:00 2018-01-01 01:05:00 2018-01-01 02:05:00                <NA>                <NA>
3:  1 2018-01-01 01:11:00 2018-01-01 01:00:00 2018-01-01 02:00:00 2018-01-01 01:10:00 2018-01-01 01:11:00
4:  2 2018-01-01 01:11:00 2018-01-01 01:05:00 2018-01-01 02:05:00 2018-01-01 01:11:00 2018-01-01 01:12:00
5:  1 2018-01-01 01:12:00 2018-01-01 01:01:00 2018-01-01 02:00:00                <NA>                <NA>
6:  2 2018-01-01 01:12:00 2018-01-01 01:05:00 2018-01-01 02:05:00 2018-01-01 01:11:00 2018-01-01 01:12:00

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM