
R: Calculate differences between rows in data.table

Rprof revealed that the following operation is rather slow:

stockHistory[.(p), stock:=stockHistory[.(p), stock] - (backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog])]

I suppose this is because of the subtraction:

backorderedDemands[.(p-1),backlog] - backorderedDemands[.(p),backlog]

Is there any way to speed up this operation?

.(p) subsets the data.table for a period p, and .(p-1) subsets the previous period (see the example data below). Would it be faster to apply some kind of diff() here? I do not know how to do this, though.

Example data:

backorderedDemands<-CJ(period=1:1000, articleID=letters[1:10], backlog=0)[,backlog:=round(runif(10000)*42,0)]
setkey(backorderedDemands,period, articleID)
stockHistory<-CJ(period=1:1000, articleID=letters[1:10], stock=0)[,stock:=round(runif(10000)*42+66,0)]
setkey(stockHistory,period, articleID)

You can first calculate a difference column in backorderedDemands:

backorderedDemands[, diff := c(NA, -diff(backlog)), by=articleID]
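To see what that expression produces, here is a minimal sketch on a toy vector (the values are made up for illustration). It checks that c(NA, -diff(x)) yields previous minus current, the same quantity as backlog at p-1 minus backlog at p in the question, and that data.table's shift() is an equivalent way to write it.

```r
library(data.table)

# Hypothetical backlog values for one article over four periods
x <- c(11, 9, 14, 20)

# diff(x) is current minus previous, so negating it gives previous minus current;
# the leading NA stands for the first period, which has no predecessor
prev_minus_cur <- c(NA, -diff(x))

# shift(x) lags the vector by one position and fills the first slot with NA
via_shift <- shift(x) - x

print(prev_minus_cur)
print(identical(prev_minus_cur, via_shift))
```

Either form can be used inside the by=articleID call; shift() may read more naturally when the lag is the thing you care about.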

It is also not necessary to use stockHistory[.(p), stock] on the right-hand side. Inside j, stock already refers to the rows selected by .(p):

stockHistory[.(p), stock:=stock - backorderedDemands[.(p), diff]]
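If the per-period update is only ever run inside a loop over p, the loop itself may be avoidable. The following is a sketch, assuming each period's stock depends only on its own row and on the backlog change (as in the question's formula) and that both tables share the columns period and articleID. It uses an update join; the column name backlogDiff and the copy named original are chosen here for illustration only.

```r
library(data.table)
set.seed(1)

# Example data in the same shape as in the question
backorderedDemands <- CJ(period = 1:1000, articleID = letters[1:10])
backorderedDemands[, backlog := round(runif(.N) * 42)]
stockHistory <- CJ(period = 1:1000, articleID = letters[1:10])
stockHistory[, stock := round(runif(.N) * 42 + 66)]
setkey(backorderedDemands, period, articleID)
setkey(stockHistory, period, articleID)

# Keep the untouched stock so the result can be checked afterwards
original <- copy(stockHistory)

# Previous backlog minus current backlog, per article (NA in the first period)
backorderedDemands[, backlogDiff := shift(backlog) - backlog, by = articleID]

# Update join: match rows on period and articleID, then subtract the difference
stockHistory[backorderedDemands, on = .(period, articleID),
             stock := stock - i.backlogDiff]
```

Period 1 ends up as NA in this sketch because there is no previous period to subtract from, so it may need separate handling depending on what the first period should contain.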

If you want to compute first differences of your data, you can do it as shown below. It is fast, and I have included the step-by-step computation.

library(data.table)
library(dplyr)

Data

set.seed(1)

backorderedDemands <- 
    CJ(period = 1:1000, 
       articleID = letters[1:10], 
       backlog = 0)[,backlog:= round(runif(10000) * 42, 0)]

stockHistory <- 
    CJ(period = 1:1000, 
       articleID = letters[1:10], 
       stock = 0)[, stock:= round(runif(10000) * 42 + 66, 0)]

Solution

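# Join stock and backlog, then compute per-article first differences
merge(stockHistory, backorderedDemands,
      by = c("period", "articleID")) %>%
    group_by(articleID) %>%
    mutate(lag_backlog = lag(backlog, 1),            # backlog of the previous period
           my_backlog_diff = backlog - lag_backlog,  # current minus previous backlog
           my_diff = stock + my_backlog_diff) %>%    # stock adjusted by the change
    as.data.frame() %>%
    head(20)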

   period articleID stock backlog lag_backlog my_backlog_diff my_diff
1       1         a    69      11          NA              NA      NA
2       1         b    94      16          NA              NA      NA
3       1         c    97      24          NA              NA      NA
4       1         d    71      38          NA              NA      NA
5       1         e    68       8          NA              NA      NA
6       1         f    71      38          NA              NA      NA
7       1         g   103      40          NA              NA      NA
8       1         h   101      28          NA              NA      NA
9       1         i   102      26          NA              NA      NA
10      1         j    67       3          NA              NA      NA
11      2         a    71       9          11              -2      69
12      2         b    89       7          16              -9      80
13      2         c    71      29          24               5      76
14      2         d    96      16          38             -22      74
15      2         e    96      32           8              24     120
16      2         f    99      21          38             -17      82
17      2         g    92      30          40             -10      82
18      2         h    87      42          28              14     101
19      2         i    85      16          26             -10      75
20      2         j    67      33           3              30      97
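The same columns can also be produced without dplyr. This is a sketch rather than the answerer's code: it swaps dplyr's lag() for data.table's shift() and adds the columns by reference, reusing the column names from the output above. The result name res is an assumption made for illustration.

```r
library(data.table)
set.seed(1)

# Same example data as above
backorderedDemands <- CJ(period = 1:1000, articleID = letters[1:10])
backorderedDemands[, backlog := round(runif(.N) * 42)]
stockHistory <- CJ(period = 1:1000, articleID = letters[1:10])
stockHistory[, stock := round(runif(.N) * 42 + 66)]

# Join the two tables on their shared identifier columns
res <- merge(stockHistory, backorderedDemands, by = c("period", "articleID"))

# Lag the backlog within each article, then derive the two difference columns
res[, lag_backlog := shift(backlog), by = articleID]
res[, my_backlog_diff := backlog - lag_backlog]
res[, my_diff := stock + my_backlog_diff]

print(head(res, 20))
```

Because the columns are added by reference with :=, no intermediate copies of the merged table are made, which may matter for larger inputs.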
