I have an R
data.table
. I'd like to copy or carry-forward the value from one column and row to a different column and row based on a key
. I've done this with merge
and rbindlist
, but I'm wondering if there is a more parsimonious solution.
Here's a toy example: Let's say I'm tracking opening and closing balances for different checking accounts on different dates. I want to carry-forward the previous day's ( t-1
) closing balance to current day's ( t
) opening balance.
Have this input:
> DT_in <- data.table(date = c("2017-12-29", "2017-12-29", "2017-12-29", "2018-01-02",
+ "2018-01-02", "2018-01-02", "2018-01-02", "2018-01-03",
+ "2018-01-03", "2018-01-03", "2018-01-03"),
+ account_id = c("A17", "A23", "B21", "A17", "A23", "B21", "C12",
+ "A17", "A23", "B21", "C12"),
+ opening_balance = NA,
+ ending_balance = c(224, 254, 240, 290, 107, 272, 105, 256, 215,
+ 202, 238)
+ )
> DT_in
date account_id opening_balance ending_balance
1: 2017-12-29 A17 NA 224
2: 2017-12-29 A23 NA 254
3: 2017-12-29 B21 NA 240
4: 2018-01-02 A17 NA 290
5: 2018-01-02 A23 NA 107
6: 2018-01-02 B21 NA 272
7: 2018-01-02 C12 NA 105
8: 2018-01-03 A17 NA 256
9: 2018-01-03 A23 NA 215
10: 2018-01-03 B21 NA 202
11: 2018-01-03 C12 NA 238
Want this output:
> DT_out <- data.table(date = c("2017-12-29", "2017-12-29", "2017-12-29", "2018-01-02",
+ "2018-01-02", "2018-01-02", "2018-01-02", "2018-01-03",
+ "2018-01-03", "2018-01-03", "2018-01-03"),
+ account_id = c("A17", "A23", "B21", "A17", "A23", "B21", "C12",
+ "A17", "A23", "B21", "C12"),
+ opening_balance = c(NA, NA, NA, 224, 254, 240, NA, 290, 107, 272, 105),
+ ending_balance = c(224, 254, 240, 290, 107, 272, 105, 256, 215,
+ 202, 238)
+ )
> DT_out
date account_id opening_balance ending_balance
1: 2017-12-29 A17 NA 224
2: 2017-12-29 A23 NA 254
3: 2017-12-29 B21 NA 240
4: 2018-01-02 A17 224 290
5: 2018-01-02 A23 254 107
6: 2018-01-02 B21 240 272
7: 2018-01-02 C12 NA 105
8: 2018-01-03 A17 290 256
9: 2018-01-03 A23 107 215
10: 2018-01-03 B21 272 202
11: 2018-01-03 C12 105 238
Note that account_id
s are not necessarily persistent from one date to the next (new ones can added or old ones removed). While it should not matter, please also note that the dates are business dates and not calendar dates.
If you fix your example data to use the same data type of "opening_balance" and "ending_balance" columns via
...
opening_balance = NA_real_,
...
you can use this code to carry the last observation forward per group (after applying the right order/sorting of the rows!):
setorder(DT_in, account_id, date)
DT_in[, opening_balance := shift(ending_balance, 1), by = .(account_id)]
This results in (sorted output):
> DT_in
date account_id opening_balance ending_balance
1: 2017-12-29 A17 NA 224
2: 2018-01-02 A17 224 290
3: 2018-01-03 A17 290 256
4: 2017-12-29 A23 NA 254
5: 2018-01-02 A23 254 107
6: 2018-01-03 A23 107 215
7: 2017-12-29 B21 NA 240
8: 2018-01-02 B21 240 272
9: 2018-01-03 B21 272 202
10: 2018-01-02 C12 NA 105
11: 2018-01-03 C12 105 238
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.