简体   繁体   中英

Calculations in a column which depends on previous observations in R

Is there any way I can create a variable which depends on its earlier observation in R?

In the below example, the column 'asset' should be depreciated by 1,67% (20%/12) per time period (t)

The first 'end result' is = 'asset - depr'

The problem I face is that from row 2, the value in 'end-result' must depend on the 'end-result' in the previous time period.

From t=2 and onwards, depr. is found as 'end-result, t-1' * 1,67%

Thank you!

(This is how the final product should look) - also sorted on PERMNO (ID)

t   id  asset   depr.   end_result
1   10010   45145   752     44393
2   10010   45145   740     43653
3   10010   45145   728     42925
4   10010   96730   1575    92935
5   10010   96730   1549    91386
6   10010   96730   1523    89863
7   10010   145511  2311    136333
8   10010   145511  2272    134061
9   10010   145511  2234    131827
10  10010   190986  2955    174347
11  10010   190986  2906    171441
12  10010   190986  2857    168584
1   10020   20050   334     19716
2   10020   20050   329     19387
3   10020   20050   323     19064
4   10020   50411   824     48601
5   10020   50411   810     47791
6   10020   50411   797     46995
7   10020   120154  1946    114792
8   10020   120154  1913    112879
9   10020   120154  1881    110998
10  10020   173575  2740    161678
11  10020   173575  2695    158984
12  10020   173575  2650    156334
        

taking this sample data

df <- structure(list(t = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 
11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), 
    id = c(10010L, 10010L, 10010L, 10010L, 10010L, 10010L, 10010L, 
    10010L, 10010L, 10010L, 10010L, 10010L, 10020L, 10020L, 10020L, 
    10020L, 10020L, 10020L, 10020L, 10020L, 10020L, 10020L, 10020L, 
    10020L), asset = c(45145L, 45145L, 45145L, 96730L, 96730L, 
    96730L, 145511L, 145511L, 145511L, 190986L, 190986L, 190986L, 
    20050L, 20050L, 20050L, 50411L, 50411L, 50411L, 120154L, 
    120154L, 120154L, 173575L, 173575L, 173575L)), class = "data.frame", row.names = c(NA, 
-24L))

> df
    t    id  asset
1   1 10010  45145
2   2 10010  45145
3   3 10010  45145
4   4 10010  96730
5   5 10010  96730
6   6 10010  96730
7   7 10010 145511
8   8 10010 145511
9   9 10010 145511
10 10 10010 190986
11 11 10010 190986
12 12 10010 190986
13  1 10020  20050
14  2 10020  20050
15  3 10020  20050
16  4 10020  50411
17  5 10020  50411
18  6 10020  50411
19  7 10020 120154
20  8 10020 120154
21  9 10020 120154
22 10 10020 173575
23 11 10020 173575
24 12 10020 173575

Approach-1 Base R method

  • First calculate the additions during the year by using ave() in base R. Actually, your asset value column is like a cumulative value of asset and it has to be reduced like only additions during the month. (step-1)
  • To apply your iterative logic, I suggest use of baseR function Reduce() with agrument accumulate = TRUE . This will help in generation of your residual value column.(step-2)
  • Thereafter, generate depreciation value column. This will be cumulative depreciation. Therefore reduce it to monthly depreciation only. (step-3)
  • You can delete the column generated in step-1 lastly (which is optional)
#1 calculation of asset_addition column
df$asset_addition <- with(df, ave(asset, id, FUN = function(x){c(x[1], diff(x))}))

#2 desired iteration
df$end_res <- with(df, 
                   ave(replace(asset_addition, 
                               t==1, 
                               asset_addition[t==1]*11.8/12), 
                       id, 
                       FUN = function(z){Reduce(function(x, y){(x+y)*11.8/12}, 
                                                z, 
                                                accumulate = T)}))

#3 depr. column calculation
df$depr. <- with(df, ave(asset - end_res, id, FUN = function(x){c(x[1], diff(x))}))

Check the results

df
    t    id  asset asset_addition   end_res     depr.
1   1 10010  45145          45145  44392.58  752.4167
2   2 10010  45145              0  43652.71  739.8764
3   3 10010  45145              0  42925.16  727.5451
4   4 10010  96730          51585  92934.99 1575.1694
5   5 10010  96730              0  91386.08 1548.9165
6   6 10010  96730              0  89862.97 1523.1013
7   7 10010 145511          48781 136333.24 2310.7329
8   8 10010 145511              0 134061.02 2272.2207
9   9 10010 145511              0 131826.67 2234.3504
10 10 10010 190986          45475 174346.64 2955.0278
11 11 10010 190986              0 171440.87 2905.7774
12 12 10010 190986              0 168583.52 2857.3478
13  1 10020  20050          20050  19715.83  334.1667
14  2 10020  20050              0  19387.24  328.5972
15  3 10020  20050              0  19064.12  323.1206
16  4 10020  50411          30361  48601.36  823.7519
17  5 10020  50411              0  47791.34  810.0227
18  6 10020  50411              0  46994.82  796.5223
19  7 10020 120154          69743 114792.19 1945.6303
20  8 10020 120154              0 112878.99 1913.2031
21  9 10020 120154              0 110997.67 1881.3164
22 10 10020 173575          53421 161678.36 2740.3111
23 11 10020 173575              0 158983.72 2694.6393
24 12 10020 173575              0 156333.99 2649.7286

Approach-2 dplyr approach.

  • This one is slightly different approach. Generated cumulative depreciation percentages and used these for generation of residual column. However, any value in asset, added if any, during the year, is calculated earlier.
library(dplyr)
df %>% group_by(id) %>%
  mutate(end_value = cumsum((asset - lag(asset, default = 0))/((1-.2/12)^(t-1))) * (1-.2/12)^t,
         depr. = asset - end_value,
         depr. = c(depr.[1], diff(depr.)))
# A tibble: 24 x 5
# Groups:   id [2]
       t    id  asset end_value depr.
   <int> <int>  <int>     <dbl> <dbl>
 1     1 10010  45145    44393.  752.
 2     2 10010  45145    43653.  740.
 3     3 10010  45145    42925.  728.
 4     4 10010  96730    92935. 1575.
 5     5 10010  96730    91386. 1549.
 6     6 10010  96730    89863. 1523.
 7     7 10010 145511   136333. 2311.
 8     8 10010 145511   134061. 2272.
 9     9 10010 145511   131827. 2234.
10    10 10010 190986   174347. 2955.
# ... with 14 more rows

Which is exactly same as your desired output

Approach-3 using purrr::accumulate()

#approach-3 similar to approach-1 but using dplyr verbs & purrr::accumulate in one single pipe
library(tidyverse)

df %>% group_by(id) %>%
  mutate(asset_add = c(0, diff(asset)),
         end_res = accumulate(asset_add, ~ (.x + .y)*(11.8/12), .init = asset[1])[-1],
         depr. = asset - end_res,
         depr. = c(depr.[1], diff(depr.))) %>%
  ungroup() %>%
  select(-asset_add)

# A tibble: 24 x 5
       t    id  asset end_res depr.
   <int> <int>  <int>   <dbl> <dbl>
 1     1 10010  45145  44393.  752.
 2     2 10010  45145  43653.  740.
 3     3 10010  45145  42925.  728.
 4     4 10010  96730  92935. 1575.
 5     5 10010  96730  91386. 1549.
 6     6 10010  96730  89863. 1523.
 7     7 10010 145511 136333. 2311.
 8     8 10010 145511 134061. 2272.
 9     9 10010 145511 131827. 2234.
10    10 10010 190986 174347. 2955.
# ... with 14 more rows

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM