
Year-over-base-year growth in R using diff(), with the base year restarting every 3 years

I am trying to calculate "year over base year" growth by columns.

I use a matrix M (in the reproducible code below) as my dataset. Each column of the matrix holds the data for one year, i.e., column 1 is year 1, and so on.

My base year needs to restart every 3 years, i.e., in year 4, year 7, and so on.

To calculate the growth rate over the previous year with the diff function, i.e., (column2/column1) - 1, then (column3/column2) - 1, and so on, I would write:

estimate_r <- function(M, period = NULL, bycols = TRUE,
                       rm_cols = TRUE, only_dif = FALSE) {
  # note: rm_cols is accepted but not used below

  # number of years: columns if bycols, otherwise rows
  l <- if (bycols) ncol(M) else nrow(M)
  if (is.null(period)) period <- l

  if (l %% period != 0) {
    stop("Matrix dimension is not divisible by ", period)
  }

#this part needs to be modified-----------

  # by columns
  if (bycols) {
    difs <- t(diff(t(M)))               # column j + 1 minus column j
    # growth values that cross into a new base year (3, 6 for l = 9, period = 3)
    pop <- period * seq_len((l - 1) %/% period)
    if (only_dif) {
      difs[, pop] <- NA
      return(cbind(NA, difs))
    }
    lag_M <- M[, -l, drop = FALSE]      # previous-year values
    r <- difs / lag_M
    r[, pop] <- NA
    r <- cbind(NA, r)
    return(r)
  }

   #-----------

  # by rows
  difs <- diff(M)                       # row i + 1 minus row i
  # NAs instead of removing rows; diff() leaves only l - 1 rows
  pop <- period * seq_len((l - 1) %/% period)
  if (only_dif) {
    difs[pop, ] <- NA
    return(rbind(NA, difs))
  }
  lag_M <- M[-l, , drop = FALSE]        # previous-year values
  r <- difs / lag_M
  r[pop, ] <- NA
  r <- rbind(NA, r)
  return(r)
}


# ---- reproducible matrices ----
M <- matrix(1:27, ncol = 9)
M
estimate_r(M, period = 3, bycols = TRUE, rm_cols = FALSE, only_dif = FALSE)

This gives me the matrix M (each column represents a year):

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    4    7   10   13   16   19   22   25
[2,]    2    5    8   11   14   17   20   23   26
[3,]    3    6    9   12   15   18   21   24   27

and the growth over the previous year (M_t / M_{t-1} - 1), by columns, restarting at each base year (columns 1, 4 and 7 are NA):

     [,1] [,2] [,3] [,4]      [,5]      [,6] [,7]      [,8]      [,9]
[1,]   NA  3.0 0.75   NA 0.3000000 0.2307692   NA 0.1578947 0.1363636
[2,]   NA  1.5 0.60   NA 0.2727273 0.2142857   NA 0.1500000 0.1304348
[3,]   NA  1.0 0.50   NA 0.2500000 0.2000000   NA 0.1428571 0.1250000
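For comparison, the same previous-year growth can be computed without diff() by dividing each column by the column before it and then blanking the base-year columns. This is a minimal sketch; the helper name prev_year_growth is hypothetical and not part of the question's code.

```r
# Hypothetical standalone sketch: previous-year growth by columns,
# set to NA at each base year (columns 1, 1 + period, 1 + 2 * period, ...).
prev_year_growth <- function(M, period) {
  n <- ncol(M)
  # growth of column j over column j - 1, for j = 2..n; column 1 has no predecessor
  r <- cbind(NA, M[, -1, drop = FALSE] / M[, -n, drop = FALSE] - 1)
  # a base year has no earlier year inside its own block
  r[, seq(1, n, by = period)] <- NA
  r
}

M <- matrix(1:27, ncol = 9)
prev_year_growth(M, period = 3)
```

With period = 3 this should reproduce the matrix above, with NA in columns 1, 4 and 7.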

But I am unsure whether it is possible to use a variable lag with diff, choosing 1 or 2 years depending on how far I am from the base year.
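diff() takes a single lag value, so one way to get a lag that varies with the distance from the base year is to compute that distance for every column and index the base column directly. A minimal sketch, assuming the period-3 layout of M above; the names lags and base are illustrative only.

```r
# Hypothetical sketch: diff() accepts one fixed lag, so compute the variable
# lag (years since the base year) and the base column for every year instead.
M <- matrix(1:27, ncol = 9)
period <- 3
j <- seq_len(ncol(M))          # year (column) index
lags <- (j - 1) %% period      # 0 1 2 0 1 2 0 1 2
base <- j - lags               # 1 1 1 4 4 4 7 7 7

# variable-lag difference: each year minus the base year of its block
d <- M[, j] - M[, base]
d
```

For this M every entry of columns 1, 4 and 7 is 0, columns 2, 5 and 8 are 3, and columns 3, 6 and 9 are 6.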

What I need is "year 2 over year 1" (column 2 over column 1) growth, and then "year 3 over year 1". Then restart in year 4 and do the same: "year 5 over year 4", "year 6 over year 4", and so on.

My base years (columns) will therefore be 1, 4 and 7.

Any help is much appreciated.
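The requested year-over-base-year growth can be sketched in base R by dividing every column by the base column of its block. This is one possible approach, not the asker's or the answerer's code; the helper growth_over_base and its arguments are hypothetical, and the by-rows case is handled by transposing.

```r
# Hypothetical helper (not part of the question or the answer): growth of each
# year over the base year of its block; base years themselves are set to NA.
growth_over_base <- function(M, period, bycols = TRUE) {
  # years in rows: transpose, compute by columns, transpose back
  if (!bycols) return(t(growth_over_base(t(M), period, bycols = TRUE)))
  j <- seq_len(ncol(M))
  base <- j - (j - 1) %% period            # base-year column of each year
  r <- M / M[, base, drop = FALSE] - 1     # year over its base year
  r[, base == j] <- NA                     # no growth reported for base years
  r
}

M <- matrix(1:27, ncol = 9)
growth_over_base(M, period = 3)
```

For this M, column 2 is 3, 1.5, 1 (year 2 over year 1) and column 3 is 6, 3, 2 (year 3 over year 1), and the pattern repeats from base years 4 and 7.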

How about using data.table::shift(), whose n argument sets the size of the lag? Wrap the code in a function like this:

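library(data.table)

# growth of each year over the year `yr` positions earlier, computed within
# blocks of `period` years (shift() gives NA where the lag leaves the block)
get_n_prev_yr_growth <- function(M, yr = 1, period = 3) {
  # one row per year, plus a block id g (1, 1, 1, 2, 2, 2, ... for period = 3)
  d = as.data.table(t(M))[, g := rep(1:(ncol(M)/period), each = period)]
  # \(i) is shorthand for function(i) (R >= 4.1.0); drop g and transpose back
  t(d[, lapply(.SD, \(i) i/shift(i, yr) - 1), g, .SDcols = patterns("^V")][, !c("g")])
}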

Then call the function with the input matrix and the lag (the default is 1):

get_n_prev_yr_growth(M=M,yr=1)

   [,1] [,2] [,3] [,4]      [,5]      [,6] [,7]      [,8]      [,9]
V1   NA  3.0 0.75   NA 0.3000000 0.2307692   NA 0.1578947 0.1363636
V2   NA  1.5 0.60   NA 0.2727273 0.2142857   NA 0.1500000 0.1304348
V3   NA  1.0 0.50   NA 0.2500000 0.2000000   NA 0.1428571 0.1250000

Or, for a two-year lag:

get_n_prev_yr_growth(M=M,yr=2)

   [,1] [,2] [,3] [,4] [,5]      [,6] [,7] [,8]      [,9]
V1   NA   NA    6   NA   NA 0.6000000   NA   NA 0.3157895
V2   NA   NA    3   NA   NA 0.5454545   NA   NA 0.3000000
V3   NA   NA    2   NA   NA 0.5000000   NA   NA 0.2857143
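The lag-1 and lag-2 results can also be merged into one year-over-base-year matrix by dividing each year by the first year of its block rather than by a shifted value. A minimal base-R sketch of that idea, using ave() in place of data.table grouping; the helper base_growth_ave is hypothetical.

```r
# Hypothetical base-R variant of the grouping idea above: instead of dividing
# by shift(i, yr), divide every year by the first year of its block.
base_growth_ave <- function(M, period = 3) {
  g <- (seq_len(ncol(M)) - 1) %/% period   # block id of each column: 0 0 0 1 1 1 ...
  # ave() applies FUN within each block of one row; apply() returns years in rows
  r <- t(apply(M, 1, function(x) ave(x, g, FUN = function(v) v / v[1] - 1)))
  r[, !duplicated(g)] <- NA                # base years have no growth
  r
}

M <- matrix(1:27, ncol = 9)
base_growth_ave(M, period = 3)
```

Inside the data.table function above, replacing i/shift(i, yr) - 1 with i/i[1] - 1 should give the same values, except that the base-year columns come out as 0 instead of NA.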
