R Loop for matched data.table columns

Question

I am really struggling to create a function that runs a model where all the variables a , b , d , g & N have multiple versions as shown in the data.table below which I've named crm :

crm = data.table(
  East = 26500,
  North = c(115000, 120000, 125000, 130000, 135000, 140000), 
  rain = c(1049.61, 1114.31, 1361.61, 1407.2, 1499.56, 1654.13), 
  crop = 'Wheat', area = c(0.1718, 0.1629, 0.1082, 0.0494, 0.02, 0.004), 
  rn = c("10007", "10018", "10023", "10024", "10025", "10026"), 
  N1 = 184.262648839489, N2 = 180.312874871521, N3 = 178.615847839997,
  N4 = 182.531626054579, a1 = 0.186117715072018, a2 = -0.0232731908915799,
  a3 = 0.227017532149122, a4 = 0.162943230565506, b1 = 0.000478900233700419,
  b2 = 0.000787931973696371, b3 = 0.000458478256537521, b4 = 0.000517304324750896,
  d1 = -0.000328164576390286, d2 = -0.000112122093240884, d3 = 0.000112702113716146,
  d4 = 7.40875908059628e-05, g1 = 4.04709473710477e-06, g2 = 3.68724096485995e-06,
  g3 = 3.47214450131546e-06, g4 = 3.55825543257538e-06, key = 'rn'
)

What I'm trying to do is run the function below to calculate a value for lnN and put it into a column that has the same number in the heading as the variables entered into the model. Ie Using a1 , b1 , d1 , g1 & N1 will produce the column lnN1 and so on for all the 2s, 3s and 4s.

n <- 1:4
cols <- paste0("lnN",n)
for(i in 1:length(n)){
crm[,(cols) := lapply(.SD ,function (x) {
  N = crm[,7+i]
  a = crm[,11+i]
  b = crm[,15+i]
  d = crm[,19+i]
  g = crm[,23+i]
a + (b*crm[,rain]) + (g*N) + (d*crm[,rain]*N)}), .SDcols = paste0("N",n)]

}

I've yet to find an example anywhere on how to accomplish this. I've tried using mapply but I can't see how to iterate mapply through all the iterations of each variable. Thank you for your help!

Answer 1

How about:

library(dplyr)
cbind(crm, do.call(cbind, 
  lapply(1:4, function(x) {
    select(crm, c(contains(as.character(x)), rain)) %>% 
      setnames(gsub("[0-9]", "", names(.))) %>%
      transmute(lnN = a + (b*rain) + (g*N) + (d*rain*N)) %>%
      setnames(paste0("lnN", x))
  })
))

The main idea is, for each number, select only the columns that contain the number (and also rain ), rename the columns to remove the numbers, apply the formula, rename the resulting column to append the number, and then cbind the result onto the original table.

Answer 2

So having gone and looked at the comments above and realized there could be issues when trying to change the number of iterations into the thousands as both Frank and Weihuang suggested I reconsidered how I structured my data.

What I did instead was to leave the randomly generated matrices of variables as separate dataframes. e contains multivariate random values for a , b , g & d with N (now called Nit ) in rnm . So crm now only has the first six columns. The code looks like this:

for(i in 1:n){
  a = e[1,i]
  b = e[2,i]
  g = e[3,i]
  d = e[4,i]
Nit = rnm[i]
bob = a + (b*crm$rain) + (g*Nit) + (d*crm$rain*Nit)
data.y <- cbind(data.y, bob)
}
crm <- cbind(crm, data.y)
names(crm)[c(7:n)] = names(bobs)

For each iteration of 1:n, it reads the i value of each parameter (all the 1s, all the 2s, etc.) and places it into the model and creates a column called bob . bob is then merged into an empty data frame that I created before the function ( data.y ). This loops until the desired number of loops is achieved.

I then used cbind to merge the two together before renaming all the bob columns sequentially by using the names stored in a data frame bobs that contains a list of column headings numbered bob.1 to bob.n that is read in from a .csv file I produced in Excel.

Answer 3

Here's a melt -and- dcast ( recast ) version leveraging the newly implemented functionality of melt for patterns to utilize provided names. See the Installation wiki for instructions as this is currently a feature in development.

library(data.table) 1.10.5+
# create character version of 1:N (Number of output columns)
N = paste0(seq_len(length(grep('^b', names(crm)))))
# join crm to a melt & recast version of itself using rain as 
#   the join key (note this will fail if the amount of rain may
#   not be unique -- in this case, we should include some ID in
#   id.vars, like rn, and adjust accordingly)
crm = crm[crm[ , melt(.SD, id.vars = 'rain', 
                      measure.vars = patterns(N = '^N[0-9]', a = '^a[0-9]', 
                                              b = '^b', d = '^d', g = '^g'))
               # use the formula to generate ln
               ][ , ln := a + b*rain + g * N + d * rain * N
                  # reshape wide
                  ][ , dcast(.SD, rain ~ variable, value.var = 'ln')
                     # rename the columns here
                     ][ , setnames(.SD, N, paste0('ln', N))],
          on = 'rain']

# by-reference version
crm[crm[ , melt(.SD, id.vars = 'rain', 
                measure.vars = patterns(N = '^N[0-9]', a = '^a[0-9]', 
                                        b = '^b', d = '^d', g = '^g'))
         # use the formula to generate ln
         ][ , ln := a + b*rain + g * N + d * rain * N
            # reshape wide
            ][ , dcast(.SD, rain ~ variable, value.var = 'ln')],
    # mget tends to be sort of slow, which is why I used the
    #   assign-by-copy approach first above; in larger examples,
    #   this slow-down may be outweighed by the cost of copying
    paste0('ln', N) := mget(N), on = 'rain']

R Loop for matched data.table columns

Question

3 answers

solution1
2 ACCPTED 2016-08-25 14:22:16

solution2
0 2016-08-26 13:08:56

solution3
0 2018-01-18 01:23:57

R Loop for matched data.table columns

Question

3 answers

solution1 2 ACCPTED 2016-08-25 14:22:16

solution2 0 2016-08-26 13:08:56

solution3 0 2018-01-18 01:23:57

solution1
2 ACCPTED 2016-08-25 14:22:16

solution2
0 2016-08-26 13:08:56

solution3
0 2018-01-18 01:23:57