How to use the apply family of functions to adjust values of a list by grouped index

I have an Excel sheet with a variety of scenarios and values, and I'd like to transform some of those values by adding draws from a random distribution. I'm able to do that one scenario at a time, but I'd like to do it in a more compact way, possibly with the apply family of functions. Here is a small version of my matrix, which I use as a data.table via setDT:

matrixfromexcel =

Scenario  char  num1  num2  num3  val1  val2   val3
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
1         1     0     4     8     1.22  2.31   7.33
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
2         5     2     0     1     4.2   5.011  12.542
...
1200      66    8     1     0     555   120    1700

As you can see, the scenario number separates the values into groups, and there can be a large number of scenarios (thousands or more). Here is what I've used to add random numbers drawn from a normal distribution to the values of one column for one scenario:

matrixfromexcel[Scenario == 1, val1 := val1+rnorm(8, 1.22, 1)]

Here 8 is the number of random values to draw (one per row in the scenario), 1.22 is the value I want the mean centered at, and 1 is the standard deviation of the random numbers.

So if I wanted to repeat this for Scenario 1 through 1000, should I use an apply function or just write a loop? If an apply function is the better choice, could you show me your suggestion? Thank you.

You can leverage the by argument in data.table together with the special symbol .N, which holds the number of rows within each group. Here's something to get you started:

library(data.table)
#> Warning: package 'data.table' was built under R version 3.4.4
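# Toy data: three scenarios with 8, 5 and 3 rows
dt <- data.table(Scenario = rep(c(1, 2, 3), times = c(8, 5, 3)),
                 val1 = rep(c(1.22, 4.2, 6), times = c(8, 5, 3)))
# Within each Scenario, draw .N values (one per row) and add them to val1
dt[, new_val := val1 + rnorm(.N, val1, 1), keyby = Scenario]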

Created on 2019-01-16 by the reprex package (v0.2.1)

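For transparency, I created new_val rather than overwriting val1, but you can modify that as you see fit. Also note that you currently pass 1 as the sd argument of rnorm(). If that's what you intended, great. If not, modify accordingly. Keep in mind as well that adding val1 to a draw whose mean is already val1 centers the result on 2 * val1; if you want the result centered on val1 itself, use a mean of 0 for the noise or drop the val1 + term.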
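
Since the question mentions the apply family and several value columns, here is a minimal sketch of how the same grouped approach might extend to val1, val2 and val3 at once, using lapply() over .SD. The toy data, the cols vector, the fixed seed, and the choice to center each draw on the original value with sd = 1 are all assumptions for illustration, not something taken from the original spreadsheet.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Toy data shaped like the question's table (two scenarios, 8 rows each)
dt <- data.table(
  Scenario = rep(c(1, 2), each = 8),
  val1 = rep(c(1.22, 4.2), each = 8),
  val2 = rep(c(2.31, 5.011), each = 8),
  val3 = rep(c(7.33, 12.542), each = 8)
)

# Hypothetical selection of the columns to perturb
cols <- c("val1", "val2", "val3")

# For each Scenario, lapply() visits every column in .SDcols and replaces it
# with one normal draw per row, centered on that row's original value (sd = 1).
dt[, (cols) := lapply(.SD, function(x) rnorm(length(x), mean = x, sd = 1)),
   by = Scenario, .SDcols = cols]
```

This overwrites the value columns in place; assign to new column names instead if the originals need to be kept. Because rnorm() is vectorized over mean, the grouping is not strictly required when every row is centered on its own value, but by = Scenario becomes useful once the mean or sd should differ per scenario.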
