How to apply the function to each row?

I want to generate 4 new columns from an existing variable total by random sampling. The results for each row should satisfy s1 + s2 + s3 + s4 == total. For example,

> tabulate(sample.int(4, 100, replace = TRUE))
[1] 22 21 27 30

The following code does not work: it appears to use only the first row's total and then recycle the resulting counts down every column.

library(data.table)
DT <- data.table(total = c(100, 110, 90, 92))
DT[, c(paste0("s", 1:4)) := tabulate(sample.int(4, total, replace = TRUE))]

> DT
   total s1 s2 s3 s4
1:   100 31 31 31 31
2:   110 25 25 25 25
3:    90 22 22 22 22
4:    92 22 22 22 22

How can I get around this? I am clearly missing some basic understanding of how R vectors and lists work. Your help will be much appreciated.

Edited following edited question:

data.table expects a list on the right-hand side of := when you assign to several columns at once; a plain vector is recycled down each column instead. To get a separate draw for each row, convert the counts to a list and group by row:

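library(data.table)
DT <- data.table(total = c(100, 110, 90, 102, 92))
# nbins = 4 guarantees four counts even if category 4 is never drawn
DT[, paste0("s", 1:4) := {
  as.list(tabulate(sample.int(4, total, replace = TRUE), nbins = 4))
  }, by = seq_len(nrow(DT))]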

This gives output like the following (the exact values vary from run to run because the draws are random), which satisfies the OP's criterion:

> DT
   total s1 s2 s3 s4
1:   100 27 28 28 17
2:   110 25 23 36 26
3:    90 26 19 26 19
4:   102 28 24 21 29
5:    92 17 27 22 26
> apply(DT[, 2:5],1, sum)
[1] 100 110  90 102  92
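As a side note, tabulating `total` draws from four equally likely categories is a multinomial draw, so the same idea can be sketched with base R's rmultinom. This is only an illustrative variant, not part of the answer above; the set.seed call is there just to make the sketch reproducible.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
DT <- data.table(total = c(100, 110, 90, 102, 92))

# One multinomial draw per row: `total` trials over 4 equally likely categories.
# rmultinom() returns a 4 x 1 matrix; [, 1] extracts the counts as a vector,
# and as.list() spreads them across the four new columns.
DT[, paste0("s", 1:4) := as.list(rmultinom(1, total, prob = rep(1, 4))[, 1]),
   by = seq_len(nrow(DT))]

# Every row should add up to its total.
stopifnot(all(DT[, s1 + s2 + s3 + s4] == DT$total))
```

Because rmultinom always returns one count per category, no nbins-style guard is needed here.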

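You could also try the code below. For each total x it picks 3 distinct cut points between 1 and x - 1 and takes the gaps between consecutive cut points (including 0 and x) as the four parts, so every part is at least 1 and the parts sum to x. Note that this is a different sampling scheme from tabulating category draws, so the parts are spread much less evenly.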

DTout <- cbind(
  DT,
  do.call(
    rbind,
    lapply(DT$total, function(x) diff(sort(c(0, sample(x - 1, 3), x))))
  )
)

which gives

   total V1 V2 V3 V4
1:   100 51  5 17 27
2:   110 41  1 40 28
3:    90 32 34 14 10
4:   102  5 73 13 11
5:    92 17 13 17 45

Test

> rowSums(DTout[,-1])
[1] 100 110  90 102  92
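If it helps, the cut-point idea can be wrapped in a small function. The sketch below uses base R only; split_total and its argument k are hypothetical names introduced here for illustration, and the guard assumes each total is an integer of at least k.

```r
# Hypothetical helper: split a positive integer x into k positive integer parts.
split_total <- function(x, k = 4) {
  stopifnot(x >= k)                        # need at least 1 per part
  cuts <- sort(sample.int(x - 1, k - 1))   # k - 1 distinct cut points in 1..(x - 1)
  diff(c(0, cuts, x))                      # gaps between consecutive cut points
}

totals <- c(100, 110, 90, 102, 92)

# vapply() returns a 4 x 5 matrix (one column per total); transpose to one row per total.
parts <- t(vapply(totals, split_total, numeric(4)))

# Each row sums to its total and no part is zero.
stopifnot(all(rowSums(parts) == totals), all(parts >= 1))
```

The result can be attached to the table with cbind(DT, parts), as in the code above.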
