How to use different columns as prob in the sample function

Question

I have a data frame with several variables, and I have merged three columns of probabilities into it. How can I use these columns as probabilities in the sample function, so that the prob argument takes a different column depending on y? For example, for y = 1 take column a, for y = 2 take column b, and so on. My data and code are:

    a    b    c    y
1  0.090 0.12 0.10 1
2  0.015 0.13 0.09 1
3  0.034 0.20 0.34 1
4  0.440 0.44 0.70 1
5  0.090 0.12 0.10 2
6  0.015 0.13 0.09 2

mydata$mig<- sample( 1:3, size = 7, replace = TRUE, prob= ????)

Any help would be appreciated.

Answer 1

While the "normal" way to index 2-D matrices and data frames is the [i, j] method, one can also pass a two-column matrix as i alone, where each row of that matrix is a (row, column) pair naming one element to extract. We can use this to build a matrix whose first column simply counts the rows (1:6 here) and whose second column is taken directly from your y column:

cbind(seq_len(nrow(mydata)), mydata$y)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    1
# [3,]    3    1
# [4,]    4    1
# [5,]    5    2
# [6,]    6    2
mydata[cbind(seq_len(nrow(mydata)), mydata$y)]
# [1] 0.090 0.015 0.034 0.440 0.120 0.130

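Note that, even with these values, your sample()-ing code is not going to work as written:

true --> TRUE (R's logical constants are upper case)
the derived probabilities have length 6 (one per row), which is not the same length as your 1:3; sample() needs exactly one probability per element being sampled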
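
To make the length mismatch concrete, here is a minimal sketch in base R. It rebuilds the question's data under the name mydata, shows that sample() rejects six weights for 1:3, and then shows one possible workaround. The workaround is an assumption about what is wanted (drawing 7 row positions within each y group, weighted by the column that y selects), not something the question states; the names p, res and draws are made up for illustration.

```r
# Rebuild the example data from the question (assumed to be called mydata)
mydata <- data.frame(
  a = c(0.090, 0.015, 0.034, 0.440, 0.090, 0.015),
  b = c(0.12, 0.13, 0.20, 0.44, 0.12, 0.13),
  c = c(0.10, 0.09, 0.34, 0.70, 0.10, 0.09),
  y = c(1, 1, 1, 1, 2, 2)
)

# One weight per row, taken from the column whose position equals y
p <- mydata[cbind(seq_len(nrow(mydata)), mydata$y)]

# sample() needs length(prob) == length(x), so 1:3 with six weights is an error
res <- try(sample(1:3, size = 7, replace = TRUE, prob = p), silent = TRUE)
inherits(res, "try-error")

# Assumed reading: within each y group, draw 7 row positions weighted by p
set.seed(1)
draws <- lapply(split(p, mydata$y), function(w) {
  sample(seq_along(w), size = 7, replace = TRUE, prob = w)
})
draws
```

The weights do not have to sum to 1; sample() rescales them internally.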

Answer 2

Using the apply function row by row:

df <- read.table(header = TRUE, text="a    b    c    y
1  0.090 0.12 0.10 1
2  0.015 0.13 0.09 1
3  0.034 0.20 0.34 1
4  0.440 0.44 0.70 1
5  0.090 0.12 0.10 2
6  0.015 0.13 0.09 2")
set.seed(12344)
samples1 <- apply(X = df[, -4],  # drop y, keep the probability columns a, b, c
                  MARGIN = 1,    # MARGIN = 1 applies FUN to each row
                  FUN = function(x) sample(1:3,
                                           size = 7,
                                           replace = TRUE,
                                           prob = x))
# You obtain six columns of 7 draws each; column k uses row k of df as prob
samples1
     1 2 3 4 5 6
[1,] 2 3 3 1 3 2
[2,] 1 2 3 3 2 2
[3,] 2 3 3 3 1 3
[4,] 2 3 3 1 3 2
[5,] 2 2 3 2 3 2
[6,] 1 3 2 3 2 3
[7,] 3 3 3 2 1 2
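
If the goal is a single new column such as the mig column in the question, the same row-wise idea can be reduced to one draw per row. The following is only a sketch under that assumption (one draw from 1:3 per row, weighted by that row's a, b, c); the seed value is arbitrary.

```r
# Same example data as above, rebuilt so the snippet runs on its own
df <- data.frame(
  a = c(0.090, 0.015, 0.034, 0.440, 0.090, 0.015),
  b = c(0.12, 0.13, 0.20, 0.44, 0.12, 0.13),
  c = c(0.10, 0.09, 0.34, 0.70, 0.10, 0.09),
  y = c(1, 1, 1, 1, 2, 2)
)

set.seed(42)
# One draw from 1:3 per row; the row's a, b, c values act as the weights
df$mig <- apply(df[, c("a", "b", "c")], 1,
                function(w) sample(1:3, size = 1, prob = w))
df
```

Because size = 1, apply() returns a plain vector with one element per row, so it can be assigned directly as a column.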

Update: Given your comment on my answer, I am proposing a new solution using data.table. I am leaving the previous version for reference in case anyone is interested.

library(data.table)
setDT(df)
set.seed(78787)
# Column V1 holds 7 sampled row positions (1 to .N) per group y; the probs come from one column picked at random from a, b, c
df[, sample(1:.N,
            size = 7,
            replace = TRUE,
            prob = unlist(.SD)),
   by = y,
   .SDcols = sample(names(df)[-ncol(df)], 1)]

    y V1
 1: 1  4
 2: 1  3
 3: 1  4
 4: 1  3
 5: 1  4
 6: 1  4
 7: 1  4
 8: 2  2
 9: 2  1
10: 2  1
11: 2  1
12: 2  2
13: 2  1
14: 2  1


solution1
0 2021-03-29 17:35:05

solution2
0 2021-03-29 17:38:44
