How to use different columns as prob in the sample function

I have a data frame with several variables, and I have merged three columns of probabilities into it. My question is: how can I use these columns as probabilities in the sample function, so that the prob argument takes a different column depending on y? For example, for y = 1 take column a, for y = 2 take column b, and so on. My data and code are:

    a    b    c    y
1  0.090 0.12 0.10 1
2  0.015 0.13 0.09 1
3  0.034 0.20 0.34 1
4  0.440 0.44 0.70 1
5  0.090 0.12 0.10 2
6  0.015 0.13 0.09 2

mydata$mig<- sample( 1:3, size = 7, replace = TRUE, prob= ????)

Any help would be appreciated.

While the "normal" indexing of 2d matrices and frames is the [i,j] method, one can also provide a 2-column matrix to i alone to programmatically combine the rows and columns. We can use this to create a matrix whose first column merely counts the rows (1:6 here), and whose second column is taken directly from your y column:

cbind(seq_len(nrow(mydata)), mydata$y)
#      [,1] [,2]
# [1,]    1    1
# [2,]    2    1
# [3,]    3    1
# [4,]    4    1
# [5,]    5    2
# [6,]    6    2
mydata[cbind(seq_len(nrow(mydata)), mydata$y)]
# [1] 0.090 0.015 0.034 0.440 0.120 0.130

Note that in this case, your sample()-based code is not going to work:

  • true --> TRUE (R's logical constants are upper-case)
  • the derived probability vector has 6 elements (one per row), but prob must have exactly one entry per element of 1:3, i.e. 3
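The length mismatch above can be reproduced, and one possible fix sketched, as follows. This is only one reading of the question: it assumes each row's a, b, c values are meant to be the three probabilities for drawing a single value from 1:3. The names bad_prob and err are introduced here purely for illustration.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  a = c(0.090, 0.015, 0.034, 0.440, 0.090, 0.015),
  b = c(0.12, 0.13, 0.20, 0.44, 0.12, 0.13),
  c = c(0.10, 0.09, 0.34, 0.70, 0.10, 0.09),
  y = c(1, 1, 1, 1, 2, 2)
)

# Matrix indexing yields one probability per row: 6 values in total
bad_prob <- mydata[cbind(seq_len(nrow(mydata)), mydata$y)]

# sample() needs length(prob) == length(1:3), so this call fails;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(
  sample(1:3, size = 7, replace = TRUE, prob = bad_prob),
  error = function(e) conditionMessage(e)
)

# Assumed interpretation: draw ONE value per row, using that row's
# a, b, c as the (unnormalised) probabilities of 1, 2, 3
set.seed(42)
mydata$mig <- apply(mydata[, c("a", "b", "c")], 1,
                    function(p) sample(1:3, size = 1, prob = p))
```

Drawing one value per row gives a vector with exactly nrow(mydata) entries, so it can be assigned as a new column; size = 7 would not fit a 6-row data frame.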

Using the apply function row by row:

df <- read.table(header = TRUE, text="a    b    c    y
1  0.090 0.12 0.10 1
2  0.015 0.13 0.09 1
3  0.034 0.20 0.34 1
4  0.440 0.44 0.70 1
5  0.090 0.12 0.10 2
6  0.015 0.13 0.09 2")
set.seed(12344)
samples1 <- apply(X = df[,-4], MARGIN = 1, # MARGIN = 1 applies FUN to each row
             FUN = function(x) sample( 1:3, 
                                 size = 7,
                                 replace= TRUE ,
                                 prob = x))
# You obtain six columns of samples, one per row of df, each drawn with that row's a, b, c as prob
samples1
     1 2 3 4 5 6
[1,] 2 3 3 1 3 2
[2,] 1 2 3 3 2 2
[3,] 2 3 3 3 1 3
[4,] 2 3 3 1 3 2
[5,] 2 2 3 2 3 2
[6,] 1 3 2 3 2 3
[7,] 3 3 3 2 1 2

Update: Following your comment on my answer, here is a new solution using data.table. I am leaving the previous version in place for reference in case anyone is interested.

library(data.table)
setDT(df)
set.seed(78787)
# Column V1 holds 7 sampled row positions (1:.N) per group y; prob comes from one column picked at random from a, b, c
df[, sample(1:.N,
            size = 7,
            replace = TRUE,
            prob = unlist(.SD)),
   by = y,
   .SDcols = sample(names(df)[-ncol(df)], 1)]

    y V1
 1: 1  4
 2: 1  3
 3: 1  4
 4: 1  3
 5: 1  4
 6: 1  4
 7: 1  4
 8: 2  2
 9: 2  1
10: 2  1
11: 2  1
12: 2  2
13: 2  1
14: 2  1
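If the probability column should be chosen by the value of y rather than at random, a base R variant might look like the sketch below. It assumes that group y = 1 uses column a, y = 2 uses column b, and so on, which is only one interpretation of the question; prob_cols and draws are hypothetical names.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  a = c(0.090, 0.015, 0.034, 0.440, 0.090, 0.015),
  b = c(0.12, 0.13, 0.20, 0.44, 0.12, 0.13),
  c = c(0.10, 0.09, 0.34, 0.70, 0.10, 0.09),
  y = c(1, 1, 1, 1, 2, 2)
)

# Assumed mapping: y = 1 -> "a", y = 2 -> "b", y = 3 -> "c"
prob_cols <- c("a", "b", "c")

set.seed(78787)
draws <- lapply(split(mydata, mydata$y), function(g) {
  # Probabilities for this group: the column selected by the group's y
  p <- g[[prob_cols[g$y[1]]]]
  # Draw 7 row positions within the group, weighted by p
  sample(seq_len(nrow(g)), size = 7, replace = TRUE, prob = p)
})
```

The result is a list with one element per value of y, each holding 7 row positions within that group, analogous to the V1 column in the data.table output.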
