I have a data frame with several variables, and I have merged three columns of probabilities into it. My question is: how can I use these columns as probabilities in the sample function, so that the prob argument takes a different column depending on y? For example, for y = 1 take column a, for y = 2 take column b, and so on. My data and code are:
a b c y
1 0.090 0.12 0.10 1
2 0.015 0.13 0.09 1
3 0.034 0.20 0.34 1
4 0.440 0.44 0.70 1
5 0.090 0.12 0.10 2
6 0.015 0.13 0.09 2
mydata$mig<- sample( 1:3, size = 7, replace = TRUE, prob= ????)
Any help would be appreciated.
While the "normal" indexing of 2d matrices and frames is the [i,j] method, one can also provide a 2-column matrix to i alone to programmatically combine the rows and columns. We can use this to create a matrix whose first column is merely counting the rows (1:6 here), and the second column is taken directly from your y column:
cbind(seq_len(nrow(mydata)), mydata$y)
# [,1] [,2]
# [1,] 1 1
# [2,] 2 1
# [3,] 3 1
# [4,] 4 1
# [5,] 5 2
# [6,] 6 2
mydata[cbind(seq_len(nrow(mydata)), mydata$y)]
# [1] 0.090 0.015 0.034 0.440 0.120 0.130
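For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same matrix-indexing idea. The data frame is rebuilt from the table in the question; the names prob_cols, m, and p are only illustrative, and the mapping y = 1 -> a, y = 2 -> b, y = 3 -> c is an assumption about what the question intends.

```r
# Rebuild the example data from the question
mydata <- data.frame(
  a = c(0.090, 0.015, 0.034, 0.440, 0.090, 0.015),
  b = c(0.12, 0.13, 0.20, 0.44, 0.12, 0.13),
  c = c(0.10, 0.09, 0.34, 0.70, 0.10, 0.09),
  y = c(1, 1, 1, 1, 2, 2)
)

# Assumed mapping: y = 1 -> a, y = 2 -> b, y = 3 -> c
prob_cols <- c("a", "b", "c")

# Select the probability columns by name, so the lookup does not depend on
# where those columns sit in the data frame
m <- as.matrix(mydata[prob_cols])

# Two-column index matrix: (row number, column number given by y)
mydata$p <- m[cbind(seq_len(nrow(m)), mydata$y)]
mydata$p
```

Selecting the columns by name first is a small design choice: it keeps the y-to-column lookup valid even if other variables are later added in front of a, b, and c.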
Note that in this case, your sample-ing code is not going to work:
true --> TRUE
1:3
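One mismatch that can be checked directly: sample() needs prob to have the same length as the vector being sampled from, so six per-row probabilities cannot be passed alongside 1:3. The sketch below shows the failure and, as one possible reading of the question (an assumption, not necessarily what is wanted), draws row indices weighted by those probabilities instead. The names p, err, and draws are illustrative.

```r
# Per-row probabilities picked out by the matrix indexing above
p <- c(0.090, 0.015, 0.034, 0.440, 0.120, 0.130)

# Three values to sample from but six probabilities: this is an error,
# captured here so the script keeps running
err <- tryCatch(
  sample(1:3, size = 7, replace = TRUE, prob = p),
  error = function(e) conditionMessage(e)
)
err

# Lengths agree when sampling from as many values as there are probabilities.
# The weights do not need to sum to 1; sample() rescales them.
set.seed(1)
draws <- sample(seq_along(p), size = 7, replace = TRUE, prob = p)
draws
```

Also note that a vector of 7 draws cannot be assigned to a column of a 6-row data frame, so size would have to match nrow(mydata) if the result is meant to become mydata$mig.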
Using the apply function per row:
df <- read.table(header = TRUE, text="a b c y
1 0.090 0.12 0.10 1
2 0.015 0.13 0.09 1
3 0.034 0.20 0.34 1
4 0.440 0.44 0.70 1
5 0.090 0.12 0.10 2
6 0.015 0.13 0.09 2")
set.seed(12344)
samples1 <- apply(X = df[, -4], MARGIN = 1, # MARGIN = 1 applies FUN to each row
                  FUN = function(x) sample(1:3,
                                           size = 7,
                                           replace = TRUE,
                                           prob = x))
# You obtain six columns of samples, one per row of df, each drawn with that row's a, b, c as prob
samples1
1 2 3 4 5 6
[1,] 2 3 3 1 3 2
[2,] 1 2 3 3 2 2
[3,] 2 3 3 3 1 3
[4,] 2 3 3 1 3 2
[5,] 2 2 3 2 3 2
[6,] 1 3 2 3 2 3
[7,] 3 3 3 2 1 2
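If the goal is a single new column such as mig with one draw per row, the same apply approach can be used with size = 1. This is only a sketch under that assumption; the column name mig comes from the question, while selecting the columns by name is my own choice.

```r
df <- read.table(header = TRUE, text = "a b c y
1 0.090 0.12 0.10 1
2 0.015 0.13 0.09 1
3 0.034 0.20 0.34 1
4 0.440 0.44 0.70 1
5 0.090 0.12 0.10 2
6 0.015 0.13 0.09 2")

set.seed(12344)
# One draw from 1:3 per row, using that row's a, b, c as (unnormalised) weights.
# With size = 1, apply() returns a plain vector with one element per row.
df$mig <- apply(df[, c("a", "b", "c")], 1,
                function(p) sample(1:3, size = 1, prob = p))
df
```

The weights in a row do not have to sum to 1 (row 4 sums to more than 1, for instance), because sample() rescales prob internally.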
Update: Given your comment on my answer, I am updating it with a new solution using data.table. I am leaving the previous version for reference in case anyone is interested.
library(data.table)
setDT(df)
set.seed(78787)
# Column V1 holds 7 sampled row indices (1:.N) per group y,
# with prob taken from one column chosen at random from a, b, c
df[, sample(1:.N,
            size = 7,
            replace = TRUE,
            prob = unlist(.SD)),
   by = y,
   .SDcols = sample(names(df)[-ncol(df)], 1)]
y V1
1: 1 4
2: 1 3
3: 1 4
4: 1 3
5: 1 4
6: 1 4
7: 1 4
8: 2 2
9: 2 1
10: 2 1
11: 2 1
12: 2 2
13: 2 1
14: 2 1
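For comparison, here is a base R sketch of the same grouped sampling that picks the probability column from y rather than at random. This is a swapped-in variant, not the data.table code above: the mapping y = 1 -> a, y = 2 -> b is an assumption based on the question, and prob_cols, draws, and res are illustrative names.

```r
df <- read.table(header = TRUE, text = "a b c y
1 0.090 0.12 0.10 1
2 0.015 0.13 0.09 1
3 0.034 0.20 0.34 1
4 0.440 0.44 0.70 1
5 0.090 0.12 0.10 2
6 0.015 0.13 0.09 2")

# Assumed mapping from y to the probability column
prob_cols <- c("a", "b", "c")

set.seed(78787)
# For each group of y, draw 7 row indices within the group, weighted by the
# column that corresponds to that group's y value
draws <- lapply(split(df, df$y), function(g) {
  col <- prob_cols[g$y[1]]
  sample(seq_len(nrow(g)), size = 7, replace = TRUE, prob = g[[col]])
})

# Stack the results in the same long layout as the data.table output
res <- data.frame(
  y  = rep(as.numeric(names(draws)), each = 7),
  V1 = unlist(draws, use.names = FALSE)
)
res
```

As in the data.table version, V1 is a row position within its group, so values for y = 1 range over 1 to 4 and values for y = 2 over 1 to 2.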