Simulations in R using probability
myfunction3 <- function(seq2,z)
for(j in 1:100)
{
if(z[j]>0.7)
{
if(seq2[j] =='A') replace(seq2,j,sample(c("C","G","T"),1))
else if(seq2[j] =='G') replace(seq2,j,sample(c("C","A","T"),1))
else if(seq2[j] =='T') replace(seq2,j,sample(c("C","G","A"),1))
else if(seq2[j] =='C') replace(seq2,j,sample(c("A","G","T"),1))
else if(seq2[j]=='E') replace(seq2,j,'T')
}
}
return(seq2)
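I wrote this function to simulate a given DNA sequence seq2 according to a probability vector z: wherever the probability is greater than 0.7, that position in the new sequence should hold any one of the other three nucleotides (out of A, G, T, C). But it returns a NULL vector every time.
Here is a compact variant of your function: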
myfunction3 <- function(seq2,z) {
for(j in which(z>0.7))
seq2[j] <- switch(seq2[j],
A=sample(c("C","G","T"),1),
G=sample(c("C","A","T"),1),
T=sample(c("C","G","A"),1),
C=sample(c("A","G","T"),1),
E="T"
)
return(seq2)
}
Here is how it works:
set.seed(42)
z <- sample(1:10)/10
seq <- sample(c("A","G","T", "C"), 10, repl=TRUE)
data.frame(seq, z, seq2=myfunction3(seq,z))
# seq z seq2
# 1 G 1.0 T
# 2 T 0.9 C
# 3 C 0.3 C
# 4 G 0.6 G
# 5 G 0.4 G
# 6 C 0.8 T
# 7 C 0.5 C
# 8 A 0.1 A
# 9 G 0.2 G
# 10 T 0.7 T
Testing the last condition (E = "T"):
set.seed(42)
z <- sample(3:17)/10
seq <- sample(c("A","G","T", "C", "E"), length(z), repl=TRUE)
data.frame(seq, z, seq2=myfunction3(seq,z))
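As for why the version in the question comes back as NULL, two things appear to be at play. This is a minimal sketch with made-up names (f, v), not the original code. First, a function defined without braces takes only the next expression as its body, so here the body is the for loop, and a for loop evaluates to NULL. Second, replace() returns a modified copy and leaves its argument untouched unless the result is assigned back.

```r
# 1. No braces: the function body is just the for loop,
#    and the value of a for loop is NULL.
f <- function(x)
  for (i in seq_along(x)) x[i] <- x[i] * 2

res <- f(1:3)
is.null(res)            # TRUE

# 2. replace() does not modify its argument in place.
v <- c("A", "G")
replace(v, 1, "T")      # returns c("T", "G"), but v is unchanged
v_before <- v           # still c("A", "G")
v <- replace(v, 1, "T") # assign the result back to keep the change
```

Wrapping the whole body in { } and assigning each replacement back (or using seq2[j] <- ... directly, as in the compact variant) addresses both points.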
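I assume seq2 is a character vector, z is a vector of the same length, and you want to mutate the positions in seq2 where z > 0.7.
One approach is to first create a list of valid substitutions keyed by nucleotide, then write a mutate function, and then use sapply to apply that function to the elements of seq2 where z > 0.7: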
substitutions <- list(A = c("C","G","T"),
G = c("A","C","T"),
T = c("A","C","G"),
C = c("A","G","T"),
E = c("T"))
mutate <- function(nucleotide){
sample(substitutions[[nucleotide]],1)
}
myfunc <- function(seq2,z){
to.change <- which(z > 0.7)
seq2[to.change] <- sapply(seq2[to.change],mutate)
seq2
}
For example:
> s <- sample(c("A","T","G","C","E"),10, replace = T)
> z <- sample(c(0,0.8),10, replace = T)
> rbind(s,z,myfunc(s,z))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
s "E" "A" "C" "G" "E" "C" "E" "T" "E" "A"
z "0.8" "0" "0" "0.8" "0" "0.8" "0.8" "0.8" "0" "0.8"
"T" "A" "C" "C" "E" "A" "T" "G" "E" "T"
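One edge case worth guarding against: sapply returns an empty list when no position exceeds the threshold, which may turn seq2 into a list on assignment. The sketch below is one possible hardening of the same lookup-table idea, using vapply so the result is always a character vector. The names mutate_safe and threshold are hypothetical, and leaving unknown symbols unchanged is an assumption, not something the question specifies.

```r
# Lookup table of allowed substitutions, keyed by nucleotide.
substitutions <- list(A = c("C", "G", "T"),
                      G = c("A", "C", "T"),
                      T = c("A", "C", "G"),
                      C = c("A", "G", "T"),
                      E = c("T"))

# Hypothetical helper: mutate positions where z exceeds the threshold.
mutate_safe <- function(seq2, z, threshold = 0.7) {
  stopifnot(length(seq2) == length(z))
  idx <- which(z > threshold)
  seq2[idx] <- vapply(seq2[idx], function(nt) {
    choices <- substitutions[[nt]]
    if (is.null(choices)) return(nt)  # assumption: keep unknown symbols as-is
    choices[sample.int(length(choices), 1)]
  }, character(1), USE.NAMES = FALSE)
  seq2
}

set.seed(1)
s <- c("A", "G", "T", "C", "E")
z <- c(0.9, 0.1, 0.8, 0.71, 1)
out <- mutate_safe(s, z)
```

Here position 2 stays "G" because its z value is below the threshold, position 5 becomes "T", and the remaining positions each change to a different nucleotide.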