[英]Generating Synthetic DNA Sequence with Substitution Rate
鉴于这些输入:
my $init_seq = "AAAAAAAAAA" #length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
我想生成:
一千个长度 - 10个标签
标签中每个位置的替代率为0.003
产量如下:
AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tags
在Perl中有一种紧凑的方式吗?
我坚持使用这个脚本的逻辑作为核心:
#!/usr/bin/perl
my $init_seq = "AAAAAAAAAA" #length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );
$i = 0;
while ($i < length($init_seq)) {
$roll = int(rand 4) + 1; # $roll is now an integer between 1 and 4
if ($roll == 1) {$base = A;}
elsif ($roll == 2) {$base = T;}
elsif ($roll == 3) {$base = C;}
elsif ($roll == 4) {$base = G;};
print $base;
}
continue {
$i++;
}
作为一个小优化,替换:
$roll = int(rand 4) + 1; # $roll is now an integer between 1 and 4
if ($roll == 1) {$base = A;}
elsif ($roll == 2) {$base = T;}
elsif ($roll == 3) {$base = C;}
elsif ($roll == 4) {$base = G;};
同
$base = $dna[int(rand 4)];
编辑:假设替代率在0.001到1.000的范围内:
和$roll
,生成[1..1000]范围内的另一个(伪)随机数,如果它小于或等于(1000 * $ sub_rate)则执行替换,否则什么都不做(即输出'A' “)。
请注意,除非知道随机数生成器的属性,否则可能会引入微妙的偏差。
不完全是你想要的,但我建议你看看BioPerl的Bio :: SeqEvolution :: DNAPoint模块。 但它并不以突变率作为参数。 相反,它询问与您想要的原始序列同一性的下限。
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqEvolution::Factory;
my $seq = Bio::Seq->new(-seq => 'AAAAAAAAAA', -alphabet => 'dna');
my $evolve = Bio::SeqEvolution::Factory->new (
-rate => 2, # transition/transversion rate
-seq => $seq
-identity => 50 # At least 50% identity with the original
);
my @mutated;
for (1..1000) { push @mutated, $evolve->next_seq }
所有1000个突变序列将存储在@mutated数组中,它们的序列可以通过seq
方法访问。
如果替换,您希望从可能性中排除当前基数 :
my @other_bases = grep { $_ ne substr($init_seq, $i, 1) } @dna;
$base = @other_bases[int(rand 3)];
另请参阅Mitch Wheat关于如何实施替代率的答案 。
我不知道我是否理解正确,但我会做这样的事情(伪代码):
digits = 'ATCG'
base = 'AAAAAAAAAA'
MAX = 1000
for i = 1 to len(base)
# check if we have to mutate
mutate = 1+rand(MAX) <= rate*MAX
if mutate then
# find current A:0 T:1 C:2 G:3
current = digits.find(base[i])
# get a new position
# but ensure that it is not current
new = (j+1+rand(3)) mod 4
base[i] = digits[new]
end if
end for
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.