Generating synthetic DNA sequences with a substitution rate

Question

Given these inputs:

my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

I want to generate:

One thousand tags, each of length 10
A substitution rate of 0.003 at each position in a tag

Yielding output like:

AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag

Is there a compact way to do it in Perl?

I am stuck on the logic. This script is my starting point:

#!/usr/bin/perl

my $init_seq = "AAAAAAAAAA"; # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

    $i = 0;
    while ($i < length($init_seq)) {
        $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

        if ($roll == 1) {$base = A;}
        elsif ($roll == 2) {$base = T;}
        elsif ($roll == 3) {$base = C;}
        elsif ($roll == 4) {$base = G;};

        print $base;
    }
    continue {
        $i++;
    }

Answer 1

As a small optimisation, replace:

    $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

    if ($roll == 1) {$base = A;}
    elsif ($roll == 2) {$base = T;}
    elsif ($roll == 3) {$base = C;}
    elsif ($roll == 4) {$base = G;};

with

    $base = $dna[int(rand 4)];

Answer 2

EDIT: Assuming the substitution rate is in the range 0.001 to 1.000:

As well as $roll, generate another (pseudo)random number in the range [1..1000]. If it is less than or equal to (1000 * $sub_rate), perform the substitution; otherwise do nothing (i.e. output 'A').

Be aware that you may introduce subtle bias unless the properties of your random number generator are known.
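
The substitution test described in this answer could be sketched in Perl roughly as follows. The helper name should_substitute is hypothetical and not part of the original answer, and the integer roll limits the rate to a resolution of 0.001.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether one position gets substituted,
# using an integer roll in 1..1000 as the answer describes.
sub should_substitute {
    my ($sub_rate) = @_;
    my $roll = int(rand 1000) + 1;    # integer in 1..1000
    return $roll <= 1000 * $sub_rate;
}

# Count how often a substitution would fire over many trials.
my $trials = 100_000;
my $hits   = grep { should_substitute(0.003) } 1 .. $trials;
printf "%d substitutions in %d trials\n", $hits, $trials;
```

Perl's rand with no argument returns a fractional value in [0, 1), so the simpler test rand() < $sub_rate avoids the 0.001 resolution limit.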

Answer 3

Not exactly what you are looking for, but I suggest you take a look at BioPerl's Bio::SeqEvolution::DNAPoint module. It does not take a mutation rate as a parameter, though. Rather, it asks for the lower bound of sequence identity with the original that you want.

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqEvolution::Factory;

my $seq = Bio::Seq->new(-seq => 'AAAAAAAAAA', -alphabet => 'dna');

my $evolve = Bio::SeqEvolution::Factory->new (
   -rate     => 2,      # transition/transversion rate
   -seq      => $seq,
   -identity => 50      # At least 50% identity with the original
);


my @mutated;
for (1..1000) { push @mutated, $evolve->next_seq }

All 1000 mutated sequences will be stored in the @mutated array; their sequences can be accessed via the seq method.

Answer 4

In the event of a substitution, you want to exclude the current base from the possibilities:

my @other_bases = grep { $_ ne substr($init_seq, $i, 1) } @dna;
$base = $other_bases[int(rand 3)];

Also please see Mitch Wheat's answer for how to implement the substitution rate.
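
Putting the pieces from the answers together, one possible complete script is sketched below. The subroutine name mutate_tag is hypothetical, and the sketch assumes that each position is substituted independently with probability $sub_rate, always by a different base.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $init_seq = "AAAAAAAAAA";    # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna      = qw( A C G T );

# Hypothetical helper: return a copy of $seq in which each position is
# replaced, with probability $rate, by one of the three other bases.
sub mutate_tag {
    my ($seq, $rate) = @_;
    my $tag = '';
    for my $base (split //, $seq) {
        if (rand() < $rate) {
            # Substitute: pick uniformly among the bases that differ.
            my @other_bases = grep { $_ ne $base } @dna;
            $tag .= $other_bases[int(rand @other_bases)];
        }
        else {
            # No substitution: keep the original base.
            $tag .= $base;
        }
    }
    return $tag;
}

# Print one mutated tag per line.
print mutate_tag($init_seq, $sub_rate), "\n" for 1 .. $nof_tags;
```

At a rate of 0.003, about 3 in every 1000 positions change, so most of the 1000 tags come out identical to $init_seq.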

Answer 5

I don't know if I understand correctly, but I'd do something like this (pseudocode):

digits = 'ATCG'
base = 'AAAAAAAAAA'
rate = 0.003
MAX = 1000
for i = 0 to len(base) - 1
  # check if we have to mutate; rand(n) is an integer in 0..n-1
  mutate = 1+rand(MAX) <= rate*MAX
  if mutate then
    # find current A:0 T:1 C:2 G:3
    current = digits.find(base[i])
    # get a new position
    # but ensure that it is not current
    new = (current+1+rand(3)) mod 4
    base[i] = digits[new]
  end if
end for
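
The modular-offset step in the pseudocode translates to Perl roughly as below. The names other_base and %index_of are hypothetical. The idea is that adding an offset of 1 to 3 modulo 4 can never land back on the current index, so no filtering of the base list is needed.

```perl
use strict;
use warnings;

my @digits   = qw( A T C G );
my %index_of = map { $digits[$_] => $_ } 0 .. $#digits;    # A:0 T:1 C:2 G:3

# Hypothetical helper: pick a random base that differs from $current_base.
sub other_base {
    my ($current_base) = @_;
    my $current = $index_of{$current_base};
    my $new     = ($current + 1 + int(rand 3)) % 4;    # offset of 1..3
    return $digits[$new];
}

print other_base('A'), "\n";    # prints T, C or G
```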

5 answers

Answer 1: score 5, accepted, 2009-03-02 10:21:58
Answer 2: score 3, 2009-03-02 09:32:01
Answer 3: score 2, 2009-03-02 20:14:57
Answer 4: score 1, 2009-03-03 12:31:18
Answer 5: score 1, 2009-03-05 04:41:19
