Generating Synthetic DNA Sequence with Substitution Rate

Given these inputs:

my $init_seq = "AAAAAAAAAA";  # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

I want to generate:

  1. One thousand tags of length 10

  2. A substitution rate of 0.003 at each position in a tag

Yielding output like:

AAAAAAAAAA
AATAACAAAA
.....
AAGGAAAAGA # 1000th tag

Is there a compact way to do this in Perl?

I am stuck with the logic of this script as the core:

#!/usr/bin/perl

my $init_seq = "AAAAAAAAAA";  # length 10 bp
my $sub_rate = 0.003;
my $nof_tags = 1000;
my @dna = qw( A C G T );

    $i = 0;
    while ($i < length($init_seq)) {
        $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

        # base letters are quoted; barewords fail under "use strict"
        if ($roll == 1) {$base = 'A';}
        elsif ($roll == 2) {$base = 'T';}
        elsif ($roll == 3) {$base = 'C';}
        elsif ($roll == 4) {$base = 'G';}

        print $base;
    }
    continue {
        $i++;
    }

As a small optimisation, replace:

    $roll = int(rand 4) + 1;       # $roll is now an integer between 1 and 4

    if ($roll == 1) {$base = 'A';}
    elsif ($roll == 2) {$base = 'T';}
    elsif ($roll == 3) {$base = 'C';}
    elsif ($roll == 4) {$base = 'G';}

with:

    $base = $dna[int(rand 4)];

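Edit: Assuming substitution rates are in the range 0.001 to 1.000:

As well as $roll, generate another (pseudo)random number in the range [1..1000]. If it is less than or equal to (1000 * $sub_rate), perform the substitution; otherwise do nothing (i.e. output 'A').

Be aware that you may introduce subtle bias unless the properties of your random number generator are known.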
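The roll described above could be sketched in Perl roughly as follows. The helper names should_substitute and mutate_tag are hypothetical, and the sketch assumes the rate is a multiple of 0.001. Note that drawing the replacement from all four bases means a "substitution" can pick the original base again, so the observed change rate will be somewhat lower than $sub_rate.

```perl
use strict;
use warnings;

my @dna = qw( A C G T );

# Return true with probability $sub_rate, using an integer roll in 1..1000.
# Assumes $sub_rate is a multiple of 0.001 between 0.001 and 1.000.
sub should_substitute {
    my ($sub_rate) = @_;
    my $roll = int(rand 1000) + 1;                 # integer in 1..1000
    return $roll <= int(1000 * $sub_rate + 0.5);   # rounded to avoid float error
}

# Copy $seq and, at each position where the roll succeeds,
# overwrite the base with one drawn at random from @dna.
sub mutate_tag {
    my ($seq, $sub_rate) = @_;
    my $tag = $seq;
    for my $i (0 .. length($seq) - 1) {
        next unless should_substitute($sub_rate);
        substr($tag, $i, 1) = $dna[int(rand @dna)];
    }
    return $tag;
}

# Generate 1000 tags from the initial sequence and print them.
my @tags = map { mutate_tag("AAAAAAAAAA", 0.003) } 1 .. 1000;
print "$_\n" for @tags;
```

At a rate of 0.003 over 10 positions, most of the 1000 tags will come out identical to the initial sequence.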

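Not exactly what you are looking for, but I suggest you take a look at BioPerl's Bio::SeqEvolution::DNAPoint module. It does not take a mutation rate as a parameter, though. Instead, it asks for the lower limit of sequence identity with the original that you want.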

use strict;
use warnings;
use Bio::Seq;
use Bio::SeqEvolution::Factory;

my $seq = Bio::Seq->new(-seq => 'AAAAAAAAAA', -alphabet => 'dna');

my $evolve = Bio::SeqEvolution::Factory->new (
   -rate     => 2,      # transition/transversion rate
   -seq      => $seq,
   -identity => 50      # At least 50% identity with the original
);


my @mutated;
for (1..1000) { push @mutated, $evolve->next_seq }

All 1000 mutated sequences will be stored in the @mutated array, and their sequences can be accessed via the seq method.

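In the event of a substitution, you want to exclude the current base from the possibilities:

my @other_bases = grep { $_ ne substr($init_seq, $i, 1) } @dna;
$base = $other_bases[int(rand 3)];

See also Mitch Wheat's answer on how to implement the substitution rate.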
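For illustration, the exclusion step can be wrapped in a small helper. This is only a sketch: the name substitute_base is hypothetical, and it assumes the current base is one of the four entries in @dna.

```perl
use strict;
use warnings;

my @dna = qw( A C G T );

# Pick a replacement uniformly from the bases other than $current.
sub substitute_base {
    my ($current) = @_;
    my @other_bases = grep { $_ ne $current } @dna;
    return $other_bases[int(rand @other_bases)];
}

print substitute_base('A'), "\n";   # prints one of C, G or T
```

Using the size of @other_bases rather than a literal 3 keeps the index in range if the alphabet ever changes.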

I don't know if I understood correctly, but I would do something like this (pseudocode):

digits = 'ATCG'
base = 'AAAAAAAAAA'
rate = 0.003
MAX = 1000
for i = 1 to len(base)
  # check if we have to mutate
  mutate = 1+rand(MAX) <= rate*MAX
  if mutate then
    # find current A:0 T:1 C:2 G:3
    current = digits.find(base[i])
    # get a new position 
    # but ensure that it is not current
    new = (current+1+rand(3)) mod 4
    base[i] = digits[new]
  end if
end for
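A possible Perl rendering of this pseudocode is sketched below. The sub name mutate and passing the rate as an argument are choices made here for illustration, not part of the answer. The offset added to the current index is 1, 2 or 3, so the result modulo 4 can never equal the current index.

```perl
use strict;
use warnings;

my $digits = 'ATCG';   # A:0 T:1 C:2 G:3
my $max    = 1000;

# Return a copy of $base with each position mutated with probability $rate.
sub mutate {
    my ($base, $rate) = @_;
    for my $i (0 .. length($base) - 1) {
        # roll an integer in 1..$max; mutate only if it is within the threshold
        next unless 1 + int(rand $max) <= $rate * $max;
        my $current = index($digits, substr($base, $i, 1));
        # shift by 1..3 positions, wrapping around, so $new != $current
        my $new = ($current + 1 + int(rand 3)) % 4;
        substr($base, $i, 1) = substr($digits, $new, 1);
    }
    return $base;
}

print mutate('AAAAAAAAAA', 0.003), "\n" for 1 .. 1000;
```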
