
Perl: Assigning a variable one of 3 possible values

I have a DNA sequence. Let's call it "ATCG". I have 2 small databases of DNA sequences in 2 separate files, which we will call "db1.txt" and "db2.txt". Both databases are formatted as follows:

>name of sequence
EXAMPLESEQUENCEATCGATCG
>name of another sequence
ASECONDEXAMPLESEQUENCEATCGATCG

I want to know whether my DNA sequence is contained in one of the databases and, if so, which one. My result, then, has 3 possible values: my sequence is in neither database, in db1, or in db2. Here's my code:

use warnings;
use strict;
my $entry = 'ATCG';
my $returnval = "The sequence is from neither database";

#if in db1
    my $name1;
    my $seq1;
    open (my $database1, "<", "db1.txt") or die "Can't find db1";
    while (<$database1>){
        chomp ($name1 = <$database1>);
        chomp ($seq1 = <$database1>);
        if (
            index($seq1, $entry) != -1
            || index($entry, $seq1) != -1
        ) {
            $returnval = "The sequence is from db1: ". $name1;
            last;
        }
    }

#If in db2:
    my $name2;
    my $seq2;
    open (my $database2, "<", "db2.txt") or die "Can't find db2";
    while (<$database2>){
        chomp ($name2 = <$database2>);
        chomp ($seq2 = <$database2>);
        if(
            index($seq2, $entry) != -1
            || index($entry, $seq2) != -1
        ) {
            $returnval = "The sequence is from db2: ". $name2;
            last;
        }

    }
    print $returnval . "\n";

There are a few problems with this code (probably more than a few). No matter what my sequence is, $returnval = "The sequence is from db2: " with no name at the end. Furthermore, it seems that $name2 and $seq2 are uninitialized values, even though the code is identical to that for db1. If I remove the entire section that tests db2, the code returns "The sequence is from db1: " followed by the appropriate name for some sequences I copied and pasted from the database, while it returns "The sequence is from neither database" for others.

What am I doing wrong? How do I fix the uninitialized values, and why is the code for db2 not working?

EDIT: I forgot to mention that if a sequence is in both databases, reporting that it is in db2 takes precedence over reporting that it is in db1.

The main issue is in the conditions of the while loops: each evaluation of while (<$database1>) reads a line and discards it, so $name and $seq no longer receive a name and its sequence on every iteration. Removing that condition and placing the end-of-file check inside the loop should fix the problem. It's also possible to loop over the two databases and apply the same logic to both, so you only need to write the file-reading loop once.

use warnings;
use strict;
my $entry = 'ATCG';
my $returnval = "The sequence is from neither database";
my @files = qw(db2 db1);

FILE:
for my $file (@files) {
    open my $fh, '<', "$file.txt" or die "Error opening $file: $!";
    while (1) {
        my $name = <$fh>;
        my $seq  = <$fh>;
        if (not defined $seq) {
            warn "Odd number of lines in $file" if defined $name;
            last; # Reached end of file
        }
        chomp($name, $seq);
        if (
            index($seq, $entry) != -1
            or index($entry, $seq) != -1
        ) {
            $returnval = "The sequence is from $file: $name";
            last FILE; # No need to search the others
        }
    }
}

print "$returnval\n";
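
To see the effect of the original loop condition in isolation, here is a minimal sketch that runs the question's loop shape over the sample data from an in-memory filehandle. The $data string and the @pairs array are stand-ins introduced for illustration; they are not part of the original code.

```perl
use strict;
use warnings;

# Sample data in the question's format (stand-in for db1.txt)
my $data = ">name of sequence\n"
         . "EXAMPLESEQUENCEATCGATCG\n"
         . ">name of another sequence\n"
         . "ASECONDEXAMPLESEQUENCEATCGATCG\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @pairs;
while (<$fh>) {            # this read consumes a line that is never used
    my $name = <$fh>;      # so "name" is really the line after it
    my $seq  = <$fh>;
    for ($name, $seq) {
        $_ = '(undef)' unless defined $_;   # reads past end-of-file give undef
        chomp;
    }
    push @pairs, [ $name, $seq ];
}

print "name=$_->[0] seq=$_->[1]\n" for @pairs;
```

Each iteration consumes three lines instead of two, so the first pair holds a sequence in the name slot and a header in the sequence slot, and the second pair is read past end-of-file. In the question's code that second pair would be undef, and since index($entry, '') returns 0 rather than -1, the test would succeed with an empty name, which is consistent with the "The sequence is from db2: " output described.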

I would wrap the comparison in a subroutine, especially since you have to do the same thing multiple times.

This solution implements a subroutine, matches, which returns the name of the matching sequence in the file, or a false value if no match was found.

I have altered the record separator $/ to the > character so that sequences are split automatically, with each record consisting of the name up to the first newline character, and the sequence thereafter. The tr/\n//d call removes any newlines from the sequence (so it will handle multi-line sequences, as the FASTA format supports), and a comparison is made for each sequence.

The calling code just uses a for loop to call the subroutine for each file name. The loop exits as soon as a match is found, leaving $name and $file set to the details of the match.

The message is built and printed according to whether $name ends up true.

use strict;
use warnings 'all';
use feature 'say';

my $entry = 'ATCG';

my ($file, $name);

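# A foreach loop variable is restored to its previous value when the loop
# exits, so iterate with a lexical and copy the matching name into $file
for my $candidate ( qw/ db2 db1 / ) {
    if ( $name = matches($entry, "$candidate.txt") ) {
        $file = $candidate;
        last;
    }
}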

say $name ?
    "The sequence is from $file: $name" :
    "The sequence is from neither database";


sub matches {
    my ($seq, $file) = @_;

    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};

    local $/ = '>';

    while ( <$fh> ) {
        chomp;
        my ($name, $file_seq) = split /\n/, $_, 2;
        next unless defined $file_seq; # skip the empty record before the first '>'
        $file_seq =~ tr/\n//d;

        return $name if index($file_seq, $seq) >= 0 or index($seq, $file_seq) >= 0;
    }

    return;
}
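
As a rough illustration of how reading with $/ set to '>' carves a file into records, the sketch below parses a small in-memory sample. The $data string and @records array are made up for the example, and the second sequence is deliberately spread over two lines.

```perl
use strict;
use warnings;

# Stand-in for a database file; the second sequence spans two lines
my $data = ">first\nAAAA\n>second\nCCCC\nGGGG\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = '>';                  # each read ends at the next '>'
    while ( my $record = <$fh> ) {
        chomp $record;               # chomp strips the trailing '>' (the current $/)
        next if $record eq '';       # the text before the first '>' is empty
        my ($name, $seq) = split /\n/, $record, 2;
        $seq =~ tr/\n//d;            # join a multi-line sequence into one string
        push @records, [ $name, $seq ];
    }
}

print "$_->[0] => $_->[1]\n" for @records;
```

Because the file begins with '>', the first record read is empty once chomped, which is why the sketch skips it before splitting. The remaining records come out as "first => AAAA" and "second => CCCCGGGG".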
