
Compound print statement overwrites part of variable

I have some very bizarre behavior in a script that I wrote and have used for years but, for some reason, fails to run on one particular file.

Recognizing that the script is failing to identify a key that should be in a hash, I added some test print statements to read the keys. My normal strategy involves placing asterisks before and after the variable to detect potential hidden characters. Clearly, the keys are corrupt. Relevant code block:

foreach my $fastaRecord (@GenomeList) {

    my ($ID, $Seq) = split(/\n/, $fastaRecord, 2);

# uncomment next line to strip everything off sequence
# header except trailing numeric identifiers
#    $ID =~ s/.+?(\d+$)/$1/;

    $Seq =~ s/[^A-Za-z-]//g; # remove any kind of new line characters

    $RefSeqLen = length($Seq);

    $GenomeLenHash{$ID} = $RefSeqLen;

    print "$ID\n";

    print "*$ID**\n";

}

This produces the following output:

supercont3
**upercont3
Mitochondrion
**itochondrion
Chr1
**hr1
Chr2
**hr2
Chr3
**hr3
Chr4
**hr4

Normally, I'd suspect "illegal" newline characters. However, I manually replaced all newlines in the input file to try to solve the problem. What in the input file could be causing the script to behave this way? I could imagine that, despite my efforts, there is still an illegal newline after the ID variable. But then why is neither the first asterisk nor the newline after the double asterisk printed, and why is the double asterisk printed at the beginning of the line, overwriting the first asterisk as well as the first character of the variable's value?

When you see these sorts of effects, look at the data in a file or in a hex dump. The terminal will hide data when it interprets backspaces, carriage returns, and ANSI escape sequences.

% perl script.pl | hexdump -C

Here's a simple example. I echo a, b, a carriage return, then c. My terminal sees the carriage return and moves the cursor to the beginning of the line. After that, the output continues. The c masks the a:

% echo $'ab\rc'
cb

With a hex dump, I can see the 0d that represents the carriage return:

% echo $'ab\rc' | hexdump -C
00000000  61 62 0d 63 0a                                    |ab.c.|
00000005
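
The same check can be done from inside the script. The following is a minimal sketch, not part of the original code: show_hidden is a hypothetical helper name, and the sample $id assumes the header line came from a file with CRLF line endings, which would explain the output in the question. It rewrites every byte outside printable ASCII as a visible \x{..} escape before printing:

```perl
use strict;
use warnings;

# Hypothetical helper: make control characters visible as \x{..} escapes
# so the terminal cannot interpret them.
sub show_hidden {
    my ($text) = @_;
    (my $visible = $text) =~ s/([^\x20-\x7e])/sprintf('\\x{%02x}', ord $1)/ge;
    return $visible;
}

# Assumed input: an ID that still carries a trailing carriage return.
my $id = "supercont3\r";
print '*', show_hidden($id), "**\n";    # prints *supercont3\x{0d}**
```

With the carriage return made visible, the overwritten output is easier to explain: the terminal printed *supercont3, returned to the start of the line, and then printed ** over the first two characters.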

Also, when you try to remove "any sort of newline" from $Seq, you might just remove vertical whitespace:

$target =~ s/\v//g;

You might also use the generalized newline, \R:

$target =~ s/\R//g;
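
Applied to the loop in the question, the idea might look like the sketch below. It is an illustration under assumptions rather than the original script: parse_record is a hypothetical helper name, the sample record is made up, and \R requires Perl 5.10 or later. Splitting the record on \R instead of \n keeps a stray carriage return from staying attached to the ID:

```perl
use strict;
use warnings;

# Hypothetical helper: split one FASTA record into a clean ID and sequence.
sub parse_record {
    my ($record) = @_;

    # \R matches \r\n as a single unit, as well as a lone \n or \r.
    my ($id, $seq) = split /\R/, $record, 2;
    $seq //= '';                # a record with a header but no sequence

    $id  =~ s/\s+\z//;          # drop any trailing whitespace on the header
    $seq =~ s/[^A-Za-z-]//g;    # keep only letters and gap characters
    return ($id, $seq);
}

# Assumed input: a record with CRLF line endings.
my ($id, $seq) = parse_record("Chr1\r\nACGT\r\nTTGA\r\n");
print "*$id**\n";               # prints *Chr1**
print length($seq), "\n";       # prints 8
```

Cleaning the ID at the point where the record is split means the hash keys stored later no longer carry the hidden character.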
