Why are blank lines printing in my Perl script's print output?

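The details of what the script does aren't important, but I have added comments on the lines that matter to me. I only care why blank lines are appearing in my output.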

When I run the command

./script.pl temp temp.txt tempF `wc -l temp | awk '{print $1}'`

where the file temp contains

1   27800000    120700000   4
1   27800000    124300000   4
1   154800000   247249719   3
3   32100000    71800000    9
3   32100000    87200000    2
3   54400000    74200000    15
4   76500000    155100000   20
4   76500000    182600000   3
4   76500000    88200000    77
4   88200000    124000000   2
5   58900000    180857866   8
5   58900000    76400000    2
5   58900000    97300000    4
5   76400000    143100000   14
5   97300000    147200000   6
6   7000000 29900000    2
6   63500000    70000000    73
6   63500000    92100000    4
6   70000000    113900000   70
6   70000000    139100000   57
6   92100000    113900000   3

I get output of the form

hs1 27800000    124300000   4


hs3 32100000    87200000    2
hs3 54400000    74200000    15

hs4 76500000    182600000   3
hs4 76500000    88200000    77
hs4 88200000    124000000   2

hs5 58900000    76400000    2
hs5 58900000    97300000    4
hs5 76400000    143100000   14
hs5 97300000    147200000   6


hs6 63500000    92100000    4

hs6 70000000    139100000   57
hs6 92100000    113900000   3

on standard output (about 8 lines are also printed to the temp.txt file, but those lines are formatted correctly).

Here is the script:

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from(may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the data  
# ARGV[3] is the number of lines that ARGV[0] will contain

use warnings;

my $file  = "./$ARGV[0]";
my @lines = do {
    open my $fh, '<', $file or die "Can't open $file -- $!";
    <$fh>;
};

my $file2 = "./$ARGV[2]/$ARGV[1]";
open( my $files, ">", "$file2" ) or die "Can't open > $file2: $!";

my $i = 0;
while ( $i < $ARGV[3] - 1 ) {

    my @ref_fields = split( '\s+', $lines[$i] );

    print $files
        "$ref_fields[0]", "\t",
        $ref_fields[1], "\t",
        $ref_fields[2], "\t",
        $ref_fields[3], "\n";

    for my $j ( $i + 1 .. $ARGV[3] - 1 ) {

        $i = $j;

        # @curr_fields is initialized here

        my @curr_fields = split /\s+/, $lines[$j];

        if ( $ref_fields[0] eq $curr_fields[0] && $ref_fields[2] > $curr_fields[1] ) {

            if ( defined( $curr_fields[0] ) && $curr_fields[0] !~ /\s+/ ) {

                chomp $curr_fields[3];

                # the line below is the one that is printing to standard output
                print
                    $curr_fields[0], "\t",
                    $curr_fields[1], "\t",
                    $curr_fields[2], "\t",
                    $curr_fields[3], "\n";
            }
        }
        else {
            last;
        }
    }

    print "\n";
}

Edit:

I found an error when running the script from the posted answer with the command

./script.pl temp1 temp10.txt folder

where temp1 contains

12  58100000    96200000    0.04348
3   74200000    87200000    0.04348
5   130600000   168500000   0.04348
6   61000000    114600000   0.04348
6   75900000    114600000   0.04348
6   88000000    114600000   0.04348
6   88000000    139000000   0.04348
6   93100000    161000000   0.04348
6   105500000   139000000   0.04348
6   130300000   139000000   0.04348
7   59900000    77500000    0.04348
7   98000000    132600000   0.04348
X   67800000    76000000    0.08696
Y   28800000    59373566    0.04348

I get

6   75900000    114600000   0.04348
6   88000000    114600000   0.04348
6   88000000    139000000   0.04348
6   93100000    161000000   0.04348
6   105500000   139000000   0.04348

and temp10.txt contains

12  58100000    96200000    0.04348
3   74200000    87200000    0.04348
5   130600000   168500000   0.04348
6   61000000    114600000   0.04348
6   130300000   139000000   0.04348
7   59900000    77500000    0.04348
7   98000000    132600000   0.04348
X   67800000    76000000    0.08696

The line

Y   28800000    59373566    0.04348

is in neither the output nor temp10.txt. It seems to have vanished, but it should be printed to one of them.

It seems fairly obvious that the blank lines are being printed because you have the line

print "\n";

in your code.

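I can't help much further, because you say "the details of what the script does aren't important", so we haven't been told what the script is meant to do.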

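However, what you have written prints lines from the input file for as long as the first column matches that of the reference line (the one last written to the output file) and the second field is less than the reference line's third field. Whenever you reach a line that doesn't meet those conditions, the inner loop ends and you print a blank line, even if that loop printed nothing at all.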
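To make that concrete, here is a minimal sketch of one way to avoid the empty lines: count how many rows the inner loop prints, and emit the separator only when that count is non-zero. The sub name overlap_groups and the $printed counter are made up for illustration, the sample rows are taken from the question's temp file, and the loop otherwise keeps the structure of the original script (including its stopping condition).

```perl
use strict;
use warnings;

# Returns the text the overlap loop would send to standard output,
# adding a separator only after groups that actually produced rows.
# (Hypothetical helper, not part of the original script.)
sub overlap_groups {
    my @rows = map { [ split ] } @_;
    my $out  = '';
    my $i    = 0;

    while ( $i < $#rows ) {
        my $ref     = $rows[$i];
        my $printed = 0;

        for my $j ( $i + 1 .. $#rows ) {
            $i = $j;
            my $cur = $rows[$j];

            # Stop at the first row that is on another chromosome
            # or starts at or after the end of the reference row
            last unless ( $cur->[0] eq $ref->[0] && $cur->[1] < $ref->[2] );

            $out .= join( "\t", @$cur ) . "\n";
            $printed++;
        }

        $out .= "\n" if $printed;    # no separator for an empty group
    }

    return $out;
}

print overlap_groups(
    "1 27800000 120700000 4",
    "1 27800000 124300000 4",
    "1 154800000 247249719 3",
    "3 32100000 71800000 9",
    "3 32100000 87200000 2",
);
```

With the unconditional print "\n" of the original, the third row would contribute a blank line of its own, because it starts a group in which nothing overlaps.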



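You may prefer this refactoring of your code, which behaves the same way but which I think is more readable. It also has the advantage of splitting each line of the input file only once, and it doesn't need the fourth parameter, because the number of lines is simply the size of the @lines array. Blank lines are removed as the file is read, so there is no longer any need to check that the first field is defined.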

#!/usr/bin/perl

# ARGV[0] is the name of the file which data will be read from (may have overlaps)
# ARGV[1] is the name of the file which will be produced that will have no overlaps
# ARGV[2] is the name of the folder which will hold all the circos data file (mitelmanAll, mitelmanProstate, etc.)

use strict;
use warnings 'all';

use File::Path 'make_path';
use File::Spec::Functions 'catfile';

my ($file, $newfile, $dir) = @ARGV;
$newfile = catfile($dir, $newfile);

my @lines = do {
    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};
    map { [ split ] } grep /\S/, <$fh>;
};

make_path($dir);
open my $out_fh, '>', $newfile or die qq{Unable to open "$newfile" for output: $!};

for ( my $i = 0; $i < $#lines; ) {

    my $ref_fields = $lines[$i];

    print $out_fh join("\t", @$ref_fields[0..3]), "\n";

    for my $j ( $i + 1 .. $#lines ) {

        $i = $j;

        my $curr_fields = $lines[$j];

        # Compare names as strings: values such as X and Y are not numeric
        last unless $curr_fields->[0] eq $ref_fields->[0];
        last unless $curr_fields->[1] <  $ref_fields->[2];

        print join("\t", @$curr_fields[0..3]), "\n";
    }
}
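
Regarding the edit to the question: a loop bounded by $i < $#lines never treats the final line as a reference line, so a last line that overlaps nothing is written to neither place. The following is a sketch of one possible variant, assuming the intent is that every line ends up either in the output file (as a reference line) or on standard output (as an overlap). The sub name split_overlaps is hypothetical, and the sample rows are a subset of temp1 from the question.

```perl
use strict;
use warnings;

# Split rows into reference rows (kept) and rows overlapping a reference.
# Because the outer loop runs while $i < @rows, the final row is visited too.
# (Hypothetical helper; assumes each row is [chrom, start, end, value].)
sub split_overlaps {
    my @rows = @_;
    my ( @kept, @overlaps );

    my $i = 0;
    while ( $i < @rows ) {
        my $ref = $rows[$i];
        push @kept, $ref;

        # Collect following rows on the same chromosome that start
        # before the reference row ends
        my $j = $i + 1;
        while ($j < @rows
            && $rows[$j][0] eq $ref->[0]
            && $rows[$j][1] < $ref->[2] )
        {
            push @overlaps, $rows[$j];
            $j++;
        }

        $i = $j;    # the first non-overlapping row is the next reference
    }

    return ( \@kept, \@overlaps );
}

my @rows = map { [ split ] } grep { /\S/ } split /\n/, <<'END';
6   61000000    114600000   0.04348
6   75900000    114600000   0.04348
6   130300000   139000000   0.04348
X   67800000    76000000    0.08696
Y   28800000    59373566    0.04348
END

my ( $kept, $overlaps ) = split_overlaps(@rows);

print "kept:    ", join( "\t", @$_ ), "\n" for @$kept;
print "overlap: ", join( "\t", @$_ ), "\n" for @$overlaps;
```

Returning two lists instead of printing inside the loop keeps the comparison logic separate from the decision about where each kind of row is written.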
