
How to read a binary file in Perl

I'm having trouble writing a Perl script to read a binary file.

My code is below; each $file is a file in binary format. I searched the web and applied what I found to my code, then tried to print the result, but it doesn't work well.

Currently it only prints "&&&&&&&&&&&" and "ppppppppppp", but what I really want is for it to print each $line, so that I can do some post-processing later. Also, I'm not quite sure what $data is; it comes from sample code in an article, which says it is supposed to be a scalar. Can somebody pinpoint where my code goes wrong? Below is what I did.

my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir(TEMP1);
closedir(TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir (TEMP2, $d) || die $!;
        my @files = readdir (TEMP2); # This should read binary files
        closedir (TEMP2);

        #my $buffer = "";
        #opendir (FILE, $d) || die $!;
        #binmode (FILE);
        #my @files =  readdir (FILE, $buffer, 169108570);
        #closedir (FILE);

        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read (FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
            }
            close FILE;
        }
    }
}

I have altered my code as shown below. Now I can read $data. Thanks to J-16 SDiZ for pointing that out. I'm trying to push the data I got from the binary file onto an array called @array, intending to grep the array for any string that matches "p04", but it fails. Can someone point out where the error is?

my $tmp = "$basedir/$key";
opendir (TEMP1, "$tmp");
my @dirs = readdir (TEMP1);
closedir (TEMP1);

foreach my $dirs (@dirs) {
    next if ($dirs eq "." || $dirs eq "..");
    print "---->$dirs\n";
    my $d = "$basedir/$key/$dirs";
    if (-d "$d") {
        opendir (TEMP2, $d) || die $!;
        my @files = readdir (TEMP2); #This should read binary files
        closedir (TEMP2);

        foreach my $file (@files) {
            next if ($file eq "." || $file eq "..");
            my $f = "$d/$file";
            print "==>$file\n";
            open FILE, $file || die $!;
            binmode FILE;
            foreach ($line = read (FILE, $data, 169108570)) {
                print "&&&&&&&&&&&$line\n";
                print "ppppppppppp$data\n";
                push @array, $data;
            }
            close FILE;
        }
    }
}

foreach $item (@array) {
    #print "==>$item<==\n"; # It prints out content of binary file without the ==> and <== if I uncomment this.. weird!
    if ($item =~ /p04(.*)/) {
        print "=>$item<===============\n"; # This prints "=><===============" once per binary file. That is wrong: I expect it to print the content of each binary file instead :(
        next if ($item !~ /^w+/);
        open (LOG, ">log") or die $!;
        #print LOG $item;
        close LOG;
    }
}

I changed my code again as follows, but it still doesn't work: judging by the "log" file, it does not grep "p04" correctly. It greps the whole file, binary data included, like this: "@^@^@^@^G^D^@^@^@^^@p04bbhi06^@^^@^@^@^@^@^@^@^@hh^R^@^@^@^^@^@^@p04lohhj09^@^@^@^^@@". What I'm expecting is for it to grep only the strings that contain p04, such as p04bbhi06 and p04lohhj09. Here is my code:

foreach my $file (@files) {
    next if ($file eq "." || $file eq "..");
    my $f = "$d/$file";
    print "==>$file\n";
    open FILE, $f || die $!;
    binmode FILE;
    my @lines = <FILE>;
    close FILE;
    foreach $cell (@lines) {
        if ($cell =~ /b12/) {
            push @array, $cell;
        }
    }
}

#my @matches = grep /p04/, @lines;
#foreach $item (@matches) {
foreach $item (@array) {
    #print "-->$item<--";
    open (LOG, ">log") or die $!;
    print LOG $item;
    close LOG;
}

Use:

$line = read (FILE, $data, 169108570);

The data is in $data, and $line is the number of bytes read.
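To make the roles of the two values concrete, here is a minimal sketch of reading a binary file in chunks. The helper name read_binary_file and the 64 KiB default chunk size are assumptions made for illustration; they are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: read a binary file in fixed-size chunks and
# return its whole content as one byte string.
sub read_binary_file {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # arbitrary default for this sketch

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);

    my $content = '';
    while (1) {
        # read() returns the number of bytes read, 0 at end of file,
        # and undef on error; the bytes themselves land in $buffer.
        my $bytes_read = read($fh, my $buffer, $chunk_size);
        die "Read error on $path: $!" unless defined $bytes_read;
        last if $bytes_read == 0;
        $content .= $buffer;
    }
    close($fh);
    return $content;
}
```

Because read() returns a single count rather than a list of lines, wrapping it in foreach runs the body exactly once; a while loop that stops on a zero return is the usual shape.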

       my $f = "$d/$file" ;
       print "==>$file\n" ;
       open FILE, $file || die $! ;

I guess the full path is in $f, but you are opening $file. (In my testing, even $f is not the full path, but I guess you may have some other glue code...)
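There is a second pitfall in the quoted open line: || binds more tightly than the list-operator call, so open FILE, $file || die $! is parsed as open FILE, ($file || die $!) and the die never fires when open fails. A minimal sketch that opens the full path and checks the result follows; the helper name open_binary is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: open $file inside directory $dir for binary reading.
sub open_binary {
    my ($dir, $file) = @_;
    my $path = File::Spec->catfile($dir, $file);    # full path, not the bare name

    # Low-precedence "or" applies to the whole open() call, so the die
    # actually runs when open() fails.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);
    return $fh;
}
```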

If you just want to walk all the files in a directory, try File::DirWalk or File::Find.
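As a rough sketch of the File::Find route (a core module), the following collects every regular file below a starting directory, which would replace the nested opendir/readdir loops. The helper name list_files is an assumption for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every regular file below $top, sorted by path.
sub list_files {
    my ($top) = @_;
    my @found;
    find(
        sub {
            # Inside the callback, $_ is the entry's base name and
            # $File::Find::name is its path starting from $top.
            push @found, $File::Find::name if -f $_;
        },
        $top
    );
    my @sorted = sort @found;
    return @sorted;
}
```

Each returned path already includes the directory part, so it can be passed straight to open without rebuilding it by hand.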

I am not sure if I understood you right.

If you need to read a binary file, you can do the same as for a text file:

open F, "<", "/bin/bash" or die "Cannot open /bin/bash: $!";
my $file = do { local $/; <F> };   # with $/ undefined, <F> returns the whole file
close F;

Under Windows you may need to add binmode F; under *nix it works without it.

If you need to find which lines in an array contain some word, you can use the grep function:

my @matches = grep /something/, @array_to_grep;

You will get all matched lines in the new array @matches.
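Note that grep returns whole array elements, so an element holding a chunk of binary data comes back with all of its surrounding bytes. To pull out only the matching text, a global match with a capture group can be used instead. The sketch below assumes that a token is the prefix followed by word characters (as in p04bbhi06); the helper name extract_tokens is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every token that starts with $prefix and
# continues with word characters (assumed token shape, e.g. "p04bbhi06").
sub extract_tokens {
    my ($data, $prefix) = @_;
    # In list context, a /g match returns all captured substrings.
    my @tokens = $data =~ /(\Q$prefix\E\w+)/g;
    return @tokens;
}

# Sample bytes loosely modelled on the data shown in the question.
my $data = "\x00\x07\x04\x00p04bbhi06\x00\x00hh\x12\x00p04lohhj09\x00";
print "$_\n" for extract_tokens($data, 'p04');
# prints:
# p04bbhi06
# p04lohhj09
```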

BTW: I don't think it's a good idea to read tons of binary files into memory at once. You can search them one by one...

If you need to find where the match occurs, you can use another standard function, index:

my $offset = index($file, 'myword');   # index(STRING, SUBSTRING); returns -1 if not found
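index takes the string to search first and the substring second, plus an optional start position, which makes it easy to collect every occurrence. A minimal sketch, with the helper name find_offsets assumed for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return every offset at which $needle occurs in $haystack.
sub find_offsets {
    my ($haystack, $needle) = @_;
    return () if $needle eq '';    # an empty needle would match everywhere

    my @offsets;
    my $pos = 0;
    # index(STRING, SUBSTRING, POSITION) returns -1 when nothing is found.
    while ((my $found = index($haystack, $needle, $pos)) != -1) {
        push @offsets, $found;
        $pos = $found + 1;    # step past this hit to find the next one
    }
    return @offsets;
}
```

When the data was read through a binmode filehandle, these offsets are byte offsets into the file content.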

I'm not sure I'll be able to answer the OP's question exactly, but here are some notes that may be related. (Edit: this is the same approach as the answer by @Dimanoid, but with more detail.)

Say you have a file that is a mix of ASCII data and binary. Here is an example in a bash terminal:

$ echo -e "aa aa\x00\x0abb bb" | tee tester.txt
aa aa
bb bb
$ du -b tester.txt 
13  tester.txt
$ hexdump -C tester.txt 
00000000  61 61 20 61 61 00 0a 62  62 20 62 62 0a           |aa aa..bb bb.|
0000000d

Note that byte 00 (specified as \x00) is a non-printable character (and in C, it also means "end of a string"); its presence therefore makes tester.txt a binary file. The file has a size of 13 bytes, as seen by du, because of the trailing \n added by echo (as can be seen from hexdump).

Now, let's see what happens when we try to read it with perl's <> diamond operator (see also "What's the use of <> in perl?"):

$ perl -e '
open IN, "<./tester.txt";
binmode(IN);
$data = <IN>; # does this slurp entire file in one go?
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'

length is: 7
data is: --aa aa
--

Clearly, the entire file didn't get slurped: it broke at the line end \n (and not at the binary \x00). That is because the diamond filehandle operator <FH> is actually a shortcut for readline (see Perl Cookbook: Chapter 8, File Contents).

The same link says that one should undef the input record separator, $/ (which by default is set to \n), in order to slurp the entire file. You may want this change to be only local, which is why the braces and local are used instead of undef (see Perl Idioms Explained - my $string = do { local $/; <FILEHANDLE> };); so we have:

$ perl -e '
open IN, "<./tester.txt";
print "_$/_\n"; # check if $/ is \n
binmode(IN);
{
local $/; # undef $/; is global
$data = <IN>; # this should slurp one go now
};
print "_$/_\n"; # check again if $/ is \n
close(IN);
print "length is: " . length($data) . "\n";
print "data is: --$data--\n";
'

_
_
_
_
length is: 13
data is: --aa aa
bb bb
--

... and now we can see the file is slurped in its entirety.

Since binary data implies unprintable characters, you may want to inspect the actual contents of $data by printing via sprintf or pack/unpack instead.
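As a small sketch of that kind of inspection, the following prints the bytes of tester.txt as hex, once with sprintf over unpack "C*" and once with unpack "H*". The helper name to_hex is an assumption; the sample string simply repeats the 13 bytes of the file created earlier.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub to_hex {
    my ($data) = @_;
    # unpack "C*" yields one integer per byte; sprintf formats each
    # as two lowercase hex digits.
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $data);
}

my $data = "aa aa\x00\x0abb bb\x0a";    # same bytes as tester.txt
print to_hex($data), "\n";
# prints: 61 61 20 61 61 00 0a 62 62 20 62 62 0a
print unpack('H*', $data), "\n";        # same bytes as one unbroken hex string
```

The first line of output matches the byte column that hexdump -C showed for the file.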

Hope this helps someone,
Cheers!
