How can I access the nth byte of a binary scalar in Perl?

Thanks to everyone in advance.

I'd like to access the nth byte of a binary scalar. For example you could get all the file data in one scalar variable...

Imagine that the binary data is collected into scalar...

open(SOURCE, "<", "wl.jpg"); 
my $thisByteData = undef; 
while(<SOURCE>){$thisByteData .= $_;} 
close SOURCE; 

$thisByteData is raw binary data. When I use length($thisByteData) I get the byte count back, so Perl does know how big it is. My question is how can I access the Nth byte?

Side note: My function is going to receive this binary scalar; it's in my function that I want to access the Nth byte. The help regarding how to collect this data is appreciated but not what I'm looking for. Whichever way the other programmer wants to collect the binary data is up to them; my job is to get the Nth byte when it's passed to me :)

Again thanks so much for the help to all!


Thanks to @muteW who has gotten me further than ever. I guess I'm not understanding unpack(...) correctly.

print(unpack("N1", $thisByteData));
print(unpack("x N1", $thisByteData));
print(unpack("x0 N1", $thisByteData));

Is returning the following:

4292411360
3640647680
4292411360

I would assume those 3 lines would all access the same (first) byte. Using no "x", just an "x", and "x$pos" gives unexpected results.

I also tried this...

print(unpack("x0 N1", $thisByteData));
print(unpack("x1 N1", $thisByteData));
print(unpack("x2 N1", $thisByteData));

Which returns... the same thing as the last test...

4292411360
3640647680
4292411360

I'm definitely missing something about how unpack works.


If I do this...

print(oct("0x". unpack("x0 H2", $thisByteData)));
print(oct("0x". unpack("x1 H2", $thisByteData)));
print(oct("0x". unpack("x2 H2", $thisByteData)));

I get what I was expecting...

255
216
255

Can't unpack give this to me itself without having to use oct()?


As a side note: I think I'm getting the 2's complement of these byte integers when using "x$pos N1". I'm expecting these as the first 3 bytes.

255
216
255

Thanks again for the help to all.


Special thanks to @brian d foy and @muteW ... I now know how to access the Nth byte of my binary scalar using unpack(...). I have a new problem to solve now, which isn't related to this question. Again thanks for all the help guys!

This gave me the desired result...

print(unpack("x0 C1", $thisByteData));
print(unpack("x1 C1", $thisByteData));
print(unpack("x2 C1", $thisByteData));

unpack(...) has a ton of options so I recommend that anyone else who reads this read the pack/unpack documentation to get the byte data result of their choice. I also didn't try using the Tie options @brian mentioned, I wanted to keep the code as simple as possible.
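
To tie the results above together, here is a minimal sketch of a function that receives the binary scalar and returns its Nth byte. The name nth_byte and the $sample string are assumptions made for illustration; the sample bytes are the ones implied by the outputs shown earlier. It also suggests why "N1" printed such large numbers: N reads four bytes as one big-endian 32-bit integer, while C reads a single unsigned byte.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the byte at zero-based offset $pos as an
# integer 0..255, or undef if $pos is outside the data.
sub nth_byte {
    my ($data, $pos) = @_;
    return undef if $pos < 0 || $pos >= length $data;
    return unpack("x$pos C", $data);    # skip $pos bytes, read one byte
}

my $sample = "\xFF\xD8\xFF\xE0\x00";    # assumed sample data

print nth_byte($sample, $_), "\n" for 0 .. 2;   # 255, 216, 255
print unpack("N",   $sample), "\n";             # 4292411360 (bytes FF D8 FF E0)
print unpack("x N", $sample), "\n";             # 3640647680 (bytes D8 FF E0 00)
```

The bounds check is a design choice for the sketch: without it, unpack dies when asked to skip past the end of the string.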

If you have the data in a string and you want to get to a certain byte, use substr, as long as you are treating the string as bytes to start with.
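
A minimal sketch of the substr approach, assuming the scalar really holds bytes (for example, it was read from a handle in binmode). The helper name nth_byte_substr is hypothetical. substr returns a one-character string, so ord is used to turn it into a number.

```perl
use strict;
use warnings;

# Hypothetical helper: byte at zero-based offset $pos, as an integer 0..255,
# or undef if $pos is outside the data.
sub nth_byte_substr {
    my ($data, $pos) = @_;
    return undef if $pos < 0 || $pos >= length $data;
    return ord(substr($data, $pos, 1));
}

my $sample = "\xFF\xD8\xFF\xE0";        # assumed sample data
my $byte   = nth_byte_substr($sample, 1);
printf "%d 0x%02x\n", $byte, $byte;     # 216 0xd8
```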

However, you can read it directly from the file without all this string nonsense people have been filling your head with. :) Open the file with sysopen and the right options, use sysseek to put yourself where you want (plain seek should not be mixed with sysread, because of buffering), and read what you need with sysread.

You skip all the workarounds for the stuff that open and readline are trying to do for you. If you're just going to turn off all of their features, don't even use them.
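
Reading a single byte straight from a file might look roughly like the sketch below. The helper name nth_byte_from_file and the temporary demo file are assumptions, not part of the original answer. The sketch positions the handle with sysseek, since the Perl documentation advises against combining the buffered seek with sysread.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper: returns the byte at zero-based offset $pos of the
# file at $path as an integer, or undef if $pos is at or past end of file.
sub nth_byte_from_file {
    my ($path, $pos) = @_;
    sysopen(my $fh, $path, O_RDONLY) or die "Cannot open $path: $!";
    binmode($fh);                                   # matters on Windows
    defined(sysseek($fh, $pos, SEEK_SET)) or die "Cannot seek in $path: $!";
    my $got = sysread($fh, my $buf, 1);             # unbuffered one-byte read
    die "Cannot read $path: $!" unless defined $got;
    close $fh;
    return $got ? ord($buf) : undef;
}

# Demo with a throwaway file holding four assumed sample bytes.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "\xFF\xD8\xFF\xE0";
close $tmp_fh;

print nth_byte_from_file($tmp_path, 1), "\n";       # 216
```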

I think the correct answer involves pack/unpack, but this might also work:

use bytes;
while( $bytestring =~ /(.)/gs ){   # /s lets . match "\n" (byte 0x0A) as well
   my $byte = $1;
   ...
}

"use bytes" ensures that you never see characters -- but if you have a character string and are processing it as bytes, you are doing something wrong. Perl's internal character encoding is undefined, so the data you see in the string under "use bytes" is nearly meaningless.

Since you already have the file contents in $thisByteData you could use pack / unpack to access the n-th byte.

sub getNthByte {
  my ($pos) = @_;
  return unpack("x$pos b1", $thisByteData);
}

#x$pos - skips forward over $pos bytes
#b1    - returns only the first (lowest-order) bit of the next byte as a bit string;
#        use b8 for all eight bits, or C for the byte's numeric value

Read through the pack documentation to get a sense of the parameters you can use in the template to get different return values.
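
As a rough illustration of how the template letter changes what comes back, the sketch below unpacks the same byte with several templates. The helper name byte_views and the sample data are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: unpacks the byte at zero-based offset $pos with
# several templates and returns the results keyed by template.
sub byte_views {
    my ($data, $pos) = @_;
    my %view;
    for my $template (qw(C H2 B8 b8 b1)) {
        $view{$template} = unpack("x$pos $template", $data);
    }
    return \%view;
}

my $views = byte_views("\xFF\xD8\xFF\xE0", 1);      # look at the byte 0xD8
printf "%-2s => %s\n", $_, $views->{$_} for qw(C H2 B8 b8 b1);
# C  => 216        unsigned integer
# H2 => d8         hex digits, high nybble first
# B8 => 11011000   bits, highest bit first
# b8 => 00011011   bits, lowest bit first
# b1 => 0          only the lowest bit
```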

EDIT - Your comment below shows that you are missing the high-order nybble ('f') of the first byte. I am not sure why this is happening, but here is an alternative method that works. In the meantime I'll have a further look into unpack's behavior.

sub getNthByte {
  my ($pos) = @_;
  return unpack("x[$pos]H2", $binData);
}

(my $hex = unpack("H*", $binData)) =~ s/(..)/$1 /g;
#To convert the entire data in one go

Using this, the output for the first four bytes is 0xff 0xd8 0xff 0xe0, which matches the documentation.

The Perl built-in variable $/ (or $INPUT_RECORD_SEPARATOR if you use English) controls Perl's idea of a "line". By default it is set to "\n", so lines are separated by newline characters (duh), but you can change this to any other string. Or change it to a reference to a number:

$/ = \1;
while(<FILE>) {
  # read file
}

Setting it to a reference to a number will tell Perl that a "line" is that number of bytes.
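
A small sketch of the fixed-size record idea, assuming a one-byte record size and using an in-memory filehandle so it runs without a real file. The helper name bytes_from_handle and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reads a handle one byte at a time and returns the
# numeric value of every byte.
sub bytes_from_handle {
    my ($fh) = @_;
    local $/ = \1;                      # a "line" is now exactly one byte
    my @bytes;
    while (my $record = <$fh>) {        # implicit defined() test, so "0" is safe
        push @bytes, ord($record);
    }
    return @bytes;
}

my $data = "\xFF\xD8\x0A\x30";          # assumed sample, includes "\n" and "0"
open(my $fh, "<", \$data) or die "Cannot open in-memory handle: $!";
print join(" ", bytes_from_handle($fh)), "\n";      # 255 216 10 48
close $fh;
```

Using local keeps the change to $/ confined to the function, so callers still see ordinary line-based reads.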

Now, what exactly are you trying to do? There's probably a number of modules that will do what you're trying to do, and possibly more efficiently. If you're just trying to learn how to do it, go ahead, but if you have a specific task in mind, consider not reinventing the wheel (unless you want to).

EDIT: Thanks to jrockway in the comments...

If you have Unicode data, this may read one character rather than one byte. If that happens, you should be able to use bytes; to turn off automatic byte-to-character translation.

Now, you say you want to read the data all at once and then pass it to a function. Let's do this:

my $data;
{
  local $/;
  $data = <FILE>;
}

Or this:

my $data = join("", <FILE>);

Or some will suggest the File::Slurp module, but I think it's a bit overkill. However, let's get an entire file into an array of bytes:

use bytes;

...

my @data = split(//, join("", <FILE>));

And then we have an array of bytes that we can pass to a function. Like?
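
For comparison, a hedged sketch of two ways to turn a byte string into a list: split gives one-character strings, while unpack with "C*" gives the numeric value of each byte directly. The helper names and the sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: both assume $data is a plain byte string.
sub byte_chars  { my ($data) = @_; return split(//, $data);   }  # "\xFF", "\xD8", ...
sub byte_values { my ($data) = @_; return unpack("C*", $data); } # 255, 216, ...

my $data   = "\xFF\xD8\xFF\xE0";        # assumed sample data
my @chars  = byte_chars($data);
my @values = byte_values($data);

print ord($chars[1]), "\n";             # 216
print $values[1], "\n";                 # 216
print scalar(@values), "\n";            # 4
```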

Without knowing much more about what you're trying to do with your data, something like this will iterate over the bytes in the file:

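open(SOURCE, "<", "wl.jpg") or die "Cannot open wl.jpg: $!";
binmode(SOURCE);   # binary file: no newline translation or encoding layers
my $byte;
while(read SOURCE, $byte, 1) {
    # Do something with the contents of $byte
}
close SOURCE;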

Be careful with the concatenation used in your example; you may end up with newline conversions, which is definitely not what you want to happen while reading binary files. (It's also inefficient to continually expand the scalar while reading it.) This is the idiomatic way to schlep an entire file into a Perl scalar:

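open(SOURCE, "<", "wl.jpg") or die "Cannot open wl.jpg: $!";
binmode(SOURCE);   # avoid newline conversions on binary data
local $/ = undef;  # no record separator: read the whole file at once
my $big_binary_data = <SOURCE>;
close SOURCE;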
