Python reads “20” instead of “00” from binary file

I'm writing code to read a binary file and print the hex representation of its data as CSV, using NULL bytes as the separator. When I look at the file in a binary/hex viewer, it shows this sequence as part of the file:

41 73 73 65 6d 62 6c 79 c8 2d 01 00 04 00 00 00 07 00 00 00 00

However, reading the file with this code:

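import csv

data_read = []  # hex strings for the current NULL-terminated record
# file_in and file_out are defined elsewhere in the script
with open(file_in, "rb") as f:
    while (byte := f.read(1)):
        h_value = hex(ord(byte))
        h_value = ("0" + h_value[2:])[-2:]  # zero-pad to two hex digits
        #print(byte)
        #print(h_value)
        data_read.append(h_value)
        if h_value == '00':
            # A NULL byte ends the record: write it out as one CSV row
            with open(file_out, 'a', newline='') as c:
                w = csv.writer(c)
                w.writerow(data_read)
            data_read = []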

Gives me this for that section instead:

41,73,73,65,6d,62,6c,79,c3,88,2d,01,20,04,20,20,20,07,20,20,20,20

This matters because there are genuine 20 bytes elsewhere in the file as data. Uncommenting print(byte) and print(h_value) shows b' ' and 20 respectively. That makes me think Python is reading the file wrong, and that the output isn't simply being converted. Is there anything I can do to preserve these NULL values through the process?

Edit 1: Additional info: this is running Python 3.8.2 in IDLE. I don't know whether the IDE would make a difference, but I'm going to see if Visual Studio gives me different results. The binary viewer is simply named Binary Viewer, version 6.17.
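For reference, here is a sketch of the conversion the question describes: hex bytes, one CSV row per NULL-terminated record. It reads the whole file at once and opens the output only once, instead of reopening it for every row. This is not the asker's code. split_on_null and write_hex_csv are hypothetical names. Unlike the original loop, this version also emits any trailing bytes that have no terminating 00. It also opens the output in "w" mode, not "a".

```python
import csv
from typing import List


def split_on_null(data: bytes) -> List[List[str]]:
    """Split data into records that each end with a 0x00 byte.

    Each record keeps its terminating 00, matching the question's output.
    Trailing bytes with no terminator form a final, unterminated record.
    """
    rows: List[List[str]] = []
    current: List[str] = []
    for value in data:  # iterating over bytes yields ints 0..255
        current.append(format(value, "02x"))
        if value == 0:
            rows.append(current)
            current = []
    if current:
        rows.append(current)
    return rows


def write_hex_csv(file_in: str, file_out: str) -> None:
    """Write one CSV row of hex strings per NULL-terminated record."""
    with open(file_in, "rb") as f:
        data = f.read()
    # Open the output once, in text mode with newline="" as the csv module expects.
    with open(file_out, "w", newline="") as c:
        csv.writer(c).writerows(split_on_null(data))


if __name__ == "__main__":
    sample = bytes.fromhex("41 73 73 65 6d 62 6c 79 c8 2d 01 00 04 00 00 00 07 00 00 00 00")
    for row in split_on_null(sample):
        print(",".join(row))
```

For the viewer's sample bytes, this prints eight rows. The first is 41,73,73,65,6d,62,6c,79,c8,2d,01,00. Each later run of 00 bytes becomes its own one-item row.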

There's nothing wrong with Python's reading of the file or with the CSV creation, as the following program shows:

import os, csv

os.system("od -xcb qq.in") # Show file as byte dump.

data_read = []
with open("qq.in", "rb") as f:
    byte = f.read(1)
    while (byte):
        h_value = hex(ord(byte))
        h_value = ("0" + h_value[2:])[-2:]
        data_read.append(h_value)
        print(ord(byte), h_value) # Check individual bytes.
        byte = f.read(1)

print(data_read)
with open("file_out.csv", 'w') as c:
    w = csv.writer(c)
    w.writerow(data_read)
os.system("cat file_out.csv") # Show final CSV output.

The output of that program is:

0000000    7341    6573    626d    796c    2dc8    0001    0004    0000
          A   s   s   e   m   b   l   y 310   - 001  \0 004  \0  \0  \0
        101 163 163 145 155 142 154 171 310 055 001 000 004 000 000 000
0000020    0007    0000    0000
         \a  \0  \0  \0  \0
        007 000 000 000 000
0000025
(65, '41')
(115, '73')
(115, '73')
(101, '65')
(109, '6d')
(98, '62')
(108, '6c')
(121, '79')
(200, 'c8')
(45, '2d')
(1, '01')
(0, '00')
(4, '04')
(0, '00')
(0, '00')
(0, '00')
(7, '07')
(0, '00')
(0, '00')
(0, '00')
(0, '00')
['41', '73', '73', '65', '6d', '62', '6c', '79', 'c8', '2d', '01', '00', '04', '00', '00', '00', '07', '00', '00', '00', '00']
41,73,73,65,6d,62,6c,79,c8,2d,01,00,04,00,00,00,07,00,00,00,00
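The program above shells out to od and cat, so it assumes a Unix-like system with those tools installed. The following Python-only sketch runs the same check in any environment, IDLE included. It writes the viewer's bytes to a throwaway file, reads them back one byte at a time with the question's loop shape, and confirms that every 00 survives. The temporary path is an assumption for illustration. The walrus operator needs Python 3.8 or later.

```python
import os
import tempfile

# Bytes exactly as shown by the asker's hex viewer.
expected = bytes.fromhex("41 73 73 65 6d 62 6c 79 c8 2d 01 00 04 00 00 00 07 00 00 00 00")

path = os.path.join(tempfile.mkdtemp(), "qq.in")  # throwaway test file
with open(path, "wb") as f:
    f.write(expected)

data_read = []
with open(path, "rb") as f:
    while (byte := f.read(1)):  # same loop shape as the question
        data_read.append(format(ord(byte), "02x"))

print(",".join(data_read))
# Every 00 comes back as 00; nothing turns into 20.
assert data_read.count("00") == expected.count(0)
assert "20" not in data_read
```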

Hence I would start by looking at your input file more closely; it's likely the source of the problem.
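The question's own debug output points the same way. In binary mode, a NULL byte prints as b'\x00', while the b' ' the asker saw is the space character, 0x20. So the file itself holds 0x20 at those offsets. The following minimal sketch shows the comparison. describe is a hypothetical helper that mimics the question's two print calls.

```python
def describe(byte: bytes) -> str:
    """Show a single byte the way the question's debug prints would."""
    return f"{byte!r} -> {byte[0]:02x}"


print(describe(b"\x00"))  # b'\x00' -> 00
print(describe(b" "))     # b' ' -> 20
```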

Especially since there appears to be another change from your input: the c8 byte has been changed into c3 88, which is a Unicode (UTF-8) encoding transformation.
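The post doesn't say how the file was produced. One plausible path, offered only as an assumption, is that the bytes were decoded as Latin-1 text and saved back as UTF-8. Windows-1252 gives the same result for this byte. The following sketch reproduces the c8 → c3 88 change that way. It does not explain the 00 → 20 change, which needs a separate step that replaced NULL characters with spaces.

```python
# Bytes from the hex viewer, up to and including the 2d that follows c8.
original = bytes.fromhex("41 73 73 65 6d 62 6c 79 c8 2d")

# Assumption: the data was decoded as Latin-1 text and re-saved as UTF-8.
text = original.decode("latin-1")  # byte c8 becomes the character U+00C8
reencoded = text.encode("utf-8")   # U+00C8 is written as two bytes

print(reencoded.hex(" "))  # 41 73 73 65 6d 62 6c 79 c3 88 2d
```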

As you can see from this answer, 0xc8 is in the two-byte UTF-8 range:

Range              Encoding  Binary value
-----------------  --------  --------------------------
U+000080-U+0007ff  110yyyxx  00000yyy xxxxxxxx
                   10xxxxxx

The code point U+00C8 is the 11-bit sequence 000 1100 1000, so it is transformed into UTF-8 as 1100 0011 1000 1000, or c3 88.
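The bit arithmetic in the table can be checked directly. The following sketch builds the two bytes by hand: the lead byte is 110 followed by the top 5 bits of the code point, and the continuation byte is 10 followed by the low 6 bits. It then compares the result with Python's own encoder. utf8_two_byte is a hypothetical helper name, and it handles only the U+0080 to U+07FF range shown in the table.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point outside the two-byte UTF-8 range")
    lead = 0b11000000 | (cp >> 6)        # 110yyyxx: top 5 of the 11 bits
    cont = 0b10000000 | (cp & 0b111111)  # 10xxxxxx: low 6 bits
    return bytes([lead, cont])


print(utf8_two_byte(0xC8).hex(" "))  # c3 88
assert utf8_two_byte(0xC8) == chr(0xC8).encode("utf-8")
```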

With the information from the comments and paxdiablo's answer, I decided there must be something wrong with the file itself, since by all accounts the problem shouldn't be with Python. I opened it in the binary viewer again and exported it as a new .BIN file. The new file reads the way it's supposed to, so it looks like that's the solution.
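To confirm that a re-exported file really contains the bytes the viewer shows, you can search for the known sequence rather than eyeballing the CSV. The following is a sketch under assumptions: find_sequence is a hypothetical helper, and "exported.bin" in the usage comment is a made-up filename.

```python
# Sequence from the viewer that the exported file should contain.
EXPECTED = "41 73 73 65 6d 62 6c 79 c8 2d 01 00 04 00 00 00 07 00 00 00 00"


def find_sequence(data: bytes, hex_seq: str) -> int:
    """Offset of the byte sequence hex_seq in data, or -1 if it's absent."""
    return data.find(bytes.fromhex(hex_seq))


# Usage, assuming a hypothetical re-exported file named "exported.bin":
#     with open("exported.bin", "rb") as f:
#         print(find_sequence(f.read(), EXPECTED))

if __name__ == "__main__":
    good = bytes.fromhex(EXPECTED)
    # The corrupted form from the question: c8 -> c3 88 and 00 -> 20.
    bad = bytes.fromhex("41 73 73 65 6d 62 6c 79 c3 88 2d 01 20 04 20 20 20 07 20 20 20 20")
    print(find_sequence(good, EXPECTED))  # 0
    print(find_sequence(bad, EXPECTED))   # -1
```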