
Python: decoding binary data back to a file

I have a database in MSSQL with compressed and converted files. The values look like this:

Screenshot of the values (each of them is 40k characters long)

I need to decode these values into PDF, DOCX and PNG files.

I've tried to do this via base64, but it didn't produce valid files.

Do you have any ideas how I could decode all of them and rebuild the correct files?

Your data appears to be a PNG with something prepended to the front of it. If you strip the first 12 bytes with dd and then convert the hex back to binary with xxd, you can recover the start of a PNG file:

dd bs=12 skip=1 if=YOURFILE | xxd -r -p > image.png
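Since the question asks about Python, the same two steps can be sketched there as well. This is only a rough equivalent of the pipeline above, assuming the value is available as a text string; the name `hex_text_to_bytes` and the default of 12 skipped characters are illustrative choices, not part of the original answer.

```python
import binascii

PREFIX_CHARS = 12  # "0x" plus 10 hex digits: the same 12 bytes that dd skips


def hex_text_to_bytes(hex_text: str, skip: int = PREFIX_CHARS) -> bytes:
    """Drop the first `skip` characters, then turn the remaining hex digits into bytes."""
    # Remove any whitespace or newlines that the dump may contain.
    digits = "".join(hex_text[skip:].split())
    # A truncated dump may end on half a byte; drop the dangling digit.
    if len(digits) % 2:
        digits = digits[:-1]
    return binascii.unhexlify(digits)


if __name__ == "__main__":
    sample = "0x002240DDBF" + "89504E470D0A1A0A"  # prefix + PNG signature
    print(hex_text_to_bytes(sample))  # b'\x89PNG\r\n\x1a\n'
```

Writing the returned bytes to a file opened in `"wb"` mode gives the same `image.png` as the shell command.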

You can then check that PNG file to see its dimensions and the fact that it is truncated:

pngcheck -v image.png 

Sample output

File: image.png (21833 bytes)
  chunk IHDR at offset 0x0000c, length 13
    2164 x 835 image, 24-bit RGB, non-interlaced
  chunk sRGB at offset 0x00025, length 1
    rendering intent = perceptual
  chunk gAMA at offset 0x00032, length 4: 0.45455
  chunk pHYs at offset 0x00042, length 9: 3779x3779 pixels/meter (96 dpi)
  chunk IDAT at offset 0x00057, length 65445:  EOF while reading data
ERRORS DETECTED in image.png
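If pngcheck is not available, a similar check can be approximated in Python by walking the chunk headers. The sketch below relies only on the standard PNG layout (8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the payload and a 4-byte CRC). The function names are made up for illustration, and it does not verify CRCs the way pngcheck does.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def walk_png_chunks(data: bytes):
    """Yield (chunk_type, declared_length, is_complete) for each chunk header found."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4  # chunk header + payload + CRC
        yield ctype.decode("ascii", "replace"), length, end <= len(data)
        if end > len(data):
            return  # the chunk runs past the end of the data: truncated
        pos = end


def png_dimensions(data: bytes):
    """Return (width, height) read from IHDR, which is always the first chunk."""
    return struct.unpack(">II", data[16:24])
```

A chunk reported with `is_complete` set to `False` corresponds to the "EOF while reading data" line in the pngcheck output.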

The data is hex-encoded. Try:

from base64 import b16decode

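# Hex string as stored in the database, including the '0x' prefix
encoded = '0x48656C6C6F'
# Drop the '0x' prefix; b16decode expects uppercase hex digits unless casefold=True
decoded = b16decode(encoded[2:])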
print(decoded)

This outputs b'Hello'.
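Because the target files are PDF, DOCX and PNG, it may help to look at the first decoded bytes before choosing a file extension. The following is a minimal sketch, assuming the well-known signatures for PNG, PDF and ZIP (a .docx file is a ZIP container, so treating every ZIP as .docx is a guess). `decode_hex_value`, `locate_payload` and the 16-byte search window are hypothetical names and values chosen for illustration.

```python
# Well-known file signatures; mapping "PK\x03\x04" to .docx is an assumption,
# since every ZIP-based format starts with the same bytes.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"%PDF-", ".pdf"),
    (b"PK\x03\x04", ".docx"),
]


def decode_hex_value(value: str) -> bytes:
    """Decode a hex string, with or without a leading '0x', into bytes."""
    if value[:2].lower() == "0x":
        value = value[2:]
    return bytes.fromhex(value)


def locate_payload(data: bytes, window: int = 16):
    """Return (offset, extension) of the earliest known signature that starts
    within the first `window` bytes, or None if none is found."""
    best = None
    for magic, ext in MAGIC:
        idx = data.find(magic, 0, window + len(magic))
        if idx != -1 and (best is None or idx < best[0]):
            best = (idx, ext)
    return best
```

A non-zero offset means something precedes the real file content and has to be sliced off (`data[offset:]`) before writing the file in binary mode.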

As you're learning the hard way, stuffing blobs into a text database is probably the worst sin a novice data manager can commit: the result is bloated, unwieldy and slow. It is best to leave the source files in their fast, natural, native compressed state and simply reference them in the DB by a related unique ID and file storage name. Rant over.

The fact that they are fixed-size blocks of 40K suggests they are stored in pieces, so several separate chunks are needed to create one whole BLOB.

The blob you presented appears to be just part of a PNG image that should be, if I am interpreting it correctly:

2164 pixels wide by 835 pixels high

However, the output is only 5 pixels high within that suspiciously large canvas, which might be correct if it's just the first part of a much longer, truncated stream.

Your 40K chunk translates to 22K of binary with the characteristics of a PNG, but a PNG starts with 89, so you have a problem, since the data is prefixed with 0x 00 22 40 DD BF.

We can discard the 0x as the marker of a hex stream and use the remainder as I did above, but what is the significance of the odd 00 22 40 DD BF? Most likely it contains, in part, an indicator of the final full length or a pointer to the next chunk.

What you need to do is extract that image by your normal method and compare it with the total expected file size, since the 22 KB of binary may amount to only a small percentage of the expected total. In that case you need to determine how and where the rest of the image is stored, in order to concatenate all the parts into one homogeneous blob, i.e. a single image.

You need sight of the method by which the chunks are extracted, converted and stitched together, one careful step at a time, using some measure of the expected file size.
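Once the order of the chunks is known, stitching them together could look roughly like the sketch below. Everything about the storage layout here is an assumption: the chunks are taken to be hex strings already sorted in the right order, `prefix_bytes` is a hypothetical per-chunk header length to drop, and `expected_size` has to come from wherever the real total length is recorded.

```python
def join_chunks(hex_chunks, prefix_bytes: int = 0) -> bytes:
    """Concatenate hex-encoded chunks (assumed to be in order) into one blob.

    `prefix_bytes` decoded bytes are dropped from the front of every chunk;
    whether each chunk really carries such a prefix is an assumption.
    """
    parts = []
    for chunk in hex_chunks:
        text = chunk[2:] if chunk[:2].lower() == "0x" else chunk
        parts.append(bytes.fromhex(text)[prefix_bytes:])
    return b"".join(parts)


def completeness(blob: bytes, expected_size: int) -> float:
    """Fraction of the expected file size that has been recovered so far."""
    return len(blob) / expected_size
```

A `completeness` value well below 1.0 after joining everything that has been found suggests that more chunks are still stored elsewhere.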

