Python: decoding binary data back to files

I have a database in MSSQL with compressed and converted files; it looks like this:

(screenshot of the column values; each value is about 40k characters long)

I need to decode these values back into PDF, DOCX and PNG files.

I've tried to do this via base64, but it didn't produce correct files.

Do you have any ideas how I could decode all of them and rebuild the correct files?

Your data appears to be a PNG with something prepended to the front of it. If you strip the first 12 characters (the 0x marker plus the prepended bytes) with dd and then convert the remaining hex back to binary with xxd, you can recover the start of a PNG file:

dd bs=12 skip=1 if=YOURFILE | xxd -r -p > image.png
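
If you prefer to stay in Python, the same stripping and decoding can be done with bytes.fromhex. This is a minimal sketch; the input file name dump.txt (holding the hex text of one database value) is an assumption:

# read the hex dump (one long '0x...' string) from a text file
with open('dump.txt', 'r') as f:
    hex_text = f.read().strip()

# skip the first 12 characters: the '0x' marker plus the 5 prepended bytes,
# then turn the remaining hex digits back into raw bytes
raw = bytes.fromhex(hex_text[12:])

with open('image.png', 'wb') as out:
    out.write(raw)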

You can then check that PNG file, see its size and dimensions, and confirm that it is truncated, like this:

pngcheck -v image.png 

Sample Output

File: image.png (21833 bytes)
  chunk IHDR at offset 0x0000c, length 13
    2164 x 835 image, 24-bit RGB, non-interlaced
  chunk sRGB at offset 0x00025, length 1
    rendering intent = perceptual
  chunk gAMA at offset 0x00032, length 4: 0.45455
  chunk pHYs at offset 0x00042, length 9: 3779x3779 pixels/meter (96 dpi)
  chunk IDAT at offset 0x00057, length 65445:  EOF while reading data
ERRORS DETECTED in image.png
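
If you would rather do that sanity check in Python, the image dimensions sit in the IHDR chunk immediately after the 8-byte PNG signature. A minimal sketch (not a full validator like pngcheck), run against the image.png produced above:

import struct

with open('image.png', 'rb') as f:
    data = f.read()

# a valid PNG begins with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A
print('signature ok:', data[:8] == b'\x89PNG\r\n\x1a\n')

# IHDR is the first chunk: 4-byte length, 4-byte type, then width and height
width, height = struct.unpack('>II', data[16:24])
print(width, 'x', height)   # expect 2164 x 835 for the sample above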

The data is hex-encoded; try:

from base64 import b16decode

# example value: a hex string with a leading '0x', as stored in the database
encoded = '0x48656C6C6F'
decoded = b16decode(encoded[2:])   # strip the '0x' prefix before decoding
print(decoded)

Outputs b'Hello'
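
To turn the decoded bytes into an actual file, write them out in binary mode; the first few decoded bytes also tell you which extension to use (a PNG starts with \x89PNG, a PDF with %PDF, and a DOCX is a ZIP container, so it starts with PK). A minimal sketch; the value of encoded and the output file name are placeholders, not your real data:

from base64 import b16decode

encoded = '0x89504E470D0A1A0A'                  # hypothetical hex string read from the database
data = b16decode(encoded[2:], casefold=True)    # drop '0x'; accept lower-case hex digits too

# pick a file extension from the signature bytes
if data.startswith(b'\x89PNG'):
    ext = '.png'
elif data.startswith(b'%PDF'):
    ext = '.pdf'
elif data.startswith(b'PK'):
    ext = '.docx'   # docx (like xlsx and zip) is a ZIP container
else:
    ext = '.bin'

with open('recovered' + ext, 'wb') as f:
    f.write(data)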

As you're learning the hard way, stuffing blobs into a text database is probably the worst sin a data manager could commit as a novice: bloated, unwieldy and slow. It is best if the source files are left in their fast, native, compressed state and simply referenced in the DB by a unique ID and file-storage name. Rant over.

The fact that they are fixed-size blocks of 40K suggests they are chunked into pieces, so several chunks may be needed to reconstruct one whole BLOB.

The blob you presented appears to be just part of a PNG image that, if I am interpreting it correctly, should be:

2164 pixels wide by 835 pixels high

HOWEVER, the decoded output is only 5 pixels high within that oddly sized canvas, which might be correct if it's just the first part of a much longer, truncated stream.

Your 40K chunk translates to 22K of binary with the characteristics of a PNG, BUT a PNG starts with 89, so you have a problem: your data is prefixed with 0x 00 22 40 DD BF.

We can discard the 0x as the marker for a hex stream and use the remainder as I did above, but what is the significance of the odd 00 22 40 DD BF? Most likely it contains, in part, an indicator of the final full-length size or a pointer to the next chunk.

What you need to do is extract that image by your normal method and compare it with the total expected file size: translated into 22 KB of binary, it may only amount to a small percentage of the total to be expected. In that case you need to determine how and where the rest of the image is stored, so that you can concatenate all the parts into one homogeneous blob, i.e. a single image.

You need sight of the method by which the chunks are extracted, converted and stitched together, step by step, guided by some measure of the expected file size.
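
If the rows really are 40K-character hex chunks of one file, the reassembly might look roughly like the sketch below. This is only a hedged illustration: how you fetch the ordered chunks is up to your own query, and the assumption that the '0x' marker and the 5 unexplained prefix bytes appear only on the first chunk is a guess you would need to verify against your schema.

def reassemble(chunks):
    # 'chunks' is the list of 40K hex strings for one document, in order
    parts = []
    for i, chunk in enumerate(chunks):
        if i == 0 and chunk.startswith('0x'):
            chunk = chunk[12:]          # drop '0x' plus the 5 unexplained prefix bytes
        parts.append(bytes.fromhex(chunk))
    return b''.join(parts)

# hypothetical usage: fetch the ordered chunks from MSSQL with your own query first
chunks = ['0x002240DDBF89504E470D0A1A0A']   # placeholder data, not real rows
blob = reassemble(chunks)

# sanity checks: compare against the expected total size and the PNG signature
print(len(blob), 'bytes reassembled')
print('looks like a PNG:', blob.startswith(b'\x89PNG'))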
