
Decode Base64 Gzip in Python

I'm trying to decode a gzipped Garmin activity file using Python. According to Garmin, the file is a base64-encoded gz file. I'm uploading the file from the browser via POST and receiving the data in a Django app.

The beginning of the file looks like this:

begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu

I've used the following code to adjust the padding and decode the base64:

import base64
padding_factor = (4 - len(data) % 4) % 4
data += "="*padding_factor
data_decoded = base64.b64decode(unicode(data).translate(dict(zip(map(ord, u'-_'), u'+/'))))

The beginning of data_decoded looks like this on the screen:

\xe8"\x9f\xe6\xda\xb1\xee\xb8\xeb\x8e\x1dj\xd6\xb1\x9aX3\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03/Z\xe2\w\x1ewz~\x0b\x81\xec\x9c\xcd\xb8fd6(\r}\xda\xc0\x19\xb3\x00a\xa5\xde5\xf6\xcf\xa2U\xe9\x95\x88\x91H\x81n\xcb\xf7\xb4\x9f\xcc\xa7y%\xbd\x95\x9e\x13\xcd\x10\xf9Th\x04\x8d\xdf\xdf\xa6\xba\xa9\xcd\xf9=s\xf8G\xfc

print data_decoded looks like this:

}???a??5?ϢU镈?H?n????̧y%?????Th??ߦ????=s?G?

I then try to unzip the file using the following:

from cStringIO import StringIO
import gzip
sio = StringIO(data_decoded)
gzf = gzip.GzipFile(fileobj=sio)
guff = gzf.read()

After which I get the following error:

  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 245, in read
    self._read(readsize)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 287, in _read
    self._read_gzip_header()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/gzip.py", line 181, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

I also tried saving the file directly to disk and running gunzip from the command line, and that results in the same error.

Any help would be much appreciated.

You need to strip off the beginning of the file, since it is not part of the base64 data. If you know that the \n will be part of every file, you can use it as a delimiter:

index = data.find('\\n')
if index > 0:
    data = data[index+2:]
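The same idea can be sketched a little more defensively. The helper below is hypothetical (the name strip_header is not from the original answer) and assumes the header always starts with begin-base64; it handles both a literal backslash-n pair, as in the snippet above, and a real newline:

```python
def strip_header(data):
    """Drop a leading 'begin-base64 <mode> <name>' line, if present."""
    if not data.startswith('begin-base64'):
        return data
    # Try the literal two-character sequence (backslash + n) first,
    # then fall back to a real newline character.
    for delimiter in ('\\n', '\n'):
        index = data.find(delimiter)
        if index != -1:
            return data[index + len(delimiter):]
    return data
```

Using len(delimiter) instead of a hard-coded 2 keeps the slice correct whichever delimiter is found.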

It looks like you're decoding the entire thing, including the begin-base64 644 data.xml.gz part, so you're getting a bunch of garbage at the start:

b1 = '''begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''

b2 = '''\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''

If you run your algorithm on b1, you get something starting with this:

m\xe8"\x9d\xb6\xac{\xae

(I don't know how you lost the m in copying and pasting, but either way, it's not valid.)

If you run it on b2, you get something starting with this:

\x1f\x8b\x08\x00\x00\x00

That's what you want.
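The reason those bytes are the right ones is that a gzip stream starts with the magic bytes 0x1f 0x8b. A minimal sketch of that check, assuming Python 3 (the thread itself uses Python 2) and a hypothetical helper named looks_like_gzip:

```python
import base64

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def looks_like_gzip(raw):
    """Return True if the decoded bytes begin with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


# 'H4sI' is how the gzip magic bytes (plus the deflate method byte) look in base64.
clean = base64.b64decode('H4sIAAAAAAAAAA==')
garbage = b'm\xe8"\x9d\xb6\xac{\xae'  # what decoding the header as well produced
```

Running the check before handing the data to GzipFile gives a clearer failure than the "Not a gzipped file" error.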

Of course, taking off the '\n' has the same effect, since base64 ignores whitespace. So most likely, it's being used as a delimiter. If that's actually a '\\n' (aka r'\n') rather than a '\n', you have to remove it to get the right answer.

Also, you seem to be doing a lot of extra work for no good reason. Most likely the data is already correctly padded, but that part may be worthwhile. The whole translate(dict(zip(map(ord, u'-_'), u'+/'))) call, however, does the same thing as passing an altchars argument to b64decode, only less efficiently and less readably (if it's correct at all). (By the way, if you were using translate as an optimization against the cost of calling replace twice, the conversion to and from Unicode is almost certain to overwhelm the savings. Even if you had profiled and determined that it made a difference, you'd probably want to build the translate map once, ahead of time, both for efficiency, so you don't rebuild it for every string, and, more importantly, for readability.)
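To illustrate the altchars point, here is a small sketch, assuming Python 3 and a made-up three-byte sample chosen because it encodes to nothing but '+' and '/' characters. It compares manual substitution, altchars, and urlsafe_b64decode:

```python
import base64

raw = b'\xfb\xff\xfe'                      # encodes to '+//+' in standard base64
urlsafe = base64.urlsafe_b64encode(raw)    # same data written with '-' and '_'

# Manual substitution, the moral equivalent of the translate() call.
via_replace = base64.b64decode(urlsafe.replace(b'-', b'+').replace(b'_', b'/'))
# Letting b64decode do the substitution.
via_altchars = base64.b64decode(urlsafe, altchars=b'-_')
# The dedicated helper for the '-_' alphabet.
via_urlsafe = base64.urlsafe_b64decode(urlsafe)
```

All three produce the same bytes; the last two simply say what they mean.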

Putting it together:

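import base64

data = '''begin-base64 644 data.xml.gz\nH4sIAAAAAAAAA y9a4 lx3Hn
d6fguB7JzNuGZkNigNfdrAGbMAYaXeNfbPolXplYiRSIFu'''
# Drop the 'begin-base64 ...' header line; only the rest is base64.
_, data = data.split('\n', 1)
# Pad to a multiple of 4 characters.
padding_factor = (4 - len(data) % 4) % 4
data += "="*padding_factor
# altchars maps '-' and '_' back to '+' and '/'.
data_decoded = base64.b64decode(data, '-_')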

Again, if you've got a '\\n' rather than a '\n', change the split line accordingly.
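For completeness, here is a sketch of the whole round trip, including the decompression step from the question. It assumes Python 3, a real newline after the header, and a hypothetical function name (decode_activity); the sample payload is generated on the spot rather than taken from a real Garmin file:

```python
import base64
import gzip


def decode_activity(data):
    """Return the decompressed bytes of a 'begin-base64' wrapped gzip payload."""
    # Drop the header line; this assumes a real newline delimiter.
    _, payload = data.split('\n', 1)
    # Remove whitespace and any existing '=' so padding is computed
    # on the base64 characters only, then pad to a multiple of 4.
    payload = ''.join(payload.split()).rstrip('=')
    payload += '=' * (-len(payload) % 4)
    raw = base64.b64decode(payload, altchars=b'-_')
    return gzip.decompress(raw)


# Build a sample upload in the same shape as the one in the question.
original = b'<xml>hypothetical activity data</xml>'
encoded = base64.b64encode(gzip.compress(original)).decode('ascii')
sample = 'begin-base64 644 data.xml.gz\n' + encoded
result = decode_activity(sample)
```

Stripping whitespace before padding matters because len(data) otherwise counts spaces and newlines that the decoder ignores.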

