Decompressing bz2 with Python 3.4 - TypeError: 'str' does not support the buffer interface

Question

There are questions about similar errors, but I could not find a solution for bz2.

The following program fails when decompressing:

import bz2

un =  'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw =  'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
decoded_un = bz2.decompress(un)
decoded_pw = bz2.decompress(pw)

print(decoded_un)
print(decoded_pw)

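I tried using bytes(un, 'UTF-8') but that did not work. I don't think I had this problem in Python 3.3.

Edit: this is for the Python Challenge. Thanks to Martijn, I now have two pieces of code that work: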

import bz2

un_saved =  'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw_saved =  'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
print(bz2.decompress(un_saved.encode('latin1')))
print(bz2.decompress(pw_saved.encode('latin1')))

This one works directly from the web page:

# http://www.pythonchallenge.com/pc/def/integrity.html

import urllib.request
import re
import os.path
import bz2

fname = "008.html"

if not os.path.isfile(fname):
    url = 'http://www.pythonchallenge.com/pc/def/integrity.html'
    response = urllib.request.urlopen(url)
    webpage = response.read().decode("utf-8")
    with open(fname, "w") as fh:
        fh.write(webpage)

with open(fname, "r") as fh:
    webpage = fh.read()
    re_un = '\\nun: \'(.*)\'\\n'
    m = re.search(re_un, webpage)
    un = m.group(1)
    print(un)

    pw_un = '\\npw: \'(.*)\'\\n'
    m = re.search(pw_un, webpage)
    pw = m.group(1)
    print(pw)

    unde = un.encode('latin-1').decode('unicode_escape').encode('latin1')
    pwde = pw.encode('latin-1').decode('unicode_escape').encode('latin1')
    decoded_un = bz2.decompress(unde)
    decoded_pw = bz2.decompress(pwde)

    print(decoded_un)
    print(decoded_pw)

Answer 1

The bz2 library works with bytes objects, not strings:

un = b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
pw = b'BZh91AY&SY\x94$|\x0e\x00\x00\x00\x81\x00\x03$ \x00!\x9ah3M\x13<]\xc9\x14\xe1BBP\x91\xf08'
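To see the difference in isolation, here is a minimal sketch that uses a made-up payload (the names payload, compressed and as_text are illustrative and not part of the challenge). It passes the same compressed data to bz2.decompress() once as str and once as bytes. The exact wording of the TypeError depends on the Python version, so the sketch only records the exception type.

```python
import bz2

# Hypothetical data, compressed locally so the example is self-contained.
payload = b"example payload"
compressed = bz2.compress(payload)        # a bytes object
as_text = compressed.decode("latin-1")    # a str carrying the same values as code points

try:
    bz2.decompress(as_text)               # str input is rejected
    error_type = None
except TypeError as exc:
    error_type = type(exc).__name__
    print("str input failed:", exc)

restored = bz2.decompress(compressed)     # bytes input works
print(restored)
```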

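In other words, using bytes() works fine, as long as you use the correct encoding. UTF-8 is not that encoding. If you have bytes masquerading as string character code points, encode with Latin-1 instead; Latin-1 maps characters one-to-one to bytes: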

un = un.encode('latin1')

or

un = bytes(un, 'latin1')
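As a small illustration of why UTF-8 is the wrong choice here, the sketch below encodes three code points taken from the data above with both codecs (the variable names are illustrative). Latin-1 yields one byte per character, while UTF-8 expands each character above 127 into two bytes, which corrupts a compressed stream.

```python
# Three characters whose code points are all below 256.
text = "\xaf\x82\xc0"

as_latin1 = text.encode("latin-1")   # one byte per character
as_utf8 = text.encode("utf-8")       # two bytes per character above 127

print(as_latin1)   # b'\xaf\x82\xc0'
print(as_utf8)     # b'\xc2\xaf\xc2\x82\xc3\x80'

# bytes(text, encoding) is equivalent to text.encode(encoding).
same = bytes(text, "latin-1") == as_latin1
```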

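Also see the Python Unicode HOWTO:

Latin-1, also known as ISO-8859-1, is a similar encoding. Unicode code points 0–255 are identical to the Latin-1 values, so converting to this encoding simply requires converting code points to byte values; if a code point larger than 255 is encountered, the string can't be encoded into Latin-1.

I'll leave the decoding to you. Have fun with the Python Challenge!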
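The property described in that passage can be checked directly. The following sketch (names are illustrative) round-trips every possible byte value through Latin-1 and shows that a code point above 255 cannot be encoded.

```python
# Every byte value 0..255 decodes to the code point with the same number.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in as_text]

# Encoding back gives the original bytes unchanged.
round_trip = as_text.encode("latin-1")

# A code point above 255 has no Latin-1 representation.
try:
    "\u0100".encode("latin-1")
    too_large_fails = False
except UnicodeEncodeError:
    too_large_fails = True

print(code_points == list(range(256)), round_trip == all_bytes, too_large_fails)
```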

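Note that if you load these characters from a web page, they will not be ready-made bytes! You will have the four characters '\\', 'x', '8' and '2' rather than a single code point with the hex value 82. You need to interpret those sequences as a Python string literal first: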

>>> un = r'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'
>>> un
'BZh91AY&SYA\\xaf\\x82\\r\\x00\\x00\\x01\\x01\\x80\\x02\\xc0\\x02\\x00 \\x00!\\x9ah3M\\x07<]\\xc9\\x14\\xe1BA\\x06\\xbe\\x084'
>>> un.encode('latin-1').decode('unicode_escape')
'BZh91AY&SYA¯\x82\r\x00\x00\x01\x01\x80\x02À\x02\x00 \x00!\x9ah3M\x07<]É\x14áBA\x06¾\x084'
>>> un.encode('latin-1').decode('unicode_escape').encode('latin1')
b'BZh91AY&SYA\xaf\x82\r\x00\x00\x01\x01\x80\x02\xc0\x02\x00 \x00!\x9ah3M\x07<]\xc9\x14\xe1BA\x06\xbe\x084'

Note the double backslashes in the representation of un. Only the final bytes result can be decompressed!
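The three-step conversion can be wrapped in a small helper. This is only a sketch: escaped_text_to_bytes is a hypothetical name, and the escaped text is generated locally from a made-up payload to stand in for what a web page would contain, so the challenge data itself is not decoded here.

```python
import bz2


def escaped_text_to_bytes(text):
    """Turn text containing literal escapes such as \\x82 into raw bytes.

    Assumes every character in text is in the Latin-1 range.
    """
    return text.encode("latin-1").decode("unicode_escape").encode("latin-1")


# Hypothetical stand-in for text scraped from a page: compress a payload,
# then write the result out with backslash escapes as plain characters.
payload = b"example payload"
compressed = bz2.compress(payload)
escaped = compressed.decode("latin-1").encode("unicode_escape").decode("ascii")

recovered = escaped_text_to_bytes(escaped)
print(bz2.decompress(recovered))   # b'example payload'
```

Passing the escaped text straight to bz2.decompress(), or encoding it without the unicode_escape step, would not work, because the escape sequences would still be separate literal characters.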

Answer 1: accepted, 2014-12-24 21:13:44
