Python LZMA Corrupt data error when trying to decompress

import lzma
import requests
response = requests.get('http://content.warframe.com/PublicExport/index_en.txt.lzma')
data = lzma.decompress(response.content)

The error I am getting is:

_lzma.LZMAError: Corrupt input data

I don't think the data is corrupt, because I can download it from a browser and extract it fine with 7-Zip. I have searched for a solution online, but there does not seem to be much information about this problem. I have also tried a different way of decompressing it, with no luck (see "Python LZMA: Compressed data ended before the end-of-stream marker was reached").

Edit: This is the current solution that "works". Pretty much: chop bytes off the end until decompression succeeds, and ignore EOF errors.

import requests
from lzma import FORMAT_AUTO, LZMADecompressor, LZMAError

def fix():
    response = requests.get('http://content.warframe.com/PublicExport/index_en.txt.lzma')
    data = response.content  # response.content is already a bytes object
    length = len(data)
    # Drop one trailing byte at a time until decompression stops raising.
    while True:
        try:
            result = decompress_lzma(data[:length])
            break
        except LZMAError:
            length -= 1

    print(result)

# FROM: https://stackoverflow.com/a/37400585/15041587
def decompress_lzma(data):
    results = []
    while True:
        decomp = LZMADecompressor(FORMAT_AUTO, None, None)
        try:
            res = decomp.decompress(data)
        except LZMAError:
            if results:
                break  # Leftover data is not a valid LZMA/XZ stream; ignore it.
            else:
                raise  # Error on the first iteration; bail out.
        results.append(res)
        data = decomp.unused_data
        if not data:
            break
        if not decomp.eof:
            raise LZMAError("Compressed data ended before the end-of-stream marker was reached")
    return b"".join(results)
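The byte-by-byte trimming above re-runs the decompressor once per removed byte. A possible alternative, sketched here under the assumption that the problem lies at or near the end of the file, is to feed a single LZMADecompressor in chunks and keep whatever was decoded before the first error. The name decompress_tolerant and the chunk_size parameter are hypothetical, not part of the lzma module.

```python
import lzma


def decompress_tolerant(data: bytes, chunk_size: int = 4096) -> bytes:
    """Decode as much of `data` as possible, stopping at the first error.

    Hypothetical helper: output from the chunk that triggers the error is
    lost, so a smaller chunk_size recovers more at the cost of more calls.
    """
    decomp = lzma.LZMADecompressor(lzma.FORMAT_AUTO)
    pieces = []
    for start in range(0, len(data), chunk_size):
        if decomp.eof:
            break  # stream ended cleanly; anything left is trailing data
        try:
            pieces.append(decomp.decompress(data[start:start + chunk_size]))
        except lzma.LZMAError:
            break  # keep what was decoded before the bad chunk
    return b"".join(pieces)
```

Whether this recovers the complete text depends on where the offending bytes sit in the stream, so the result should be checked against what 7-Zip extracts.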

I was also able to open the file with 7-Zip. However, testing the above-linked file with xz reports an error:

$ xz --format=lzma --decompress -t index_en.txt.lzma
xz: index_en.txt.lzma: Compressed data is corrupt

I'm not entirely certain, but I suspect that the file might actually be corrupt or non-standard in some way, i.e. the way 7-Zip manages to decompress this file is not the norm.
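One way to probe that suspicion is to look at the 13-byte header of the legacy .lzma container: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size (all bits set means "unknown"). The sketch below only parses those fields; parse_lzma_alone_header is a hypothetical helper, and it makes no claim about what the Warframe file actually contains.

```python
import lzma
import struct


def parse_lzma_alone_header(blob: bytes) -> dict:
    """Parse the 13-byte header of a legacy .lzma (LZMA_alone) file."""
    if len(blob) < 13:
        raise ValueError("an .lzma header is 13 bytes long")
    props, dict_size, size = struct.unpack("<BIQ", blob[:13])
    if props >= 9 * 5 * 5:
        raise ValueError("invalid properties byte")
    # props encodes (pb * 5 + lp) * 9 + lc
    return {
        "lc": props % 9,
        "lp": (props // 9) % 5,
        "pb": props // 45,
        "dict_size": dict_size,
        # 0xFFFFFFFFFFFFFFFF means the size is not recorded in the header
        "uncompressed_size": None if size == 0xFFFFFFFFFFFFFFFF else size,
    }


# Demo on a stream produced by Python itself (not the file in question).
sample = lzma.compress(b"example", format=lzma.FORMAT_ALONE)
print(parse_lzma_alone_header(sample))
```

Running the same function on the first 13 bytes of the downloaded file and comparing the fields with those of a file written by xz may show whether the header is unusual.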

To further support this, if I create a new LZMA file with xz, e.g.

xz --format=lzma --compress -k <file>

and try to decompress and read that file with lzma.open() in Python, it works without any issues.
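The same round trip can be reproduced without the xz command line, assuming Python's lzma.FORMAT_ALONE is an acceptable stand-in for xz --format=lzma (both write the legacy .lzma container). The function name roundtrip_lzma_alone is hypothetical and exists only for this illustration.

```python
import lzma
import os
import tempfile


def roundtrip_lzma_alone(payload: bytes) -> bytes:
    """Write `payload` as a legacy .lzma file, then read it back with lzma.open()."""
    fd, path = tempfile.mkstemp(suffix=".lzma")
    os.close(fd)
    try:
        with lzma.open(path, "wb", format=lzma.FORMAT_ALONE) as f:
            f.write(payload)
        # Reading uses the default FORMAT_AUTO, which detects .xz and .lzma.
        with lzma.open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

If this returns the original bytes while the downloaded file still fails, that points at the file rather than at the lzma module.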
