I have lots of .json.gz files in a directory, and some of them are .json.gz.part. Supposedly, some files were too large when they were saved, so they were split.
I tried to open them normally using:

with gzip.open(file, 'r') as fin:
    json_bytes = fin.read()

json_str = json_bytes.decode('utf-8')  # bytes -> str (JSON text)
bb = json.loads(json_str)
But when it comes to the .gz.part files I get an error:
uncompress = self._decompressor.decompress(buf, size)
error: Error -3 while decompressing data: invalid code lengths set
I've tried the jiffyclub's solution, but I get the following error:
_read_eof = gzip.GzipFile._read_eof
AttributeError: type object 'GzipFile' has no attribute '_read_eof'
EDIT:
If I read line by line I'm able to read most of the file's content, until I get an error:

with gzip.open(file2, 'r') as fin:
    for line in fin:
        print(line.decode('utf-8'))
After printing most of the content I get:
error: Error -3 while decompressing data: invalid code lengths set
But using this last method I cannot convert the content to JSON.
import gzip
import shutil
import zlib

# open the .gz.part file
with gzip.open('file.gz.part', 'rb') as f_in:
    # open the output file
    with open('file.part', 'wb') as f_out:
        try:
            # decompress and write the decompressed data
            shutil.copyfileobj(f_in, f_out)
        except (EOFError, zlib.error):
            pass  # truncated stream: keep what decompressed so far

# now you can open the decompressed file
with open('file.part', 'r') as f:
    # do something with the file
    contents = f.read()
This code will open the .gz.part file, decompress the data, and write the decompressed data to a new file called file.part. You can then open file.part and read its contents just like you would with any other text file, bearing in mind that a truncated part yields truncated text.