I have lots of .json.gz files in a directory, and some of them are .json.gz.part. Supposedly, some files were too large when they were saved, so they were split.
I tried to open them normally using:

with gzip.open(file, 'r') as fin:
    json_bytes = fin.read()

json_str = json_bytes.decode('utf-8')  # bytes -> str (JSON text)
bb = json.loads(json_str)
But when it comes to the .gz.part files I get an error:
uncompress = self._decompressor.decompress(buf, size)
error: Error -3 while decompressing data: invalid code lengths set
I've tried the jiffyclub's solution, but I get the following error:
_read_eof = gzip.GzipFile._read_eof
AttributeError: type object 'GzipFile' has no attribute '_read_eof'
EDIT:
If I read line by line I'm able to read most of the file's content, until I get an error:

with gzip.open(file2, 'r') as fin:
    for line in fin:
        print(line.decode('utf-8'))
After printing most of the content I get:
error: Error -3 while decompressing data: invalid code lengths set
But using this last method I cannot convert the content to JSON.
import gzip
import shutil
import zlib

# open the .gz.part file
with gzip.open('file.gz.part', 'rb') as f_in:
    # open the output file
    with open('file.part', 'wb') as f_out:
        try:
            # decompress and write the decompressed data
            shutil.copyfileobj(f_in, f_out)
        except (EOFError, zlib.error):
            pass  # truncated stream: keep what decompressed so far

# now you can open the decompressed file
with open('file.part', 'r') as f:
    # do something with the file
    contents = f.read()
This code will open the .gz.part file, decompress the data, and write the decompressed data to a new file called file.part. You can then open file.part and read its contents just like you would with any other text file, bearing in mind that a truncated part yields truncated text.