
How to read a JSON string from a gzip file

I am trying to read a JSON object from a .gz file.

Here is the code:

import gzip
import json

with gzip.open("C:/Users/shaya/Downloads/sample.gz", 'rb') as fin:
    json_bytes = fin.read()

json_str = json_bytes.decode('utf-8')
data = json.loads(json_str)
print(data)

I am getting this error:

JSONDecodeError: Extra data: line 1 column 2 (char 1)

The decompressed string cannot be parsed into a JSON object.

EDIT: As @CharlesDuffy suggested, your file appears to be a gzipped tar archive with a JSON file inside. See the Second Version for reading gzipped tar archives; the First Version handles plain gzip only.
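One way to tell the two cases apart is to look at the decompressed bytes: tar archives written in the ustar, GNU, or pax formats carry the magic string "ustar" at byte offset 257 of the first header block. The following is only a sketch under that assumption; looks_like_tar is a hypothetical helper name, and the in-memory archive exists only to have something to test against.

```python
import gzip
import io
import tarfile


def looks_like_tar(data: bytes) -> bool:
    """Return True if the bytes carry the 'ustar' magic at offset 257."""
    return data[257:262] == b'ustar'


payload = b'{"a": 1}'

# Plain gzip: the JSON bytes are compressed directly.
plain_gz = gzip.compress(payload)

# Gzipped tar: the JSON bytes are wrapped in a tar member first.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    info = tarfile.TarInfo(name='sample.json')
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
tar_gz = buf.getvalue()

is_tar_plain = looks_like_tar(gzip.decompress(plain_gz))
is_tar_archive = looks_like_tar(gzip.decompress(tar_gz))
print(is_tar_plain, is_tar_archive)
```

If the check reports a tar archive, the decompressed data starts with a tar header rather than JSON, which is consistent with json.loads failing near the very first characters.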

First Version

I think your JSON data was compressed or decompressed incorrectly, because the decompressed data contains non-JSON bytes before the JSON itself.

Either strip the leading non-JSON bytes from the decompressed data, or re-create the file as in the code below. In your case, to strip the leading bytes, do json_str = json_str[json_str.find('{'):] before json.loads(...). This assumes the top-level JSON value is an object and that the leading bytes contain no '{' themselves.
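If there may also be junk after the JSON (a tar archive pads members with NUL bytes, for example), slicing from the first '{' is not enough, because json.loads rejects trailing data. A possible sketch uses json.JSONDecoder.raw_decode, which parses one value starting at a given index and ignores what follows. The helper name first_json_object and the noisy sample string are made up for illustration.

```python
import json


def first_json_object(text: str):
    """Parse the JSON object that starts at the first '{' in text."""
    start = text.find('{')
    if start == -1:
        raise ValueError("no '{' found in input")
    # raw_decode returns (value, index just past the value) and does not
    # complain about extra data after the value.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


# Hypothetical sample: junk before and after the JSON object.
noisy = 'header-bytes\x00\x00{"a": [1, 2, 3], "b": false}\x00\x00trailing'
parsed = first_json_object(noisy)
print(parsed)
```

This is still a workaround: it breaks if the leading bytes happen to contain a '{', so reading the archive properly is the more robust route.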

Below is complete working code that goes through each step: JSON encoding, gzip compression, writing to a file, reading from the file, gzip decompression, and JSON decoding:


import json, gzip

# Encode/Write

pydata = {
    'a': [1,2,3],
    'b': False,
}

jdata = json.dumps(pydata, indent = 4)

serial = jdata.encode('utf-8')

with open('data.json.gz', 'wb') as f:
    f.write(gzip.compress(serial))

# Read/Decode
   
serial, pydata, jdata = None, None, None
    
with open('data.json.gz', 'rb') as f:
    serial = gzip.decompress(f.read())
    
jdata = serial.decode('utf-8')

pydata = json.loads(jdata)

print(pydata)

Output:

{'a': [1, 2, 3], 'b': False}
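The same round trip can probably be written more compactly by opening the gzip file in text mode, so that json.dump and json.load handle the encoding and decoding steps. This is a sketch of an alternative, not a replacement for the step-by-step version above; the temporary directory is used only to keep the example self-contained.

```python
import gzip
import json
import os
import tempfile

pydata = {'a': [1, 2, 3], 'b': False}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.json.gz')

    # 'wt' opens the gzip stream in text mode; json.dump writes str to it.
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(pydata, f, indent=4)

    # 'rt' decodes the decompressed bytes as UTF-8 text for json.load.
    with gzip.open(path, 'rt', encoding='utf-8') as f:
        restored = json.load(f)

print(restored)
```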

Second Version

The code below reads JSON from a gzipped tar file. It reads the first JSON file in the archive; if the archive contains several JSON files, replace fname = ... with the name of the one you want.

import json, gzip, tarfile, io

with open('data.json.tar.gz', 'rb') as f:
    tserial = gzip.decompress(f.read())
    
with tarfile.open(fileobj = io.BytesIO(tserial), mode = 'r') as f:
    fname = [e for e in f.getnames() if e.lower().endswith('.json')][0]
    serial = f.extractfile(fname).read()
    
jdata = serial.decode('utf-8')

pydata = json.loads(jdata)

print(pydata)
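As a variation, tarfile can decompress gzip itself when opened with mode 'r:gz', which avoids holding the whole decompressed archive in memory first. The sketch below assumes that approach; read_first_json_from_targz is a hypothetical helper name, and the sample archive is created in a temporary directory only so the example runs on its own.

```python
import io
import json
import os
import tarfile
import tempfile


def read_first_json_from_targz(path):
    """Return the parsed content of the first .json member in a .tar.gz."""
    with tarfile.open(path, mode='r:gz') as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith('.json'):
                with tar.extractfile(member) as fh:
                    return json.loads(fh.read().decode('utf-8'))
    raise FileNotFoundError('no .json member found in ' + path)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.json.tar.gz')
    payload = json.dumps({'a': [1, 2, 3], 'b': False}).encode('utf-8')

    # Create a small gzipped tar holding a single JSON member.
    with tarfile.open(path, mode='w:gz') as tar:
        info = tarfile.TarInfo(name='data.json')
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    result = read_first_json_from_targz(path)

print(result)
```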

