I am trying to read a JSON object from a .gz file.
Here is the code:
import gzip, json
with gzip.open("C:/Users/shaya/Downloads/sample.gz", 'rb') as fin:
    json_bytes = fin.read()
    json_str = json_bytes.decode('utf-8')
    data = json.loads(json_str)
    print(data)
I am getting this error:
JSONDecodeError: Extra data: line 1 column 2 (char 1)
The JSON string cannot be converted into a JSON object.
EDIT: As @CharlesDuffy suggested, you have a gzipped tar archive with JSON inside. See the Second Version for reading gzipped tars. The First Version is for reading plain gzip only.
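To check which of the two cases applies to a given file, something like the following sketch may help. It relies on the standard tarfile.is_tarfile, which also recognizes compressed tars. The helper name looks_like_tar and the sample file names are made up for illustration.

```python
import gzip
import io
import json
import os
import tarfile
import tempfile


def looks_like_tar(path):
    """Return True if `path` can be opened as a tar archive (compressed or not)."""
    # tarfile.is_tarfile tries the supported compression formats transparently,
    # so a .tar.gz file is reported as a tar archive too.
    return tarfile.is_tarfile(path)


# Build two small sample files in a temporary directory (hypothetical names).
tmpdir = tempfile.mkdtemp()
payload = json.dumps({"a": 1}).encode("utf-8")

# Case 1: JSON compressed directly with gzip.
plain_gz = os.path.join(tmpdir, "plain.json.gz")
with open(plain_gz, "wb") as f:
    f.write(gzip.compress(payload))

# Case 2: JSON stored inside a gzipped tar archive.
tar_gz = os.path.join(tmpdir, "archive.tar.gz")
with tarfile.open(tar_gz, "w:gz") as tar:
    info = tarfile.TarInfo(name="sample.json")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

plain_is_tar = looks_like_tar(plain_gz)
tar_is_tar = looks_like_tar(tar_gz)
print(plain_is_tar, tar_is_tar)
```

If the check reports a tar archive, use the Second Version; otherwise the First Version should apply.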
First Version
I think your JSON data was compressed or decompressed incorrectly, since it contains non-JSON leading bytes after decompression.
Either strip the leading non-JSON bytes from the decompressed data, or re-create the data as in the code below. In your case, to remove the leading bytes, do json_str = json_str[json_str.find('{'):] before json.loads(...).
Below is complete working code that goes step by step through JSON encoding, gzip compression, writing to a file, reading from the file, gzip decompression, and JSON decoding:
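As a rough sketch of the stripping workaround: slicing alone still fails if there are also trailing bytes after the JSON, so this variant swaps in json.JSONDecoder.raw_decode, which parses one value from a given index and ignores whatever follows. The function name decode_first_object and the noisy sample string are invented for the example, and it assumes the leading junk contains no '{' of its own.

```python
import json


def decode_first_object(text):
    """Parse the first JSON object found in `text`, ignoring surrounding junk."""
    start = text.find('{')
    if start == -1:
        raise ValueError("no '{' found in input")
    # raw_decode returns (object, end_index) and does not complain about
    # extra data after the parsed value.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


# Hypothetical input: junk bytes before and after the JSON payload.
noisy = 'garbage\x00\x00{"a": [1, 2, 3], "b": false}\x00\x00'
result = decode_first_object(noisy)
print(result)
```

This is only a workaround. If the file is really a tar archive, reading it as one (Second Version) is the more reliable fix.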
import json, gzip
# Encode/Write
pydata = {
    'a': [1, 2, 3],
    'b': False,
}
jdata = json.dumps(pydata, indent = 4)
serial = jdata.encode('utf-8')
with open('data.json.gz', 'wb') as f:
    f.write(gzip.compress(serial))
# Read/Decode
serial, pydata, jdata = None, None, None
with open('data.json.gz', 'rb') as f:
    serial = gzip.decompress(f.read())
jdata = serial.decode('utf-8')
pydata = json.loads(jdata)
print(pydata)
Output:
{'a': [1, 2, 3], 'b': False}
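A shorter variant of the same round trip, as a sketch: gzip.open accepts text modes ('wt' / 'rt'), so json.dump and json.load can work on the file object directly without manual encode/decode steps. The temporary path here is just for the demonstration.

```python
import gzip
import json
import os
import tempfile

# Temporary location so the example does not touch existing files.
path = os.path.join(tempfile.mkdtemp(), "data.json.gz")

# Write: gzip.open in text mode handles the UTF-8 encoding.
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump({"a": [1, 2, 3], "b": False}, f, indent=4)

# Read: json.load parses straight from the decompressed text stream.
with gzip.open(path, "rt", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded)
```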
Second Version
Below is code for reading JSON inside gzipped tar files. It reads the first JSON file from the tar. If there are several JSON files, you may replace fname =... with the correct file name.
import json, gzip, tarfile, io
with open('data.json.tar.gz', 'rb') as f:
    tserial = gzip.decompress(f.read())
with tarfile.open(fileobj = io.BytesIO(tserial), mode = 'r') as f:
    fname = [e for e in f.getnames() if e.lower().endswith('.json')][0]
    serial = f.extractfile(fname).read()
jdata = serial.decode('utf-8')
pydata = json.loads(jdata)
print(pydata)
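An alternative sketch that lets tarfile handle the gzip layer itself via mode 'r:gz', so the whole archive does not have to be decompressed into memory first. The function name load_first_json_from_targz and the archive contents below are assumptions made up for the example.

```python
import io
import json
import os
import tarfile
import tempfile


def load_first_json_from_targz(path):
    """Return the parsed content of the first *.json member of a .tar.gz file."""
    with tarfile.open(path, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.lower().endswith(".json"):
                with tar.extractfile(member) as fobj:
                    # json.load accepts a binary file object; bytes are
                    # decoded as UTF-8 by the json module.
                    return json.load(fobj)
    raise ValueError("no .json member found in archive")


# Build a small sample archive (hypothetical member names) to try it on.
archive = os.path.join(tempfile.mkdtemp(), "data.json.tar.gz")
members = {
    "readme.txt": b"not json",
    "sample.json": json.dumps({"a": [1, 2, 3], "b": False}).encode("utf-8"),
}
with tarfile.open(archive, "w:gz") as tar:
    for name, content in members.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))

found = load_first_json_from_targz(archive)
print(found)
```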