What is the correct way to load a JSON file stored in a BytesIO object?
The data I'm receiving is bytes, so I need a temporary file-like container. To the best of my knowledge, BytesIO is a file-like object, but json.load() doesn't work on it:
In [1]: import json
...: from io import BytesIO, TextIOWrapper
In [2]: d, b = dict(a=1, b=2), BytesIO()
In [3]: b.write(json.dumps(d).encode())
Out[3]: 16
In [4]: b.seek(0)
Out[4]: 0
In [5]: b.read()
Out[5]: b'{"a": 1, "b": 2}'
In [6]: b.seek(0)
Out[6]: 0
In [7]: json.load(b)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-233ac51d2711> in <module>()
----> 1 json.load(b)
/usr/lib/python3.5/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
266 cls=cls, object_hook=object_hook,
267 parse_float=parse_float, parse_int=parse_int,
--> 268 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
269
270
/usr/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
310 if not isinstance(s, str):
311 raise TypeError('the JSON object must be str, not {!r}'.format(
--> 312 s.__class__.__name__))
313 if s.startswith(u'\ufeff'):
314 raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",
TypeError: the JSON object must be str, not 'bytes'
One method that works:
In [8]: json.loads(b.getvalue().decode())
Out[8]: {'a': 1, 'b': 2}
Another one, presumably more efficient?
In [10]: b.seek(0)
Out[10]: 0
In [11]: json.load(TextIOWrapper(b, encoding='utf-8'))
Out[11]: {'a': 1, 'b': 2}
Do I have more (better) alternatives? If not, which of the above methods should be preferred?
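If you are using Python 3.5, upgrade to 3.6+. Starting with 3.6, json.loads() accepts bytes and bytearray input (encoded as UTF-8, UTF-16 or UTF-32), so json.load() works directly on a BytesIO object.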
Python 3.5:
>>> import sys
>>> sys.version
'3.5.0 (default, Feb 16 2017, 15:47:16) \n[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]'
>>> import json
>>> from io import BytesIO
>>> d, b = dict(a=1, b=2), BytesIO()
>>> b.write(json.dumps(d).encode())
16
>>> b.seek(0)
0
>>> json.load(b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/cmermingas/.pyenv/versions/3.5.0/lib/python3.5/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/Users/cmermingas/.pyenv/versions/3.5.0/lib/python3.5/json/__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
Python 3.6:
>>> import sys
>>> sys.version
'3.6.0 (default, Jul 10 2017, 22:19:26) \n[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]'
>>> import json
>>> from io import BytesIO
>>> d, b = dict(a=1, b=2), BytesIO()
>>> b.write(json.dumps(d).encode())
16
>>> b.seek(0)
0
>>> json.load(b)
{'a': 1, 'b': 2}
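If upgrading is not an option, the difference between the two versions can be smoothed over with a small wrapper. The following is only a sketch: load_json is a hypothetical helper name, and it assumes the payload is UTF-8 unless another encoding is passed in.

```python
import json
from io import BytesIO, StringIO


def load_json(fileobj, encoding="utf-8"):
    """Parse JSON from a binary or text file-like object.

    Hypothetical helper: decodes explicitly so it behaves the same on
    Python 3.5 (where json.loads rejects bytes) and on 3.6+.
    """
    raw = fileobj.read()
    if isinstance(raw, (bytes, bytearray)):
        # Assumption: the data is encoded as `encoding` (UTF-8 by default).
        raw = raw.decode(encoding)
    return json.loads(raw)


from_bytes = load_json(BytesIO(b'{"a": 1, "b": 2}'))
from_text = load_json(StringIO('{"a": 1, "b": 2}'))
```

Both calls return the same dict, whether the source object yields bytes or str.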
I would recommend using TextIOWrapper for two reasons:
Imagine you have a 10MB file that is not valid JSON: fileobj.read().decode() would needlessly load all 10MB into memory, but if you use TextIOWrapper then only a few bytes would be loaded before a JSONDecodeError is thrown. (One caveat: CPython's json.load() is itself implemented as loads(fp.read()), so the wrapper still reads the full contents before parsing starts.)
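One practical detail when wrapping a BytesIO this way: a TextIOWrapper closes its underlying buffer when the wrapper itself is closed or garbage-collected. A possible way around that, sketched below, is to call detach() once parsing is done. The function name load_json_stream is hypothetical, and UTF-8 is an assumed default.

```python
import json
from io import BytesIO, TextIOWrapper


def load_json_stream(binary_file, encoding="utf-8"):
    """Parse JSON from a binary file-like object via TextIOWrapper.

    Hypothetical helper: detaches the wrapper afterwards so the caller's
    buffer is not closed along with the wrapper.
    """
    wrapper = TextIOWrapper(binary_file, encoding=encoding)
    try:
        return json.load(wrapper)
    finally:
        # Separate the wrapper from the buffer; the buffer stays open.
        wrapper.detach()


buf = BytesIO(b'{"a": 1, "b": 2}')
result = load_json_stream(buf)
still_open = not buf.closed
```

After the call, buf can still be rewound with seek(0) and read again.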
Since you are dealing with JSON, which is purely text, you should use io.StringIO instead of io.BytesIO:
>>> import json
>>> from io import StringIO
>>> d, b = dict(a=1, b=2), StringIO()
>>> b.write(json.dumps(d))
16
>>> b.seek(0)
0
>>> b.read()
'{"a": 1, "b": 2}'
>>> b.seek(0)
0
>>> json.load(b)
{'a': 1, 'b': 2}
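When the incoming data is bytes, as in the question, it has to be decoded before it can go into a StringIO. A minimal sketch, assuming the payload is UTF-8 (the received variable is a stand-in for whatever the network or file layer hands over):

```python
import json
from io import StringIO

# Hypothetical payload standing in for the bytes actually received.
received = b'{"a": 1, "b": 2}'

# Decode once, then treat the result as a text file-like object.
text_buffer = StringIO(received.decode("utf-8"))
data = json.load(text_buffer)
```

If the file-like interface is not needed elsewhere, json.loads(received.decode("utf-8")) does the same job without the intermediate buffer.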
Tested with Python 3.5:
import json
import pycurl
from io import BytesIO
test_url = 'http://echo.jsontest.com/key/value/one/two'
# Collect the response body in an in-memory bytes buffer
buffer = BytesIO()
s = pycurl.Curl()
s.setopt(s.URL, test_url)
s.setopt(s.HTTPHEADER, ['Host:' + 'localhost'])
s.setopt(s.WRITEDATA, buffer)
s.perform()
s.close()
# The body arrives as bytes; decode it to str before parsing
response = buffer.getvalue().decode('utf-8')
# json.loads in Python 3.5, not json.load
rj = json.loads(response)
srj = json.dumps(rj, indent=4, sort_keys=True)
print(srj)