What is the correct way to load a JSON file stored in a BytesIO object?

The data I'm receiving is bytes, so I need a temporary file-like container. As far as I know, BytesIO is a file-like object, but json.load() doesn't work on it:

In [1]: import json
   ...: from io import BytesIO, TextIOWrapper

In [2]: d, b = dict(a=1, b=2), BytesIO()

In [3]: b.write(json.dumps(d).encode())
Out[3]: 16

In [4]: b.seek(0)
Out[4]: 0

In [5]: b.read()
Out[5]: b'{"a": 1, "b": 2}'

In [6]: b.seek(0)
Out[6]: 0

In [7]: json.load(b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-233ac51d2711> in <module>()
----> 1 json.load(b)

/usr/lib/python3.5/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    266         cls=cls, object_hook=object_hook,
    267         parse_float=parse_float, parse_int=parse_int,
--> 268         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    269 
    270 

/usr/lib/python3.5/json/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    310     if not isinstance(s, str):
    311         raise TypeError('the JSON object must be str, not {!r}'.format(
--> 312                             s.__class__.__name__))
    313     if s.startswith(u'\ufeff'):
    314         raise JSONDecodeError("Unexpected UTF-8 BOM (decode using utf-8-sig)",

TypeError: the JSON object must be str, not 'bytes'

One method that works:

In [8]: json.loads(b.getvalue().decode())
Out[8]: {'a': 1, 'b': 2}

Another one, presumably more efficient?

In [10]: b.seek(0)
Out[10]: 0

In [11]: json.load(TextIOWrapper(b, encoding='utf-8'))
Out[11]: {'a': 1, 'b': 2}

Do I have more (better) alternatives? If not, which of the above methods should be preferred?

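If you are using Python 3.5, upgrade to 3.6+. Starting with Python 3.6, json.loads() (and therefore json.load()) also accepts bytes and bytearray input encoded as UTF-8, UTF-16, or UTF-32, so json.load(b) works directly on a BytesIO object.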

Python 3.5:

>>> import sys
>>> sys.version
'3.5.0 (default, Feb 16 2017, 15:47:16) \n[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)]'
>>> import json
>>> from io import BytesIO
>>> d, b = dict(a=1, b=2), BytesIO()
>>> b.write(json.dumps(d).encode())
16
>>> b.seek(0)
0
>>> json.load(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/cmermingas/.pyenv/versions/3.5.0/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/Users/cmermingas/.pyenv/versions/3.5.0/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

Python 3.6:

>>> import sys
>>> sys.version
'3.6.0 (default, Jul 10 2017, 22:19:26) \n[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)]'
>>> import json
>>> from io import BytesIO
>>> d, b = dict(a=1, b=2), BytesIO()
>>> b.write(json.dumps(d).encode())
16
>>> b.seek(0)
0
>>> json.load(b)
{'a': 1, 'b': 2}
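
If upgrading is not an option, or the same code has to run on both 3.5 and 3.6+, one possibility is a small wrapper that always decodes the byte stream before parsing. The following is only a sketch: `load_json_bytes` is a hypothetical helper name, and UTF-8 is assumed as the default encoding.

```python
import json
from io import BytesIO, TextIOWrapper


def load_json_bytes(fileobj, encoding="utf-8"):
    """Parse JSON from a binary file-like object (hypothetical helper)."""
    # TextIOWrapper decodes the bytes to str, which json.load() accepts
    # on Python 3.5 as well as on 3.6+.
    wrapper = TextIOWrapper(fileobj, encoding=encoding)
    try:
        return json.load(wrapper)
    finally:
        # detach() separates the wrapper from the buffer, so the wrapper
        # does not close the caller's BytesIO when it is garbage-collected.
        wrapper.detach()


buf = BytesIO(json.dumps({"a": 1, "b": 2}).encode("utf-8"))
result = load_json_bytes(buf)
print(result)
```

Detaching the wrapper is a design choice for the case where the caller still wants to use the buffer afterwards; if the buffer is throwaway, the `try`/`finally` can be dropped.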

I would recommend using TextIOWrapper for two reasons:

  1. It gives you more control: not only can you specify an encoding, but also how newlines should be handled (which would be relevant if you were parsing CSV data, for example) and a number of other things.
  2. It allows you to process the data in a streaming manner. Imagine you have a 10MB file that's not valid JSON: fileobj.read().decode() would needlessly load all 10MB into memory and then create a second, decoded copy. A TextIOWrapper decodes in chunks, so a consumer that reads incrementally can stop after a few bytes. Note that the standard library's json.load() calls fp.read() on its argument, so with the stdlib parser the whole file is still read before a JSONDecodeError is raised; the benefit applies to consumers that read in chunks or line by line.
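
Both points can be checked with a short experiment. The sketch below uses made-up payloads: the first part shows the encoding control (a UTF-8 byte order mark is rejected with `encoding="utf-8"` but handled with `encoding="utf-8-sig"`), and the second part checks how much of the underlying buffer the standard library's `json.load()` consumes on invalid input.

```python
import json
from io import BytesIO, TextIOWrapper

# 1. Encoding control: a UTF-8 payload preceded by a byte order mark (BOM).
payload = b'\xef\xbb\xbf{"a": 1, "b": 2}'

try:
    # With plain utf-8 the BOM survives as '\ufeff' and json rejects it.
    json.load(TextIOWrapper(BytesIO(payload), encoding="utf-8"))
    bom_rejected = False
except json.JSONDecodeError:
    bom_rejected = True

# utf-8-sig strips the BOM while decoding.
parsed = json.load(TextIOWrapper(BytesIO(payload), encoding="utf-8-sig"))

# 2. Streaming: invalid JSON followed by a lot of padding.
invalid = b"x" + b" " * (10 ** 6)
raw = BytesIO(invalid)
wrapper = TextIOWrapper(raw, encoding="utf-8")
try:
    json.load(wrapper)
except json.JSONDecodeError:
    pass
# json.load() calls wrapper.read(), which drains the whole buffer.
consumed = raw.tell()

print(bom_rejected, parsed, consumed == len(invalid))
```

With the stdlib parser the position reported by `raw.tell()` ends up at the end of the buffer, so the memory saving only materialises with an incremental consumer.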

Since you are dealing with JSON, which is purely text, you should use io.StringIO instead of io.BytesIO:

>>> import json
>>> from io import StringIO
>>> d, b = dict(a=1, b=2), StringIO()
>>> b.write(json.dumps(d))
16
>>> b.seek(0)
0
>>> b.read()
'{"a": 1, "b": 2}'
>>> b.seek(0)
0
>>> json.load(b)
{'a': 1, 'b': 2}
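
When the incoming data is bytes, as in the question, it has to be decoded before it goes into a StringIO. A minimal sketch, assuming the payload is UTF-8 (`received` is a stand-in for whatever bytes arrive from the network):

```python
import json
from io import StringIO

# Stand-in for bytes received from a socket or an HTTP response.
received = b'{"a": 1, "b": 2}'

# Decode once, then hand json.load() a text buffer.
text_buffer = StringIO(received.decode("utf-8"))
data = json.load(text_buffer)
print(data)
```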

Tested with Python 3.5:

import json
import pycurl
from io import BytesIO

test_url = 'http://echo.jsontest.com/key/value/one/two'

s = pycurl.Curl()
buffer = BytesIO()

s.setopt(s.URL, test_url)
s.setopt(s.HTTPHEADER, ['Host:' + 'localhost'])
# pycurl writes the raw response body (bytes) into the buffer
s.setopt(s.WRITEDATA, buffer)
s.perform()
s.close()

# Decode the collected bytes to str before parsing
response = buffer.getvalue()
response = response.decode('utf-8')
# json.loads in python 3.5, not json.load
rj = json.loads(response)
srj = json.dumps(rj, indent=4, sort_keys=True)
print(srj)
