
How to read json to pandas dataframe from a zipped file?

I have a very large zipped file (1.5 GB) that contains 500 subfolders. Each subfolder holds 5,000 JSON files.

I want to read the JSON files into a pandas DataFrame. My code and the error it raises are shown here. Could you suggest how to fix it? Thanks.

import json
import zipfile

with zipfile.ZipFile('20APIJSON.zip', 'r') as z:
    for filename in z.namelist():
        with z.open(filename) as f:
            data = f.read()
            json_file = json.loads(data)

Error:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-6-e235c823f732> in <module>()
      9       with z.open(filename) as f:
     10         data = f.read()
---> 11         json_file = json.loads(data)
     12         l.append([json_file['FullStudy']['Study']['ProtocolSection']['IdentificationModule']['NCTId'],
     13                   json_file['FullStudy']['Study']['ProtocolSection']['StatusModule']['OverallStatus'],

~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

~\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

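In the for loop, filename is not always a JSON file: z.namelist() also returns the subfolder entries in the zip archive (names ending in "/"). Reading such an entry yields empty data, so json.loads fails with "Expecting value: line 1 column 1 (char 0)".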
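One way to avoid this is to skip directory entries before parsing. The following is a minimal sketch, not a definitive fix: the function name extract_rows is hypothetical, the two fields are the ones visible in the traceback (the original list was longer but is cut off), and it assumes every .json member has the FullStudy layout shown there.

```python
import json
import zipfile


def extract_rows(zip_source):
    """Collect one dict per JSON file in the archive, skipping folder entries.

    zip_source may be a path such as '20APIJSON.zip' or a file-like object.
    """
    rows = []
    with zipfile.ZipFile(zip_source, "r") as z:
        for info in z.infolist():
            # Directory entries (names ending in "/") have no content to parse.
            if info.is_dir() or not info.filename.lower().endswith(".json"):
                continue
            with z.open(info) as f:
                study = json.load(f)
            # Assumed layout, taken from the keys visible in the traceback.
            protocol = study["FullStudy"]["Study"]["ProtocolSection"]
            rows.append({
                "NCTId": protocol["IdentificationModule"]["NCTId"],
                "OverallStatus": protocol["StatusModule"]["OverallStatus"],
            })
    return rows
```

With pandas imported as pd, pd.DataFrame(extract_rows('20APIJSON.zip')) should then give one row per JSON file. Building the list first and creating the DataFrame once is usually much cheaper than appending to a DataFrame inside the loop, which matters with millions of files.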
