
How to read JSON into a pandas DataFrame from a zipped file?

I have a very large zipped file (1.5 GB). It contains 500 subfolders, and each subfolder holds 5,000 JSON files.

I would like to read the JSON files into a pandas DataFrame, but the code below fails with the error shown. Could you please suggest how to fix it? Thanks.

import json
import zipfile

with zipfile.ZipFile('20APIJSON.zip', 'r') as z:
    for filename in z.namelist():
        with z.open(filename) as f:
            data = f.read()
            json_file = json.loads(data)

Error:

JSONDecodeError                           Traceback (most recent call last)
<ipython-input-6-e235c823f732> in <module>()
      9       with z.open(filename) as f:
     10         data = f.read()
---> 11         json_file = json.loads(data)
     12         l.append([json_file['FullStudy']['Study']['ProtocolSection']['IdentificationModule']['NCTId'],
     13                   json_file['FullStudy']['Study']['ProtocolSection']['StatusModule']['OverallStatus'],

~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352             parse_int is None and parse_float is None and
    353             parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    337 
    338         """
--> 339         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    340         end = _w(s, end).end()
    341         if end != len(s):

~\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    355             obj, end = self.scan_once(s, idx)
    356         except StopIteration as err:
--> 357             raise JSONDecodeError("Expecting value", s, err.value) from None
    358         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

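z.namelist() returns every entry in the archive, including the subfolder entries themselves, so in your for loop the value of filename can be the path to a subfolder inside the zipped file rather than a JSON file. Reading a subfolder entry yields empty data, and json.loads on empty input raises "Expecting value: line 1 column 1 (char 0)". Skip the entries that are not JSON files before parsing.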
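One way to apply this is sketched below. It is a minimal sketch rather than a definitive fix: the helper names (iter_json_members, extract_rows, to_dataframe) are made up for illustration, the two field paths are taken from the traceback in the question, and it assumes every .json file in the archive has that same structure.

```python
import json
import zipfile


def iter_json_members(zip_path):
    """Yield (member_name, parsed_json) for each .json file in the archive."""
    with zipfile.ZipFile(zip_path, "r") as z:
        for info in z.infolist():
            # Skip subfolder entries and anything that is not a .json file.
            if info.is_dir() or not info.filename.lower().endswith(".json"):
                continue
            with z.open(info) as f:
                yield info.filename, json.load(f)


def extract_rows(zip_path):
    """Collect one dict per study; the keys follow the paths in the question's traceback."""
    rows = []
    for _name, doc in iter_json_members(zip_path):
        protocol = doc["FullStudy"]["Study"]["ProtocolSection"]
        rows.append({
            "NCTId": protocol["IdentificationModule"]["NCTId"],
            "OverallStatus": protocol["StatusModule"]["OverallStatus"],
        })
    return rows


def to_dataframe(rows):
    """Build the DataFrame once, after all rows have been collected."""
    import pandas as pd  # imported here so the helpers above need only the standard library
    return pd.DataFrame(rows)
```

With the archive from the question, to_dataframe(extract_rows('20APIJSON.zip')) would give one row per JSON file. Collecting plain dicts first and constructing the DataFrame once at the end avoids repeatedly growing a DataFrame inside the loop, which matters with this many files.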
