Reading several nested .json files from a specific folder into Excel using python/pandas

Question

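I want to read several nested JSON files from a folder into one Excel file. Most of the .json files differ from each other (each has its own nesting levels), so some columns in the Excel file will necessarily contain NaN. I have no problem reading a single file with the code below, but reading 10,000 files one by one takes a while.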

import json
import pandas as pd

# This is where I need help, since I need to read 10,000 JSON files.
with open('file1.json', 'r') as f:
    data = json.load(f)

multiple_level_data = pd.json_normalize(data, record_path=['data'], errors='ignore', meta=['total-count'], meta_prefix='config_params_', record_prefix='dbscan_')
multiple_level_data.to_excel('file1converted.xlsx', index=False)

How can I modify my Python code so that it reads all the JSON files in the folder, not just file1.json?

Answer 1

You can try os.listdir(), which lists the current working directory when called without an argument:

import os
import json
import pandas as pd

# Loop over every .json file in the current working directory
for js in [x for x in os.listdir() if x.endswith('.json')]:
    with open(js, 'r') as f:
        data = json.load(f)
    multiple_level_data = pd.json_normalize(data, record_path=['data'], errors='ignore', meta=['total-count'], meta_prefix='config_params_', record_prefix='dbscan_')
    # Drop the .json extension so file1.json becomes file1converted.xlsx
    multiple_level_data.to_excel(os.path.splitext(js)[0] + 'converted.xlsx', index=False)
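Since the question mentions a specific folder rather than the current working directory, the same loop can be sketched with pathlib. This is only a minimal sketch: the function names `normalize_file` and `convert_folder` are made up for illustration, and it assumes every file has the same top-level 'data' and 'total-count' keys as the question's example.

```python
import json
from pathlib import Path

import pandas as pd


def normalize_file(path):
    """Flatten one JSON file with the same json_normalize call as the question."""
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    return pd.json_normalize(
        data,
        record_path=["data"],
        meta=["total-count"],
        meta_prefix="config_params_",
        record_prefix="dbscan_",
        errors="ignore",
    )


def convert_folder(folder):
    """Write one .xlsx next to each .json file in `folder`; return the output paths."""
    written = []
    for json_path in sorted(Path(folder).glob("*.json")):
        # file1.json -> file1converted.xlsx, in the same folder
        out_path = json_path.with_name(json_path.stem + "converted.xlsx")
        normalize_file(json_path).to_excel(out_path, index=False)
        written.append(out_path)
    return written
```

Calling `convert_folder("path/to/folder")` (a placeholder path) would then process that folder regardless of where the script is started from. Writing .xlsx files requires an Excel engine such as openpyxl to be installed.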

Answer 2

Wasif's solution above works well, but I added the following to combine the results into a single Excel file.

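import os
import pandas as pd

# `files` was not defined in the original snippet; assumed here to be the
# converted workbooks in the current working directory
files = [x for x in os.listdir() if x.endswith('converted.xlsx')]
frames = [pd.read_excel(file, engine='openpyxl') for file in files]
# DataFrame.append was removed in pandas 2.0; pd.concat replaces it
df = pd.concat(frames, ignore_index=True)
df.to_excel("AllJsonFilesInOneExcel.xlsx")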
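If the intermediate per-file workbooks are not needed, one possible shortcut is to stack the flattened frames in memory and write Excel only once. The sketch below is an assumption-laden illustration, not the approach from either answer: `combine_json_folder` and the extra `source_file` column are hypothetical names, and the json_normalize arguments are copied from the question. Columns that exist in only some files are filled with NaN by pd.concat, which matches what the question expects.

```python
import json
from pathlib import Path

import pandas as pd


def combine_json_folder(folder):
    """Flatten every .json file in `folder` and stack the rows into one DataFrame."""
    frames = []
    for json_path in sorted(Path(folder).glob("*.json")):
        with open(json_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        frame = pd.json_normalize(
            data,
            record_path=["data"],
            meta=["total-count"],
            meta_prefix="config_params_",
            record_prefix="dbscan_",
            errors="ignore",
        )
        # Hypothetical extra column so each row can be traced back to its file
        frame["source_file"] = json_path.name
        frames.append(frame)
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead
        return pd.DataFrame()
    # Columns missing from a given file become NaN in that file's rows
    return pd.concat(frames, ignore_index=True, sort=False)
```

A call such as `combine_json_folder("path/to/folder").to_excel("AllJsonFilesInOneExcel.xlsx", index=False)` would then produce the single workbook. Note that an .xlsx sheet holds at most 1,048,576 rows, so with 10,000 files it may be worth checking the combined row count first.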

Thank you.

Solution 1
0 2020-11-16 08:41:48

Solution 2
0 2020-11-16 09:37:56
