Read nested columns from a JSON file into a pandas df (Python)
I need to read a JSON file into a pandas df. The JSON data looks like this:
{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}
{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}
I need to get rid of the first key, which wraps all the columns. I tried:
import json
import pandas as pd
from pandas.io.json import json_normalize
data_pd = pd.read_json('db/my_file.json', lines=True)
new_data = json_normalize(data_pd)
The error message is: AttributeError: 'str' object has no attribute 'values'
The desired output is:
id ActionName Time
138307057680 Complete 2020-04-23-12:40:04
138313115245 Midpoint 2020-06-16-20:41:16
You can try:
new_data = pd.DataFrame(data_pd['f0_'].values.tolist())
Output:
id ActionName Time
0 138307057680 Complete 2020-04-23-12:40:04
1 138313115245 Midpoint 2020-06-16-20:41:16
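The AttributeError in the question most likely comes from passing a DataFrame to json_normalize, which expects a dict or a list of dicts. As a minimal sketch of that alternative, assuming a pandas version that exposes pd.json_normalize at the top level (1.0 or later), the lines can be parsed with json.loads and flattened directly. The raw_lines list below is a stand-in for the file contents, and the names records and flat are only illustrative:

```python
import json

import pandas as pd

# Stand-in for the contents of db/my_file.json (one JSON object per line).
raw_lines = [
    '{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}',
    '{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}',
]

# json_normalize wants parsed dicts, not a DataFrame.
records = [json.loads(line) for line in raw_lines]

# Flattening yields dotted column names such as "f0_.id".
flat = pd.json_normalize(records)

# Drop the "f0_." prefix so only the inner keys remain as column names.
flat.columns = [name.split('.', 1)[-1] for name in flat.columns]

print(flat)
```

This keeps the values as strings, just like the approach above; convert the id or Time columns separately if other types are needed.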
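You can also clean the data before passing it to pandas, so that the DataFrame is built from the inner dictionaries only, as in this example: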
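import json
import pandas as pd

def gen_data(file_path):
    # Yield the inner dict of every line, dropping the outer "f0_" key.
    with open(file_path) as f:
        for line in f:
            # Skip blank lines (a raw line still holds its trailing newline).
            if line.strip():
                record = json.loads(line)
                for value in record.values():
                    yield value

df = pd.DataFrame(gen_data('db/my_file.json'))
print(df)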
Output:
id ActionName Time
0 138307057680 Complete 2020-04-23-12:40:04
1 138313115245 Midpoint 2020-06-16-20:41:16
Bonus:
Some speed comparisons (I'm using an i7):
If you clean the data first and then build the DataFrame:
>> %timeit pd.DataFrame(gen_data('db/my_file.json'))
519 µs ± 1.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
If you build the DataFrame first and then clean it:
import pandas as pd

def gen_df_method2(file_path):
    data_pd = pd.read_json(file_path, lines=True)
    return pd.DataFrame(data_pd['f0_'].values.tolist())
>> %timeit gen_df_method2('db/my_file.json')
2.66 ms ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
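%timeit is an IPython magic, so the comparison above cannot be run as a plain script. A rough sketch of the same measurement with the standard timeit module follows. It assumes the two sample lines from the question written to a temporary file; the function names clean_first and clean_after are made up for this illustration, and the absolute numbers will differ by machine and pandas version:

```python
import json
import os
import tempfile
import timeit

import pandas as pd

# The two sample lines from the question, used as stand-in file contents.
SAMPLE = (
    '{"f0_":{"id":"138307057680","ActionName":"Complete","Time":"2020-04-23-12:40:04"}}\n'
    '{"f0_":{"id":"138313115245","ActionName":"Midpoint","Time":"2020-06-16-20:41:16"}}\n'
)

def gen_data(file_path):
    # Yield the inner dict of each non-blank line.
    with open(file_path) as f:
        for line in f:
            if line.strip():
                yield from json.loads(line).values()

def clean_first(file_path):
    # Strip the outer key while reading, then build the DataFrame.
    return pd.DataFrame(gen_data(file_path))

def clean_after(file_path):
    # Let pandas read the file, then unpack the "f0_" column.
    data_pd = pd.read_json(file_path, lines=True)
    return pd.DataFrame(data_pd['f0_'].tolist())

# Write the sample to a temporary file so the script is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

try:
    # Both approaches should produce the same frame.
    same = clean_first(path).equals(clean_after(path))
    for func in (clean_first, clean_after):
        # Best of 3 repeats, 100 calls each, reported per call.
        best = min(timeit.repeat(lambda: func(path), number=100, repeat=3)) / 100
        print(f"{func.__name__}: {best * 1e6:.0f} us per call")
finally:
    os.remove(path)
```

With only two rows the timings mostly reflect fixed overhead, so a larger file would be needed before drawing conclusions about how the two approaches scale.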