How to create a pandas dataframe from a json file with multiple arrays
I'm trying to create a pandas dataframe from a json file that I have of my Apple Health Data.
My json file looks like this:
{
"data": {
"workouts": [],
"metrics": [
{
"name": "active_energy",
"units": "kcal",
"data": [
{
"qty": 213.881,
"date": "2022-04-12 00:00:00 -0600"
}
]
},
{
"name": "apple_exercise_time",
"units": "min",
"data": [
{
"date": "2022-04-12 00:00:00 -0600",
"qty": 6
}
]
},
{
"name": "sleep_analysis",
"units": "min",
"data": []
}
]
}
}
In this data, there is an empty array called workouts and another called metrics. I want to take the metrics array from this file and turn it into a pandas dataframe like this:
date | name | qty | units
---|---|---|---
2022-04-12 | active_energy | 213.881 | kcal
2022-04-12 | apple_exercise_time | 6 | min
Here's one way using a DataFrame constructor, explode and join:
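import pandas as pd
# my_data is the parsed JSON (a dict), e.g. the result of json.load()
# explode() gives each entry of a metric's "data" list its own row; empty lists become NaN
tmp = pd.DataFrame(my_data['data']['metrics']).explode('data')
s = tmp['data'].dropna()
out = tmp.drop(columns='data').join(pd.DataFrame(s.tolist(), index=s.index))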
Output:
name units qty date
0 active_energy kcal 213.881 2022-04-12 00:00:00 -0600
1 apple_exercise_time min 6.000 2022-04-12 00:00:00 -0600
2 sleep_analysis min NaN NaN
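The output above keeps the empty sleep_analysis metric as a NaN row and shows the full timestamp, while the table in the question has neither. As an alternative to the answer's approach (not the same method), a minimal sketch with pd.json_normalize could look like the following. The my_data dict is a trimmed copy of the question's JSON, the name flat is made up for illustration, and taking the first 10 characters of the date string assumes every timestamp starts with YYYY-MM-DD.

```python
import pandas as pd

# Trimmed copy of the JSON from the question. In practice you would load it
# with json.load() from your own export file.
my_data = {
    "data": {
        "workouts": [],
        "metrics": [
            {"name": "active_energy", "units": "kcal",
             "data": [{"qty": 213.881, "date": "2022-04-12 00:00:00 -0600"}]},
            {"name": "apple_exercise_time", "units": "min",
             "data": [{"date": "2022-04-12 00:00:00 -0600", "qty": 6}]},
            {"name": "sleep_analysis", "units": "min", "data": []},
        ],
    }
}

# One row per entry in each metric's "data" list; "name" and "units" are
# repeated alongside. Metrics with an empty list contribute no rows.
flat = pd.json_normalize(
    my_data["data"]["metrics"],
    record_path="data",
    meta=["name", "units"],
)

# Assumption: timestamps start with YYYY-MM-DD, so the first 10 characters
# are the calendar date. Then put the columns in the order from the question.
flat["date"] = flat["date"].str[:10]
flat = flat[["date", "name", "qty", "units"]]
print(flat)
```

Because record_path only yields rows for entries that exist, sleep_analysis drops out without an explicit dropna. If you want to keep empty metrics visible, the explode and join version above is probably the better fit.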
Someone on SO shared this with me around a month ago. Sharing with you now.
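import pandas as pd
# lines=True tells pandas to parse the input as JSON Lines (one JSON object per line)
df = pd.read_json("https://www.chsli.org/sites/default/files/transparency/111888924_GoodSamaritanHospitalMedicalCenter_standardcharges.json", lines=True)
print(df.head())
# Raw string, so single backslashes are enough in the Windows path
df.to_csv(r'C:\your_path_here\chsli.csv')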
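For context on lines=True: it makes read_json treat each line of the input as its own JSON object. Here is a small sketch of that behavior on an in-memory string. The two records and the names jsonl_text and df_lines are invented for illustration and are not taken from the hospital file above.

```python
import io

import pandas as pd

# Hypothetical JSON Lines input: one complete JSON object per line.
jsonl_text = (
    '{"name": "active_energy", "qty": 213.881}\n'
    '{"name": "apple_exercise_time", "qty": 6}\n'
)

# Each line becomes one row; the keys become the columns.
df_lines = pd.read_json(io.StringIO(jsonl_text), lines=True)
print(df_lines)
```

A single nested document like the one in the question is not in this one-object-per-line shape, so lines=True is unlikely to be the right tool for it on its own.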