How to create a pandas dataframe from a JSON file with multiple arrays

Question

I'm trying to create a pandas dataframe from a JSON file of my Apple Health data.

My JSON file looks like this:

{
  "data": {
    "workouts": [],
    "metrics": [
      {
        "name": "active_energy",
        "units": "kcal",
        "data": [
          {
            "qty": 213.881,
            "date": "2022-04-12 00:00:00 -0600"
          }
        ]
      },
      {
        "name": "apple_exercise_time",
        "units": "min",
        "data": [
          {
            "date": "2022-04-12 00:00:00 -0600",
            "qty": 6
          }
        ]
      },
      {
        "name": "sleep_analysis",
        "units": "min",
        "data": []
      }
    ]
  }
}

In this data, there is an empty array called workouts and another array called metrics. I want to take the metrics array from this file and turn it into a pandas dataframe like this:

date	name	qty	units
2022-04-12	active_energy	213.881	kcal
2022-04-12	apple_exercise_time	6	min

Answer 1

Here's one way, using the DataFrame constructor, explode and join:

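import pandas as pd
# my_data is the parsed JSON as a dict, e.g. the result of json.load
tmp = pd.DataFrame(my_data['data']['metrics']).explode('data')  # one row per measurement
s = tmp['data'].dropna()  # skip metrics whose data list was empty
out = tmp.drop(columns='data').join(pd.DataFrame(s.tolist(), index=s.index))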

Output:

                  name units      qty                       date
0        active_energy  kcal  213.881  2022-04-12 00:00:00 -0600
1  apple_exercise_time   min    6.000  2022-04-12 00:00:00 -0600
2       sleep_analysis   min      NaN                        NaN
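
The output above keeps a NaN row for sleep_analysis and the full timestamp, while the question asks for only the metrics that have data and a plain date. As an alternative sketch (not part of the original answer), pd.json_normalize with record_path and meta flattens the nested data lists directly. The sample dict below is copied from the question; slicing the first 10 characters of the date assumes every timestamp starts with YYYY-MM-DD, as in the sample.

```python
import pandas as pd

# Sample copied from the question; in practice this would be the parsed file.
my_data = {
    "data": {
        "workouts": [],
        "metrics": [
            {"name": "active_energy", "units": "kcal",
             "data": [{"qty": 213.881, "date": "2022-04-12 00:00:00 -0600"}]},
            {"name": "apple_exercise_time", "units": "min",
             "data": [{"date": "2022-04-12 00:00:00 -0600", "qty": 6}]},
            {"name": "sleep_analysis", "units": "min", "data": []},
        ],
    }
}

# One row per entry in each metric's "data" list; "name" and "units" are
# repeated on every row. Metrics with an empty list produce no rows.
out = pd.json_normalize(
    my_data["data"]["metrics"],
    record_path="data",
    meta=["name", "units"],
)

# Assumption: timestamps look like "YYYY-MM-DD HH:MM:SS +ZZZZ", so the first
# 10 characters are the calendar date.
out["date"] = out["date"].str[:10]

# Reorder the columns to match the layout requested in the question.
out = out[["date", "name", "qty", "units"]]
print(out)
```

Because empty data lists contribute no records, no separate dropna step is needed here.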

Answer 2

Someone on SO shared this with me around a month ago. Sharing with you now.

import pandas as pd
df = pd.read_json("https://www.chsli.org/sites/default/files/transparency/111888924_GoodSamaritanHospitalMedicalCenter_standardcharges.json", lines=True)
print(df.head())
df.to_csv(r'C:\your_path_here\chsli.csv')  # raw string, so single backslashes
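
The snippet above reads a remote file with lines=True, whereas the question's export is a single nested object. For a local file of that shape, a rough sketch would be to parse it with json.load and flatten the metrics by hand. The helper name load_metrics and the file name export.json are hypothetical, chosen only for illustration; the demo writes a small temporary file shaped like the question's sample so the block runs on its own.

```python
import json
import os
import tempfile

import pandas as pd


def load_metrics(path):
    """Read the nested export at `path` and return one row per measurement."""
    with open(path, encoding="utf-8") as fh:
        my_data = json.load(fh)
    rows = [
        {"date": point["date"], "name": metric["name"],
         "qty": point["qty"], "units": metric["units"]}
        for metric in my_data["data"]["metrics"]
        for point in metric["data"]  # empty lists contribute no rows
    ]
    return pd.DataFrame(rows, columns=["date", "name", "qty", "units"])


# Demo: a temporary file with the same structure as the question's sample.
sample = {"data": {"workouts": [], "metrics": [
    {"name": "active_energy", "units": "kcal",
     "data": [{"qty": 213.881, "date": "2022-04-12 00:00:00 -0600"}]},
    {"name": "sleep_analysis", "units": "min", "data": []},
]}}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "export.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(sample, fh)
    df = load_metrics(path)
print(df)
```

The resulting frame can then be written out with df.to_csv, as in the answer.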

