
How to create a pandas dataframe from a json file with multiple arrays

I'm trying to create a pandas DataFrame from a JSON file of my Apple Health data.

My JSON file looks like this:

{
  "data": {
    "workouts": [],
    "metrics": [
      {
        "name": "active_energy",
        "units": "kcal",
        "data": [
          {
            "qty": 213.881,
            "date": "2022-04-12 00:00:00 -0600"
          }
        ]
      },
      {
        "name": "apple_exercise_time",
        "units": "min",
        "data": [
          {
            "date": "2022-04-12 00:00:00 -0600",
            "qty": 6
          }
        ]
      },
      {
        "name": "sleep_analysis",
        "units": "min",
        "data": []
      }
    ]
  }
}

In this data, there is an empty array called workouts and another called metrics. I want to take the metrics array from this file and turn it into a pandas DataFrame like this:

date        name                 qty      units
2022-04-12  active_energy        213.881  kcal
2022-04-12  apple_exercise_time  6        min

Here's one way using the DataFrame constructor, explode and join:

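import pandas as pd

# my_data is the parsed JSON dict; explode gives one row per entry in each metric's "data" list
tmp = pd.DataFrame(my_data['data']['metrics']).explode('data')
s = tmp['data'].dropna()  # skip metrics whose "data" list was empty
out = tmp.drop(columns='data').join(pd.DataFrame(s.tolist(), index=s.index))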

Output:

                  name units      qty                       date
0        active_energy  kcal  213.881  2022-04-12 00:00:00 -0600
1  apple_exercise_time   min    6.000  2022-04-12 00:00:00 -0600
2       sleep_analysis   min      NaN                        NaN
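
If the goal is exactly the table from the question (no row for the empty sleep_analysis metric, date without the time part, columns in the order date, name, qty, units), a possible alternative is pd.json_normalize with record_path and meta. This is a different technique from the explode/join approach above, not a restatement of it. The sketch below assumes the sample JSON from the question is held in a string named raw (a name introduced here only for illustration); for a real file, json.load on the opened file would take the place of json.loads.

```python
import json
import pandas as pd

# Inline copy of the sample from the question (assumption: a real
# export would be read from disk with json.load instead).
raw = """
{"data": {"workouts": [], "metrics": [
  {"name": "active_energy", "units": "kcal",
   "data": [{"qty": 213.881, "date": "2022-04-12 00:00:00 -0600"}]},
  {"name": "apple_exercise_time", "units": "min",
   "data": [{"date": "2022-04-12 00:00:00 -0600", "qty": 6}]},
  {"name": "sleep_analysis", "units": "min", "data": []}
]}}
"""
my_data = json.loads(raw)

# One row per entry in each metric's "data" list; "name" and "units"
# are repeated on every row. Metrics with an empty list yield no rows.
df = pd.json_normalize(
    my_data["data"]["metrics"],
    record_path="data",
    meta=["name", "units"],
)

# Keep only the calendar-date part of the timestamp string and put
# the columns in the order shown in the question.
df["date"] = df["date"].str[:10]
df = df[["date", "name", "qty", "units"]]
print(df)
```

Slicing the string keeps the local calendar date as written in the file. If real datetimes are needed, pd.to_datetime can be applied to the column afterwards.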

Someone on SO shared this with me around a month ago. Sharing it with you now.

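import pandas as pd

# lines=True parses the source as one JSON object per line
df = pd.read_json("https://www.chsli.org/sites/default/files/transparency/111888924_GoodSamaritanHospitalMedicalCenter_standardcharges.json", lines=True)
print(df.head())
# Raw string, so single backslashes are enough in the Windows path
df.to_csv(r'C:\your_path_here\chsli.csv')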

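
For context on the lines=True argument used above: it tells pd.read_json to treat every line of the input as a separate JSON object (the JSON Lines layout). A minimal sketch with made-up two-line input (the string jsonl and its values are illustrative only) shows the shape of data it expects. A single pretty-printed document such as the Apple Health file in the question is not in that layout, so it would likely need json.load followed by one of the approaches from the first answer.

```python
import io
import pandas as pd

# Two JSON objects, one per line (illustrative data, not from a real file).
jsonl = (
    '{"name": "active_energy", "qty": 213.881}\n'
    '{"name": "apple_exercise_time", "qty": 6}\n'
)

# Each line becomes one row; the keys become the columns.
df_lines = pd.read_json(io.StringIO(jsonl), lines=True)
print(df_lines)
```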

