
How to flatten nested JSON into a pandas DataFrame

How can I flatten JSON into a pd.DataFrame like this:

class_id | id | schedule_id | schedule_date | lesson_price | status
1        | 3  | 1           | 2017-07-11    | USD 25       | ONGOING
1        | 3  | 2           | 2016-09-24    | USD 15       | OPEN REGISTRATION
1        | 4  | 1           | 2016-12-17    | USD 19       | ONGOING
1        | 4  | 2           | 2015-11-12    | USD 29       | ONGOING
1        | 4  | 3           | 2015-11-10    | USD 14       | ON SCHEDULE
2        | 1  | 1           | 2017-05-21    | USD 50       | CANCELLED
2        | 2  | 1           | 2017-06-04    | USD10        | FINISHED
2        | 2  | 2           | 2018-03-01    | USD12        | CLOSED

from JSON?

I've tried following this reference, but it gives me only 2 rows, grouped by class_id.

How can I show all the schedule data, together with class_id and the id from each lesson object, as in the desired DataFrame?

The difficulty with your data structure comes from each lesson being keyed by its name:

{
  "lesson3": {
    "id": 3,
    "schedule": [
      {
        "schedule_id": "1",
        "schedule_date": "2017-07-11",
        "lesson_price": "USD 25",
        "status": "ONGOING"
      },
      {
        "schedule_id": "2",
        "schedule_date": "2016-09-24",
        "lesson_price": "USD 15",
        "status": "OPEN REGISTRATION"
      }
    ]
  }
}

It would be better to have:

{
  "name": "lesson3",
  "id": 3,
  "schedule": [
    {
      "schedule_id": "1",
      "schedule_date": "2017-07-11",
      "lesson_price": "USD 25",
      "status": "ONGOING"
    },
    {
      "schedule_id": "2",
      "schedule_date": "2016-09-24",
      "lesson_price": "USD 15",
      "status": "OPEN REGISTRATION"
    }
  ]
}

But most of the time we don't control the data we receive, so we have to get rid of the lesson1, lesson2, ... keys and move each lesson object up one level.
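As a rough sketch of that reshaping, assuming each lesson arrives under its own key as in the first snippet, the key can be moved into a `name` field. The helper name `lift_lessons` and the one-entry `sample` are made up for illustration.

```python
def lift_lessons(keyed):
    """Turn {"lesson3": {...}} into [{"name": "lesson3", ...}]."""
    # Copy each lesson object and record its former key under "name".
    return [{"name": name, **lesson} for name, lesson in keyed.items()]


# Illustrative sample shaped like the snippet above (one schedule entry only).
sample = {
    "lesson3": {
        "id": 3,
        "schedule": [
            {"schedule_id": "1", "schedule_date": "2017-07-11",
             "lesson_price": "USD 25", "status": "ONGOING"},
        ],
    }
}

lessons = lift_lessons(sample)
```

The solution below simply discards the lesson name instead of keeping it, since the desired DataFrame has no column for it.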

Solution

import requests
# `url` must be set to the address of the JSON linked in the question.
data = requests.get(url).json()

Extract the distinct lessons:

data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for v in c['data'].values()]

The data now looks like this:

[
  {
    "class_id": "1",
    "lessons": {
      "id": 3,
      "schedule": [
        {
          "schedule_id": "1",
          "schedule_date": "2017-07-11",
          "lesson_price": "USD 25",
          "status": "ONGOING"
        },
        {
          "schedule_id": "2",
          "schedule_date": "2016-09-24",
          "lesson_price": "USD 15",
          "status": "OPEN REGISTRATION"
        }
      ]
    }
  },
  ...
]

Now we can read it into a pandas DataFrame using json_normalize:

import pandas as pd
df = pd.json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])
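To try the steps without the network call, here is a minimal end-to-end sketch. The inline `data` dictionary is a hypothetical stand-in whose shape (a top-level `class` list, each entry holding `class_id` and a `data` mapping of lesson names to lesson objects) is inferred from the comprehension above rather than copied from the real file. It uses `pd.json_normalize`, which is available as a top-level function from pandas 1.0 onward.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded JSON (shape inferred, not the real file).
data = {
    "class": [
        {
            "class_id": "1",
            "data": {
                "lesson3": {
                    "id": 3,
                    "schedule": [
                        {"schedule_id": "1", "schedule_date": "2017-07-11",
                         "lesson_price": "USD 25", "status": "ONGOING"},
                        {"schedule_id": "2", "schedule_date": "2016-09-24",
                         "lesson_price": "USD 15", "status": "OPEN REGISTRATION"},
                    ],
                },
            },
        },
    ]
}

# Drop the lesson-name keys, keeping class_id next to each lesson object.
data_ = [
    {"class_id": c["class_id"], "lessons": lesson}
    for c in data["class"]
    for lesson in c["data"].values()
]

# One row per schedule entry; class_id and the lesson id are carried along as meta.
df = pd.json_normalize(
    data_,
    record_path=["lessons", "schedule"],
    meta=["class_id", ["lessons", "id"]],
)
```

The nested meta path `["lessons", "id"]` is what produces the `lessons.id` column name.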

Output

  schedule_id schedule_date lesson_price             status class_id lessons.id
0           1    2017-07-11       USD 25            ONGOING        1          3
1           2    2016-09-24       USD 15  OPEN REGISTRATION        1          3
2           1    2016-12-17       USD 19            ONGOING        1          4
3           2    2015-11-12       USD 29            ONGOING        1          4
4           3    2015-11-10       USD 14        ON SCHEDULE        1          4
5           1    2017-05-21       USD 50          CANCELLED        2          1
6           1    2017-06-04        USD10           FINISHED        2          2
7           5    2018-03-01        USD12             CLOSED        2          2
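The column names and order differ slightly from the table in the question. One possible way to match it, sketched here on a two-row stand-in for the result above, is to rename `lessons.id` to `id` and select the columns in the desired order. The names `wanted` and `result` are illustrative.

```python
import pandas as pd

# Small stand-in for the json_normalize result shown above.
df = pd.DataFrame({
    "schedule_id": ["1", "2"],
    "schedule_date": ["2017-07-11", "2016-09-24"],
    "lesson_price": ["USD 25", "USD 15"],
    "status": ["ONGOING", "OPEN REGISTRATION"],
    "class_id": ["1", "1"],
    "lessons.id": [3, 3],
})

# Column order taken from the desired table in the question.
wanted = ["class_id", "id", "schedule_id", "schedule_date", "lesson_price", "status"]

# Rename the meta column, then reorder by selecting the columns explicitly.
result = df.rename(columns={"lessons.id": "id"})[wanted]
```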
