How to flatten nested JSON into a pandas DataFrame
How do I flatten JSON into a pd.DataFrame like this:
class_id | id | schedule_id | schedule_date | lesson_price | status
1        | 3  | 1           | 2017-07-11    | USD 25       | ONGOING
1        | 3  | 2           | 2016-09-24    | USD 15       | OPEN REGISTRATION
1        | 4  | 1           | 2016-12-17    | USD 19       | ONGOING
1        | 4  | 2           | 2015-11-12    | USD 29       | ONGOING
1        | 4  | 3           | 2015-11-10    | USD 14       | ON SCHEDULE
2        | 1  | 1           | 2017-05-21    | USD 50       | CANCELLED
2        | 2  | 1           | 2017-06-04    | USD10        | FINISHED
2        | 2  | 2           | 2018-03-01    | USD12        | CLOSED
I've tried the approach from this reference, but it gives me only 2 rows, grouped by class_id.
How can I show all the schedule data, together with the class_id and the id from each lesson object, as in the desired DataFrame above?
The difficulty in your data structure comes from lesson names such as lesson3 being used as dictionary keys:
{
"lesson3": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
}
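To see why this shape collapses to one row per class, here is a minimal sketch that runs pd.json_normalize directly on the class list. The input is a hypothetical, trimmed reconstruction of the response, assuming (as the extraction code later in this answer does) a top-level "class" list whose entries carry "class_id" and a "data" dict keyed by lesson name.

```python
import pandas as pd

# Hypothetical, trimmed sample in the shape this answer assumes:
# data["class"] is a list, and each class's "data" dict is keyed by lesson name.
data = {
    "class": [
        {"class_id": "1", "data": {
            "lesson3": {"id": 3, "schedule": [
                {"schedule_id": "1", "schedule_date": "2017-07-11",
                 "lesson_price": "USD 25", "status": "ONGOING"}]},
            "lesson4": {"id": 4, "schedule": [
                {"schedule_id": "1", "schedule_date": "2016-12-17",
                 "lesson_price": "USD 19", "status": "ONGOING"}]},
        }},
        {"class_id": "2", "data": {
            "lesson1": {"id": 1, "schedule": [
                {"schedule_id": "1", "schedule_date": "2017-05-21",
                 "lesson_price": "USD 50", "status": "CANCELLED"}]},
        }},
    ]
}

# Nested dicts are flattened into dotted column names, so each lesson key
# becomes its own group of columns instead of its own row; the schedule
# lists are left unexpanded inside single cells.
naive = pd.json_normalize(data["class"])
print(len(naive))  # 2, one row per class
print([c for c in naive.columns if c.startswith("data.")])
```

Every new lesson name adds new columns such as data.lesson3.id, which is why the result has one row per class_id rather than one row per schedule entry.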
It would be better to have the lesson name stored as a value instead:
{
"name": "lesson3",
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
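Producing that list form from a dict keyed by lesson name takes only a short comprehension. This is a minimal sketch; lessons_to_records is a hypothetical helper name, not part of pandas or of the original answer.

```python
def lessons_to_records(lessons):
    """Turn {"lesson3": {...}, ...} into [{"name": "lesson3", ...}, ...]."""
    # Dicts keep insertion order (Python 3.7+), so the output follows the input order.
    return [{"name": name, **body} for name, body in lessons.items()]


lessons = {
    "lesson3": {"id": 3, "schedule": [{"schedule_id": "1", "status": "ONGOING"}]},
    "lesson4": {"id": 4, "schedule": []},
}
records = lessons_to_records(lessons)
print(records[0]["name"], records[0]["id"])  # lesson3 3
```

If a lesson object already had its own "name" field, the `**body` unpacking would overwrite the key-derived value, because later keys win in a dict literal.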
But most of the time we don't control the data we receive, so we have to get rid of the lesson1, lesson2 keys and move each lesson object up a level.
import requests
# url is the endpoint from the question (not shown here)
data = requests.get(url).json()
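To follow the remaining steps without calling the endpoint, you can parse a saved copy of the response with the standard library instead. The JSON string below is a hypothetical, minimal sample in the shape this answer assumes (a "class" list whose "data" dict is keyed by lesson name); it is not the real API output.

```python
import json

# Hypothetical saved response; replace it with the real payload (e.g. read from a file).
raw = """
{"class": [
  {"class_id": "1", "data": {
    "lesson3": {"id": 3, "schedule": [
      {"schedule_id": "1", "schedule_date": "2017-07-11",
       "lesson_price": "USD 25", "status": "ONGOING"}]}}}
]}
"""

# json.loads yields the same kind of nested dict/list structure as requests' .json()
data = json.loads(raw)
print(data["class"][0]["data"]["lesson3"]["id"])  # 3
```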
Extract the individual lessons:
# one entry per lesson, tagged with the class it belongs to (the lesson key is dropped)
data_ = [{'class_id': c['class_id'], 'lessons': v} for c in data['class'] for v in c['data'].values()]
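If you also want to keep the lesson key itself (the "name" field suggested earlier), a variant of the comprehension can carry it along. This is a minimal sketch run on a hypothetical two-lesson sample in the assumed shape; the "name" field is an addition and is not part of the desired table.

```python
# Hypothetical sample in the shape the answer assumes.
data = {"class": [
    {"class_id": "1", "data": {
        "lesson3": {"id": 3, "schedule": [{"schedule_id": "1", "status": "ONGOING"}]},
        "lesson4": {"id": 4, "schedule": [{"schedule_id": "1", "status": "ONGOING"}]},
    }},
]}

# Same flattening, but the dict key is kept as "name" instead of being discarded.
data_ = [
    {"class_id": c["class_id"], "name": name, "lessons": lesson}
    for c in data["class"]
    for name, lesson in c["data"].items()
]
print([(d["class_id"], d["name"], d["lessons"]["id"]) for d in data_])
# [('1', 'lesson3', 3), ('1', 'lesson4', 4)]
```

With this shape, "name" can be added to the meta list passed to json_normalize, e.g. meta=['class_id', 'name', ['lessons', 'id']].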
The data now looks like this:
[
{
"class_id": "1",
"lessons": {
"id": 3,
"schedule": [
{
"schedule_id": "1",
"schedule_date": "2017-07-11",
"lesson_price": "USD 25",
"status": "ONGOING"
},
{
"schedule_id": "2",
"schedule_date": "2016-09-24",
"lesson_price": "USD 15",
"status": "OPEN REGISTRATION"
}
]
}
},
...
]
Now we can read it into a pandas DataFrame using json_normalize:
import pandas as pd
df = pd.json_normalize(data_, record_path=['lessons', 'schedule'], meta=['class_id', ['lessons', 'id']])
schedule_id schedule_date lesson_price status class_id lessons.id
0 1 2017-07-11 USD 25 ONGOING 1 3
1 2 2016-09-24 USD 15 OPEN REGISTRATION 1 3
2 1 2016-12-17 USD 19 ONGOING 1 4
3 2 2015-11-12 USD 29 ONGOING 1 4
4 3 2015-11-10 USD 14 ON SCHEDULE 1 4
5 1 2017-05-21 USD 50 CANCELLED 2 1
6 1 2017-06-04 USD10 FINISHED 2 2
7 5 2018-03-01 USD12 CLOSED 2 2
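To get exactly the column names and order from the question, rename lessons.id to id and reorder the columns. The sketch below runs the whole pipeline end to end on a hypothetical sample reconstructed from the desired table in the question (lesson keys are assumed to be lessonN, matching each lesson's id). It also builds the same rows with a plain nested loop, a swapped-in alternative to json_normalize, to show the two agree.

```python
import pandas as pd

COLUMNS = ["class_id", "id", "schedule_id", "schedule_date", "lesson_price", "status"]


def sched(sid, date, price, status):
    # Hypothetical helper that keeps the sample short.
    return {"schedule_id": sid, "schedule_date": date,
            "lesson_price": price, "status": status}


# Hypothetical sample reconstructed from the desired table in the question.
data = {"class": [
    {"class_id": "1", "data": {
        "lesson3": {"id": 3, "schedule": [
            sched("1", "2017-07-11", "USD 25", "ONGOING"),
            sched("2", "2016-09-24", "USD 15", "OPEN REGISTRATION")]},
        "lesson4": {"id": 4, "schedule": [
            sched("1", "2016-12-17", "USD 19", "ONGOING"),
            sched("2", "2015-11-12", "USD 29", "ONGOING"),
            sched("3", "2015-11-10", "USD 14", "ON SCHEDULE")]},
    }},
    {"class_id": "2", "data": {
        "lesson1": {"id": 1, "schedule": [
            sched("1", "2017-05-21", "USD 50", "CANCELLED")]},
        "lesson2": {"id": 2, "schedule": [
            sched("1", "2017-06-04", "USD10", "FINISHED"),
            sched("2", "2018-03-01", "USD12", "CLOSED")]},
    }},
]}

# Step 1: one entry per lesson, tagged with its class.
data_ = [{"class_id": c["class_id"], "lessons": v}
         for c in data["class"] for v in c["data"].values()]

# Step 2: one row per schedule entry, with class_id and the lesson id as metadata.
df = pd.json_normalize(data_, record_path=["lessons", "schedule"],
                       meta=["class_id", ["lessons", "id"]])

# Step 3: match the desired headers and column order.
df = df.rename(columns={"lessons.id": "id"})[COLUMNS]
print(df.to_string(index=False))

# Cross-check: the same rows built with a plain nested loop (no json_normalize).
rows = [{"class_id": c["class_id"], "id": lesson["id"], **s}
        for c in data["class"]
        for lesson in c["data"].values()
        for s in lesson["schedule"]]
alt = pd.DataFrame(rows, columns=COLUMNS)
print(df.to_dict("records") == alt.to_dict("records"))  # True
```

The comparison goes through to_dict("records") rather than DataFrame.equals because json_normalize stores meta values in an object-dtype column, so the id column's dtype can differ from the int64 column the plain loop produces even when the values match.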
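Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.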