[英]Flatten JSON to pandas using json_normalize
我需要展平 XML 數據,然后將其轉換為 pandas 的 JSON。問題是,我希望 passage ( <passage gare="87271460">
) 的值位於 dataframe 的每一行。
(對於上下文:Gare 是一個火車站,有一個 ID。我像這樣調用 API https://api.transilien.com/gare/87271460/depart我計划調用 17 個不同的火車站)
我不知道該怎么做。 這是我到目前為止所擁有的
response = requests.request("GET", url, headers=headers, data=payload)
# print(response.text)
dict = xmltodict.parse(response.content)
# print(dict)
s = json.dumps(dict).replace('\'', '"').replace('#', '').replace('@', '')
json_object = json.loads(s)
# print(json_object)
df = pd.json_normalize(json_object['passages'], record_path=['train'])
print(df)
這是我從請求中檢索到的 XML(刪除不需要的字符后)
<?xml version="1.0" encoding="UTF-8"?>
<passages gare="87271460">
<train>
<date mode="R">10/04/2022 14:05</date>
<num>PIST64</num>
<miss>PIST</miss>
<term>87758896</term>
</train>
<train>
<date mode="R">10/04/2022 14:09</date>
<num>KALI66</num>
<miss>KALI</miss>
<term>87393579</term>
</train>
</passages>
我需要的最終 output 是:
passage num miss term etat date.mode date.text
0 87271460 ERBE85 ERBE 87271486 Supprimé R 10/04/2022 16:09
1 87271460 PINS74 PINS 87758896 NaN R 10/04/2022 16:10
2 87271460 PINS80 PINS 87758896 NaN R 10/04/2022 16:17
3 87271460 KARE82 KARE 87758623 Supprimé R 10/04/2022 16:23
4 87271460 EPAU81 EPAU 87001479 NaN R 10/04/2022 16:29
5 87271460 ERIO91 ERIO 87001479 NaN R 10/04/2022 16:30
6 87271460 PINS86 PINS 87758896 NaN R 10/04/2022 16:32
7 87271460 KARE88 KARE 87393579 NaN R 10/04/2022 16:38
8 87271460 ERBE97 ERBE 87271486 Supprimé R 10/04/2022 16:39
9 87271460 EPAU93 EPAU 87001479 NaN R 10/04/2022 16:43
10 87271460 PINS92 PINS 87758896 NaN R 10/04/2022 16:47
11 87271460 EPIN99 EPIN 87001479 NaN R 10/04/2022 16:52
12 87271460 KARE94 KARE 87758623 Supprimé R 10/04/2022 16:53
13 87271460 ERAN67 ERAN 87001479 NaN R 10/04/2022 16:54
14 87271460 EPOL69 EPOL 87001479 NaN R 10/04/2022 17:01
15 87271460 PINS98 PINS 87758896 NaN R 10/04/2022 17:02
16 87271460 KABE02 KABE 87393579 NaN R 10/04/2022 17:08
17 87271460 ERAN73 ERAN 87001479 NaN R 10/04/2022 17:09
18 87271460 EPOL75 EPOL 87001479 NaN R 10/04/2022 17:16
19 87271460 PITA06 PITA 87758896 NaN R 10/04/2022 17:17
20 87271460 KABE08 KABE 87393579 NaN R 10/04/2022 17:23
21 87271460 ERAN79 ERAN 87001479 NaN R 10/04/2022 17:24
22 87271460 EPOL81 EPOL 87001479 NaN R 10/04/2022 17:31
23 87271460 PITA12 PITA 87758896 NaN R 10/04/2022 17:32
24 87271460 KABE14 KABE 87393579 NaN R 10/04/2022 17:38
25 87271460 ERAN85 ERAN 87001479 NaN R 10/04/2022 17:39
26 87271460 EPOL87 EPOL 87001479 NaN R 10/04/2022 17:46
27 87271460 PITA18 PITA 87758896 NaN R 10/04/2022 17:47
28 87271460 KABE20 KABE 87393579 NaN R 10/04/2022 17:53
29 87271460 ERAN91 ERAN 87001479 NaN R 10/04/2022 17:54
您可以從完整的 json (而不僅僅是passage
)創建您的 Dataframe ,然后將gare
欄加入規范化的train
欄:
response = """<?xml version="1.0" encoding="UTF-8"?>
<passages gare="87271460">
<train>
<date mode="R">10/04/2022 14:05</date>
<num>PIST64</num>
<miss>PIST</miss>
<term>87758896</term>
</train>
<train>
<date mode="R">10/04/2022 14:09</date>
<num>KALI66</num>
<miss>KALI</miss>
<term>87393579</term>
</train>
</passages>
"""
dict = xmltodict.parse(response)
s = json.dumps(dict).replace('\'', '"').replace('#', '').replace('@', '')
json_object = json.loads(s)
df = pd.DataFrame.from_dict(json_object, orient='index')
df = df.explode('train').reset_index(drop=True)
df = df.join(pd.json_normalize(df['train'])).drop('train', 1)
print(df)
Output:
gare num miss term date.mode date.text
0 87271460 PIST64 PIST 87758896 R 10/04/2022 14:05
1 87271460 KALI66 KALI 87393579 R 10/04/2022 14:09
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.