Convert unstructured JSON to a structured DataFrame
I'm trying to read this GitHub JSON (URL below), which contains information about football teams, matches, and players.
Here is my sample code:
import json
import pandas as pd
import urllib.request
from pandas import json_normalize
load_path = 'https://raw.githubusercontent.com/henriquepgomide/caRtola/master/data/2021/Mercado_10.txt'
games_2021 = json.loads(urllib.request.urlopen(load_path).read().decode('latin-1'))
games_2021 = json_normalize(games_2021)
games_2021
Bad output:
The desired output can be seen by running the code below:
pd.read_csv('https://raw.githubusercontent.com/henriquepgomide/caRtola/master/data/2022/rodada-0.csv')
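Both URLs contain the same kind of information, but the JSON file is laid out as what I assume is a dictionary: its first entries are lookup tables that translate some of the coded column values that players and teams can have, whereas the other link has already been cleaned up somehow into a CSV structure.
Just normalize the 'atletas' key in the JSON, or simply build a DataFrame from it directly.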
import json
import requests
import pandas as pd
load_path = 'https://raw.githubusercontent.com/henriquepgomide/caRtola/master/data/2021/Mercado_10.txt'
jsonData = requests.get(load_path).json()
games_2021 = pd.json_normalize(jsonData['atletas'])
cols = [x for x in games_2021.columns if 'scout.' not in x]
games_2021 = games_2021[cols]
Or:
import json
import requests
import pandas as pd
load_path = 'https://raw.githubusercontent.com/henriquepgomide/caRtola/master/data/2021/Mercado_10.txt'
jsonData = requests.get(load_path).json()
games_2021 = pd.DataFrame(jsonData['atletas']).drop('scout', axis=1)
Output:
print(games_2021)
atleta_id ... foto
0 83817 ... https://s.glbimg.com/es/sde/f/2021/06/04/68300...
1 95799 ... https://s.glbimg.com/es/sde/f/2020/07/28/e1784...
2 81798 ... https://s.glbimg.com/es/sde/f/2021/04/19/7d895...
3 68808 ... https://s.glbimg.com/es/sde/f/2021/04/19/ca9f7...
4 92496 ... https://s.glbimg.com/es/sde/f/2020/08/28/8c0a6...
.. ... ... ...
755 50645 ... https://s.glbimg.com/es/sde/f/2021/06/04/fae6b...
756 69345 ... https://s.glbimg.com/es/sde/f/2021/05/01/0f714...
757 110465 ... https://s.glbimg.com/es/sde/f/2021/04/26/a2187...
758 111578 ... https://s.glbimg.com/es/sde/f/2021/04/27/21a13...
759 38315 ... https://s.glbimg.com/es/sde/f/2020/10/09/a19dc...
[760 rows x 15 columns]
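To see why the first variant filters out 'scout.' columns while the second drops a single 'scout' column, here is a minimal offline sketch. The `sample` dictionary is a made-up miniature of the file's layout (the field values are invented for illustration, not taken from the real data); only the `pd.json_normalize` and `pd.DataFrame` calls mirror the code above.

```python
import pandas as pd

# Hypothetical miniature of the JSON layout; values are invented.
sample = {
    "atletas": [
        {"atleta_id": 1, "apelido": "Player A", "clube_id": 10,
         "scout": {"G": 2, "A": 1}},
        {"atleta_id": 2, "apelido": "Player B", "clube_id": 20,
         "scout": {"G": 0}},
    ]
}

# json_normalize flattens the nested dict into dotted columns
# such as 'scout.G' and 'scout.A' (missing keys become NaN).
flat = pd.json_normalize(sample["atletas"])
flat_no_scout = flat[[c for c in flat.columns if not c.startswith("scout.")]]

# The plain constructor keeps the nested dict in one 'scout' column,
# so a single drop removes it.
raw = pd.DataFrame(sample["atletas"])
raw_no_scout = raw.drop("scout", axis=1)

print(list(flat.columns))
print(list(raw.columns))
```

Both routes end with the same player columns; the difference is only whether the nested scout statistics are expanded first.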
Then just read each table and merge them to get the full content:
import json
import requests
import pandas as pd
load_path = 'https://raw.githubusercontent.com/henriquepgomide/caRtola/master/data/2021/Mercado_10.txt'
jsonData = requests.get(load_path).json()
atletas = pd.DataFrame(jsonData['atletas']).drop('scout', axis=1)
clubes = pd.DataFrame(jsonData['clubes'].values())
posicoes = pd.DataFrame(jsonData['posicoes'].values())
status = pd.DataFrame(jsonData['status'].values())
df = atletas.merge(clubes, how='left', left_on='clube_id', right_on='id', suffixes=['', '_clube'])
df = df.merge(posicoes, how='left', left_on='posicao_id', right_on='id', suffixes=['', '_posicao'])
df = df.merge(status, how='left', left_on='status_id', right_on='id', suffixes=['', '_status'])
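The merge pattern above can be tried without network access. The sketch below is an illustration on invented data: the `sample` dictionary and all its values are hypothetical, and it assumes the lookup tables are dictionaries keyed by id whose values each carry an 'id' field, as the code above implies.

```python
import pandas as pd

# Hypothetical miniature of the JSON layout; values are invented.
sample = {
    "atletas": [
        {"atleta_id": 1, "nome": "Player A", "clube_id": 10, "posicao_id": 1},
        {"atleta_id": 2, "nome": "Player B", "clube_id": 20, "posicao_id": 2},
    ],
    "clubes": {
        "10": {"id": 10, "nome": "Club X"},
        "20": {"id": 20, "nome": "Club Y"},
    },
    "posicoes": {
        "1": {"id": 1, "nome": "Goleiro"},
        "2": {"id": 2, "nome": "Atacante"},
    },
}

atletas = pd.DataFrame(sample["atletas"])
# The lookup tables are dicts keyed by id; only their values are needed.
clubes = pd.DataFrame(list(sample["clubes"].values()))
posicoes = pd.DataFrame(list(sample["posicoes"].values()))

# Left joins keep every player; a suffix is applied only to
# right-hand columns whose names collide with existing ones.
df = atletas.merge(clubes, how="left", left_on="clube_id",
                   right_on="id", suffixes=("", "_clube"))
df = df.merge(posicoes, how="left", left_on="posicao_id",
              right_on="id", suffixes=("", "_posicao"))

print(df[["nome", "nome_clube", "nome_posicao"]])
```

Note that the first merged 'id' column keeps its plain name because nothing collides with it yet; later lookup tables get suffixed names such as 'id_posicao'.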