How to create pandas dataframe from web scrape?
I want to create a pandas dataframe from this web scrape so that I can export the data to Excel. Is anyone familiar with this? I have seen different approaches online, but have not been able to reproduce the results successfully.
Here is the code so far:
import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
This site seems useful, but its examples are different:
https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm
Here is another example from stackoverflow.com:
I am new to coding/scraping, so any help would be greatly appreciated. Thanks in advance for your time and effort!
I have added a solution that builds the dataframe team-wise; I hope this helps. Updated code:
import requests
import pandas as pd

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

players = []
teams = []
for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    teams.append(team['home_route'].capitalize())
    teams.append(team['away_route'].capitalize())
    temp = []
    temp1 = []
    for player in team['home_players']:
        print(player['name'])
        temp.append(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
        temp1.append(player['name'])
    players.append(temp)
    players.append(temp1)

# One column per team. Wrapping each roster in pd.Series pads shorter
# rosters with NaN, so columns of unequal length do not raise a ValueError
# (assigning raw lists of different lengths column by column would).
df = pd.DataFrame({team: pd.Series(names) for team, names in zip(teams, players)})
df
To export to Excel, you can do:
df.to_excel('result.xlsx')
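One thing to watch with a column-per-team layout: two rosters will often have different lengths, and building a DataFrame from plain lists of unequal length raises a ValueError. A minimal sketch of the `pd.Series` padding trick (team and player names here are invented):

```python
import pandas as pd

# Two rosters of unequal length (invented names). Building the frame from
# raw lists would raise "ValueError: All arrays must be of the same length";
# wrapping each list in pd.Series pads the shorter column with NaN instead.
rosters = {
    "Team-a": ["P1", "P2", "P3"],
    "Team-b": ["P4", "P5"],
}
df = pd.DataFrame({team: pd.Series(names) for team, names in rosters.items()})
print(df.shape)  # (3, 2): three rows, with NaN in Team-b's last slot
```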
Python requests conveniently renders the json as a dict, so you can simply pass that dict to the pd.DataFrame constructor:
import pandas as pd
df = pd.DataFrame([dict1, dict2, dict3])
# Do your data processing here
df.to_csv("myfile.csv")
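Applied to the lineups payload, the list-of-dicts idea might look like this sketch; the payload below is a stand-in in the shape the question's code assumes (`data`, `home_players`, and so on), not real API output:

```python
import pandas as pd

# Stand-in payload shaped like the question's API response;
# a real run would use requests.get(...).json() instead.
source = {
    "data": [
        {
            "home_route": "team-a",
            "away_route": "team-b",
            "home_players": [{"name": "P1"}, {"name": "P2"}],
            "away_players": [{"name": "P3"}],
        }
    ]
}

# One dict per player -> one row per player, one column per key.
rows = []
for game in source["data"]:
    for side in ("home", "away"):
        for player in game[side + "_players"]:
            rows.append({"team": game[side + "_route"], "player": player["name"]})

df = pd.DataFrame(rows)
print(df.shape)  # (3, 2): three players, columns 'team' and 'player'
```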
Pandas also has json helpers in pd.io.json, notably json_normalize, so once your data is in a dataframe you can flatten nested json into tabular data, and so on.
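A minimal sketch of json_normalize on a small nested record (the keys here are invented, not the API's real fields; on older pandas the function also lives at `pandas.io.json.json_normalize`):

```python
import pandas as pd

# Invented nested records, purely illustrative of the API's nesting.
records = [
    {"game": 1, "home": {"team": "team-a", "score": 110}},
    {"game": 2, "home": {"team": "team-b", "score": 121}},
]

# Nested keys become dotted column names: 'home.team', 'home.score'.
flat = pd.json_normalize(records)
print(list(flat.columns))  # ['game', 'home.team', 'home.score']
```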
You can try the below:
>>> import pandas as pd
>>> import json
>>> import requests
>>> source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
>>> df = pd.DataFrame.from_dict(source) # directly use source as itself is a dict
Now you can convert the dataframe to csv format via df.to_csv, as below:
>>> df.to_csv("nba_play.csv")
Below are the columns, which you can process as your data needs require:
>>> df.columns
Index(['bottom_header', 'bottom_paragraph', 'data', 'heading',
'intro_paragraph', 'page_title', 'twitter_link'],
dtype='object')
But, as Charles said, you can use json_normalize, which gives a better tabular view of the data:
>>> from pandas.io.json import json_normalize
>>> json_normalize(df['data']).head()
away_bets.key away_bets.moneyline away_bets.over_under \
0 ATL 500 o232.0
1 POR 165 o217.0
2 SAC 320 o225.0
3 BKN 110 o216.0
4 TOR -140 o221.0
away_bets.over_under_moneyline away_bets.spread \
0 -115 11.0
1 -115 4.5
2 -105 9.0
3 -105 2.0
4 -105 -2.0
away_bets.spread_moneyline away_bets.total \
0 -110 121.50
1 -105 110.75
2 -115 117.00
3 -110 109.00
4 -115 109.50
away_injuries \
0 [{'name': 'J. Collins', 'profile_url': '/nba/p...
1 [{'name': 'M. Harkless', 'profile_url': '/nba/...
2 [{'name': 'K. Koufos', 'profile_url': '/nba/pl...
3 [{'name': 'T. Graham', 'profile_url': '/nba/pl...
4 [{'name': 'O. Anunoby', 'profile_url': '/nba/p...
away_players away_route \
0 [{'draftkings_projection': 30.04, 'yahoo_posit... atlanta-hawks
1 [{'draftkings_projection': 47.33, 'yahoo_posit... portland-trail-blazers
2 [{'draftkings_projection': 28.88, 'yahoo_posit... sacramento-kings
3 [{'draftkings_projection': 37.02, 'yahoo_posit... brooklyn-nets
4 [{'draftkings_projection': 45.2, 'yahoo_positi... toronto-raptors
... nav.matchup_season nav.matchup_time \
0 ... 2019 2018-10-29T23:00:00+00:00
1 ... 2019 2018-10-29T23:00:00+00:00
2 ... 2019 2018-10-29T23:30:00+00:00
3 ... 2019 2018-10-29T23:30:00+00:00
4 ... 2019 2018-10-30T00:00:00+00:00
nav.status.away_team_score nav.status.home_team_score nav.status.minutes \
0 None None None
1 None None None
2 None None None
3 None None None
4 None None None
nav.status.quarter_integer nav.status.seconds nav.status.status \
0 None Scheduled
1 None Scheduled
2 None Scheduled
3 None Scheduled
4 None Scheduled
nav.updated order
0 2018-10-29T17:51:05+00:00 0
1 2018-10-29T17:51:05+00:00 1
2 2018-10-29T17:51:05+00:00 2
3 2018-10-29T17:51:05+00:00 3
4 2018-10-29T17:51:05+00:00 4
[5 rows x 383 columns]
Hope this helps!