AttributeError: 'NoneType' object has no attribute 'text' - BeautifulSoup
I have a small script that scrapes data from fbref (data: https://fbref.com/en/comps/9/stats/Premier-League-Stats ). It used to work well, but now some fields fail. I've checked that the ones that no longer work are "player", "nationality", "position", "squad", "age" and "birth_year". I have also checked that these fields still have the same names on the website as before. Any ideas or help to solve the problem?
Many thanks!
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re
import sys, getopt
import csv

def get_tables(url):
    res = requests.get(url)
    ## The next two lines get around the issue with comments breaking the parsing.
    comm = re.compile("<!--|-->")
    soup = BeautifulSoup(comm.sub("",res.text),'lxml')
    all_tables = soup.findAll("tbody")
    team_table = all_tables[0]
    player_table = all_tables[1]
    return player_table, team_table

def get_frame(features, player_table):
    pre_df_player = dict()
    features_wanted_player = features
    rows_player = player_table.find_all('tr')
    for row in rows_player:
        if(row.find('th',{"scope":"row"}) != None):
            for f in features_wanted_player:
                cell = row.find("td",{"data-stat": f})
                a = cell.text.strip().encode()
                text=a.decode("utf-8")
                if(text == ''):
                    text = '0'
                if((f!='player')&(f!='nationality')&(f!='position')&(f!='squad')&(f!='age')&(f!='birth_year')):
                    text = float(text.replace(',',''))
                if f in pre_df_player:
                    pre_df_player[f].append(text)
                else:
                    pre_df_player[f] = [text]
    df_player = pd.DataFrame.from_dict(pre_df_player)
    return df_player

stats = ["player","nationality","position","squad","age","birth_year","games","games_starts","minutes","goals","assists","pens_made","pens_att","cards_yellow","cards_red","goals_per90","assists_per90","goals_assists_per90","goals_pens_per90","goals_assists_pens_per90","xg","npxg","xa","xg_per90","xa_per90","xg_xa_per90","npxg_per90","npxg_xa_per90"]

def frame_for_category(category,top,end,features):
    url = (top + category + end)
    player_table, team_table = get_tables(url)
    df_player = get_frame(features, player_table)
    return df_player

top='https://fbref.com/en/comps/9/'
end='/Premier-League-Stats'
df1 = frame_for_category('stats',top,end,stats)
df1
I suggest loading the table with pandas' read_html. There is a direct link to this table under Share & Export --> Embed this Table.
import pandas as pd
df = pd.read_html("https://widgets.sports-reference.com/wg.fcgi?css=1&site=fb&url=%2Fen%2Fcomps%2F9%2Fstats%2FPremier-League-Stats&div=div_stats_standard", header=1)
This outputs a list of dataframes; the table can be accessed as df[0]. Output of df[0].head():
 | Rk | Player | Nation | Pos | Squad | Age | Born | MP | Starts | Min | 90s | Gls | Ast | G-PK | PK | PKatt | CrdY | CrdR | Gls.1 | Ast.1 | G+A | G-PK.1 | G+A-PK | xG | npxG | xA | npxG+xA | xG.1 | xA.1 | xG+xA | npxG.1 | npxG+xA.1 | Matches |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Patrick van Aanholt | nl NED | DF | Crystal Palace | 30-190 | 1990 | 16 | 15 | 1324 | 14.7 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0.07 | 0.07 | 0 | 0.07 | 1.2 | 1.2 | 0.8 | 2 | 0.08 | 0.05 | 0.13 | 0.08 | 0.13 | Matches |
1 | 2 | Tammy Abraham | eng ENG | FW | Chelsea | 23-156 | 1997 | 20 | 12 | 1021 | 11.3 | 6 | 1 | 6 | 0 | 0 | 0 | 0 | 0.53 | 0.09 | 0.62 | 0.53 | 0.62 | 5.6 | 5.6 | 0.9 | 6.5 | 0.49 | 0.08 | 0.57 | 0.49 | 0.57 | Matches |
2 | 3 | Che Adams | eng ENG | FW | Southampton | 24-237 | 1996 | 26 | 22 | 1985 | 22.1 | 5 | 4 | 5 | 0 | 0 | 1 | 0 | 0.23 | 0.18 | 0.41 | 0.23 | 0.41 | 5.5 | 5.5 | 4.3 | 9.9 | 0.25 | 0.2 | 0.45 | 0.25 | 0.45 | Matches |
3 | 4 | Tosin Adarabioyo | eng ENG | DF | Fulham | 23-164 | 1997 | 23 | 23 | 2070 | 23 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0.1 | 1.1 | 0.04 | 0.01 | 0.05 | 0.04 | 0.05 | Matches |
4 | 5 | Adrián | es ESP | GK | Liverpool | 34-063 | 1987 | 3 | 3 | 270 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Matches |
If you're only after the player stats, change player_table = all_tables[1] to player_table = all_tables[2], because right now you are feeding the team table into the get_frame function.
I tried it and it worked fine after that.
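For context on the error message itself: BeautifulSoup's find() returns None when nothing matches, so a row that has no td with data-stat="player" makes cell.text raise the AttributeError from the title. Below is a minimal sketch of a guard for that case. The helper name cell_text and its default parameter are my own, not part of BeautifulSoup, and the demo uses SimpleNamespace stand-ins instead of real parsed tags so that it runs without fetching the page.

```python
from types import SimpleNamespace


def cell_text(cell, default=None):
    """Return the stripped text of a looked-up cell, or `default` if the lookup failed.

    `cell` is meant to be the result of row.find("td", {"data-stat": name}):
    an object with a `.text` attribute, or None when no cell matched.
    (Hypothetical helper, not a BeautifulSoup function.)
    """
    if cell is None:
        return default
    return cell.text.strip()


# Stand-ins for what row.find(...) would return (assumption: real tags expose .text).
found = SimpleNamespace(text="  Chelsea \n")
not_found = None

print(cell_text(found))                   # text with surrounding whitespace removed
print(cell_text(not_found, default="0"))  # falls back instead of raising AttributeError
```

Inside get_frame this would replace the direct cell.text access, e.g. text = cell_text(row.find("td", {"data-stat": f}), default="0"). Note that a silent default can hide the fact that the wrong table was selected, so printing or raising when every row comes back empty may be the safer choice.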