What is the best way to scrape the basketball player's team name?
I can successfully scrape multiple columns; however, I haven't been able to grab the team name of the respective player. Here is my code so far:
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for idx, bball_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)
Everything works up to this point. I would like to add a column for the team name, though. I am looking for the abbreviated team name (for example, OKC for Oklahoma City).
The following code ran into an error:
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
print(team)
The code starts printing team names, then runs into an error partway through.
Here is the error:
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range
Just to reiterate what I am looking for: I am trying to add the abbreviated team name next to the respective player.
Any suggestions would be greatly appreciated. I want to thank the community in advance for your time and efforts!
Your script threw that error only when the XPath expression matched nothing for a row, so indexing the empty result with [0] failed. What you can do is catch the error and handle it, for example by falling back to a placeholder value. Try the script below:
import requests
from lxml.html import fromstring
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)
The output should look something like this:
Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
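If you would rather not wrap every lookup in try/except, one possible alternative is to read the text of the whole cell instead of the link inside it. The sketch below is only an illustration under a few assumptions: cell_text is a hypothetical helper name, the sample rows are made up and much simpler than the real page, and the guess that some team cells hold plain text (such as an aggregate label for a player who changed teams) rather than a link is not confirmed in this thread. It uses the standard library's xml.etree.ElementTree so it runs without a network connection; lxml elements also provide find() and itertext(), so the same helper should work on the rows from the script above.

```python
import xml.etree.ElementTree as ET


def cell_text(row, stat, default="N/A"):
    """Return the text of the <td> whose data-stat equals `stat`.

    Hypothetical helper: falls back to `default` when the cell is
    missing or empty, so the caller never indexes an empty list.
    """
    cell = row.find(f".//td[@data-stat='{stat}']")
    if cell is None:
        return default
    # itertext() yields the cell's own text plus the text of any child
    # elements, so it works whether or not the value is wrapped in <a>.
    text = "".join(cell.itertext()).strip()
    return text or default


# Made-up, simplified rows (an assumption, not the real page markup).
SAMPLE = """
<table>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player A</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player B</a></td>
    <td data-stat="team_id">TOT</td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player C</a></td>
  </tr>
</table>
"""

rows = ET.fromstring(SAMPLE.strip()).findall(".//tr[@class='full_table']")
results = [(cell_text(r, "player"), cell_text(r, "team_id")) for r in rows]

for name, team in results:
    print(name, team)
```

With the sample rows this prints OKC for the linked cell, TOT for the plain-text cell, and N/A only when the cell is missing altogether. Whether you prefer this over try/except depends on whether you want to keep the plain-text value or treat it as missing.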