I can successfully scrape multiple columns; however, I haven't been able to grab the team name for each player. Here is my code so far:
from urllib.request import urlopen
from lxml.html import fromstring
import pandas as pd
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = urlopen(url).read().decode("utf-8")  # decode the bytes; str() on bytes yields a "b'...'" literal
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for idx, bball_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)
Everything works up to this point. I would also like to add the team for each player, specifically the abbreviated team name (for example, OKC for Oklahoma City).
The following code ran into an error:
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
print(team)
The code starts printing team names, then runs into an error partway through the rows.
Here is the error:
team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range
Just to reiterate what I am looking for: I am trying to add the abbreviated team name next to the respective player.
Any suggestions would be greatly appreciated. I want to thank the community in advance for your time and efforts!
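Your script throws that error only for rows where the XPath finds nothing: xpath() then returns an empty list, and indexing it with [0] raises IndexError. That can happen, for example, when the team cell holds plain text rather than a link. You can catch the error and fall back to a placeholder value. Try the script below: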
import requests
from lxml.html import fromstring
url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"
content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)
for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)
The output should look something like this:
Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
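If you would rather not wrap every lookup in try/except, one alternative is a small helper that returns the first match or a default. This is only a sketch: first_or_default is a hypothetical name (not part of lxml), and the lists below are stand-ins for what row.xpath('.//td[@data-stat="team_id"]/a/text()') would return, so the snippet runs without fetching the page.

```python
def first_or_default(matches, default="N/A"):
    """Return the first item of an XPath result list, or `default` if it is empty."""
    return matches[0] if matches else default


# Stand-ins (assumed data) for the lists xpath() would return for three rows:
# two rows with a team link and one row without.
simulated_results = [["OKC"], [], ["MIA"]]

teams = [first_or_default(result) for result in simulated_results]
print(teams)  # ['OKC', 'N/A', 'MIA']
```

In the real loop you would pass the xpath() result straight in, for example team = first_or_default(row.xpath('.//td[@data-stat="team_id"]/a/text()')). The same helper would also guard the other columns if any of them can be empty.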