What is the best way to scrape basketball players' team names?

Question

I can successfully scrape multiple columns, but I haven't been able to grab the team name for each player. Here is my code so far:

from urllib.request import urlopen
from lxml.html import fromstring

import pandas as pd


url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

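# Decode the bytes; str() on bytes would yield the "b'...'" repr with escaped characters.
content = urlopen(url).read().decode("utf-8")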
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)


for idx, bball_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)

Everything works up to this point. I would also like to add the team name to each row, though. I am looking for the abbreviated team name (for example, OKC for Oklahoma City).

The following code ran into an error:

    team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    print(team)

The code starts printing team names, then runs into an error partway through.

Here is the error:

team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range

Just to reiterate what I am looking for: I am trying to add the abbreviated team name next to the respective player.

Any suggestions would be greatly appreciated. I want to thank the community in advance for your time and effort!

Answer 1

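Your script raises that error whenever the XPath expression finds no match, because [0] then indexes an empty list. That most likely happens on rows whose team cell contains no <a> link, for example the combined "TOT" row for a player who appeared for more than one team. You can catch the error and fall back to a default value. Try the script below: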

import requests
from lxml.html import fromstring

url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)

for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        # No <a> inside the team cell for this row; use a placeholder.
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)

The output should look something like this:

Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
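
If you would rather keep the cell's text than print "N/A", another option is to read the text of the whole td instead of its <a> child. The sketch below is only an illustration under an assumption: that the failing rows hold the team as plain text (such as "TOT") with no link. The SAMPLE markup, the "Example Player" row, and the team_of helper are made up for the demonstration and are not taken from the real page.

```python
from lxml.html import fromstring

# Hypothetical sample imitating the page's markup: the first row links the
# team, the second (a multi-team "TOT" row) holds plain text only.
SAMPLE = """
<html><body>
<table class="stats_table">
  <tr class="full_table">
    <td data-stat="player"><a href="#">Alex Abrines</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Example Player</a></td>
    <td data-stat="team_id">TOT</td>
  </tr>
</table>
</body></html>
"""


def team_of(row, default="N/A"):
    """Return the team cell's text whether or not it is wrapped in a link."""
    cells = row.xpath('.//td[@data-stat="team_id"]')
    if not cells:
        # The row has no team cell at all.
        return default
    # text_content() joins the text of the cell and all of its descendants.
    text = cells[0].text_content().strip()
    return text or default


tree = fromstring(SAMPLE)
rows = tree.xpath(
    '//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'
)
teams = [team_of(row) for row in rows]
print(teams)
```

In the original loop you would then write team = team_of(row) in place of the try/except. Whether the plain-text value is what you want for multi-team players is a judgment call; the try/except version above remains the simpler choice if a placeholder is enough.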


Answer 1 (accepted), 2018-10-20 17:17:11
