What is the best way to scrape the basketball player's team name?

I can successfully scrape multiple columns; however, I haven't been able to grab the team name of the respective player. Here is my code so far:

from urllib.request import urlopen
from lxml.html import fromstring

import pandas as pd


url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

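# Decode the response bytes; str() on bytes would yield a "b'...'" literal with escaped characters.
content = urlopen(url).read().decode("utf-8")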
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)


for idx, bball_row in enumerate(tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')):
    names = bball_row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = bball_row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = bball_row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = bball_row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = bball_row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = bball_row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    print(names, per, ws, bpm, vorp)

Everything works up to this point. I would like to add a column for the team name, though. I am looking for the abbreviated team name (for example, OKC for Oklahoma City).

The following code ran into an error:

    team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    print(team)

The code starts printing team names, then runs into an error.

Here is the error:

team = bball_row.xpath('.//td[@data-stat="team_id"]/a')[0].text
IndexError: list index out of range

Just to reiterate what I am looking for: I am trying to add the abbreviated team name next to the respective player.

Any suggestions would be greatly appreciated. I want to thank the community in advance for your time and effort!

Your script raises that error only when the XPath finds no matching element for a row: the result list is then empty, so indexing it with [0] fails. What you can do is catch the error and fall back to a default value. Try the script below:

import requests
from lxml.html import fromstring

url = "https://www.basketball-reference.com/leagues/NBA_2018_advanced.html"

content = requests.get(url).text
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)

for row in tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]'):
    names = row.xpath('.//td[@data-stat="player"]/a')[0].text
    mp = row.xpath('.//td[@data-stat="mp"]/text()')[0]
    per = row.xpath('.//td[@data-stat="per"]/text()')[0]
    ws = row.xpath('.//td[@data-stat="ws"]/text()')[0]
    bpm = row.xpath('.//td[@data-stat="bpm"]/text()')[0]
    vorp = row.xpath('.//td[@data-stat="vorp"]/text()')[0]
    try:
        team = row.xpath('.//td[@data-stat="team_id"]/a')[0].text
    except IndexError:
        # No <a> link was found in the team cell for this row; use a placeholder.
        team = "N/A"
    print(names, per, ws, bpm, vorp, team)

The output you get may look like this:

Alex Abrines 9.0 2.2 -2.2 -0.1 OKC
Quincy Acy 8.2 1.0 -2.2 -0.1 BRK
Steven Adams 20.6 9.7 3.3 3.3 OKC
Bam Adebayo 15.7 4.2 0.2 0.8 MIA
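
As an alternative to catching the IndexError, the cell text can be read with the XPath string() function, which returns an empty string when nothing matches. The following is only a minimal sketch: the sample markup, the player names, and the helper name cell_text are hypothetical, and it assumes the failing rows have a team cell containing plain text without a link (for example, a combined row for a player who changed teams), which has not been verified against the live page.

```python
from lxml.html import fromstring

# Hypothetical sample markup, shaped like the rows the script above selects.
# The second row's team cell holds plain text and no <a> link.
SAMPLE_HTML = """
<table class="stats_table">
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player One</a></td>
    <td data-stat="team_id"><a href="#">OKC</a></td>
  </tr>
  <tr class="full_table">
    <td data-stat="player"><a href="#">Player Two</a></td>
    <td data-stat="team_id">TOT</td>
  </tr>
</table>
"""


def cell_text(row, stat, default="N/A"):
    """Return the stripped text of the cell whose data-stat equals `stat`."""
    # string() yields the concatenated text of the first matching node, or an
    # empty string when nothing matches, so no [0] indexing is needed.
    text = row.xpath('string(.//td[@data-stat=$stat])', stat=stat).strip()
    return text or default


tree = fromstring(SAMPLE_HTML)
rows = tree.xpath('//table[contains(@class,"stats_table")]//tr[contains(@class,"full_table")]')

# "mp" is absent from the sample markup, so it falls back to the default.
results = [(cell_text(r, "player"), cell_text(r, "team_id"), cell_text(r, "mp")) for r in rows]
for name, team, mp in results:
    print(name, team, mp)
```

With this approach the team text is returned whether or not it is wrapped in a link, and "N/A" appears only when the cell is missing or empty.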
