How to web scrape the starting lineup for the NBA?

I am new to web scraping and could use some help. I would like to scrape the NBA's starting lineups, the teams, and the players' positions using XPath. I have only started with the names because I ran into an issue.

Here is my code so far:

from urllib.request import urlopen
from lxml.html import fromstring 


url = "https://www.lineups.com/nba/lineups"

content = str(urlopen(url).read())
comment = content.replace("-->","").replace("<!--","")
tree = fromstring(comment)


for nba, bball_row in enumerate(tree.xpath('//tr[contains(@class,"t-content")]')):
    names = bball_row.xpath('.//span[@_ngcontent-c5="long-player-name"]/text()')[0]
    print(names)

The program appears to run without errors, but the names do not print. Any tips on how to parse with XPath more efficiently would be greatly appreciated. I tried experimenting with XPath Helper and XPath Finder. Maybe there are some tricks there that make the process easier. Thanks in advance for your time and effort!

The required content is located inside a script node that looks like this:

<script nonce="STATE_TRANSFER_TOKEN">window['TRANSFER_STATE'] = {...}</script>

You can try the following to extract the data as a plain Python dictionary:

import re
import json
import requests

source = requests.get("https://www.lineups.com/nba/lineups").text
dictionary = json.loads(re.search(r"window\['TRANSFER_STATE'\]\s=\s(\{.*\})<\/script>", source).group(1))
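If the page layout changes, `re.search` returns `None` and the one-liner above fails with an `AttributeError`. A slightly more defensive sketch of the same idea is shown below. The helper name `extract_transfer_state` is hypothetical, and it assumes the page still assigns a JSON object to `window['TRANSFER_STATE']`. Instead of matching up to `</script>`, it lets `json.JSONDecoder.raw_decode` read exactly one JSON value starting at the opening brace.

```python
import json
import re

# Matches only the assignment prefix; the JSON itself is parsed by the decoder.
ASSIGNMENT = re.compile(r"window\['TRANSFER_STATE'\]\s*=\s*")


def extract_transfer_state(html):
    """Return the embedded TRANSFER_STATE object as a dict, or None if absent.

    Hypothetical helper: assumes the assignment is followed directly by a
    JSON object, as in the script node shown above.
    """
    match = ASSIGNMENT.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example the closing </script> tag).
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state
```

It can be called as `extract_transfer_state(requests.get(url).text)`. Malformed JSON still raises `json.JSONDecodeError`, which seems preferable to silently returning partial data.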

Optionally, paste the output of dictionary into an online JSON formatter and click "Beautify" to view the data as readable JSON.
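The same inspection can also be done locally with the standard library, which avoids copying the data into a browser. This is a minimal sketch; the function name `pretty` is only illustrative.

```python
import json


def pretty(data):
    """Return data as indented JSON text with keys sorted for easier scanning."""
    # ensure_ascii=False keeps non-ASCII player names readable.
    return json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
```

Calling `print(pretty(dictionary))` on the dictionary extracted above shows the nesting, which makes it easier to find the keys you need.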

Then you can access the required values by key, e.g.:

for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['home_players']:
    print(player['name'])

Kyrie Irving
Jaylen Brown
Jayson Tatum
Gordon Hayward
Al Horford

for player in dictionary['https://api.lineups.com/nba/fetch/lineups/gateway']['data'][0]['away_players']:
    print(player['name'])

D.J. Augustin
Evan Fournier
Jonathan Isaac
Aaron Gordon
Nikola Vucevic

Update

I guess I just overcomplicated it :)

It should be as simple as this:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()
for player in source['data'][0]['away_players']:
    print(player['name'])

Update 2

To get the lineups for all teams, use the following:

import requests

source = requests.get("https://api.lineups.com/nba/fetch/lineups/gateway").json()

for team in source['data']:
    print("\n%s players\n" % team['home_route'].capitalize())
    for player in team['home_players']:
        print(player['name'])
    print("\n%s players\n" % team['away_route'].capitalize())
    for player in team['away_players']:
        print(player['name'])
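If you would rather keep the lineups in a data structure than print them, the loop above can be wrapped in a small function. This is a sketch under the assumption that the response still has the shape used in this answer (a `data` list whose items carry `home_route`, `away_route`, `home_players` and `away_players`). The name `collect_lineups` is hypothetical, and the route values in the usage note are made up for illustration.

```python
def collect_lineups(payload):
    """Map each team's route string to the list of its players' names.

    Assumes the key names used in the answer above; a response with a
    different shape will raise KeyError.
    """
    lineups = {}
    for game in payload.get("data", []):
        for side in ("home", "away"):
            team = game[side + "_route"]
            lineups[team] = [player["name"] for player in game[side + "_players"]]
    return lineups
```

For example, `collect_lineups(source)` with the `source` from the snippet above would return something like `{"boston-celtics": ["Kyrie Irving", ...], ...}`, where the exact route strings depend on what the API returns.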

