BeautifulSoup ESPN table: can't find the proper tag, pictures within

I am trying to scrape a table from the ESPN website. I can't seem to find the right tag and class name to access it.

url="https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
import requests
from bs4 import BeautifulSoup
headers={'User-Agent': 'Mozilla/5.0'}
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content, 'html.parser')
soup.find_all('table',class_ ="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")

The code only gives me an empty list :(

Why not just grab the div with the flex class and then get the player table from it?

import requests
from bs4 import BeautifulSoup

url="https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"


headers={'User-Agent': 'Mozilla/5.0'}
response=requests.get(url, headers=headers)
soup=BeautifulSoup(response.content, 'html.parser')

all_tables = soup.find('div', {'class':'flex'})
player_table = all_tables.find('table')  # first table inside it: the player names

The tag you selected:

soup.find_all('table',class_ ="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")

should be 'section', not 'table':

soup.find_all('section',class_ ="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization")
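
The difference can be reproduced offline. The sketch below uses a hypothetical, heavily simplified stand-in for the page markup (the real ESPN HTML is larger and may change), assuming only that the long class list sits on a section element that wraps the table:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the ESPN markup (an assumption).
html = """
<section class="ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization">
  <div class="flex">
    <table class="Table Table--align-right Table--fixed Table--fixed-left">
      <tr><td>1</td><td>LeBron James</td></tr>
    </table>
  </div>
</section>
"""
soup = BeautifulSoup(html, "html.parser")
cls = "ResponsiveTable ResponsiveTable--fixed-left mt4 Table2__title--remove-capitalization"

# The class list belongs to the <section>, so asking for a <table> finds nothing.
tables = soup.find_all("table", class_=cls)
sections = soup.find_all("section", class_=cls)
print(len(tables))    # 0
print(len(sections))  # 1

# A CSS selector needs only one of the classes and ignores their order.
section = soup.select_one("section.ResponsiveTable")
print(section.find("table") is not None)  # True
```

Passing the whole space-separated string to class_ only matches when the attribute value is exactly that string, so a selector such as select_one("section.ResponsiveTable") is likely to be less brittle if the site reorders or adds classes.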

To get all the data, you can use the following example:

import requests
from bs4 import BeautifulSoup

url="https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"
headers={'User-Agent': 'Mozilla/5.0'}
response=requests.get(url,headers=headers)
soup=BeautifulSoup(response.content, 'html.parser')

for tr1, tr2 in zip(soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left tr'),
                    soup.select('table.Table.Table--align-right.Table--fixed.Table--fixed-left ~ div tr')):
    data = tr1.select('td') + tr2.select('td')
    if not data:
        continue
    print('{:<25}'.format(data[1].get_text(strip=True, separator='-').split()[-1]), end=' ')
    for td in data[2:]:
        print('{:<6}'.format(td.get_text(strip=True)), end=' ')
    print()

Prints:

James-LAL                 SF     30     34.9   25.7   9.9    20.2   49.1   2.2    6.4    34.4   3.6    5.3    67.9   7.6    10.6   1.2    0.6    3.9    23     7      26.33  
Rubio-PHX                 PG     25     31.8   13.8   5.0    12.2   41.0   1.1    3.7    30.1   2.7    3.2    84.8   4.8    9.2    1.2    0.2    2.6    11     1      16.30  
Doncic-DAL                SF     26     32.2   29.1   9.4    19.8   47.7   3.0    9.2    32.2   7.3    9.1    79.7   9.6    8.8    1.2    0.1    4.3    17     8      31.43  
Simmons-PHI               PG     32     34.9   14.3   5.9    10.4   56.3   0.1    0.2    40.0   2.5    4.3    58.3   7.0    8.6    2.2    0.6    3.7    15     2      18.92  
Young-ATL                 PG     31     34.9   28.5   9.3    20.9   44.4   3.4    9.3    36.8   6.5    7.7    84.5   4.3    8.3    1.2    0.1    4.7    9      1      23.21  
Graham-CHA                PG     34     34.7   19.2   6.1    15.9   38.2   3.8    9.5    39.8   3.2    4.1    79.7   3.9    7.6    0.8    0.3    3.0    9      0      17.20  
Brogdon-IND               PG     26     31.4   18.3   6.6    14.5   45.2   1.4    4.3    33.3   3.8    4.0    93.3   4.5    7.6    0.9    0.2    2.7    7      0      20.31  
Harden-HOU                SG     31     37.6   38.1   11.1   24.5   45.2   5.1    13.8   37.2   10.9   12.4   87.5   5.8    7.5    1.9    0.7    4.7    9      0      31.72  
Lillard-POR               PG     30     36.7   26.9   8.4    19.0   44.3   3.4    9.4    35.8   6.6    7.4    89.6   4.2    7.5    1.0    0.4    2.9    6      0      24.42  
Westbrook-HOU             PG     28     35.3   24.1   8.9    20.9   42.6   1.2    5.1    23.8   5.1    6.5    79.1   8.1    7.1    1.5    0.4    4.4    12     6      18.68  
VanVleet-TOR              SG     26     36.3   18.1   5.9    14.5   40.5   2.4    6.6    36.8   3.9    4.5    87.2   3.9    7.0    2.0    0.2    2.6    5      0      16.82  
Jokic-DEN                 C      30     31.3   17.6   7.0    14.4   48.5   1.3    4.1    30.6   2.4    3.0    82.0   10.0   6.8    1.0    0.6    2.5    17     6      23.01  

...and so on.
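
The first column of this output shows only "James-LAL" rather than the full name. That follows from the split()[-1] call in the loop. The sketch below replays that step on a plain string, assuming the name cell's text comes out as "LeBron James-LAL" once get_text joins the player link and the team abbreviation with '-':

```python
# Assumed text of one name cell after get_text(strip=True, separator='-').
cell_text = "LeBron James-LAL"

# split() breaks on whitespace, so [-1] keeps only the last chunk.
label = cell_text.split()[-1]
print(label)  # James-LAL

# '{:<25}' left-aligns the label and pads it to 25 characters.
padded = '{:<25}'.format(label)
print(repr(padded))

# Splitting once from the right on the separator keeps the full name,
# even if the name itself contains a hyphen.
name, team = cell_text.rsplit('-', 1)
print(name, team)  # LeBron James LAL
```

If the full name is wanted in the printed table, the rsplit('-', 1) variant could replace split()[-1] in the loop.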

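You can also use the same API that the web page uses to populate its table with player information. If you make a GET request directly to that API (with the right headers and query string), you receive all the player information as JSON.

The API URL, the relevant headers, and the query-string GET parameters are all visible in Google Chrome's network log (most modern browsers have an equivalent). I found them by filtering the log to keep only XMLHttpRequest (XHR) resources and then clicking the "Show More" button at the bottom of the table.

I set the "limit" GET parameter to "3" because I was only interested in printing data for the first three players. Changing this string to "50", for example, queries the API for the top 50 players.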

def main():

    import requests

    headers = {
        "accept": "application/json, text/plain, */*",
        "origin": "https://www.espn.com",
        "user-agent": "Mozilla/5.0"
    }

    params = {
        "region": "us",
        "lang": "en",
        "contentorigin": "espn",
        "isqualified": "true",
        "page": "1",
        "limit": "3",
        "sort": "offensive.avgAssists:desc"
    }

    base_url = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"

    response = requests.get(base_url, headers=headers, params=params)
    response.raise_for_status()

    data = response.json()
    print(data["athletes"])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
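
To make the "limit" value easier to change, the query parameters can be built by a small function. This is only a sketch: build_params is a hypothetical helper, not part of the answer or of any ESPN library, and the assumption that "page" selects later pages of results is untested here. The URL is assembled with the standard library so the resulting query string can be inspected without sending a request:

```python
from urllib.parse import urlencode

BASE_URL = "https://site.web.api.espn.com/apis/common/v3/sports/basketball/nba/statistics/byathlete"


def build_params(limit, page=1):
    """Hypothetical helper: the same query parameters, with limit and page exposed."""
    return {
        "region": "us",
        "lang": "en",
        "contentorigin": "espn",
        "isqualified": "true",
        "page": str(page),
        "limit": str(limit),
        "sort": "offensive.avgAssists:desc",
    }


# Build the full URL locally to see what would be requested.
url = BASE_URL + "?" + urlencode(build_params(50))
print(url)
```

The same dictionary can be passed straight to the request, for example requests.get(BASE_URL, headers=headers, params=build_params(50)).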

If you have a table tag, let pandas do the work for you. It uses BeautifulSoup under the hood.

import pandas as pd

url = "https://www.espn.com/nba/stats/player/_/table/offensive/sort/avgAssists/dir/desc"

dfs = pd.read_html(url)

df = dfs[0].join(dfs[1])
df[['Name','Team']] = df['Name'].str.extract('^(.*?)([A-Z]+)$', expand=True)

Output:

print(df.head(5).to_string())
   RK          Name POS  GP   MIN   PTS  FGM   FGA   FG%  3PM  3PA   3P%  FTM  FTA   FT%  REB   AST  STL  BLK   TO  DD2  TD3    PER Team
0   1  LeBron James  SF  35  35.1  24.9  9.6  19.7  48.6  2.0  6.0  33.8  3.7  5.5  67.7  7.9  11.0  1.3  0.5  3.7   28    9  26.10  LAL
1   2   Ricky Rubio  PG  30  32.0  13.6  4.9  11.9  41.3  1.2  3.7  31.8  2.6  3.1  83.7  4.6   9.3  1.3  0.2  2.5   12    1  16.40  PHX
2   3   Luka Doncic  SF  32  32.8  29.7  9.6  20.2  47.5  3.1  9.4  33.1  7.3  9.1  80.5  9.7   8.9  1.2  0.2  4.2   22   11  31.74  DAL
3   4   Ben Simmons  PG  36  35.4  14.9  6.1  10.8  56.3  0.1  0.1  40.0  2.7  4.6  59.0  7.5   8.6  2.2  0.7  3.6   19    3  19.49  PHI
4   5    Trae Young  PG  34  35.1  28.9  9.3  20.8  44.8  3.5  9.4  37.5  6.7  7.9  85.0  4.3   8.4  1.2  0.1  4.8   11    1  23.47  ATL
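
The str.extract pattern can be tried on its own with the standard re module. The sketch assumes the raw Name cell holds the player name immediately followed by the team abbreviation (for example "LeBron JamesLAL"), which is what the split into Name and Team above suggests. The third string is a made-up input chosen to show a limitation:

```python
import re

# Same pattern as in str.extract above, written as a raw string.
pattern = re.compile(r'^(.*?)([A-Z]+)$')

# The first two strings mirror the assumed raw cells; the third is hypothetical.
samples = ["LeBron JamesLAL", "Luka DoncicDAL", "Glenn Robinson IIIGS"]

results = [pattern.match(raw).groups() for raw in samples]
for name, team in results:
    print(repr(name), repr(team))
# 'LeBron James' 'LAL'
# 'Luka Doncic' 'DAL'
# 'Glenn Robinson ' 'IIIGS'
```

The lazy first group stops as soon as the rest of the string is all capital letters, so a name that itself ends in capitals (such as a "III" suffix) would have that suffix absorbed into the team code. Rows like that would need separate handling.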
