Python, BeautifulSoup4 TypeError: find() takes no keyword arguments

Hi everyone, I want to parse HTML with beautifulsoup4 and wrote this code:

from selenium import webdriver
from django.core.management.base import BaseCommand
import datetime
from bs4 import BeautifulSoup as bs


url = "https://www.basketball-reference.com/leagues/NBA_2020.html"
main_url = "https://www.basketball-reference.com"
browser = webdriver.Chrome()
browser.set_window_size(1920, 1080)
browser.minimize_window()
browser.get(url)
soup = bs(browser.page_source, 'lxml')
team_urls = []
try:
    tables = soup.find('table', id='team-stats-per_game')
    for tr in tables.tbody:
        team_name = tr.find('a')
        try:
           if type(team_name) != int:
                if type(team_name) != 'NoneType':
                    team_url = team_name.get('href')
                    team_urls.append(team_url)
        except:
            pass
except Exception as e:
    print(e)
for team in team_urls:
    browser2 = webdriver.Chrome()
    browser2.minimize_window()
    browser2.get(main_url + team)
    team_soup = bs(browser2.page_source, 'lxml')
    team_op_stats = team_soup.find('table', id='team_and_opponent').find_all('tbody')
    for t1_stats in team_op_stats[0]:
        if t1_stats.find('th', attrs={'class', 'left'}):
            print(t1_stats)
        print("##" * 50)
    browser2.quit()
    break
browser.quit()

This code outputs:

File "C:\Users\ysfnm\PycharmProjects\denemee\denemee\apps\result\management\commands\nba.py", line 46, in handle
    if t1_stats.find('th', attrs={'class', 'left'}):
TypeError: find() takes no keyword arguments

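While researching this, I found the following answer given to other people who hit the same error:

You're not calling BeautifulSoup's .find(); you're calling it on a plain string object (the .text attribute of a BeautifulSoup object).

But: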

            for t1_stats in team_op_stats[0]:
                print(t1_stats)
                print("##" * 50)

This code outputs:

<tr>
<th class="left" data-stat="player" scope="row">Team/G</th>
<td class="center iz" data-stat="g"></td>
<td class="center" data-stat="mp_per_g">240.7</td>
<td class="center" data-stat="fg_per_g">43.8</td>
<td class="center" data-stat="fga_per_g">91.0</td>
<td class="center" data-stat="fg_pct">.481</td>
<td class="center" data-stat="fg3_per_g">14.0</td>
<td class="center" data-stat="fg3a_per_g">39.1</td>
<td class="center" data-stat="fg3_pct">.359</td>
<td class="center" data-stat="fg2_per_g">29.7</td>
<td class="center" data-stat="fg2a_per_g">51.9</td>
<td class="center" data-stat="fg2_pct">.573</td>
<td class="center" data-stat="ft_per_g">17.7</td>
<td class="center" data-stat="fta_per_g">24.3</td>
<td class="center" data-stat="ft_pct">.727</td>
<td class="center" data-stat="orb_per_g">10.0</td>
<td class="center" data-stat="drb_per_g">41.5</td>
<td class="center" data-stat="trb_per_g">51.5</td>
<td class="center" data-stat="ast_per_g">26.0</td>
<td class="center" data-stat="stl_per_g">7.7</td>
<td class="center" data-stat="blk_per_g">6.5</td>
<td class="center" data-stat="tov_per_g">14.7</td>
<td class="center" data-stat="pf_per_g">19.3</td>
<td class="center" data-stat="pts_per_g">119.2</td>
</tr>

Where is my mistake?

  • Change: if t1_stats.find('th', attrs={'class', 'left'}):
  • To: if t1_stats.find('th', attrs={'class': 'left'}):
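The difference between the two lines is a single character, so a small sketch may help. With a comma, the braces build a set of two strings; with a colon, they build a dict that maps the attribute name to the value, which is the form the attrs argument is documented to take. The variable names below are made up for illustration.

```python
# A set literal: two unordered members, no key/value relationship.
as_set = {'class', 'left'}

# A dict literal: maps the attribute name 'class' to the value 'left'.
as_dict = {'class': 'left'}

print(type(as_set).__name__)   # set
print(type(as_dict).__name__)  # dict
print(as_dict['class'])        # left
```

Note that the set literal is a separate slip from the TypeError in the traceback: that error is raised as soon as any keyword argument reaches a plain string's find().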

Then

  • Change: for t1_stats in team_op_stats[0]:
  • To: for t1_stats in team_op_stats:
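Why the loop matters can be sketched offline. Iterating over a single tbody tag walks its direct children, and those include the whitespace strings between the rows. On such a string, .find is Python's substring search, which accepts no keyword arguments. The markup below is a toy sample modelled on the row printed in the question (the second row and its value are made up), and html.parser is used only so the sketch needs no extra parser. Filtering for Tag children or calling find_all('tr') are alternatives to the change above, not the only way to fix it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Toy markup modelled on the question's output; the second row is made up.
html = """<table id="team_and_opponent"><tbody>
<tr><th class="left" data-stat="player" scope="row">Team/G</th><td class="center" data-stat="pts_per_g">119.2</td></tr>
<tr><th class="left" data-stat="player" scope="row">Opponent/G</th><td class="center" data-stat="pts_per_g">0.0</td></tr>
</tbody></table>"""

soup = BeautifulSoup(html, 'html.parser')
tbody = soup.find('table', id='team_and_opponent').find('tbody')

# Iterating a Tag yields its direct children, whitespace strings included.
child_kinds = [type(child).__name__ for child in tbody]
print(child_kinds)

# On a plain str, .find is a positional-only substring search.
newline = "\n"
index = newline.find('th')  # -1 because the substring is absent
try:
    newline.find('th', attrs={'class': 'left'})
    raised = False
except TypeError:
    raised = True  # same kind of error as in the traceback

# Alternative 1: keep only the Tag children.
rows_filtered = [child for child in tbody if isinstance(child, Tag)]

# Alternative 2: ask BeautifulSoup for the rows directly.
rows = tbody.find_all('tr')

labels = [row.find('th', attrs={'class': 'left'}).get_text() for row in rows]
print(labels)
```

The -1 returned by the string search would also explain why the first loop in the question needed a check against int: tr.find('a') on a whitespace string returns an index, not a tag.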

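However

Selenium is slow for this. The tables are inside HTML comments. You can fetch the page with requests, use BeautifulSoup to pull out the comments, then use Pandas to parse the tables inside them. That will run much faster.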
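As a rough offline illustration of "the tables are inside comments": in the static HTML the table markup sits inside a comment node, so a normal find() does not see it, but the comment's text can be parsed again as HTML. The markup here is a toy sample (the wrapper div id is an assumption, not taken from the site), and the real pages may be structured differently.

```python
from bs4 import BeautifulSoup, Comment

# Toy page: the table is commented out, as described above.
# The wrapper div and its id are hypothetical.
html = """
<div id="all_team_and_opponent">
<!--
<table id="team_and_opponent">
<tbody>
<tr><th class="left" data-stat="player" scope="row">Team/G</th><td class="center" data-stat="pts_per_g">119.2</td></tr>
</tbody>
</table>
-->
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# The commented-out table is not part of the parsed tag tree.
direct_hit = soup.find('table', id='team_and_opponent')
print(direct_hit)  # None

# Collect every comment node, then re-parse the ones that contain a table.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))
commented_tables = []
for comment in comments:
    if '<table' in comment:
        inner = BeautifulSoup(comment, 'html.parser')
        commented_tables.extend(inner.find_all('table'))

table_ids = [table.get('id') for table in commented_tables]
print(table_ids)  # ['team_and_opponent']
```

Looking a table up by its id after re-parsing is a little more robust than relying on its position in a list, since the order of comments on a page can change.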

I'm not entirely sure which table you want, but from what you showed above, it looks like the team stats:

Code:

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd


url = "https://www.basketball-reference.com/leagues/NBA_2020.html"
main_url = "https://www.basketball-reference.com"

response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')


team_urls = []
teams = soup.find_all('div', {'class':'division'})
for each in teams:
    links = each.find_all('a', href=True)
    for link in links:
        team_urls.append(main_url + link['href'])





for team in team_urls:
    response = requests.get(team)
    soup = BeautifulSoup(response.text, 'html.parser')

    seas = soup.find('h1').find_all('span')[0].text
    teamName = soup.find('h1').find_all('span')[1].text

    comments = soup.find_all(string=lambda text: isinstance(text, Comment))

    tables = []
    for each in comments:
        if 'table' in each:
            try:
                tables.append(pd.read_html(each)[0])
            except:
                continue
    print ('%s %s' %(seas, teamName))        
    print (tables[2].to_string())
    print("##" * 50)

Sample output:

2019-20 Toronto Raptors
   Unnamed: 0     G     MP     FG    FGA     FG%     3P    3PA    3P%      2P     2PA     2P%     FT    FTA    FT%    ORB   DRB   TRB    AST     STL    BLK    TOV     PF    PTS
0        Team  37.0   8955   1459   3270   0.446    489   1330  0.368     970    1940   0.500    680    850  0.800    381  1335  1716    915     305    198    549    775   4087
1      Team/G   NaN  242.0   39.4   88.4   0.446   13.2   35.9  0.368    26.2    52.4   0.500   18.4   23.0  0.800   10.3  36.1  46.4   24.7     8.2    5.4   14.8   20.9  110.5
2     Lg Rank   NaN     11     21     19  22.000      5      7  6.000      27      24  25.000     10     16  4.000     15     8     9     11       7      9     14     15     15
3   Year/Year   NaN  -0.2%  -6.5%  -0.8%  -0.027   6.8%   6.4%  0.001  -12.1%   -5.2%  -0.039   4.0%   4.5% -0.004   7.4%  1.3%  2.6%  -2.7%   -0.6%   0.4%   5.8%  -0.4%  -3.5%
4    Opponent  37.0   8955   1402   3313   0.423    470   1416  0.332     932    1897   0.491    615    811  0.758    432  1310  1742    921     249    208    610    744   3889
5  Opponent/G   NaN  242.0   37.9   89.5   0.423   12.7   38.3  0.332    25.2    51.3   0.491   16.6   21.9  0.758   11.7  35.4  47.1   24.9     6.7    5.6   16.5   20.1  105.1
6     Lg Rank   NaN     11      2     17   2.000     26     29  3.000       2       4   3.000     10     11  7.000     29    17    26     22       2     26      2     19      4
7   Year/Year   NaN  -0.2%  -5.9%  -0.1%  -0.026  18.1%  22.6% -0.013  -14.6%  -12.3%  -0.014  -2.6%  -1.7% -0.007  10.4%  3.5%  5.2%   1.4%  -11.3%  25.3%  10.4%  -2.0%  -3.0%
####################################################################################################
2019-20 Boston Celtics
   Unnamed: 0     G     MP     FG    FGA     FG%     3P    3PA     3P%     2P    2PA     2P%     FT    FTA    FT%    ORB    DRB    TRB     AST    STL    BLK   TOV    PF    PTS
0        Team  34.0   8185   1384   3036   0.456    403   1151   0.350    981   1885   0.520    598    749  0.798    372   1200   1572     784    277    210   474   720   3769
1      Team/G   NaN  240.7   40.7   89.3   0.456   11.9   33.9   0.350   28.9   55.4   0.520   17.6   22.0  0.798   10.9   35.3   46.2    23.1    8.1    6.2  13.9  21.2  110.9
2     Lg Rank   NaN     30     15     16  17.000     16     13  20.000     13     15  11.000     14     22  6.000      8     14     10      21      9      6     8    16     14
3   Year/Year   NaN  -0.2%  -3.3%  -1.4%  -0.009  -5.8%  -1.9%  -0.015  -2.2%  -1.0%  -0.006  12.5%  13.0% -0.004  11.6%   1.6%   3.8%  -12.3%  -5.4%  16.4%  8.7%  4.0%  -1.4%
4    Opponent  34.0   8185   1281   2932   0.437    397   1156   0.343    884   1776   0.498    561    761  0.737    346   1156   1502     775    233    187   539   711   3520
5  Opponent/G   NaN  240.7   37.7   86.2   0.437   11.7   34.0   0.343   26.0   52.2   0.498   16.5   22.4  0.737   10.2   34.0   44.2    22.8    6.9    5.5  15.9  20.9  103.5
6     Lg Rank   NaN     30      1      6   4.000     13     18   9.000      4      6   8.000      8     16  1.000     16      8      9       5      3     22     6    14      1
7   Year/Year   NaN  -0.2%  -4.6%  -2.1%  -0.012   1.4%   1.5%  -0.000  -7.1%  -4.3%  -0.015  -5.4%  -2.0% -0.027  -2.1%  -4.3%  -3.8%   -3.7%   1.1%  42.3%  4.7%  7.0%  -4.1%
####################################################################################################

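If you only want the per-game team stats, you can get them in 1 request from https://www.basketball-reference.com/leagues/NBA_2019.html#all_team-stats-base. Interestingly, the stats differ between that 1 link and each team's individual link. A likely reason: the rows printed below match the Opponent/G rows in the per-team output above (Boston Celtics at 103.5 PTS, Toronto Raptors at 105.1 PTS), so tables[1] appears to be the opponent per-game table rather than the team per-game table.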

import requests
from bs4 import BeautifulSoup
from bs4 import Comment
import pandas as pd



url = 'https://www.basketball-reference.com/leagues/NBA_2020.html#all_team-stats-base'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')

comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            tables.append(pd.read_html(each)[0])
        except:
            continue


print (tables[1])

Output:

print (tables[1].to_string())
      Rk                    Team   G     MP    FG   FGA    FG%    3P   3PA    3P%    2P   2PA    2P%    FT   FTA    FT%   ORB   DRB   TRB   AST  STL  BLK   TOV    PF    PTS
0    1.0          Boston Celtics  34  240.7  37.7  86.2  0.437  11.7  34.0  0.343  26.0  52.2  0.498  16.5  22.4  0.737  10.2  34.0  44.2  22.8  6.9  5.5  15.9  20.9  103.5
1    2.0          Denver Nuggets  36  242.1  39.0  86.1  0.453  10.7  32.6  0.327  28.3  53.5  0.529  16.6  21.9  0.757  10.1  33.4  43.5  24.0  7.1  4.7  14.4  20.6  105.2
2    3.0               Utah Jazz  36  240.0  39.2  89.3  0.439  10.8  31.8  0.341  28.4  57.6  0.493  16.7  21.7  0.769   9.8  34.3  44.0  20.5  7.9  5.0  12.6  21.2  105.9
3    4.0           Orlando Magic  37  240.0  38.8  86.4  0.449  11.9  33.6  0.354  26.9  52.8  0.509  14.3  19.0  0.754   9.8  36.8  46.5  23.1  6.9  4.3  15.1  19.2  103.8
4    5.0              Miami Heat  36  244.2  38.3  86.9  0.441  12.2  37.3  0.327  26.1  49.7  0.526  18.4  24.1  0.764   9.4  32.3  41.7  24.0  8.1  4.2  14.3  22.1  107.3
5    6.0      Los Angeles Lakers  37  240.7  38.1  87.1  0.437  11.0  32.6  0.337  27.1  54.5  0.498  17.7  22.2  0.797   9.7  32.3  42.0  22.9  8.2  4.1  16.0  21.3  104.9
6    7.0         Toronto Raptors  37  242.0  37.9  89.5  0.423  12.7  38.3  0.332  25.2  51.3  0.491  16.6  21.9  0.758  11.7  35.4  47.1  24.9  6.7  5.6  16.5  20.1  105.1
7    8.0          Indiana Pacers  37  242.0  39.2  88.6  0.442  10.8  32.3  0.336  28.3  56.3  0.503  17.0  21.8  0.780  10.4  34.6  45.1  23.4  6.5  4.8  14.3  18.9  106.2
8    9.0        Dallas Mavericks  36  242.1  40.6  90.9  0.447  11.4  33.8  0.337  29.2  57.1  0.511  16.7  21.6  0.773  11.1  34.5  45.6  23.3  7.2  3.9  12.7  21.4  109.3
9   10.0           Chicago Bulls  37  241.4  38.2  83.7  0.457  10.9  32.5  0.336  27.3  51.2  0.534  19.8  26.1  0.761  10.5  36.6  47.2  24.1  8.3  6.5  18.3  19.9  107.2
10  11.0   Oklahoma City Thunder  37  242.7  40.8  89.9  0.454  10.8  31.3  0.343  30.1  58.6  0.513  15.0  18.7  0.802  10.6  34.4  45.1  22.7  6.9  4.2  14.3  23.0  107.4
11  12.0         Houston Rockets  35  241.4  42.3  92.2  0.459  12.5  35.7  0.351  29.8  56.6  0.527  16.7  22.3  0.751  10.8  35.1  45.9  26.1  7.9  4.7  15.4  21.6  113.9
12  13.0       San Antonio Spurs  35  244.3  42.6  92.0  0.463  12.5  34.5  0.361  30.2  57.5  0.525  17.1  22.3  0.766   9.7  36.1  45.8  25.1  7.2  4.7  12.6  19.8  114.8
13  14.0           Brooklyn Nets  36  243.5  40.7  93.9  0.433  12.2  34.4  0.354  28.5  59.4  0.479  18.1  23.4  0.771  11.5  35.9  47.4  21.2  7.8  5.6  13.5  21.2  111.7
14  15.0      Philadelphia 76ers  38  241.3  39.1  85.6  0.457   9.8  27.6  0.355  29.3  57.9  0.505  18.0  24.3  0.738   8.2  32.5  40.7  21.9  7.4  4.0  14.2  20.9  105.9
15  16.0         Milwaukee Bucks  38  240.7  38.6  93.4  0.414  14.2  38.4  0.370  24.4  55.0  0.444  15.8  20.6  0.769   9.8  36.3  46.0  23.9  7.2  4.6  14.5  21.4  107.3
16  17.0  Minnesota Timberwolves  36  244.9  41.8  91.6  0.457  11.2  31.4  0.356  30.6  60.1  0.509  19.6  24.8  0.788  11.1  37.4  48.5  23.3  7.4  5.5  15.7  22.3  114.4
17  18.0        Sacramento Kings  38  242.6  39.5  84.9  0.465  11.7  33.4  0.349  27.8  51.5  0.540  17.9  22.5  0.796   9.3  33.8  43.1  24.3  8.1  4.3  15.3  19.0  108.5
18  19.0         New York Knicks  37  240.7  39.5  85.9  0.460  13.6  35.1  0.386  25.9  50.8  0.511  19.3  26.2  0.739  10.2  36.2  46.3  23.9  7.0  4.9  14.2  19.9  111.9
19  20.0    Los Angeles Clippers  38  240.7  39.4  89.7  0.439  11.9  34.4  0.346  27.5  55.3  0.497  19.0  24.7  0.768  10.9  34.4  45.3  22.8  8.4  5.0  15.3  23.5  109.8
20  21.0     Cleveland Cavaliers  37  240.7  43.4  89.7  0.484  12.7  33.7  0.377  30.7  56.0  0.549  14.1  18.3  0.770   9.9  33.5  43.5  25.9  8.8  6.6  12.9  19.6  113.6
21  22.0         Detroit Pistons  38  240.0  41.7  88.1  0.474  11.4  30.4  0.377  30.3  57.7  0.525  16.2  20.9  0.774  10.2  33.1  43.3  25.1  8.2  5.8  14.1  20.1  111.1
22  23.0            Phoenix Suns  37  242.0  41.8  87.5  0.477  11.9  32.0  0.373  29.8  55.4  0.538  19.7  25.6  0.771   9.0  36.0  45.1  23.8  7.6  5.6  16.2  23.4  115.2
23  24.0   Golden State Warriors  38  242.0  41.6  88.4  0.471  13.5  34.8  0.387  28.1  53.6  0.525  16.2  21.0  0.774  10.4  35.8  46.3  25.2  8.1  5.4  16.3  20.4  112.9
24  25.0  Portland Trail Blazers  38  240.7  40.8  91.9  0.444  12.4  34.3  0.361  28.4  57.5  0.494  19.6  25.6  0.767  11.8  36.2  47.9  23.6  7.2  5.3  13.0  19.9  113.6
25  26.0      Washington Wizards  36  240.7  43.4  89.1  0.487  12.3  33.3  0.369  31.1  55.8  0.557  21.1  26.9  0.786  10.6  35.9  46.5  25.6  7.0  5.5  15.6  21.3  120.1
26  27.0    New Orleans Pelicans  37  242.0  41.9  89.7  0.468  12.6  34.1  0.369  29.4  55.6  0.528  20.4  25.6  0.797   9.8  36.5  46.3  24.5  7.7  4.4  15.1  19.9  116.9
27  28.0       Charlotte Hornets  39  241.9  42.2  88.3  0.479  12.4  34.9  0.355  29.8  53.4  0.559  14.2  18.3  0.773  10.8  35.3  46.1  27.0  8.3  4.8  14.9  21.2  111.0
28  29.0           Atlanta Hawks  37  242.0  42.8  90.0  0.476  11.5  32.1  0.359  31.3  57.8  0.541  20.1  26.0  0.774  11.2  35.5  46.8  24.6  8.9  6.6  15.8  20.5  117.3
29  30.0       Memphis Grizzlies  38  240.7  41.8  90.1  0.464  12.3  33.8  0.365  29.5  56.3  0.524  20.3  25.7  0.790   9.9  35.1  44.9  25.1  7.9  5.4  14.6  19.8  116.3
30   NaN          League Average  37  241.7  40.4  88.9  0.455  11.9  33.6  0.355  28.5  55.3  0.516  17.6  22.9  0.771  10.3  35.0  45.3  24.0  7.6  5.1  14.8  20.8  110.4
