Scraping a specific table from a website with BeautifulSoup

I want to get a specific table named Form table (last 8) from this website, https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022, but I get AttributeError: 'NoneType' object has no attribute 'text'

Code

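  import requests
  from bs4 import BeautifulSoup

  link = 'https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022'
  headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
  s = requests.Session()
  s.headers.update(headers)

  response = s.get(link)  # send the request through the session so its headers are used
  soup = BeautifulSoup(response.text, 'html.parser')

  standings_forms = soup.find_all('table', border='0', cellspacing='0', cellpadding='0', width='100%')
  for t in standings_forms:
    # t.find('b') returns None for a table without a <b> tag, so .text raises the AttributeError
    if t.find('b').text == 'Form table (last 8)':
      print(t)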
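
The error comes from the loop: find('b') returns None for any matched table that has no <b> tag, and reading .text on None raises the AttributeError. Below is a minimal sketch of a guard against that. It runs on a small made-up HTML snippet rather than the real page, and the helper name find_table_by_title is hypothetical.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page: one table without a <b> tag, one with it.
SAMPLE_HTML = """
<table width="100%"><tr><td>no bold heading here</td></tr></table>
<table width="100%"><tr><td><b>Form table (last 8)</b></td></tr></table>
"""

def find_table_by_title(soup, title):
    """Return the first <table> whose first <b> tag has exactly this text, else None."""
    for table in soup.find_all("table", width="100%"):
        bold = table.find("b")  # None when the table has no <b> tag
        if bold is not None and bold.get_text(strip=True) == title:
            return table
    return None

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
form_table = find_table_by_title(soup, "Form table (last 8)")
print(form_table is not None)
```

Checking for None before reading the text lets the loop skip tables without a bold heading instead of crashing on the first one.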

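Try the following script to get the required information from that specific table. Before executing it, make sure to upgrade your bs4 version by running pip install bs4 --upgrade, because the script uses pseudo CSS selectors, which bs4 supports only from version 4.7.0 onwards.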

import requests
from bs4 import BeautifulSoup

link = 'https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text,"html.parser")
    for item in soup.select("table:has(> tr > td > b:contains('Form table')) table > tr")[1:]:
        name = item.select("td")[0].get_text(strip=True)
        gp = item.select("td")[1].get_text(strip=True)
        pts = item.select("td")[2].get_text(strip=True)
        print((name,gp,pts))
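
To see what the selector does: table:has(> tr > td > b:contains('Form table')) picks the outer table whose direct row holds the bold heading, table > tr then takes the rows of the table nested inside it, and [1:] skips the header row. The sketch below runs the same idea on made-up HTML whose nesting is only assumed to resemble the page. It uses :-soup-contains, the newer spelling of :contains in Soup Sieve 2.1 and later, where :contains is deprecated.

```python
from bs4 import BeautifulSoup

# Made-up HTML; the nesting is an assumption about the page, not a copy of it.
SAMPLE_HTML = """
<table>
  <tr><td><b>Form table (last 8)</b></td></tr>
  <tr><td>
    <table>
      <tr><td>Team</td><td>GP</td><td>Pts</td></tr>
      <tr><td>Atalanta</td><td>8</td><td>20</td></tr>
      <tr><td>Inter Milan</td><td>8</td><td>17</td></tr>
    </table>
  </td></tr>
</table>
"""

# Outer table identified by its bold heading, then the rows of the nested table.
SELECTOR = "table:has(> tr > td > b:-soup-contains('Form table')) table > tr"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for item in soup.select(SELECTOR)[1:]:  # [1:] drops the header row
    cells = [td.get_text(strip=True) for td in item.select("td")]
    rows.append(tuple(cells[:3]))

print(rows)  # [('Atalanta', '8', '20'), ('Inter Milan', '8', '17')]
```

The direct-child steps (table > tr) rely on html.parser, which does not insert a <tbody> element; with a parser that does, the selector would need adjusting.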

The script above produces the following output:

('Atalanta', '8', '20')
('Inter Milan', '8', '17')
('AC Milan', '8', '16')
('Napoli', '8', '15')
('Juventus', '8', '13')
('Bologna', '8', '13')
('Fiorentina', '8', '12')
('Sassuolo', '8', '12')
('Hellas Verona', '8', '12')
('AS Roma', '8', '10')
('Empoli', '8', '10')
('Lazio', '8', '10')
('Venezia', '8', '10')
('Torino', '8', '9')
('Sampdoria', '8', '9')
('Udinese', '8', '8')
('Spezia', '8', '7')
('Cagliari', '8', '6')
('Genoa', '8', '5')
('Salernitana', '8', '4')

