Scraping a specific table from a website with BeautifulSoup

I want to get a specific table named "Form table (last 8)" from https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022 but my code raises AttributeError: 'NoneType' object has no attribute 'text'.

Code:

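  import requests
  from bs4 import BeautifulSoup

  link = 'https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022'
  headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36'}
  s = requests.Session()
  s.headers.update(headers)

  # The session already carries the headers set above
  response = s.get(link)
  soup = BeautifulSoup(response.text, 'html.parser')

  standings_forms = soup.find_all('table', border='0', cellspacing='0', cellpadding='0', width='100%')
  for t in standings_forms:
    # t.find('b') returns None for a table without a <b> tag, so .text raises the AttributeError here
    if t.find('b').text == 'Form table (last 8)':
      print(t)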

Try the following script to get the required information from that particular table. Before running it, upgrade BeautifulSoup with pip install --upgrade beautifulsoup4, because the script uses pseudo CSS selectors (:has and :contains), which bs4 supports only from version 4.7.0 onward.

import requests
from bs4 import BeautifulSoup

link = 'https://www.soccerstats.com/pmatch.asp?league=italy&stats=145-7-5-2022'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text, "html.parser")
    # Pick the outer table whose heading cell holds a <b> containing 'Form table',
    # then take the rows of the table nested inside it, skipping the header row.
    # Newer soupsieve releases prefer ':-soup-contains' over the deprecated ':contains'.
    for item in soup.select("table:has(> tr > td > b:contains('Form table')) table > tr")[1:]:
        cells = item.select("td")
        name = cells[0].get_text(strip=True)
        gp = cells[1].get_text(strip=True)
        pts = cells[2].get_text(strip=True)
        print((name, gp, pts))

The above script generates the following output:

('Atalanta', '8', '20')
('Inter Milan', '8', '17')
('AC Milan', '8', '16')
('Napoli', '8', '15')
('Juventus', '8', '13')
('Bologna', '8', '13')
('Fiorentina', '8', '12')
('Sassuolo', '8', '12')
('Hellas Verona', '8', '12')
('AS Roma', '8', '10')
('Empoli', '8', '10')
('Lazio', '8', '10')
('Venezia', '8', '10')
('Torino', '8', '9')
('Sampdoria', '8', '9')
('Udinese', '8', '8')
('Spezia', '8', '7')
('Cagliari', '8', '6')
('Genoa', '8', '5')
('Salernitana', '8', '4')
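
As for the original error: find() returns None when a tag has no matching descendant, so calling .text on the result fails for every table that contains no <b> tag. Below is a minimal sketch of a guarded version of the original loop. It runs offline against a small hand-written HTML snippet (SAMPLE_HTML is a hypothetical stand-in, not the real page markup), and the helper names find_form_table and extract_rows are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page layout may differ.
SAMPLE_HTML = """
<table width="100%"><tr><td>No bold heading here</td></tr></table>
<table width="100%">
  <tr><td><b>Form table (last 8)</b></td></tr>
  <tr><td>
    <table>
      <tr><td>Team</td><td>GP</td><td>Pts</td></tr>
      <tr><td>Atalanta</td><td>8</td><td>20</td></tr>
      <tr><td>Inter Milan</td><td>8</td><td>17</td></tr>
    </table>
  </td></tr>
</table>
"""


def find_form_table(soup, title="Form table (last 8)"):
    """Return the first width=100% table whose first <b> matches title, else None."""
    for table in soup.find_all("table", width="100%"):
        heading = table.find("b")
        # Guard against tables without a <b> tag before reading the text.
        if heading is not None and heading.get_text(strip=True) == title:
            return table
    return None


def extract_rows(table):
    """Collect (name, gp, pts) tuples from the table nested inside the given one."""
    inner = table.find("table")
    if inner is None:
        return []
    rows = []
    for tr in inner.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 3:
            rows.append(tuple(cells[:3]))
    return rows


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    table = find_form_table(soup)
    if table is not None:
        for row in extract_rows(table):
            print(row)
```

With the sample markup, the first table has no <b> tag and is skipped instead of raising, and the two data rows of the nested table are printed. Whether the same attribute filter matches the live page is an assumption you would need to check against its actual HTML.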
