
Can't scrape table with BeautifulSoup

I am trying to scrape the table from this URL: https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show

Here is my code:

import requests
from bs4 import BeautifulSoup

url = "https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show"
r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

table = soup.find("table")
for row in table.findAll("tr"):
    print([i.text for i in row.findAll("td")])

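However, my variable table is None, even though the site's HTML clearly contains a table tag. How can I get it?

The page is loaded dynamically and relies on JavaScript. requests does not execute JavaScript, so the HTML it returns does not contain the table yet. You can use a browser automation library such as Selenium instead, which drives a real browser and sees the rendered page: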

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()

url = "https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show"

driver.get(url)

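# Wait up to 10 seconds for the JavaScript-rendered table to appear in the DOM
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'table')))

table = driver.find_element(By.TAG_NAME, 'table')

# Inner HTML of the table, useful for debugging
table_html = table.get_attribute('innerHTML')

# print('table html:', table_html)

# Print the text of every data cell, row by row
for tr_web_element in table.find_elements(By.TAG_NAME, 'tr'):
    for td_web_element in tr_web_element.find_elements(By.TAG_NAME, 'td'):
        print(td_web_element.text)

# quit() ends the whole browser session; close() only closes the current window
driver.quit()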

Alternatively, see this answer for combining Selenium with BeautifulSoup.
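The linked answer is not reproduced here, so the following is only a minimal sketch of one common way to combine the two: let Selenium render the page, then pass driver.page_source to BeautifulSoup. The helper names table_rows and fetch_rendered_html are hypothetical, and the built-in html.parser is assumed in place of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup


def table_rows(html):
    """Return the text of the <td> cells of the first table, one list per row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        # No table in this HTML, e.g. the page had not been rendered yet
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows hold only <th> cells and are skipped
            rows.append(cells)
    return rows


def fetch_rendered_html(url, timeout=10):
    """Load url in Firefox and return the HTML after a table has appeared."""
    # Imported here so table_rows can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        # Always shut the browser down, even if the wait times out
        driver.quit()
```

With the url variable from the code above, table_rows(fetch_rendered_html(url)) should give one list of cell strings per table row. Keeping the parsing step separate from the browser step also makes it possible to check the parsing against a saved HTML file without launching Firefox.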

