
Can't scrape table with BeautifulSoup

I'm trying to scrape the table from this URL: https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show

This is my code:

import requests
from bs4 import BeautifulSoup

url = "https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show"
r = requests.get(url)

soup = BeautifulSoup(r.content, "lxml")

table = soup.find("table")
for row in table.findAll("tr"):
    print([i.text for i in row.findAll("td")])

However, my variable table is None, even though there is clearly a table tag in the HTML of the website. How do I get the table?

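The page is rendered dynamically with JavaScript. requests only downloads the initial HTML and does not execute scripts, so the table you see in the browser's inspector is most likely not in the response that BeautifulSoup parses. One option is a browser automation tool such as Selenium, which drives a real browser and gives you access to the rendered page: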
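To see why find("table") returns None, it can help to compare markup before and after scripts have run. The sketch below is only an illustration: both HTML strings are made-up stand-ins, not the real markup of the site, and has_table is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, NOT the real page: roughly what a server might send
# before any JavaScript runs (an empty container plus a script tag).
STATIC_HTML = """
<html><body>
  <div id="leaderboard"></div>
  <script src="app.js"></script>
</body></html>
"""

# Hypothetical markup for the same page after a browser has run the script.
RENDERED_HTML = """
<html><body>
  <div id="leaderboard">
    <table>
      <tr><th>Player</th><th>OAA</th></tr>
      <tr><td>Player A</td><td>5</td></tr>
    </table>
  </div>
</body></html>
"""


def has_table(html: str) -> bool:
    """Return True if the given HTML contains at least one <table> element."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("table") is not None


print(has_table(STATIC_HTML))    # False
print(has_table(RENDERED_HTML))  # True
```

Running the same check on r.text from the question's code shows whether the table is present in what requests actually received. If it is not, the parser is not at fault and the page has to be rendered first, which is what the Selenium code below does.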

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()

url = "https://baseballsavant.mlb.com/leaderboard/outs_above_average?type=Fielder&startYear=2022&endYear=2022&split=no&team=&range=year&min=10&pos=of&roles=&viz=show"

driver.get(url)

wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.TAG_NAME, 'table')))

table = driver.find_element(By.TAG_NAME, 'table')

table_html = table.get_attribute('innerHTML')

# print('table html:', table_html)

for tr_web_element in table.find_elements(By.TAG_NAME, 'tr'):
    for td_web_element in tr_web_element.find_elements(By.TAG_NAME, 'td'):
        print(td_web_element.text)

# quit() ends the whole browser session; close() only closes the current window
driver.quit()

Or see this answer to combine Selenium with BeautifulSoup.
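A minimal sketch of that combination, assuming Firefox and its driver are available: Selenium renders the page, and BeautifulSoup parses the resulting HTML. The helper names fetch_rendered_html and table_rows are hypothetical, and the 10-second timeout is simply carried over from the code above.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url: str, timeout: int = 10) -> str:
    """Load url in Firefox, wait for a <table>, and return the rendered HTML."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.wait import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even if the wait times out


def table_rows(html: str) -> list:
    """Return the text of every <td> in the first <table>, one list per row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # rows that contain only <th> cells yield no <td> text
            rows.append(cells)
    return rows
```

With the url variable from the question, usage would look like rows = table_rows(fetch_rendered_html(url)). Keeping the parsing in its own function means it can be tried on any HTML string without starting a browser.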
