
Using find_all in BeautifulSoup to grab Class

I am trying to grab information from the New York Stock Exchange, specifically the class "flex_tr" on https://www.nyse.com/quote/XNGS:AAPL, whose HTML path is:

html->body->div->div.sticky-header__main->div.landing-section->div.idc-container->div->div->div.row->div.col-lg-12.col-md-12->div.d-widget.d-vbox.d-flex1.DataTable-nyse->div.d-container.d-flex1.d-vbox.d-nowrap.d-justify-start.data-table-container.d-noscroll->div.d-flex1->div.d-vbox->div.d-flex-1.d-scroll-y->div.contentContainer->div.flex_tr

This should grab a large number of rows, but I currently cannot get the contents of any of them. I have tried soup.find_all("div", class_="flex_tr") and soup.find_all("div", {"class": "flex_tr"}), but I can't seem to get the information.

from selenium import webdriver
from bs4 import BeautifulSoup

# Use a raw string so the backslashes in the Windows path are not treated as escape sequences
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe")
driver.get('https://www.nyse.com/quote/XNGS:AAPL')

content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')

flex_tr = soup.find_all(class_="flex_tr")
print(flex_tr)

driver.close()

It looks like you are closing the driver before the elements have loaded onto the page (the return value is an empty list).

Selenium includes modules that let you wait for elements to load. This question discusses it in more detail: Wait for page load in Selenium WebDriver for Python

As for your problem, I was able to solve it with the following (which mirrors the top answer from the link above):

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException


driver = webdriver.Chrome('./chromedriver')
driver.get('https://www.nyse.com/quote/XNGS:AAPL')

delay = 5  # number of seconds for the WebDriverWait timeout

try:
    # Wait until at least one element with class "flex_tr" is present in the DOM
    element = WebDriverWait(driver, delay).until(
        expected_conditions.presence_of_element_located((By.CLASS_NAME, "flex_tr"))
    )
    content = driver.page_source
    soup = BeautifulSoup(content, 'html.parser')
    flex_tr = soup.find_all(class_="flex_tr")
    print(flex_tr)
except TimeoutException:
    print("Timeout")

driver.close()
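
Once find_all returns the rows, you will usually want the cell text rather than the raw tags. Below is a minimal, self-contained sketch of pulling text out of each flex_tr row with BeautifulSoup; the sample_html markup (rows made of plain div cells) is only an assumption for illustration and is not taken from the NYSE page, so swap in driver.page_source from the answer above and adjust to the real structure.

from bs4 import BeautifulSoup

# Assumed placeholder markup for illustration only; the real flex_tr rows on
# nyse.com may nest their cells differently, so inspect the page and adapt.
sample_html = """
<div class="contentContainer">
  <div class="flex_tr"><div>cell 1a</div><div>cell 1b</div></div>
  <div class="flex_tr"><div>cell 2a</div><div>cell 2b</div></div>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
for row in soup.find_all(class_="flex_tr"):
    # get_text with a separator flattens each row's cells into one readable string
    print(row.get_text(separator=" | ", strip=True))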
