簡體   English   中英

如何通過Python Selenium BeautifulSoup從網站上提取安全價格作為文本

[英]How to extract the price for the security as text from the website through Python Selenium BeautifulSoup

我試圖簡單地獲得https://investor.vanguard.com/529-plan/profile/4514上顯示的安全價格。 我運行這段代碼:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

當我在硒打開的Firefox中“檢查元素”價格時,我清楚地看到:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span >

但那些數據不在我的湯里。 如果我打印我的湯,html與網站上顯示的非常不同。 我試過這個,但它完全失敗了:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

我完全難過了。 如果有人能指出我正確的方向,我會非常感激。 我覺得我完全錯過了一些東西,可能有幾件事......

使用data_*屬性和值來選擇范圍的方式沒有任何問題。 實際上,它是文檔中提到的正確方法。有4個span標記符合所有屬性。 find_all將返回所有這些標記。 第二個對應於價格。

您錯過的是跨度需要一些時間來加載並在此之前返回頁面源。 您可以顯式等待該范圍,然后獲取頁面源。 這里我使用Xpath來等待元素。 您可以通過轉到inspect tool -> right click element -> copy -> copy xpath獲取xpath inspect tool -> right click element -> copy -> copy xpath

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH ,'/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)
print(myspan[1].text)

產量

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91

單獨的可以足以提取所需的文本。 您需要為visibility_of_element_located引入WebDriverWait ,您可以使用以下解決方案:

  • 代碼塊:

     from selenium import webdriver from selenium.webdriver.common.by import By from selenium.webdriver.support.ui import WebDriverWait from selenium.webdriver.support import expected_conditions as EC driver = webdriver.Firefox(executable_path=r'C:\\Utility\\BrowserDrivers\\geckodriver.exe') driver.get('https://investor.vanguard.com/529-plan/profile/4514') print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML")) 
  • 控制台輸出:

     $42.91 

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM