
How to extract the price for the security as text from the website through Python Selenium BeautifulSoup

I am trying to simply get the price for the security shown at https://investor.vanguard.com/529-plan/profile/4514. I run this code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

When I "inspect element" the price in the Selenium-opened Firefox, I clearly see this:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span>

But that data is NOT in my soup. If I print my soup, the HTML is quite different from what is shown on the website. I tried this, but it totally fails:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

I am totally stumped. If anyone could point me in the right direction, I would really appreciate it. I sense I am totally missing something, possibly several things...

There is nothing wrong with the way you are using the data-* attributes and values to select the span. In fact, it is the correct method as mentioned in the documentation. Four span tags match all the attributes, and find_all will return all of them. The second one corresponds to the price.
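To see the matching behavior separately from the page-loading issue, here is a minimal sketch that runs find_all against a static snippet. The HTML string is an assumption modeled on the spans printed in the output further down (with the data-ng-class attribute left out for brevity), not the live page, and it filters on two of the attributes only.

```python
from bs4 import BeautifulSoup

# Static HTML modeled on the four spans in the printed output (an assumption,
# not the live page), so no browser is needed to try the selector.
HTML = """
<table><tbody>
<tr>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" class="ng-scope ng-binding arrange">Unit price as of 02/15/2019</span></td>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" class="ng-scope ng-binding arrange">$42.91</span></td>
</tr>
<tr>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" class="ng-scope ng-binding arrange">Change</span></td>
  <td><span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" class="ng-scope ng-binding arrange"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Attribute names containing hyphens must go through the attrs dict.
spans = soup.find_all("span", attrs={"data-ng-if": "!data.isLayer",
                                     "data-ng-bind-html": "data.value"})

# Collect the visible text of each match; nested strings are joined by a space.
texts = [s.get_text(" ", strip=True) for s in spans]
price = texts[1]  # the second match holds the price

print(len(spans))
print(price)
```

Picking the price by position works only as long as the page keeps this order, so checking the preceding label ("Unit price as of ...") may be a safer guard in practice.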

What you missed is that the span takes some time to load, and the page source is returned before that. You can explicitly wait for that span and then get the page source. Here I am using XPath to wait for the element. You can get the XPath from the inspect tool: right-click the element -> Copy -> Copy XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH ,'/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)
print(myspan[1].text)

Output

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91
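
The explicit wait used above can be thought of as a polling loop: check a condition, and if it does not hold yet, sleep briefly and try again until a timeout expires. The sketch below illustrates that idea with the standard library only. It is not Selenium's implementation; wait_until and FakePage are hypothetical names introduced for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakePage:
    """Hypothetical stand-in for a page whose price appears on the third check."""

    def __init__(self):
        self.checks = 0

    def price(self):
        self.checks += 1
        return "$42.91" if self.checks >= 3 else None


page = FakePage()
price = wait_until(page.price, timeout=2.0, poll=0.01)
print(price)
```

Reading driver.page_source immediately after driver.get corresponds to checking the condition once with no retries, which is why the span was missing from the soup.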

Selenium alone can be sufficient to extract the desired text. You need to induce WebDriverWait for visibility_of_element_located, and you can use the following solution:

  • Code Block:

     from selenium import webdriver
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.support import expected_conditions as EC

     driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
     driver.get('https://investor.vanguard.com/529-plan/profile/4514')
     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML"))

  • Console Output:

     $42.91 

