How to extract a security's price as text from a website with Python, Selenium, and BeautifulSoup

Question

I am trying to simply get the price for the security shown at https://investor.vanguard.com/529-plan/profile/4514. I run this code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox(executable_path=r'C:\Program_Files_EllieTheGoodDog\Geckodriver\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

When I "inspect element" the price in the Selenium-opened Firefox, I clearly see this:

<span data-ng-if="!data.isLayer" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" class="ng-scope ng-binding arrange">$42.91</span>

But that data is NOT in my soup. If I print my soup, the HTML is really quite different from that shown on the website. I tried this, but it totally fails:

myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})

I am totally stumped. If anyone could point me in the right direction, I would really appreciate it. I sense I am totally missing something, possibly several things...

Answer 1

There is nothing wrong with the way you are using the data-* attributes and values to select the span. In fact, it is the correct method as described in the documentation. Four span tags match all of these attributes, and find_all returns all of them. The second one corresponds to the price.

What you missed is that the span takes some time to load, and the page source is returned before that happens. You can explicitly wait for that span and then get the page source. Here I am using an XPath to wait for the element. You can get the XPath from the browser's inspect tool: right-click the element -> Copy -> Copy XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://investor.vanguard.com/529-plan/profile/4514')

# Wait up to 10 seconds for the price span to be present in the DOM.
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH ,'/html/body/div[1]/div[3]/div[3]/div[1]/div/div[1]/div/div/div/div[2]/div/div[3]/div[1]/div/div/table/tbody/tr[1]/td[2]/div/span[1]')))

# Only now read the rendered page source and parse it.
html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')
myspan = soup.find_all('span', attrs={'data-ng-if': '!data.isLayer', 'data-ng-bind-html': 'data.value', 'data-ng-class': '{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}', 'class': 'ng-scope ng-binding arrange'})
print(myspan)          # all four matching spans
print(myspan[1].text)  # the second span holds the price

Output

[<span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Unit price as of 02/15/2019</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">$42.91</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer">Change</span>, <span class="ng-scope ng-binding arrange" data-ng-bind-html="data.value" data-ng-class="{sceIsLayer : isETF, arrange : isMutualFund, arrangeSec : isETF}" data-ng-if="!data.isLayer"><span class="number-positive">$0.47</span> <span class="number-positive">1.11%</span></span>]
$42.91
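
Indexing with myspan[1] depends on the order of the four matching spans. As a hedged alternative, the sketch below picks the first text that looks like a dollar amount and converts it to a Decimal. The helper name pick_price and the regular expression are assumptions made for illustration, not part of BeautifulSoup or Selenium; the sample texts are the four span texts from the output above.

```python
import re
from decimal import Decimal

# Assumed format: a dollar sign followed by digits, optional thousands
# separators, and an optional decimal part, e.g. "$42.91" or "$1,234.50".
PRICE_RE = re.compile(r"^\$(\d{1,3}(?:,\d{3})*|\d+)(\.\d+)?$")


def pick_price(texts):
    """Return the first text that looks like a dollar amount as a Decimal.

    Returns None when no text matches. Hypothetical helper for illustration.
    """
    for text in texts:
        candidate = text.strip()
        if PRICE_RE.match(candidate):
            # Drop the dollar sign and thousands separators before converting.
            return Decimal(candidate[1:].replace(",", ""))
    return None


# Texts of the four spans returned by find_all in the output above.
span_texts = [
    "Unit price as of 02/15/2019",
    "$42.91",
    "Change",
    "$0.47 1.11%",
]
price = pick_price(span_texts)
print(price)
```

With the soup from the code above, this could be called as pick_price(s.get_text() for s in myspan).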

Answer 2

Selenium alone can be sufficient to extract the desired text. You need to induce WebDriverWait for visibility_of_element_located, and you can use the following solution:

Code Block:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get('https://investor.vanguard.com/529-plan/profile/4514')
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//tr[@class='ng-scope']//td[@class='ng-scope right']//span[@class='ng-scope ng-binding arrange' and @data-ng-bind-html]"))).get_attribute("innerHTML"))

Console Output:
```
 $42.91 
```
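
For intuition about why the wait matters: an explicit wait repeatedly evaluates a condition until it returns a truthy value or a timeout expires. The sketch below is a simplified, standard-library illustration of that idea, not Selenium's actual implementation; wait_until, WaitTimeout, and the fake page dictionary are hypothetical names used only here.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulate a page whose price span is filled in by JavaScript after load:
# the first two reads see no price, the third one does.
page = {"reads": 0}


def read_price():
    page["reads"] += 1
    return "$42.91" if page["reads"] >= 3 else None


price = wait_until(read_price, timeout=2.0, poll_interval=0.01)
print(price, page["reads"])
```

Reading driver.page_source immediately after driver.get is comparable to checking the condition only once, which is consistent with the span being missing from the soup in the question.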
