Getting a value from a website using Selenium in Python

Question

I am taking my first steps with Selenium in Python and want to extract a certain value from a webpage. The value I need to find on the webpage is the ID (Melde-ID), which is 355460. In the HTML I found the two lines containing my info:

<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3><span _ngcontent-wwf-c32="">
<div _ngcontent-wwf-c27="" class="label"> Melde-ID </div><div _ngcontent-wwf-c27="" class="value">

I have been searching websites for about 2 hours for which command to use, but I don't know what to actually search for in the HTML. The website is an HTML page with .js modules. Opening the URL with Selenium works.

(At first I tried using BeautifulSoup but was not able to open the page because of some restriction. I verified that robots.txt does not disallow anything, but the error with BeautifulSoup was "Unfortunately, a problem occurred while forwarding your request to the backend server".)

I would be thankful for any advice and hope I explained my issue. The code I tried to create in Jupyter Notebook with Selenium installed is as follows:

from selenium import webdriver
import codecs
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

url = "https://...."
driver = webdriver.Chrome('./chromedriver')
driver.implicitly_wait(0.5)
#maximize browser
driver.maximize_window()
#launch URL
driver.get(url)
#print(driver.page_source)
#Try 2
#print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//a[normalize-space()='Melde-ID']")))])
#close browser
driver.quit()

Answer 1

From the information you shared, we can see that the element containing the desired information does not have a class attribute with the value Melde-ID.
Its class is title, and it contains the text Melde-ID.
Also, you should use a WebDriverWait with an expected condition instead of driver.implicitly_wait(0.5).
With these changes, your code could look something like this:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://...."
# Selenium 4 takes the chromedriver path through a Service object
driver = webdriver.Chrome(service=Service('./chromedriver'))

# explicit wait of up to 20 seconds, used instead of implicitly_wait
wait = WebDriverWait(driver, 20)

#maximize browser
driver.maximize_window()
#launch URL
driver.get(url)

# wait until the element with class 'title' containing 'Melde-ID:' is visible, then read its text
content = wait.until(EC.visibility_of_element_located((By.XPATH, "//*[contains(@class,'title') and contains(.,'Melde-ID:')]"))).text

I added .text to extract the text from that web element.
Now content should contain the value Melde-ID: 355460.
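
If only the number is needed, the string in content can be parsed afterwards. The following is a minimal sketch, assuming the text always has the form Melde-ID: followed by digits; the helper name parse_melde_id is made up for illustration and is not part of Selenium:

```python
import re

def parse_melde_id(text):
    """Return the digits that follow 'Melde-ID:' in text, or None if absent."""
    match = re.search(r"Melde-ID:\s*(\d+)", text)
    return match.group(1) if match else None

# Example with the text from the <h3> element shown in the question
print(parse_melde_id("Melde-ID: 355460"))
```

Returning None instead of raising makes it easy to detect a page where the heading is present but has an unexpected format.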

Answer 2

Given the HTML:

<h3 _ngcontent-wwf-c32="" class="title"> Melde-ID: 355460 </h3>
<span _ngcontent-wwf-c32="">
    <div _ngcontent-wwf-c27="" class="label"> Melde-ID </div>
    <div _ngcontent-wwf-c27="" class="value">

To extract the text 355460, you need to induce WebDriverWait for visibility_of_element_located(), then split the element's text at the : character and print the second part. You can use either of the following locator strategies:

Using CSS_SELECTOR and the text attribute:

 print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "h3.title"))).text.split(':')[1].strip())

Using XPATH and get_attribute("innerHTML"):

 print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[@class='title' and text()]"))).get_attribute("innerHTML").split(':')[1].strip())

Note: You have to add the following imports:

 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.common.by import By
 from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
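
One difference between the two variants is worth keeping in mind: .text typically returns the rendered text without leading or trailing whitespace, while get_attribute("innerHTML") returns the content as it stands in the DOM, including the spaces around the text. The sketch below needs no browser; it uses plain strings that stand in for the two return values (taken from the HTML in the question) to show that trimming after the split yields the same ID in both cases. The function name id_after_colon is hypothetical:

```python
def id_after_colon(raw):
    """Split on the first ':' and return the trimmed part after it."""
    _, _, tail = raw.partition(":")
    return tail.strip()

# Stand-ins for the two Selenium return values (assumed, based on the question's HTML)
text_value = "Melde-ID: 355460"          # what .text would typically return
inner_html_value = " Melde-ID: 355460 "  # what innerHTML would return for the <h3>

print(id_after_colon(text_value))
print(id_after_colon(inner_html_value))
```

Using partition instead of split(':')[1] avoids an IndexError when the text contains no colon; in that case the function returns an empty string.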

Answer 1
0 accepted 2022-08-18 09:44:52

Answer 2
0 2022-08-18 11:13:54
