How to extract the text H MATTHEWS from the HTML using Selenium and Python
How can I extract information from this kind of HTML structure using the contains() function? I am trying to scrape the text "H MATTHEWS".
HTML:
<p>
<strong>Date Published:</strong>
20 APRIL 2020
<br>
<strong>Closing Date / Time:</strong>
TUESDAY, 05 MAY 2020
<br>
<strong>Enquiries:</strong>
<br>
Contact Person: H MATTHEWS
<br>
Email:
</p>
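The text Contact Person: H MATTHEWS is within a text node. So to print the text you have to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following Locator Strategies:
Using XPATH and childNodes (the index 9 depends on how the child nodes, including whitespace-only text nodes, are laid out in the live page, so verify it against the actual DOM):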
print(driver.execute_script('return arguments[0].childNodes[9].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]")))).strip())
Using XPATH and splitlines():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]"))).get_attribute("innerHTML").splitlines()[-3])
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
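Because a hard-coded child index breaks as soon as the page's whitespace or markup shifts, it may help to pick the text node by its content instead. The following is only a sketch under stated assumptions: the helper name text_node_containing and the constant FIND_TEXT_NODE_JS are hypothetical (not part of Selenium), and driver and element are assumed to be an existing WebDriver instance and the located <p> element.

```python
# Sketch: find a direct child text node by content instead of by fixed index.
# FIND_TEXT_NODE_JS and text_node_containing are hypothetical names.

# JavaScript run in the browser: arguments[0] is the parent element,
# arguments[1] is the label to look for in its direct child text nodes.
FIND_TEXT_NODE_JS = """
const nodes = arguments[0].childNodes;
for (const node of nodes) {
    if (node.nodeType === Node.TEXT_NODE &&
        node.textContent.includes(arguments[1])) {
        return node.textContent.trim();
    }
}
return null;
"""


def text_node_containing(driver, element, label):
    """Return the trimmed text of the first child text node containing label.

    Returns None when no direct child text node contains the label.
    """
    return driver.execute_script(FIND_TEXT_NODE_JS, element, label)
```

With the element located as in the snippets above, text_node_containing(driver, element, "Contact Person:") would be expected to return the whole "Contact Person: ..." line, regardless of how many whitespace nodes precede it.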
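If your use case is to extract only the text H MATTHEWS, you can use either of the following solutions (both additionally require import re):
Using XPATH and childNodes: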
print(re.split('[:]', driver.execute_script('return arguments[0].childNodes[9].textContent;', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]")))).strip())[1].strip())
Using XPATH and splitlines():
print(re.split('[:]', WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[./strong[text()='Date Published:']]"))).get_attribute("innerHTML").splitlines()[-3])[1].strip())
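The string handling can also be checked without a browser. The sketch below runs on a copy of the innerHTML from the question and pulls the name out with a regular expression anchored on the "Contact Person:" label, so it does not depend on the line's position. The function name extract_contact_person and the constant SAMPLE_INNER_HTML are hypothetical, and the sample assumes the live page's innerHTML matches the HTML shown in the question.

```python
import re

# Sample innerHTML of the <p> element, copied from the question's HTML.
SAMPLE_INNER_HTML = """
<strong>Date Published:</strong>
20 APRIL 2020
<br>
<strong>Closing Date / Time:</strong>
TUESDAY, 05 MAY 2020
<br>
<strong>Enquiries:</strong>
<br>
Contact Person: H MATTHEWS
<br>
Email:
"""


def extract_contact_person(inner_html):
    """Return the name after 'Contact Person:', or None if the label is absent."""
    # Capture everything after the label up to the next tag or line break.
    match = re.search(r"Contact Person:\s*([^<\r\n]+)", inner_html)
    return match.group(1).strip() if match else None


print(extract_contact_person(SAMPLE_INNER_HTML))  # H MATTHEWS
```

In a real session the argument would be the string returned by get_attribute("innerHTML") on the located <p> element.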