
How to extract the text from the webelements using Selenium and Python

Code trials:

driver.get(url)
cards = driver.find_elements_by_class_name("job-cardstyle__JobCardComponent-sc-1mbmxes-0")
for card in cards:
    data = card.get_attribute('text')
    print(data)

    
driver.close()
driver.quit()

`cards` returns Selenium web elements, and I cannot extract the text from them in the for loop.

Instead of get_attribute('text') you need to use the text attribute, as follows:

data = card.text

Solution

To locate the visible elements, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following solution:

driver.get(url)
cards = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))
for card in cards:
    data = card.text
    print(data)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
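
Conceptually, an explicit wait polls a condition until it returns a truthy value or a timeout expires. The sketch below illustrates that idea in plain Python. `wait_until` is a hypothetical helper written for this note, not Selenium's implementation, and it raises the built-in `TimeoutError` rather than Selenium's `TimeoutException`.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical illustration of the polling idea behind an explicit wait.
    Raises the built-in TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # A non-empty list of elements counts as success.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)
```

In the solution above, WebDriverWait(driver, 20).until(...) plays this role, with the expected condition returning the list of visible elements once they are all displayed.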

Outro

In a single line, you can use a list comprehension as follows:

print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))])

  1. Check that the path to your web elements is correct
  2. Get the text from each element
  3. Print it

The problem is in this line:

 data = card.get_attribute('text')

You can do either of the following:

  1. Use .text

     for card in cards:
         data = card.text
         print(data)

  2. Use innerText

     for card in cards:
         data = card.get_attribute('innerText')
         print(data)
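
The two options can behave differently: .text returns the rendered (visible) text, so it may be empty for a card that is in the DOM but not displayed, while get_attribute('innerText') and get_attribute('textContent') read the value from the DOM. A minimal sketch of a fallback helper follows. The name `element_text` and the fallback order are assumptions made for illustration, not part of the answer above.

```python
def element_text(element):
    """Return an element's text, falling back to DOM properties.

    `element` is anything with a `.text` attribute and a
    `get_attribute(name)` method, such as a Selenium WebElement.
    """
    text = element.text
    if text and text.strip():
        return text.strip()
    # Assumed fallback order: innerText first, then textContent.
    for name in ("innerText", "textContent"):
        value = element.get_attribute(name)
        if value and value.strip():
            return value.strip()
    return ""
```

With the cards from the question, this could be used as print([element_text(card) for card in cards]).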

Also, as suggested in the comments above, you should print the length of the cards list to make debugging easier.

print(len(cards))

That shows whether the list contains anything.

This works to some extent:

driver.get("https://www.monster.com/jobs/search?q=Python-Developer&where=Las+Vegas%2C+NV&page=1")
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@data-test-id = 'svx-job-title']")))
jobs = driver.find_elements(By.XPATH, "//div[contains(@class, 'job-cardstyle__JobCardHeader')]")
all_jobs = [job.text for job in jobs]
print(all_jobs)

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Output:

['Software engineer III\nRandstad USA\nLas Vegas, NV', 'C\nPython Developer\nconfidential\n$55 - $65 / Per Hour', 'C\nSenior Software Engineer\nCox Communications Inc\nLas Vegas, NV', 'Mission Systems Engineer\nDCS Corporation\nLas Vegas, NV', 'G\nSoftware Engineer - 914\nGCR Technical Staffing\nHenderson, NV', 'Z\nNetSuite Developer\nZone & Company Software Consulting\nLas Vegas, NV', 'IT Project Engineer\nRauland Florida by Ametek, Inc.\nSunrise, NV', 'A\nWeb Developer\nArdor Global', 'Senior Software Engineer – Node\nMeridian Technology Group Inc.']

Process finished with exit code 0
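
Each string in the output above joins a card's fields with \n. A small sketch of splitting them into lists follows. `split_card` is a hypothetical helper, and treating a leading one-character line (such as 'C' or 'G') as a company-logo placeholder is an assumption based only on the sample output.

```python
def split_card(card_text, drop_logo_letter=True):
    """Split one card's text into its non-empty lines."""
    lines = [line.strip() for line in card_text.split("\n") if line.strip()]
    # Assumption: a leading single character is a logo placeholder, not data.
    if drop_logo_letter and len(lines) > 1 and len(lines[0]) == 1:
        lines = lines[1:]
    return lines


# Three entries copied from the output above.
sample = [
    'Software engineer III\nRandstad USA\nLas Vegas, NV',
    'C\nPython Developer\nconfidential\n$55 - $65 / Per Hour',
    'A\nWeb Developer\nArdor Global',
]
for text in sample:
    print(split_card(text))
```

The cards do not all carry the same fields (some have a location, some a pay rate, some neither), so the position of a line does not reliably identify what it holds.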

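You can split each string on the \n separator for further use. Also, the site appears to load cards dynamically: new cards load as you scroll down, so a single pass may not return all of them.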
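
One way to deal with cards that load on scroll is to scroll, re-read the cards, and stop once nothing new appears. The sketch below is written under assumptions: `collect_texts_while_scrolling` and `find_cards` are hypothetical names, scrolling the window (rather than an inner container) is assumed to trigger loading, and cards are de-duplicated by their text, which would merge two genuinely identical cards.

```python
import time


def collect_texts_while_scrolling(driver, find_cards, max_rounds=10, pause=1.0):
    """Scroll to the bottom repeatedly until no new card texts appear.

    `find_cards(driver)` must return the card elements currently in the page.
    """
    seen = []
    for _ in range(max_rounds):
        before = len(seen)
        for card in find_cards(driver):
            text = card.text
            if text and text not in seen:
                seen.append(text)
        if len(seen) == before:
            break  # this pass found nothing new, so stop
        # Assumption: scrolling the window makes the site load more cards.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the new cards time to load
    return seen
```

With the locator from the answer above, `find_cards` could be lambda d: d.find_elements(By.XPATH, "//div[contains(@class, 'job-cardstyle__JobCardHeader')]"). A fixed pause is the simplest choice here; an explicit wait for the card count to grow would be more robust.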

