Selenium Webdriver - How to extract texts through scraping

I am trying to scrape information from a company's careers website. I want to get the reference code of each job posting.

I want to use Selenium and tried to identify the job posting codes with XPath. When I run the code, a Google Chrome window opens with the correct URL:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pandas as pd

PATH = "C:/Users/MyUser/Desktop/Driver/chromedriver.exe"

driver = webdriver.Chrome(PATH)

driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
driver.maximize_window()

ref_code = driver.find_elements_by_xpath("//tr[@data-eui-handler=\"{event:'click',handler:'eui.app.controller.search_results.selectRow'}\"]/td[1]")

print(len(ref_code))

User_input = input()

Running the code takes forever, and I get the following result:

DevTools listening on ws://127.0.0.1:52187/devtools/browser/7300c3d2-42d1-4f8e-a136-4e1ce37bcb87
c:\Users\MyUser\Desktop\PyhtonVisStuCo\Selenium.py:15: DeprecationWarning: find_elements_by_xpath is deprecated. Please use find_elements(by=By.XPATH, value=xpath) instead
  ref_code = driver.find_elements_by_xpath("//tr[@data-eui-handler=\"{event:'click',handler:'eui.app.controller.search_results.selectRow'}\"]/td[1]")
0
[3516:18308:0609/194039.395:ERROR:device_event_log_impl.cc(214)] [19:40:39.395] Bluetooth: bluetooth_adapter_winrt.cc:1074 Getting Default Adapter failed.

What am I doing wrong?

To extract the texts from the Referenzcode column, you can use a list comprehension with either of the following locator strategies:

  • Using CSS_SELECTOR:

     driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
     print([my_elem.text for my_elem in driver.find_elements(By.CSS_SELECTOR, "table#table_search_results tr[data-head] td:first-of-type")])
  • Using XPATH:

     driver.get("https://www.uke.jobs/sap(bD1kZSZjPTUwMA==)/bc/bsp/kwp/bsp_eui_rd_uc/main.do?action=to_uc_search")
     print([my_elem.text for my_elem in driver.find_elements(By.XPATH, "//table[@id='table_search_results']//tr[@data-head]/td[1]")])
  • Console output:

     ['ZVW22192', 'ZPF2208_ex', 'ZPF2207_e', 'ZPF2206_e', 'ZMF2249', 'ZIT22484', 'ZIT22444', 'ZIT22380', 'ZIT22379', 'WS22536']
