Extracting data from a web page with a CSS selector - Selenium Python

Question

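I want to extract dates for some devices I am interested in from Dell's website. I tried downloading the page with urllib, but the page is protected by a captcha that I have not been able to get past. So now I use Selenium to open a browser, solve the captcha manually, and then open the pages and extract the dates automatically. The problem is that the CSS selector returns some strange elements instead of the expected output.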

My code:

from selenium import webdriver
import time
driver = webdriver.Chrome()


def scrape(codes):
    dates = []
    for i in range(len(codes)):
        driver.get("https://www.dell.com/support/home/us/en/19/product-support/"
                   "servicetag/%s/warranty?ref=captchasuccess" % codes[i])

        # Solve captcha manually
        if i == 0:
            print("You now have 120\" seconds to solve the captcha")
            time.sleep(120)
            print("120\" Passed")
        # Extract data
        expdate = driver.find_element_by_css_selector("#printdivid > div > div.not-annotated.hover > table:nth-child(3) > tbody > tr > td:nth-child(3)")
        print(expdate)
    driver.close()

codes = ['1FMR762', '15FDBG2', '10V8YZ1']
scrape(codes)

Expected output:

June 22, 2018
October 15, 2017
April 19, 2017

Actual output:

<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.21873872382745052-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.06836824093097027-1")>
<selenium.webdriver.remote.webelement.WebElement (session="d83af0f7a3a9c79307d2058f863a7ecb", element="0.6642161898702734-1")>

Answer 1

According to the API documentation, the find_element_by_css_selector function returns a WebElement object. See https://selenium-python.readthedocs.io/api.html.

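The content of the web element needs to be converted to a string before it is printed, as described in "Python and how to get text from Selenium element WebElement object?".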

So changing the line print(expdate) to print(expdate.text) should help.
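To make the difference concrete, here is a minimal sketch of the lookup with the .text fix applied. The helper name extract_text and the constant names are made up for illustration; the selector is the one from the question. It also assumes a Selenium version where find_element(by, value) is available, since the find_element_by_* helpers are deprecated and, as far as I know, removed in recent Selenium 4 releases.

```python
# By.CSS_SELECTOR in selenium.webdriver.common.by is the plain string
# "css selector"; it is spelled out here so the sketch does not need
# Selenium installed just to be imported.
CSS_SELECTOR = "css selector"

# Selector copied from the question.
EXPIRY_SELECTOR = (
    "#printdivid > div > div.not-annotated.hover > "
    "table:nth-child(3) > tbody > tr > td:nth-child(3)"
)


def extract_text(driver, selector=EXPIRY_SELECTOR):
    """Return the visible text of the first element matching `selector`.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper,
    not part of the original code).
    """
    element = driver.find_element(CSS_SELECTOR, selector)  # a WebElement
    return element.text.strip()  # .text is the rendered text as a str
```

Inside the loop from the question, print(extract_text(driver)) would then take the place of the find_element_by_css_selector call and print(expdate), and should print the date string instead of the WebElement representation.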

Answer 1 (accepted), 2018-09-20 08:39:10