
Python - Selenium - not able to get the XPath

I'm trying to find the XPath for the HTML structure below.

<div class="col-xs-6 pg-desc-section">
        <p class="boldText">Jewelry</p>
        <p data-hostname="nikki stanzione" data-showname="gifts from dallas prince jewelry" data-category="jewelry" data-airtime="11/12/2020 12:00:00 AM">

                </p><div class="hidden-xs " data-showscheduleid="23839680">Gifts From Dallas Prince Jewelry</div>
                    <a class="mobile-showlink visible-xs" data-showscheduleid="23839680" onclick="pgmoreinfo(this); $(this).hide(); $(this).next().show();"> Gifts From Dallas Prince Jewelry</a>
                    <a class="ab-mobile-showlink" style="display:none;" data-showlink="abTest" data-showscheduleid="23839680" onclick="pgmoreinfo(this);"> Gifts From Dallas Prince Jewelry</a>
        <p></p>
    </div>

What I want is the value of the data-airtime attribute on the third line. I tried the code below, but it shows a syntax error. (Please note that data-airtime is a custom attribute.)

driver.get("https://www.shophq.com/onair/programguide?cm_re=GN-_-ONAIR-_-PROGRAMGUIDE#content")
driver.implicitly_wait(20)
bc=driver.find_elements_by_xpath("//div[@class='col-xs-6 pg-desc-section']/p[@data-category='Jewelry']").data-airtime

Please help me find what I'm doing wrong when defining the XPath.
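One detail in the attempt above can be checked without a browser: XPath string comparisons are case-sensitive, so [@data-category='Jewelry'] does not match the markup's data-category="jewelry" (the capitalized "Jewelry" is only the text of the boldText paragraph). The sketch below runs the attribute lookup offline with Python's standard-library xml.etree.ElementTree, which supports a small XPath subset, on a trimmed copy of the question's HTML. It is an illustration of the matching rules, not Selenium itself.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (kept well-formed for the XML parser)
snippet = """<div class="col-xs-6 pg-desc-section">
  <p class="boldText">Jewelry</p>
  <p data-hostname="nikki stanzione" data-showname="gifts from dallas prince jewelry"
     data-category="jewelry" data-airtime="11/12/2020 12:00:00 AM"></p>
</div>"""
root = ET.fromstring(snippet)

# Capitalized value from the question: no match, because the comparison is case-sensitive
wrong_case = root.findall("p[@data-category='Jewelry']")

# Value exactly as it appears in the HTML
matches = root.findall("p[@data-category='jewelry']")

# Read the hyphenated attribute by its full name (Selenium's get_attribute() plays this role)
airtimes = [p.get("data-airtime") for p in matches]

print(wrong_case)  # []
print(airtimes)    # ['11/12/2020 12:00:00 AM']
```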

It took a little while, but I was able to find the data you were searching for. For starters, this is the base web address I used for the initial search:

https://www.shophq.com/onair/programguide

Once the page was displayed, I created a method to get the number of displayed rows (since we need to print out the data for each record):

def get_number_of_displayed_rows(driver : ChromeDriver):
    # Count the 'custom-col-2-detail' cells to get the number of displayed rows
    xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
    return len(driver.find_elements(By.XPATH, xpath))

Using this method, I was able to see that 24 rows were displayed. From there, I scrolled to each element and isolated its data using XPath.

NOTE
I only extracted the first column's data for you, but using this code you should be able to get columns 2 and 3 as well.

MAIN PROGRAM - For Reference

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.webdriver import WebDriver as ChromeDriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as DriverWait
from selenium.webdriver.support import expected_conditions as DriverConditions
from selenium.common.exceptions import TimeoutException, WebDriverException


def get_chrome_driver():
    """This sets up our Chrome Driver and returns it as an object"""
    # Raw string, so the backslashes in the Windows path are not read as escape sequences
    path_to_chrome = r"F:\Selenium_Drivers\Windows_Chrome87_Driver\chromedriver.exe"
    chrome_options = webdriver.ChromeOptions()

    # Browser is displayed in a custom window size
    chrome_options.add_argument("window-size=1500,1000")

    # Selenium 4 takes the driver path via a Service object
    # (Selenium 3 used webdriver.Chrome(executable_path=..., options=...))
    return webdriver.Chrome(service=Service(path_to_chrome), options=chrome_options)


def wait_displayed(driver: ChromeDriver, xpath: str, timeout: int = 5):
    """Waits until an element matching xpath is present in the DOM"""
    try:
        DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located((By.XPATH, xpath))
        )
    except TimeoutException:
        raise WebDriverException(f'Timeout: Failed to find {xpath}')


def scroll_to_element(driver: ChromeDriver, xpath: str, timeout: int = 5):
    """Waits for the element matching xpath, then scrolls it into view"""
    try:
        web_element = DriverWait(driver, timeout).until(
            DriverConditions.presence_of_element_located((By.XPATH, xpath))
        )
    except TimeoutException:
        raise WebDriverException(f'Timeout: Failed to find {xpath}\nResult: Could not scroll to element')
    driver.execute_script("arguments[0].scrollIntoView();", web_element)


def get_number_of_displayed_rows(driver: ChromeDriver):
    """Counts the 'custom-col-2-detail' cells, one per displayed row"""
    xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
    return len(driver.find_elements(By.XPATH, xpath))


# Gets our chrome driver and opens our site
chrome_driver = get_chrome_driver()
chrome_driver.get("https://www.shophq.com/onair/programguide")
wait_displayed(chrome_driver, "//input[@id='txtSearchString']", 30)
wait_displayed(chrome_driver, "//div[contains(@class, 'program-guide-table')]", 30)
wait_displayed(chrome_driver, "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')][1]", 30)

number_of_records = get_number_of_displayed_rows(chrome_driver)

# Base XPaths: the detail cell we scroll to, and the schedule row we read from
detail_xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'custom-col-2-detail')]"
row_base_xpath = "//div[contains(@class, 'program-guide-table')]//div[@class='row content']//div[contains(@class, 'desktop-schedule-row')]"

# Loop through each record and scrape the data (XPath positions start at 1)
for row_number in range(1, number_of_records + 1):
    print(f'Scrolling to Record #{row_number}')
    scroll_to_element(chrome_driver, xpath=f'{detail_xpath}[{row_number}]')
    row_xpath = f'{row_base_xpath}[{row_number}]'
    print(f'Record {row_number} data')

    # First column's element and details
    record_element = chrome_driver.find_element(By.XPATH, f"{row_xpath}//div[contains(@class, 'pg-show')][1]//p[@data-showname]")
    print(f"Host Name: {record_element.get_attribute('data-hostname')}")
    print(f"Show Name: {record_element.get_attribute('data-showname')}")
    print(f"Category: {record_element.get_attribute('data-category')}")
    print(f"Air Time: {record_element.get_attribute('data-airtime')}")
    print("========================================================================================\n")

# quit() closes the browser and also stops the chromedriver service
chrome_driver.quit()
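The loop above picks each row by appending a position such as [3] to an XPath that starts with //. In XPath, //div[3] means "every div that is the third div child of its own parent", not "the third div on the page"; the latter is written (//div)[3]. The two agree when all matching rows share one parent, which the sample output suggests is the case on this page, but they diverge otherwise. The sketch below shows the difference with the standard library's ElementTree, whose [n] predicate is likewise evaluated per parent. The markup is made up for illustration, and since ElementTree does not accept the parenthesized form, plain list indexing stands in for it.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: rows split across two parents (not the real page structure)
markup = (
    "<body>"
    "<section><div>A</div><div>B</div></section>"
    "<section><div>C</div></section>"
    "</body>"
)
root = ET.fromstring(markup)

# Like XPath //div[1]: each div that is the first div child of its parent
per_parent = [d.text for d in root.findall(".//div[1]")]

# Like XPath (//div)[1]: the first div in document order, across the whole tree
first_overall = root.findall(".//div")[0].text

print(per_parent)     # ['A', 'C']
print(first_overall)  # A
```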

SAMPLE OUTPUT

Scrolling to Record #1
Record 1 data
Host Name: kendy kloepfer
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 12:00:00 AM
========================================================================================

Scrolling to Record #2
Record 2 data
Host Name: lynne schacher
Show Name: holiday diamond day kick-off
Category: jewelry
Air Time: 11/26/2020 1:00:00 AM
========================================================================================

Scrolling to Record #3
Record 3 data
Host Name: daniel green
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 2:00:00 AM
========================================================================================

Scrolling to Record #4
Record 4 data
Host Name: daniel green
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 3:00:00 AM
========================================================================================

Scrolling to Record #5
Record 5 data
Host Name: melissa miner
Show Name: gifts of joy
Category: electronics
Air Time: 11/26/2020 4:00:00 AM
========================================================================================

Scrolling to Record #6
Record 6 data
Host Name: heather hall
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 5:00:00 AM
========================================================================================

Scrolling to Record #7
Record 7 data
Host Name: kendy kloepfer
Show Name: gifts from invicta watches
Category: watches
Air Time: 11/26/2020 6:00:00 AM
========================================================================================

Scrolling to Record #8
Record 8 data
Host Name: nikki stanzione
Show Name: gifts for the family
Category: electronics
Air Time: 11/26/2020 7:00:00 AM
========================================================================================

Scrolling to Record #9
Record 9 data
Host Name: fatima cocci
Show Name: gifts from pamela mccoy collection
Category: jewelry
Air Time: 11/26/2020 8:00:00 AM
========================================================================================

Scrolling to Record #10
Record 10 data
Host Name: kathy norton
Show Name: fashion talk with fatima & kathy
Category: fashion
Air Time: 11/26/2020 9:00:00 AM
========================================================================================

Scrolling to Record #11
Record 11 data
Host Name: kathy norton
Show Name: fashion talk with fatima & kathy
Category: fashion
Air Time: 11/26/2020 10:00:00 AM
========================================================================================

Scrolling to Record #12
Record 12 data
Host Name: kathy norton
Show Name: gifts of designer fragrances
Category: beauty
Air Time: 11/26/2020 11:00:00 AM
========================================================================================

Scrolling to Record #13
Record 13 data
Host Name: lynne schacher
Show Name: top accessory gifts of the season
Category: fashion
Air Time: 11/26/2020 12:00:00 PM
========================================================================================

Scrolling to Record #14
Record 14 data
Host Name: lynne schacher
Show Name: fashion doorbusters
Category: watches
Air Time: 11/26/2020 1:00:00 PM
========================================================================================

Scrolling to Record #15
Record 15 data
Host Name: nikki stanzione
Show Name: fashion doorbusters
Category: fashion
Air Time: 11/26/2020 2:00:00 PM
========================================================================================

Scrolling to Record #16
Record 16 data
Host Name: nikki stanzione
Show Name: top accessory gifts of the season
Category: fashion
Air Time: 11/26/2020 3:00:00 PM
========================================================================================

Scrolling to Record #17
Record 17 data
Host Name: jess manuel
Show Name: mayamar jewelry: live from st. barts
Category: jewelry
Air Time: 11/26/2020 4:00:00 PM
========================================================================================

Scrolling to Record #18
Record 18 data
Host Name: jess manuel
Show Name: dr. terry dubrow: safe living
Category: beauty
Air Time: 11/26/2020 5:00:00 PM
========================================================================================

Scrolling to Record #19
Record 19 data
Host Name: kendy kloepfer
Show Name: black friday starts now
Category: electronics
Air Time: 11/26/2020 6:00:00 PM
========================================================================================

Scrolling to Record #20
Record 20 data
Host Name: kendy kloepfer
Show Name: black friday starts now
Category: electronics
Air Time: 11/26/2020 7:00:00 PM
========================================================================================

Scrolling to Record #21
Record 21 data
Host Name: jess manuel
Show Name: dr. terry dubrow: safe living
Category: home
Air Time: 11/26/2020 8:00:00 PM
========================================================================================

Scrolling to Record #22
Record 22 data
Host Name: kendy kloepfer
Show Name: gifts from waterford crystal
Category: home
Air Time: 11/26/2020 9:00:00 PM
========================================================================================

Scrolling to Record #23
Record 23 data
Host Name: kendy kloepfer
Show Name: gifts from waterford crystal
Category: home
Air Time: 11/26/2020 10:00:00 PM
========================================================================================

Scrolling to Record #24
Record 24 data
Host Name: fatima cocci
Show Name: gifts from stefano oro gold jewelry
Category: jewelry
Air Time: 11/26/2020 11:00:00 PM
========================================================================================

find_elements_by_xpath() returns a list of elements, so you need to iterate over it first and then call get_attribute() on each element to get the value.

Code block:

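# find_elements_by_xpath() is the Selenium 3 API; Selenium 4.3+ removed it in favor of driver.find_elements(By.XPATH, ...)
for item in driver.find_elements_by_xpath("//div[@class='col-xs-6 pg-desc-section']/p[@data-category='jewelry']"):
    # get_attribute() reads the hyphenated custom attribute by its full name
    print(item.get_attribute("data-airtime"))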
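It also helps to see why the original bc = driver.find_elements_by_xpath(...).data-airtime line cannot work. Python identifiers cannot contain hyphens, so the expression is parsed as an attribute lookup .data followed by subtracting a variable named airtime; and because find_elements_by_xpath() returns a plain list, there is no .data attribute to look up in the first place. The standard-library ast module makes the parse visible; elements is just a placeholder name for the returned list.

```python
import ast

# 'elements' is a placeholder for the list returned by find_elements_by_xpath()
tree = ast.parse("bc = elements.data-airtime")
expr = tree.body[0].value

# Python sees (elements.data) - airtime, not an attribute called "data-airtime"
print(type(expr).__name__)            # BinOp
print(type(expr.op).__name__)         # Sub
print(expr.left.attr, expr.right.id)  # data airtime

# A plain list has no .data attribute, so evaluating the line would raise AttributeError
print(hasattr([], "data"))            # False
```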

To avoid synchronization issues, use WebDriverWait():

driver.get('https://www.shophq.com/onair/programguide?cm_re=GN-_-ONAIR-_-PROGRAMGUIDE#content')
# Wait up to 20 s until at least one matching <p> is present, then read each one's attribute
items = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='col-xs-6 pg-desc-section']/p[@data-category='jewelry']")))
for item in items:
    print(item.get_attribute("data-airtime"))

You need to import the following:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
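Conceptually, WebDriverWait(driver, timeout).until(condition) keeps calling the condition (every 0.5 seconds by default) until it returns a truthy value, which it then returns, or raises TimeoutException once the timeout has passed. presence_of_all_elements_located returns the list of matches, so it becomes truthy as soon as at least one element is found. The sketch below is a simplified pure-Python illustration of that polling loop, not Selenium's actual implementation; wait_until and fake_find_elements are hypothetical names, and the built-in TimeoutError stands in for Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.5):
    """Hypothetical helper: call condition() until it returns something truthy."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # hand back the condition's result, as until() does
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the element list stays empty until the third poll
calls = {"count": 0}


def fake_find_elements():
    calls["count"] += 1
    return ["<p data-airtime=...>"] if calls["count"] >= 3 else []


items = wait_until(fake_find_elements, timeout=2, poll=0.01)
print(items, calls["count"])  # ['<p data-airtime=...>'] 3
```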
