Get Element By Xpath Loop Error Selenium Python

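I'm trying to build a web scraper for Pinterest. I can get almost all of the data, but each pin has a button called "See more" that generates the "board name" and "author name" data.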

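Logic:

  1. Save all the button elements in an array
  2. Loop through them and click each button
  3. Get the total number of pins on the page
  4. Loop over the number of pins, incrementing the XPath index to find each "board name"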

Button click loop code:

moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
for moreBtn in moreButtons:
    moreBtn.click()

source_data = driver.page_source

Code to get the board names:

# Pin Length - Total Pins
total_pins = []
total_pins = driver.find_elements_by_class_name("Grid__Item")

# Pin Board Names
i = 1
while i <= len(total_pins):
    temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div[2]/h4/a[1]"
    temp = driver.find_element_by_xpath(temp_xpath)
    #pin_Board_Names.append(temp)
    print(temp.text)
    i += 1

It partially works:

Just old
Tiny House interior
SimpleLivingMama.com
Traceback (most recent call last):
  File "scrape.py", line 109, in <module>
    main()
  File "scrape.py", line 106, in main
    grab(args.url, args.fname)
  File "scrape.py", line 91, in grab
    temp = driver.find_element_by_xpath(temp_xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 393, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 966, in find_element
    'value': value})['value']
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"187","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:57743","User-Agent":"selenium/3.13.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]\", \"sessionId\": \"a8cdaa10-a2d3-11e8-86db-a3b39599a684\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a8cdaa10-a2d3-11e8-86db-a3b39599a684/element"}}
Screenshot: available via screen

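It finds 3 board names for me, then ends with an error. I tried editing the loop and the button clicks, but both seem to work. Does anyone know what is causing this, or have a suggestion for what to explore?

Edit 1: The error says the element cannot be found by XPath, but the element is on the web page.

Edit 2: Added a try/except to check. Here is the code: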

try:
    temp = driver.find_element_by_xpath(temp_xpath)
except:
    print('no element at pin number: ' + str(i))

Output:

Just old
Tiny House interior
SimpleLivingMama.com
no element at pin number: 4
SimpleLivingMama.com
Books for Pre-Schoolers
Stuff to Try
Baby & Toddler Milestones
Toys For Boys & Girls
House
OT
Make Extra Money
Shoes
Old photos
Crafts
for baby
There's A Book About That
Geek
Real DIY
Recycle & Repurpose
Crafts
Preschool Activities
Wild West Project
#BossMoms
no element at pin number: 24
#BossMoms
Crazy for DIY
Money Saving Tips
Painting Furniture
The home I want
screen door ideas
DIY Home
Little girl rooms
Container Home Desing
Bentley Joseph Adams
some truth bombs
New house!
Advice and Wisdom-Words
no element at pin number: 37
Advice and Wisdom-Words
House ideas
Houses
no element at pin number: 40
Houses
no element at pin number: 41
Houses
Fine Motor Activities for Kids
crafts
decorating ideas
mama
Barn Homes
For the Home
no element at pin number: 48
For the Home

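I checked the pin numbers reported as not found, and the board names are present on the web page.

Edit 3: I noticed that after pin number 47 it always says the element was not found, no matter how big the list is. I also checked that all the button XPaths exist in moreButtons, and they are valid.

Thanks in advance for your help.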

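With help from @AnkDasCo in the comments, I found the solution. There were two problems:

  1. Pinterest has 2 different XPaths for the same element. In some places it creates 2 divs for the same element instead of 1.
  2. The web driver needs some time to extract an element. Although the driver by default waits for the page to finish loading, adding a "wait" to the webdriver helps ensure that it keeps trying to extract the element and only moves on after a set time. This is similar to "time.sleep()", but differs in that it is tied to the webdriver.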
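
The second point can be sketched without a browser. Selenium provides `WebDriverWait(driver, timeout).until(...)` for explicit waits; the helper below is only a library-free stand-in for the same polling idea. The names `wait_for`, `locate` and `fake_lookup` are hypothetical and used purely for illustration.

```python
import time


def wait_for(locate, timeout=5.0, poll=0.1):
    """Call locate() repeatedly until it stops raising or the timeout expires.

    locate is any zero-argument callable, for example
    lambda: driver.find_element_by_xpath(some_xpath).
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            return locate()
        except Exception:
            # Element not there yet: give up only once the deadline has passed.
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)


# Stand-in for a slow page: the lookup fails twice, then succeeds.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("element not rendered yet")
    return "Tiny House interior"


result = wait_for(fake_lookup, timeout=2.0, poll=0.01)
print(result, attempts["count"])
```

Unlike a fixed `time.sleep()`, this returns as soon as the lookup succeeds and only spends the full timeout when the element never appears.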

Here are the 2 XPaths for the same item:

  1. /html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div/h4/a[1]
  2. /html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[1]/div/div/div[2]/div[2]/h4/a[1]

Note that the last /div differs between the two.
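
As a minimal sketch of handling both variants, the helpers below build the two candidate XPaths for a pin index and return the first one that resolves. `board_name_xpaths`, `first_match` and the dictionary-backed `fake_find` are hypothetical names for illustration; with Selenium, `find` would be something like `driver.find_element_by_xpath`.

```python
# Container path shared by both variants, copied from the XPaths in the post.
GRID_PREFIX = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div"
# The two tails differ only in the last div: div[2] in one layout, div in the other.
TAILS = ("/div/div/div[2]/div[2]/h4/a[1]", "/div/div/div[2]/div/h4/a[1]")


def board_name_xpaths(pin_index):
    """Return both candidate XPaths for the board link of one pin (1-based)."""
    return [f"{GRID_PREFIX}[{pin_index}]{tail}" for tail in TAILS]


def first_match(find, xpaths):
    """Return find(xpath) for the first xpath that does not raise, else None."""
    for xpath in xpaths:
        try:
            return find(xpath)
        except Exception:
            continue  # this variant is absent, try the next one
    return None


# Stand-in for a page where pin 4 only exists under the second variant.
fake_page = {board_name_xpaths(4)[1]: "SimpleLivingMama.com"}


def fake_find(xpath):
    return fake_page[xpath]  # raises KeyError when the path is missing


print(first_match(fake_find, board_name_xpaths(4)))
print(first_match(fake_find, board_name_xpaths(5)))
```

Returning `None` instead of raising lets the caller report a missing board name per pin without aborting the whole loop.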

Working code:

    driver = webdriver.PhantomJS(executable_path='phantomjs.exe')
    print("Ghost Headless Driver Invoked")
    # driver.implicitly_wait(5) # if element not found, wait for (seconds) before next operation
    driver.get(url) # grab the url

    # Scrolling till the end of page
    print("Started Scrolling ... ")
    match=True # change to 'False' for making this work..
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    while(match==False):
        lastCount = lenOfPage
        lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
        if lastCount==lenOfPage:
            match=True

    source_data = driver.page_source # page source code as html

    # Get all pins , number of pins collected
    total_pins = []
    try:
        total_pins = driver.find_elements_by_class_name("Grid__Item")
    except:
        print("Unable to load pins")
    print("Total Pins: " + str(len(total_pins)))

    # get number of 'see more' buttons collected - for error checking
    moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
    print("Dynamic Elements: " + str(len(moreButtons)))
    print("Display: Dynamic Elements ... ")

    # clicking all 'See More' buttons
    i = 0
    while i <= (len(moreButtons) - 1):
        moreButtons[i].click()
        i += 1

    # Pin Board Names
    print("Extracting Board Names ... ")
    i = 1
    while i <= len(total_pins):
        successful = False # reset for every pin
        # the same link lives under one of two XPaths; try each in turn
        prefix = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]"
        for tail in ("/div[2]/h4/a[1]", "/div/h4/a[1]"):
            try:
                temp = driver.find_element_by_xpath(prefix + tail)
                pin_Board_Names.append(temp.text)
                # print("Board_No: " + str(i) + " > " + temp.text)
                successful = True
                break
            except Exception:
                pass # this variant is not present for pin i
        if not successful:
            print("Board_No: " + str(i) + " not found!")
        i += 1

    # quit driver
    driver.quit()
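
One possible way to avoid depending on the exact nesting, which is not part of the original answer, is to search relative to each pin container instead of from `/html`. The sketch below shows the idea with the standard library's `xml.etree.ElementTree` on hypothetical, heavily simplified markup. It assumes the board link is the first `h4/a` inside a pin, which the real page may not guarantee. With Selenium the analogous call would be `pin.find_element_by_xpath('.//h4/a')` on each element of `total_pins`.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two pins whose board link sits at different depths,
# mimicking the two layouts described in the answer.
SAMPLE = """
<div>
  <div class="Grid__Item"><div><div><h4><a>Just old</a></h4></div></div></div>
  <div class="Grid__Item"><div><div><div><h4><a>Tiny House interior</a></h4></div></div></div></div>
</div>
"""

root = ET.fromstring(SAMPLE)
board_names = []
for item in root.findall(".//div[@class='Grid__Item']"):
    # Relative search: any h4/a below this pin, however deeply nested.
    link = item.find(".//h4/a")
    board_names.append(link.text if link is not None else None)

print(board_names)
```

Because the search starts at each pin, an extra wrapper div changes nothing, and a pin without a link yields `None` rather than an exception.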
