Get Element By Xpath Loop Error Selenium Python

I'm trying to make a web scraper for Pinterest. I'm able to get almost all the data, but each pin has a "see more" button that has to be clicked to reveal the 'board name' and 'author name' data.

Logic:

  1. Save all the button elements in an array
  2. Loop through them and click each button
  3. Get the total number of pins on the page
  4. Loop over the number of pins to find each 'board name' by incrementing the index in the xpath

Button Click Loop Code:

moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
for moreBtn in moreButtons:
    moreBtn.click()

source_data = driver.page_source

Get Board Name Code

# Pin Length - Total Pins
total_pins = []
total_pins = driver.find_elements_by_class_name("Grid__Item")

# Pin Board Names
i = 1
while i <= len(total_pins):
    temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div[2]/h4/a[1]"
    temp = driver.find_element_by_xpath(temp_xpath)
    #pin_Board_Names.append(temp)
    print(temp.text)
    i += 1

It kind of works, but only partially:

Just old
Tiny House interior
SimpleLivingMama.com
Traceback (most recent call last):
  File "scrape.py", line 109, in <module>
    main()
  File "scrape.py", line 106, in main
    grab(args.url, args.fname)
  File "scrape.py", line 91, in grab
    temp = driver.find_element_by_xpath(temp_xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 393, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 966, in find_element
    'value': value})['value']
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"187","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:57743","User-Agent":"selenium/3.13.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]\", \"sessionId\": \"a8cdaa10-a2d3-11e8-86db-a3b39599a684\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a8cdaa10-a2d3-11e8-86db-a3b39599a684/element"}}
Screenshot: available via screen

It found 3 board names for me, but then it ended with an error. I tried editing the loop and the button click, but both seem to work. Does anyone know what is causing this, or have suggestions for what to explore?

Edit 1: The error says the element cannot be found by xpath, but the element is there on the webpage.

Edit 2: Added a try/except to check. Here is the code:

try:
    temp = driver.find_element_by_xpath(temp_xpath)
except:
    print('no element at pin number: ' + str(i))

with output:

Just old
Tiny House interior
SimpleLivingMama.com
no element at pin number: 4
SimpleLivingMama.com
Books for Pre-Schoolers
Stuff to Try
Baby & Toddler Milestones
Toys For Boys & Girls
House
OT
Make Extra Money
Shoes
Old photos
Crafts
for baby
There's A Book About That
Geek
Real DIY
Recycle & Repurpose
Crafts
Preschool Activities
Wild West Project
#BossMoms
no element at pin number: 24
#BossMoms
Crazy for DIY
Money Saving Tips
Painting Furniture
The home I want
screen door ideas
DIY Home
Little girl rooms
Container Home Desing
Bentley Joseph Adams
some truth bombs
New house!
Advice and Wisdom-Words
no element at pin number: 37
Advice and Wisdom-Words
House ideas
Houses
no element at pin number: 40
Houses
no element at pin number: 41
Houses
Fine Motor Activities for Kids
crafts
decorating ideas
mama
Barn Homes
For the Home
no element at pin number: 48
For the Home

I checked the pin numbers where no element was found, and the board name is there on the webpage for each of them.

Edit 3: Noticed that pin number 48, right after pin 47, is always reported as not found, no matter how big the list is. I also checked that all the button xpaths are present in moreButtons and that they're valid.

Thanks in advance for any help.

With help from @AnkDasCo in the comments, I found a solution. There were 2 problems here:

  1. Pinterest uses 2 different xpaths for the same element. For some pins it creates 2 divs instead of just 1 for the same element.
  2. The webdriver needs some time to locate the element. Although the driver waits for the page to load by default, adding a 'wait' to the webdriver makes it keep trying to locate the element for a while before moving on. This is similar to 'time.sleep()', but the wait is handled by the webdriver itself.
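The retry-until-timeout idea behind such a wait can be sketched in plain Python. This is only an illustration, not what the working code below uses: `wait_until` and its parameters are hypothetical names, and Selenium ships its own `WebDriverWait` class (in `selenium.webdriver.support.ui`) for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions raised by condition() are treated as "not ready yet".
    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # e.g. NoSuchElementException from Selenium
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout) from last_error
        time.sleep(poll)
```

With a real driver this could be called as `wait_until(lambda: driver.find_element_by_xpath(temp_xpath))`, so a lookup that fails at first is retried instead of raising immediately.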

xpaths

The following are the 2 xpaths for the same item:

  1. /html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div/h4/a[1]
  2. /html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[1]/div/div/div[2]/div[2]/h4/a[1]

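As we can see, the last div step before h4 differs: /div in the first and /div[2] in the second. (The div[4] versus div[1] earlier in the path is just the pin number.)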
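One way to handle the two layouts is to try each xpath in turn and keep the first match. The sketch below is an illustration under assumptions: `find_first` and `board_xpaths` are hypothetical helper names, and a failed lookup is assumed to raise an exception (Selenium raises `NoSuchElementException`, as in the traceback above).

```python
def find_first(driver, xpaths):
    """Return the first element matched by any of the given xpaths, else None."""
    for xpath in xpaths:
        try:
            # "xpath" is the string value of Selenium's By.XPATH
            return driver.find_element("xpath", xpath)
        except Exception:  # assumption: a miss raises an exception
            continue
    return None


def board_xpaths(i):
    """Both observed layouts for the board link of pin number i (1-based)."""
    base = ("/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]"
            "/div/div/div/div[%d]/div/div/div[2]" % i)
    return [base + "/div[2]/h4/a[1]", base + "/div/h4/a[1]"]
```

For pin 4, `find_first(driver, board_xpaths(4))` would try the /div[2] layout first and then the /div layout, returning None only when neither is present.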

Working Code

    driver = webdriver.PhantomJS(executable_path='phantomjs.exe')
    print("Ghost Headless Driver Invoked")
    # driver.implicitly_wait(5) # if element not found, wait for (seconds) before next operation
    driver.get(url) # grab the url

    # Scrolling till the end of page
    print("Started Scrolling ... ")
    match=True # set to False to enable scrolling to the end of the page
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    while(match==False):
        lastCount = lenOfPage
        lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
        if lastCount==lenOfPage:
            match=True

    source_data = driver.page_source # page source code as html

    # Get all pins , number of pins collected
    total_pins = []
    try:
        total_pins = driver.find_elements_by_class_name("Grid__Item")
    except:
        print("Unable to load pins")
    print("Total Pins: " + str(len(total_pins)))

    # get number of 'see more' buttons collected - for error checking
    moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
    print("Dynamic Elements: " + str(len(moreButtons)))
    print("Display: Dynamic Elements ... ")

    # clicking all 'See More' buttons
    for moreBtn in moreButtons:
        moreBtn.click()

    # Pin Board Names
    print("Extracting Board Names ... ")
    i = 1
    while i <= len(total_pins):
        successful = False # reset for every pin
        try:
            # first layout: .../div[2]/div[2]/h4/a[1]
            temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div[2]/h4/a[1]"
            temp = driver.find_element_by_xpath(temp_xpath)
            pin_Board_Names.append(temp.text)
            # print("Board_No: " + str(i) + " > " + temp.text)
            successful = True
        except Exception:
            try:
                # second layout: .../div[2]/div/h4/a[1]
                temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div/h4/a[1]"
                temp = driver.find_element_by_xpath(temp_xpath)
                pin_Board_Names.append(temp.text)
                # print("Board_No: " + str(i) + " > " + temp.text)
                successful = True
            except Exception:
                pass # neither layout matched; reported below
        if not successful:
            print("Board_No: " + str(i) + " not found!")
        i += 1

    # quit driver
    driver.quit()
