Get Element By XPath Loop Error Selenium Python
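I am trying to build a web scraper for Pinterest. I can get almost all of the data, but each pin has a button named "See More" that reveals the "board name" and "author name" data.
Logic:
Button click loop code: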
moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
for moreBtn in moreButtons:
    moreBtn.click()
source_data = driver.page_source
Code to get the board names:
# Pin Length - Total Pins
total_pins = []
total_pins = driver.find_elements_by_class_name("Grid__Item")
# Pin Board Names
i = 1
while i <= len(total_pins):
    temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div[2]/h4/a[1]"
    temp = driver.find_element_by_xpath(temp_xpath)
    #pin_Board_Names.append(temp)
    print(temp.text)
    i += 1
It partially works:
Just old
Tiny House interior
SimpleLivingMama.com
Traceback (most recent call last):
  File "scrape.py", line 109, in <module>
    main()
  File "scrape.py", line 106, in main
    grab(args.url, args.fname)
  File "scrape.py", line 91, in grab
    temp = driver.find_element_by_xpath(temp_xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 393, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 966, in find_element
    'value': value})['value']
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\webdriver.py", line 320, in execute
    self.error_handler.check_response(response)
  File "C:\Users\da74\AppData\Roaming\Python\Python36\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"187","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:57743","User-Agent":"selenium/3.13.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[4]/div/div/div[2]/div[2]/h4/a[1]\", \"sessionId\": \"a8cdaa10-a2d3-11e8-86db-a3b39599a684\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a8cdaa10-a2d3-11e8-86db-a3b39599a684/element"}}
Screenshot: available via screen
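It finds 3 board names for me but then ends with an error. I tried editing the loop and the button clicks, but both seem to work. Does anyone know what is causing this, or have suggestions on what to explore?
Edit 1: The error says the element cannot be found by XPath, but the element is on the web page.
Edit 2: Added try/except to check. Here is the code: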
try:
    temp = driver.find_element_by_xpath(temp_xpath)
except:
    print('no element at pin number: ' + str(i))
Output:
Just old
Tiny House interior
SimpleLivingMama.com
no element at pin number: 4
SimpleLivingMama.com
Books for Pre-Schoolers
Stuff to Try
Baby & Toddler Milestones
Toys For Boys & Girls
House
OT
Make Extra Money
Shoes
Old photos
Crafts
for baby
There's A Book About That
Geek
Real DIY
Recycle & Repurpose
Crafts
Preschool Activities
Wild West Project
#BossMoms
no element at pin number: 24
#BossMoms
Crazy for DIY
Money Saving Tips
Painting Furniture
The home I want
screen door ideas
DIY Home
Little girl rooms
Container Home Desing
Bentley Joseph Adams
some truth bombs
New house!
Advice and Wisdom-Words
no element at pin number: 37
Advice and Wisdom-Words
House ideas
Houses
no element at pin number: 40
Houses
no element at pin number: 41
Houses
Fine Motor Activities for Kids
crafts
decorating ideas
mama
Barn Homes
For the Home
no element at pin number: 48
For the Home
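I checked the pin numbers reported as not found, but the board names are present on the web page.
Edit 3: Note that after pin number 47 it always says the element cannot be found, no matter how large the list is. I also checked that all the button XPaths exist in moreButtons, and they are valid.
Thanks in advance for your help.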
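With help from @AnkDasCo in the comments, I found the solution. There were two problems:
XPaths: the same item can appear under two different XPaths:
Notice that the last /div differs between the two: one path ends in /div[2]/h4/a[1], the other in /div/h4/a[1].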
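The idea of trying both XPath variants for a pin can be sketched as a small helper. This is only an illustration: board_name_xpaths and find_first are hypothetical names, not part of Selenium, and the lookup function is passed in so the sketch runs without a browser.

```python
# Hypothetical helpers for illustration; the names are not part of Selenium.
# The prefix is the absolute path used in the question, up to the pin index.
PIN_PREFIX = ("/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]"
              "/div/div/div/div[{i}]/div/div/div[2]")


def board_name_xpaths(i):
    """Return both XPath variants seen for the i-th pin (1-based)."""
    prefix = PIN_PREFIX.format(i=i)
    return [prefix + "/div[2]/h4/a[1]", prefix + "/div/h4/a[1]"]


def find_first(find, xpaths, not_found=Exception):
    """Call find(xpath) for each candidate in order.

    Returns the first result, or None if every candidate raised `not_found`.
    """
    for xpath in xpaths:
        try:
            return find(xpath)
        except not_found:
            continue
    return None
```

With the Selenium 3 API used in this post, a call might look like find_first(driver.find_element_by_xpath, board_name_xpaths(i), NoSuchElementException), with NoSuchElementException imported from selenium.common.exceptions. Returning None instead of raising lets the caller decide how to report a pin with no match.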
Working code:
from selenium import webdriver

driver = webdriver.PhantomJS(executable_path='phantomjs.exe')
print("Ghost Headless Driver Invoked")
# driver.implicitly_wait(5) # if element not found, wait for (seconds) before next operation
driver.get(url) # grab the url (url is the argument passed to grab() in scrape.py)
# Scrolling till the end of page
print("Started Scrolling ... ")
match = True # change to 'False' for making this work..
lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
while not match:
    lastCount = lenOfPage
    lenOfPage = driver.execute_script("window.scrollTo(0, document.body.scrollHeight);var lenOfPage=document.body.scrollHeight;return lenOfPage;")
    # stop once the page height no longer grows
    if lastCount == lenOfPage:
        match = True
source_data = driver.page_source # page source code as html
# Get all pins, number of pins collected
total_pins = []
try:
    total_pins = driver.find_elements_by_class_name("Grid__Item")
except Exception:
    print("Unable to load pins")
print("Total Pins: " + str(len(total_pins)))
# get number of 'see more' buttons collected - for error checking
moreButtons = driver.find_elements_by_xpath('//button[@data-test-id="seemoretoggle"]')
print("Dynamic Elements: " + str(len(moreButtons)))
print("Display: Dynamic Elements ... ")
# clicking all 'See More' buttons
for moreBtn in moreButtons:
    moreBtn.click()
# Pin Board Names
from selenium.common.exceptions import NoSuchElementException

print("Extracting Board Names ... ")
pin_Board_Names = []  # collected board names (defined elsewhere in the original script)
i = 1
while i <= len(total_pins):
    successful = False  # reset for each pin so a miss is actually reported
    try:
        # first variant: last step before h4 is div[2]
        temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div[2]/h4/a[1]"
        temp = driver.find_element_by_xpath(temp_xpath)
        pin_Board_Names.append(temp.text)
        # print("Board_No: " + str(i) + " > " + temp.text)
        successful = True
    except NoSuchElementException:
        try:
            # second variant: last step before h4 is a plain div
            temp_xpath = "/html/body/div[1]/div[1]/div[1]/div/div/div/div/div[1]/div/div/div/div[" + str(i) + "]/div/div/div[2]/div/h4/a[1]"
            temp = driver.find_element_by_xpath(temp_xpath)
            pin_Board_Names.append(temp.text)
            # print("Board_No: " + str(i) + " > " + temp.text)
            successful = True
        except NoSuchElementException:
            pass  # neither variant matched this pin
    if not successful:
        print("Board_No: " + str(i) + " not found!")
    i += 1
# quit driver
driver.quit()
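Since both variants end in h4/a[1], one possible alternative is to search relative to each pin element rather than building an absolute path per index. The sketch below assumes that each Grid__Item element contains its own board link as the first a under an h4; that assumption is not confirmed in the post. board_names_from_pins is a hypothetical name, and the function only relies on the Selenium 3 method find_elements_by_xpath being present on each pin.

```python
def board_names_from_pins(pins, relative_xpath=".//h4/a[1]"):
    """Collect one board name per pin by searching inside each pin element.

    `pins` is any iterable of objects that expose find_elements_by_xpath
    (for example Selenium 3 WebElement instances). The leading "." keeps the
    search inside the pin. A pin with no match contributes None, so positions
    in the result still line up with pin numbers.
    """
    names = []
    for pin in pins:
        links = pin.find_elements_by_xpath(relative_xpath)
        names.append(links[0].text if links else None)
    return names
```

A call might look like board_names_from_pins(driver.find_elements_by_class_name("Grid__Item")). Newer Selenium releases replace the find_elements_by_* helpers with find_elements(By.XPATH, ...), so the method name would need adjusting there.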