Python web scraping with Selenium on Dynamic Page - Issue with looping to next element
I am hoping someone can please help me out and put me out of my misery. I have recently started to learn Python and wanted to challenge myself with some web scraping.
Over the past couple of days I have been trying to scrape this website ( https://ebn.eu/?p=members ). On the website, I am interested in:
I have managed to get Selenium up and running, but the issue is that it keeps opening the first logo and copying the same link instead of moving to the next one. I have tried various approaches but have hit a brick wall.
My code so far:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = "/home/ed/Documents/python/chromedriver"  # location of the webdriver - amend as required
url = "https://ebn.eu/?p=members"
driver = webdriver.Chrome(PATH)
driver.get(url)
member_list = []

# Flow to open page and click on link to extract href
members = driver.find_elements_by_xpath('//*[@class="projectImage"]')  # Looking for the required class - the image which on click brings up the info
for member in members:
    print(member.text)  # to see that loop went to next iteration
    member.find_element_by_xpath('//*[@class="projectImage"]').click()
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

print(member_list)
driver.quit()
PS: I have tried changing the member.find call to:
member.find_element_by_xpath('.//*[@class="projectImage"]').click()
But then I get "Unable to find element".
Any help is very much appreciated.
Thanks
If you study the HTML of the page, each logo has an onclick script that triggers the JS and renders the pop-up. You can make use of it. The onclick script is on the child img element. So your logic should be: (1) get the child elements, (2) go to the first child element (which is always the img in your case), (3) get the onclick script text, (4) execute the script.
import time  # needed for time.sleep below

for member in members:
    print(member.text)  # to see that loop went to next iteration
    # member.find_element_by_xpath('//*[@class="projectImage"]').click()
    # Begin of modification
    child_elems = member.find_elements_by_css_selector("*")  # Get the child elems
    onclick_script = child_elems[0].get_attribute('onclick')  # Get the img's onclick value
    driver.execute_script(onclick_script)  # Execute the JS
    time.sleep(5)  # Wait for some time
    # End of modification
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

print(member_list)
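As a side note on why the original loop kept opening the first logo: in Selenium, an XPath that starts with // is evaluated against the whole document even when it is called on an element, so every iteration finds the first projectImage again. The .// form searches only below the current element, and since each member already is the projectImage element, nothing underneath it matches. The sketch below reproduces both effects offline with the standard library's ElementTree as a stand-in for the browser. The markup and the showMember() name are hypothetical, made up for illustration, and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: three logo containers, each wrapping an
# <img> whose onclick opens the pop-up. showMember() is an invented name.
PAGE = """
<body>
  <div class="projectImage"><img onclick="showMember(1)"/></div>
  <div class="projectImage"><img onclick="showMember(2)"/></div>
  <div class="projectImage"><img onclick="showMember(3)"/></div>
</body>
""".strip()

root = ET.fromstring(PAGE)
members = root.findall('.//*[@class="projectImage"]')

# ElementTree does not accept absolute paths on an element, so a search from
# the root stands in for Selenium's document-wide '//' lookup.
document_wide = [root.find('.//*[@class="projectImage"]') for _ in members]
print(all(hit is members[0] for hit in document_wide))

# './/' only looks at descendants of the current element. Each member IS the
# projectImage element, so no descendant carries that class.
relative = [member.findall('.//*[@class="projectImage"]') for member in members]
print(relative)

# Reading the onclick of the first child, as in the modified loop above,
# yields a different script for every member.
scripts = [member[0].get("onclick") for member in members]
print(scripts)
```

The document-wide search returns the first container on every iteration, the relative search returns nothing, and the child onclick values differ per member, which is why executing them moves through the list.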
You need to import the time module. I prefer time.sleep over wait.until; it's easier to use when you are starting out with web scraping.
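For comparison, the idea behind an explicit wait is polling: check a condition repeatedly and return as soon as it holds, so a fast page does not cost the full five seconds and a slow one still gets up to the timeout. The sketch below is a simplified stand-in written with the standard library only. It is not Selenium's implementation, and wait_until, popup_link and the example URL are hypothetical names used for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Stand-in for a pop-up link that becomes available shortly after the click.
ready_at = time.monotonic() + 0.3


def popup_link():
    # Hypothetical URL, for illustration only.
    return "https://example.com/profile" if time.monotonic() >= ready_at else None


start = time.monotonic()
href = wait_until(popup_link, timeout=5.0, poll=0.1)
elapsed = time.monotonic() - start
print(href, round(elapsed, 1))
```

Here the call returns shortly after the link appears instead of sleeping for a fixed interval. A fixed time.sleep remains the simpler option while you are learning.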