
Python web scraping with Selenium on Dynamic Page - Issue with looping to next element

I am hoping someone can please help me out and put me out of my misery. I have recently started to learn Python and wanted to challenge myself with some web-scraping.

Over the past couple of days I have been trying to web-scrape this website (https://ebn.eu/?p=members). On the website, I am interested in:

  1. Clicking on each logo image, which brings up a pop-up
  2. Scraping from the pop-up the link behind the text "VIEW FULL PROFILE"
  3. Moving to the next logo and doing the same for each

I have managed to get Selenium up and running, but the issue is that it keeps opening the first logo and copying the same link instead of moving to the next one. I have tried various approaches but have hit a brick wall.

My code so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = "/home/ed/Documents/python/chromedriver" # location of the webdriver - amend as required

url = "https://ebn.eu/?p=members"

driver = webdriver.Chrome(PATH)
driver.get(url)

member_list = []

# Flow to open page and click on link to extract href

members = driver.find_elements_by_xpath('//*[@class="projectImage"]') # Looking for the required class - the image which on click brings up the info

for member in members:
    print(member.text) # to see that loop went to next iteration
    member.find_element_by_xpath('//*[@class="projectImage"]').click()
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

    print(member_list)
driver.quit()

PS: I have tried changing the member.find to:

member.find_element_by_xpath('.//*[@class="projectImage"]').click()

But then I get "Unable to find element".
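Both symptoms are consistent with how XPath scopes a search. An expression starting with `//` is evaluated from the document root even when it is called on an element, so it returns the first match on the page every time. One starting with `.//` only looks at descendants of the current element, and the logo container is not its own descendant. The sketch below illustrates this with the standard library on hypothetical markup (the real page structure is an assumption here). ElementTree rejects absolute paths on an element, so the document-wide search is emulated by searching from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two logo containers, each wrapping an <img>.
PAGE = """
<body>
  <div class="projectImage"><img alt="first"/></div>
  <div class="projectImage"><img alt="second"/></div>
</body>
"""

root = ET.fromstring(PAGE)
members = root.findall('.//*[@class="projectImage"]')

results = []
for member in members:
    # In full XPath, '//...' starts at the document root, so each iteration
    # gets the same first match (emulated here by searching from `root`).
    from_root = root.find('.//*[@class="projectImage"]')
    # './/...' only searches below `member`; the container itself is not
    # among its own descendants, so nothing is found.
    from_member = member.find('.//*[@class="projectImage"]')
    results.append((member[0].get("alt"), from_root[0].get("alt"), from_member))

for row in results:
    print(row)
```

Since each `member` in the loop is already the element with class `projectImage`, calling `member.click()` directly may be enough, although that was not tested against the site.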

Any help is very much appreciated.

Thanks

If you study the HTML of the page, each logo has an onclick script that triggers the JS and renders the pop-up. You can make use of it. The onclick script is on the child element img. So your logic should be: (1) get the child elements, (2) go to the first child element (which is always the img in your case), (3) get the onclick script text, (4) execute the script.


for member in members:
    print(member.text) # to see that loop went to next iteration
    # member.find_element_by_xpath('//*[@class="projectImage"]').click()
    
    # Begin modification
    child_elems = member.find_elements_by_css_selector("*")   # all elements nested inside this logo container
    onclick_script = child_elems[0].get_attribute('onclick')  # the first one is the img; read its onclick value
    driver.execute_script(onclick_script)                     # run that JS to open the pop-up
    time.sleep(5)                                             # give the pop-up time to render
    # End modification
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

    print(member_list)

You need to import the time module. I prefer time.sleep over wait.until; it is easier to use when you are starting out with web scraping.
