Python web scraping with Selenium on a dynamic page - problem looping to the next element

Question

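I hope someone can help me and put me out of my misery. I recently started learning Python and wanted to challenge myself with some web scraping.

For the past few days I have been trying to scrape this website ( https://ebn.eu/?p=members ). On the site, what I want to do is:

Click each logo image, which brings up a popup
Scrape the link behind the text "VIEW FULL PROFILE" from the popup
Move to the next logo and do the same for each one

I have managed to get Selenium up and running, but the problem is that it keeps opening the first logo and copying the same link instead of moving on to the next one. I have tried various approaches but have hit a brick wall.

My code so far: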

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

PATH = "/home/ed/Documents/python/chromedriver" # location of the webdriver - amend as required

url = "https://ebn.eu/?p=members"

driver = webdriver.Chrome(PATH)
driver.get(url)

member_list = []

# Flow to open page and click on link to extract href

members = driver.find_elements_by_xpath('//*[@class="projectImage"]') # Looking for the required class - the image which on click brings up the info

for member in members:
    print(member.text) # to see that loop went to next iteration
    member.find_element_by_xpath('//*[@class="projectImage"]').click()
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

    print(member_list)
driver.quit()

PS: I tried changing member.find to:

member.find_element_by_xpath('.//*[@class="projectImage"]').click() but then I get "unable to locate element".

Any help is much appreciated.

Thanks

Answer 1

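If you study the page's HTML, the logos have an onclick script, which triggers the JS that renders the popup. You can take advantage of that. The onclick script is on the child element img. So your logic should be: (1) get the child elements, (2) go to the first child element (in your case always the img), (3) get the onclick script text, (4) execute the script.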
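The four steps can be tried offline before involving a browser. The sketch below uses Python's standard xml.etree.ElementTree on a made-up fragment; the markup, the SAMPLE constant, the showMember function name and both helper functions are assumptions for illustration, not the real page's structure. It also hints at why the attempts in the question behaved as they did: in Selenium an XPath beginning with // is evaluated against the whole document even when it is called on an element, so it returns the first logo every time, while .// searches only below the current element, which (if the markup resembles this fragment) contains no further projectImage element.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real page's structure may differ.
SAMPLE = """
<div id="members">
  <div class="projectImage"><img src="a.png" onclick="showMember(1)"/></div>
  <div class="projectImage"><img src="b.png" onclick="showMember(2)"/></div>
  <div class="projectImage"><img src="c.png" onclick="showMember(3)"/></div>
</div>
"""

MEMBER_PATH = ".//*[@class='projectImage']"


def onclick_scripts(markup):
    """Return the onclick text of the first child of every projectImage element."""
    root = ET.fromstring(markup.strip())
    scripts = []
    for member in root.findall(MEMBER_PATH):    # step 1: one element per logo
        first_child = member[0]                 # step 2: first child (the img here)
        scripts.append(first_child.get("onclick"))  # step 3: read the script text
    return scripts                              # step 4 (executing it) needs a browser


def nested_matches(markup):
    """For each projectImage element, count projectImage elements below it."""
    root = ET.fromstring(markup.strip())
    return [len(member.findall(MEMBER_PATH)) for member in root.findall(MEMBER_PATH)]
```

With this fragment, onclick_scripts(SAMPLE) yields one distinct script per logo, which is what the loop needs, and nested_matches(SAMPLE) yields zero for every logo, which is consistent with the "unable to locate element" error reported for the .// variant.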

Child elements

import time  # needed for time.sleep below

for member in members:
    print(member.text)  # to see that the loop went to the next iteration
    # member.find_element_by_xpath('//*[@class="projectImage"]').click()

    # Begin of modification
    child_elems = member.find_elements_by_css_selector("*")   # get the child elements
    onclick_script = child_elems[0].get_attribute('onclick')  # get the img's onclick value
    driver.execute_script(onclick_script)                     # execute the JS
    time.sleep(5)                                             # wait for some time
    # End of modification
    wait = WebDriverWait(driver, 10)
    element = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'VIEW FULL PROFILE')))
    links = driver.find_element_by_partial_link_text("VIEW FULL PROFILE")
    href = links.get_attribute("href")
    member_list.append(href)
    member.find_element_by_xpath("/html/body/div[5]/div[1]/button").click()

    print(member_list)

You need to import the time module for this. I prefer time.sleep over wait.until; it is easier to use when you are starting out with web scraping.
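For comparison, here is a minimal sketch of what an explicit wait does conceptually. time.sleep(5) always pauses for the full five seconds, whereas a polling wait returns as soon as the condition holds and fails only when the timeout runs out. poll_until is a hypothetical helper written for this illustration and is not part of Selenium; WebDriverWait(...).until works along similar lines, polling its condition and raising a timeout exception if it never succeeds.

```python
import time


def poll_until(condition, timeout=10.0, interval=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Hypothetical helper for illustration only; not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result              # ready: stop waiting immediately
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)           # short pause before checking again
```

A condition here is any zero-argument callable, for example a function that looks up the popup link and returns it once it exists. The fixed sleep is simpler to reason about, while the polling version usually finishes sooner and copes with popups that take longer than expected.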

Solution 1
1 accepted 2021-04-02 05:00:22
