
How to get all the hyperlinks from child elements of a specific div container with multiple pages (pagination) using Selenium Python

I am trying to scrape the links inside the href attributes of the child elements under the parent id='search-properties' on this site. I first tried locating the container with find_elements_by_id and then the links with find_elements_by_css_selector, but I kept getting AttributeError: 'list' object has no attribute 'find_elements_by_css_selectors'. So I tried find_elements_by_tag_name as well as find_elements_by_xpath, but actually scraping the detail links out of them didn't work for me either. After a lot of looking around, I finally found this code:

from selenium import webdriver

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe"  # keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH)
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
driver.implicitly_wait(10)

# collect every anchor on the page and print its href
house = driver.find_elements_by_tag_name("a")
for lnk in house:
    print(lnk.get_attribute('href'))

The problem with this code is that it scrapes every link on the page, which means it also collects links that are absolutely unnecessary, like the javascript:void ones shown in this image. Finally, for pagination I tried following this answer, but I ran into an infinite loop, so I had to remove the pagination code. In summary, I am trying to get the links under id='search-properties' across multiple pages.
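For what it's worth, the AttributeError above comes from calling an element-level method on the list that find_elements_by_id returns; the lookup has to run on a single WebElement. A minimal sketch in the same Selenium 3 style API as the question:

    container = driver.find_element_by_id("search-properties")   # one WebElement, not a list
    anchors = container.find_elements_by_css_selector("a")       # element-level search is valid here
    for a in anchors:
        print(a.get_attribute('href'))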

Try this.

    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")

    for ele in links:
        print(ele.get_attribute('href'))
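Note that the find_elements_by_* helpers were removed in Selenium 4, so on a recent install the same lookup would go through a By locator; a minimal equivalent sketch:

    from selenium.webdriver.common.by import By

    links = driver.find_elements(By.XPATH, "//div[@id = 'search-properties']/a")
    for ele in links:
        print(ele.get_attribute('href'))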

I tried this for pagination.

    from selenium import webdriver
    import time

    driver = webdriver.Chrome(executable_path="path")
    driver.implicitly_wait(10)
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page = 2

    while True:
        # bring the pagination bar into view (the '>>' link sits in #pagination-div)
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            # click the next page number; once it no longer exists we are on the last page
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    driver.quit()
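The fixed time.sleep calls work, but they either waste time or break if the page loads slowly. Assuming the page-number anchors keep their text, one possible refinement is an explicit wait for the link to become clickable:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait up to 10 s for the page-number link instead of a fixed sleep
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.LINK_TEXT, str(page)))
    ).click()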

I tried this to get the links from each page.

    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page = 2
    pagelinks = []

    # links of the 1st page
    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
    for ele in links:
        pagelinks.append(ele.get_attribute('href'))

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)  # give the new page time to load before collecting its links
            links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
            for ele in links:
                pagelinks.append(ele.get_attribute('href'))

        except Exception as e:
            print(e)
            break

    print(len(pagelinks))
    for link in pagelinks:
        print(link)

    driver.quit()
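Since listings can repeat across pages and the question already imports csv, a small follow-up sketch to de-duplicate the collected hrefs and write them out (the property_links.csv filename is just an example):

    import csv

    unique_links = list(dict.fromkeys(pagelinks))  # drop duplicates, keep order
    with open("property_links.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        writer.writerows([link] for link in unique_links)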

