
How to get all the hyperlinks from child elements of a specific div container with multiple pages (pagination) using Selenium Python

I am trying to scrape the links inside the href attributes of the child elements under the parent id='search-properties' on this site. I first tried locating the container with find_elements_by_id and then the links with find_elements_by_css_selector, but I kept getting AttributeError: 'list' object has no attribute 'find_elements_by_css_selectors'. So I tried find_elements_by_tag_name as well as find_elements_by_xpath, but actually scraping the detail links out of them didn't work for me either. After a lot of looking around, I finally found this code:

from selenium import webdriver

PATH = "C:/ProgramData/Anaconda3/scripts/chromedriver.exe"  # keep chromedriver.exe inside scripts to save hours of debugging
driver = webdriver.Chrome(PATH)
driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
driver.implicitly_wait(10)

# collect every anchor on the page and print its href
house = driver.find_elements_by_tag_name("a")
for lnk in house:
    print(lnk.get_attribute('href'))

The problem with this code is that it scrapes every link on the page, which means it also collects links that are absolutely unnecessary, like the javascript:void ones shown in this image. Finally, for pagination I tried following this answer, but I ran into an infinite loop, so I had to remove the pagination code. In summary, I am trying to get the links under id='search-properties' across multiple pages.
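For what it's worth, the AttributeError above comes from calling an element-level method on the list that find_elements_by_id returns; the lookup has to run on a single WebElement. A minimal sketch in the same Selenium 3 style API as the question:

    container = driver.find_element_by_id("search-properties")   # one WebElement, not a list
    anchors = container.find_elements_by_css_selector("a")       # element-level search is valid here
    for a in anchors:
        print(a.get_attribute('href'))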

Try this.

    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")

    for ele in links:
        print(ele.get_attribute('href'))
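Note that the find_elements_by_* helpers were removed in Selenium 4, so on a recent install the same lookup would go through a By locator; a minimal equivalent sketch:

    from selenium.webdriver.common.by import By

    links = driver.find_elements(By.XPATH, "//div[@id = 'search-properties']/a")
    for ele in links:
        print(ele.get_attribute('href'))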

I tried this for pagination.

    from selenium import webdriver
    import time

    driver = webdriver.Chrome(executable_path="path")
    driver.implicitly_wait(10)
    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page = 2

    while True:
        # bring the pagination bar into view (the '>>' link sits in #pagination-div)
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            # click the next page number; once it no longer exists we are on the last page
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)

        except Exception as e:
            print(e)
            break

    driver.quit()
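The fixed time.sleep calls work, but they either waste time or break if the page loads slowly. Assuming the page-number anchors keep their text, one possible refinement is an explicit wait for the link to become clickable:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # wait up to 10 s for the page-number link instead of a fixed sleep
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.LINK_TEXT, str(page)))
    ).click()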

I tried this to get the links from each page.

    driver.get("https://www.gharbazar.com/property/search/?_q=&_vt=1&_r=0&_pt=residential&_si=0&_srt=latest")
    page = 2
    pagelinks = []

    # links of the 1st page
    links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
    for ele in links:
        pagelinks.append(ele.get_attribute('href'))

    while True:
        nextoption = driver.find_element_by_xpath("//div[@id='pagination-div']//a[contains(text(),'>>')]")
        driver.execute_script("arguments[0].scrollIntoView(true);", nextoption)
        driver.execute_script("window.scrollBy(0,-300)")
        time.sleep(5)
        try:
            driver.find_element_by_link_text(str(page)).click()
            page += 1
            time.sleep(3)  # give the new page time to load before collecting its links
            links = driver.find_elements_by_xpath("//div[@id = 'search-properties']/a")
            for ele in links:
                pagelinks.append(ele.get_attribute('href'))

        except Exception as e:
            print(e)
            break

    print(len(pagelinks))
    for link in pagelinks:
        print(link)

    driver.quit()
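Since listings can repeat across pages and the question already imports csv, a small follow-up sketch to de-duplicate the collected hrefs and write them out (the property_links.csv filename is just an example):

    import csv

    unique_links = list(dict.fromkeys(pagelinks))  # drop duplicates, keep order
    with open("property_links.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href"])
        writer.writerows([link] for link in unique_links)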

