How to scrape links and images using Selenium

Question

I want to get the links and images of these characters.

Website link

https://opensea.io/collection/meebits?search[sortAscending]=false&search[sortBy]=FAVORITE_COUNT

XPath selection code

coll_name = driver.find_elements(By.XPATH, '//h1')
coll_desc = driver.find_elements(By.XPATH, '//div[@class="sc-1xf18x6-0 sc-1aqfqq9-0 sc-1y1ib3i-7 haVRLx dfsEJr eGsklH"]')
profiles = driver.find_element(By.XPATH, '//div[@role="grid"]/div')
for profile in profiles:
    art_name = driver.find_elements(By.XPATH, '/div[@class="sc-7qr9y8-0 sc-dw611d-1 iUvoJs fcpvjL"]')
    art_price = driver.find_elements(By.XPATH, '//div[@class="sc-7qr9y8-0 iUvoJs Price--amount"]')
    art_link = driver.find_elements(By.LINK_TEXT, '(//link/@href)[16]')

For loop code

for c in coll_name:
    collection_name.append(c.text)
    time.sleep(1)


for d in coll_desc:
    collection_desc.append(d.text)
    time.sleep(1)

for n in art_name:
    artname.append(n.text)
    time.sleep(1)

for p in art_price:
    price.append(p.text)
    time.sleep(1)
    
for l in art_link:
    link.append(n.text)
    time.sleep(1)

Please help me solve this problem.

Answer 1

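The following code scrolls the page, grabs the items in it, scrolls again, and so on. The Selenium setup is for Linux; you only need to pay attention to the imports and to the code after the driver/browser is defined: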

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import time as t
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")


webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

df_list = []
url = 'https://opensea.io/collection/meebits?search%5BsortAscending%5D=false&search%5BsortBy%5D=FAVORITE_COUNT'
browser.get(url)
while True:
    try:
        items = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@role='grid']/div[@role='gridcell']")))
        for item in items:
            print(item.text)
            print('______________')
    except Exception as e:
        continue
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(5)

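A few notes:

Pay attention to the imports, and to how the code waits for elements to load in the page before trying to locate them.
Locate the specific details you need for each item (note: some items have incomplete details, so wrap each item's detail lookups in try/except), build a tuple with all of an item's elements, and append it to the df_list initialized at the start. Once you have all the elements, you can convert the list of tuples into a dataframe.
The code as written is an infinite loop; you need to write a condition that breaks out of the loop once there are no more elements to load.
If you have trouble understanding anything I wrote above, review the basics of Python and Selenium 4.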
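
The per-item extraction described in the notes could look roughly like the sketch below. It is only an illustration: the helper names (first_attr, extract_row, collect_rows) are hypothetical, and the relative XPaths ".//a" and ".//img" assume that each grid cell contains an anchor with an href and an image with a src, which I have not verified against the live page. The locator string "xpath" has the same value as Selenium's By.XPATH and is written as a literal only to keep the sketch free of imports.

```python
XPATH = "xpath"  # same string value as selenium's By.XPATH


def first_attr(item, xpath, attr):
    """Return an attribute of the first match under `item`, or None if absent."""
    try:
        # The leading "." keeps the search inside this grid cell only.
        return item.find_element(XPATH, xpath).get_attribute(attr)
    except Exception:  # in real code, narrow this to NoSuchElementException
        return None


def extract_row(item):
    """Build one (text, link, image) tuple from a grid cell WebElement."""
    link = first_attr(item, ".//a", "href")    # assumed: cell wraps an <a href=...>
    image = first_attr(item, ".//img", "src")  # assumed: cell contains an <img src=...>
    return (item.text, link, image)


def collect_rows(items, df_list):
    """Append one tuple per grid cell to df_list and return it."""
    for item in items:
        df_list.append(extract_row(item))
    return df_list
```

Inside the scroll loop you would call collect_rows(items, df_list) instead of printing item.text. Once the loop ends, pd.DataFrame(df_list, columns=['text', 'link', 'image']) should turn the list of tuples into a dataframe; since the same cells can be seen again after each scroll, you will probably also want to drop duplicate rows.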
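
For the loop's stop condition, one common approach (not the only one) is to compare the page height before and after each scroll and stop once it no longer grows. The sketch below assumes the page height increases whenever a new batch of items loads; on a page that recycles its grid cells this may not hold, in which case counting newly seen items would be a safer signal. The function name scroll_until_exhausted and the pause and max_rounds parameters are my own, chosen for illustration.

```python
import time


def scroll_until_exhausted(browser, pause=5.0, max_rounds=50):
    """Scroll to the bottom until the page height stops growing.

    `browser` is any Selenium WebDriver; only execute_script() is used.
    Returns the number of scrolls performed.
    """
    last_height = browser.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:  # hard cap so the loop can never run forever
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give the page time to load the next batch
        new_height = browser.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return rounds
```

In the answer's code, the same height check would go at the end of the while True body, right after the sleep, with the item collection left where it is.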

Solution 1
0 2022-07-30 14:34:28
