How to scrape links and images using Selenium

I want to get the links and images of these characters.

Website Link

https://opensea.io/collection/meebits?search[sortAscending]=false&search[sortBy]=FAVORITE_COUNT

XPATH SELECTION CODE

coll_name = driver.find_elements(By.XPATH, '//h1')
coll_desc = driver.find_elements(By.XPATH, '//div[@class="sc-1xf18x6-0 sc-1aqfqq9-0 sc-1y1ib3i-7 haVRLx dfsEJr eGsklH"]')
profiles = driver.find_element(By.XPATH, '//div[@role="grid"]/div')
for profile in profiles:
    art_name = driver.find_elements(By.XPATH, '/div[@class="sc-7qr9y8-0 sc-dw611d-1 iUvoJs fcpvjL"]')
    art_price = driver.find_elements(By.XPATH, '//div[@class="sc-7qr9y8-0 iUvoJs Price--amount"]')
    art_link = driver.find_elements(By.LINK_TEXT, '(//link/@href)[16]')

FOR LOOP CODE

for c in coll_name:
    collection_name.append(c.text)
    time.sleep(1)


for d in coll_desc:
    collection_desc.append(d.text)
    time.sleep(1)

for n in art_name:
    artname.append(n.text)
    time.sleep(1)

for p in art_price:
    price.append(p.text)
    time.sleep(1)
    
for l in art_link:
    link.append(n.text)
    time.sleep(1)

PLEASE HELP ME IN SOLVING THIS ISSUE

The following code will scroll the page, get the items on it, scroll again, and so on. The Selenium setup is for Linux; you just need to pay attention to the imports, and to the code after the driver/browser is defined:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import time as t
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")


webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

df_list = []
url = 'https://opensea.io/collection/meebits?search%5BsortAscending%5D=false&search%5BsortBy%5D=FAVORITE_COUNT'
browser.get(url)
while True:
    try:
        # wait (up to 20 seconds) for the grid cells currently rendered on the page
        items = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@role='grid']/div[@role='gridcell']")))
        for item in items:
            print(item.text)
            print('______________')
    except Exception as e:
        # if the wait times out, just try again
        continue
    # scroll to the bottom so the next batch of items is loaded, then give it time to render
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(5)

A couple of notes:

  • pay attention to the imports, and to how elements are waited for to load on the page before the code tries to locate them
  • look up the specific details you need for each item (careful: some items have incomplete details, so use try/except for each of the item's details), create a tuple with all of the item's elements, and append it to the df_list initialized at the beginning of the code. Once you have collected all the elements, you can turn that list of tuples into a dataframe (see the sketch after this list)
  • the code as it is now is an infinite loop; you need to write a condition to break out of it once there are no more elements to be loaded (the sketch below shows one way to do that)
  • if you have trouble understanding anything written above, please revisit the basics of Python and Selenium 4.
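
Putting the last notes together, here is a minimal sketch of what the per-item extraction and the stop condition could look like. It reuses the imports, browser and df_list from the code above; the relative locators .//a and .//img, the column names, and the "no new cells" stop check are assumptions rather than verified details of OpenSea's current markup (a lazily rendered grid may also drop older cells from the DOM), so adjust them against the live page:

seen = 0
while True:
    items = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@role='grid']/div[@role='gridcell']")))

    for item in items[seen:]:  # only handle cells that have not been processed yet
        try:
            art_link = item.find_element(By.XPATH, ".//a").get_attribute("href")  # assumed relative locator
        except Exception:
            art_link = None
        try:
            art_img = item.find_element(By.XPATH, ".//img").get_attribute("src")  # assumed relative locator
        except Exception:
            art_img = None
        # item.text bundles name and price; a more precise relative XPath has to be read off the live page
        df_list.append((item.text, art_link, art_img))

    if len(items) == seen:  # nothing new was loaded after the last scroll: stop
        break
    seen = len(items)

    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(5)

df = pd.DataFrame(df_list, columns=['text', 'link', 'image'])
print(df.head())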
