
How to scrape links and images using Selenium

I want to get the links and images of these characters.

Website Link

https://opensea.io/collection/meebits?search[sortAscending]=false&search[sortBy]=FAVORITE_COUNT

XPATH SELECTION CODE

# `driver`, By, and time are assumed to be set up as usual for Selenium
coll_name = driver.find_elements(By.XPATH, '//h1')
coll_desc = driver.find_elements(By.XPATH, '//div[@class="sc-1xf18x6-0 sc-1aqfqq9-0 sc-1y1ib3i-7 haVRLx dfsEJr eGsklH"]')
profiles = driver.find_elements(By.XPATH, '//div[@role="grid"]/div')  # find_elements (plural), so it can be iterated below
for profile in profiles:
    # search relative to each card ('.//') instead of the whole document
    art_name = profile.find_elements(By.XPATH, './/div[@class="sc-7qr9y8-0 sc-dw611d-1 iUvoJs fcpvjL"]')
    art_price = profile.find_elements(By.XPATH, './/div[@class="sc-7qr9y8-0 iUvoJs Price--amount"]')
    art_link = profile.find_elements(By.XPATH, './/a')  # By.LINK_TEXT matches anchor text, not an XPath

FOR LOOP CODE

collection_name, collection_desc, artname, price, link = [], [], [], [], []

for c in coll_name:
    collection_name.append(c.text)
    time.sleep(1)


for d in coll_desc:
    collection_desc.append(d.text)
    time.sleep(1)

for n in art_name:
    artname.append(n.text)
    time.sleep(1)

for p in art_price:
    price.append(p.text)
    time.sleep(1)
    
for l in art_link:
    link.append(l.get_attribute('href'))  # was n.text: wrong loop variable, and the href needs get_attribute()
    time.sleep(1)

Please help me solve this issue.

The following code will scroll the page, get the items on it, scroll again, and so on. The Selenium setup is for Linux; you just need to pay attention to the imports, and to the code after the driver/browser is defined:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains
import time as t
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

chrome_options = Options()
chrome_options.add_argument("--no-sandbox")


webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
browser = webdriver.Chrome(service=webdriver_service, options=chrome_options)

df_list = []
url = 'https://opensea.io/collection/meebits?search%5BsortAscending%5D=false&search%5BsortBy%5D=FAVORITE_COUNT'
browser.get(url)
while True:
    try:
        # wait up to 20 seconds for the currently rendered grid cells to be present
        items = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@role='grid']/div[@role='gridcell']")))
        for item in items:
            print(item.text)
            print('______________')
    except Exception as e:
        continue  # the wait timed out; go back and try again
    # scroll to the bottom so the next batch of items loads, then give the page time to render
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(5)

A few notes:

  • pay attention to the imports, and to how elements are waited for on the page before you try to locate them
  • look up the specific details you need for each item (careful: some items have incomplete details, so wrap each detail in try/except), build a tuple with all of an item's elements, and append it to the df_list initialized at the beginning of the code. Once you have collected everything, you can turn that list of tuples into a dataframe; see the first sketch after this list
  • the code as it is now is an infinite loop; you need to write a condition that breaks the loop once there are no more elements to load (see the second sketch after this list)
  • if you have issues understanding anything I wrote above, please revisit the basics of Python and Selenium 4.
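
To make the second note concrete, here is a minimal sketch of what the per-item step inside the while loop could look like, reusing items, df_list, By, and pd from the code above. The relative locators (.//a, .//img, and the Price--amount class) are assumptions about the card markup, so verify them against the live page:

# sketch only: the locators below are assumptions and may need adjusting
for item in items:
    try:
        link = item.find_element(By.XPATH, './/a').get_attribute('href')
    except Exception:
        link = None  # this card does not expose the detail
    try:
        image = item.find_element(By.XPATH, './/img').get_attribute('src')
    except Exception:
        image = None
    try:
        price = item.find_element(By.XPATH, ".//div[contains(@class, 'Price--amount')]").text
    except Exception:
        price = None
    name = item.text.split('\n')[0]  # rough: first text line of the card
    df_list.append((name, price, link, image))

# after the scrolling loop ends, build the dataframe; drop_duplicates() helps
# because items are re-located (and re-appended) after every scroll
df = pd.DataFrame(df_list, columns=['name', 'price', 'link', 'image']).drop_duplicates()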
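
For the third note, one common way to end the loop is to compare the page height before and after each scroll and stop once it no longer grows; a rough sketch under the same assumptions as above:

last_height = browser.execute_script("return document.body.scrollHeight")
while True:
    items = WebDriverWait(browser, 20).until(EC.presence_of_all_elements_located((By.XPATH, "//div[@role='grid']/div[@role='gridcell']")))
    # ... collect each item's details here, as in the sketch above ...
    browser.execute_script("window.scrollTo(0,document.body.scrollHeight);")
    t.sleep(5)
    new_height = browser.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # nothing new loaded after the scroll, so we reached the end
    last_height = new_height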
