How to extract all data from a web page with a scroll using Selenium Python and different rank pages?
I am trying to read all NFTs from https://opensea.io/rankings?category=new. There are 5 ranking pages with 100 NFTs each, 500 NFTs in total.
My code:
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://opensea.io/rankings?category=new")
driver.maximize_window()
time.sleep(3)
# find_element_by_* was removed in Selenium 4.3; use find_element(By..., ...) instead
l = driver.find_element(By.XPATH, "//div[@role='list']")
# the leading "." keeps the search inside the list element
nfts = l.find_elements(By.XPATH, ".//div[@role='listitem']")
column_name = driver.find_element(By.CLASS_NAME, 'ggkQUt')
column_name = column_name.text.split('\n')
my_data = {}
for i in column_name:
    my_data[i] = []
my_data.pop('arrow_drop_down', None)  # drop the sort-icon text if it is present
print(my_data)
for nft in nfts:
    nft = nft.text.split('\n')
    for item, col in zip(nft, my_data.keys()):
        my_data[col].append(item)
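Here the nfts list contains only 16 NFTs. I know this is because not all NFTs are visible on the page at the same time. I tried to fix it but could not find an answer that solves my problem. I am new to Selenium, so any help would be appreciated.
Note: Java-based solution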
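When you open the given URL, not all 100 NFT rows are loaded at once. New NFTs appear only as you scroll down in small steps. Based on this observation, I wrote the code using the following approach:
1. Wait until at least one row's data is present on the screen (cssSelector - div[role='listitem'] div.cIYIHz span div). This ensures that some NFT data has loaded and is ready to be consumed by our script.
2. Get all rows currently in the page (cssSelector - div[role='listitem']).
3. For each row, read the name (cssSelector - div.cIYIHz>span>div), Volume (cssSelector - div.jYqxGr span.heRNcW div), etc. Store each row's data as a Map(K,V), where K = column name and V = the value under that column for the current row.
4. Add each row's map to a Set<Map<K,V>>, where each item in the set corresponds to one row's data, so a row that is read again after scrolling is not stored twice.
5. Scroll down by a small step and repeat from step 2 until the scroll position stops changing, then move on to the next ranking page.
Java code: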
package usecase;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;
import io.github.bonigarcia.wdm.WebDriverManager;
public class NFT {
    static WebDriver driver;
    static JavascriptExecutor jse;

    public static WebElement findElement(By by) {
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));
        return wait.until(ExpectedConditions.elementToBeClickable(by));
    }

    public static List<WebElement> findElements(By by) {
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));
        return wait.until(ExpectedConditions.presenceOfAllElementsLocatedBy(by));
    }

    public static WebElement findChildElement(WebElement parent, By by) {
        WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(15));
        return wait.until(ExpectedConditions.presenceOfNestedElementLocatedBy(parent, by));
    }

    public static void main(String[] args) throws InterruptedException {
        int stepSize = 400; // page scroll size in pixels
        WebDriverManager.chromedriver().setup();
        driver = new ChromeDriver();
        driver.manage().window().maximize();
        driver.get("https://opensea.io/rankings?category=new");
        Set<Map<String, String>> uniqueNFTs = new LinkedHashSet<Map<String, String>>();
        jse = (JavascriptExecutor) driver;
        // maximum number of pages to scrape is set to 2; change it as per your needs
        int totalPagesToCheck = 2, pageCounter = 1;
        do {
            long prev = -1L, curr = 0L;
            // wait for at least one row's data to be present on the screen
            findElement(By.cssSelector("div[role='listitem'] div.cIYIHz span div"));
            while (prev != curr) {
                // get all rows
                List<WebElement> rows = findElements(By.cssSelector("div[role='listitem']"));
                for (WebElement row : rows) {
                    Map<String, String> rowData = new LinkedHashMap<String, String>();
                    // fetch the collection name for the current row
                    rowData.put("name",
                            findChildElement(row, By.cssSelector("div.cIYIHz>span>div")).getText());
                    // fetch the collection volume for the current row; other columns can be read similarly
                    rowData.put("volume",
                            findChildElement(row, By.cssSelector("div.jYqxGr span.heRNcW div")).getText());
                    uniqueNFTs.add(rowData);
                }
                // scroll down in small steps (stepSize was set to 400 above; change it as per your needs)
                jse.executeScript("window.scrollBy(0," + stepSize + ")");
                prev = curr;
                curr = (Long) (jse.executeScript("return window.pageYOffset"));
            }
            try {
                findElement(By.cssSelector("i[value='arrow_forward_ios']")).click();
                pageCounter++;
            } catch (Exception e) {
                e.printStackTrace();
                break;
            }
        } while (pageCounter <= totalPagesToCheck);
        System.out.println(uniqueNFTs.size());
        uniqueNFTs.forEach(nft -> System.out.println(nft));
        driver.quit();
    }
}
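Since the question uses Python, here is a rough sketch of how the same approach might look with Selenium's Python bindings. It is not a tested port of the Java answer: the helper names collect_while_scrolling and scrape_rankings are made up for this illustration, the CSS selectors are copied unchanged from the Java code above (the hashed class names such as cIYIHz may no longer match the live page), and skipping rows that are not fully rendered is an assumption rather than something the Java code does. The scroll loop is kept separate from the browser so that it can be exercised without opening a page.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, str]


def collect_while_scrolling(
    read_rows: Callable[[], Iterable[Row]],
    scroll_by: Callable[[int], None],
    get_offset: Callable[[], int],
    step_size: int = 400,
) -> List[Row]:
    """Read the visible rows, scroll one step, and repeat until the offset stops changing."""
    # A dict keyed by the row's items keeps insertion order and drops duplicates,
    # playing the role of the LinkedHashSet<Map<...>> in the Java code.
    seen: Dict[Tuple[Tuple[str, str], ...], Row] = {}
    prev, curr = -1, 0
    while prev != curr:
        for row in read_rows():
            seen.setdefault(tuple(row.items()), row)
        scroll_by(step_size)
        prev, curr = curr, get_offset()
    return list(seen.values())


def scrape_rankings(url: str, pages: int = 2) -> List[Row]:
    """Hypothetical wiring of the loop above to a Chrome session."""
    # Imported here so collect_while_scrolling can be used without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import (
        NoSuchElementException,
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    wait = WebDriverWait(driver, 15)
    all_rows: List[Row] = []

    def read_rows() -> List[Row]:
        rows = []
        for item in driver.find_elements(By.CSS_SELECTOR, "div[role='listitem']"):
            try:
                name = item.find_element(By.CSS_SELECTOR, "div.cIYIHz>span>div").text
                volume = item.find_element(By.CSS_SELECTOR, "div.jYqxGr span.heRNcW div").text
            except (NoSuchElementException, StaleElementReferenceException):
                continue  # assumption: skip rows that are not rendered yet or were recycled
            rows.append({"name": name, "volume": volume})
        return rows

    try:
        driver.maximize_window()
        driver.get(url)
        for page in range(pages):
            # wait until at least one row's data is present
            wait.until(EC.presence_of_element_located(
                (By.CSS_SELECTOR, "div[role='listitem'] div.cIYIHz span div")))
            all_rows += collect_while_scrolling(
                read_rows,
                lambda px: driver.execute_script("window.scrollBy(0, arguments[0])", px),
                lambda: int(driver.execute_script("return window.pageYOffset")),
            )
            if page + 1 < pages:
                try:
                    wait.until(EC.element_to_be_clickable(
                        (By.CSS_SELECTOR, "i[value='arrow_forward_ios']"))).click()
                except TimeoutException:
                    break  # no "next page" button was found
    finally:
        driver.quit()
    return all_rows


if __name__ == "__main__":
    nfts = scrape_rankings("https://opensea.io/rankings?category=new", pages=2)
    print(len(nfts))
    for nft in nfts:
        print(nft)
```

Passing the three browser actions in as callables is only a design choice for this sketch; it lets the loop be checked against a fake page before it is pointed at the real site.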