
How to break while loop when scraping pages in opensea

I am trying to scrape opensea.io. I have code that can scrape all the pages, but I only need the first five pages, so I tried to break the loop, but it doesn't do so.

import time
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

def scroll():
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    num = 0
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolling, as it can change as the page loads
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')
            data.append(d)
            if screen_height * i > scroll_height:
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                scroll()
                num += 1
                if num == 5:
                    return

    
    

scroll()
print('Done ----> Opensea.io urls')

So, as you can see, I used recursion in my task. I know that combining a while loop with recursion is not a good idea, but only this way does it scrape more than one page.
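The loop can also be limited without recursion at all: count each click of the "Load more" button and `break` once the counter reaches five. A minimal sketch of that pattern, with the Selenium scrolling and clicking replaced by a stub (`load_more` and the placeholder links are hypothetical, just to make the control flow visible):

```python
MAX_PAGES = 5  # stop after the first five pages

def load_more():
    """Hypothetical stand-in for clicking OpenSea's 'Load more' button via Selenium."""
    pass

def scrape_pages():
    pages_loaded = 0
    links = []
    while True:
        # ... scroll and collect links for the current page here ...
        links.append(f"page-{pages_loaded + 1}")  # placeholder for the scraped links
        pages_loaded += 1
        if pages_loaded >= MAX_PAGES:
            break  # limit reached: exit the while loop cleanly
        load_more()
    return links

print(len(scrape_pages()))  # 5
```

Because the counter lives in the same scope as the `while` loop, `break` ends scraping immediately; there is no recursion depth to unwind.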

Add a parameter to the function: create a global variable called pages outside the function and pass it to the function. Use an if statement to check that it is less than 5, and increment it before each recursion, as shown below:

import time
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

# create a new integer outside the function to count the number of recursions
pages = 0

# pass it to the function
def scroll(pages):
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolling, as it can change as the page loads
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')
            data.append(d)
            if screen_height * i > scroll_height and pages < 5:
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                # increment it before every recursion
                pages += 1
                scroll(pages)
                return
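Stripped of the Selenium calls, the recursion-limit pattern can be sketched on its own. The `fetch_page` helper below is hypothetical, standing in for one scroll-and-collect cycle, so the counting logic is easy to verify in isolation:

```python
data = []

def fetch_page(n):
    """Hypothetical stand-in for scrolling one page and collecting its links."""
    return {'link': f'https://opensea.io/page/{n}'}

def scroll(pages=0):
    """Collect one page, then recurse until five pages have been visited."""
    data.append(fetch_page(pages + 1))
    if pages < 4:              # pages counts completed recursions: 0..4 -> 5 pages total
        scroll(pages + 1)      # increment before every recursion, as in the answer

scroll()
print(len(data))  # 5
```

The exact threshold depends on whether the counter is checked before or after incrementing; here it is tuned so exactly five pages are collected.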


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source.

 