
How to break while loop when scraping pages in opensea

I am trying to scrape opensea.io. I have code that can scrape all the pages, but I only need the first five pages, so I tried to break the loop, but it doesn't do so.

import time
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

def scroll():
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    num = 0
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolling, as it can change as the page loads
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')
            data.append(d)
            if screen_height * i > scroll_height:
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                scroll()
                num += 1
                if num == 5:
                    return

    
    

scroll()
print('Done ----> Opensea.io urls')

So, as you can see, I used recursion in my task. I know that combining a while loop with recursion is not a good idea, but only this way does it scrape more than one page.
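The loop can also be limited without recursion at all: count each click of the "Load more" button and `break` once the counter reaches five. A minimal sketch of that pattern, with the Selenium scrolling and clicking replaced by a stub (`load_more` and the placeholder links are hypothetical, just to make the control flow visible):

```python
MAX_PAGES = 5  # stop after the first five pages

def load_more():
    """Hypothetical stand-in for clicking OpenSea's 'Load more' button via Selenium."""
    pass

def scrape_pages():
    pages_loaded = 0
    links = []
    while True:
        # ... scroll and collect links for the current page here ...
        links.append(f"page-{pages_loaded + 1}")  # placeholder for the scraped links
        pages_loaded += 1
        if pages_loaded >= MAX_PAGES:
            break  # limit reached: exit the while loop cleanly
        load_more()
    return links

print(len(scrape_pages()))  # 5
```

Because the counter lives in the same scope as the `while` loop, `break` ends scraping immediately; there is no recursion depth to unwind.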

Add a parameter to the function: create a global variable called pages outside the function and pass it to the function. Use an if statement to check that it is less than 5, and increment it before each recursion, as shown below:

import time
from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

# create a new integer outside the function to count the number of recursions
pages = 0

# pass it to the function
def scroll(pages):
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolling, as it can change as the page loads
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')
            data.append(d)
            if screen_height * i > scroll_height and pages < 5:
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                # increment it before every recursion
                pages += 1
                scroll(pages)
                return
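Stripped of the Selenium calls, the recursion-limit pattern can be sketched on its own. The `fetch_page` helper below is hypothetical, standing in for one scroll-and-collect cycle, so the counting logic is easy to verify in isolation:

```python
data = []

def fetch_page(n):
    """Hypothetical stand-in for scrolling one page and collecting its links."""
    return {'link': f'https://opensea.io/page/{n}'}

def scroll(pages=0):
    """Collect one page, then recurse until five pages have been visited."""
    data.append(fetch_page(pages + 1))
    if pages < 4:              # pages counts completed recursions: 0..4 -> 5 pages total
        scroll(pages + 1)      # increment before every recursion, as in the answer

scroll()
print(len(data))  # 5
```

The exact threshold depends on whether the counter is checked before or after incrementing; here it is tuned so exactly five pages are collected.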


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source.

 