

How to break while loop when scraping pages in opensea

I'm trying to scrape opensea.io. I have code which scrapes all pages, but I only need the first five pages, so I tried to break the loop, but it doesn't work.

import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup as BS

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

def scroll():
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1

    num = 0
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')

            data.append(d)
            # once the bottom of the page is reached, click the next-page button and recurse
            if screen_height * i > scroll_height:
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                scroll()
                num += 1
                if num == 5:
                    return
            

    
    

scroll()
print('Done ----> Opensea.io urls')

So, as you can see, I use recursion for this task. I know that using a while loop and recursion at the same time is not a good idea, but only this way does it scrape more than one page.
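
For illustration, this is roughly the structure I have in mind without recursion -- an untested sketch that reuses the driver, data list, link selector and next-page button XPath set up above, and simply counts pages with a for loop (scroll_pages and max_pages are made-up names):

def scroll_pages(max_pages=5):
    """ Untested sketch: collect links from the first max_pages pages, no recursion """
    main_url = 'https://opensea.io'
    screen_height = driver.execute_script("return window.screen.height;")
    for page in range(max_pages):
        i = 1
        while True:
            # scroll one screen at a time until the bottom of the current page
            driver.execute_script("window.scrollTo(0, {h}*{i});".format(h=screen_height, i=i))
            i += 1
            time.sleep(1)
            scroll_height = driver.execute_script("return document.body.scrollHeight;")
            if screen_height * i > scroll_height:
                break
        soup = BS(driver.page_source, 'html.parser')
        links = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for a in links:
            data.append({'link': main_url + a['href']})
        if page < max_pages - 1:
            # move to the next page and let it load before scrolling again
            driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
            time.sleep(7)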

Add a parameter to the function, create a global variable called pages outside the function, and pass it to the function. Check whether it's less than 5 with an if statement and increment it before each recursion. Like below:

import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup as BS

time.sleep(2)  # Allow 2 seconds for the web page to open
data = []

path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

# create a new integer to count the number of recursions outside of the function
pages = 0

# pass it to the function
def scroll(pages):
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1

    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link': link}
            print('Done!')

            data.append(d)
            if screen_height * i > scroll_height:
                # only move on if we have not reached five pages yet
                if pages < 5:
                    driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                    time.sleep(7)
                    # increment it before every recursion
                    pages += 1
                    scroll(pages)
                return

scroll(pages)
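
One side note that is not part of the fix itself: the find_element_by_xpath helper was deprecated and later removed in Selenium 4, so on a newer Selenium install the button click would need the By locator form, roughly:

from selenium.webdriver.common.by import By

# Selenium 4 style locator for the same next-page button
driver.find_element(By.XPATH, '//*[@id="main"]/div/div[3]/button[2]').click()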
