How to break a while loop when scraping pages on OpenSea
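I am trying to scrape opensea.io. I have code that scrapes every page, but I only need the first five pages, so I tried to break out of the loop. It does not stop.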
# Imports assumed from the names used below
import time

from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2) # Allow 2 seconds for the web page to open
data = []
path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()

def scroll():
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    num = 0
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link' : link}
            print('Done!')
            data.append(d)
        if (screen_height) * i > scroll_height:
            el = driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
            time.sleep(7)
            scroll()
            num += 1
            if num == 5:
                return

scroll()
print('Done ----> Opensea.io urls')
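As you can see, I use recursion here. I know that combining a while loop with recursion is not a good idea, but it was the only way I could get it to scrape more than one page.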
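A stripped-down sketch may help show why the `num` counter in the code above never stops the scraper. It involves no browser and only mimics the control flow: `crawl_local_counter` and the fake site of `total_pages` pages are hypothetical names introduced for illustration. Because `num` is created afresh in every call and is incremented only after the recursive call has returned, it never reaches 5.

```python
def crawl_local_counter(total_pages, page=1, visited=None):
    """Mimic the question's control flow on a fake site (hypothetical helper)."""
    if visited is None:
        visited = []
    num = 0                    # a fresh local in every call, not shared between calls
    visited.append(page)       # stands in for scraping the current page
    if page < total_pages:     # stands in for "a next-page button exists"
        crawl_local_counter(total_pages, page + 1, visited)  # recursion happens first
        num += 1               # reached only after all deeper calls have finished
        if num == 5:           # never true: num is at most 1 inside any single call
            return visited
    return visited


# All eight fake pages are visited even though the code "stops at 5".
print(crawl_local_counter(8))  # [1, 2, 3, 4, 5, 6, 7, 8]
```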
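Add a parameter to the function, create a global variable named pages outside the function, and pass it in. Use an if statement to check whether it is less than 5, and increment it before each recursive call, like this: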
# Imports assumed from the names used below
import time

from bs4 import BeautifulSoup as BS
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

time.sleep(2) # Allow 2 seconds for the web page to open
data = []
path = ChromeDriverManager().install()
url = 'https://opensea.io/rankings?sortBy=seven_day_volume'
driver = webdriver.Chrome(path)
driver.get(url)
start = time.time()
# create an integer outside of the function to count pages; the first page is already open
pages = 1

# pass it to the function
def scroll(pages):
    """ Get urls from Opensea.io """
    global data
    scroll_pause_time = 1
    screen_height = driver.execute_script("return window.screen.height;")
    i = 1
    while True:
        driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
        i += 1
        time.sleep(scroll_pause_time)
        main_url = 'https://opensea.io'
        # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
        scroll_height = driver.execute_script("return document.body.scrollHeight;")
        soup = BS(driver.page_source, 'html.parser')
        divs = soup.find_all('a', class_='sc-1pie21o-0 elyzfO sc-1xf18x6-0 sc-1twd32i-0 sc-1idymv7-0 dGptxx kKpYwv iLNufV fresnel-lessThan-xl')
        for items in divs:
            link = main_url + items['href']
            print(link)
            d = {'link' : link}
            print('Done!')
            data.append(d)
        if (screen_height) * i > scroll_height:
            # bottom of the current page: move on only while fewer than 5 pages are done
            if pages < 5:
                # Selenium 3 style call; Selenium 4 uses driver.find_element(By.XPATH, ...)
                driver.find_element_by_xpath('//*[@id="main"]/div/div[3]/button[2]').click()
                time.sleep(7)
                # increment it before every recursion
                pages += 1
                scroll(pages)
            # return in both cases so the while loop ends once the page limit is reached
            return

scroll(pages)
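The same five-page limit can also be expressed without recursion. The sketch below is one possible shape, not the method used in the answer above. It assumes two hypothetical callables: `read_links`, which returns the links visible on the current page, and `go_to_next_page`, which moves to the next page and returns False when there is none. It also drops duplicate links, since re-parsing the page source after every scroll step finds the same anchors repeatedly.

```python
from typing import Callable, Iterable, List


def collect_links(
    read_links: Callable[[], Iterable[str]],
    go_to_next_page: Callable[[], bool],
    max_pages: int = 5,
) -> List[str]:
    """Collect unique links from at most `max_pages` pages (hypothetical helper)."""
    seen = set()
    links = []
    for page_number in range(1, max_pages + 1):
        # Keep each link once, in the order it was first seen.
        for link in read_links():
            if link not in seen:
                seen.add(link)
                links.append(link)
        # Do not advance past the last wanted page; stop early if no next page exists.
        if page_number == max_pages or not go_to_next_page():
            break
    return links
```

With Selenium, `read_links` would wrap the scroll-and-parse step and `go_to_next_page` would wrap the click on the next-page button. Both are left out here so that the loop can be run without a browser.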