How to crawl a website more slowly in Python with Jupyter Notebook?
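My current Python script scrapes the website at a rate of about 2 pages per second. I want to slow it down, for example to 25 seconds per page. How can I do that?
I tried the following Python script.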
# Dependencies
from bs4 import BeautifulSoup
import requests
import pandas as pd

# Testing
linked = 'https://www.zillow.com/homes/for_sale/San-Francisco-CA/fsba,fsbo,fore,new_lt/house_type/20330_rid/globalrelevanceex_sort/37.859675,-122.285557,37.690612,-122.580815_rect/11_zm/{}_p/0_mmm/'

# range(1, 2) yields only page 1; widen the range to crawl more pages
for link in [linked.format(page) for page in range(1, 2)]:
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    headers = {'User-Agent': user_agent}
    response = requests.get(link, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
What should I add to the script to make the web scraping slower?
Just use time.sleep:
import requests
import pandas as pd
from time import sleep
from bs4 import BeautifulSoup

linked = 'https://www.zillow.com/homes/for_sale/San-Francisco-CA/fsba,fsbo,fore,new_lt/house_type/20330_rid/globalrelevanceex_sort/37.859675,-122.285557,37.690612,-122.580815_rect/11_zm/{}_p/0_mmm/'

for link in [linked.format(page) for page in range(1, 2)]:
    # Pause for 25 seconds before each request
    sleep(25.0)
    user_agent = 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    headers = {'User-Agent': user_agent}
    response = requests.get(link, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
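The loop above sleeps before every request, including the very first one. If the wait is only meant to separate consecutive requests, one possible variation is to move the pause into a small generator. This is just a sketch: the name `throttled` and its `sleep` parameter are hypothetical helpers made up for illustration, not part of requests or BeautifulSoup.

```python
import time


def throttled(items, delay_seconds, sleep=time.sleep):
    """Yield items one by one, pausing delay_seconds between consecutive items.

    No pause happens before the first item. The sleep argument is a
    hypothetical hook so the pause can be replaced (for example in a test).
    """
    first = True
    for item in items:
        if not first:
            sleep(delay_seconds)
        first = False
        yield item
```

With this helper, the loop header would become something like `for link in throttled([linked.format(page) for page in range(1, 2)], 25.0):`, and the `sleep(25.0)` line inside the loop body could be dropped.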