How to create a for loop when scraping multiple pages of a URL?
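I would like to create a for loop to scrape a URL that has multiple pages. I have found some examples of this, but my code requires authentication, so I am not sharing the actual URL. Instead I have put in an example URL that shows the same key identifier, "currentPage=1".
So for this example, page i would be currentPage=i, where i is 1, 2, 3, 4, ...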
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry


def requests_retry_session(retries=10,
                           backoff_factor=0.3,
                           status_forcelist=(500, 502, 503, 504),
                           session=None):
    session = session or requests.Session()
    retry = Retry(total=retries,
                  read=retries,
                  connect=retries,
                  backoff_factor=backoff_factor,
                  status_forcelist=status_forcelist)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
import io
import urllib3
import pandas as pd
from requests_kerberos import OPTIONAL, HTTPKerberosAuth
import mwinit2web
a = mwinit2web.get_mwinit_cookie()
urls = "https://example-url.com/ABCD/customer.currentPage=1&end"
def Scraper(url):
    urllib3.disable_warnings()
    with requests_retry_session() as req:
        resp = req.get(url,
                       timeout=30,
                       verify=False,
                       allow_redirects=True,
                       auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
                       cookies=a)
        global df
        data = pd.read_html(resp.text, flavor=None, header=0, index_col=0)
        df = pd.concat(data, sort=False)
        print(df)


s = Scraper(urls)
df
pageCount = 3  # number of pages to scrape
urlsList = []
base = "https://example-url.com/ABCD/customer.currentPage={}&end"  # the curly braces are the placeholder that str.format fills in
for x in range(1, pageCount + 1):  # pages are numbered starting from 1
    urlsList.append(base.format(x))
You can then pass the URLs in this list to your function, one at a time.
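As a rough sketch of that last step, the loop could look something like the following. The helper names `build_page_urls` and `scrape_pages` are made up for illustration and are not part of the question's code. `scrape_one` stands for any function that takes a single URL and returns its result; this assumes the question's `Scraper` is changed to `return df` rather than assigning to a global.

```python
def build_page_urls(base, page_count):
    """Return one URL per page, with pages numbered 1..page_count."""
    return [base.format(page) for page in range(1, page_count + 1)]


def scrape_pages(urls, scrape_one):
    """Call scrape_one(url) for every URL and collect the results in order.

    scrape_one is a placeholder for your own single-page scraper
    (hypothetical here); it must accept one URL and return a value.
    """
    results = []
    for url in urls:
        results.append(scrape_one(url))
    return results


# Example URL pattern taken from the question; the real one is not shared.
base = "https://example-url.com/ABCD/customer.currentPage={}&end"
urlsList = build_page_urls(base, 3)
```

If `Scraper` returns a DataFrame for each page, something like `pd.concat(scrape_pages(urlsList, Scraper), sort=False)` should then combine all pages into a single table.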