How can I loop over multiple pages of a website, using a variable in the URL, with Python and beautifulsoup4?
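I want to scrape multiple pages in a loop, using the variable {event} in the URL in request2. The code iterates over the data set "eventid" and should fetch one page per value of {event}. The problem is that I only end up with the page for the last event in "eventid" (991215), and then it stops.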
>>> import requests
>>> request1 = requests.get('https://www.odds.com.au/api/web/public/Meetings/getDataByRangeCacheable/?filter=events,regions,meetings&APIKey=65d5a3e79fcd603b3845f0dc7c2437f0&sportId=1&regionId[]=1&regionId[]=22&regionId[]=24&regionId[]=25&regionId[]=26&regionId[]=27&regionId[]=28&regionId[]=29&regionId[]=30&rangeStart=2020-02-19T16:00:00.356Z&rangeEnd=2020-02-20T15:59:59.356Z')
# Data set from request1
>>> eventid = []
>>> json1 = request1.json()
>>> for entry in json1.get('events'):  # 'entry' avoids shadowing the built-in id()
...     eventid.append(entry['id'])
>>> print(eventid)
[990607, 990111, 990594, 990614, 990608, 990112, 990595, 990615, 990609, 990113, 990114, 990115, 990116, 990117, 990118, 990119, 990324, 990325, 990326, 990327, 990295, 990286, 990328, 990318, 990296, 990287, 990329, 990319, 990297, 990288, 990330, 990320, 990311, 990298, 990289, 990331, 990321, 990312, 990299, 990290, 990322, 990313, 990300, 990291, 989959, 990323, 990314, 990301, 990292, 989960, 990315, 989822, 989961, 990316, 990303, 990293, 989962, 990317, 990304, 990294, 989963, 990305, 989964, 990306, 989965, 990307, 989966, 990308, 990309, 990310, 991142, 991143, 991144, 991145, 991146, 991232, 991211, 991218, 991147, 991233, 990583, 991212, 991219, 991148, 991234, 990584, 991213, 991220, 991235, 991149, 990585, 991214, 991221, 991236, 990586, 991215]
# Code I am having trouble with
>>> for event in eventid:
...     request2 = requests.get(f'https://www.punters.com.au/api/web/public/Odds/getOddsComparisonCacheable/?allowGet=true&APIKey=65d5a3e79fcd603b3845f0dc7c2437f0&eventId={event}&betType=FixedWin', headers={'User-Agent': 'Mozilla/5.0'})
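You are overwriting the contents of request2 on every iteration, so only the last response is left when the loop ends. You should add each result to a list instead:
events = []
for event in eventid:
    # Append every parsed response instead of rebinding a single name
    events.append(requests.get(f'https://www.punters.com.au/api/web/public/Odds/getOddsComparisonCacheable/?allowGet=true&APIKey=65d5a3e79fcd603b3845f0dc7c2437f0&eventId={event}&betType=FixedWin', headers={'User-Agent': 'Mozilla/5.0'}).json())
print(events)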
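To see the difference between rebinding a name and appending to a list without touching the network, here is a minimal sketch. `fake_get` is a hypothetical stand-in for `requests.get(...).json()` and is not part of the original code; the three ids are just a sample taken from the list in the question.

```python
def fake_get(event_id):
    """Hypothetical stand-in for requests.get(...).json(); no network involved."""
    return {"eventId": event_id}


event_ids = [990607, 990111, 991215]

# Overwriting: the name is rebound on every pass, so only the last response survives.
for event in event_ids:
    last_only = fake_get(event)

# Accumulating: each response is appended, so all of them survive the loop.
all_results = []
for event in event_ids:
    all_results.append(fake_get(event))

print(last_only)         # {'eventId': 991215}
print(len(all_results))  # 3
```

Every request in the first loop is still made; the earlier responses are simply discarded because nothing keeps a reference to them.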
The code is fine: the loop "for event in eventid:" does fetch every event. You just need to use each response inside the loop. Try the following:
for event in eventid:
    request2 = requests.get(f'https://www.punters.com.au/api/web/public/Odds/getOddsComparisonCacheable/?allowGet=true&APIKey=65d5a3e79fcd603b3845f0dc7c2437f0&eventId={event}&betType=FixedWin', headers={'User-Agent': 'Mozilla/5.0'})
    tmp_json = request2.json()
    print(tmp_json['eventId'], tmp_json['eventNameFull'])
The output will be:
990607 Ludlow Race 5 - 5234m
990111 Newcastle Race 5 - 1765m
990594 Punchestown Race 6 - 4400m
990614 Doncaster Race 5 - 5721m
990608 Ludlow Race 6 - 3512m
...
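If you want to keep all the responses and not lose the whole run when one event fails, something along these lines might help. This is only a sketch under assumptions: `collect_odds` and `make_fetcher` are hypothetical helper names, the timeout value is arbitrary, and catching every exception is a simplification. The `requests` calls used (`Session`, `params=`, `timeout=`, `raise_for_status()`) are standard, and `requests` is imported lazily so the loop can be tried with a stub.

```python
def collect_odds(event_ids, fetch):
    """Call fetch(event_id) for every id; return (results, failures) dicts."""
    results = {}
    failures = {}
    for event_id in event_ids:
        try:
            results[event_id] = fetch(event_id)
        except Exception as exc:  # simplification: record the error and keep going
            failures[event_id] = exc
    return results, failures


def make_fetcher(api_key, timeout=10):
    """Build a fetch(event_id) function backed by a shared requests.Session."""
    import requests  # lazy import: only needed when real HTTP calls are made

    session = requests.Session()
    session.headers.update({"User-Agent": "Mozilla/5.0"})
    url = "https://www.punters.com.au/api/web/public/Odds/getOddsComparisonCacheable/"

    def fetch(event_id):
        # Let requests build the query string instead of formatting it by hand
        params = {
            "allowGet": "true",
            "APIKey": api_key,
            "eventId": event_id,
            "betType": "FixedWin",
        }
        response = session.get(url, params=params, timeout=timeout)
        response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
        return response.json()

    return fetch
```

With the list from the question this could be called as `results, failures = collect_odds(eventid, make_fetcher(api_key))`, where `api_key` is the APIKey value from the URL above. `results` maps each event id to its parsed JSON, and `failures` shows which ids raised an error.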