Python requests - fetch and save images while there are pages
I am trying to fetch images from an unknown number of pages and keep saving them to a folder for as long as there are new pages to request.
The code:
    import os
    import urllib.request
    from time import sleep

    import requests

    def get_player_images_with_api():
        url = 'https://footballapi.pulselive.com/football/players?pageSize=30&compSeasons=274&altIds=true&page={page}&type=player&id=-1&compSeasonId=274'
        img_url = 'https://resources.premierleague.com/premierleague/photos/players/250x250/{player_id}.png'
        headers = {'Origin': 'https://www.premierleague.com'}
        my_path = 'images/players_250x250'
        page = 0

        while True:
            try:
                data = requests.get(url.format(page=page), headers=headers).json()

                # uncomment this to print all data:
                # print(json.dumps(data, indent=4))

                for player in data['content']:
                    print('{:<50} {}'.format(player['name']['display'], img_url.format(player_id=player['altIds']['opta'])))
                    fullfilename = os.path.join(my_path, player['name']['display'] + ".png")
                    urllib.request.urlretrieve(img_url.format(player_id=player['altIds']['opta']), fullfilename)

                sleep(2)
                page += 1
            except:
                break
But the loop stops after the first page, and only the images from the first page are saved to the folder.
However, if I comment out these lines:
    #fullfilename = os.path.join(my_path, player['name']['display'] + ".png")
    #urllib.request.urlretrieve(img_url.format(player_id=player['altIds']['opta']), fullfilename)
all of the dozens of pages are printed fine.
What am I missing?
I removed the urllib.request part and replaced it with the requests module, passing the headers= parameter to the image request as well.
Running this script goes through the pages and saves the images:
    import os
    from time import sleep

    import requests

    def get_player_images_with_api():
        url = 'https://footballapi.pulselive.com/football/players?pageSize=30&compSeasons=274&altIds=true&page={page}&type=player&id=-1&compSeasonId=274'
        img_url = 'https://resources.premierleague.com/premierleague/photos/players/250x250/{player_id}.png'
        headers = {'Origin': 'https://www.premierleague.com'}
        my_path = 'images/players_250x250'
        page = 0

        while True:
            try:
                print('Page {}...'.format(page))
                data = requests.get(url.format(page=page), headers=headers).json()

                # uncomment this to print all data:
                # print(json.dumps(data, indent=4))

                for player in data['content']:
                    pic = img_url.format(player_id=player['altIds']['opta'])
                    print('{:<50} {}'.format(player['name']['display'], pic))
                    fullfilename = os.path.join(my_path, player['name']['display'] + ".png")

                    # stream the image and write it only if the server returned it
                    r = requests.get(pic, stream=True, headers=headers)
                    if r.status_code == 200:
                        with open(fullfilename, 'wb') as f:
                            for chunk in r:
                                f.write(chunk)

                # sleep(2)
                page += 1
            except Exception as ex:
                print(ex)
                break

    get_player_images_with_api()
It prints the following and saves the images:
Page 0...
Max Aarons https://resources.premierleague.com/premierleague/photos/players/250x250/p232980.png
Abdul Rahman Baba https://resources.premierleague.com/premierleague/photos/players/250x250/p118335.png
Tammy Abraham https://resources.premierleague.com/premierleague/photos/players/250x250/p173879.png
Adam Smith https://resources.premierleague.com/premierleague/photos/players/250x250/p54469.png
Che Adams https://resources.premierleague.com/premierleague/photos/players/250x250/p200439.png
Dennis Adeniran https://resources.premierleague.com/premierleague/photos/players/250x250/p183645.png
Albert Adomah https://resources.premierleague.com/premierleague/photos/players/250x250/p49773.png
Adrián https://resources.premierleague.com/premierleague/photos/players/250x250/p60706.png
Adrien Silva https://resources.premierleague.com/premierleague/photos/players/250x250/p46483.png
Benik Afobe https://resources.premierleague.com/premierleague/photos/players/250x250/p88498.png
Sergio Agüero https://resources.premierleague.com/premierleague/photos/players/250x250/p37572.png
Daniel Agyei https://resources.premierleague.com/premierleague/photos/players/250x250/p207725.png
Soufyan Ahannach https://resources.premierleague.com/premierleague/photos/players/250x250/p134293.png
Ahmed El Mohamady https://resources.premierleague.com/premierleague/photos/players/250x250/p37339.png
Albian Ajeti https://resources.premierleague.com/premierleague/photos/players/250x250/p181008.png
Nathan Aké https://resources.premierleague.com/premierleague/photos/players/250x250/p126184.png
Alberto Moreno https://resources.premierleague.com/premierleague/photos/players/250x250/p100059.png
Marc Albrighton https://resources.premierleague.com/premierleague/photos/players/250x250/p51938.png
Toby Alderweireld https://resources.premierleague.com/premierleague/photos/players/250x250/p55605.png
Aleix García https://resources.premierleague.com/premierleague/photos/players/250x250/p178871.png
Trent Alexander-Arnold https://resources.premierleague.com/premierleague/photos/players/250x250/p169187.png
Ali Koiki https://resources.premierleague.com/premierleague/photos/players/250x250/p432793.png
Alisson https://resources.premierleague.com/premierleague/photos/players/250x250/p116535.png
Allan https://resources.premierleague.com/premierleague/photos/players/250x250/p214275.png
Miguel Almirón https://resources.premierleague.com/premierleague/photos/players/250x250/p179018.png
Marcos Alonso https://resources.premierleague.com/premierleague/photos/players/250x250/p82263.png
Steven Alzate https://resources.premierleague.com/premierleague/photos/players/250x250/p235382.png
Ibrahim Amadou https://resources.premierleague.com/premierleague/photos/players/250x250/p128348.png
Daniel Amartey https://resources.premierleague.com/premierleague/photos/players/250x250/p155569.png
Luke Amos https://resources.premierleague.com/premierleague/photos/players/250x250/p168764.png
Page 1...
Ethan Ampadu https://resources.premierleague.com/premierleague/photos/players/250x250/p199598.png
Joseph Anang https://resources.premierleague.com/premierleague/photos/players/250x250/p447879.png
... and so on.
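One thing the script above takes for granted is that the images/players_250x250 folder already exists; open() raises FileNotFoundError when it does not. A minimal sketch of a helper that creates the folder on demand is shown here. The name target_path is hypothetical and not part of the original answer:

```python
import os


def target_path(folder, display_name, ext='.png'):
    """Create folder if it is missing and return the file path for one image.

    target_path is a hypothetical helper, not something from the answer above.
    """
    # exist_ok=True makes this a no-op when the folder is already there
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, display_name + ext)
```

Inside the loop it could stand in for the os.path.join line, for example fullfilename = target_path(my_path, player['name']['display']).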
An exception is being raised in those two lines. Catch it and print it to see what is going wrong.
Change the except clause to:
    except Exception as ex:
        print(str(ex))
        # Take some action if you want.
        break
You can also catch specific exceptions by adding multiple except clauses.
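As a rough illustration of multiple except clauses, the sketch below wraps a download call and reports which kind of error occurred. It assumes the failing call is urllib.request.urlretrieve, as in the question; try_download is a hypothetical name. HTTPError is a subclass of URLError, which is itself a subclass of OSError, so the most specific clause has to come first:

```python
import urllib.error


def try_download(download, url, dest):
    """Call download(url, dest) and describe what went wrong, if anything.

    try_download is a hypothetical helper; download is any callable with the
    same (url, filename) arguments as urllib.request.urlretrieve.
    """
    try:
        download(url, dest)
    except urllib.error.HTTPError as ex:
        # the server answered, but with an error status such as 403 or 404
        return 'HTTP error {}'.format(ex.code)
    except urllib.error.URLError as ex:
        # the request did not get a proper answer (DNS failure, refused connection, ...)
        return 'network error: {}'.format(ex.reason)
    except OSError as ex:
        # anything else at the I/O level, e.g. the target folder does not exist
        return 'file error: {}'.format(ex)
    return 'ok'
```

In the question's loop this could be called as try_download(urllib.request.urlretrieve, pic, fullfilename) (with urllib.request imported) and the returned message printed, which would show whether the loop stopped because of a missing photo, a network problem, or a bad file path.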
Your code stops as soon as any exception is raised. Instead, you can skip the failed download and carry on:
    except:
        continue
NOTE: A bare except also catches KeyboardInterrupt, so pressing Ctrl + C will no longer stop the script. Catch KeyboardInterrupt separately (or use except Exception, which does not catch it) if you still want to be able to cancel.
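A sketch of the skip-and-continue idea that keeps Ctrl + C working might look like the following. It wraps only the per-item download rather than the whole page loop, because a continue around the whole while body would skip page += 1 and retry the same page forever. The names download_all and download are hypothetical:

```python
def download_all(items, download):
    """Try every item and collect the ones that failed instead of stopping.

    download_all is a hypothetical helper; download is any callable that
    fetches and saves a single item and raises on failure.
    """
    failed = []
    for item in items:
        try:
            download(item)
        except Exception as ex:
            # KeyboardInterrupt does not derive from Exception,
            # so Ctrl + C still propagates and stops the script
            print('skipping {}: {}'.format(item, ex))
            failed.append(item)
    return failed
```

Returning the failed items makes it possible to log them or retry them later, rather than silently losing them.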