

Python requests - fetch and save images while there are pages

I am trying to fetch a bunch of images from an unknown number of pages and keep saving them all to a folder for as long as there are new pages to fetch.

The code:

import os
import urllib.request
from time import sleep

import requests

def get_player_images_with_api():

    url = 'https://footballapi.pulselive.com/football/players?pageSize=30&compSeasons=274&altIds=true&page={page}&type=player&id=-1&compSeasonId=274'
    img_url = 'https://resources.premierleague.com/premierleague/photos/players/250x250/{player_id}.png'
    headers = {'Origin': 'https://www.premierleague.com'}
    my_path = 'images/players_250x250'

    page=0
    while True:
        try:
            data = requests.get(url.format(page=page), headers=headers).json()
            # uncomment this to print all data (also needs import json):
            # print(json.dumps(data, indent=4))
            for player in data['content']:
                print('{:<50} {}'.format(player['name']['display'], img_url.format(player_id=player['altIds']['opta'])))

                fullfilename = os.path.join(my_path, player['name']['display'] + ".png")
                urllib.request.urlretrieve(img_url.format(player_id=player['altIds']['opta']), fullfilename)
                sleep(2)
            page+=1
        except:
            break

But the code breaks after the first page, and only the images from the first page are saved to the folder.

However, if I comment out these lines:

#fullfilename = os.path.join(my_path, player['name']['display'] + ".png")
#urllib.request.urlretrieve(img_url.format(player_id=player['altIds']['opta']), fullfilename)

Then all of the dozens of pages print without a problem.


What am I missing?

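I removed the urllib.request part, replaced it with the requests module, and passed the same headers= parameter to the image request. The image is only written when the server answers with status 200, so an image request that fails with an error status is skipped instead of raising an exception.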

Running this script goes through the pages and saves the images:

import os
from time import sleep
import requests

def get_player_images_with_api():

    url = 'https://footballapi.pulselive.com/football/players?pageSize=30&compSeasons=274&altIds=true&page={page}&type=player&id=-1&compSeasonId=274'
    img_url = 'https://resources.premierleague.com/premierleague/photos/players/250x250/{player_id}.png'
    headers = {'Origin': 'https://www.premierleague.com'}
    my_path = 'images/players_250x250'
    # Make sure the target folder exists before writing into it
    os.makedirs(my_path, exist_ok=True)

    page = 0
    while True:
        try:
            print('Page {}...'.format(page))
            data = requests.get(url.format(page=page), headers=headers).json()
            # uncomment this to print all data (also needs import json):
            # print(json.dumps(data, indent=4))
            # Stop when a page comes back without players; otherwise the loop never ends
            if not data.get('content'):
                break
            for player in data['content']:
                pic = img_url.format(player_id=player['altIds']['opta'])
                print('{:<50} {}'.format(player['name']['display'], pic))

                fullfilename = os.path.join(my_path, player['name']['display'] + ".png")

                # Stream the image to disk; a non-200 answer (for example a missing photo)
                # is skipped instead of raising an exception that would end the loop
                with requests.get(pic, stream=True, headers=headers) as r:
                    if r.status_code == 200:
                        with open(fullfilename, 'wb') as f:
                            for chunk in r.iter_content(chunk_size=8192):
                                f.write(chunk)
                # sleep(2)
            page += 1
        except Exception as ex:
            print(ex)
            break

get_player_images_with_api()

It prints the following and saves the images:

    Page 0...
    Max Aarons                                         https://resources.premierleague.com/premierleague/photos/players/250x250/p232980.png
    Abdul Rahman Baba                                  https://resources.premierleague.com/premierleague/photos/players/250x250/p118335.png
    Tammy Abraham                                      https://resources.premierleague.com/premierleague/photos/players/250x250/p173879.png
    Adam Smith                                         https://resources.premierleague.com/premierleague/photos/players/250x250/p54469.png
    Che Adams                                          https://resources.premierleague.com/premierleague/photos/players/250x250/p200439.png
    Dennis Adeniran                                    https://resources.premierleague.com/premierleague/photos/players/250x250/p183645.png
    Albert Adomah                                      https://resources.premierleague.com/premierleague/photos/players/250x250/p49773.png
    Adrián                                             https://resources.premierleague.com/premierleague/photos/players/250x250/p60706.png
    Adrien Silva                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p46483.png
    Benik Afobe                                        https://resources.premierleague.com/premierleague/photos/players/250x250/p88498.png
    Sergio Agüero                                      https://resources.premierleague.com/premierleague/photos/players/250x250/p37572.png
    Daniel Agyei                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p207725.png
    Soufyan Ahannach                                   https://resources.premierleague.com/premierleague/photos/players/250x250/p134293.png
    Ahmed El Mohamady                                  https://resources.premierleague.com/premierleague/photos/players/250x250/p37339.png
    Albian Ajeti                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p181008.png
    Nathan Aké                                         https://resources.premierleague.com/premierleague/photos/players/250x250/p126184.png
    Alberto Moreno                                     https://resources.premierleague.com/premierleague/photos/players/250x250/p100059.png
    Marc Albrighton                                    https://resources.premierleague.com/premierleague/photos/players/250x250/p51938.png
    Toby Alderweireld                                  https://resources.premierleague.com/premierleague/photos/players/250x250/p55605.png
    Aleix García                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p178871.png
    Trent Alexander-Arnold                             https://resources.premierleague.com/premierleague/photos/players/250x250/p169187.png
    Ali Koiki                                          https://resources.premierleague.com/premierleague/photos/players/250x250/p432793.png
    Alisson                                            https://resources.premierleague.com/premierleague/photos/players/250x250/p116535.png
    Allan                                              https://resources.premierleague.com/premierleague/photos/players/250x250/p214275.png
    Miguel Almirón                                     https://resources.premierleague.com/premierleague/photos/players/250x250/p179018.png
    Marcos Alonso                                      https://resources.premierleague.com/premierleague/photos/players/250x250/p82263.png
    Steven Alzate                                      https://resources.premierleague.com/premierleague/photos/players/250x250/p235382.png
    Ibrahim Amadou                                     https://resources.premierleague.com/premierleague/photos/players/250x250/p128348.png
    Daniel Amartey                                     https://resources.premierleague.com/premierleague/photos/players/250x250/p155569.png
    Luke Amos                                          https://resources.premierleague.com/premierleague/photos/players/250x250/p168764.png
    Page 1...
    Ethan Ampadu                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p199598.png
    Joseph Anang                                       https://resources.premierleague.com/premierleague/photos/players/250x250/p447879.png

... and so on.
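The while True loop above only stops when an exception is raised or a page is empty. A minimal sketch of an alternative structure, assuming (this is not confirmed by the API documentation here) that the API returns an empty "content" list once page runs past the last page: a generator that yields players page by page and stops at the first empty page. iter_players and fetch_page are hypothetical names; fetch_page stands in for the requests.get(...).json() call.

```python
from typing import Callable, Iterator


def iter_players(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield player dicts page by page (hypothetical helper).

    fetch_page(page) is assumed to return the decoded JSON for one page.
    Iteration stops at the first page whose "content" list is missing or empty,
    which is an assumption about how the API signals the end of the results.
    """
    page = 0
    while True:
        data = fetch_page(page)
        players = data.get("content") or []
        if not players:
            return  # assumed end-of-results signal
        yield from players
        page += 1
```

In the script above it could be used as for player in iter_players(lambda p: requests.get(url.format(page=p), headers=headers).json()):, with the download code as the loop body. Keeping the paging logic separate means a failed image download can no longer be confused with running out of pages.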

One of those two lines is raising an exception, and the bare except: silently turns it into a break. Catch the exception and print it to see what is going wrong.

Change the except clause to:

except Exception as ex:
    print(str(ex))
    # Take some action if you want.
    break
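print(str(ex)) shows only the message. To also see the exception type and the exact line that failed (for example, whether it was os.path.join or urlretrieve), the standard traceback module can format the full traceback. A minimal sketch; run_and_report and risky_step are hypothetical names, with risky_step standing in for the two download lines.

```python
import traceback


def run_and_report(risky_step):
    """Run risky_step(); on failure return the formatted traceback instead of raising."""
    try:
        risky_step()
    except Exception:
        # format_exc() includes the exception type, the message and the failing line
        return traceback.format_exc()
    return None
```

For example, report = run_and_report(lambda: urllib.request.urlretrieve(pic, fullfilename)) followed by print(report) when report is not None.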

You can also catch specific exceptions by adding multiple except clauses.
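For example, a download helper could handle an HTTP error status (such as a player without a photo, which is only a guess at the cause here), other network errors, and local file errors separately, and report failure instead of ending the loop. A sketch with the hypothetical name save_image; the opener parameter defaults to urllib.request.urlretrieve and exists only so the function can be exercised without network access. The order of the clauses matters, because HTTPError is a subclass of URLError, which is a subclass of OSError.

```python
import urllib.error
import urllib.request


def save_image(url, filename, opener=urllib.request.urlretrieve):
    """Download url to filename; return True on success, False on a handled error."""
    try:
        opener(url, filename)
    except urllib.error.HTTPError as ex:
        # The server answered with an error status, e.g. 403 or 404
        print('HTTP {} for {}'.format(ex.code, url))
    except urllib.error.URLError as ex:
        # Network-level problem such as a DNS failure or a refused connection
        print('Network error for {}: {}'.format(url, ex.reason))
    except OSError as ex:
        # Local problem, e.g. the target folder does not exist
        print('Could not write {}: {}'.format(filename, ex))
    else:
        return True
    return False
```

Any other exception (for instance a KeyError from a missing 'opta' id) still propagates, so unexpected bugs are not hidden.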

Your code stops whenever there is an exception. Instead, you can wrap just the download in try/except inside the for player loop and skip that image whenever there is an error. :)

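try:
    urllib.request.urlretrieve(img_url.format(player_id=player['altIds']['opta']), fullfilename)
except:
    continue  # skip this image and move on to the next player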

NOTE: A bare except: also catches KeyboardInterrupt, so pressing Ctrl + C will no longer stop the script. To keep Ctrl + C working, catch KeyboardInterrupt in its own except clause and re-raise it, or catch Exception instead, which does not include KeyboardInterrupt.
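A minimal sketch of the first option from the note: skip failed downloads but let Ctrl + C through by re-raising KeyboardInterrupt before the broad handler. download_all and the list of zero-argument jobs are hypothetical; each job would wrap one image download.

```python
def download_all(jobs):
    """Run each zero-argument job; skip failures but still allow Ctrl + C to stop."""
    done = 0
    for job in jobs:
        try:
            job()
        except KeyboardInterrupt:
            raise  # let Ctrl + C end the script as usual
        except:  # deliberately broad, like the bare except: above
            continue  # skip this download and move on
        done += 1
    return done
```

Replacing the bare except: with except Exception: would make the KeyboardInterrupt clause unnecessary, because KeyboardInterrupt derives from BaseException rather than Exception.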
