
Downloading images using Python and urllib.request

I have multiple URLs in a CSV file and I want to pull images from them, downloading one image every second. I am able to do so with the following code. However, in some situations the image is corrupt or the URL is not serving an image, and when that happens the code stops. How could I add logic so that if a URL is not working, the script skips that particular site and continues the process?

import time
import pandas as pd
import urllib.request

starttime = time.time()

from datetime import datetime

while True:
    print("tick")
    time.sleep(1.0 - ((time.time() - starttime) % 1.0))

    def url_to_jpg(i, url, file_path):


        now = datetime.now()
        filename = 'image'+str(now)+'.jpg'

        # print("now =", now)

        full_path = '{}{}'.format(file_path, filename)
        urllib.request.urlretrieve(url, full_path)
        print('{} saved.'.format(filename))

        # return None
    FILENAME='image_urls.csv'
    FILE_PATH='images/'
    urls= pd.read_csv(FILENAME)


    for i, url in enumerate(urls.values):
        url_to_jpg(i, url[0], FILE_PATH)

The error that I get:

  File "url_imageblock.py", line 38, in <module>
    url_to_jpg(i, url[0], FILE_PATH)
  File "url_imageblock.py", line 25, in url_to_jpg
    urllib.request.urlretrieve(url, full_path)
  File "/usr/lib/python3.8/urllib/request.py", line 276, in urlretrieve
    block = fp.read(bs)
  File "/usr/lib/python3.8/http/client.py", line 459, in read
    n = self.readinto(b)
  File "/usr/lib/python3.8/http/client.py", line 503, in readinto
    n = self.fp.readinto(b)
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

I would change your code in the following way. First, add

import urllib.error

immediately after import urllib.request, and then change

    urllib.request.urlretrieve(url, full_path)
    print('{} saved.'.format(filename))

to

    try:
        urllib.request.urlretrieve(url, full_path)
    except (urllib.error.URLError, urllib.error.HTTPError):
        print('{} failure'.format(url))
    else:
        print('{} saved.'.format(filename))

If one of the mentioned errors occurs, this code will just print the URL followed by failure; if no error occurs, it prints the filename followed by saved, as before. If you want to know more about urllib-related errors, read urllib.error — Exception classes raised by urllib.request.
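
For completeness, here is a sketch of how the whole script could look with that change applied. A few things in it are my own assumptions rather than part of the question or the answer: the CSV read and the function definition are moved outside the while loop (they do not need to run every second), and ConnectionResetError is added to the except tuple, because that is the exact exception in the traceback above and it is not wrapped in a URLError by urlretrieve.

# Sketch only - assumes image_urls.csv has the URLs in its first column
# and that the images/ directory already exists.
import time
import urllib.request
import urllib.error
from datetime import datetime

import pandas as pd

FILENAME = 'image_urls.csv'
FILE_PATH = 'images/'


def url_to_jpg(i, url, file_path):
    # Build a timestamped filename and try to download the image into it.
    now = datetime.now()
    filename = 'image' + str(now) + '.jpg'
    full_path = '{}{}'.format(file_path, filename)
    try:
        urllib.request.urlretrieve(url, full_path)
    except (urllib.error.URLError, urllib.error.HTTPError, ConnectionResetError):
        # Skip this URL and keep going instead of stopping the whole loop.
        print('{} failure'.format(url))
    else:
        print('{} saved.'.format(filename))


starttime = time.time()
urls = pd.read_csv(FILENAME)  # read the URL list once, before the loop

while True:
    print("tick")
    # Sleep so that each pass starts roughly on a one-second boundary.
    time.sleep(1.0 - ((time.time() - starttime) % 1.0))
    for i, url in enumerate(urls.values):
        url_to_jpg(i, url[0], FILE_PATH)

If the CSV file is updated while the script is running, the pd.read_csv call would have to stay inside the loop, as in your original code.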
