
Why does my urllib.request call raise an HTTP Error 403?

I am trying to write a Python program that downloads a series of product pictures from a site. The site stores its images under URLs of the form https://www.sitename.com/XYZabcde, where XYZ is a three-letter code for the product's brand and abcde is a five-digit number between 00000 and 30000. Here is my code:

import urllib.request

def down(i, inp):
    full_path = 'images/image-{}.jpg'.format(i)
    url = "https://www.sitename.com/{}{}.jpg".format(inp,i)
    urllib.request.urlretrieve(url, full_path)

    print("saved")
    return None

inp = input("brand :" )

i = 20100

while i <= 20105:
    x = str(i)
    y = x.zfill(5)
    z = "https://www.sitename.com/{}{}.jpg".format(inp,y)
    print(z)
    down(y, inp)
    i += 1

With this code I can successfully download pictures that I know exist; for example, brand RVL from 20100 to 20105 downloads those six pictures. However, when I widen the while loop to include URLs that may not point to an image, I get this error:

Traceback (most recent call last):
  File "c:/Users/euan/Desktop/university/programming/Python/parser/test - Copy.py", line 20, in <module>
    down(y, inp)
  File "c:/Users/euan/Desktop/university/programming/Python/parser/test - Copy.py", line 6, in down
    urllib.request.urlretrieve(url, full_path)
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 247, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 531, in open
    response = meth(req, response)
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 640, in http_response
    response = self.parent.error(
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 569, in error
    return self._call_chain(*args)
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "C:\Users\euan\AppData\Local\Programs\Python\Python38\lib\urllib\request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

What can I do to detect and skip any URL that would produce this error?

You cannot know in advance which URLs you don't have access to, but you can wrap the download in a try-except:

import urllib.request, urllib.error

...

def down(i, inp):
    full_path = 'images/image-{}.jpg'.format(i)
    url = "https://www.sitename.com/{}{}.jpg".format(inp,i)
    try:
        urllib.request.urlretrieve(url, full_path)
        print("saved")
    except urllib.error.HTTPError as e:
        print("failed:", e)

    return None

With that change, the program just prints a message such as "failed: HTTP Error 403: Forbidden" whenever the server responds with an HTTP error, and then continues with the next URL.
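
For completeness, here is a rough sketch of how the whole loop might look with the same try-except folded in, recording which numbers were saved and which were skipped. The function name download_range and its fetch parameter are made up for this illustration (fetch defaults to urllib.request.urlretrieve and only exists so the loop can be tried without hitting the real site); the URL and file name patterns are copied from the question.

```python
import urllib.error
import urllib.request


def download_range(brand, start, stop, fetch=urllib.request.urlretrieve):
    """Try every product number from start to stop (inclusive).

    Returns two lists: the codes that were saved, and (code, status)
    pairs for the codes the server refused with an HTTP error.
    `fetch` is a hypothetical hook, not part of urllib.
    """
    saved, skipped = [], []
    for number in range(start, stop + 1):
        code = str(number).zfill(5)  # pad to five digits, e.g. 00042
        url = "https://www.sitename.com/{}{}.jpg".format(brand, code)
        full_path = "images/image-{}.jpg".format(code)
        try:
            fetch(url, full_path)
        except urllib.error.HTTPError as e:
            # e.code is the numeric HTTP status (403, 404, ...)
            skipped.append((code, e.code))
        else:
            saved.append(code)
    return saved, skipped
```

Calling download_range("RVL", 20100, 20105) would then attempt the six pictures from the question and return both lists instead of stopping at the first 403. Note that the images directory has to exist beforehand, and that only HTTPError is caught here, so other failures (for instance a dropped connection) would still raise.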

