I have a problem: I need to find broken picture URLs. This is my script:
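import csv

import requests
from termcolor import colored  # assumption: colored() comes from the termcolor package


# The original snippet ends with "return mapa", so it is wrapped in a function here.
# The function name is an assumption; the def line was not part of the post.
def find_missing_pictures(nazwa_pliku):
    with open(nazwa_pliku) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=';')
        count = 0  # number of missing pictures
        mapa = []  # one record per missing picture
        id = 1     # running counter for pictures that were found
        # Skip the two header rows.
        next(csv_reader)
        next(csv_reader)
        for row in csv_reader:
            if row[1] != "":
                ID = row[0]
                NUMBER = row[1]
                PICTURES = row[2].split('|')
                for url in PICTURES:
                    url = "https://sw67383.mywebshop.io/upload_dir/shop/" + url
                    result = requests.get(url, stream=True)
                    if result.status_code != 200:
                        print(colored("Brak: ", "red"), url)  # "Brak" = missing
                        object = {
                            "PRODUCT_ID": ID,
                            "NUMBER": NUMBER,
                            "PHOTO": url,
                        }
                        count += 1
                        mapa.append(object)
                    else:
                        # "Poprawny" = correct
                        print(colored(str(id) + " Poprawny: ", "green"), url)
                        id += 1
        # "Liczba Brakujących zdjęć" = number of missing pictures.
        # format() has to be called on the string, not on the result of print().
        print(colored("Liczba Brakujących zdjęć: ", "yellow") + "{}/{}".format(count, id))
        return mapa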
I read the picture paths from a CSV file and request each URL, but sometimes I get a connection error and I don't know why. Maybe it is my internet connection, or maybe the server.
This is the error I get:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='sw67383.mywebshop.io', port=443): Max retries exceeded with url: /upload_dir/shop/maxtone/MAXTON_4306_4.jpg (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object.....
What can I do to avoid this problem?
I need to check 3000 picture URLs, and in the future many more.
EDIT: I changed it like this:
for row in csv_reader:
    if row[1] != "":
        ID = row[0]
        NUMBER = row[1]
        PICTURES = row[2].split('|')
        for url in PICTURES:
            url = "https://sw67383.mywebshop.io/upload_dir/shop/" + url
            try:
                result = requests.get(url, stream=True)
                if result.status_code != 200:
                    print(colored("Brak: ", "red"), url)  # "Brak" = missing
                    object = {
                        "PRODUCT_ID": ID,
                        "NUMBER": NUMBER,
                        "PHOTO": url,
                    }
                    count += 1
                    mapa.append(object)
                else:
                    # "Poprawny" = correct
                    print(colored(str(id) + " Poprawny: ", "green"), url)
                    id += 1
            except requests.ConnectionError:
                # "Problem z połączeniem z adresem" = connection problem with address
                print("Problem z połączeniem z adresem: {} ".format(url))
Now I can tell when a request fails with a connection problem ("time out"), but that does not tell me whether the picture link itself is bad (404). So maybe I should save these URLs to the object too, and then manually verify whether each one is correct or wrong?
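One way to record both cases could look like the sketch below. It is only an illustration: the helper names check_picture and make_record, the "CONNECTION_ERROR" label and the 10 second timeout are assumptions, not part of the original script. The idea is to keep "the server answered with something other than 200" separate from "the server could not be reached", so that only the second group needs a manual re-check.

```python
import requests


def check_picture(session, url, timeout=10):
    """Classify one URL as "OK", "BRAK" (missing) or "CONNECTION_ERROR"."""
    try:
        result = session.get(url, stream=True, timeout=timeout)
    except (requests.ConnectionError, requests.Timeout):
        # Network-level failure: this says nothing about whether the link is valid.
        return "CONNECTION_ERROR"
    try:
        return "OK" if result.status_code == 200 else "BRAK"
    finally:
        result.close()  # with stream=True the connection stays open until closed


def make_record(product_id, number, url, status):
    """Build one report entry; COMMUNICATE holds the status label."""
    return {
        "PRODUCT_ID": product_id,
        "NUMBER": number,
        "PHOTO": url,
        "COMMUNICATE": status,
    }
```

Inside the loop this would be used roughly as status = check_picture(session, url), followed by mapa.append(make_record(ID, NUMBER, url, status)) whenever status is not "OK". Here session can be a requests.Session(), or the requests module itself, since both expose get().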
OK, I found how to avoid the problem:
# Imports needed by this fragment:
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

for url in PICTURES:
    url = "https://sw67383.mywebshop.io/upload_dir/shop/" + url
    session = requests.Session()
    # Retry failed connection attempts up to 3 times, with a growing pause between them.
    retry = Retry(connect=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    result = session.get(url)
    if result.status_code != 200:
        print(colored("Brak: ", "red"), url)  # "Brak" = missing
        object = {
            "PRODUCT_ID": ID,
            "NUMBER": NUMBER,
            "PHOTO": url,
            "COMMUNICATE": "BRAK"
        }
        count += 1
        mapa.append(object)
    else:
        # "Poprawny" = correct
        print(colored(str(id) + " Poprawny: ", "green"), url)
        id += 1
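Since the snippet above creates a new session for every URL, a possible refinement for 3000 or more pictures is to build the retrying session once and reuse it, so that connections to the same host can be kept alive. The sketch below shows that under stated assumptions: make_session is a hypothetical helper name, and the retry values are simply the ones used above.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(connect_retries=3, backoff_factor=0.5):
    """Return a Session that retries failed connection attempts."""
    retry = Retry(connect=connect_retries, backoff_factor=backoff_factor)
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the retrying adapter for both plain and TLS URLs.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

The session would be created once before the for row in csv_reader loop, and each picture requested with session.get(url, timeout=10); the timeout value is an assumption. A requests.ConnectionError can still be raised once the retries are used up, so keeping the try/except around the request is probably still worthwhile.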