download images from a website with URL and sorting by description

Question

I am trying to download images from a website and then sort those images into folders based on their descriptions. In my script, I have parsed the HTML tags and extracted the information I need (the URL of each image and the description of that image). I also added two more columns: the name of each file, and the full path (folder plus file name) where the file will be downloaded.

I am now stuck on the next part. For each link, I want to check whether the folder already exists and, in that same if statement, whether the file already exists. If both exist, the script should move on to the next link. If the folder exists but the file does not, it should download the file. The next part is an elif: if the folder does not exist, the script should create the folder and download the file. I have outlined what I want this section to do below.

The problem is that I do not know how to download the files or how to check whether they exist. I also do not know how to pull information from multiple lists at once. For each link, the download needs the full path and name from another column of the CSV, which is a separate list, and I do not understand how to set that up. Can someone please help?

The outline of what I want the next part of my script to do comes first, followed by my code up to the point where I am stuck.

for elem in full_links
        if full_path  exists
                run test for if file name exists
                if file name exists = true
                        move onto the next file
                        if last file in list
                                break
                elif  file name exists = false
                        download image to location with name in list

        elif full_path does not exist
                download image with file path and name

Code that I have written so far:

from bs4 import BeautifulSoup
from bs4 import SoupStrainer
import requests  # use the installed requests package, not pip's vendored copy
import csv
import time
import urllib.request
import pandas as pd 
import wget



URL = 'https://www.baps.org/Vicharan'
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')

#create a csv
f=csv.writer(open('crawl3.csv' , 'w'))
f.writerow(['description' , 'full_link', 'name','full_path' , 'full_path_with_jpg_name'])



# Use the div whose id is 'fullview'
panelrow = soup.find('div' , {'id' : 'fullview'})

main_class =  panelrow.find_all('div' , {'class' : 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})

# Look for 'highslide-- img-flag' links
individual_classes = panelrow.find_all('a' , {'class' : 'highslide-- img-flag'})

# Get the img tags, each <a> tag contains one
images = [i.img for i in individual_classes]

for image in images:
    src=image.get('src')
    full_link = 'https://www.baps.org' + src
    description = image.get('alt')
    name = full_link.split('/')[-1]
    full_path = '/home/pi/image_downloader_test/' + description + '/'
    full_path_with_jpg_name = full_path + name 
    f.writerow([description , full_link , name, full_path , full_path_with_jpg_name])

print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')
print('finished with search  and csv created. Now moving onto download portion')
print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')



f = open('crawl3.csv')
csv_f = csv.reader(f)
next(csv_f)  # skip the header row so it does not end up in the lists

descriptions = []
full_links = []
names = []
full_path = []
full_path_with_jpg_name = []

for row in csv_f:
    descriptions.append(row[0])
    full_links.append(row[1])
    names.append(row[2])
    full_path.append(row[3])
    full_path_with_jpg_name.append(row[4])

Answer 1

To answer the various parts of your question:

To check if a folder or file exists, use the os module

```
import os

# create the folder if it does not exist yet
if not os.path.exists(path_to_folder):
    os.makedirs(path_to_folder)

if not os.path.exists(path_to_file):
    pass  # do something, e.g. download the file
```
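As a slightly fuller sketch of the same check: `os.makedirs` accepts `exist_ok=True`, which makes a separate "does the folder exist" test unnecessary, and `os.path.isfile` only returns True for an actual file. The helper name `needs_download` is made up for illustration and is not part of your script.

```python
import os


def needs_download(folder, file_path):
    """Create folder if it is missing; return True if file_path is not there yet."""
    # exist_ok=True means no error is raised when the folder already exists
    os.makedirs(folder, exist_ok=True)
    # isfile() is False for missing paths and for directories
    return not os.path.isfile(file_path)
```

You would call it with one entry from your `full_path` list and the matching entry from `full_path_with_jpg_name`, and only download when it returns True.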

Downloading files
If you have the src of an image and the file name you want to save it under, you can download the file with the urllib.request module like this:
```
urllib.request.urlretrieve(image_src, path_to_file)
```
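Note that `urlretrieve` raises an exception when the download fails (for example on a 404 or when the target folder is missing), which would stop the whole loop. A minimal sketch of a wrapper that reports the failure and carries on could look like this; the name `download_image` is hypothetical.

```python
import urllib.error
import urllib.request


def download_image(image_src, path_to_file):
    """Download image_src to path_to_file; return True on success, False on failure."""
    try:
        urllib.request.urlretrieve(image_src, path_to_file)
    except (urllib.error.URLError, OSError) as err:
        # HTTPError is a subclass of URLError; OSError also covers
        # problems such as a target folder that does not exist
        print('could not download', image_src, '-', err)
        return False
    return True
```

Returning a boolean instead of raising is just one design choice; it lets the calling loop decide whether to skip the image or retry later.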
Iterating through multiple lists at the same time
Finally, if you want to pull information from multiple lists, you can do this using the built-in zip function. For example, if you want to iterate through full_links and full_path at the same time, you can do it like so
```
for link, path in zip(full_links, full_path):
    pass  # do something with link and path
```
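Putting the three pieces together, the outline from the question could be sketched roughly as follows. This is only one possible arrangement, not a definitive implementation: the function name `download_all` and the `fetch` parameter are assumptions added so the downloader can be swapped out, and the three lists are expected to line up entry by entry.

```python
import os
import urllib.request


def download_all(full_links, folders, file_paths, fetch=urllib.request.urlretrieve):
    """Download each link to its target path unless that file already exists."""
    downloaded = []
    for link, folder, file_path in zip(full_links, folders, file_paths):
        # Create the description folder if needed (no error if it already exists)
        os.makedirs(folder, exist_ok=True)
        if os.path.exists(file_path):
            # File is already there, so move on to the next link
            continue
        fetch(link, file_path)
        downloaded.append(file_path)
    return downloaded
```

With the lists from the question this would be called as `download_all(full_links, full_path, full_path_with_jpg_name)`, making sure the CSV header row is not included in the lists. Because the folder is created up front, the separate if/elif branches for "folder exists" and "folder does not exist" collapse into one, and the loop ends on its own after the last link, so no explicit `break` is needed.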

Hope this helps!

Answer 1: accepted, 2020-05-24 07:17:24
