Download images from a website with URLs and sort them by description

Question

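I am trying to download images from a website and then sort them into folders according to their descriptions. My script already parses the HTML tags and collects the information I need (the URL of each image and its description). I have also added two more columns: the name of each file, and the full path made up of the target folder and the file name.

I am now stuck on the next part. I want to check whether a folder already exists and, in the same if statement, check whether the file name already exists. If both are true, the script should move on to the next link. If the folder exists but the file does not, it should download the file into that folder. The next part is an elif for the case where the folder does not exist: the script should create the folder and then download the file. I have outlined below what I want this section to do.

The problem is that I don't know how to download the files or how to check whether they exist. I also don't know how this works when the information comes from several lists. For each link that gets downloaded, the script has to take the full path and name from another column of the csv, which is a different list, and I don't understand how to set that up. Can anyone help?

My code up to the point where I got stuck is further down; the outline of what I want the next part of the script to do comes first.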

for elem in full_links
        if full_path  exists
                run test for if file name exists
                if file name exists = true
                        move onto the next file
                        if last file in list
                                break
                elif  file name exists = false
                        download image to location with name in list

        elif full_path does not exist
                download image with file path and name

The code I have so far:

from bs4 import BeautifulSoup
from bs4 import SoupStrainer
import requests
import csv
import time
import urllib.request
import pandas as pd 
import wget



URL = 'https://www.baps.org/Vicharan'
content = requests.get(URL)

soup = BeautifulSoup(content.text, 'html.parser')

# Use the div with id 'fullview'
panelrow = soup.find('div' , {'id' : 'fullview'})

main_class =  panelrow.find_all('div' , {'class' : 'col-xl-3 col-lg-3 col-md-3 col-sm-12 col-xs-12 padding5'})

# Look for 'highslide-- img-flag' links
individual_classes = panelrow.find_all('a' , {'class' : 'highslide-- img-flag'})

# Get the img tags, each <a> tag contains one
images = [i.img for i in individual_classes]

# Create the csv; the with block closes the file, so every row is written before it is read back
with open('crawl3.csv' , 'w', newline='') as csv_file:
    f = csv.writer(csv_file)
    f.writerow(['description' , 'full_link', 'name','full_path' , 'full_path_with_jpg_name'])

    for image in images:
        src = image.get('src')
        full_link = 'https://www.baps.org' + src
        description = image.get('alt')
        name = full_link.split('/')[-1]
        full_path = '/home/pi/image_downloader_test/' + description + '/'
        full_path_with_jpg_name = full_path + name
        f.writerow([description , full_link , name, full_path , full_path_with_jpg_name])

print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')
print('finished with search  and csv created. Now moving onto download portion')
print('-----------------------------------------------------------------------')
print('-----------------------------------------------------------------------')



f = open('crawl3.csv')
csv_f = csv.reader(f)
next(csv_f)  # skip the header row so it does not end up in the lists

descriptions = []
full_links = []
names = []
full_path = []
full_path_with_jpg_name = []

for row in csv_f:
    descriptions.append(row[0])
    full_links.append(row[1])
    names.append(row[2])
    full_path.append(row[3])
    full_path_with_jpg_name.append(row[4])

Answer 1

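To answer the individual parts of your question:

Checking whether a folder or file exists
To check whether a folder or a file exists, use the os module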

import os

if not os.path.exists(path_to_folder):
    os.makedirs(path_to_folder)
if not os.path.exists(path_to_file):
    pass  # do something, e.g. download the file
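
As a minimal sketch of the check described above, the two tests can be wrapped in small helpers. The names `ensure_folder` and `needs_download` are made up for illustration and are not part of the asker's script; `os.makedirs(..., exist_ok=True)` is used here in place of a separate existence check on the folder.

```python
import os


def ensure_folder(path_to_folder):
    """Create the folder (and any missing parents) if it is not there yet."""
    # exist_ok=True turns this into a no-op when the folder already exists,
    # so no separate os.path.exists() test is needed for the folder.
    os.makedirs(path_to_folder, exist_ok=True)


def needs_download(path_to_file):
    """Return True when there is no file at path_to_file yet."""
    return not os.path.isfile(path_to_file)
```

With these, the "folder exists" and "folder does not exist" branches of the outline collapse into one path: call `ensure_folder` first, then download only if `needs_download` returns True.
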

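Downloading the file
If you have the src of the image and the file name you want to save it under, you can download the file with the urllib.request module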
```
urllib.request.urlretrieve(image_src, path_to_file)
```
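
A slightly fuller sketch of the download step, assuming you want to skip files that are already on disk and keep going when a single download fails. The helper name `download_if_missing` and its True/False return value are assumptions made for this example, not something from the question.

```python
import os
import urllib.error
import urllib.request


def download_if_missing(image_src, path_to_file):
    """Download image_src to path_to_file unless the file already exists.

    Returns True if a download happened, False if it was skipped or failed.
    """
    if os.path.isfile(path_to_file):
        return False  # already downloaded, nothing to do

    # Make sure the destination folder exists before writing into it.
    folder = os.path.dirname(path_to_file)
    if folder:
        os.makedirs(folder, exist_ok=True)

    try:
        urllib.request.urlretrieve(image_src, path_to_file)
    except (urllib.error.URLError, OSError) as exc:
        # Report the problem and let the caller continue with the next image.
        print(f"could not download {image_src}: {exc}")
        return False
    return True
```

In the asker's script this would be called with one entry of `full_links` and the matching entry of `full_path_with_jpg_name`.
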
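Iterating over several lists at once
Finally, if you want to pull information from several lists, you can use the built-in zip function. For example, to iterate over full_links and full_path at the same time, you can do this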
```
for link, path in zip(full_links, full_path):
    # do something with link and path
    print(link, path)
```
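
Putting the three pieces together, the loop from the question's outline might look roughly like the sketch below. The function name `download_all` and the `fetch` parameter are assumptions added for illustration (`fetch` defaults to `urllib.request.urlretrieve` and exists only so the download step can be swapped out); the list names follow the question.

```python
import os
import urllib.request


def download_all(full_links, full_path_with_jpg_name,
                 fetch=urllib.request.urlretrieve):
    """Download each link to its matching target path, skipping existing files.

    Returns the list of target paths that were downloaded in this run.
    """
    downloaded = []
    for link, target in zip(full_links, full_path_with_jpg_name):
        if os.path.isfile(target):
            continue  # file is already there: move on to the next link

        # Create the description folder if it is missing, then download.
        folder = os.path.dirname(target) or "."
        os.makedirs(folder, exist_ok=True)
        fetch(link, target)
        downloaded.append(target)
    return downloaded
```

The explicit "if last file in list: break" step from the outline is not needed, because the for loop ends on its own once zip runs out of pairs.
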

Hope this helps!

Solution 1
1 accepted 2020-05-24 07:17:24
