
How do I bulk download 70k images from URLs while limiting the number of simultaneous downloads?

I'm a bit clueless. I have a CSV file with two columns: name and picture URL. I would like to bulk download the 70k images into a folder, rename each image after the name in the first column, and number them if there is more than one per name. Some are JPEGs, some are PNGs.

I'm guessing I need pandas to read the data from the CSV, but I don't know how to do the downloading/renaming part without starting all the downloads at the same time, which will definitely crash my computer (it did; I wasn't even mad). Thanks in advance for any light you can shed on this.
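A minimal sketch of both parts, assuming the CSV rows are `(name, url)` pairs and that each file's extension can be read from the URL path (falling back to `.jpg`): first build the list of target file names, numbering names that occur more than once, then hand the downloads to a `concurrent.futures.ThreadPoolExecutor`, whose `max_workers` caps how many run at the same time. The helper names `plan_downloads`, `fetch` and `download_all`, the `_1`, `_2` suffix scheme and the default of 8 workers are illustrative choices, not something from this thread.

```python
import os
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urlparse


def plan_downloads(rows):
    """Turn (name, url) rows into (url, file_name) pairs.

    Names that appear more than once get a numeric suffix (_1, _2, ...);
    the extension comes from the URL path and defaults to .jpg.
    """
    rows = list(rows)
    totals = Counter(name for name, _ in rows)
    seen = Counter()
    plan = []
    for name, url in rows:
        ext = os.path.splitext(urlparse(url).path)[1].lower() or ".jpg"
        if totals[name] > 1:
            seen[name] += 1
            file_name = f"{name}_{seen[name]}{ext}"
        else:
            file_name = f"{name}{ext}"
        plan.append((url, file_name))
    return plan


def fetch(url, path):
    """Download one URL to a local path using only the standard library."""
    urllib.request.urlretrieve(url, path)


def download_all(plan, folder, max_workers=8, fetcher=fetch):
    """Download every (url, file_name) pair, at most max_workers at a time.

    Returns (url, exception) pairs for the downloads that failed.
    """
    os.makedirs(folder, exist_ok=True)
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(fetcher, url, os.path.join(folder, file_name)): url
            for url, file_name in plan
        }
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:  # one bad URL should not stop the rest
                failures.append((futures[future], exc))
    return failures
```

With pandas, the rows could come from `zip(df.iloc[:, 0], df.iloc[:, 1])` after `df = pd.read_csv(...)`, and `download_all(plan_downloads(rows), "images")` would then save everything into a hypothetical `images` folder and return the URLs that failed along with their errors. All 70k tasks are queued up front, but only `max_workers` downloads are in flight at any moment.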

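Try downloading the images in batches of 500, then sleep for 1 second and loop. It's fairly time-consuming, but it will definitely get the job done. For code references, you can look at packages like urllib (for downloading) and use os.rename() to change each file's name right after it has downloaded, since you already know to use pandas for the CSV file...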
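The batch-and-sleep idea from this answer could look roughly like the sketch below. It assumes the input is already a list of `(url, file_name)` pairs, downloads each batch of 500 concurrently with `urllib.request.urlretrieve` into a temporary `.part` name (a hypothetical convention), `os.rename()`s each file once its download completes, waits for the whole batch to finish, then sleeps for 1 second before starting the next one. `batches`, `download_then_rename` and `download_in_batches` are hypothetical helper names.

```python
import os
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def download_then_rename(url, folder, final_name, fetch):
    """Download to a temporary '.part' file, then os.rename() it into place."""
    tmp_path = os.path.join(folder, final_name + ".part")
    fetch(url, tmp_path)
    os.rename(tmp_path, os.path.join(folder, final_name))


def download_in_batches(pairs, folder, batch_size=500, pause=1.0,
                        fetch=urllib.request.urlretrieve):
    """Download (url, file_name) pairs batch by batch; return the failed URLs."""
    os.makedirs(folder, exist_ok=True)
    failed = []
    for batch in batches(pairs, batch_size):
        # Start the whole batch at once; leaving the with-block waits for it.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = [
                (pool.submit(download_then_rename, url, folder, name, fetch), url)
                for url, name in batch
            ]
        failed.extend(url for future, url in futures if future.exception() is not None)
        # Pause before starting the next batch.
        time.sleep(pause)
    return failed
```

Each batch starts up to `batch_size` threads at once, so lowering it (for example to 50) is the obvious knob if 500 is still too much for your machine. With pandas, the pairs could be built with `zip(df["url"], df["file_name"])`, where the column names are placeholders for whatever your CSV uses.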

I'm a coding idiot, so take this with a pinch of salt.

I needed to read a CSV (column 1: URL, column 2: file name) and save each image into an existing folder in the directory the code was running from.

Worked like a charm for me. Maybe something here helps.

# Import libraries
import csv
import os
import urllib.request
from urllib.parse import urlparse

# Download a file from a URL into a SPECIFIED FOLDER with a SPECIFIED NAME,
# keeping the URL's own extension (.jpg, .png, ...) and falling back to .jpg
def dl_image(url, folder, file_name):
    ext = os.path.splitext(urlparse(url).path)[1] or '.jpg'
    full_path = os.path.join(folder, file_name + ext)
    urllib.request.urlretrieve(url, full_path)

# Get the CSV file name (without .csv) and the folder to store the images in
file_name = input('Please enter filename:  ')
folder_name = input('Please enter a folder name:  ')

# Open the SPECIFIED CSV, skip the header row, and download the URL in
# column 1 of each line under the file name in column 2, one at a time
with open(file_name + '.csv', 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)
    next(csv_reader)  # skip the header row
    for line in csv_reader:
        print(line[0])
        dl_image(line[0], folder_name, line[1])
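If the URLs don't reliably end in `.jpg` or `.png`, a variant of the download function can pick the extension from the `Content-Type` header the server returns instead. This is a sketch: `dl_image_by_type`, `extension_for` and the two-entry `EXTENSIONS` mapping are hypothetical, and anything with an unknown or missing type falls back to `.jpg`.

```python
import os
import shutil
import urllib.request

# Hypothetical mapping; extend it if the CSV contains other image types.
EXTENSIONS = {"image/jpeg": ".jpg", "image/png": ".png"}


def extension_for(content_type, default=".jpg"):
    """Map a Content-Type value such as 'image/png; charset=x' to an extension."""
    if not content_type:
        return default
    mime = content_type.split(";")[0].strip().lower()
    return EXTENSIONS.get(mime, default)


def dl_image_by_type(url, folder, file_name, timeout=30):
    """Save url as folder/file_name plus an extension chosen from Content-Type."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        ext = extension_for(response.headers.get("Content-Type"))
        full_path = os.path.join(folder, file_name + ext)
        with open(full_path, "wb") as out:
            shutil.copyfileobj(response, out)  # stream the body to disk
    return full_path
```

In the loop, `dl_image_by_type(line[0], folder_name, line[1])` would take the place of the existing download call.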

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM