Download multiple files in different SFTP directories to local
I have a scenario where we need to download certain image files from different directories on an SFTP server to a local machine.
Example:
/IMAGES/folder1 has img11, img12, img13, img14
/IMAGES/folder2 has img21, img22, img23, img24
/IMAGES/folder3 has img31, img32, img33, img34
And I need to download img12, img23 and img34 from folder1, folder2 and folder3 respectively.
Right now I go into each folder and fetch the images individually, which takes an extraordinary amount of time (there are tens of thousands of images to download).
I have also found that downloading a single file of the same total size (as the multiple image files combined) takes only a fraction of that time.
My question is: is there a way to fetch these multiple files together instead of downloading them one after another?
One approach I came up with was to copy all the files into a temp folder on the SFTP server and then download that one directory, but SFTP does not support 'copy', and I cannot use 'rename' because that would move the files out of their original directories.
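As an aside: if the account also permits shell command execution over SSH (an assumption — pure SFTP does not offer this, and the helper below is hypothetical), the files could be bundled server-side into a single archive with `tar` and fetched as one large transfer. A sketch of building such a command safely:

```python
import shlex

def make_tar_command(remote_paths, archive_path="/tmp/batch.tar"):
    # Build a server-side tar command that bundles the wanted files into
    # one archive, which can then be downloaded as a single transfer.
    quoted = " ".join(shlex.quote(p) for p in remote_paths)
    return f"tar -cf {shlex.quote(archive_path)} {quoted}"

cmd = make_tar_command(["/IMAGES/folder1/img12", "/IMAGES/folder2/img23"])
# The command could then be run via client.exec_command(cmd) on a paramiko
# SSHClient, followed by sftp.get("/tmp/batch.tar", local_archive).
```

Whether this works depends entirely on the server granting shell access; with SFTP-only access it is not an option.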
You could use a process pool to open multiple SFTP connections and download in parallel. For example,
from paramiko import SSHClient
from multiprocessing import Pool

def download_init(host):
    # Runs once in each pool process: open one SSH/SFTP connection per worker.
    global client, sftp
    client = SSHClient()
    client.load_system_host_keys()
    client.connect(host)
    sftp = client.open_sftp()

def download_close(dummy):
    # Cleanup worker: close this process's connection.
    client.close()

def download_worker(params):
    local_path, remote_path = params
    sftp.get(remote_path, local_path)

list_of_local_and_remote_files = [
    ["/client/files/folder1/img11", "/IMAGES/folder1/img11"],
]

def downloader(files):
    pool_size = 8
    pool = Pool(pool_size, initializer=download_init,
                initargs=["sftpserver.example.com"])
    result = pool.map(download_worker, files, chunksize=10)
    pool.map(download_close, range(pool_size))

if __name__ == "__main__":
    downloader(list_of_local_and_remote_files)
It's unfortunate that Pool doesn't have a finalizer to undo what was set in the initializer. It's not usually necessary - the exiting process is cleanup enough. In the example I just wrote a separate worker function that cleans things up. By having 1 work item per pool process, they each get 1 call.