
Multithreading of For loop in python

I am creating some scripts to help my database import while using docker. I currently have a directory filled with data, and I want to import it as quickly as possible.

All of the work done is single threaded, so I wanted to speed things up by passing off multiple jobs at once, one to each thread on my server.

This is done by this code I've written:

#!/usr/bin/python
import sys
import subprocess

cities = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

for x in cities:
    dockerscript = "docker exec -it worker_1 perl import.pl ./public/%s %s mariadb" % (x,x)
    p = subprocess.Popen(dockerscript, shell=True, stderr=subprocess.PIPE)

This works fine if I have more than 10 cores; each job gets its own core. What I want to do is set it up so that if I have 4 cores, the first 4 iterations of the dockerscript run (1 to 4), and 5 to 10 wait.

Once any of 1 to 4 completes, 5 is started, and so on until everything has completed.

I am just having a hard time figuring out how to do this in Python.

Thanks

You should use multiprocessing.Pool(), which will automatically create one process per core, then submit your jobs to it. Each job will be a function which calls subprocess to start Docker. Note that you need to make sure the jobs are synchronous, i.e. the Docker command must not return before it is done working.
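A minimal sketch of that approach (the actual docker invocation is swapped out here for a harmless no-op command so the sketch runs anywhere; `Pool(processes=4)` caps how many jobs are in flight at once):

```python
import subprocess
import sys
from multiprocessing import Pool

def run_docker(city):
    # Stand-in for: docker exec -it worker_1 perl import.pl ./public/<city> <city> mariadb
    # subprocess.call() blocks until the command exits, so each job is synchronous.
    return subprocess.call([sys.executable, '-c', 'pass'])

if __name__ == '__main__':
    cities = [str(n) for n in range(1, 11)]
    # At most 4 imports run concurrently; as each one finishes,
    # the pool hands its worker the next city in the list.
    with Pool(processes=4) as pool:
        exit_codes = pool.map(run_docker, cities, chunksize=1)
    print(exit_codes)  # one exit code per city
```

With no `processes` argument, `Pool()` defaults to the machine's CPU count, which matches the one-job-per-core behaviour described above.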

John already has the answer, but there are a couple of subtleties worth mentioning. A thread pool is fine for this application because each thread just spends its time blocked, waiting for its subprocess to terminate. And you can use map with chunksize=1 so the pool goes back to the parent to fetch a new job on each iteration.

#!/usr/bin/python
import sys
import subprocess
import multiprocessing.pool

cities = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]

def run_docker(city):
    return subprocess.call(['docker', 'exec', '-it', 'worker_1', 'perl',
        'import.pl', './public/{0}'.format(city), city, 'mariadb'])

pool = multiprocessing.pool.ThreadPool()
results = pool.map(run_docker, cities, chunksize=1)
pool.close()
pool.join()
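For comparison, the same 4-at-a-time pattern can also be written with the standard-library concurrent.futures module (again with a no-op placeholder standing in for the docker command); `max_workers=4` plays the role of the core count:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

cities = [str(n) for n in range(1, 11)]

def run_docker(city):
    # Placeholder for the real docker exec command; blocks until done.
    return subprocess.call([sys.executable, '-c', 'pass'])

# The executor keeps at most 4 subprocesses running; each finished
# job frees a thread, which immediately picks up the next city.
with ThreadPoolExecutor(max_workers=4) as executor:
    exit_codes = list(executor.map(run_docker, cities))

print(exit_codes)
```

Exiting the `with` block implicitly waits for all submitted jobs, so no explicit close/join calls are needed.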
