subprocess.Popen in threads

Question

I have a number of files (over 4000) that I want to load into PostgreSQL simultaneously. I have separated them into 4 different file lists, and I want one thread per list to iterate through it and load the data.

The problem is that if I use os.system to call the loading program, the other threads are prevented from running simultaneously. If I use subprocess.Popen, the loads run simultaneously, but the threads believe they have finished executing and move on to the next part of my script.

Am I doing this the right way? Or is there a better way to call subprocesses from within a thread?

def thread1Load(self, thread1fileList):
    connectionstring = settings.connectionstring
    postgreshost = settings.postgreshost
    postgresdatabase = settings.postgresdatabase
    postgresport = settings.postgresport
    postgresusername = settings.postgresusername
    postgrespassword = settings.postgrespassword

    tablename = None
    encoding = None
    connection = psycopg2.connect(connectionstring)

    for filename in thread1fileList:
        load_cmd = #load command
        run = subprocess.Popen(load_cmd, shell=True)
    print "finished loading thread 1"


def thread2Load(self, thread2fileList):
    connectionstring = settings.connectionstring
    postgreshost = settings.postgreshost
    postgresdatabase = settings.postgresdatabase
    postgresport = settings.postgresport
    postgresusername = settings.postgresusername
    postgrespassword = settings.postgrespassword

    tablename = None

    connection = psycopg2.connect(connectionstring)
    for filename in thread2fileList:
        load_cmd = #load command            
        run = subprocess.Popen(load_cmd, shell=True)
    print "finished loading thread 2"


def thread3Load(self, thread3fileList):
    connectionstring = settings.connectionstring
    postgreshost = settings.postgreshost
    postgresdatabase = settings.postgresdatabase
    postgresport = settings.postgresport
    postgresusername = settings.postgresusername
    postgrespassword = settings.postgrespassword

    tablename = None
    connection = psycopg2.connect(connectionstring)

    for shapefilename in thread3fileList:
        load_cmd = #load command
        run = subprocess.Popen(load_cmd, shell=True)
    print "finished loading thread 3"

def thread4Load(self, thread4fileList):
    connectionstring = settings.connectionstring
    postgreshost = settings.postgreshost
    postgresdatabase = settings.postgresdatabase
    postgresport = settings.postgresport
    postgresusername = settings.postgresusername
    postgrespassword = settings.postgrespassword

    tablename = None

    connection = psycopg2.connect(connectionstring)

    for filename in thread4fileList:
        load_cmd = #load command
        run = subprocess.Popen(load_cmd, shell=True)

    print "finished loading thread 4"


def finishUp(self):
    print 'finishing up'


def main():
    load = Loader()

    thread1 = threading.Thread(target=(load.thread1Load), args=(thread1fileList, ))
    thread2 = threading.Thread(target=(load.thread2Load), args=(thread2fileList, ))
    thread3 = threading.Thread(target=(load.thread3Load), args=(thread3fileList, ))
    thread4 = threading.Thread(target=(load.thread4Load), args=(thread4fileList, ))
    threads = [thread1, thread2, thread3, thread4]
    for thread in threads:
        thread.start()
        thread.join()


    load.finishUp(connectionstring)

if __name__ == '__main__':
    main()

Answer 1

Don't repeat yourself. One threadLoad method suffices. That way, if you need to modify something in the method, you do not need to make the same modification in 4 different places.
Use run.communicate() to block until the subprocess is done. subprocess.Popen returns as soon as the child process has been launched, so without this call the loop finishes while the loads are still running.
This starts one thread, then blocks until that thread finishes, then starts another thread, and so on:
```
for thread in threads:
    thread.start()
    thread.join()
```
Instead, start all the threads first, then join all the threads:
```
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```
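The difference between the two loops can be seen with a small timing sketch. It is only an illustration: `worker`, `run_sequentially` and `run_concurrently` are hypothetical names, and `time.sleep` stands in for a blocking load, on the assumption that a thread waiting on a child process behaves similarly.

```python
import threading
import time


def worker(delay):
    # Stand-in for a blocking load; sleeping lets other threads run.
    time.sleep(delay)


def run_sequentially(count, delay):
    """start() immediately followed by join(): only one thread runs at a time."""
    began = time.monotonic()
    for _ in range(count):
        thread = threading.Thread(target=worker, args=(delay,))
        thread.start()
        thread.join()
    return time.monotonic() - began


def run_concurrently(count, delay):
    """start() every thread first, then join() them all."""
    began = time.monotonic()
    threads = [threading.Thread(target=worker, args=(delay,))
               for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - began


sequential = run_sequentially(4, 0.2)
concurrent = run_concurrently(4, 0.2)
print("sequential: {:.2f}s, concurrent: {:.2f}s".format(sequential, concurrent))
```

With four workers the first pattern should take roughly four times the delay, while the second should take roughly one delay, since all four workers wait at the same time.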

import subprocess
import threading

import psycopg2  # third-party PostgreSQL driver, needed for psycopg2.connect below


class Loader(object):
    def threadLoad(self, threadfileList):
        connectionstring = settings.connectionstring
        ...
        connection = psycopg2.connect(connectionstring)

        for filename in threadfileList:
            load_cmd =  # load command
            run = subprocess.Popen(load_cmd, shell=True)
            # block until subprocess is done
            run.communicate()
        name = threading.current_thread().name
        print "finished loading {n}".format(n=name)

    def finishUp(self):
        print 'finishing up'


def main():
    load = Loader()
    threads = [threading.Thread(target=load.threadLoad, args=(fileList, ))
               for fileList in (thread1fileList, thread2fileList,
                                thread3fileList, thread4fileList)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # finishUp takes no arguments, and connectionstring is not defined in main()
    load.finishUp()

if __name__ == '__main__':
    main()
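Because the load command is elided above, here is a self-contained sketch of the same pattern that can actually be run. Everything specific in it is an assumption made for illustration: the file names are made up, `load_files` is a hypothetical helper, and the child process is just a Python one-liner that echoes the file name in place of the real loader.

```python
import subprocess
import sys
import threading


def load_files(file_list, results, lock):
    """Run one child process per file, waiting for each before starting the next."""
    for filename in file_list:
        # Hypothetical stand-in for the real load command.
        load_cmd = [sys.executable, "-c",
                    "import sys; print(sys.argv[1])", filename]
        run = subprocess.Popen(load_cmd, stdout=subprocess.PIPE)
        # Blocks only this thread until the child exits.
        out, _ = run.communicate()
        with lock:
            results.append((filename, run.returncode, out.decode().strip()))


# Made-up file lists, one per thread.
file_lists = [["a.dat", "b.dat"], ["c.dat"], ["d.dat", "e.dat"], ["f.dat"]]
results = []
lock = threading.Lock()

threads = [threading.Thread(target=load_files, args=(file_list, results, lock))
           for file_list in file_lists]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

# A non-zero return code marks a load that failed.
failed = [name for name, code, _ in results if code != 0]
print("loaded {} files, {} failed".format(len(results), len(failed)))
```

Passing the command as a list rather than a string with shell=True is a design choice in this sketch, not something the answer requires; it avoids shell quoting problems with unusual file names. Checking run.returncode after communicate() is one way to notice loads that did not succeed.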

Answer 1 · 7 · Accepted · 2013-03-27 17:42:22