
Using threading to run a subprocess in parallel

I have a Linux script that I'm looking to automate with subprocess. Each call should run the script in one subdirectory of a parent directory, and each of these subprocesses should run in a separate thread.

My directories are organized as follows:

  • /parent/p1
  • /parent/p2, and so on, up to
  • /parent/p[n]

The first part of my code runs the process across all the subdirectories (p1, p2, p3, etc.), and that works fine for a fast process. However, many of my jobs need to run in the background, for which I usually use nohup and run them manually on a separate node, so that each node runs the same job on a different directory (p1, p2, p3, etc.). The latter part of my code (using threading) is meant to automate this, but what ends up happening is that every node runs the same process (p1, p1, p1, etc.). Basically, my entire jobs function is being passed through runSimThreads, when I want the folders split up across the threads. Does anyone know how I could change the threading function to place a different job on each node?

import os
import sys
import subprocess
import os.path
import threading

#takes the argument: python FOLDER_NAME #ofThreads
#Example: python /parent 8

directory = sys.argv[1] #in my case input is /parent 
threads = int(sys.argv[2]) #input is 8
category_name = directory.split('/')[-1] #splits parent as a word
folder_list = next(os.walk(directory))[1] #makes a list of subdirectories [p1,p2,p3..]

def jobs(cmd):
     for i in folder_list:
         f = open("/vol01/bin/dir/nohup.out", "w")
         cmd = subprocess.call(['nohup','python','np.py','{0}/{1}' .format(directory,i)],cwd = '/vol01/bin/dir', stdout=f)
     return cmd

def runSimThreads(numThreads):
    threads = []
    for i in range(numThreads):
         t = threading.Thread(target=jobs, args=(i,))
         threads.append(t)
         t.start()

#Wait for all threads to complete
main_thread = threading.currentThread()
for t in threads:
    if t is main_thread:
        continue
    t.join()

runSimThreads(threads)

That can't be your code.

import os
import sys
import subprocess
import os.path
import threading

#takes the argument: python FOLDER_NAME #ofThreads
#Example: python /parent 8

threads = 8 #input is 8

...
...

for t in threads:
    print("hello")

--output:--
TypeError: 'int' object is not iterable

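You are reusing the same variable names everywhere, and that is confusing you (or me?). For example, threads is the integer from sys.argv[2] at module level, but inside runSimThreads() it is a local list of Thread objects. The join loop at the bottom of your script sees only the module-level integer, not the list.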
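A minimal sketch of the start/join pattern with distinct names: num_threads (an int, used only as a count) and workers (the list that actually holds the Thread objects). The work() function is a hypothetical placeholder job, not part of the original code.

```python
import threading

num_threads = 8   # how many threads to start (an int, only a count)
workers = []      # the Thread objects themselves (a list)
finished = []     # records which worker indices ran

def work(index):
    # Placeholder job: just record that this worker ran.
    finished.append(index)

for index in range(num_threads):   # range() turns the count into something iterable
    t = threading.Thread(target=work, args=(index,))
    workers.append(t)
    t.start()

for t in workers:                  # iterate over the list of threads, not the int
    t.join()

print(sorted(finished))            # [0, 1, 2, 3, 4, 5, 6, 7]
```

The join loop also doesn't need the main_thread check from the question: the main thread is never added to the list, so every item in it is safe to join.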

You also do this:

def jobs(cmd):
     for i in folder_list:
         f = open("/vol01/bin/dir/nohup.out", "w")
         cmd =  "something"

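You are overwriting your cmd parameter variable, so whatever value is passed to jobs() is never used; as written, jobs() shouldn't have a parameter variable at all. It also means every thread runs the entire loop over folder_list, starting with p1.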
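Instead of looping over every folder inside jobs(), one option is to give the function the single folder it should handle and actually use that parameter. The sketch below assumes one thread per subdirectory. It replaces nohup python np.py with a trivial python -c command (an assumption, so the example runs anywhere), and it writes a separate log file per folder, because several threads each opening the same nohup.out with mode "w" would overwrite each other's output. The names run_job, run_all, and log_dir are hypothetical.

```python
import os
import subprocess
import sys
import threading

def run_job(folder, log_dir, results):
    """Run one command for ONE folder and store its exit code."""
    name = os.path.basename(folder)
    log_path = os.path.join(log_dir, name + ".out")
    # Assumption: this one-liner stands in for ['nohup', 'python', 'np.py', folder];
    # it just prints the folder it was given.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1])", folder]
    with open(log_path, "w") as log:   # one log file per folder
        results[name] = subprocess.call(cmd, stdout=log, stderr=subprocess.STDOUT)

def run_all(parent, log_dir):
    """Start one thread per subdirectory of parent, then wait for all of them."""
    names = sorted(next(os.walk(parent))[1])   # e.g. ['p1', 'p2', 'p3']
    results = {}
    workers = []
    for name in names:
        folder = os.path.join(parent, name)
        # Each thread gets a DIFFERENT folder as its argument.
        t = threading.Thread(target=run_job, args=(folder, log_dir, results))
        workers.append(t)
        t.start()
    for t in workers:
        t.join()
    return results
```

Called as run_all("/parent", some_log_dir), the thread for p1 handles only p1, the thread for p2 only p2, and so on, instead of every thread walking the whole list.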

Response to comment 1:

import threading as thr
import time

def greet():
    print("hello world")

t = thr.Thread(target=greet)
t.start()
t.join()

--output:--
hello world

import threading as thr
import time

def greet(greeting):
    print(greeting)

t = thr.Thread(target=greet, args=("Hello, Newman.",) )
t.start()
t.join()

--output:--
Hello, Newman.

Below is the equivalent of what you are doing:

import threading as thr
import time

def greet(greeting):
    greeting = "Hello, Jerry."
    print(greeting)

t = thr.Thread(target=greet, args=("Hello, Newman.",) )
t.start()
t.join()

--output:--
Hello, Jerry.

And anyone reading that code would ask, "Why are you passing an argument to the greet() function when you don't use it?"
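Using the argument is also what gets each thread to do different work: if every thread receives a different argument, every thread handles a different item. A small sketch in the same style as the greet() examples (the list of greetings is made up for illustration):

```python
import threading as thr

results = []

def greet(greeting):
    # Use the argument instead of overwriting it.
    results.append(greeting)
    print(greeting)

greetings = ["Hello, Newman.", "Hello, Jerry.", "Hello, Elaine."]

workers = []
for g in greetings:
    t = thr.Thread(target=greet, args=(g,))   # a different argument per thread
    workers.append(t)
    t.start()

for t in workers:
    t.join()
```

Each greeting is printed exactly once, but the order can vary between runs because the threads run concurrently.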

You said: "I'm relatively new to Python."

Well, your code does this:

threads = 8 

#Other irrelevant stuff here

for t in threads:
    print("hello")

and that will produce the error:

TypeError: 'int' object is not iterable

Do you know why?
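Once the integer from the command line is used only as a count, a natural job for it is limiting how many folders are processed at once. A minimal sketch using the standard library's concurrent.futures.ThreadPoolExecutor, assuming a trivial python -c command as a stand-in for nohup python np.py; run_one and run_limited are hypothetical names:

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_one(folder):
    """Run the placeholder job for one folder and return (folder, exit code)."""
    # Assumption: stands in for ['nohup', 'python', 'np.py', folder].
    cmd = [sys.executable, "-c", "import sys; sys.exit(0)", folder]
    return folder, subprocess.call(cmd, stdout=subprocess.DEVNULL)

def run_limited(parent, max_threads):
    """Run one job per subdirectory, with at most max_threads running at a time."""
    folders = [os.path.join(parent, name)
               for name in sorted(next(os.walk(parent))[1])]
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        # map() hands each folder to one worker thread and yields the
        # results in the same order as folders.
        return dict(pool.map(run_one, folders))
```

With the question's arguments this would be run_limited(sys.argv[1], int(sys.argv[2])): if /parent has 20 subdirectories and the second argument is 8, eight jobs start immediately and the rest are queued until a worker frees up.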
