
Python3 subprocess in multiprocessing

I'm a beginner in writing concurrent code.

I'm writing a script that takes a user's ID and tries to return the user's full name. The query takes a second or so to execute, so I was hoping to use multiprocessing to collect the data faster. I think I'm close, but I don't understand how the framework needs to be set up correctly.

from subprocess import getoutput
from multiprocessing import Pool

all_users = ['User1', 'User2', 'User3', 'User4', 'User5', 'User6'] # example list

def get_name(userid):
    name = getoutput('net users {} /domain | findstr "full name:"'.format(userid)).replace('Full Name', '').strip().split('\n')[0]
    return {userid : name}


if __name__ == '__main__':
    with Pool(4) as p:
        print(p.map(get_name, all_users))

    print(' --------- finished')

print(' - exiting - '))

This is just a single step in a multi-step script, and the output appears as follows (ignore the "The user name could not be found." part; it's just an example):

 - exiting -
 - exiting -
 - exiting -
 - exiting -
[{'User1': 'The user name could not be found.'}, {'User2': 'The user name could not be found.'}, {'User3': 'The user name could not be found.'}, {'User4': 'The user name could not be found.'}, {'User5': 'The user name could not be found.'}, {'User6': 'The user name could not be found.'}]
 --------- finished
 - exiting -

I'm trying to structure the program as follows:

  1. Get list of users
  2. Convert ID to names (asap, by spawning a separate process for each function call)
  3. Wait for the 2nd step to complete fully, and only then work with the data that was returned (see the sketch after this list)
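
Roughly, I picture the flow like this sketch (with a placeholder lookup standing in for the real one-second query):

from multiprocessing import Pool

def get_name(userid):
    # placeholder for the slow lookup; the real query takes about a second per user
    return {userid: 'Full Name of ' + userid}

if __name__ == '__main__':
    all_users = ['User1', 'User2', 'User3']        # step 1: get the list of users
    with Pool(4) as p:
        names = p.map(get_name, all_users)         # step 2: one task per user, run in parallel
    # p.map() only returns once every call has finished, so from here on it's step 3
    print(names)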

I've tried reading up on the subject from various sources, but I just can't grasp the structure somehow. As I understand it, I'm getting four " - exiting - " lines at the start because the pool has 4 workers, but how do I encapsulate this part of the code so that nothing else runs while it's in progress and " - exiting - " is printed only once, at the end?

You need to add a p.close() call inside your with block:

with Pool(4) as p:
    print(p.map(get_name, all_users))
    p.close()

Josh Hayes already gave the right answer. Since Python 3.3, using Pool as a context manager like that calls terminate() on exit (https://docs.python.org/3.4/library/multiprocessing.html?highlight=process), so you have to add p.close() to finish properly. However, your last bracket is one too many, and you should not see more than one "finished" and "exiting" print, because those calls are not inside the pool. How do you start your script? Which Python version are you using?
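
A minimal sketch of what that looks like: close() stops new tasks from being submitted and join() waits for the workers to finish before the context manager's terminate() runs on exit (the worker here is a stand-in, not the real lookup):

from multiprocessing import Pool

def get_name(userid):
    # stand-in worker; replace with the real lookup
    return {userid: 'Full Name of ' + userid}

if __name__ == '__main__':
    all_users = ['User1', 'User2', 'User3', 'User4', 'User5', 'User6']
    with Pool(4) as p:
        results = p.map(get_name, all_users)  # blocks until all results are back
        p.close()                             # no more work will be submitted
        p.join()                              # wait for the workers to exit cleanly
    print(results)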

Edit: You may try to add:

import os
from subprocess import getoutput

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

all_users = ['User1', 'User2', 'User3', 'User4', 'User5', 'User6'] # example list

def get_name(userid):
    name = getoutput('net users {} /domain | findstr "full name:"'.format(userid)).replace('Full Name', '').strip().split('\n')[0]
    info("p ")  # report which process ran this call
    return {userid : name}

and call info("whatever") instead of the ' - exiting - ' print to see which processes are doing the work here. Which OS are you using? On Linux, at least, this makes sense.

A similar question was answered in the following link: multiple output returned from python multiprocessing function

To summarize:

# Import stuff

# The worker function must be defined at module level (outside the if statement)
# so that the spawned worker processes can import it:
def worker():
    # worker code

if __name__ == '__main__':
    # Whatever you execute here will only run as often as you intend it to.
    # Start the multiprocessing here (e.g. create the Pool and map the work).

# All code outside of the if statement will be executed multiple times:
# once per spawned worker process, because each worker re-imports the module.
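
Applied to the script from the question, a sketch of that structure would look roughly like this (the get_name body is copied from the question and assumes the Windows net users / findstr commands are available):

from subprocess import getoutput
from multiprocessing import Pool

# the worker function lives at module level so the spawned worker processes can import it
def get_name(userid):
    name = getoutput('net users {} /domain | findstr "full name:"'.format(userid)).replace('Full Name', '').strip().split('\n')[0]
    return {userid : name}

if __name__ == '__main__':
    all_users = ['User1', 'User2', 'User3', 'User4', 'User5', 'User6']  # example list

    with Pool(4) as p:
        results = p.map(get_name, all_users)  # blocks until every user has been processed
        p.close()

    print(results)
    print(' --------- finished')
    print(' - exiting - ')  # now inside the guard, so it prints exactly once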
