
How to know how many threads/workers from a pool in multiprocessing (Python module) have been completed?

I am using impala-shell to compute some stats over a text file containing the table names.

I am using the Python multiprocessing module to pool the processes.
The thing is, the task is very time consuming, so I need to keep track of how many files have been completed to see the job's progress.
So let me give you some idea about the functions that I am using.

job_executor is the function that takes a list of tables and performs the tasks.

main() is the function that takes the file location and the number of executors (pool_workers), converts the file containing the tables into a list of tables, and does the multiprocessing.

I want to see the progress, i.e. how many files have been processed by job_executor, but I can't find a solution. Using a counter also doesn't work (see the note after the script below). Help me.

import argparse
import os
from multiprocessing import Pool


def job_executor(text):
    # impala_node and db_name are defined elsewhere in the script
    impala_cmd = "impala-shell -i %s -q 'compute stats %s.%s'" % (impala_node, db_name, text)
    impala_cmd_res = os.system(impala_cmd)  # runs the impala-shell command

    # check the exit status (success or failure)
    if impala_cmd_res == 0:
        print("computed the stats.")
    else:
        print("error while performing the operation.")


def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()  # this will return list of all the tables in the file.
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            process_pool.map(job_executor, text_file_rows)
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()


def parse_args():
    """
    function to take scrap arguments from  test_hr.sh file
    """
    parser = argparse.ArgumentParser(description='Main Process file that will start the process and session too.')
    parser.add_argument("text_file_path",
                        help='provide text file path/location to be read. ')  # text file fath
    parser.add_argument("pool_executors",
                        help='please provide pool executors as an initial argument') # pool_executor path

    return parser.parse_args()  # returns a Namespace with all the arguments


if __name__ == "__main__":
    mail_message_start()  # mail_message_start/_end are mail helpers defined elsewhere in the script

    main(parse_args())

    mail_message_end()
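
A side note on why a plain counter doesn't work here: each pool worker runs in a separate process, so an ordinary Python variable incremented inside job_executor only changes that worker's private copy. A process-safe counter needs shared memory, for example multiprocessing.Value. Below is a minimal sketch under that assumption; the names completed, init_counter and run_one are illustrative, not part of the original script:

from multiprocessing import Pool, Value

completed = None  # will hold the shared counter inside each worker

def init_counter(counter):
    # runs once in every worker process, so they all see the same Value
    global completed
    completed = counter

def run_one(text):
    # ... run the impala-shell command here, as in job_executor ...
    with completed.get_lock():  # the Value carries its own lock
        completed.value += 1
        print('Finished %d tables so far.' % completed.value)

if __name__ == '__main__':
    counter = Value('i', 0)  # shared integer counter, starts at 0
    with Pool(4, initializer=init_counter, initargs=(counter,)) as pool:
        pool.map(run_one, ['db.table_a', 'db.table_b'])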

If you insist on needlessly doing it via multiprocessing.pool.Pool(), the easiest way to keep track of what's going on is to use a non-blocking mapping (i.e. multiprocessing.pool.Pool.map_async()):

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
        total_processes = len(text_file_rows)  # keep the number of lines for reference
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            print('Processing {} lines.'.format(total_processes))
            processing = process_pool.map_async(job_executor, text_file_rows)
            processes_left = total_processes  # number of processing lines left
            while not processing.ready():  # start a loop to wait for all to finish
                if processes_left != processing._number_left:  # _number_left is an undocumented Pool internal
                    processes_left = processing._number_left
                    print('Processed {} out of {} lines...'.format(
                        total_processes - processes_left, total_processes))
                time.sleep(0.1)  # let it breathe a little, don't forget to `import time`
            print('All done!')
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()

This will check every 100 ms whether some of the processes have finished, and if something has changed since the last check it will print out the number of lines processed so far. If you need more insight into what's going on with your subprocesses, you can use shared structures like multiprocessing.Queue() or multiprocessing.Manager() to report directly from within your processes.
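
To illustrate that last suggestion, here is a minimal sketch of workers reporting through a Manager queue; the names job_executor_reporting and run_with_queue are illustrative, not from the original answer. Note that a manager.Queue() proxy, unlike a bare multiprocessing.Queue(), can be passed to pool workers as an argument:

from multiprocessing import Manager, Pool

def job_executor_reporting(args):
    text, queue = args
    # ... run the impala-shell command here ...
    queue.put(text)  # tell the parent process this table is done

def run_with_queue(text_file_rows, num_executors):
    with Manager() as manager:
        queue = manager.Queue()  # proxy queue, safe to share with pool workers
        with Pool(num_executors) as pool:
            result = pool.map_async(job_executor_reporting,
                                    [(row, queue) for row in text_file_rows])
            for done in range(1, len(text_file_rows) + 1):
                finished = queue.get()  # blocks until some worker reports in
                print('[%d/%d] %s done.' % (done, len(text_file_rows), finished))
            result.wait()  # all reports received, so this returns immediately

Unlike the _number_left polling above, this reports each table by name as it finishes and uses only documented APIs.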

