
How to know how many threads/workers from a pool in multiprocessing (Python module) have completed?

I am using the impala-shell to compute some statistics over a text file containing table names.

I am using the Python multiprocessing module to pool the processes. The task is very time-consuming, so I need to keep track of how many files have been completed to see the job's progress. Let me give you some idea of the functions I am using.

job_executor is the function that takes a list of tables and performs the task.

main() is the function that takes the file location and the number of executors (pool_workers), converts the file containing the tables into a list of tables, and does the multiprocessing.

I want to see the progress of how many files have been processed by job_executor, but I haven't found a solution. Using a counter also doesn't work. Please help.

import os
import argparse
from multiprocessing import Pool

def job_executor(text):

    impala_cmd = "impala-shell -i %s -q 'compute stats %s.%s'" % (impala_node, db_name, text)
    impala_cmd_res = os.system(impala_cmd)  # runs the impala command

    # checks the execution result (success or failure)
    if impala_cmd_res == 0:
        print("invalidated the metadata.")
    else:
        print("error while performing the operation.")


def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()  # returns a list of all the tables in the file
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            process_pool.map(job_executor, text_file_rows)
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()


def parse_args():
    """
    Parses the script arguments passed in from the test_hr.sh file.
    """
    parser = argparse.ArgumentParser(description='Main process file that will start the processes and the session.')
    parser.add_argument("text_file_path",
                        help='provide the text file path/location to be read.')  # text file path
    parser.add_argument("pool_executors",
                        help='please provide the number of pool executors as an initial argument')  # number of pool workers

    return parser.parse_args()  # returns a Namespace holding all the arguments.


if __name__ == "__main__":
    mail_message_start()

    main(parse_args())

    mail_message_end()

If you insist on needlessly doing it via multiprocessing.pool.Pool(), the easiest way to keep track of what's going on is to use a non-blocking map (i.e. multiprocessing.pool.Pool.map_async()):

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
        total_processes = len(text_file_rows)  # keep the number of lines for reference
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            print('Processing {} lines.'.format(total_processes))
            processing = process_pool.map_async(job_executor, text_file_rows)
            processes_left = total_processes  # number of processing lines left
            while not processing.ready():  # start a loop to wait for all to finish
                if processes_left != processing._number_left:  # _number_left is an internal attribute and may change between Python versions
                    processes_left = processing._number_left
                    print('Processed {} out of {} lines...'.format(
                        total_processes - processes_left, total_processes))
                time.sleep(0.1)  # let it breathe a little, don't forget to `import time`
            print('All done!')
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()

This will check every 100 ms whether some of the processes have finished processing, and if anything has changed since the last check, it will print out the number of lines processed so far. If you need more insight into what's going on with your subprocesses, you can use some of the shared structures, such as multiprocessing.Queue() or the structures from multiprocessing.Manager(), to report directly from within the processes.
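As a minimal sketch of the Queue-based reporting just mentioned (the table names and the trivial worker body here are made up for illustration — in your case the worker would run the impala-shell command), each worker puts a message on a shared queue when it finishes, and the parent counts those messages instead of peeking at private pool attributes. Note that plain multiprocessing.Queue() can't be passed through Pool.map(), so a Manager().Queue() is used instead:

```python
from multiprocessing import Pool, Manager

def job_executor(args):
    """Hypothetical worker: processes one table, then reports completion on the queue."""
    table, queue = args
    # ... do the real per-table work here (e.g. run the impala-shell command) ...
    queue.put(table)  # signal that this table is done

def main():
    tables = ['table_a', 'table_b', 'table_c', 'table_d']  # stand-in for the file contents
    with Manager() as manager:
        queue = manager.Queue()  # a Queue proxy that can be pickled and sent to pool workers
        with Pool(2) as pool:
            result = pool.map_async(job_executor, [(t, queue) for t in tables])
            done = 0
            while done < len(tables):
                queue.get()  # blocks until some worker reports completion
                done += 1
                print('Processed {} out of {} tables...'.format(done, len(tables)))
            result.wait()  # all results are in at this point

if __name__ == '__main__':
    main()
```

Alternatively, Pool.imap_unordered() yields results as each worker finishes, which lets you count progress in a plain for loop without any shared structure at all.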
