
How to know how many threads/workers from a pool in multiprocessing (Python module) have been completed?

I am using impala-shell to compute some stats over a text file containing the table names.

I am using the Python multiprocessing module to pool the processes.
The thing is, the task is very time consuming, so I need to keep track of how many files have been completed to see the job's progress.
So let me give you some idea about the functions that I am using.

job_executor is the function that takes a list of tables and performs the tasks.

main() is the function that takes the file location and the number of executors (pool_workers), converts the file containing the tables into a list of tables, and does the multiprocessing.

I want to see the progress, i.e. how many files have been processed by job_executor, but I can't find a solution. Using a counter also doesn't work (see the note after the script below). Help me.

import argparse
import os
from multiprocessing import Pool


def job_executor(text):
    # impala_node and db_name are defined elsewhere in the script
    impala_cmd = "impala-shell -i %s -q 'compute stats %s.%s'" % (impala_node, db_name, text)
    impala_cmd_res = os.system(impala_cmd)  # runs the impala-shell command

    # check the exit status (success or failure)
    if impala_cmd_res == 0:
        print("computed the stats.")
    else:
        print("error while performing the operation.")


def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()  # this will return list of all the tables in the file.
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            process_pool.map(job_executor, text_file_rows)
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()


def parse_args():
    """
    function to take scrap arguments from  test_hr.sh file
    """
    parser = argparse.ArgumentParser(description='Main Process file that will start the process and session too.')
    parser.add_argument("text_file_path",
                        help='provide text file path/location to be read. ')  # text file fath
    parser.add_argument("pool_executors",
                        help='please provide pool executors as an initial argument') # pool_executor path

    return parser.parse_args()  # returns a Namespace with all the arguments


if __name__ == "__main__":
    mail_message_start()  # mail_message_start/_end are mail helpers defined elsewhere in the script

    main(parse_args())

    mail_message_end()
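
A side note on why a plain counter doesn't work here: each pool worker runs in a separate process, so an ordinary Python variable incremented inside job_executor only changes that worker's private copy. A process-safe counter needs shared memory, for example multiprocessing.Value. Below is a minimal sketch under that assumption; the names completed, init_counter and run_one are illustrative, not part of the original script:

from multiprocessing import Pool, Value

completed = None  # will hold the shared counter inside each worker

def init_counter(counter):
    # runs once in every worker process, so they all see the same Value
    global completed
    completed = counter

def run_one(text):
    # ... run the impala-shell command here, as in job_executor ...
    with completed.get_lock():  # the Value carries its own lock
        completed.value += 1
        print('Finished %d tables so far.' % completed.value)

if __name__ == '__main__':
    counter = Value('i', 0)  # shared integer counter, starts at 0
    with Pool(4, initializer=init_counter, initargs=(counter,)) as pool:
        pool.map(run_one, ['db.table_a', 'db.table_b'])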

If you insist on needlessly doing it via multiprocessing.pool.Pool(), the easiest way to keep track of what's going on is to use a non-blocking mapping (i.e. multiprocessing.pool.Pool.map_async()):

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)

    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
        total_processes = len(text_file_rows)  # keep the number of lines for reference
        process_pool = Pool(NUM_OF_EXECUTORS)
        try:
            print('Processing {} lines.'.format(total_processes))
            processing = process_pool.map_async(job_executor, text_file_rows)
            processes_left = total_processes  # number of processing lines left
            while not processing.ready():  # start a loop to wait for all to finish
                if processes_left != processing._number_left:  # _number_left is an undocumented Pool internal
                    processes_left = processing._number_left
                    print('Processed {} out of {} lines...'.format(
                        total_processes - processes_left, total_processes))
                time.sleep(0.1)  # let it breathe a little, don't forget to `import time`
            print('All done!')
            process_pool.close()
            process_pool.join()
        except Exception:
            process_pool.terminate()
            process_pool.join()

This will check every 100 ms whether some of the processes have finished, and if something has changed since the last check it will print out the number of lines processed so far. If you need more insight into what's going on with your subprocesses, you can use shared structures like multiprocessing.Queue() or multiprocessing.Manager() to report directly from within your processes.
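
To illustrate that last suggestion, here is a minimal sketch of workers reporting through a Manager queue; the names job_executor_reporting and run_with_queue are illustrative, not from the original answer. Note that a manager.Queue() proxy, unlike a bare multiprocessing.Queue(), can be passed to pool workers as an argument:

from multiprocessing import Manager, Pool

def job_executor_reporting(args):
    text, queue = args
    # ... run the impala-shell command here ...
    queue.put(text)  # tell the parent process this table is done

def run_with_queue(text_file_rows, num_executors):
    with Manager() as manager:
        queue = manager.Queue()  # proxy queue, safe to share with pool workers
        with Pool(num_executors) as pool:
            result = pool.map_async(job_executor_reporting,
                                    [(row, queue) for row in text_file_rows])
            for done in range(1, len(text_file_rows) + 1):
                finished = queue.get()  # blocks until some worker reports in
                print('[%d/%d] %s done.' % (done, len(text_file_rows), finished))
            result.wait()  # all reports received, so this returns immediately

Unlike the _number_left polling above, this reports each table by name as it finishes and uses only documented APIs.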

