python multiprocessing pool: how can I know when all the workers in the pool have finished?
How to know how many threads/workers from a pool in multiprocessing (python module) have been completed?
I am using the impala shell to compute some stats over a text file containing table names.
I am using the python multiprocessing module to pool the processes.
The thing is the tasks are very time consuming, so I need to keep track of how many files have been completed in order to see the job progress.
So let me give you some idea about the functions I am using.
job_executor is the function that takes a list of tables and performs the task.
main() is the function that takes the file location and the number of executors (pool_workers), converts the file containing the tables into a list of tables and does the multiprocessing.
I want to see the progress of how many files have been processed by job_executor, but I can't find a solution. Using a counter doesn't work either. Please help.
import os
import argparse
from multiprocessing import Pool

# impala_node and db_name are assumed to be defined elsewhere in the original script.

def job_executor(text):
    impala_cmd = "impala-shell -i %s -q 'compute stats %s.%s'" % (impala_node, db_name, text)
    impala_cmd_res = os.system(impala_cmd)  # runs the impala command
    # checks the execution status (success or fail)
    if impala_cmd_res == 0:
        print("invalidated the metadata.")
    else:
        print("error while performing the operation.")

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)
    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()  # list of all the tables in the file
    process_pool = Pool(NUM_OF_EXECUTORS)
    try:
        process_pool.map(job_executor, text_file_rows)
        process_pool.close()
        process_pool.join()
    except Exception:
        process_pool.terminate()
        process_pool.join()

def parse_args():
    """
    Function to take script arguments from the test_hr.sh file.
    """
    parser = argparse.ArgumentParser(description='Main process file that will start the process and session too.')
    parser.add_argument("text_file_path",
                        help='provide text file path/location to be read.')  # text file path
    parser.add_argument("pool_executors",
                        help='please provide pool executors as an initial argument')  # number of pool executors
    return parser.parse_args()  # returns a Namespace with all the arguments

if __name__ == "__main__":
    mail_message_start()
    main(parse_args())
    mail_message_end()
If you insist on needlessly doing it via multiprocessing.pool.Pool(), the easiest way to keep track of what's going on is to use a non-blocking mapping (i.e. multiprocessing.pool.Pool.map_async()):
def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)
    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
    total_processes = len(text_file_rows)  # keep the number of lines for reference
    process_pool = Pool(NUM_OF_EXECUTORS)
    try:
        print('Processing {} lines.'.format(total_processes))
        processing = process_pool.map_async(job_executor, text_file_rows)
        processes_left = total_processes  # number of processing lines left
        while not processing.ready():  # start a loop to wait for all to finish
            if processes_left != processing._number_left:
                processes_left = processing._number_left
                print('Processed {} out of {} lines...'.format(
                    total_processes - processes_left, total_processes))
            time.sleep(0.1)  # let it breathe a little, don't forget to `import time`
        print('All done!')
        process_pool.close()
        process_pool.join()
    except Exception:
        process_pool.terminate()
        process_pool.join()
This will check every 100 ms whether some of the processes have finished, and if anything changed since the last check it will print out the number of lines processed so far. If you need more insight into what's going on with your subprocesses, you can use shared structures like multiprocessing.Queue() or multiprocessing.Manager() to report directly from within the processes.
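As a minimal sketch of that reporting idea (not part of the original answer), each worker could push an entry onto a Manager().Queue() when it finishes a table, and the parent process could drain the queue to print progress. A plain multiprocessing.Queue() cannot be passed as an argument to pool workers, which is why the Manager proxy is used here. It assumes the job_executor() from the question; the wrapper job_with_progress() is a hypothetical helper added only for illustration.

from multiprocessing import Pool, Manager

def job_with_progress(task):
    text, progress_queue = task
    job_executor(text)         # run the original task from the question
    progress_queue.put(text)   # report completion back to the parent process

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)
    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
    total = len(text_file_rows)
    manager = Manager()
    progress_queue = manager.Queue()  # proxy queue that can be shared with pool workers
    process_pool = Pool(NUM_OF_EXECUTORS)
    result = process_pool.map_async(job_with_progress,
                                    [(row, progress_queue) for row in text_file_rows])
    done = 0
    while done < total:
        # blocks until a worker reports in; if a task could raise, add error handling
        finished_table = progress_queue.get()
        done += 1
        print('Processed {} out of {} lines ({}).'.format(done, total, finished_table))
    result.wait()  # every task has already reported, so this returns immediately
    process_pool.close()
    process_pool.join()

Unlike the map_async() loop above, this reports which table just finished and does not rely on the private _number_left attribute.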