
Global List not updating when using multiprocessing in python

I have some code (this is not the complete file):

import random
import multiprocessing
from datetime import datetime, date, time, timedelta

chunk_list = []

def makeFakeTransactions(store_num, num_transactions):

    global chunk_list
    startTime = datetime.now()
    data_load_datetime = startTime.isoformat()
    data_load_name = "Faked Data v2.2"
    data_load_path = "data was faked"
    index_list = []
    number_of_stores = store_num + 10
    number_of_terminals = 13

    for month in range(1, 13):
        number_of_days = 30
        extra_day_months = [1, 3, 5, 7, 8, 10, 12]
        if month == 2:
            number_of_days = 28
        elif month in extra_day_months:
            number_of_days = 31
        for day in range(1, number_of_days + 1):
            for store in range(store_num, number_of_stores):
                operator_id = "0001"
                operator_counter = 1
                if store < 11:
                    store_number = "0000" + str(store)
                else:
                    store_number = "000" + str(store)

                for terminal in range(1, number_of_terminals + 1):
                    if terminal < 10:
                        terminal_id = str(terminal) + "000"
                    else:
                        terminal_id = str(terminal) + "00"
                    transaction_type = "RetailTransaction"
                    transaction_type_code = "Transaction"
                    transaction_date = date(2015, month, day)
                    transaction_date_str = transaction_date.isoformat()
                    transaction_time = time(random.randint(0, 23), random.randint(0, 59))
                    transaction_datetime = datetime.combine(transaction_date, transaction_time)
                    transaction_datetime_str = transaction_datetime.isoformat()
                    max_transactions = num_transactions

                    for transaction_number in range (0, max_transactions):
                        inactive_time = random.randint(80, 200)
                        item_count = random.randint(1, 15)
                        sequence_number = terminal_id + str(transaction_number)
                    # ring_time and special_time come from parts of the file not shown here
                    transaction_datetime = transaction_datetime + timedelta(0, ring_time + special_time + inactive_time)

                    transaction_summary = {} 
                    transaction_summary["transaction_type"] = transaction_type
                    transaction_summary["transaction_type_code"] = transaction_type_code
                    transaction_summary["store_number"] = store_number
                    transaction_summary["sequence_number"] = sequence_number                    
                    transaction_summary["data_load_path"] = data_load_path
                    index_list.append(transaction_summary.copy())    

                operator_counter += 10 
                operator_id = '{0:04d}'.format(operator_counter)

    chunk_list.append(index_list)

if __name__ == '__main__':
    store_num = 1
    process_number = 6
    num_transactions = 10
    p = multiprocessing.Pool(process_number)
    results = [p.apply(makeFakeTransactions, args = (store_num, num_transactions,)) for store_num in xrange(1, 30, 10)]
    results = [p.apply(elasticIndexing, args = (index_list,)) for index_list in chunk_list]

I have a global variable chunk_list that gets appended to at the end of my makeFakeTransactions function and is basically a list of lists. However, when I do a test print of chunk_list after the 3 processes have run makeFakeTransactions, chunk_list shows up as empty, even though it should have been appended to 3 times. Am I doing something wrong with global list variables in multiprocessing? Is there a better way to do this?

Edit: makeFakeTransactions appends a copy of a dictionary to index_list, and once all of the dictionaries have been appended to index_list, it appends index_list to the global variable chunk_list.
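
The behaviour is reproducible with a much smaller, self-contained sketch; append_chunk below is a made-up stand-in for makeFakeTransactions:

import multiprocessing

chunk_list = []  # module-level global, as in the question

def append_chunk(store_num):
    # each worker process operates on its own copy of the module globals,
    # so this append never reaches the list in the parent process
    global chunk_list
    chunk_list.append([store_num])

if __name__ == '__main__':
    p = multiprocessing.Pool(3)
    for store_num in xrange(1, 30, 10):
        p.apply(append_chunk, args=(store_num,))
    p.close()
    p.join()
    print(chunk_list)  # prints [] in the parent, even after three calls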

First, your code is not actually running in parallel. Per the docs, p.apply blocks until the call is complete, so you are running your tasks sequentially on the process pool. You need to start the tasks without waiting for each one to finish, e.g. with p.apply_async (or p.map_async).
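
The difference is easy to see with a small timing sketch; slow below is a made-up stand-in for the real work:

import time
import multiprocessing

def slow(x):
    # stand-in for the real work; just sleeps for one second
    time.sleep(1)
    return x

if __name__ == '__main__':
    p = multiprocessing.Pool(3)

    start = time.time()
    [p.apply(slow, args=(i,)) for i in range(3)]        # each call blocks: roughly 3 seconds total
    print(round(time.time() - start, 1))

    start = time.time()
    async_results = [p.apply_async(slow, args=(i,)) for i in range(3)]
    [r.get() for r in async_results]                    # the tasks overlap: roughly 1 second total
    print(round(time.time() - start, 1))

    p.close()
    p.join()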

Second, as stated in the comments, global state is not shared between processes. You could use shared state (see the sketch at the end of this answer), but in this case it is much simpler to pass the results back from the worker processes. Since you don't use chunk_list for anything other than collecting the results, you can just send the data back after it has been computed and collect it in the calling process. With multiprocessing.Pool that is easy: you simply return the result from the worker function:

return index_list

With that return in place, p.apply() returns index_list directly. p.apply_async() instead returns an AsyncResult, which yields index_list via AsyncResult.get(). Since you are already using list comprehensions, the modification is small:

p = multiprocessing.Pool(process_number)
async_results = [p.apply_async(makeFakeTransactions, args = (store_num, num_transactions,)) for store_num in xrange(1, 30, 10)]
results = [ar.get() for ar in async_results]
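
Continuing from the snippet above, the collected lists can then be handed to the indexing step that the question's __main__ block was aiming for; this is a rough sketch that assumes elasticIndexing is defined elsewhere in the file:

index_results = [p.apply_async(elasticIndexing, args = (index_list,)) for index_list in results]
[ar.get() for ar in index_results]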

You can simplify this further with p.map, which effectively does the work of those two lines in a single step. Note that p.map blocks until all the results are available.

from functools import partial

p = multiprocessing.Pool(process_number)
results = p.map(partial(makeFakeTransactions, num_transactions=num_transactions), xrange(1, 30, 10))

Since p.map expects a function of a single argument, num_transactions is bound with functools.partial here; a plain lambda would not work, because multiprocessing has to pickle the function it sends to the worker processes and lambdas cannot be pickled.
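
For completeness, the shared-state route mentioned earlier could look roughly like the sketch below, which uses a multiprocessing.Manager list proxy rather than raw shared memory; worker is a made-up stand-in for makeFakeTransactions:

import multiprocessing

def worker(shared_chunks, store_num):
    index_list = ["fake transactions for store %d" % store_num]  # placeholder for the real work
    shared_chunks.append(index_list)  # the proxy forwards the append to the manager process

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    chunk_list = manager.list()  # a list that every worker process can append to
    p = multiprocessing.Pool(3)
    async_results = [p.apply_async(worker, args = (chunk_list, store_num)) for store_num in xrange(1, 30, 10)]
    [ar.get() for ar in async_results]
    print(list(chunk_list))  # three sub-lists, one per worker

Returning the results, as shown above, avoids the extra manager process and is usually the simpler choice here.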
