Global List not updating when using multiprocessing in python
I have some code (this is not the full file):
```python
import multiprocessing
import random
from datetime import date, datetime, time, timedelta

chunk_list = []

def makeFakeTransactions(store_num, num_transactions):
    global chunk_list
    startTime = datetime.now()
    data_load_datetime = startTime.isoformat()
    data_load_name = "Faked Data v2.2"
    data_load_path = "data was faked"
    index_list = []
    number_of_stores = store_num + 10
    number_of_terminals = 13
    for month in range(1, 13):
        number_of_days = 30
        extra_day_months = [1, 3, 5, 7, 8, 10, 12]
        if month == 2:
            number_of_days = 28
        elif month in extra_day_months:
            number_of_days = 31
        for day in range(1, number_of_days + 1):
            for store in range(store_num, number_of_stores):
                operator_id = "0001"
                operator_counter = 1
                if store < 11:
                    store_number = "0000" + str(store)
                else:
                    store_number = "000" + str(store)
                for terminal in range(1, number_of_terminals + 1):
                    if terminal < 10:
                        terminal_id = str(terminal) + "000"
                    else:
                        terminal_id = str(terminal) + "00"
                    transaction_type = "RetailTransaction"
                    transaction_type_code = "Transaction"
                    transaction_date = date(2015, month, day)
                    transaction_date_str = transaction_date.isoformat()
                    transaction_time = time(random.randint(0, 23), random.randint(0, 59))
                    transaction_datetime = datetime.combine(transaction_date, transaction_time)
                    transaction_datetime_str = transaction_datetime.isoformat()
                    max_transactions = num_transactions
                    for transaction_number in range(0, max_transactions):
                        inactive_time = random.randint(80, 200)
                        item_count = random.randint(1, 15)
                        sequence_number = terminal_id + str(transaction_number)
                        # ring_time and special_time presumably come from the part of the file not shown.
                        transaction_datetime = transaction_datetime + timedelta(0, ring_time + special_time + inactive_time)
                        transaction_summary = {}
                        transaction_summary["transaction_type"] = transaction_type
                        transaction_summary["transaction_type_code"] = transaction_type_code
                        transaction_summary["store_number"] = store_number
                        transaction_summary["sequence_number"] = sequence_number
                        transaction_summary["data_load_path"] = data_load_path
                        index_list.append(transaction_summary.copy())
                    operator_counter += 10
                    operator_id = '{0:04d}'.format(operator_counter)
    # Add this call's list of transaction dicts to the global list of lists.
    chunk_list.append(index_list)

if __name__ == '__main__':
    store_num = 1
    process_number = 6
    num_transactions = 10
    p = multiprocessing.Pool(process_number)
    results = [p.apply(makeFakeTransactions, args=(store_num, num_transactions)) for store_num in range(1, 30, 10)]
    # elasticIndexing is defined in the part of the file not shown.
    results = [p.apply(elasticIndexing, args=(index_list,)) for index_list in chunk_list]
```
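I have a global variable `chunk_list` that gets appended to at the end of my `makeFakeTransactions` function; it is basically a list of lists. However, when I do a test print of `chunk_list` after the 3 `makeFakeTransactions` processes have run, `chunk_list` shows up empty, even though it should have been appended to 3 times. Am I doing something wrong with global list variables in multiprocessing? Is there a better way to do this?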
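Edit: `makeFakeTransactions` appends a copy of each dictionary to `index_list`, and once all the dictionaries have been appended to `index_list`, it appends `index_list` to the global variable `chunk_list`.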
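The symptom can be reproduced in a few lines. This is a minimal sketch for Python 3, not the program above: `append_chunk` and `run_demo` are hypothetical names, with `append_chunk` standing in for `makeFakeTransactions`. Each task appends to the global inside a pool worker, and the parent's list afterwards is still empty.

```python
import multiprocessing

chunk_list = []  # module-level global, as in the question


def append_chunk(store_num):
    """Hypothetical stand-in for makeFakeTransactions."""
    global chunk_list
    # This runs in a worker process, so it appends to that worker's own copy.
    chunk_list.append([store_num])
    return len(chunk_list)  # size of the worker's copy after appending


def run_demo():
    with multiprocessing.Pool(3) as pool:
        worker_sizes = [pool.apply(append_chunk, (s,)) for s in range(1, 30, 10)]
    # Nothing was ever appended to the parent's chunk_list.
    return worker_sizes, len(chunk_list)


if __name__ == "__main__":
    worker_sizes, parent_size = run_demo()
    print("sizes inside the workers:", worker_sizes)
    print("size in the parent:", parent_size)  # 0
```

The worker-side sizes can vary between runs, because one worker may happen to run several of the tasks.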
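First, your code isn't actually running in parallel. According to the documentation, `p.apply` blocks until the call completes, so you are running your tasks sequentially on the process pool. You need to start the tasks with an asynchronous variant such as `p.apply_async` or `p.map_async`, which returns immediately instead of waiting for the task to finish.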
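The difference can be made visible with a small timing sketch (Python 3; `slow_square` and `compare` are hypothetical names, and the 0.3 s sleep merely simulates work). With three workers, three blocking `apply` calls should take at least three times the per-task delay, while queuing the same tasks with `apply_async` and collecting them afterwards should take roughly one delay.

```python
import multiprocessing
import time


def slow_square(x):
    """Hypothetical worker that simulates 0.3 s of work."""
    time.sleep(0.3)
    return x * x


def compare(n_tasks=3):
    with multiprocessing.Pool(n_tasks) as pool:
        start = time.perf_counter()
        # apply() blocks until each task finishes, so the tasks run one after another.
        blocking = [pool.apply(slow_square, (i,)) for i in range(n_tasks)]
        sequential_time = time.perf_counter() - start

        start = time.perf_counter()
        # apply_async() returns an AsyncResult immediately, so all tasks are queued first.
        pending = [pool.apply_async(slow_square, (i,)) for i in range(n_tasks)]
        concurrent = [r.get() for r in pending]  # get() waits for each result
        concurrent_time = time.perf_counter() - start
    return blocking, concurrent, sequential_time, concurrent_time


if __name__ == "__main__":
    blocking, concurrent, t_seq, t_conc = compare()
    print(blocking, concurrent)  # [0, 1, 4] [0, 1, 4]
    print("apply: %.2f s, apply_async: %.2f s" % (t_seq, t_conc))
```

`map_async` behaves the same way for a whole iterable: it returns a single `AsyncResult` whose `get()` yields the list of results in input order.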
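Second, as mentioned in the comments, global state is not shared between processes: each worker process modifies its own copy of `chunk_list`, so the list in the parent process stays empty. You could use shared memory, but in this case it is much simpler to pass the results back from the worker processes. Since you don't use `chunk_list` for anything other than collecting results, you can send each result back once it has been computed and collect them in the calling process. With `multiprocessing.Pool` this is easy: you simply return the result from your worker function, replacing the `chunk_list.append(index_list)` line with: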
```python
return index_list
```
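This makes `p.apply()` return `index_list`. `p.apply_async()` instead returns an `AsyncResult`, whose `AsyncResult.get()` method returns `index_list` once the task has finished. Since you are already using list comprehensions, the change is small: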
```python
p = multiprocessing.Pool(process_number)
async_results = [p.apply_async(makeFakeTransactions, args=(store_num, num_transactions)) for store_num in range(1, 30, 10)]
results = [ar.get() for ar in async_results]
```
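Putting the pieces together, both stages of the question's `__main__` block could be restructured so that neither depends on a global. This is a sketch under assumptions: `make_fake_transactions` and `elastic_indexing` are hypothetical, heavily simplified stand-ins for `makeFakeTransactions` and `elasticIndexing` (whose body is not shown), and the code targets Python 3.

```python
import multiprocessing


def make_fake_transactions(store_num, num_transactions):
    """Hypothetical, simplified stand-in for makeFakeTransactions.

    Builds a local index_list and returns it instead of touching a global.
    """
    index_list = []
    for store in range(store_num, store_num + 10):
        for n in range(num_transactions):
            index_list.append({"store_number": store, "sequence_number": n})
    return index_list


def elastic_indexing(index_list):
    """Hypothetical stand-in for elasticIndexing: reports how many docs it got."""
    return len(index_list)


def run_pipeline(num_transactions=10, processes=6):
    with multiprocessing.Pool(processes) as pool:
        # Stage 1: start every chunk-building task, then collect the returned lists.
        pending = [pool.apply_async(make_fake_transactions, (s, num_transactions))
                   for s in range(1, 30, 10)]
        chunk_list = [ar.get() for ar in pending]
        # Stage 2: feed the collected chunks to the indexing step, again in parallel.
        pending = [pool.apply_async(elastic_indexing, (chunk,)) for chunk in chunk_list]
        indexed = [ar.get() for ar in pending]
    return chunk_list, indexed


if __name__ == "__main__":
    chunk_list, indexed = run_pipeline()
    print(len(chunk_list), indexed)  # 3 [100, 100, 100]
```

Because the chunks come back as return values, the second stage iterates over a list built in the parent rather than over a global that the workers never update.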
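You can simplify this with `p.map`, which effectively does the work of both list comprehensions (starting the tasks and collecting their results) in one call. Note that `p.map` blocks until all results are available.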
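```python
import functools

p = multiprocessing.Pool(process_number)
# Bind num_transactions so the pool only has to supply store_num.
make_chunk = functools.partial(makeFakeTransactions, num_transactions=num_transactions)
results = p.map(make_chunk, range(1, 30, 10))
```

Since `p.map` calls the function with a single argument, you need to bind `num_transactions` in advance. Wrapping the call in a lambda does not work here: the pool has to pickle the function to send it to the worker processes, and lambdas cannot be pickled, whereas `functools.partial` applied to a module-level function can be.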
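On Python 3.3 and later, `Pool.starmap` is another way around the single-argument restriction: it takes an iterable of argument tuples and unpacks each one into the call, so no `partial` is needed. A minimal sketch, where `make_fake_transactions` is a hypothetical stand-in that simply returns `(store_num, n)` pairs and `build_chunks` is an illustrative helper:

```python
import multiprocessing


def make_fake_transactions(store_num, num_transactions):
    """Hypothetical stand-in for makeFakeTransactions with the same two parameters."""
    return [(store_num, n) for n in range(num_transactions)]


def build_chunks(num_transactions=10, processes=6):
    # One argument tuple per task; starmap unpacks each tuple into the call.
    arg_tuples = [(s, num_transactions) for s in range(1, 30, 10)]
    with multiprocessing.Pool(processes) as pool:
        # Like map, starmap blocks until every result is available and keeps input order.
        return pool.starmap(make_fake_transactions, arg_tuples)


if __name__ == "__main__":
    chunks = build_chunks(num_transactions=2)
    print(chunks)  # [[(1, 0), (1, 1)], [(11, 0), (11, 1)], [(21, 0), (21, 1)]]
```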