[英]Python multiprocessing.Pool & memory
I'm using Pool.map
for a scoring procedure: 我正在使用
Pool.map
进行评分:
The results are independent. 结果是独立的。
I'm just wondering if I can avoid the memory demand. 我只是想知道我是否可以避免内存需求。 At first it seems that every array goes into python and then the 2 and 3 are proceed.
最初,似乎每个数组都进入python,然后继续执行2和3。 Anyway I have a speed improvement.
无论如何,我的速度都有提高。
#data src and sink is in mongodb#
def scoring(some_arguments):
### some stuff and finally persist ###
collection.update({uid:_uid},{'$set':res_profile},upsert=True)
cursor = tracking.find(timeout=False)
score_proc_pool = Pool(options.cores)
#finaly I use a wrapper so I have only the document as input for map
score_proc_pool.map(scoring_wrapper,cursor,chunksize=10000)
Am I doing something wrong or is there a better way with python for this purpose? 我是在做错什么,还是为此目的使用python有更好的方法?
The map
functions of a Pool
internally convert the iterable to a list if it doesn't have a __len__
attribute. 如果没有
__len__
属性,则Pool
的map
函数会在内部将其迭代为列表。 The relevant code is in Pool.map_async
, as that is used by Pool.map
(and starmap
) to produce the result - which is also a list. 相关代码位于
Pool.map_async
,因为Pool.map
(和starmap
)使用该代码来产生结果-这也是一个列表。
If you don't want to read all data into memory first, you should use Pool.imap
or Pool.imap_unordered
, which will produce an iterator that will yield the results as they come in. 如果您不想首先将所有数据读入内存,则应使用
Pool.imap
或Pool.imap_unordered
,这将产生一个迭代器,该迭代器将在输入结果时产生结果。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.