
Multiprocessing using map

I have a list of strings, and on every string I am doing some changes that you can see in wordify() . Now, to speed this up, I split the list into sublists using chunked() (the number of sublists is the number of CPU cores minus 1). That way I get lists that look like [[,,],[,,],[,,],[,,]] .

What I am trying to achieve:

I want to run wordify() on each of these sublists simultaneously, returning the sublists as separate lists. Then I want to wait until all processes finish and join these sublists into one list. The approach below does not work.

import multiprocessing
from multiprocessing import Pool
from contextlib import closing

def readFiles():
    words = []
    with open("somefile.txt") as f:
        w = f.readlines()
    words = words + w 
    return words


def chunked(words, num_cpu):
    avg = len(words) / float(num_cpu)
    out = []
    last = 0.0    
    while last < len(words):
        out.append(words[int(last):int(last + avg)])
        last += avg    
    return out    


def wordify(chunk,wl):
    wl.append([chunk[word].split(",", 1)[0] for word in range(len(chunk))]) 
    return wl


if __name__ == '__main__':
    num_cpu = multiprocessing.cpu_count() - 1
    words = readFiles()
    chunked = chunked(words, num_cpu)
    wordlist = []
    wordify(words, wordlist) # works
    with closing(Pool(processes = num_cpu)) as p:
        p.map(wordify, chunked, wordlist) # fails

You have written your code so that you're just passing a single function to map ; it's not smart enough to know that you're hoping it passes wordlist into the second argument of your function.

TBH, partial function application is a bit clunky in Python, but you can use functools.partial :

from functools import partial
p.map(partial(wordify, wl=wordlist), chunked)  # bind wl by keyword so each chunk stays the first positional argument
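A fuller sketch of how this can fit together, using toy data in place of the question's file (the sample strings and the `sep` keyword are illustrative assumptions, not from the original). One thing worth noting: appends made to a list inside a child process are not visible in the parent, so rather than mutating a shared wordlist, it is simpler to have wordify() return its result and let Pool.map collect the per-chunk lists:

```python
from functools import partial
from multiprocessing import Pool, cpu_count

def chunked(words, num_cpu):
    # Split words into num_cpu roughly equal sublists.
    avg = len(words) / float(num_cpu)
    out, last = [], 0.0
    while last < len(words):
        out.append(words[int(last):int(last + avg)])
        last += avg
    return out

def wordify(chunk, sep=","):
    # Keep only the part of each line before the first separator.
    return [line.split(sep, 1)[0] for line in chunk]

if __name__ == "__main__":
    words = ["apple,1", "pear,2", "plum,3", "fig,4"]
    num_cpu = max(cpu_count() - 1, 1)
    with Pool(processes=num_cpu) as p:
        # partial binds the keyword argument; each chunk from the
        # iterable becomes the remaining positional argument.
        sublists = p.map(partial(wordify, sep=","), chunked(words, num_cpu))
    # Flatten the per-chunk results back into one list.
    wordlist = [w for sub in sublists for w in sub]
    print(wordlist)  # ['apple', 'pear', 'plum', 'fig']
```

Because map returns the chunk results in order, flattening them preserves the original ordering of the input list.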
