Python Multi-Processing List Issue

I am attempting to dynamically open and parse several text files (~10) to extract a particular value from a key, and I am using multiprocessing in Python to do this. My issue is that the function I am calling writes particular data to a class list, which I can see inside the method; however, outside the method that list is empty. Refer to the following:

Class:

class MyClass(object):
    __id_list = []

    def __init__(self):
        self.process_wrapper()

Caller method:

def process_wrapper(self):
    from multiprocessing import Pool
    import multiprocessing
    import os
    from pathlib import Path

    info_file = 'info*'
    file_list = []

    p = Pool(processes = multiprocessing.cpu_count() - 1)

    for file_name in Path('c:/').glob('**/*/' + info_file):
        file_list.append(str(os.path.join('c:/', file_name)))

    p.map_async(self.get_ids, file_list)

    p.close()
    p.join()

    print(self.__id_list) # this is showing as empty

Worker method:

def get_ids(self, file_name):        
    try:
        with open(file_name) as data:
            for line in data:
                temp_split = line.split()
                for item in temp_split:
                    value_split = str(item).split('=')
                    if 'id' == value_split[0].lower():
                        if int(value_split[1]) not in self._id_list:
                            self.__id_list.append(int(value_split[1]))
    except:
        raise FileReadError(f'There was an issue parsing "{file_name}".')
    print(self.__id_list) # here the list prints fine

The map_async call returns an AsyncResult object; you should use it to wait for the processing to finish before checking self.__id_list. You might also consider having each worker return a local list, then collecting those lists and aggregating them into the final list, as in the sketch below.
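For illustration, a minimal sketch of that return-and-aggregate pattern, written as hypothetical standalone functions rather than the question's methods (the Path/glob file discovery and the FileReadError handling are omitted):

from multiprocessing import Pool, cpu_count

def get_ids(file_name):
    # Each worker builds and returns its own local list instead of
    # mutating state on a shared instance.
    ids = []
    with open(file_name) as data:
        for line in data:
            for item in line.split():
                parts = item.split('=')
                if len(parts) == 2 and parts[0].lower() == 'id':
                    ids.append(int(parts[1]))
    return ids

def collect_ids(file_list):
    with Pool(processes=cpu_count() - 1) as p:
        result = p.map_async(get_ids, file_list)
        per_file_ids = result.get()  # blocks until every worker has finished
    # Aggregate the per-file lists into one de-duplicated list.
    merged = []
    for ids in per_file_ids:
        for value in ids:
            if value not in merged:
                merged.append(value)
    return merged

This avoids shared state between processes entirely, which is usually the simplest fix.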

1. It looks like you have a typo in your get_ids method (self._id_list instead of self.__id_list). You can see the resulting exception if you wait for the result:

result = p.map_async(self.get_ids, file_list)
result.get()

2. When a new child process is created, it gets a copy of the parent's address space; however, any subsequent changes (by either the parent or the child) are not reflected in the memory of the other process. Each process has its own private address space.

Example:

$ cat fork.py 
import os

l = []
l.append('global')

# Return 0 in the child and the child’s process id in the parent
pid = os.fork()

if pid == 0:
    l.append('child')
    print(f'Child PID: {os.getpid()}, {l}')
else:
    l.append('parent')
    print(f'Parent PID: {os.getpid()}, {l}')

print(l)

$ python3 fork.py 
Parent PID: 9933, ['global', 'parent']
['global', 'parent']
Child PID: 9934, ['global', 'child']
['global', 'child']

Now back to your problem: you can use a multiprocessing.Manager.list to create a list object that is shared between processes:

from multiprocessing import Manager, Pool

m = Manager()
self.__id_list = m.list()

Docs: Sharing state between processes
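A minimal sketch of the Manager approach, again as hypothetical standalone functions rather than the question's class; the proxy returned by m.list() is passed to each worker, and appends made in the children are visible to the parent:

from multiprocessing import Manager, Pool, cpu_count

def get_ids(args):
    # The worker receives the shared proxy list alongside the file name.
    file_name, shared_ids = args
    with open(file_name) as data:
        for line in data:
            for item in line.split():
                parts = item.split('=')
                if len(parts) == 2 and parts[0].lower() == 'id':
                    value = int(parts[1])
                    if value not in shared_ids:
                        shared_ids.append(value)

def collect_ids(file_list):
    with Manager() as m:
        shared_ids = m.list()  # proxy object shared between processes
        with Pool(processes=cpu_count() - 1) as p:
            p.map(get_ids, [(file_name, shared_ids) for file_name in file_list])
        return list(shared_ids)  # copy out before the manager shuts down

Note that the check-then-append on the proxy is not atomic across processes, so occasional duplicates are still possible; de-duplicating once in the parent at the end is more robust.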

Or use threads, since your workload seems to be I/O bound anyway:

import multiprocessing
from multiprocessing.dummy import Pool as ThreadPool

p = ThreadPool(processes = multiprocessing.cpu_count() - 1)
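Since threads share the parent's memory, the rest of the original code, including the appends to self.__id_list, should work unchanged; if the check-and-append ever needs to be strictly race-free, guard it with a threading.Lock.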

Alternatively, check out concurrent.futures, as in the sketch below.
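For example, a minimal sketch with concurrent.futures, assuming the same hypothetical get_ids worker that returns a local list (as in the return-and-aggregate sketch above); ProcessPoolExecutor is a drop-in alternative if the work ever becomes CPU bound:

from concurrent.futures import ThreadPoolExecutor

def collect_ids(file_list):
    merged = []
    with ThreadPoolExecutor() as executor:
        # executor.map returns each worker's list in input order as it becomes available.
        for ids in executor.map(get_ids, file_list):
            for value in ids:
                if value not in merged:
                    merged.append(value)
    return merged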
