Reading and appending data from files to a list in parallel using python
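I am trying to read several files and append certain elements from them to a list. Reading the files seems slow, so I thought multiprocessing might help. I put together the code below to do what I want: it opens the numbered files file_%i in parallel, extracts the relevant data in read_append, and appends it to a global array shared between the processes, res = manager.list(). The example code is given below. However, it does not work: trying to print a.shape gives the error message included below the example code. I am not sure how to fix this code, and I am new to multiprocessing. I suspect that this hacky script, which I pieced together from SO answers and the multiprocessing manual pages, is far from ideal.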
import multiprocessing as mp
import numpy as np
from timeit import default_timer as timer

start = timer()

def read_append(input_list):
    val, res_arr = input_list
    data_file = np.load('file_%i.npz' %val, mmap_mode = 'r', allow_pickle=True)['data']
    for i in range(len(data_file)):
        res_arr.append(data_file[i][1])
    return None

if __name__ == '__main__':
    N= mp.cpu_count()
    print(N)
    with mp.Manager() as manager:
        res = manager.list()
        input_list = [(val, res) for val in range(2)]
        with mp.Pool(processes = N) as p:
            results = p.map(read_append,input_list)

    end = timer()
    print(end-start)
    a = list(res)
    print(a.shape)

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/multiprocessing/managers.py in _callmethod(self, methodname, args, kwds)
810 try:
--> 811 conn = self._tls.connection
812 except AttributeError:
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
FileNotFoundError Traceback (most recent call last)
<ipython-input-13-35028af51086> in <module>
21 end = timer()
22 print(end-start)
---> 23 a = list(res)
24 print(a.shape)
<string> in __len__(self, *args, **kwds)
~/anaconda3/lib/python3.7/multiprocessing/managers.py in _callmethod(self, methodname, args, kwds)
813 util.debug('thread %r does not own a connection',
814 threading.current_thread().name)
--> 815 self._connect()
816 conn = self._tls.connection
817
~/anaconda3/lib/python3.7/multiprocessing/managers.py in _connect(self)
800 if threading.current_thread().name != 'MainThread':
801 name += '|' + threading.current_thread().name
--> 802 conn = self._Client(self._token.address, authkey=self._authkey)
803 dispatch(conn, None, 'accept_connection', (name,))
804 self._tls.connection = conn
~/anaconda3/lib/python3.7/multiprocessing/connection.py in Client(address, family, authkey)
490 c = PipeClient(address)
491 else:
--> 492 c = SocketClient(address)
493
494 if authkey is not None and not isinstance(authkey, bytes):
~/anaconda3/lib/python3.7/multiprocessing/connection.py in SocketClient(address)
617 with socket.socket( getattr(socket, family) ) as s:
618 s.setblocking(True)
--> 619 s.connect(address)
620 return Connection(s.detach())
621
FileNotFoundError: [Errno 2] No such file or directory
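
There are a few problems with the code:

Why do you think res is a global variable? It is created inside the if __name__ == '__main__' block and handed to the workers as an argument.

A list has no shape attribute; numpy arrays do.

The manager is shut down when the with mp.Manager() as manager block exits, so after that point the proxy res can no longer connect to it, which is what the FileNotFoundError in the traceback shows. You therefore need to move the code that uses res inside the with mp.Manager() as manager block.

The fixed code also places start = timer() under the if __name__ == '__main__' guard, next to the end-start measurement.

Example fixed code:
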
import multiprocessing as mp
import numpy as np
from timeit import default_timer as timer

def read_append(input_list):
    val, res_arr = input_list
    data_file = np.load('file_%i.npz' %val, mmap_mode = 'r', allow_pickle=True)['data']
    for i in range(len(data_file)):
        res_arr.append(data_file[i][1])
    return None

if __name__ == '__main__':
    start = timer()
    N= mp.cpu_count()
    print(N)
    with mp.Manager() as manager:
        res = manager.list()
        input_list = [(val, res) for val in range(2)]
        with mp.Pool(processes = N) as p:
            results = p.map(read_append,input_list)
        a = np.array(res)
        print(a.shape)
    end = timer()
    print(end - start)
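
Since the question asks whether the script is far from ideal: one possible alternative, given here only as a minimal sketch and not as part of the fix above, is to drop the shared manager.list() and let each worker return its own list, which Pool.map passes back to the parent. Every append on a manager list is a separate request to the manager process, whereas a returned list is sent back once per file. The function names read_second_field and load_all are hypothetical, and the sketch assumes each file holds a 'data' array whose records are indexed the same way as in the question.

```python
import multiprocessing as mp
import numpy as np


def read_second_field(path):
    """Load one .npz file and return item [1] of every record in its 'data' array."""
    # np.load on an .npz file returns an archive object; the with block closes it.
    with np.load(path, allow_pickle=True) as archive:
        data = archive['data']
    return [row[1] for row in data]


def load_all(paths, processes=None):
    """Read the files in worker processes and merge the per-file lists in order."""
    with mp.Pool(processes=processes) as pool:
        # pool.map returns one list per file, in the same order as paths.
        per_file = pool.map(read_second_field, paths)
    # Flatten in the parent process; no shared state is needed.
    return np.array([item for chunk in per_file for item in chunk])


# Usage, under the usual guard and with the file names from the question:
# if __name__ == '__main__':
#     a = load_all(['file_%i.npz' % val for val in range(2)])
#     print(a.shape)
```

Whether this is actually faster depends on the size of the files and on how much of the time goes into reading from disk, so it is worth timing both versions on the real data.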