Parallel processing of Python code

Question

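I am working on my code. It takes a very long time to finish on a single CPU.
So I was wondering whether it is possible to run the code in parallel.
The code skeleton is as follows: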

def analyze_data(target_path):
   import os
   import math
   import itertools
   import numpy
   import scipy 
   ....
   for files in target_path:
      <a real long series of calculations......
      ...................>

   return
#Providing the dir search path:
dir_path = "/usr/target_dir/"
analyze_data(target_path=dir_path)

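This code takes a long time to finish (obviously, the number of files to process is large).
Is there a way to run this simple code structure across multiple processes or threads so that it finishes faster?

Thanks.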

Answer 1

See the documentation (for Python 3): https://docs.python.org/3.4/library/multiprocessing.html

If you can split the work up by directory:

from multiprocessing import Pool

def analyze_data(target_path):
   import os
   import math
   import itertools
   import numpy
   import scipy 
   ....
   for files in target_path:
      <a real long series of calculations......
      ...................>

   return

if __name__ == '__main__':
    # Providing the dir search paths, one per task:
    dir_path1 = "/usr/target_dir/1"
    dir_path2 = "/usr/target_dir/2"
    dir_path3 = "/usr/target_dir/3"
    # Each directory is handed to a worker process in the pool.
    with Pool(5) as p:
        print(p.map(analyze_data, [dir_path1, dir_path2, dir_path3]))
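If the files sit in one directory rather than several, the same idea can be applied per file instead of per directory. Below is a minimal sketch using the standard library's concurrent.futures, assuming the per-file work can be factored into a standalone function. The names process_file, executor_cls and workers are hypothetical, and the file-size body is only a placeholder for the real calculation, which the question does not show.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def process_file(path):
    """Hypothetical stand-in for the long per-file calculation."""
    return os.path.getsize(path)


def analyze_data(target_path, executor_cls=ProcessPoolExecutor, workers=4):
    """Run process_file on every regular file directly inside target_path."""
    files = sorted(
        os.path.join(target_path, name)
        for name in os.listdir(target_path)
        if os.path.isfile(os.path.join(target_path, name))
    )
    with executor_cls(max_workers=workers) as executor:
        # executor.map returns results in the same order as `files`.
        return dict(zip(files, executor.map(process_file, files)))
```

As with Pool, call analyze_data("/usr/target_dir/") from under an `if __name__ == '__main__':` guard when worker processes are used. Passing executor_cls=ThreadPoolExecutor switches to threads, which may be enough if the work is dominated by file I/O.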

Answer 2

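This can be done pretty easily with a fork of multiprocessing called pathos.multiprocessing, and it can be done naturally from the interpreter. I am also going to take advantage of pox, which has filesystem utilities that extend the os and sys modules. First, let's check out the test files I have set up. Each directory has several files in it.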

>>> import os
>>> os.path.abspath('.')
'/tmp'
>>> import pox
>>> # find all the .txt files in and below the current directory
>>> pox.find('*.txt', '.')
['/tmp/xxx/1.txt', '/tmp/xxx/2.txt', '/tmp/xxx/3.txt', '/tmp/yyy/1.txt', '/tmp/yyy/2.txt', '/tmp/zzz/1.txt', '/tmp/zzz/2.txt', '/tmp/zzz/3.txt', '/tmp/zzz/4.txt']
>>> # let's look at the contents of one of the files
>>> print(open('xxx/1.txt', 'r').read())
45125123412
12341234123
12342134234
23421342134

All the files have similar content, so let's start processing the files in parallel.

>>> import time
>>> import pathos
>>> # build a thread pool of workers
>>> thPool = pathos.multiprocessing.ThreadingPool 
>>> tp = thPool()
>>> 
>>> # expensive per-file processing
>>> def doit(file):
...     with open(file, 'r') as f:
...         x = sum(int(i) for i in f.readlines())
...     time.sleep(1) # make it 'expensive'
...     return len(str(x))**2  # some calculation
... 
>>> # grab all files from a directory, then do some final 'analysis'
>>> def analyze_data(target_path):
...     return min(*tp.uimap(doit, pox.find('*.txt', target_path)))
... 
>>> analyze_data('.')
121

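Realistically, analyze_data is pretty irrelevant, since find does not need to work on a per-directory basis, but that is the structure specified in the question. Here, you would replace most of doit with the expensive per-file task, and replace min with the per-directory processing. Depending on how expensive the calculations are, you may want to use pathos.multiprocessing.ProcessingPool instead of ThreadingPool: the former spawns multiple processes, while the latter only spawns multiple threads. The former has more overhead, but it parallelizes more expensive tasks better. Here, we use uimap to get an unordered iterator over the results of calling doit on each file.

Get pathos and pox here: https://github.com/uqfoundation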
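For readers without pathos installed, roughly the same pattern can be sketched with the standard library alone. This is an approximation under stated assumptions, not the pathos API: multiprocessing.pool.ThreadPool stands in for ThreadingPool, imap_unordered plays the role of uimap, and a recursive glob replaces pox.find. The pool_cls and workers parameters are illustrative additions.

```python
import glob
import os
from multiprocessing.pool import ThreadPool


def doit(path):
    """Sum the integers in one file, then do some calculation on the total."""
    with open(path, "r") as f:
        total = sum(int(line) for line in f if line.strip())
    return len(str(total)) ** 2


def analyze_data(target_path, pool_cls=ThreadPool, workers=4):
    """Apply doit to every .txt file in or below target_path; keep the minimum."""
    # Recursive glob as a stdlib stand-in for pox.find('*.txt', target_path).
    pattern = os.path.join(target_path, "**", "*.txt")
    files = glob.glob(pattern, recursive=True)
    with pool_cls(workers) as pool:
        # imap_unordered yields each result as soon as a worker finishes it,
        # in no particular order; min() does not care about the order.
        return min(pool.imap_unordered(doit, files))
```

Passing pool_cls=multiprocessing.Pool swaps the threads for processes, which mirrors the ThreadingPool versus ProcessingPool choice described above. In that case analyze_data should be called from under an `if __name__ == '__main__':` guard. Note that min raises ValueError if no matching files are found.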

Answer 1
0 2015-05-23 09:35:12

Answer 2
0 2015-05-23 14:48:19