Running multiple instances of a python program efficiently & economically?

I wrote a program that calls a function with the following prototype:

def Process(n):

    # The function reads data that is stored as binary files on the hard drive and,
    # based on the value of 'n', scans it using functions from numpy & cython.
    # It creates new binary files and saves the results of the scan in them.
    #
    # I have optimized the running time of the function as much as I could using
    # numpy & cython, and at present it takes about 4 hrs to complete one function
    # run on a typical WinXP desktop (a three-year-old machine, 2GB memory, etc.).

My goal is to run this function exactly 10,000 times (for 10,000 different values of 'n') in the fastest and most economical way. Following these runs, I will have 10,000 different binary files with the results of all the individual scans. Note that every function run is independent (meaning there is no dependency whatsoever between the individual runs).

So the question is this: having only one PC at home, it is obvious that it would take me around 4.5 years (10,000 runs x 4 hrs per run = 40,000 hrs ≈ 4.5 years) to complete all the runs at home. Yet, I would like to have all the runs completed within a week or two.

I know the solution would involve accessing many computing resources at once. What is the best (fastest / most affordable, as my budget is limited) way to do so? Must I buy a powerful server (how much would it cost?), or can I have this run online? In such a case, would my proprietary code get exposed by doing so?

In case it helps, every instance of Process() only needs about 500MB of memory. Thanks.

Check out PiCloud: http://www.picloud.com/

import cloud
cloud.call(function)

Maybe it's an easy solution.
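Since the runs are independent, all 10,000 of them could be queued in one call. The sketch below follows PiCloud's documented API from the time (the service has since shut down), and it assumes that Process and the binary files it reads are packaged so the cloud workers can see them:

import cloud

# Queue one job per value of n; cloud.map returns a list of job ids.
job_ids = cloud.map(Process, range(10000))

# Block until every job has finished (cloud.result would also fetch return values).
cloud.join(job_ids)

Note that Process writes its results to local binary files, so on a service like this the outputs would still have to be shipped back (for example via the provider's file storage) rather than appearing on your own disk.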

Does Process access the data in the binary files directly, or do you cache it in memory? Reducing the use of I/O operations should help.

Also, isn't it possible to break Process into separate functions running in parallel? How is the data dependency inside the function?
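Even if Process itself cannot be split, the 10,000 independent runs can at least be spread across the cores of a single machine. A minimal sketch with the standard multiprocessing module, assuming Process lives in an importable module (here called myscans, a made-up name) and keeping the ~500MB-per-instance memory footprint in mind:

import multiprocessing as mp

from myscans import Process  # hypothetical module containing the original function

def main():
    # With ~500MB per run, the pool size is limited by RAM as much as by CPU cores.
    with mp.Pool(processes=3) as pool:
        pool.map(Process, range(10000), chunksize=1)

if __name__ == '__main__':
    main()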

Finally, you could give a cloud computing service like Amazon EC2 a try (don't forget to read this for tools), but it won't be cheap (EC2 starts at $0.085 per hour). An alternative would be going to a university with a computer cluster (they are pretty common nowadays, and it will be easier if you know someone there).

Well, from your description, it sounds like things are IO-bound... in which case parallelism (at least on one IO device) isn't going to help much.

Edit: I just realized that you were referring more to full cloud computing, rather than running multiple processes on one machine... My advice below still holds, though... PyTables is quite nice for out-of-core calculations!

You mentioned that you're using numpy's memmap to access the data. Therefore, your execution time is likely to depend heavily on how your data is structured on disk.

Memmapping can actually be quite slow in any situation where the physical hardware has to spend most of its time seeking (e.g. reading a slice along a plane of constant Z in a C-ordered 3D array). One way of mitigating this is to change the way your data is ordered on disk to reduce the number of seeks required to access the parts you are most likely to need.
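To make the seek cost concrete, here is a small, self-contained numpy sketch (the file name and array shape are made up) showing why slicing a C-ordered memmapped array along its last axis is so much more expensive than slicing along its first axis:

import numpy as np

# Create a demo memmapped 3D array; in a C-ordered array the last axis is contiguous.
shape = (256, 256, 256)
data = np.memmap('demo.bin', dtype=np.float32, mode='w+', shape=shape)

# Fast: one contiguous block of the file is read.
plane_fast = np.asarray(data[42, :, :])

# Slow: one element per row, so the OS must fault in a page (or seek) per element.
plane_slow = np.asarray(data[:, :, 42])

# If the scans mostly need constant-Z planes, store the data transposed so that Z
# becomes the leading axis and the slow pattern turns into the fast one.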

Another option that may help is compressing the data. If your process is extremely IO-bound, you can actually get significant speedups by compressing the data on disk (and sometimes even in memory) and decompressing it on-the-fly before doing your calculation.
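A rough sketch of what that looks like with PyTables (file name, node name, shape and contents are invented for illustration); compression and decompression happen transparently, chunk by chunk:

import numpy as np
import tables

filters = tables.Filters(complevel=5, complib='blosc')  # 'zlib' also works
with tables.open_file('scan_data.h5', mode='w') as h5:
    arr = h5.create_carray(h5.root, 'data', tables.Float32Atom(),
                           shape=(1000, 1000), filters=filters)
    arr[:] = np.random.rand(1000, 1000).astype(np.float32)

# Reads decompress on-the-fly, so an IO-bound scan can end up faster than
# reading the raw, uncompressed bytes from disk.
with tables.open_file('scan_data.h5', mode='r') as h5:
    block = h5.root.data[:100, :100]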

The good news is that there's a very flexible, numpy-oriented library that's already been put together to help you with both of these. Have a look at pytables.

I would be very surprised if tables.Expr didn't significantly (by ~1 order of magnitude) outperform your out-of-core calculation using a memmapped array. See here for a nice (though canned) example. From that example:

PyTables vs. numpy memmap (benchmark figure from the linked example)
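To show the kind of out-of-core expression tables.Expr evaluates, here is a hedged sketch (array names, sizes and the expression are placeholders, not the benchmark from the linked example):

import numpy as np
import tables

filters = tables.Filters(complevel=5, complib='blosc')
with tables.open_file('expr_demo.h5', mode='w') as h5:
    a = h5.create_carray(h5.root, 'a', tables.Float64Atom(), shape=(1_000_000,), filters=filters)
    b = h5.create_carray(h5.root, 'b', tables.Float64Atom(), shape=(1_000_000,), filters=filters)
    out = h5.create_carray(h5.root, 'out', tables.Float64Atom(), shape=(1_000_000,), filters=filters)
    a[:] = np.random.rand(1_000_000)
    b[:] = np.random.rand(1_000_000)

    # The expression is evaluated blockwise on the disk-backed arrays, so the whole
    # dataset never has to fit in memory at once.
    expr = tables.Expr('3 * a + 2 * b')
    expr.set_output(out)
    expr.eval()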
