
Deploying scientific Python algorithm on Amazon EC2

I have a Python scientific model that calls some C code and uses numpy, scipy, and many geographic analysis modules. I would like to deploy it on EC2, but I don't know much about EC2 yet.

I have checked that I could use the StarCluster package to deploy my stack after setting up AMIs derived from the StarCluster AMIs. These already include numpy, scipy, and IPython, so all I would have to add are the geographic modules.

My plan was to write a standalone GUI that runs on customers' machines and makes sure their inputs are valid for my model. The standalone GUI then uploads zipped archives of up to about 10 GB to an FTP location. Customers then sign in to a web page I run on EC2, where they configure the run properties (# of instances, # of model runs). That web page starts a script that runs the customer's job on a cluster of the size they specified. A post-processor then processes the model output and writes result web pages and graphs that are initially password-protected so that only the customer can view them. My model runs consist of individual iterations that may take 5 minutes to 3 hours.
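The "# of instances, # of model runs" step can be sketched as a simple split of run indices across the instances a customer requests. This is a minimal sketch under assumptions: the helper name `assign_runs` and the round-robin policy are hypothetical and not part of StarCluster or any other tool named in this thread.

```python
from typing import Dict, List


def assign_runs(n_runs: int, n_instances: int) -> Dict[int, List[int]]:
    """Hypothetical helper: split run indices 0..n_runs-1 across instances round-robin."""
    if n_runs < 1 or n_instances < 1:
        raise ValueError("n_runs and n_instances must both be at least 1")
    # Never plan for more instances than there are runs to do.
    n_used = min(n_runs, n_instances)
    plan: Dict[int, List[int]] = {i: [] for i in range(n_used)}
    for run in range(n_runs):
        plan[run % n_used].append(run)
    return plan


if __name__ == "__main__":
    print(assign_runs(7, 3))
```

Because single iterations range from 5 minutes to 3 hours, a static split like this can leave some nodes idle while others are still busy; a shared work queue, where each node pulls the next run when it finishes, balances uneven jobs better.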

Can anyone offer advice on an ideal setup for this model? I think I can figure out the scientific part of it, but I don't see what the starting point is for running the web interface...

Thanks

Interesting project!

You can add modules to an instance launched from your AMI on AWS EC2 via pip (and then save that instance as a new AMI so every node you launch has them). First you'll need SSH access to your instance; the documentation for this is here: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html Then, if you don't already have pip installed, you can install it and your additional packages and modules as follows:

sudo apt-get install -y python-pip
sudo pip install numpy    # already installed on the StarCluster AMIs, so not needed
sudo pip install scipy    # same as above

Alternatively, you can install the SciPy stack from your distribution's packages.

Ubuntu & Debian:

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

The versions in Ubuntu 12.10 and Debian 7.0 meet the current SciPy stack specification. Users might also want to add the NeuroDebian repository for extra SciPy packages.

Fedora:

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose

Users of Fedora 17 and earlier should then upgrade IPython using pip:

sudo pip install --upgrade ipython

(The information above comes from the SciPy documentation: http://www.scipy.org/install.html )
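Whichever install route you use, it is worth confirming on the instance that the stack actually imports before launching multi-hour runs. A minimal sketch using only the standard library; the `REQUIRED` list and the `missing_modules` helper are assumptions for illustration, so append the geographic modules your model really uses.

```python
import importlib

# Hypothetical list: extend it with the geographic modules your model needs.
REQUIRED = ["numpy", "scipy"]


def missing_modules(names):
    """Return the names from `names` that fail to import in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also catches ModuleNotFoundError and broken C extensions
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(absent) if absent else "none")
```

Importing each module (rather than only checking that it is present on disk) also catches compiled extensions that are installed but fail to load.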

As for your plans for the GUI & large file uploads, take a look at AWS S3 for file storage. It does have limits: a single PUT upload is capped at 5 GB, so archives of up to about 10 GB will need S3's multipart upload. Depending on how far you want to push your solution, you may want to use chunked file uploading or stream a multipart request, similar to these solutions for file transfers:

https://github.com/blueimp/jQuery-File-Upload/wiki/Chunked-file-uploads
https://devcenter.heroku.com/articles/paperclip-s3
https://github.com/heiflo/play21-file-upload-streaming
https://github.com/netty/netty/issues/845
https://github.com/playframework/playframework/pull/884
https://github.com/floatingfrisbee/amazonfileupload
http://blog.assimov.net/blog/2011/04/03/multi-file-upload-with-uploadify-and--carrierwave-on-rails-3/

(A quick search for "chunked file uploads github" or "chunked file uploads google code" should turn up lots of options in terms of available code & detailed information.)
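S3's multipart upload splits an object into numbered parts: every part except the last must be at least 5 MiB, no part may exceed 5 GiB, and one upload may have at most 10,000 parts. The sketch below only plans the byte ranges for a large archive under those limits, using the standard library; actually sending the parts would be done with an S3 client (for example boto3's `upload_file`, which switches to multipart automatically above a size threshold) and is not shown. The name `plan_parts` and the 64 MiB default part size are illustrative choices, not values from this thread.

```python
MiB = 1024 * 1024
MIN_PART = 5 * MiB          # S3 minimum for every part except the last
MAX_PART = 5 * 1024 * MiB   # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000          # S3 maximum number of parts in one upload


def plan_parts(total_size, part_size=64 * MiB):
    """Return (part_number, offset, length) tuples covering total_size bytes.

    Part numbers start at 1, as S3 expects. If the requested part size would
    need more than MAX_PARTS parts, it is raised until the plan fits.
    (S3 also caps a whole object at 5 TB, which this sketch does not check.)
    """
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    part_size = max(part_size, MIN_PART)
    # Smallest part size that keeps the count within MAX_PARTS (ceiling division).
    part_size = max(part_size, -(-total_size // MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    parts = []
    offset, number = 0, 1
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((number, offset, length))
        offset += length
        number += 1
    return parts


if __name__ == "__main__":
    parts = plan_parts(10 * 1024 * MiB)  # a 10 GiB archive
    print(len(parts), parts[0], parts[-1])
```

Each planned range can be read from the archive with `seek`/`read` and uploaded independently, which is also what makes a failed part cheap to retry.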

However, an easier route for the file uploads/transfers may be a ready-made solution like one of these:

http://www.bucketexplorer.com/be-download.html
https://forums.aws.amazon.com/thread.jspa?messageID=258228&tstart=0
https://forums.aws.amazon.com/thread.jspa?messageID=257781&tstart=0
http://www.jfileupload.com/products/js3upload/index.html
http://codeonaboat.wordpress.com/2011/04/22/uploading-a-file-to-amazon-s3-using-an-asp-net-mvc-application-directly-from-the-users-browser/

Regardless, you'll want to make sure the environment on your EC2 instance and/or your S3 buckets is configured to allow large file uploads and processing. For example, if you handle uploads with PHP, your AMI's PHP needs to be built and configured (via php.ini settings such as upload_max_filesize and post_max_size) to accept files above certain sizes, and you'll also need to be aware of timeouts such as max_execution_time and max_input_time. You will likely need a 64-bit AMI along with a large EBS volume to power all this.
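Before accepting an upload of this size, the receiving instance also needs enough free space on the volume it writes to (such as the EBS volume mentioned above). A minimal stdlib sketch using `shutil.disk_usage`; the helper name `has_room_for` and the 2x headroom factor (room for the zip plus its extracted contents) are assumptions to tune for your data.

```python
import shutil


def has_room_for(path: str, incoming_bytes: int, headroom: float = 2.0) -> bool:
    """True if the filesystem holding `path` has at least incoming_bytes * headroom free.

    headroom=2.0 is an assumed margin for both the zipped archive and its
    extracted contents; highly compressible data may need more.
    """
    if incoming_bytes < 0 or headroom < 1:
        raise ValueError("incoming_bytes must be >= 0 and headroom >= 1")
    free = shutil.disk_usage(path).free
    return free >= incoming_bytes * headroom
```

For example, `has_room_for("/mnt/data", 10 * 1024**3)` would check a hypothetical `/mnt/data` mount before accepting a 10 GiB archive.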

As for the less complex, web-facing components of your GUI, jQuery (in the browser) or Node.js (on the server) are good starting points. There are also tons of code packages and documentation on GitHub or in the AWS EC2/S3 forums, such as the following:

https://github.com/josegonzalez/upload
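Since the model itself is Python, the back end that receives the run configuration could also be Python instead of Node.js. A minimal sketch using only the standard library's WSGI conventions; the field names `instances` and `runs`, the `MAX_INSTANCES` cap, and the response text are hypothetical, and the step that actually starts the cluster job is left as a placeholder comment.

```python
from urllib.parse import parse_qs

MAX_INSTANCES = 20  # hypothetical per-customer cap on cluster size


def parse_run_config(query: str) -> dict:
    """Parse and validate the 'instances' and 'runs' fields of a query string."""
    params = parse_qs(query)
    try:
        instances = int(params["instances"][0])
        runs = int(params["runs"][0])
    except (KeyError, ValueError):
        raise ValueError("instances and runs must both be integers") from None
    if not 1 <= instances <= MAX_INSTANCES:
        raise ValueError(f"instances must be between 1 and {MAX_INSTANCES}")
    if runs < 1:
        raise ValueError("runs must be at least 1")
    return {"instances": instances, "runs": runs}


def app(environ, start_response):
    """Minimal WSGI app: validate the run configuration and report the result."""
    try:
        config = parse_run_config(environ.get("QUERY_STRING", ""))
    except ValueError as exc:
        start_response("400 Bad Request", [("Content-Type", "text/plain; charset=utf-8")])
        return [str(exc).encode("utf-8")]
    # Placeholder: a real service would queue the job for the cluster script here.
    body = f"Accepted: {config['runs']} runs on {config['instances']} instances"
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body.encode("utf-8")]
```

For local testing, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it; a production deployment would run behind a proper WSGI server, and a real form would submit via POST rather than a query string.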

Without knowing your specific requirements, plans & time/budget limitations, that's the most advice I can give. However, feel free to reply to this thread or ping me directly with any other questions.
