
Using snow (and snowfall) with AWS for parallel processing in R

In relation to my earlier, similar SO question, I tried using snow/snowfall on AWS for parallel computing.

What I did was:

  • In the sfInit() function, I passed the public DNS name to the socketHosts parameter, like so: sfInit(parallel=TRUE, socketHosts=list("ec2-00-00-00-000.compute-1.amazonaws.com"))
  • The error returned was: Permission denied (publickey)
  • I then followed the instructions (correctly, I presume!) in the 'Passwordless Secure Shell (SSH) login' section of http://www.imbi.uni-freiburg.de/parallel/
  • I just used cat to put the contents of the .pem file that I created on AWS into ~/.ssh/authorized_keys on the AWS instance I want to connect to from my master AWS instance, and did the same on the master AWS instance

Is there anything I am missing? I would be very grateful if users could share their experiences with using snow on AWS.

Thank you very much for your suggestions.

UPDATE: I just wanted to share the solution I found to my specific problem:

  • I used StarCluster to set up my AWS cluster
  • Installed the snowfall package on all the nodes of the cluster
  • From the master node, issued the following commands:
  • hostslist <- list("ec2-xxx-xx-xxx-xxx.compute-1.amazonaws.com", "ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com")
  • sfInit(parallel=TRUE, cpus=2, type="SOCK", socketHosts=hostslist)
  • l <- sfLapply(1:2, function(x) system("ifconfig", intern=TRUE))
  • lapply(l, function(x) x[2])
  • sfStop()
  • The IP information confirmed that the AWS nodes were being utilized
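The check above reads the second line of the ifconfig output, which depends on how each machine formats it. A minimal sketch of the same idea, assuming snowfall is installed on every node as described, asks each worker for its machine name instead. The helper names node_name and worker_node_names are hypothetical and not part of snowfall.

```r
# Runs on a worker: report the name of the machine executing the task.
node_name <- function(i) Sys.info()[["nodename"]]

# Hypothetical helper (not from the original post): start a socket cluster on
# `hosts`, ask the workers for their machine names, and always stop the cluster.
worker_node_names <- function(hosts) {
  snowfall::sfInit(parallel = TRUE, cpus = length(hosts),
                   type = "SOCK", socketHosts = hosts)
  on.exit(snowfall::sfStop(), add = TRUE)
  unlist(snowfall::sfLapply(seq_along(hosts), node_name))
}
```

Calling worker_node_names(hostslist) from the master should return one name per task. Seeing the names of the remote machines, rather than only the master's, suggests the AWS nodes are doing the work.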

That does not look bad, but the .pem file is the wrong thing to use here. Still, it is sometimes not that simple, and many people have to fight with these issues. You can find a lot of tips in this post:

In my experience, most people have problems with these steps:

  • Can you log on to the machines via ssh (ssh ec2-00-00-00-000.compute-1.amazonaws.com)? Try to connect using the public DNS name, not the public IP.
  • Check your "Security groups" in AWS to make sure port 22 is open for all machines!
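The two checks above can be run from R on the master before calling sfInit(). This is only a sketch: ssh_check_args and can_ssh are hypothetical helper names, and it assumes an OpenSSH client named ssh is on the PATH. BatchMode=yes makes ssh fail instead of prompting for a password, and ConnectTimeout keeps a blocked port 22 from hanging the call.

```r
# Build the arguments for a non-interactive ssh test that just runs `true`
# on the remote host. `user` and `timeout` are illustrative parameters.
ssh_check_args <- function(host, user = NULL, timeout = 5) {
  target <- if (is.null(user)) host else paste0(user, "@", host)
  c("-o", "BatchMode=yes",
    "-o", paste0("ConnectTimeout=", timeout),
    target, "true")
}

# TRUE only if ssh exits with status 0, i.e. key-based login worked.
can_ssh <- function(host, ...) {
  status <- suppressWarnings(
    system2("ssh", ssh_check_args(host, ...), stdout = FALSE, stderr = FALSE)
  )
  isTRUE(status == 0)
}
```

For example, vapply(unlist(hostslist), can_ssh, logical(1)) gives one TRUE/FALSE per worker. A FALSE does not say why the login failed: it may be the key, the security group, or a host key that has not been accepted yet, so running the same ssh command by hand is the next step.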

If you plan to start more than 10 worker machines, you should set up an MPI installation on your machines (much better performance!)

Markus from cloudnumbers.com :-)

I believe @Anatoliy is correct: you're using an X.509 certificate. For the precise steps to add the SSH keys, look at the "Types of credentials" section of the EC2 Starters Guide.

To upload your own SSH keys, take a look at this page from Alestic.

It is a little confusing at first, but you'll want to keep straight which are your access keys, your certificates, and your key pairs, which may appear in text files marked DSA or RSA.
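One way to tell these files apart is to look at their first line. The sketch below is an illustration only, with hypothetical function names: it recognises the usual PEM headers for certificates and private keys and the one-line OpenSSH public key format, which is the form ~/.ssh/authorized_keys expects. It does not handle authorized_keys lines that begin with options.

```r
# Classify key material by its first non-empty line.
classify_key_text <- function(lines) {
  first <- trimws(lines[nzchar(trimws(lines))][1])
  if (is.na(first)) return("empty")
  if (grepl("^-----BEGIN CERTIFICATE-----", first)) {
    return("x509 certificate")
  }
  if (grepl("^-----BEGIN (RSA |DSA |EC |OPENSSH |ENCRYPTED )?PRIVATE KEY-----",
            first)) {
    return("private key")
  }
  if (grepl("^(ssh-rsa|ssh-dss|ssh-ed25519|ecdsa-sha2-[^ ]+) ", first)) {
    return("openssh public key")
  }
  "unknown"
}

# Convenience wrapper for a file on disk.
classify_key_file <- function(path) {
  classify_key_text(readLines(path, warn = FALSE))
}
```

If classify_key_file() on the .pem reports a certificate or a private key, appending that file to authorized_keys will not enable login. Only a line of the "openssh public key" kind belongs there.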
