Requiring external libraries in Ruby streaming scripts for Amazon EMR
How do I require external libraries when running Amazon EMR streaming jobs written in Ruby?
I've defined my mapper and am getting this output in my logs:
/mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201008110139_0001/attempt_201008110139_0001_m_000000_0/work/./mapper_stage1.rb: line 1: require: command not found
My first reaction is that either the streaming jar isn't realizing that it's executing a Ruby script (I've got a shebang declaration at the top of the script pointing to /usr/bin/ruby), or there's something funky going on with the way the streaming API deals with referencing external libraries.
Currently in Amazon Elastic MapReduce, /usr/bin/ruby is a symbolic link pointing to /usr/bin/ruby1.8. This is a dangerous interpreter to use, as it is ancient and buggy.
$ /usr/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]
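To confirm which interpreter a streaming task actually runs under, the script can report its own version on stderr, which should end up in the task attempt's stderr log rather than in the mapper output. A minimal sketch; the helper name `interpreter_info` is hypothetical and not part of any library:

```ruby
# Hypothetical helper: describe the interpreter running this script.
# RUBY_VERSION, RUBY_PATCHLEVEL and RUBY_PLATFORM are built-in constants.
def interpreter_info
  "ruby #{RUBY_VERSION} (patchlevel #{RUBY_PATCHLEVEL}) [#{RUBY_PLATFORM}]"
end

# Write to stderr so the line does not mix with the key/value pairs
# that the mapper writes to stdout.
$stderr.puts interpreter_info
```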
If you're using one of the 64-bit instances (like m1.xlarge), you can install Ruby Enterprise Edition in a bootstrap action. It installs into /usr/local/bin, which takes precedence over the stock Ruby 1.8 in path resolution, so service-nanny (whose shebang points to /usr/bin/ruby) still works, while your scripts can run on an interpreter that was built in 2011, with a much higher patchlevel.
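The `line 1: require: command not found` message in the question has the form of a shell error, which suggests the script was read by the shell rather than by Ruby. Below is a sketch of a mapper whose shebang resolves `ruby` through PATH, assuming the task's PATH lists /usr/local/bin first as described above. The stop-word list and the `run_mapper` name are hypothetical, chosen only for illustration:

```ruby
#!/usr/bin/env ruby
# The shebang must be the very first line of the file. `env` looks up ruby
# on PATH, so an interpreter in /usr/local/bin is found before /usr/bin/ruby
# (assumption: /usr/local/bin comes first on the task's PATH).
require 'set'

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = Set.new(%w[a an the])

# Read lines from `input` and emit "word<TAB>1" pairs on `output`.
def run_mapper(input, output)
  input.each_line do |line|
    line.downcase.scan(/[a-z0-9']+/).each do |word|
      next if STOP_WORDS.include?(word)
      output.puts "#{word}\t1"
    end
  end
end
```

In a real mapper, the last line of the script would be `run_mapper($stdin, $stdout)`. Keeping the loop in a method that takes its streams as arguments lets you try it locally with any IO-like object before submitting the job.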