
Requiring external libraries in Ruby streaming scripts for Amazon EMR

How do I require external libraries when running Amazon EMR streaming jobs written in Ruby?

I've defined my mapper and am getting this output in my logs:

/mnt/var/lib/hadoop/mapred/taskTracker/jobcache/job_201008110139_0001/attempt_201008110139_0001_m_000000_0/work/./mapper_stage1.rb: line 1: require: command not found

My first reaction is that either the streaming jar isn't realizing that it's executing a Ruby script (I've got a shebang declaration at the top of the script pointing to /usr/bin/ruby), or there's something funky going on with the way the streaming API deals with referencing external libraries.
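For reference, the "line 1: require: command not found" format looks like a shell's error message rather than Ruby's, which would suggest the script was run by a shell and that its first line was the require statement, not the shebang. Below is a minimal sketch of a streaming mapper laid out with the shebang on line 1. The `set` library, the `STOP_WORDS` list, and the `map_line` and `run_mapper` names are illustrative assumptions, not taken from the original mapper_stage1.rb.

```ruby
#!/usr/bin/ruby
# The shebang has to be the very first line of the file. If anything comes
# before it, the script may be handed to a shell, which fails on `require`.
require 'set' # stand-in for whichever library the mapper actually needs

# Hypothetical stop-word list, only here so the required library is used.
STOP_WORDS = Set.new(%w[a an the])

# Turn one line of input into "word<TAB>1" records for Hadoop streaming.
def map_line(line)
  words = line.downcase.scan(/[a-z0-9']+/)
  words.reject { |w| STOP_WORDS.include?(w) }.map { |w| "#{w}\t1" }
end

# Read records from input and write the mapped pairs to output.
def run_mapper(input, output)
  input.each_line do |line|
    map_line(line).each { |pair| output.puts(pair) }
  end
end

# Only consume STDIN when executed as a script (as the streaming jar does).
run_mapper($stdin, $stdout) if __FILE__ == $PROGRAM_NAME
```

Piping a few lines of text into the script locally is a quick way to check that the interpreter line and the require both work before submitting the job.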

Currently in Amazon Elastic MapReduce, /usr/bin/ruby is a symbolic link pointing to /usr/bin/ruby1.8. This is a dangerous interpreter to use, as it is ancient and buggy.

$ /usr/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [x86_64-linux]

If you're using one of the 64-bit instances (like m1.xlarge), you can install Ruby Enterprise Edition in a bootstrap action. It goes into /usr/local/bin, which has a higher path resolution precedence than the stock Ruby 1.8, so service-nanny (whose shebang points to /usr/bin/ruby) still works, while your scripts can run on an interpreter that was built in 2011, with a much higher patchlevel.
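To illustrate the path-precedence point, here is a small sketch of how a PATH lookup picks an interpreter. It assumes your scripts resolve ruby through PATH (for example with a `#!/usr/bin/env ruby` shebang) instead of hard-coding /usr/bin/ruby. The `first_on_path` helper is a hypothetical name introduced for this example.

```ruby
#!/usr/bin/env ruby
# Return the first executable named cmd found while walking the PATH
# entries in order, or nil if none of the directories contains one.
def first_on_path(cmd, path = ENV['PATH'].to_s)
  path.split(File::PATH_SEPARATOR).each do |dir|
    candidate = File.join(dir, cmd)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if __FILE__ == $PROGRAM_NAME
  # On a node where /usr/local/bin comes before /usr/bin in PATH, this
  # should report the newer interpreter installed by the bootstrap action.
  puts "first ruby on PATH: #{first_on_path('ruby').inspect}"
  puts "running interpreter version: #{RUBY_VERSION}"
end
```

Running this on a cluster node after the bootstrap action is one way to confirm which interpreter your mappers will actually get.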

