How do I set my EMR Classpath

I am running a job on an AWS EMR cluster and am having issues with a Jackson library conflict. Based on the article here, I tried to add a bootstrap step that sets my classpath with the following script:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true;
echo "HADOOP_CLASSPATH=s3n://bucket/myjar.jar" > /home/hadoop/conf/hadoop-user-env.sh

I have built my jar so that all of its dependencies are included in it. The first problem I hit when I do this is that my enable debugging step dies with the following error:

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2427)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2440)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2479)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2461)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:372)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.fetchFile(ScriptRunner.java:39)
at com.amazon.elasticmapreduce.scriptrunner.ScriptRunner.main(ScriptRunner.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 13 more

So I have two questions. What is wrong with this setup with regard to the enable debugging step? And is it valid to give my classpath as an S3 location? If not, what should the value of:

/path/to/my.jar

be in the example on the page indicated above?

Looking at your bootstrap action, it looks like there might be a mistake in your string. The script should look like the following:

#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh

Note the '>>' characters. A single '>' means that you're replacing the entire file with the output of the 'echo' command, whereas a double '>>' means you're appending that line to the end of the file. Additionally, the trailing semicolon isn't needed in a Bash script.
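
To see the difference between the two redirections without touching a real cluster, here is a minimal sketch that uses a temporary file as a hypothetical stand-in for /home/hadoop/conf/hadoop-user-env.sh. The EXISTING_SETTING line is made up for illustration and only represents whatever the file already contains:

```shell
#!/bin/bash
# Hypothetical stand-in for /home/hadoop/conf/hadoop-user-env.sh
env_file="$(mktemp)"

# Simulate a file that already has content (illustrative line, not a real EMR setting).
echo "EXISTING_SETTING=1" > "$env_file"

# '>>' appends: the existing line is kept and the new one is added after it.
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> "$env_file"
appended_lines=$(grep -c '' "$env_file")

# '>' truncates the file first: only the new line remains.
echo "HADOOP_CLASSPATH=/path/to/my.jar" > "$env_file"
overwritten_lines=$(grep -c '' "$env_file")

echo "after >>: $appended_lines line(s); after >: $overwritten_lines line(s)"
```

If the original file contained settings that other steps rely on, overwriting it with '>' would discard them, which is why appending is the safer choice in a bootstrap action.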

References: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html

PS: Amazon's awesome support found this question and replied to my email, although the question was not asked by me. So this is the attribution to the author, an AWS Support Engineer named Rendy O.
