
Overriding default aws-sdk jar on AWS EMR master node

I'm running into a problem running my application on the EMR master node. It needs to call some AWS SDK methods added in version 1.11. All the required dependencies were bundled into a fat jar, and the application works as expected on my dev box.

However, when the app is executed on the EMR master node, it fails with a NoSuchMethodError exception when calling a method added in AWS SDK 1.11+, e.g.

java.lang.NoSuchMethodError:
 com.amazonaws.services.sqs.model.SendMessageRequest.withMessageDeduplicationId(Ljava/lang/String;)Lcom/amazonaws/services/sqs/model/SendMessageRequest;
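
For context, here is a minimal sketch of the kind of call that triggers this; the FIFO-queue deduplication setter only exists in SDK 1.11+, and the queue URL, message body and client setup below are placeholders rather than my actual code:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.SendMessageRequest;

public class SqsFifoSend {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        SendMessageRequest request = new SendMessageRequest()
                .withQueueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/example.fifo") // placeholder queue URL
                .withMessageBody("hello")
                .withMessageGroupId("group-1")
                // added in SDK 1.11: compiles against the fat jar, but fails with
                // NoSuchMethodError when the 1.10.x jar is picked up at runtime
                .withMessageDeduplicationId("dedup-1");
        sqs.sendMessage(request);
    }
}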

I tracked it down to the classpath parameter passed to the JVM instance started by spark-submit:

-cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/

In particular, it loads /usr/share/aws/aws-java-sdk/aws-java-sdk-sqs-1.10.75.1.jar instead of version 1.11.77 from my fat jar.
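
One quick way to confirm which jar actually wins at runtime is to log the code source of the offending class. This is just a small diagnostic sketch; getCodeSource() can return null for classes loaded by the bootstrap class loader:

import com.amazonaws.services.sqs.model.SendMessageRequest;

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar the class was loaded from, e.g.
        // file:/usr/share/aws/aws-java-sdk/aws-java-sdk-sqs-1.10.75.1.jar
        System.out.println(
                SendMessageRequest.class.getProtectionDomain().getCodeSource().getLocation());
    }
}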

Is there a way to force Spark to use the AWS SDK version I need?

Here is what I learned while troubleshooting this.

The default classpath parameter is constructed using the spark.driver.extraClassPath setting from /etc/spark/conf/spark-defaults.conf. spark.driver.extraClassPath contains a reference to the older AWS SDK version, which is located in /usr/share/aws/aws-java-sdk/*.
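
For illustration, the relevant entry looks roughly like this (trimmed; the exact path list varies by EMR release):

spark.driver.extraClassPath  /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:...:/usr/lib/spark/jars/*:/etc/hadoop/conf/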

To use the newer version of the AWS SDK, I uploaded the jars to a directory I created under the home directory and specified it with the --driver-class-path spark-submit parameter:

--driver-class-path '/home/hadoop/aws/*'
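
Putting it together, the submit command looks something like this; the main class and application jar names are placeholders:

spark-submit \
  --driver-class-path '/home/hadoop/aws/*' \
  --class com.example.MyApp \
  /home/hadoop/my-app-assembly.jar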


 