Conflict between httpclient version and Apache Spark
I'm developing a Java application using Apache Spark. I use this version:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.2</version>
</dependency>
In my code, there is a transitive dependency:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.5.2</version>
</dependency>
I package my application into a single JAR file. When deploying it on an EC2 instance using spark-submit, I get this error:
Caused by: java.lang.NoSuchFieldError: INSTANCE
at org.apache.http.conn.ssl.SSLConnectionSocketFactory.<clinit>(SSLConnectionSocketFactory.java:144)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.getPreferredSocketFactory(ApacheConnectionManagerFactory.java:87)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:65)
at com.amazonaws.http.apache.client.impl.ApacheConnectionManagerFactory.create(ApacheConnectionManagerFactory.java:58)
at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:50)
at com.amazonaws.http.apache.client.impl.ApacheHttpClientFactory.create(ApacheHttpClientFactory.java:38)
This error clearly shows that SparkSubmit has loaded an older version of the same Apache httpclient library, which is why this conflict happens.
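A quick way to confirm which copy of a library actually wins at runtime is to ask the JVM where it loaded the class from. This is a minimal diagnostic sketch (the class name is passed as an argument; it defaults to a JDK class here only so the snippet runs standalone):

```java
// Diagnostic sketch: print which JAR (if any) a class was loaded from.
// Run it with the conflicting class name, e.g.
// org.apache.http.conn.ssl.SSLConnectionSocketFactory, on the same
// classpath spark-submit uses, to see whose httpclient wins.
public class WhichJar {
    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> c = Class.forName(name);
        java.security.CodeSource src = c.getProtectionDomain().getCodeSource();
        System.out.println(name + " loaded from: "
                + (src == null ? "bootstrap class loader" : src.getLocation()));
    }
}
```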
What is a good way to solve this issue?
For some reason, I cannot upgrade the Spark version in my Java code. However, I could easily do that on the EC2 cluster. Is it possible to deploy my Java code on a cluster running a higher version, say 1.6.1?
As said in your post, Spark is loading an older version of httpclient. The solution is to use Maven's relocation facility to produce a clean, conflict-free project.
Here's an example of how to use it in your pom.xml file:
<project>
<!-- Your project definition here, with the groupId, artifactId, and its dependencies -->
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.3</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<relocations>
<relocation>
<pattern>org.apache.http.client</pattern>
<shadedPattern>shaded.org.apache.http.client</shadedPattern>
</relocation>
</relocations>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
This will move all files from org.apache.http.client to shaded.org.apache.http.client, resolving the conflict.
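To confirm the relocation actually took effect, you can list the entries of the shaded JAR and check that the httpclient classes now sit under the new package. A minimal sketch using the JDK's JarFile API (the jar path here is an assumption; point it at your own shaded artifact):

```java
import java.util.jar.JarFile;

public class CheckRelocation {
    public static void main(String[] args) throws Exception {
        // Path to the shaded artifact produced by `mvn package`
        // ("target/myapp-1.0.jar" is a placeholder; adjust to your build)
        String jarPath = args.length > 0 ? args[0] : "target/myapp-1.0.jar";
        try (JarFile jar = new JarFile(jarPath)) {
            long relocated = jar.stream()
                    .filter(e -> e.getName().startsWith("shaded/org/apache/http/client"))
                    .count();
            long original = jar.stream()
                    .filter(e -> e.getName().startsWith("org/apache/http/client"))
                    .count();
            System.out.println("relocated entries: " + relocated);
            // should be 0 after shading
            System.out.println("un-relocated entries: " + original);
        }
    }
}
```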
Original post:
If this is simply a matter of transitive dependencies, you could just add this to your spark-core dependency to exclude the HttpClient used by Spark:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.2</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</exclusion>
</exclusions>
</dependency>
I also set the dependency's scope to provided, since Spark will be provided by your cluster.
However, that might interfere with Spark's internal behaviour. If you still get an error after doing this, you could try using Maven's relocation facility, which should produce a clean, conflict-free project.
Regarding the fact that you can't upgrade Spark's version: did you use exactly this dependency declaration from mvnrepository?
Spark being backwards compatible, there shouldn't be any problem deploying your job on a cluster with a higher version.